Databricks
9 Topics

Announcing the new Databricks Job activity in ADF!
We’re excited to announce that Azure Data Factory now supports the orchestration of Databricks Jobs! Databricks Jobs let you schedule and orchestrate one or more tasks in a workflow in your Databricks workspace. Since any operation in Databricks can be a task, you can now run anything in Databricks via ADF, such as serverless jobs, SQL tasks, Delta Live Tables, batch inferencing with model serving endpoints, or automatically publishing and refreshing semantic models in the Power BI service. With this update, you can trigger these workflows from your Azure Data Factory pipelines.

To make use of this new activity, you’ll find a new activity called Job under the Databricks activity group. Once you’ve added the Job activity (Preview) to your pipeline canvas, you can connect to your Databricks workspace and configure the settings to select your Databricks job, allowing you to run it from your pipeline.

We also know that parameterization is important in your pipelines, as it allows you to create generic, reusable pipeline models. ADF continues to support these patterns and extends this capability to the new Databricks Job activity: under the settings of the Job activity, you can configure parameters to send to your Databricks job, giving you maximum flexibility and power for your orchestration jobs.

To learn more, read Azure Databricks activity - Microsoft Fabric | Microsoft Learn. Have any questions or feedback? Leave a comment below!
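Conceptually, running a Databricks job with parameters boils down to a "run now" call against the Databricks Jobs API 2.1. As a rough illustration only (not ADF's actual implementation; the workspace URL, job ID, and parameter names below are made up), the equivalent call from Python might look like this:

```python
# Minimal sketch: trigger a Databricks job run with job-level parameters,
# roughly what the ADF Job activity does on your behalf. The URL, token,
# job_id, and parameter names are placeholders, not values from the post.
import requests

WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"  # hypothetical
TOKEN = "<personal-access-token-or-aad-token>"

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "job_id": 123,  # the Databricks job you selected in the activity settings
        # Analogous to the parameters configured under the activity's settings.
        "job_parameters": {"run_date": "2024-01-01", "env": "dev"},
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["run_id"])  # the run you can then monitor in the workspace
```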
Azure Databricks - SQL query - Configuration not available

I spun up a FINOS Legend Studio instance locally and was able to establish connectivity between the application and my Azure Databricks resource. However, when I run a SQL query from Legend Studio, which is supposed to execute on Databricks, I get a "Configuration legend_databricks_http_path is not available" error from Databricks. By going to the "Query History" in Azure Databricks, I can confirm that Legend Studio is reaching Databricks, but Databricks responds with the error mentioned above. The "See error" button doesn't provide any additional error details. Is anyone familiar with this "Configuration is not available" type of error in Azure Databricks SQL queries?
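One speculative lead (not a confirmed fix): this error message matches what Spark SQL raises when a query references an unresolved `${variable}` via variable substitution, i.e., the generated SQL mentions `${legend_databricks_http_path}` but no configuration with that name exists in the session. A sketch of how to reproduce and clear that condition in a Databricks notebook, assuming that is indeed the cause (the value below is a placeholder):

```python
# Assumes a Databricks notebook, where `spark` is predefined.
# If the SQL Legend generates contains ${legend_databricks_http_path},
# setting a conf with that exact name lets substitution resolve it instead
# of failing with "Configuration legend_databricks_http_path is not available".
spark.conf.set("legend_databricks_http_path", "/sql/1.0/warehouses/<warehouse-id>")

# With the configuration set, the reference now resolves:
spark.sql("SELECT '${legend_databricks_http_path}' AS http_path").show()
```

Whether Legend's JDBC path hits the same substitution layer on a SQL warehouse is an assumption; the root fix may instead be in how the Legend connection/store configuration supplies the HTTP path.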
Data archiving of delta table in Azure Databricks

Hi all, I am currently researching data archiving for Delta table data on the Azure platform, as there is a data retention policy within the company. I have studied the official Databricks documentation on archival support (https://docs.databricks.com/en/optimizations/archive-delta.html). It says: "If you enable this setting without having lifecycle policies set for your cloud object storage, Databricks still ignores files based on this specified threshold, but no data is archived." Therefore, I am thinking about how to configure the lifecycle policy in the Azure storage account, and I have read the Microsoft documentation (https://learn.microsoft.com/en-us/azure/storage/blobs/lifecycle-management-overview).

Say the Delta table data is stored in "test-container/sales" and there are lots of "part-xxxx.snappy.parquet" data files stored in that folder. Should I simply specify "tierToArchive", "daysAfterCreationGreaterThan: 1825", and "prefixMatch: ["test-container/sales"]"? However, I am worried about whether this archive mechanism will impact normal Delta table operations. Besides, what if a Parquet data file moved to the archive tier contains both data created before the 5-year cutoff and after it; is that possible? Could data end up in the archive tier before it is 5 years old?

Highly appreciate it if someone could help me out with the questions above. Thanks in advance.
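For the Databricks half of this setup, the archival-support docs referenced above describe a per-table property that tells Databricks how old a file must be before queries may ignore it, so that readers don't fail when the lifecycle rule tiers those files away. A minimal sketch, assuming the `delta.timeUntilArchived` table property from those docs (table name and interval are placeholders matching the 1825-day example):

```python
# Assumes a Databricks notebook, where `spark` is predefined.
# Enable archival support on the Delta table so Databricks skips files older
# than the threshold; the Azure lifecycle rule (tierToArchive on the matching
# prefix) is what actually moves the underlying Parquet files.
spark.sql("""
    ALTER TABLE sales
    SET TBLPROPERTIES ('delta.timeUntilArchived' = '1825 days')
""")
```

Per those docs, the intent is that queries touching only recent data keep working normally, while queries that would need archived files report an error rather than reading unreadable archive-tier blobs; the threshold should be aligned with (or shorter than) the lifecycle rule's age condition so a file is never tiered before Databricks is prepared to skip it.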
Harnessing Retail Data with Azure: Integrating Blob Storage and Databricks for Advanced Analytics

Learn how a retail company leverages Azure Blob Storage and Azure Databricks to store, process, and analyze its massive sales data. You will see how the company uses PySpark to transform data into insights that help it optimize its product strategy and marketing campaigns. You will also find some learning resources to help you get started with data engineering on Microsoft Azure.
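To give a flavor of the kind of PySpark transformation such a pipeline involves (the storage path and column names below are invented for illustration, not taken from the article):

```python
# Illustrative sketch only: read raw sales records from Blob Storage / ADLS
# and aggregate them into per-product revenue. Path and columns are made up.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

sales = spark.read.parquet("abfss://sales@retailstore.dfs.core.windows.net/raw/")

revenue_by_product = (
    sales.groupBy("product_id")
         .agg(F.sum(F.col("quantity") * F.col("unit_price")).alias("revenue"))
         .orderBy(F.desc("revenue"))
)
revenue_by_product.show(10)  # top products feeding the strategy dashboards
```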
Empowering Startups: The Introductory Guide to Databricks for Entrepreneur's Data-Driven Success

Unlock the key to entrepreneurial success with Databricks—a journey where data empowers startups to thrive. Get ready to embark on a transformative quest for data-driven excellence!
Loading Parquet and Delta files into Azure Synapse using ADB or Azure Synapse?

I have the following scenario. We are using Azure Databricks to pull data from several sources, generate Parquet and Delta files, and load them into our ADLS Gen2 containers. We are now planning to create our data warehouse inside Azure Synapse SQL pools, where we will create external tables for dimension tables (using the Delta files) and hash-distributed fact tables (using the Parquet files). Now, the question is: to automate this data warehouse loading activity, which method is better? Is it better to use Azure Databricks to write our transformation logic to create dim and fact tables and load them regularly into Azure Synapse SQL pools, or is it better to write that transformation logic in Azure Synapse itself and load them from there? Please help.
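If the Databricks-driven route is chosen, one common pattern is the Azure Synapse connector, which stages data through ADLS with PolyBase/COPY and writes it into a dedicated SQL pool. A sketch under assumed names (JDBC URL, paths, and table names are placeholders; this is one option, not a verdict on which side should own the logic):

```python
# Assumes a Databricks notebook, where `spark` is predefined.
# Load a curated dimension from Delta, then push it to a dedicated SQL pool
# via the Azure Synapse connector. All names below are placeholders.
dim_customer = spark.read.format("delta").load(
    "abfss://curated@mydatalake.dfs.core.windows.net/dim_customer"
)

(dim_customer.write
    .format("com.databricks.spark.sqldw")
    .option("url", "jdbc:sqlserver://<workspace>.sql.azuresynapse.net:1433;database=<dw>")
    .option("tempDir", "abfss://staging@mydatalake.dfs.core.windows.net/tmp")  # staging area
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.DimCustomer")
    .mode("overwrite")
    .save())
```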
Train your Model on Spark/Databricks, score it on ADX

Are you using Spark/Databricks to build machine learning models? Do you need to score new data that is streamed into Azure Data Explorer? If this is your scenario, please read on! In this blog we show how to train an ML model on Azure Databricks, export it to ADX, and score new samples directly on ADX, in near real time, using inline Python code embedded in a KQL query.
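As a hedged sketch of the export step this workflow implies: train a model on Databricks, then serialize it into a plain string that can be ingested into an ADX table and later unpickled inside ADX's inline-Python (KQL) plugin. The dataset, model choice, and hex encoding below are illustrative assumptions, not the blog's exact code:

```python
# Sketch only: train a scikit-learn model and serialize it to a hex string
# suitable for storage in an ADX table column. Data and model are synthetic.
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1_000, n_features=10, random_state=42)
model = LogisticRegression(max_iter=1_000).fit(X, y)

# Hex-encode the pickled model so it survives round-tripping as plain text;
# the scoring side would bytes.fromhex(...) and pickle.loads(...) it in KQL's
# inline-Python plugin.
model_blob = pickle.dumps(model).hex()
print(len(model_blob))  # this string is what would be ingested into ADX
```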
Getting started on Azure

I work with large datasets and I am just getting started with learning Azure. I am familiar with Python and Power BI. I am planning to integrate Synapse and Databricks for analytics and visualisation using Power BI. What books do you recommend for me to understand these modules?