azure databricks

95 Topics

How Great Engineers Make Architectural Decisions — ADRs, Trade-offs, and an ATAM-Lite Checklist
Why Decision-Making Matters

Without a shared framework, context fades and teams re-debate old choices. ADRs (Architecture Decision Records) solve that by recording the why behind design decisions: what problem we solved, what options we considered, and what trade-offs we accepted.

A good ADR:
- Lives next to the code in your repo.
- Explains reasoning in plain language.
- Survives personnel changes and version history.

Think of it as your team’s engineering memory.

The Five Pillars of Trade-offs

At Microsoft, we frame every major design discussion using the Azure Well-Architected pillars:
- Reliability – Will the system recover gracefully from failures?
- Performance Efficiency – Can it meet latency and throughput targets?
- Cost Optimization – Are we using resources efficiently?
- Security – Are we minimizing blast radius and exposure?
- Operational Excellence – Can we deploy, monitor, and fix quickly?

No decision optimizes all five. Great engineers make conscious trade-offs and document them.

A Practical Decision Flow

Step | What to Do | Output
1. Frame It | Clarify the problem, constraints, and quality goals (SLOs, cost caps). | Problem statement
2. List Options | Identify 2-4 realistic approaches. | Options list
3. Score Trade-offs | Use a Decision Matrix to rate options (1–5) against pillars. | Table of scores
4. ATAM-Lite Review | List scenarios, identify sensitivity points (small changes with big impact) and risks. | Risk notes
5. Record It as an ADR | Capture everything in one markdown doc beside the code. | ADR file

Example: Adding a Read-Through Cache

Decision: Add a Redis cache in front of Cosmos DB to reduce read latency.
Context: P95 read latency from the database is 80 ms; target is < 15 ms.
Options:
A) Query DB directly
B) Add read-through cache using Redis

Trade-offs
Performance: + Much faster reads for cached keys.
Cost: + Fewer RU/s on Cosmos DB.
Reliability: − Risk of stale data if cache invalidation fails.
Operational: − Added complexity for monitoring and TTLs.
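The scoring in step 3 is a weighted sum per option, so it is easy to check by script. The following is a minimal sketch, not part of the original post: the weights and scores are copied from the Decision Matrix Example in the templates section, while the function names and option labels are hypothetical.

```python
# Weights per pillar, copied from the Decision Matrix Example.
WEIGHTS = {
    "Reliability": 5,
    "Performance": 4,
    "Cost": 3,
    "Security": 4,
    "Operational Excellence": 3,
}

# Scores (1-5) per option; the option labels are illustrative.
SCORES = {
    "A: Direct DB reads": {
        "Reliability": 3, "Performance": 2, "Cost": 4,
        "Security": 4, "Operational Excellence": 4,
    },
    "B: Redis cache": {
        "Reliability": 4, "Performance": 5, "Cost": 5,
        "Security": 4, "Operational Excellence": 3,
    },
}


def weighted_total(scores, weights):
    """Return the sum of weight x score over every pillar."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[pillar] * scores[pillar] for pillar in weights)


def rank_options(options, weights):
    """Return (option, total) pairs sorted from best to worst."""
    totals = {name: weighted_total(s, weights) for name, s in options.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, total in rank_options(SCORES, WEIGHTS):
        print(f"{name}: {total}")
```

With these inputs Option A totals 63 and Option B totals 80, which matches the decision recorded in the ADR. Keeping the weights in one place makes it cheap to re-run the comparison when constraints change.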
Templates You Can Re-use

ADR Template

# ADR-001: Add Read-through Cache in Front of Cosmos DB
Status: Accepted
Date: 2025-10-21
Context: High read latency; P95 = 80ms, target <15ms
Options:
A) Direct DB reads
B) Redis cache for hot keys ✅
Decision: Adopt Redis cache for performance and cost optimization.
Consequences:
- Improved read latency and reduced RU/s cost
- Risk of data staleness during cache invalidation
- Added operational complexity
Links: PR#3421, Design Doc #204, Azure Monitor dashboard

Decision Matrix Example

Pillar | Weight | Option A | Option B | Notes
Reliability | 5 | 3 | 4 | Redis clustering handles failover
Performance | 4 | 2 | 5 | In-memory reads
Cost | 3 | 4 | 5 | Reduced RU/s
Security | 4 | 4 | 4 | Same auth posture
Operational Excellence | 3 | 4 | 3 | More moving parts

Weighted total = Σ(weight × score) → best overall score wins.

Team Guidelines

- Create a /docs/adr folder in each repo.
- One ADR per significant change; supersede old ones instead of editing history.
- Link ADRs in design reviews and PRs.
- Revisit when constraints change (incidents, new SLOs, cost shifts).
- Publish insights as follow-up blogs to grow shared knowledge.

Why It Works

This practice connects the theory of trade-offs with Microsoft’s engineering culture of reliability and transparency. It improves onboarding, enables faster design reviews, and builds a traceable record of engineering evolution.

Join the Conversation

Have you tried ADRs or other decision frameworks in your projects? Share your experience in the comments or link to your own public templates. Let’s make architectural reasoning part of our shared language.
Antony_nganga
Oct 21, 2025, Azure Architecture Blog
Secure Delta Sharing Between Databricks Workspaces Using NCC and Private Endpoints
This guide walks you through sharing Delta tables between two Databricks workspaces (NorthCentral and SouthCentral) and configuring a Network Connectivity Configuration (NCC) for a Serverless Warehouse. These steps ensure secure data sharing and connectivity for your workloads.

Part 1: Delta Sharing Between Workspaces

1. Access Delta Shares
From your NorthCentral Workspace, go to Catalog. Hover over Delta Shares Received. When the icon appears, click it. This redirects you to the Delta Sharing page.

2. Create a New Recipient
On the Delta Sharing page, click Shared by me. Click New Recipient. Fill in the details:
Recipient Name: (Enter your recipient name)
Recipient Type: Select Databricks
Sharing Identifier: azure:southcentralus:3035j6je88e8-91-434a-9aca-e6da87c1e882
To get the sharing identifier, run the following in the recipient (SouthCentral) workspace using a notebook or Databricks SQL query: (SQL) SELECT CURRENT_METASTORE();
Click Create.

3. Share Data
Click "Share Data". Enter a Share Name. Select the data assets you want to share.
Note: Disable History for the selected data assets so that only the current data snapshot is shared. Disabling the History option on the Delta Share simplifies the share and prevents unnecessary access to historical versions. Additionally, review whether you can further simplify your share by partitioning the data where appropriate.
Add the recipient you created earlier. Click Share Data.

4. Add Recipient
From the newly created share, click Add Recipient. Select your South-Central Workspace Metastore ID.

5. South-Central Workspace
In your South-Central Workspace, navigate to the Delta Sharing page. Under the Shared with me tab, locate your newly created share and click on it. Add the share to a catalog in Unity Catalog.

Part 2: Enable NCC for Serverless Warehouse

6. Add Network Connectivity Configuration (NCC)
Go to the Databricks Account Console: https://accounts.azuredatabricks.net/
Navigate to Cloud resources and click Add Network Connectivity Configuration. Fill in the required fields and create a new NCC for SouthCentral.

7. Associate NCC with Workspace
In the Account Console, go to Workspaces. Select your SouthCentral workspace and click Update Workspace. From the Network Connectivity Configuration dropdown, select the NCC you just created.

8. Add Private Endpoint Rule
In Cloud resources, select your NCC, select Private Endpoint Rules and click Add Private Endpoint Rule. Provide:
Resource ID: Enter the Resource ID of your Storage Account in NorthCentral. Note: You can find it in the storage account (NorthCentral) by clicking “JSON View” at the top right.
Azure Subresource type: dfs & blob.

9. Approve Pending Connection
Go to your NorthCentral Storage Account, then Networking, then Private Endpoints. You will see a Pending connection from Databricks. Approve the connection; the Connection status in your Account Console then shows as ESTABLISHED.

You will now see your share listed under “Delta Shares Received”.
Note: If you cannot view your share, run the following SQL command: GRANT USE_PROVIDER ON METASTORE TO `username@xxxx.com`.
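Step 8 depends on pasting a well-formed storage account Resource ID and covering both subresources. The sketch below is illustrative only: it checks the standard Azure resource ID shape and lists one rule per subresource, assuming each private endpoint rule targets a single subresource. The function names, the `resource_id` and `group_id` keys, and the placeholder subscription and resource group are all assumptions, not values from the guide.

```python
import re

# Standard shape of an Azure storage account resource ID.
_STORAGE_ACCOUNT_ID = re.compile(
    r"^/subscriptions/(?P<subscription>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.Storage/storageAccounts/(?P<account>[^/]+)$",
    re.IGNORECASE,
)


def parse_storage_account_id(resource_id):
    """Split a storage account resource ID into its parts, or raise ValueError."""
    match = _STORAGE_ACCOUNT_ID.match(resource_id.strip())
    if match is None:
        raise ValueError(f"not a storage account resource ID: {resource_id!r}")
    return match.groupdict()


def private_endpoint_rules(resource_id, subresources=("dfs", "blob")):
    """Return one rule description per subresource (key names are hypothetical)."""
    parse_storage_account_id(resource_id)  # validate before building rules
    return [
        {"resource_id": resource_id.strip(), "group_id": subresource}
        for subresource in subresources
    ]


if __name__ == "__main__":
    # Placeholder ID; replace with the value copied from the JSON View.
    example_id = (
        "/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourceGroups/rg-northcentral"
        "/providers/Microsoft.Storage/storageAccounts/northcentraldata"
    )
    for rule in private_endpoint_rules(example_id):
        print(rule)
```

Running a check like this before step 8 catches a truncated or mistyped ID early, before a rule is created and left waiting for approval in step 9.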
Rafia_Aqil
Oct 18, 2025, Analytics on Azure Blog
SAP Business Data Cloud Connect with Azure Databricks is now generally available
We are excited to share that SAP Business Data Cloud (SAP BDC) Connect for Azure Databricks is generally available. With this announcement, Azure Databricks customers like you can connect your SAP BDC environment to your existing Azure Databricks instance, without copying the data, to enable bi-directional, live data sharing.

Connecting SAP data with other enterprise data prevents governance risk, compliance gaps, and data silos. In addition, maintenance costs are reduced and semantics no longer need to be built manually. SAP data products can now be shared directly via Delta Sharing into your existing Azure Databricks instances, ensuring complete context for your business.

You can now unify your data estate across Azure Databricks and SAP BDC. This makes it easier for you to:
- Enforce governance
- Power analytics, data warehousing, BI and AI

Connecting SAP BDC to Azure Databricks is simple, secure, and fast. The connection is trusted and requires approval from both platforms to enable bi-directional sharing of data products. Once approved, data products in SAP BDC can be mounted directly into Azure Databricks Unity Catalog and are treated like other assets shared using Delta Sharing. As a result, your teams can query, analyze, and gather insights on SAP data alongside your existing business data in one unified way. Instead of spending time gathering the data in one place, your teams can focus on unlocking insights from this unified data quickly and securely.

This launch complements SAP Databricks in SAP BDC running on Azure, which enables AI, ML, data engineering, and data warehousing capabilities directly inside your SAP environment. We have expanded the list of supported regions for SAP Databricks on SAP BDC running on Azure.

To learn more about SAP BDC Connect with Azure Databricks, review the documentation and get started today.
AnaviNahar
Oct 15, 2025, Analytics on Azure Blog
How Azure NetApp Files Object REST API powers Azure and ISV Data and AI services – on YOUR data
This article introduces the Azure NetApp Files Object REST API, a transformative solution for enterprises seeking seamless, real-time integration between their data and Azure's advanced analytics and AI services. By enabling direct, secure access to enterprise data—without costly transfers or duplication—the Object REST API accelerates innovation, streamlines workflows, and enhances operational efficiency. With S3-compatible object storage support, it empowers organizations to make faster, data-driven decisions while maintaining compliance and data security. Discover how this new capability unlocks business potential and drives a new era of productivity in the cloud.
GeertVanTeylingen
Oct 14, 2025, Azure Architecture Blog
Data Vault 2.0 Warehouse Automation on Azure
This is a series of blog articles on the topic "Data Vault 2.0 on Azure", where we start from the 'What?' and then slowly delve into the 'How To?' of implementing DV 2.0 on Azure Data Platform technologies.
Naveed-Hussain
Oct 10, 2025, Analytics on Azure Blog
Secure Medallion Architecture Pattern on Azure Databricks (Part I)
This article presents a security-first pattern for Azure Databricks: a Medallion Architecture where Bronze, Silver and Gold each run as their own Lakeflow Job and cluster, orchestrated by a parent job. Run-as identities are Microsoft Entra service principals; storage access is governed via Unity Catalog External Locations backed by the Access Connector’s managed identity. Least privilege is enforced with cluster policies and UC grants. Prefer managed tables to unlock Predictive Optimisation, Automatic liquid clustering and Automatic statistics. Secrets live in Azure Key Vault and are read at runtime. Monitor reliability and cost with system tables and the Jobs UI. Part II covers more low-level concepts and CI/CD.
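As a rough illustration of the parent-job idea, the sketch below builds a job definition in which each layer is a task that runs a separate child job and waits for the previous layer. It is not taken from the article: the field names follow the general shape of a Databricks Jobs API job definition (`run_job_task`, `depends_on`, `run_as`), and the job name, child job IDs and service principal application ID are hypothetical placeholders.

```python
import json


def parent_job_payload(layer_job_ids, run_as_application_id, name="medallion-parent"):
    """Chain one run-job task per layer, in the order given (Bronze -> Silver -> Gold)."""
    tasks = []
    previous_key = None
    for layer, job_id in layer_job_ids.items():
        task = {"task_key": layer, "run_job_task": {"job_id": job_id}}
        if previous_key is not None:
            # Each layer starts only after the previous layer has succeeded.
            task["depends_on"] = [{"task_key": previous_key}]
        tasks.append(task)
        previous_key = layer
    return {
        "name": name,
        # The parent job runs as a service principal rather than a user.
        "run_as": {"service_principal_name": run_as_application_id},
        "tasks": tasks,
    }


if __name__ == "__main__":
    payload = parent_job_payload(
        {"bronze": 101, "silver": 102, "gold": 103},  # hypothetical child job IDs
        "00000000-0000-0000-0000-000000000000",  # placeholder application ID
    )
    print(json.dumps(payload, indent=2))
```

Because each layer is its own child job, every layer can carry its own run-as identity and cluster policy, while the parent only encodes ordering.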
mscagliola
Oct 08, 2025, Analytics on Azure Blog
Securing Azure Databricks Serverless: Practical Guide to Private Link Integration
The Challenge: Outbound Control in a Serverless World

Serverless compute resources run in the serverless compute plane, which is managed by Microsoft for ease of use. Databricks serverless provides hassle-free compute for running notebooks, jobs, and pipelines, and by default its outbound traffic can reach the internet and other networks freely. One of the most common security requirements for customers in the financial and government sectors is to keep network paths within a private network, both for users accessing their data and for system integration.

Solution Objective

- Enforce a deny-by-default posture: control outbound access with granular precision by enabling a deny-by-default policy for internet traffic. All outbound access is blocked unless explicitly allowed via Private Endpoint Rules.
- Control outbound connections by specifying allowed locations, connections, and FQDNs.
- Force traffic over the customer network for traffic control and inspection.

Solution Overview

The solution routes Databricks Serverless outbound traffic to a customer-managed Policy Enforcement Point (e.g. Azure Firewall), so that the customer can securely connect to services hosted on the cloud without exposing the data to the public internet. Essentially, it establishes a private, secure connection between the Databricks Control Plane and the customer virtual network.

Pre-Requisites

Azure Firewall: Deploy an Azure Firewall if you don’t already have one.
Virtual Networks and Subnets: Create a VNET for the Databricks and Load Balancer deployment. Set up subnets for the Azure Standard Load Balancer frontend (e.g., 10.0.2.0/26) and backend (e.g., 10.0.2.64/26). Enable Private Endpoint network policy for Network Security Groups and Route Tables on the backend subnet.
VNET Peering: Peer the Databricks VNET with your hub VNET to allow secure routing.
Azure Databricks Workspace: Deploy an Azure Databricks workspace if you don’t have one.
Follow the official Azure Databricks documentation for detailed steps on creating workspaces and private endpoints.

Summary of Steps

1. Deploy Azure Firewall and Networking
Set up an Azure Firewall and create the necessary virtual networks (VNets) and subnets for your environment. Peer the Databricks VNet with your hub VNet to enable secure routing.

2. Configure the Azure Load Balancer
Create an internal Standard Load Balancer. Set up frontend and backend pools using NICs (not IP addresses). Add load balancing rules and configure a health probe (typically HTTP on port 8082).

3. Create a Private Link Service
Deploy the Private Link Service behind the load balancer. Associate it with the correct frontend and backend subnets.

4. Set Up Route Tables
Create route tables to direct backend VM traffic to the Azure Firewall. Ensure the route tables are associated with the correct subnets (e.g., the backend subnet for the router VM).

5. Deploy and Configure the Router VM
Deploy a Linux VM to act as a router. Enable IP forwarding on the VM and in Azure settings. Configure iptables for NAT and traffic forwarding. Install and configure NGINX to serve as a health probe for the load balancer.

6. Configure Network Security Groups (NSGs)
Set up NSGs to allow necessary traffic (SSH, load balancer, HTTP/HTTPS, health probe) to and from the router VM.

7. Configure Azure Firewall Application Rules
Define application rules to allow outbound access only to approved FQDNs (e.g., microsoft.com). Block all other outbound traffic by default.

8. Configure Databricks Account Portal
Enable outbound (serverless) Azure Private Link to customer-managed resources in the Databricks Account Portal. Create Network Connectivity Configurations (NCCs) and attach them to your workspaces. Add private endpoint rules for each Azure resource you want to allow access to.

9. Approve Private Endpoints
In the Azure Portal, approve the private endpoint connections created by Databricks for your resources.
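Before deploying the networking in the first step above, it can help to sanity-check the subnet plan. The sketch below uses Python's standard `ipaddress` module to confirm that the load balancer frontend and backend subnets sit inside the VNet and do not overlap. The two subnet ranges are the examples from the prerequisites; the VNet range 10.0.2.0/24 and the function name are assumptions made for illustration.

```python
import ipaddress


def check_subnet_plan(vnet_cidr, subnets):
    """Return a list of problems; an empty list means the plan looks consistent."""
    vnet = ipaddress.ip_network(vnet_cidr)
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    problems = []

    # Every subnet must be fully contained in the VNet address space.
    for name, network in networks.items():
        if not network.subnet_of(vnet):
            problems.append(f"{name} ({network}) is outside VNet {vnet}")

    # No two subnets may share addresses.
    names = list(networks)
    for index, first in enumerate(names):
        for second in names[index + 1:]:
            if networks[first].overlaps(networks[second]):
                problems.append(
                    f"{first} ({networks[first]}) overlaps {second} ({networks[second]})"
                )
    return problems


if __name__ == "__main__":
    plan = {
        "lb-frontend": "10.0.2.0/26",   # example from the prerequisites
        "lb-backend": "10.0.2.64/26",   # example from the prerequisites
    }
    issues = check_subnet_plan("10.0.2.0/24", plan)  # VNet range is an assumption
    print(issues or "subnet plan looks consistent")
```

The same check extends naturally to the hub VNet ranges, since peered VNets must not use overlapping address spaces.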
Troubleshooting

- Use tools like netstat, conntrack, and tcpdump on the router VM to diagnose routing issues.
- Double-check route table and NSG associations.
- Validate private endpoint rule configurations in both Databricks and the Azure Portal.

References

- Serverless compute plane networking - Azure Databricks | Microsoft Learn
- Configure private connectivity to Azure resources - Azure Databricks | Microsoft Learn

Key Takeaway

This solution enforces a deny-by-default posture for outbound traffic from Azure Databricks Serverless, only allowing explicitly approved connections via Private Endpoints. All traffic is routed through your network for inspection and control, helping you meet strict compliance and security requirements.

Ready to Get Started?

Securing your Databricks Serverless environment doesn’t have to be daunting. With Azure Private Link, Azure Firewall, and a smart configuration, you get the best of both worlds: agility and airtight security. For more details, check out the official Azure Databricks documentation and start building your secure analytics platform today.

Questions or want to share your experience? Drop a comment below or reach out to the Azure Databricks community.
alescardoso
Sep 25, 2025, Analytics on Azure Blog
Approaches to Integrating Azure Databricks with Microsoft Fabric: The Better Together Story!
Azure Databricks and Microsoft Fabric can be combined to create a unified and scalable analytics ecosystem. This document outlines eight distinct integration approaches, each accompanied by step-by-step implementation guidance and key design considerations. These methods are not prescriptive—your cloud architecture team can choose the integration strategy that best aligns with your organization’s governance model, workload requirements and platform preferences. Whether you prioritize centralized orchestration, direct data access, or seamless reporting, the flexibility of these options allows you to tailor the solution to your specific needs.
Rafia_Aqil
Sep 19, 2025, Analytics on Azure Blog
Announcing the new Databricks Job activity in ADF!
We’re excited to announce that Azure Data Factory now supports the orchestration of Databricks Jobs!

Databricks Jobs allow you to schedule and orchestrate one or more tasks in a workflow in your Databricks workspace. Since any operation in Databricks can be a task, you can now run anything in Databricks via ADF, such as serverless jobs, SQL tasks, Delta Live Tables, batch inferencing with model serving endpoints, or automatically publishing and refreshing semantic models in the Power BI service. With this update, you can trigger these workflows from your Azure Data Factory pipelines.

To use this new activity, look for the new activity called Job under the Databricks activity group. Once you’ve added the Job activity (Preview) to your pipeline canvas, you can connect to your Databricks workspace and configure the settings to select your Databricks job, allowing you to run the Job from your pipeline.

We also know that parameterization in your pipelines is important, as it allows you to create generic, reusable pipeline models. ADF continues to support these patterns and extends this capability to the new Databricks Job activity. Under the settings of your Job activity, you can configure and set parameters to send to your Databricks job, giving you maximum flexibility and power for your orchestration jobs.

To learn more, read Azure Databricks activity - Microsoft Fabric | Microsoft Learn. Have any questions or feedback? Leave a comment below!
Noelle_Li
Sep 16, 2025, Azure Data Factory Blog
General Availability: Automatic Identity Management (AIM) for Entra ID on Azure Databricks
In February, we announced Automatic Identity Management in public preview and loved hearing your overwhelmingly positive feedback. Prior to public preview, you either had to set up an Entra Enterprise Application or involve an Azure Databricks account admin to import the appropriate groups. This required manual steps, whether it was adding or removing users after organizational changes, maintaining scripts, or setting up additional Entra or SCIM configuration. Identity management was thus cumbersome and carried management overhead.

Today, we are excited to announce that Automatic Identity Management (AIM) for Entra ID on Azure Databricks is generally available. This means no manual user setup is needed and you can instantly add users to your workspace(s). Users, groups, and service principals from Microsoft Entra ID are automatically available within Azure Databricks, including support for nested groups and dashboards. This native integration is one of the many reasons Databricks runs best on Azure.

Here are some additional ways AIM could benefit you and your organization:

Seamlessly share dashboards
You can share AI/BI dashboards with any user, service principal, or group in Microsoft Entra ID immediately, as these users are automatically added to the Azure Databricks account upon login. Members of Microsoft Entra ID who do not have access to the workspace are granted access to a view-only copy of a dashboard published with embedded credentials. This enables you to share dashboards with users outside your organization, too. To learn more, see share a dashboard.

Updated defaults for new accounts
All new Azure Databricks accounts have AIM enabled, with no opt-in or additional configuration required. For existing accounts, you can enable AIM with a single click in the Account Admin Console. Soon, we will also make this the default for existing accounts.

Automation at scale enabled via APIs
You can also register users, groups, or service principals in Microsoft Entra ID via APIs. Being able to do this programmatically enables the enterprise scale most of our customers need. You can also enable automation via scripts leveraging these APIs.

Read the Databricks blog here and get started via documentation today!
AnaviNahar
Sep 10, 2025, Analytics on Azure Blog