In today’s data-driven world, securing data pipelines is crucial to protect sensitive information and ensure compliance with regulatory requirements. The Data Factory experience in Microsoft Fabric (FDF) offers a robust set of security features to safeguard your data as it moves through the stages of your data workflows. In this post, we’ll explore the key security features in FDF and demonstrate how to implement them with a practical example.
Key Security Features in Fabric Data Factory
Before diving into the implementation, let’s take a look at the primary security mechanisms that Fabric Data Factory provides:
- Authentication: Fabric Data Factory relies on Microsoft Entra ID to authenticate users (or service principals). Once authenticated, users receive access tokens from Microsoft Entra ID, and Fabric uses these tokens to perform operations in the context of the user.
- Authorization: All Fabric permissions are stored centrally by the metadata platform. Fabric services query the metadata platform on demand in order to retrieve authorization information and to authorize and validate user requests.
- Data Encryption:
  - Data at rest: All Fabric data stores are encrypted at rest by using Microsoft-managed keys. Fabric data includes customer data as well as system data and metadata.
  - Data in transit: Data in transit between Microsoft services is always encrypted with at least TLS 1.2. Fabric negotiates to TLS 1.3 whenever possible. Traffic between Microsoft services always routes over the Microsoft global network.
- Managed Identities: A Fabric workspace identity is an automatically managed service principal that can be associated with a Fabric workspace. Fabric workspaces with a workspace identity can securely read or write to firewall-enabled Azure Data Lake Storage Gen2 accounts through trusted workspace access for OneLake shortcuts. Fabric items can use the identity when connecting to resources that support Microsoft Entra authentication. Fabric uses workspace identities to obtain Microsoft Entra tokens without the customer having to manage any credentials.
- Key Vault Integration: Unfortunately, as of this writing, Key Vault integration is not supported for Dataflow Gen2 or data pipeline connections in Fabric.
- Network Security: When you connect to a data pipeline over a private link, you can use the pipeline to load data from any data source with public endpoints into a private-link-enabled Microsoft Fabric Lakehouse. Customers can also author and operationalize data pipelines with activities, including Notebook and Dataflow activities, over the private link. However, copying data from and into a Data Warehouse isn't currently possible when Fabric's private link is enabled.
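To make the Authentication bullet above more concrete, here is a minimal sketch of how a service principal could request a Microsoft Entra ID access token for the Fabric API using the OAuth 2.0 client-credentials flow. The function name and the placeholder IDs are hypothetical, and the request is only built, not sent. In a real project you would more likely use a library such as MSAL and keep the secret out of source code.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Scope that requests the application permissions granted for the Fabric API.
FABRIC_SCOPE = "https://api.fabric.microsoft.com/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> Request:
    """Build (but do not send) a client-credentials token request.

    Hypothetical helper for illustration only.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": FABRIC_SCOPE,
        }
    ).encode("utf-8")
    return Request(
        url,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder values (assumptions); never hardcode real secrets.
token_request = build_token_request(
    tenant_id="00000000-0000-0000-0000-000000000000",
    client_id="11111111-1111-1111-1111-111111111111",
    client_secret="placeholder-secret",
)
print(token_request.get_method(), token_request.full_url)
```

Sending this request (for example with `urllib.request.urlopen`) would return a JSON document containing the access token, which is then passed as a bearer token on Fabric API calls.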
Now, let’s walk through an example that demonstrates how to secure a data pipeline in Fabric Data Factory (FDF).
Example: Securing a Data Pipeline in Fabric Data Factory
Scenario:
You are setting up a data pipeline that moves sensitive data from ADLS Gen2 to a Fabric Lakehouse. To ensure that this pipeline is secure, you will:
- Use a managed identity (the workspace identity) to authenticate Fabric Data Factory
- Configure trusted workspace access in ADLS Gen2
- Copy data from ADLS Gen2 to a Fabric Lakehouse using the secured pipeline
Prerequisites:
Tools and technologies needed:
- An Azure Data Lake Storage Gen2 (ADLS Gen2) storage account.
- Working knowledge of Azure Data Factory.
- A Fabric workspace.
Steps:
Step 1: Enable a managed identity at the workspace level for the Fabric Data Factory pipeline
Workspace identity can be created and deleted by workspace admins. The workspace identity has the workspace contributor role on the workspace.
Workspace identity is supported for authentication to target resources in connections. Only users with an admin, member, or contributor role in the workspace can configure the workspace identity for authentication in connections.
Managed identities allow Fabric Data Factory to securely authenticate to other Azure services without hardcoding credentials.
- Navigate to the workspace and open the workspace settings.
- Select the Workspace identity tab.
- Select the + Workspace identity button.
Once enabled, this identity can be used to access resources like Azure SQL Database securely.
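If you prefer to script this step instead of clicking through the workspace settings, the sketch below builds a call to the Fabric REST API's workspace "Provision Identity" operation. Treat it as an assumption-laden illustration: the helper name and placeholder values are hypothetical, the request is only constructed (not sent), and you should confirm the endpoint and required permissions in the Fabric REST API reference before relying on it.

```python
from urllib.request import Request

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_provision_identity_request(workspace_id: str, access_token: str) -> Request:
    """Build (but do not send) a request that provisions a workspace identity.

    Hypothetical helper; the caller must be a workspace admin.
    """
    url = f"{FABRIC_API}/workspaces/{workspace_id}/provisionIdentity"
    return Request(
        url,
        data=b"",  # the operation takes no request body
        method="POST",
        headers={"Authorization": f"Bearer {access_token}"},
    )


# Placeholder values (assumptions) for illustration.
identity_request = build_provision_identity_request(
    workspace_id="22222222-2222-2222-2222-222222222222",
    access_token="placeholder-token",
)
print(identity_request.get_method(), identity_request.full_url)
```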
Step 2: Configure trusted workspace access in ADLS Gen2
- Sign in to the Azure portal and go to Custom deployment.
- Choose Build your own template in the editor. For a sample ARM template that creates a resource instance rule, see ARM template sample.
- Create the resource instance rule in the editor. When done, choose Review + Create.
- On the Basics tab that appears, specify the required project and instance details. When done, choose Review + Create.
- On the Review + Create tab that appears, review the summary and then select Create. The rule will be submitted for deployment.
When deployment is complete, you'll be able to go to the resource.
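For orientation, here is a rough sketch of what such a resource instance rule template can look like, generated with Python so the placeholders are easy to swap. The overall shape (a `resourceAccessRules` entry under the storage account's `networkAcls`) follows my understanding of the published sample. The function name, the API version, and all IDs are assumptions, so compare the output with the official ARM template sample before deploying it.

```python
import json


def build_resource_instance_rule_template(
    storage_account: str, location: str, tenant_id: str, workspace_id: str
) -> dict:
    """Return an ARM template dict that allows one Fabric workspace
    through the storage account firewall (hypothetical helper)."""
    # Assumed resource ID format for a Fabric workspace; verify against the docs.
    workspace_resource_id = (
        "/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourcegroups/Fabric/providers/Microsoft.Fabric"
        f"/workspaces/{workspace_id}"
    )
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.Storage/storageAccounts",
                "apiVersion": "2023-01-01",  # assumed API version
                "name": storage_account,
                "location": location,
                "kind": "StorageV2",
                "properties": {
                    "networkAcls": {
                        "resourceAccessRules": [
                            {
                                "tenantId": tenant_id,
                                "resourceId": workspace_resource_id,
                            }
                        ]
                    }
                },
            }
        ],
    }


# Placeholder values (assumptions) for illustration.
arm_template = build_resource_instance_rule_template(
    storage_account="mystorageaccount",
    location="westeurope",
    tenant_id="00000000-0000-0000-0000-000000000000",
    workspace_id="22222222-2222-2222-2222-222222222222",
)
print(json.dumps(arm_template, indent=2))
```

The printed JSON can be pasted into the Build your own template editor from the steps above.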
Step 3: Create a pipeline to connect to ADLS Gen2 and copy data to a Fabric Lakehouse
With this pipeline, we will connect directly to a firewall-enabled ADLS Gen2 account that has trusted workspace access enabled.
- Navigate to your workspace and click New item
- Create a new pipeline
- In your pipeline, add a Copy activity to the canvas
- In the Copy activity's Source tab, choose ADLS Gen2 as the data source and connect to it
- In the Destination tab, connect to the Lakehouse and select a table
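The first two steps above (creating the pipeline item) can also be scripted. The sketch below builds a request for the Fabric REST API's Create Item operation with the `DataPipeline` item type. The helper name, display name, and placeholder IDs are hypothetical, and the request is only built, not sent. It creates an empty pipeline, so the Copy activity's source and destination would still be configured as described in the steps.

```python
import json
from urllib.request import Request


def build_create_pipeline_request(
    workspace_id: str, access_token: str, display_name: str
) -> Request:
    """Build (but do not send) a request that creates an empty data pipeline.

    Hypothetical helper for illustration only.
    """
    url = f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/items"
    body = json.dumps({"displayName": display_name, "type": "DataPipeline"})
    return Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values (assumptions) for illustration.
pipeline_request = build_create_pipeline_request(
    workspace_id="22222222-2222-2222-2222-222222222222",
    access_token="placeholder-token",
    display_name="SecureAdlsToLakehouse",
)
print(pipeline_request.get_method(), pipeline_request.full_url)
```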
Step 4: Results
After the Copy activity finishes running, you can view the data in your Lakehouse.
Conclusion
Securing your data pipelines in Fabric Data Factory is essential to maintaining the integrity, confidentiality, and availability of your data. By leveraging features such as managed identities and trusted workspace access, you can build a robust security framework around your data flows.
Do you have any other tips for securing Fabric Data Factory? Let me know in the comments!
Follow me on LinkedIn: Sally Dabbah