Author(s): Gilles L'Hérault is a Program Manager in the Azure Synapse Customer Success Engineering (CSE) team.
It’s one thing to be a great Azure service; it’s another to also be easy and efficient to manage. This is part one of our “Enterprise readiness” series for Azure Synapse Data Explorer. We will focus on how customers can leverage enterprise-ready features to keep their data safe. Part two will cover how we help customers “keep the lights on” (i.e., running operational costs) for their Azure Synapse Data Explorer investment without adding to their existing operational load.
Keeping the data safe
When working with Azure Synapse Data Explorer, it's important to keep in mind that the data received may contain sensitive, personally identifiable information (PII). However, the good news is that the service is equipped with a variety of built-in mechanisms that work together in a defense-in-depth approach to safeguard your data.
Figure 1 - Azure Synapse Data Explorer defense in depth
Beyond the built-in security mechanisms of Azure Synapse Data Explorer, the first line of defense in protecting your data is operational excellence, often referred to as operational security (OpSec) by cybersecurity experts. This means having proper controls in place to manage access to the data and closely monitoring those with administrative rights. Even the strongest defense mechanisms can be compromised if access is granted to the wrong person or is not properly monitored, so a strong operational security plan is critical.
Security identity hygiene
Hackers don’t break in, they log in! Any credential that has permission to see your data is a potential attack vector. Make sure every security principal that has access is audited regularly, and set expiration dates on all privileged accounts.
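As a rough illustration of the audit described above, the following Python sketch flags principals that need attention. Everything in it is an assumption made for the example: the `Principal` record, the `audit` function, the finding messages and the 90-day review window are hypothetical and are not part of any Azure SDK. In practice the inventory would come from your identity provider.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class Principal:
    # Hypothetical inventory record, not an Azure SDK type.
    name: str
    privileged: bool
    expires_on: Optional[date]  # None means no expiration date was set
    last_reviewed: date


def audit(principals: List[Principal], today: date,
          max_review_age_days: int = 90) -> List[Tuple[str, str]]:
    """Return (principal name, finding) pairs that need attention."""
    findings = []
    for p in principals:
        # Privileged accounts should always carry an expiration date.
        if p.privileged and p.expires_on is None:
            findings.append((p.name, "privileged account without expiration date"))
        # An account past its expiration date should no longer have access.
        if p.expires_on is not None and p.expires_on < today:
            findings.append((p.name, "expiration date has passed"))
        # Every principal should be reviewed on a regular cadence.
        if today - p.last_reviewed > timedelta(days=max_review_age_days):
            findings.append((p.name, "access review overdue"))
    return findings
```

Running a check like this on a schedule and routing the findings to the data owners is one way to make the audit routine rather than a one-off exercise.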
Privileged identity management
You don’t need to be an admin at every hour of the day. We recommend that all security principals with elevated permissions, on either the management plane or the data plane of Azure Synapse Data Explorer, be subject to Azure AD Privileged Identity Management (PIM), with elevated roles granted on a “just-in-time” basis. Balance this requirement against usability and user experience: if your administrators must elevate their privileges 100 times a day, you will face so-called “PIM fatigue”, which usually leads to bad behaviors.
Data cataloguing with Azure Purview
It’s difficult to govern data you do not know you have. Cataloguing what is inside your databases should therefore be a priority. This will help you identify not only the data you have but also how you should secure it. Check out the Azure Purview connector for Data Explorer: it lets you run data classifications that identify potential sources of sensitive information, which in turn allows you to make informed decisions on how to properly secure them.
The only data you cannot steal is data that no longer exists. Azure Synapse Data Explorer encourages you to set retention policies so that you consume resources more efficiently, but there is a security angle as well. If data has outlived its retention period based on business and compliance requirements, it’s a good idea to purge it from the system. Azure Synapse Data Explorer safely and permanently deletes data according to the policies you set.
Azure Synapse Data Explorer can be deployed in a fully private network configuration using private endpoints, so that only computers connected to your network can access the service. In addition, Azure Synapse Data Explorer uses managed private endpoints to connect to ancillary services such as Event Hubs or Azure SQL Database, ensuring private end-to-end connectivity.
We strongly recommend implementing the guidance in our published Cloud Adoption Framework before deploying Azure Synapse Data Explorer in a private networking configuration, to avoid manual labor and errors that are costly to remediate. Specifically, the guidance on deploying private endpoints at scale will prove invaluable and prevent bespoke configurations that are difficult to manage and remediate.
Azure Synapse Data Explorer supports Azure AD conditional access policies. This means you can define conditions that a user, and the computer they use to access the service, must meet in order to gain access. For example, you can set up a conditional access rule that denies access to Azure Synapse Data Explorer if a user is not using multi-factor authentication.
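To make the retention idea concrete, here is a minimal Python sketch that builds the text of a Kusto retention policy command for a table. The helper `retention_command` and the table name `RawLogs` are hypothetical. The command follows the `.alter-merge table ... policy retention` form from the Kusto documentation, so verify it against your own environment before running it.

```python
import re

# Conservative identifier check so that a table name cannot smuggle in
# extra command text. Names with other characters would need KQL bracket
# quoting, which this sketch does not attempt.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def retention_command(table: str, soft_delete_days: int,
                      recoverable: bool = False) -> str:
    """Build a retention policy command for one table (hypothetical helper)."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unsupported table name: {table!r}")
    if soft_delete_days < 0:
        raise ValueError("soft_delete_days must not be negative")
    recoverability = "enabled" if recoverable else "disabled"
    return (
        f".alter-merge table {table} policy retention "
        f"softdelete = {soft_delete_days}d recoverability = {recoverability}"
    )


# Example: keep data in a hypothetical RawLogs table for 30 days.
command = retention_command("RawLogs", 30)
```

Generating the command from a reviewed mapping of tables to retention periods, rather than typing it by hand, keeps the policies aligned with the documented business and compliance requirements.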
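Conditional access policies are configured in Azure AD rather than in code, but the decision they make can be modeled in a few lines. The sketch below is only a conceptual illustration: the `SignIn` record and `evaluate` function are hypothetical and do not correspond to any Azure API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SignIn:
    # Hypothetical description of a sign-in attempt.
    user: str
    used_mfa: bool
    device_compliant: bool


def evaluate(sign_in: SignIn, require_mfa: bool = True,
             require_compliant_device: bool = False) -> Tuple[bool, List[str]]:
    """Return (allowed, reasons for denial) for one sign-in attempt."""
    reasons = []
    if require_mfa and not sign_in.used_mfa:
        reasons.append("multi-factor authentication required")
    if require_compliant_device and not sign_in.device_compliant:
        reasons.append("compliant device required")
    # Access is granted only when no condition failed.
    return (not reasons, reasons)
```

The point of the model is that both the user and the device are evaluated at sign-in time, so access can be denied even when the credentials themselves are valid.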
When it comes to data persistence, Azure Synapse Data Explorer has you covered. Your data is stored in two places: hot data is kept on the local SSD drives of the individual nodes that make up the cluster, while both hot and cold data are persisted in Microsoft-managed blob storage. Data on the SSD drives is encrypted using the disk encryption feature, while data in blob storage is automatically encrypted using storage-level encryption. Customers can also use their own keys to manage encryption for cold data, providing an added layer of security. Note, however, that hot data protected by disk encryption is encrypted using a Microsoft-managed key.
For the most stringent compliance use cases, double encryption can be activated. Enabling double encryption may have a slight impact on performance, so plan accordingly and scale your cluster appropriately.
Data in flight is secured using HTTPS with TLS 1.2. Additionally, data can be obfuscated using row-level security (see the next section for details).
Data access policies and authorizations
Least privilege access
It all starts with granting the fewest permissions possible, on only the data each user needs to see. We recommend you create “access packages” that grant basic reader permissions on “general purpose” tables. That way you can easily grant access to the bulk of your data without fear of over-permissioning. Dedicated Azure Active Directory security groups should be created for management plane tasks such as scaling a cluster up or down. Ideally, access to these groups is controlled through Azure AD Privileged Identity Management (PIM).
See the section below for the different RBAC roles available and take care to create the relevant access packages and audit these permissions regularly.
Role based access control (RBAC)
Azure Synapse Data Explorer offers a range of built-in RBAC roles that can be used to create a comprehensive least privilege permission footprint.
It's important to carefully consider the scope of these roles before assigning them to users or groups. For example, some roles like "All database admin" are inherently scoped to the cluster itself, meaning that if you're assigned that role, you're an admin of any new database being created. Other roles like "database admin" are scoped to a specific database, allowing for more refined granularity. It's also worth noting that while "All" roles can be granted in the configuration blade on the Azure portal, more granular roles must be assigned at the data plane using KQL (Kusto Query Language) commands.
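As an illustration of a database-scoped assignment made at the data plane, the Python sketch below builds the text of a Kusto `.add database` role command. The helper `grant_command`, the database name `Telemetry` and the group `analysts@contoso.com` are hypothetical. The set of role names reflects the database-level roles listed in the Kusto documentation and should be checked against the documentation for your service version.

```python
import re

# Database-level role names accepted by this sketch (an assumption based on
# the Kusto security roles documentation).
DATABASE_ROLES = {"admins", "users", "viewers", "unrestrictedviewers",
                  "ingestors", "monitors"}

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def grant_command(database: str, role: str, principal: str, note: str) -> str:
    """Build a command that adds one principal to a database role."""
    if not _IDENTIFIER.match(database):
        raise ValueError(f"unsupported database name: {database!r}")
    if role not in DATABASE_ROLES:
        raise ValueError(f"unknown database role: {role!r}")
    # Reject quotes instead of trying to escape them in this sketch.
    if "'" in principal or "'" in note:
        raise ValueError("single quotes are not allowed in principal or note")
    return f".add database {database} {role} ('{principal}') '{note}'"


# Example: give a hypothetical analysts group read-only access to one database.
command = grant_command("Telemetry", "viewers",
                        "aadgroup=analysts@contoso.com",
                        "General purpose read access")
```

Granting the role to a security group rather than to individual users keeps the assignment compatible with the access package and PIM approach described earlier.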
Similar to SQL Server dynamic data masking, Azure Synapse Data Explorer can use row-level security policies to obfuscate data. In essence, you define a KQL query that is applied whenever the table is queried and that manipulates the results to remove any sensitive information. The same mechanism can also be used to filter data based on the identity of the caller.
This wraps up part one of this blog post, but stay tuned for part two, where we’ll dive into the other enterprise features that will help you manage, operate and generally “keep the lights on” with minimal effort.
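The effect of such a policy can be modeled in plain Python. In the service itself the policy is expressed in KQL and attached to the table, so the function below is only a behavioral sketch: the `query_with_rls` function, the caller dictionary and the column names are all hypothetical.

```python
from typing import Dict, List


def query_with_rls(rows: List[Dict[str, str]], caller: Dict) -> List[Dict[str, str]]:
    """Return the rows a caller may see, with sensitive columns masked."""
    visible_rows = []
    for row in rows:
        # Filtering: callers only see rows for regions they are entitled to.
        if row["region"] not in caller["regions"]:
            continue
        visible = dict(row)  # copy so the stored data is never modified
        # Masking: hide the sensitive column unless the caller may read PII.
        if not caller["can_see_pii"]:
            visible["email"] = "****"
        visible_rows.append(visible)
    return visible_rows


# Hypothetical table contents and callers.
customers = [
    {"name": "Ana", "email": "ana@example.com", "region": "EU"},
    {"name": "Raj", "email": "raj@example.com", "region": "US"},
]
analyst = {"regions": {"EU"}, "can_see_pii": False}
auditor = {"regions": {"EU", "US"}, "can_see_pii": True}

analyst_view = query_with_rls(customers, analyst)
auditor_view = query_with_rls(customers, auditor)
```

The same query returns different results for the two callers: the analyst sees a single EU row with the email masked, while the auditor sees both rows unmasked. The underlying data is identical in both cases.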