Azure Synapse Analytics
How to access Azure PostgreSQL Flex Server from ADF Managed VNet using a Private End Point
Currently, a PostgreSQL flexible server doesn't support Azure Private Link. Instead, it uses virtual network injection to make the flexible server available within a virtual network. This article shows how to access an Azure PostgreSQL flexible server from an Azure Data Factory managed VNet using a private endpoint, allowing you to perform ELT between data sources.
Data flow sink supports user db schema for staging in Azure Synapse and PostgreSQL connectors
To achieve the fastest loading speed when moving data into a data warehouse table, load the data into a staging table first. Loading is usually a two-step process: you first load into a staging table and then insert the data into a production data warehouse table. Loading into the staging table takes longer, but the second step of inserting the rows into the production table does not incur data movement across the distributions. The data flow sink transformation supports staging. By default, a temporary table is created under the sink schema for staging. For Azure Synapse Analytics and Azure PostgreSQL, you can alternatively uncheck the Use sink schema option and instead specify a schema name under which Data Factory will create a staging table to load upstream data and automatically clean it up upon completion. Make sure you have create table permission in the database and alter permission on the schema. Please follow the links below for more details: User db schema for staging in Azure Synapse Analytics, and User db schema for staging in Azure PostgreSQL.
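To make the permission requirements concrete, here is a minimal T-SQL sketch. The schema name stg and the database user adf_loader are hypothetical, not taken from the post above; adapt them to your own environment.

```sql
-- Hypothetical names: the [stg] schema and the [adf_loader] database user
-- are illustrative only.
CREATE SCHEMA stg;
GO

-- The staging option needs permission to create tables in the database
-- and to alter/drop the temporary staging tables it creates in that schema.
GRANT CREATE TABLE TO adf_loader;
GRANT ALTER ON SCHEMA::stg TO adf_loader;
```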
Connecting Data Factory To Synapse Using A Private Endpoint
I am trying to set up a Linked Service in Data Factory to Synapse using a private endpoint IP address. When using the Data Factory wizard to set up a Linked Service to Synapse, it generates this connection string:
Integrated Security=False;Encrypt=True;Connection Timeout=30;Data Source=workspace.sql.azuresynapse.net;Initial Catalog=database;
This works fine. However, I am required to use the private endpoint IP address instead of the public hostname. The private endpoint has already been created. When I replace the hostname with the private IP address, it results in this error:
Cannot connect to SQL Database. Please contact SQL server team for further support. Server: '10.10.10.10', Database: 'database', User: ''. Check the linked service configuration is correct, and make sure the SQL Database firewall allows the integration runtime to access. A connection was successfully established with the server, but then an error occurred during the login process. The target principal name is incorrect.
Following some advice online, I removed Encrypt=True from the connection string and added TrustServerCertificate=True. However, this results in another error:
Cannot connect to SQL Database. Please contact SQL server team for further support. Server: '10.10.10.10', Database: 'database', User: ''. Check the linked service configuration is correct, and make sure the SQL Database firewall allows the integration runtime to access. Cannot open server "10.10.10.10" requested by the login. The login failed.
A user has been created in the Synapse database for the Data Factory system-assigned managed identity, and this works when using the public hostname. Why does it not work when using the private endpoint IP address?
F&O Dataverse Limitation?
Hello Azure Community, quick question: if a customer has significant data volumes and feels that they may consume large amounts of data with F&O Dataverse, will they need to purchase additional Dataverse database capacity to use Synapse? I reviewed the following MS Learn docs and did not find information on limitations:
https://learn.microsoft.com/en-us/power-apps/maker/data-platform/export-to-data-lake
https://learn.microsoft.com/en-us/power-apps/maker/data-platform/azure-synapse-link-synapse#prerequisites
https://learn.microsoft.com/en-us/power-apps/maker/data-platform/export-data-lake-faq
Thank you in advance for your assistance. LC
Fetching JSON object from Cosmos DB in Azure Synapse Studio fails
Hi everyone! I'm facing an issue while fetching a document from a linked Cosmos DB in Azure Synapse Studio. I've tried:
- fetching this object as VARCHAR(MAX) and NVARCHAR(MAX)
- identifying the full path in the WITH clause: failing_complex_object VARCHAR(MAX) '$.failing_complex_object'
- decomposing it in the SELECT clause: JSON_QUERY([target_table].[failing_complex_object]) and JSON_VALUE(failing_complex_object, '$.failing_complex_object')
It always returns NULL. The complex objects without the issue are larger; one of them is a JSON object and the other is a JSON array. The complex object with the issue is a valid JSON object. The question is: why do the two other complex objects fetch fine while this one fails? Data examples are below. Any help would be greatly appreciated.
P.S. Due to NDA I cannot share the actual DB data, but I assure you the schema is exactly the same.
P.P.S. Sorry for the formatting of the JSON and SQL examples; the "Insert code sample" tool didn't work.

Document example:

{
    "field_01": "title",
    "field_02": "12345678",
    "complex_object_01": [
        { "field_01": 1, "field_02": "data" },
        { "field_01": 2, "field_02": "data" }
    ],
    "complex_object_02": {
        "rows": [
            { "field_01": 0.1, "field_02": 0.0, "field_03": 0.1, "rowIndex": 0 }
        ]
    },
    "failing_complex_object": {
        "data_01": { "field_01": 0, "field_02": 0 },
        "data_02": { "field_01": 0, "field_02": 0 }
    }
}

Script example:

SELECT field_01,
       field_02,
       complex_object_01,
       complex_object_02,
       failing_complex_object
FROM OPENROWSET (
        PROVIDER = 'CosmosDB',
        CONNECTION = '',
        OBJECT = 'target_table'
     )
WITH (
        field_01 VARCHAR(MAX),
        field_02 VARCHAR(MAX),
        complex_object_01 VARCHAR(MAX),
        complex_object_02 VARCHAR(MAX),
        failing_complex_object VARCHAR(MAX)
     ) AS [target_table]
WHERE field_01 = 'title'
  AND field_02 = '12345678'
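As a point of comparison (a diagnostic sketch, not a confirmed fix for the query above), the following standalone snippet checks that JSON_QUERY can extract the nested object when given the sample document directly, which helps separate a problem in the JSON functions from a problem in how the Cosmos DB analytical store exposes the column. The variable name @doc is purely illustrative.

```sql
-- Standalone sanity check, independent of Cosmos DB: does JSON_QUERY return
-- the nested object when given the sample document as a plain string?
DECLARE @doc NVARCHAR(MAX) = N'{
    "field_01": "title",
    "field_02": "12345678",
    "failing_complex_object": {
        "data_01": { "field_01": 0, "field_02": 0 },
        "data_02": { "field_01": 0, "field_02": 0 }
    }
}';

SELECT JSON_QUERY(@doc, '$.failing_complex_object') AS failing_complex_object;
-- Expected: the nested JSON object, not NULL. If this works but the OPENROWSET
-- query still returns NULL, the issue is more likely in how the column is
-- represented in the analytical store than in the JSON functions themselves.
```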
Robust data protection features of Azure Synapse

Introduction
Data serves as the vital essence of any organization. Whether you're dealing with sensitive customer information or financial records, safeguarding your data is non-negotiable. Many organizations face challenges such as: How do you protect the data if you don't know where it is? And what level of protection is needed, given that some datasets require more protection than others? Azure Synapse Analytics offers powerful features to help you achieve this, ensuring confidentiality, integrity, and availability. In this blog, we'll explore the data encryption capabilities integrated into Azure Synapse Analytics, discussing encryption techniques for data at rest and in transit, as well as approaches for detecting and categorizing sensitive data in your Synapse workspace.

What is Data Discovery and Classification?
Imagine a company with massive amounts of information stored in its databases. Some columns need extra protection, like Social Security numbers or financial records. Manually finding this sensitive data is a time-consuming nightmare. Here's the good news: there's a better way! Azure Synapse offers a feature called Data Discovery that automates this process.

How does Data Discovery work?
Think of Data Discovery as a super-powered scanner. It automatically goes through every row and column of your data lake or databases, looking for patterns that might indicate sensitive information. Just like a smart assistant, it can identify potentially sensitive data and classify those columns for you. Once the data discovery process is complete, it provides classification recommendations based on a predefined set of patterns, keywords, and rules. These recommendations can then be reviewed, and sensitivity-classification labels can be applied to the appropriate columns. This process is known as Classification.

What happens after classifying sensitivity labels on columns?
Sensitivity-classification labels are new metadata attributes that have been added to the SQL Server database engine. After classifying columns with sensitivity labels, the organization can leverage these labels to:
- implement fine-grained access controls, so that only authorized people with the necessary clearance can access sensitive data;
- mask sensitive data when it is accessed by users who do not have the necessary permissions, allowing them to see only anonymized versions of the data;
- monitor access to and modification of sensitive data (auditing access to sensitive data), so that any unusual or unauthorized activity can be flagged for investigation.

Steps for discovering, classifying, or labelling columns that contain sensitive data in your database
The classification includes two metadata attributes:
- Labels: the main classification attributes, used to define the sensitivity level of the data stored in the column.
- Information types: attributes that provide more granular information about the type of data stored in the column.

Step 1 -> Choose an Information Protection policy based on your requirements
The SQL Information Protection policy is a built-in set of sensitivity labels and information types with discovery logic, which is native to the SQL logical server. You can also customize the policy according to your organization's needs; for more information, see Customize the SQL information protection policy in Microsoft Defender for Cloud (Preview).
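To make the label and information-type attributes concrete, here is a small T-SQL sketch that applies a classification to a column directly and then lists the classifications in the database. The table, column, label, and information-type names are hypothetical, and this assumes you are connected to a database that supports the statement (for example, a dedicated SQL pool).

```sql
-- Hypothetical table and column; the label and information type are illustrative.
ADD SENSITIVITY CLASSIFICATION TO dbo.Customers.SocialSecurityNumber
WITH (
    LABEL = 'Highly Confidential',
    INFORMATION_TYPE = 'National ID',
    RANK = HIGH
);

-- Review the classifications currently applied in the database.
SELECT  o.name AS table_name,
        c.name AS column_name,
        sc.label,
        sc.information_type
FROM sys.sensitivity_classifications AS sc
JOIN sys.objects AS o
    ON o.object_id = sc.major_id
JOIN sys.columns AS c
    ON c.object_id = sc.major_id
   AND c.column_id = sc.minor_id;
```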
Step 2 -> View and apply classification recommendations
The classification engine automatically scans your database for columns containing potentially sensitive data and provides a list of recommended column classifications. Accept recommendations for columns by selecting the check box in the left column, and then select Accept selected recommendations to apply them. You can also classify columns manually, as an alternative or in addition to the recommendation-based classification. To complete your classification, select Save in the Classification page.

Note: another option for data discovery and classification is Microsoft Purview, a unified data governance solution that helps manage and govern on-premises, multi-cloud, and software-as-a-service (SaaS) data. It can automate data discovery, lineage identification, and data classification. By producing a unified map of data assets and their relationships, it makes data easily discoverable.

Data Encryption
Data encryption is a fundamental component of data security, ensuring that information is safeguarded both at rest and in transit. Azure Synapse takes care of this responsibility for us, leveraging robust encryption technologies to protect data.

Data at Rest
Azure offers various methods of encryption across its different services.

Azure Storage Encryption
By default, Azure Storage encrypts all data at rest using server-side encryption (SSE). It's enabled for all storage types (including ADLS Gen2) and cannot be disabled. SSE uses AES 256 to encrypt and decrypt data transparently. AES 256 stands for 256-bit Advanced Encryption Standard, one of the strongest block ciphers available, and is FIPS 140-2 compliant. Well, I know these sound like hacking terms 😅, but the platform itself manages the encryption key, so we don't need to worry about them. Also, it forms the first layer of data encryption. This encryption applies to both user and system databases, including the master database.

Note: for additional security, Azure offers the option of double encryption. Infrastructure encryption uses a platform-managed key in conjunction with the SSE key, encrypting data twice with two different encryption algorithms and keys. This provides an extra layer of protection, ensuring that data at rest is highly secure.

Double the Protection with Transparent Data Encryption (TDE)
TDE is an industry-standard approach that encrypts the underlying files of the database rather than the data itself, adding a second layer of data encryption. TDE performs real-time I/O encryption and decryption of the data at the page level: each page is decrypted when it's read into memory and encrypted before being written to disk. TDE encrypts the storage of an entire database by using a symmetric key called the Database Encryption Key (DEK). In other words, when data is written to the database it is organized into pages, and TDE encrypts each page with the DEK before it is written to disk, making it unreadable without the key. When a page is read from disk into memory, TDE decrypts it using the DEK, making the data readable for normal database operations. Why do we call it transparent? Because the encryption and decryption processes are transparent to applications and users: they have no idea whether the data is encrypted, and the only way they would find out is if they didn't have access to it. This is because encryption and decryption happen at the database engine level, without requiring application awareness or involvement.
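As a quick illustration (a sketch, assuming you are connected to a database where this catalog view is exposed, such as a dedicated SQL pool or an Azure SQL database), you can confirm from T-SQL whether TDE is enabled:

```sql
-- Check whether Transparent Data Encryption is enabled for each database
-- (is_encrypted = 1 means TDE is on).
SELECT name, is_encrypted
FROM sys.databases;
```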
By default, TDE protects the database encryption key (DEK) with a built-in server certificate managed by Azure. However, organizations can opt for Bring Your Own Key (BYOK), where the key is securely stored in Azure Key Vault, offering enhanced control over encryption keys.

Data in transit
Data encryption in transit is equally crucial for protecting sensitive information as it moves between clients and servers. Azure Synapse uses Transport Layer Security (TLS) to secure data in motion. Azure Synapse, dedicated SQL pool, and serverless SQL pool use the Tabular Data Stream (TDS) protocol to communicate between the SQL pool endpoint and a client machine. TDS depends on TLS for channel encryption, ensuring all data packets are secured and encrypted between the endpoint and the client machine. It uses a signed server certificate from the Certificate Authority (CA) used for TLS encryption, managed by Microsoft. Azure Synapse supports data encryption in transit with TLS v1.2, using AES 256 encryption.
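As a small, hedged sketch of how to observe this from the client side, the following query reports whether the current session's connection is encrypted. It assumes the sys.dm_exec_connections DMV is available in the pool you are connected to, which may not be the case in every Synapse SQL surface.

```sql
-- Inspect the current connection; encrypt_option indicates whether TLS
-- channel encryption is in use for this session.
SELECT session_id, encrypt_option, auth_scheme, client_net_address
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;
```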
What Synapse Serverless SQL pool authentication type for ADF Linked Service?
Hi, I'm relatively new to Azure Data Factory and would like your guidance on how to successfully create and test a Linked Service to the Azure Synapse Analytics serverless SQL pool.
In the past, I've successfully created a Linked Service to a third-party (outside our domain) on-premises SQL Server by creating a self-hosted integration runtime on their box and then creating a Linked Service to use it. The server name, database name, Windows authentication, and my username and password, all configured by the third party, are what I entered into the Linked Service configuration boxes. All successfully tested. This third-party data was extracted and imported, via ADF pipelines, into an Azure SQL Server database within our domain.
Now I need to extract data from our own (hosted in our domain) Azure Synapse Analytics serverless SQL pool database. My attempt is this, and it fails:
1) I create an 'Azure Synapse Analytics' Data Store Linked Service.
2) I select the 'AutoResolveIntegrationRuntime' as the runtime to use - I'm thinking this is correct as the Synapse source is within our domain (we're fully MS cloud based).
3) I select 'Enter manually' under the 'Account selection method'.
4) I've got the Azure Synapse Analytics serverless SQL endpoint, which I place into the 'Fully qualified domain name' field.
5) I enter the SQL database name found under the 'SQL database' node/section on the Data >> Workspace screen in Synapse.
6) I choose 'System-assigned managed identity' as the Authentication type - this is a guess, and I was hoping it would recognise the username/account that I am building the Linked Service with, as that account can also query Synapse and so has Synapse access.
7) I check the 'Trust server certificate' box. All else is default.
When I click test connection, it fails with the following message: "Cannot connect to SQL Database. Please contact SQL server team for further support. Server: 'xxxxxxxxxxxx-ondemand.sql.azuresynapse.net', Database: 'Synapse_Dynamics_data', User: ''. Check the linked service configuration is correct, and make sure the SQL Database firewall allows the integration runtime to access. Login failed for user '<token-identified principal>'."
I've reached out to our I.T. (who are novices with Synapse, ADF, etc., even though they did install them in our domain) and they don't know how to help me. I'm hoping you can help.
1) Is choosing 'Azure Synapse Analytics' the correct Data Store when looking to extract data from an Azure Synapse serverless SQL pool database?
2) Is using the AutoResolveIntegrationRuntime correct if Synapse is held within our domain? I've previously confirmed this runtime works (and still does), as when importing the third-party data I had to use that runtime to load the data into our Azure SQL Server database.
3) Have I populated the correct values for the 'Fully qualified domain name' and 'Database name' fields by entering the Azure Synapse Analytics serverless SQL endpoint and the SQL database name, respectively?
4) Is choosing 'System-assigned managed identity' as the Authentication type correct? I'm guessing this could be the issue. I selected it because, when loading the mentioned third-party data into the Azure SQL Server database within our domain, this was the authentication type that was used (and works), and so I'm assuming it somehow recognises the user logged in and, through the magic of cloud authentication, decides this user has the correct privileges (as I should have the correct privileges, so say I.T.)
and so allows the Linked Service to work.
Any guidance you can provide will be much appreciated. Thanks.
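For reference, here is a rough sketch of the database-side setup that a system-assigned managed identity sign-in to a serverless SQL pool generally relies on. The factory name my-data-factory is hypothetical, and this is not presented as the confirmed cause of the error above.

```sql
-- Run inside the serverless SQL pool database (not master), connected as an
-- Azure AD admin. [my-data-factory] is a hypothetical Data Factory name; the
-- user must be created for the factory's managed identity, not for your own
-- personal account.
CREATE USER [my-data-factory] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [my-data-factory];

-- If the query targets external tables or views over the data lake, the same
-- identity also needs permission to read the underlying files (for example,
-- the Storage Blob Data Reader role on the storage account).
```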