Latest Discussions
Synapse Spark mssparkutils run multiple and log analytics
Hi all,

Context: We have developed a solution using Synapse Notebooks, as Fabric hasn't been approved for production yet. The initial approach was to orchestrate each notebook from data pipelines, but spinning up a Spark session took roughly 2-3 minutes per notebook (46 notebooks in total). We chose that approach so we could capture logs in Log Analytics for each notebook. The whole solution averaged 1 h 40 min, which is not ideal when the expectation is to refresh the data every 2 hours. Reviewing the Microsoft documentation (Introduction to Microsoft Spark utilities - Azure Synapse Analytics | Microsoft Learn), one suggested way to reduce runtime was the mssparkutils.notebook.runMultiple() function, which runs multiple notebooks within the same Spark session so they share compute resources. With this new approach, we designed a DAG that reduced execution time to 45 min, much closer to the refresh schedule we had in mind.

Problem: With the new implementation, we orchestrate the notebooks with mssparkutils in a single pipeline containing one notebook and a trigger. This means we have lost the ability to monitor individual notebooks with Log Analytics, which now only sees the main pipeline/Spark session/notebook, not the executions inside it.

Has someone faced a similar issue? Is there a way to send information to Log Analytics about each notebook running inside the runMultiple DAG? We want to monitor start time, end time, and status (queued, in progress, succeeded, failed) and capture errors if they occur. Thank you.

VictorMed02 · Feb 09, 2024 · Copper Contributor · 1K Views · 2 likes · 1 Comment

Invalid or unknown format type 'MISONCSV'. Dynamics F&O Export to Data Lake, CDM Util
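One workaround for the per-notebook monitoring question in the runMultiple post above is to wrap each child-notebook invocation and emit your own records. This is a minimal, hedged sketch: inside Synapse, `run_fn` would wrap `mssparkutils.notebook.run(...)` (an assumption about how the children are invoked), and the records would be shipped to a Log Analytics custom table; here a plain zero-argument callable stands in so the pattern runs anywhere.

```python
import time
import traceback

def run_and_log(name, run_fn, records):
    """Run one notebook-like step and append a monitoring record.

    Inside Synapse, run_fn would be something like
    lambda: mssparkutils.notebook.run(name, timeout) -- hypothetical
    wiring, not shown here. Any zero-argument callable works.
    """
    record = {"notebook": name, "status": "InProgress", "start": time.time()}
    try:
        record["result"] = run_fn()
        record["status"] = "Succeeded"
    except Exception:
        record["status"] = "Failed"
        record["error"] = traceback.format_exc()
    finally:
        record["end"] = time.time()
        record["duration_s"] = round(record["end"] - record["start"], 3)
    records.append(record)
    return record

records = []
run_and_log("nb_load_customers", lambda: "ok", records)     # succeeds
run_and_log("nb_transform_orders", lambda: 1 / 0, records)  # fails

# Each record could then be POSTed to a Log Analytics custom table,
# or simply printed so the driver log captures it.
for r in records:
    print(r["notebook"], r["status"], r["duration_s"])
```

With runMultiple specifically, the children run inside one session, so the wrapper has to live in the parent notebook (or each record must be emitted by the child itself); this sketch only shows the record shape and status transitions.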
To give some context: we have a data feed out of Microsoft Dynamics F&O exporting data tables to our Azure Data Lake Gen2, which Synapse then picks up as tables. I created the tables using the CDM util, and this has been working perfectly for the last couple of months. As of today, all tables are throwing this error:

Invalid or unknown format type 'MISONCSV'.

For example, when I query SELECT * FROM [dbo].[Currency] (the table comes from a CSV in the Data Lake), I get the error. I tried opening the file with OPENROWSET, which fails the same way. The CSV looks fine to me and lines up with the cdm.json file. I can't find any mention of this error online, so I'm not really sure where to start looking.

Solved · WSS12610 · Nov 28, 2022 · Copper Contributor · 2.7K Views · 2 likes · 5 Comments

NOT IN or IN Clause gives unexpected result Azure Synapse/Azure SQL Database
I have seen SQL developers frequently use IN / NOT IN / NOT EXISTS in Azure Synapse / Azure SQL Database to filter out rows that are not common to two tables, using a few joins and sub-queries. I recently observed that a NOT IN clause did not return the expected number of rows for a query. For example, you expect 3022 rows that are not in table T1, but the actual result is really unexpected: zero rows. Oops! How is it possible that the expected output is 3022 rows, yet the window shows zero?

Question: why is there a difference?

Answer: this is a limitation of the construct when the compared column contains a NULL value. Yes, you read that correctly. Please read on.

Demo: in the first case, where the column has no NULLs, NOT IN gives the expected result. I then introduced NULL values in one of the columns used in the NOT IN / IN comparison, and the output changed completely because of a single NULL value. (Screenshots of the expected and actual outputs accompanied the original post.)

How to identify the issue: check whether any of the columns involved contain NULL values.

Mitigation:
1. Use NOT EXISTS instead, which works even with NULL-valued columns (see the NOT EXISTS syntax).
2. Or eliminate the NULLs.

That's all, folks. Cheers! Don't forget to comment / ask questions!

Mukund_Bhashkar · May 21, 2020 · Microsoft · 7.5K Views · 2 likes · 2 Comments

Trying to call Azure rest api by using managed identity in Azure synapse notebook but failed
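The NULL behaviour described in the NOT IN post above follows standard SQL three-valued logic, so it can be reproduced with any engine. A minimal sketch using Python's built-in sqlite3 module (the same semantics apply in Synapse / Azure SQL Database): a single NULL in the subquery makes every `NOT IN` comparison UNKNOWN, so no rows qualify, while `NOT EXISTS` is unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t1 (id INTEGER)")
cur.execute("CREATE TABLE t2 (id INTEGER)")
cur.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,), (3,)])
cur.executemany("INSERT INTO t2 VALUES (?)", [(1,), (None,)])  # one NULL

# NOT IN: `id <> NULL` is UNKNOWN, so the whole predicate is never TRUE
# and zero rows come back -- the surprise described in the post.
not_in = cur.execute(
    "SELECT id FROM t1 WHERE id NOT IN (SELECT id FROM t2)").fetchall()

# NOT EXISTS: the NULL row simply never matches, so rows 2 and 3 return.
not_exists = cur.execute(
    "SELECT id FROM t1 WHERE NOT EXISTS "
    "(SELECT 1 FROM t2 WHERE t2.id = t1.id)").fetchall()

print(not_in)      # []
print(not_exists)  # [(2,), (3,)]
```

Adding `WHERE id IS NOT NULL` inside the subquery is the other mitigation mentioned in the post: it removes the NULL, and NOT IN then behaves as expected.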
I'm trying to call an Azure REST API using managed identity from an Azure Synapse notebook, but I get the following error. As you can see, I have already enabled the managed identity to run my notebook, and the Contributor role is assigned to the MSI for the corresponding Azure DevOps project. I'm not quite sure where the issue is. Maybe I used the wrong scopes? Should I turn something on before executing the notebook? Thanks for your help in advance!

Feng_Su_2525 · May 16, 2024 · Microsoft · 333 Views · 1 like · 0 Comments

Delta Query Performance in Azure Synapse Serverless: Issues with Response Times using Serverless SQL
We are encountering significant performance issues with Delta query execution in Azure Synapse Serverless across multiple client setups. Specifically, we use views built on Delta tables within Synapse Serverless to deliver data via a Synapse on-demand SQL endpoint. When end users execute queries involving simple joins at this endpoint, we observe response times 7-8 times slower than for equivalent views created over Parquet files.

Efforts to optimize have included optimizing and vacuuming the Delta tables and explicitly defining data types during view creation, which resulted in only marginal improvements. We suspect the issue might be related to Synapse utilizing an outdated version of the Delta reader. For comparison, using a Databricks SQL Warehouse dramatically enhances the query response times. However, that approach does not support querying via SQL Server Management Studio, which is a significant limitation for our end users.

Has anyone else experienced similar performance challenges with Delta in Synapse Serverless? Any recommendations or workarounds would be greatly appreciated. Also, maintaining duplicate datasets in both Delta and Parquet formats solely to support SQL endpoints feels inefficient. Thoughts on this?

Roelemanski · Apr 30, 2024 · Copper Contributor · 595 Views · 1 like · 2 Comments

Synapse link reset after database refresh
Hi, we are using Azure Synapse Link to export data from Dataverse to a Data Lake. The problem: after our UAT (User Acceptance Testing) environment was refreshed from the Production environment, the UAT data got dumped into the Production environment's Data Lake. So the question is: what is the correct process for a database refresh, and how do we restore the Azure Synapse Link so that data from the test environment is not exported to the Production Data Lake? Thank you in advance!

Cavewoman · Nov 29, 2023 · Copper Contributor · 957 Views · 1 like · 0 Comments

Issue querying delta lake with synapse: The log file points to a non-existing file
I am getting the error below when querying a Delta table; the log file references a file that no longer exists. This is due to a manual deletion of that file rather than deleting it through code. Is there a command that can fix this issue, or something similar to DBCC on SQL Server? The table is very large, and I don't think re-creating it is an option.

Cannot process the file "https://abcdev01.dfs.core.windows.net/serverdev01/FixDelta/Date=20230726/InstanceName=Mizuho/part-00005-3dcc3b04-f1bc-446a-b9b5-b82956ef300f.c000.snappy.parquet" because it does not exist or you don't have file access rights.

anterior · Oct 02, 2023 · Copper Contributor · 384 Views · 1 like · 0 Comments

Does anyone know why Data Explorer database (preview) is greyed out?
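As a diagnostic for the dangling-file question above (not a repair): the Delta transaction log is plain JSON lines, so you can list every data file the log still considers live and check which of them are missing from storage. A hedged sketch in plain Python, assuming a locally accessible table path and ignoring checkpoint parquet files for simplicity. On Databricks, FSCK REPAIR TABLE removes such dangling references for exactly this situation; whether the Delta version in a given Synapse pool supports it is worth checking before relying on it.

```python
import json
import os

def find_missing_files(table_path):
    """Return data-file paths that the Delta log considers live ('add'
    actions not cancelled by a 'remove') but that are absent on disk.
    Simplified: reads only the JSON commits, not checkpoint files."""
    log_dir = os.path.join(table_path, "_delta_log")
    referenced, removed = set(), set()
    for fname in sorted(os.listdir(log_dir)):
        if not fname.endswith(".json"):
            continue
        with open(os.path.join(log_dir, fname)) as fh:
            for line in fh:
                action = json.loads(line)
                if "add" in action:
                    referenced.add(action["add"]["path"])
                elif "remove" in action:
                    removed.add(action["remove"]["path"])
    live = referenced - removed
    return sorted(p for p in live
                  if not os.path.exists(os.path.join(table_path, p)))
```

Against an ADLS path you would swap the `os` calls for a filesystem client; the point is only that the log itself tells you which files it expects, so you can confirm the manually deleted file is the sole problem before deciding how to repair.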
Does anyone know why Data Explorer database (preview) is greyed out for me?

Solved · HamidBee · Aug 04, 2023 · Brass Contributor · 693 Views · 1 like · 1 Comment