synapse spark
69 Topics

Improve Spark pool utilization with Synapse Genie
The Synapse Genie Framework improves Spark pool utilization by executing multiple Synapse notebooks on the same Spark pool instance. It takes into account the sequence of, and dependencies between, notebook activities in an ETL pipeline, which results in fuller use of the cluster resources available in a Spark pool.
12K Views · 18 likes · 9 Comments

The best practices for organizing Synapse workspaces and lakehouses
When designing a lakehouse solution, you should carefully organize your databases and tables based on the underlying folder structure. This article offers best practices and recommendations to help you organize your lakehouses if you are implementing them in a Synapse Analytics workspace.
38K Views · 16 likes · 3 Comments

Synapse – Data Lake vs. Delta Lake vs. Data Lakehouse
As data engineers, we often hear terms like Data Lake, Delta Lake, and Data Lakehouse, which can be confusing at times. In this blog we’ll demystify these terms and talk about the differences between these technologies and concepts, along with usage scenarios for each.
52K Views · 14 likes · 0 Comments

Data mesh: A perspective on using Azure Synapse Analytics to build data products
This multi-part blog series discusses various aspects of implementing a data mesh architecture on Azure. This part focuses on the data-as-a-product principle and presents a perspective on using Azure Synapse Analytics as a data product. We discuss (at a high level) data product functions and capabilities and apply that lens to Synapse Analytics. We also discuss how workspaces can be partitioned to give domains the scale and agility to build data products.
18K Views · 11 likes · 4 Comments

The Data Lakehouse, the Data Warehouse and a Modern Data platform architecture
Two contradictory themes about how to build a modern data platform are being proposed to data architects today. This article discusses why there is such a big disparity between the two approaches, how to make sense of these competing patterns, and why the modern data warehouse architecture provides a flexible and pragmatic approach.
38K Views · 11 likes · 3 Comments

Using OpenAI GPT in Synapse Analytics
Azure OpenAI hardly needs an introduction, but for those who managed to evade all tech news lately, let me give you a brief overview. Azure OpenAI is a suite of natural language processing (NLP) models developed by OpenAI. The models can be used in a very wide range of applications, including text generation, summarization and translation.
19K Views · 8 likes · 1 Comment

Synapse Spark - Encryption, Decryption and Data Masking
As data engineers, we often get requirements to encrypt, decrypt, mask, or anonymize certain columns of data in files sitting in the data lake when preparing and transforming data with Apache Spark. Spark's extensibility allows us to leverage libraries that are not native to Spark. One such library is Microsoft Presidio, which provides fast identification and anonymization modules for private entities in text such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers, financial data, and more. It facilitates both fully automated and semi-automated PII (Personally Identifiable Information) de-identification and anonymization flows on multiple platforms.
9.4K Views · 7 likes · 2 Comments