Spark

8 Topics

How to Query Spark Tables from Serverless SQL Pools in Azure Synapse
Introduction
Say goodbye to constantly running Spark clusters! With the shared metadata functionality, you can shut down your Spark pools and still query the tables you created in Spark using a serverless SQL pool. In this blog, we dive into how the serverless SQL pool streamlines your data workflow by automatically synchronizing metadata from your Spark pools.

Shared Metadata functionality
Azure Synapse Analytics allows its different workspace computational engines to share databases and tables between Apache Spark pools and the serverless SQL pool. When we create a table in an Apache Spark pool, whether managed or external, the serverless SQL pool automatically synchronizes its metadata. This synchronization creates a corresponding external table in a serverless SQL pool database, and after a short delay, the table becomes visible in the serverless SQL pool.

Creating a managed table in Spark and querying it from the serverless SQL pool
Once the table has been synchronized, we can shut down our Spark pools and still query the Spark tables from the serverless SQL pool.

NOTE: Azure Synapse currently shares only managed and external Spark tables that store their data in Parquet, Delta, or CSV format. Tables backed by other formats are not synchronized automatically. You may be able to register such a table yourself as an external table in your own SQL database, if the SQL engine supports the table's underlying format. Also, external tables created in Spark are not available in dedicated SQL pool databases.

Why do you get an error if you use the dbo schema in a Spark pool, or if you don't use it in a serverless SQL pool?
The dbo schema (short for “database owner”) is the default schema in SQL Server and Azure Synapse SQL pools. Spark pools support only user-defined schemas, which means they do not recognize dbo as a valid schema name. In the serverless SQL pool, on the other hand, all synchronized tables belong to the dbo schema, regardless of their original schema in the Spark pool or other sources.
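To make the naming rules concrete, here is a minimal Python sketch built around a hypothetical helper, serverless_query (it is not part of any Synapse SDK). It encodes two points from the article: only tables stored as Parquet, Delta, or CSV are synchronized, and a Spark table referred to as database.table is reached from the serverless SQL pool as [database].[dbo].[table]. The names salesdb, orders, and events are made up for illustration.

```python
from typing import Optional

# Storage formats that, per the note above, Azure Synapse shares with the
# serverless SQL pool. Comparison is case-insensitive.
SYNCED_FORMATS = {"parquet", "delta", "csv"}


def quote_ident(name: str) -> str:
    """Bracket-quote a T-SQL identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def serverless_query(spark_db: str, spark_table: str, fmt: str, top: int = 10) -> Optional[str]:
    """Build a T-SQL query for a Spark table as seen from the serverless SQL pool.

    Hypothetical helper: the Spark database maps to a serverless SQL database
    of the same name, and the synchronized table always sits in the dbo schema.
    Returns None when the table's format is not synchronized automatically.
    """
    if fmt.lower() not in SYNCED_FORMATS:
        return None
    name = ".".join(quote_ident(part) for part in (spark_db, "dbo", spark_table))
    return f"SELECT TOP {top} * FROM {name};"


if __name__ == "__main__":
    # In Spark you would refer to this table as salesdb.orders (no dbo).
    print(serverless_query("salesdb", "orders", "DELTA"))
    # A table stored as JSON is not synchronized, so there is nothing to query.
    print(serverless_query("salesdb", "events", "json"))
```

The helper only builds the query text; you would still run it against the serverless SQL endpoint with a SQL client of your choice.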
ayush9892
Jan 21, 2025, Educator Developer Blog
Generating Test Data with Azure OpenAI GPT-3 in Spark: A Powerful Tool for Developers and Data Analysts
Creating test data is an important task for developers and data analysts alike. However, creating test data manually can be time-consuming and error-prone. In this video, Thomas and Stijn demonstrate how to generate test data using Azure OpenAI GPT-3 within Spark in Azure Synapse Analytics.
Lee_Stott
Feb 26, 2023, Educator Developer Blog
Microsoft Azure Databricks – Collaborative Apache Spark Analytics Platform
First published on MSDN on Apr 19, 2018. I was having a conversation with some colleagues about institutions that wanted to understand ways of integrating Azure’s data science services into their curriculum for the new semester. One of the suggestions we came up with was to use Microsoft Azure DSVMs, HDInsight clusters, Databricks, and Notebooks.
Lee_Stott
Mar 21, 2019, Educator Developer Blog
Running Spark on a GPU enabled cluster with AZTK
First published on MSDN on Feb 07, 2018. The ability to run Spark on a GPU-enabled cluster demonstrates a unique convergence of big data and high-performance computing (HPC) technologies.
Lee_Stott
Mar 21, 2019, Educator Developer Blog
Microsoft Machine Learning for Apache Spark
First published on MSDN on Aug 08, 2017. MMLSpark provides a number of deep learning and data science tools for Apache Spark, including seamless integration of Spark Machine Learning pipelines with the Microsoft Cognitive Toolkit (CNTK) and OpenCV, enabling you to quickly create powerful, highly scalable predictive and analytical models for large image and text datasets.
Lee_Stott
Mar 21, 2019, Educator Developer Blog
Learning more about the Microsoft Data Science Virtual Machine: 4th April, 6pm–7pm
First published on MSDN on Mar 27, 2017. Public webinar on DSVM: this webinar demonstrates how the Data Science Virtual Machine (DSVM) in Microsoft Azure conveniently enables key end-to-end data analytics scenarios by giving users immediate access to a collection of the industry’s top data science and development tools, fully preconfigured, with worked examples and sample code.
Lee_Stott
Mar 21, 2019, Educator Developer Blog
Spark for Azure HDInsight
First published on MSDN on Mar 15, 2017. Guest blog from Alberto De Marco, Technology Solutions Professional – Big Data. This week we launched the Azure Data Lake service in Europe: Azure Data Lake Analytics and Azure Data Lake Store are now available in the North Europe region.
Lee_Stott
Mar 21, 2019, Educator Developer Blog
Big Data on Azure with No Limits Data, Analytics and Managed Clusters
First published on MSDN on Feb 24, 2017. HDInsight is reliable, with an industry-leading SLA; offers enterprise-grade security and monitoring; is a productive platform for developers and scientists; provides cost-effective cloud scale; integrates with leading ISV applications; and is easy for administrators to manage. Resources & hands-on labs for teaching: https://github.
Lee_Stott
Mar 21, 2019, Educator Developer Blog