Blog Post

Azure Synapse Analytics Blog
7 MIN READ

Access and analyze all data from the Data Hub in Azure Synapse Analytics

charlesfeddersenMS's avatar
Dec 10, 2020

The Data Hub in Azure Synapse is a central place where you can view and interact with your data sources and, most importantly, query across all of your data sources. Combine this with the Azure Open Datasets found in the Knowledge Center; you can augment your organizational data in a matter of seconds. Azure Open Datasets includes sample data important to business trends; here, you will find data relative to COVID-19, transportation, weather, economic indicators, and more.

 

Anatomy of the Data Hub

The Data Hub is available from the Synapse Studio left menu and is represented by the database cylinder icon. There are two tabs in the Data Hub blade, Workspace and Linked. The Workspace tab is where you will find your workspace databases for both SQL and Spark. The Linked tab is where you find linked external datasets and integration datasets used in data flows and pipelines. There are over 95 connectors to various data-centric storage technologies. It is also in this linked area where you will find sample data obtained from Azure Open Datasets.

 

In the header of the Data Hub blade, you will find the + menu. This menu provides the ability to add a new SQL database, connect to external data, create integration datasets, or browse the Knowledge Center gallery.

Image of the Data Hub tabs and + menu

 

The Data Hub blade provides quick access to your workspace and linked data stores through convenient Action (context) menus. For example, if you select the Views folder of a SQL pool, the context menu provides an option to create a New SQL Script with the option to scaffold a Create View query.

The Actions menu for a SQL pool Views folder is displayed

 

For linked storage accounts, the context menu gives you the choice of interacting with data via scaffolded SQL scripts or Apache Spark notebooks. In addition to this, when browsing a storage container, a handy data explorer tab will open. The data explorer's taskbar provides even more actions directly at your fingertips, including the ability to upload and download files, manage ACL access, create an integration dataset, and more!

A data explorer tab is displayed

 

Creating a serverless SQL pool

Let's create a serverless SQL Pool from within the Data Hub and observe how it gets added to the Workspace tab. Expand the + menu on theData Hub blade and select SQL database beneath the Workspace heading. To the right, a Create SQL database blade appears where you have the choice to create either a serverless or dedicated SQL pool type. In this case, we desire a serverless SQL pool type. We also need to name a database associated to that serverless pool. Select the Create button to deploy the serverless SQL Pool. After a few minutes, refresh the Workspace tab in the Data Hub blade to view the newly created database.

The Create SQL database form is displayed alongside the refreshed Data Hub blade showing the newly created serverless SQL database

 

Leveraging HTAP capabilities with Azure Synapse Link for Azure Cosmos DB

If you have an analytical store hosted in Azure Cosmos DB, you can connect to it directly from Azure Synapse Analytics through Azure Synapse Link. This connection enables data to flow from Azure Cosmos DB to Azure Synapse without the use of any ETL (Extract-Transform-Load) mechanisms. Azure Synapse Link provides a cloud-native hybrid transactional and analytical processing (HTAP) capability. This HTAP capability offers the benefit of delivering near-real-time data into analytical queries, Power BI dashboards, machine learning pipelines, and more without impacting the transactional workloads in Azure Cosmos DB.

 

To enable Azure Synapse Link, open your Cosmos DB resource in the Azure Portal. From the left menu, select the Features item from beneath the Settings heading. From the Features listing, select Azure Synapse Link. At this time, the Azure Synapse Link blade will appear on the right side of the screen. Select the Enable button to enable this feature.

The Azure Cosmos DB Features screen is shown along side the Azure Synapse Link form

 

In Synapse Studio, return to the Data Hub, expand the + menu, and select the Connect to external data item located in the Linked section of the menu.

The Data Hub + menu is expanded with Connect to external data selected

 

The Connect to external data blade will appear on the screen's right. Select one of Azure Cosmos DB API options. In this case, we'll be using the SQL API. Next, the New linked service form will display. Name the linked service and connect to your Azure Cosmos DB resource and analytical store container using your desired authentication method. Once complete, refresh the Data Hub screen to see your HTAP enabled container located in the Linked tab under the Azure Cosmos DB section. Query this data quickly by selecting a collection and expanding the Actions menu and selecting New SQL script, then Select TOP 100 rows.

The Azure Cosmos DB HTAP collection

 

In the generated SQL script, you will need to supply either a key or credential before executing the query. From the top taskbar, we can select the Sales Team serverless SQL database that we created earlier. Select the Run button to execute the query and see the sales transaction data.

A SQL script tab is shown with a query against Azure Cosmos DB and its tabular results. The Sales Team database selected

 

Augmenting existing data with Azure Open Datasets

Now that we have the Azure Cosmos DB linked to Azure Synapse, we can augment the sales data with COVID-19 data obtained from Azure Open Datasets. We will use the Knowledge Center to add this dataset to Azure Synapse. In the Data Hub, expand the + menu and select Browse gallery. With the Datasets tab selected in the gallery, choose the Bing COVID-19 Data card, then select the Continue button.

The Knowledge Center Gallery is shown with the Bing COVID-19 Data card selected

 

An informational screen then displays along with a preview of the data you should expect in the dataset. Select the Add dataset button to include this data in Azure Synapse.

An informational screen is displayed describing the COVID-19 dataset and providing a preview

 

The COVID-19 data is now available in the Data Hub under the Linked tab beneath the Azure Blob Storage heading. Expand the Actions menu next to the bing-covid-19-data folder, and choose New SQL script, then Select TOP 100 rows.

In the Linked tab of the Data Hub, the Azure Blob Storage item is expanded and the actions menu is expanded for the bing-covid-19-data folder. From the actions menu, new SQL Script and Select TOP 100 rows is selected

 

In the SQL script tab toolbar, choose the Sales Team serverless SQL pool and execute the query with the Run button.

The COVID data query is shown with tabular results

 

Let's now combine the COVID-19 data with the data from the Azure Cosmos DB analytical store.

A query combining both the COVID-19 data and the Azure Cosmos DB sales data is shown with its results in tabular format

 

At the top of the results view of the query, toggle the View to Chart. Here you can adjust the chart settings and visualize the data in an instant.

The combined query is shown again but this time with the View set to chart. A chart representing the data is shown

 

Clean up

You can remove the items created in this post through the Data Hub. On the Workspace tab of the Data Hub blade, expand the Databases section. Expand the actions menu next to the Sales Team (SQL) serverless SQL database and select Delete.

On the Workspace tab of the Data Hub blade, the Databases section is expanded, and the Sales Team SQL pool is shown with the Delete item highlighted

 

Next, select the Linked tab on the Data Hub blade. Expand the Azure Blob Storage section along with the Sample Datasets item. Open the actions menu next to the bing-covid-19-data folder and select Delete.

On the Linked tab of the Data Hub blade the Azure Blob Storage section and Sample Datasets items are expanded. Next to the bing-covid-19-data folder the actions menu is shown with the Delete item highlighted

 

The last item to remove is the connection to Azure Cosmos DB. Remain on the Linked tab in the Data Hub and expand the Azure Cosmos DB section. Open the actions menu next to the Azure Cosmos DB service and select Delete.

On the Linked tab of the Data Hub blade, the Cosmos DB section is expanded. Next to the Cosmos DB service, the actions menu is shown with the Delete menu item highlighted

 

Conclusion

In this blog post, we learned about the Data Hub in Azure Synapse Analytics. We connected to Azure Cosmos DB through Azure Synapse Link and quickly added COVID-19 data from the Knowledge Center Gallery. We then combined the sales data from Azure Cosmos DB with the COVID-19 data and visualized the results in both tabular and chart format.

Quick get started with Azure Synapse and try this tutorial with these resources:

 

 

 

Updated Dec 09, 2020
Version 1.0
No CommentsBe the first to comment