Data virtualization, now in public preview in Azure SQL Database, enables you to leverage all the power of Transact-SQL (T-SQL) and seamlessly query external data from Azure Data Lake Storage Gen2 or Azure Blob Storage, eliminating the need for data duplication, or ETL processes, allowing for faster analysis and insights.
Integrate external data, such as CSV, Parquet, or Delta files, with your relational database while maintaining the original data format and avoiding unnecessary data movement. Present integrated data to applications and reports as a standard SQL object or through a normal SELECT command. Data Virtualization for Azure SQL Database supports SAS tokens, Managed Identity, and User identity for secure access.
Data Virtualization for Azure SQL Database will introduce and expand support for:
- External File Format - with support for Parquet, CSV, and Delta.
- Support metadata functions and JSON functions.
For enhanced security and flexibility Data Virtualization for Azure SQL Database supports three authentication methods:
- Managed identity (system assigned managed identity and user-assigned managed identity).
- User identity.
Key Benefits
Just like in SQL Server 2022 and Azure SQL Managed Instance the key benefits of Data Virtualization for Azure SQL Database are:
- Seamless Data Access: Query external CSV, Parquet, and Delta Lake tables using T-SQL as if they were native tables within Azure SQL Database. Allowing for off-loading cold data while keeping it easily accessible.
- Enhanced Productivity: Reduce the time and effort required to integrate and analyze data from multiple sources.
- Cost Efficiency: Minimize the need for data replication and storage costs associated with traditional data integration methods.
- Real-Time Insights: Enable real-time data querying and insights without delays caused by data movement or synchronization.
- Security: Leverage SQL Server security features for granular permissions, credential management, and control.
Example
Data Virtualization for Azure SQL Database is based on the same core principles as SQL Server’s PolyBase feature. With support for Azure Data Lake Gen 2, using prefix adls:// and Azure Blob Storage, using prefix abs://.
For the following example we are going to use Azure Open Datasets, more specifically NYC yellow taxi trip records open data set which allows public access. For private data sources customers can leverage multiple authentication methods like SAS Tokens, Managed Identity and User Identity.
-- Create Azure Blob Storage (ABS) data source
CREATE EXTERNAL DATA SOURCE NYCTaxiExternalDataSource WITH (
LOCATION = 'abs://nyctlc@azureopendatastorage.blob.core.windows.net');
-- Using OPENROWSET to read Parquet files from the external data source
SELECT
TOP 10 *
FROM
OPENROWSET(
BULK '/yellow/puYear=*/puMonth=*/*.parquet',
DATA_SOURCE = 'NYCTaxiExternalDataSource',
FORMAT = 'parquet'
) AS filerows;
-- Or using External Tables
CREATE EXTERNAL FILE FORMAT DemoFileFormat WITH (FORMAT_TYPE = PARQUET);
--Create external table
CREATE EXTERNAL TABLE tbl_TaxiRides (
vendorID VARCHAR(100) COLLATE Latin1_General_BIN2,
tpepPickupDateTime DATETIME2,
tpepDropoffDateTime DATETIME2,
passengerCount INT,
tripDistance FLOAT,
puLocationId VARCHAR(8000),
doLocationId VARCHAR(8000),
startLon FLOAT,
startLat FLOAT,
endLon FLOAT,
endLat FLOAT,
rateCodeId SMALLINT,
storeAndFwdFlag VARCHAR(8000),
paymentType VARCHAR(8000),
fareAmount FLOAT,
extra FLOAT,
mtaTax FLOAT,
improvementSurcharge VARCHAR(8000),
tipAmount FLOAT,
tollsAmount FLOAT,
totalAmount FLOAT
) WITH (
LOCATION = 'yellow/puYear=*/puMonth=*/*.parquet',
DATA_SOURCE = NYCTaxiExternalDataSource,
FILE_FORMAT = DemoFileFormat
);
SELECT TOP 10 * FROM tbl_TaxiRides;
You can also use these capabilities in combination with other metadata functions like sp_describe_first_result_set, filename(), and filepath().
Getting started
Data Virtualization for Azure SQL Database is currently available in select regions, with broader availability coming soon across all Azure regions
Data Virtualization for Azure SQL Database is based on the same core principles as SQL Server’s PolyBase feature.
To know more and get started with Data Virtualization for Azure SQL Database.