Blog Post

Azure SQL Blog
4 MIN READ

General Availability of XML compression for Azure SQL Database and Managed Instance

Balmukund-Lakhani's avatar
Aug 02, 2023

In the ever-evolving world of database management, optimizing storage and improving performance are constant pursuits. One significant enhancement in this space is the introduction of XML compression in the MSSQL database engine. This feature is included in SQL Server 2022, and today we are announcing its general availability (GA) in Azure SQL Database and Azure SQL Managed Instance. This new feature brings with it an array of benefits, making data storage more efficient and queries faster. In this blog post, we will delve into the technical details of this feature and explore a practical example to showcase its potential.

 

Understanding XML compression

XML data has become ubiquitous in modern application development, as it offers a versatile way to structure and store information. However, XML documents can be quite verbose, leading to larger storage and memory requirements and slower query performance.

 

The XML compression feature addresses these concerns by providing an automatic mechanism to compress XML data. It uses a compression algorithm that significantly reduces the size of XML data. By doing so, it optimizes storage usage, reduces storage IO operations, and speeds up XML-related queries.

 

Advantages of XML compression Feature

Storage Savings: XML compression in Azure SQL Database and Managed Instance enables substantial storage savings, especially for databases that store large amounts of XML data.

 

Query Performance: With compressed XML data, the execution of XML queries becomes faster due to the reduced physical size of the data. Smaller data size means less time spent on data retrieval and, in turn, improved query performance.

 

Transparent Compression: One of the most significant advantages of this feature is its transparency. Users do not need to modify their existing queries or applications to utilize XML compression. The database engine automatically handles the compression and decompression processes, making it seamless and hassle-free.

 

Indexing and Statistics: XML compression extends its benefits to indexing and statistics as well. Compressed XML data can lead to smaller indexes, thereby reducing the time it takes to build and maintain them.

 

Example

To demonstrate the use of XML compression, you can use the script available at the bottom of this article. Here is the quick summary of results with table level compression and XML index level compression.

 

 

Size before compression

Size after compression

% Compression

Clustered index

872 KB

216 KB

75

Primary XML index

720 KB

504 KB

30

 

To get an estimation about object size after compression, sp_estimate_data_compression_savings can use used which supports parameter @xml_compression.

 

Conclusion

To conclude, the XML compression feature in Azure SQL Database and Managed Instance is a powerful addition that brings efficiency and performance improvements to XML data management. By automatically compressing XML data and providing storage savings, it enhances the overall database experience without requiring any changes to existing applications. As organizations strive for optimized cloud-based solutions, this feature proves to be a valuable asset in the modern data landscape.

 

Ready to experience the benefits of XML compression? Try it out now and share your thoughts about the XML compression feature in the comments below.

 

Demo script

Below script is based on table present in AdventureWorksLT database. More information about the database can be found at AdventureWorks sample databases - SQL Server | Microsoft Learn

 

 

 

-- Start Script

-- drop demo schema and table, if exists
DROP TABLE IF EXISTS demo.ProductModelXMLDemo;
DROP SCHEMA IF EXISTS demo;
GO
-- create demo schema
CREATE SCHEMA demo;
GO
-- populate into a different table.
SELECT *
INTO demo.ProductModelXMLDemo
FROM SalesLT.ProductModel;

-- add primary key to demo table.
ALTER TABLE demo.ProductModelXMLDemo 
ADD PRIMARY KEY (ProductModelID);

-- only few rows have XML data
SELECT *
FROM demo.ProductModelXMLDemo
WHERE CatalogDescription IS NOT NULL;

-- how many distinct values?
SELECT COUNT(*) 'number_of_rows'
       ,CONVERT(VARCHAR(8000), CatalogDescription) AS 'XML_text'
FROM demo.ProductModelXMLDemo
GROUP BY CONVERT(VARCHAR(8000), CatalogDescription);

-- since there are only few rows having XML data, below steps would duplicate them to more rows.

-- create a temporary table to store non-NULL CatalogDescription values with identity
CREATE TABLE #NonNullDescriptions (
    IdentityInt INT IDENTITY(1,1) NOT NULL ,
    CatalogDescription XML NOT NULL
);

-- insert non-NULL values into the temporary table
INSERT INTO #NonNullDescriptions (CatalogDescription)
SELECT  CatalogDescription
FROM demo.ProductModelXMLDemo
WHERE CatalogDescription IS NOT NULL;

-- update NULL rows with non-NULL values
UPDATE p1
SET p1.CatalogDescription = nd.CatalogDescription
FROM demo.ProductModelXMLDemo p1, #NonNullDescriptions nd
WHERE p1.CatalogDescription IS NULL
AND p1.ProductModelID%6 = nd.IdentityInt;

-- drop the temporary table
DROP TABLE #NonNullDescriptions;

-- now we have table populated XML data in more rows.
-- look at space used
EXEC sp_spaceused 'demo.ProductModelXMLDemo';
--  name                  rows  reserved  data    index_size  unused
--  -------------------   ----- --------  ----    ----------  ------
--  ProductModelXMLDemo   128   976 KB    872 KB  16 KB       88 KB

-- create primary XML index.
CREATE PRIMARY XML INDEX idx_xml_catalog_desc ON demo.ProductModelXMLDemo (CatalogDescription);
GO
EXEC sp_spaceused 'demo.ProductModelXMLDemo';
--    name                  rows  reserved  data    index_size  unused
--    -------------------   ----- --------  ----    ----------  ------
--    ProductModelXMLDemo   128   1752 KB   872 KB  720 KB      160 KB

--=======================================================================================
-- Scenario:1 - Uncompressed clustered index and compressed XML index
-- We can enable XML compression on index by rebuilding
ALTER INDEX idx_xml_catalog_desc ON demo.ProductModelXMLDemo REBUILD WITH (XML_COMPRESSION = ON);
GO
EXEC sp_spaceused 'demo.ProductModelXMLDemo';
--    name                  rows  reserved  data    index_size  unused
--    -------------------   ----- --------  ----    ----------  ------
--    ProductModelXMLDemo   128   1560 KB   872 KB  504 KB      184 KB
--====================================================================================
-- Scenario:2 Compressed table and uncompresses index
-- Enable XML compression on table and rebuild. 
-- Since we created compressed XML index in last step, we will drop and create without compression option.
ALTER TABLE demo.ProductModelXMLDemo REBUILD WITH (XML_COMPRESSION = ON);
GO
DROP INDEX IF EXISTS idx_xml_catalog_desc ON demo.ProductModelXMLDemo;
GO
CREATE PRIMARY XML INDEX idx_xml_catalog_desc ON demo.ProductModelXMLDemo (CatalogDescription);
GO
EXEC sp_spaceused 'demo.ProductModelXMLDemo';
--    name                  rows  reserved  data    index_size  unused
--    -------------------   ----- --------  ----    ----------  ------
--    ProductModelXMLDemo   128   1104 KB   216 KB  720 KB      168 KB
--====================================================================================
-- Scenario:3 - Compressed table and compressed index
-- Enable XML compression on index and rebuild. Table was already compressed in last step
ALTER INDEX idx_xml_catalog_desc ON demo.ProductModelXMLDemo REBUILD WITH (XML_COMPRESSION = ON);
GO
EXEC sp_spaceused 'demo.ProductModelXMLDemo';
--    name                  rows  reserved  data    index_size  unused
--    -------------------   ----- --------  ----    ----------  ------
--    ProductModelXMLDemo   128   912 KB    216 KB  504 KB      192 KB

-- End Script

 

 

 

Updated Aug 02, 2023
Version 2.0
  • martin_smith's avatar
    martin_smith
    Copper Contributor

    Is there any information available about what the algorithm does?

     

    How does it compare with using the "COMPRESS" function to manually compress the data?

     

    Other than storage/IO considerations are there any other performance differences for using this when calling methods on the compressed XML vs the uncompressed?

     

     

  • Deleted - As I mentioned in the blog "This feature is included in SQL Server 2022, and today we are announcing its general availability (GA) in Azure SQL Database and Azure SQL Managed Instance."

     

    Are you getting some error while using it? please contact me directly on LinkedIn. 

     

  • Vitaly_Bruk's avatar
    Vitaly_Bruk
    Copper Contributor

    Is there any chance for online create/rebuild with XML_COMPRESSION = ON?

    It is almost impossible to create or rebuild an index without online option, due to the large size of XML columns. In most cases, such columns are huge, and locks, and lock escalation will take down each environment.

  • Balmukund-Lakhani Will this feature become also available to the on-prem version of SQL-Server 2022?
    This would make sense for code compatibility across different services and platforms.
    What will be the release vehicle?
    CU8 or beyond, or a new SQL-Server Edition like SQL-Server 2024?