Modernization Best Practices and Reusable Assets Blog

10 MIN READ

Best practices for modernizing large-scale Text-Search from legacy data systems to Azure SQL

anilkota

Microsoft

Feb 24, 2025

A guide for designing and optimizing large-scale text-search using Full-Text Search on Azure SQL

Introduction

Text-search is an essential aspect of contemporary data management, enabling users to effectively extract information from natural-language text. This technology allows for the retrieval of specific details from a text corpus, providing insights that would be unattainable through conventional search methods.

Most modern databases include text-search capabilities by default, while legacy databases and data systems typically do not. In legacy systems, users may need to configure these features separately or use external services to implement text-search functions. SQL Server natively supports Full-Text search (and has been available from SQL Server 2005 onwards), which removes the requirement for additional component installations and thereby negates the need for any external data movement.

The effectiveness of any feature is contingent upon proper design and implementation. For handling text-search on large volume tables and VLDBs (i.e. TBs or PBs of data), adhering to established techniques and best practices ensures optimal performance and manageability. This article will specifically focus on the performance considerations and optimization techniques required for implementing Full-Text Search functionality on Azure SQL Database. We will discuss a use case that demonstrates how to optimize full-text search innovatively to deliver quick and efficient results.

Modernizing legacy workloads

This article delves into the text search feature in Azure SQL and its detailed implementation for a specific use-case. This is a common scenario that customers may encounter during migration or modernization from legacy mainframe, midrange or on-premises x86 systems. For instance, certain legacy systems employ an OLTP database for standard searches and a separate server specifically catering to the text-search use case. This design necessitates regular data transfers between these two systems, potentially leading to instances where customers cannot access up-to-date information and must accept staleness in their data processing.

There have been cases where text search requests were submitted to backend systems as separate queries, queued to run after business hours in batch jobs, with results being returned to the end-user on the following business day. Such implementations were common in legacy systems and continue to impact businesses today. When modernizing such workloads, customers can adopt and utilize native features like Full-Text Search which enables functional benefits to end-users like providing them with results in real-time and allowing them to focus more on their core business activities.

Below table shows how long these searches take when implemented through different technologies. The table volume considered here is ~5 billion rows

Technology	Preferred Execution Type	Feature utilized for Text-Search	End-to-End Timeframe
Legacy Systems	Nightly Batch	LIKE or External Text Search Server	1 business day
Cloud Database	Micro-batch	LIKE	~ 20 mins
Azure SQL DB	Real-time	Full Text Search	<1 sec

Note: The performance increase shown here is not a direct result of utilizing Full Text Search. Though Full Text Search increases the speed of querying such data, the increase in overall performance is a consequence of the system design and usage of multiple complementary SQL features that fit this use-case.

Scenario

Let's consider a use case where the SQL Database contains a substantially large table named CustomerInfo with billions of rows and terabytes of data. This table includes columns such as Customer Name, Address, State, CustInfo etc. Assume that Custinfo field encompasses free text / notes entered by customer service team for logging. Our use case involves retrieving rows from this Custinfo column based on a partial search criteria provided by the end-user through an application (i.e. web page, mobile app, etc.). For example, the query could be to extract all the rows where Custinfo field has notes stating that ‘mobile’ or ‘address’ is updated. The search functionality should retrieve all corresponding records, whether the match is exact or partial, and sort them according to their proximity to the provided key words.

Implementing a standard search using the LIKE operator on such a large dataset would be inefficient, and constructing the logic for specific word searches on a free-text character column is complex to build and maintain.

Full-Text Search for this use-case involves applying sound design principles, as well as leveraging SQL Server native features such as partitioning and indexed views, which enhance operational efficiency and manageability.

Note: In context of this article, we are using Azure SQL DB Hyperscale, but implementation remains mostly the same for other SQL offerings as well.

Technology Overview

Table Partitioning involves dividing a large table into smaller, more manageable segments called "partitions". Instead of storing all data in one extensive table, the data is separated into several smaller tables, each containing a portion of the data. Detailed information on Azure SQL Hyperscale table partitioning, along with Best Practices & Recommendations, can be found in a two-part technical blog: Part 1 and Part 2.

Indexed Views, also known as Materialized Views, is a SQL Server feature that can improve the performance of queries on large tables. Indexed views store the result set of a query physically on a disk, which allows for faster retrieval. Using Indexed views for Full-Text indexing can enhance the performance and maintainability of large datasets. Details on how to create an indexed view are provided here.

Full Text Search is a robust feature that facilitates complex search on text data stored in SQL tables. This feature supports efficient querying in extensive text fields such as documents, articles, and product descriptions through a specialized index known as a Full Text Index. Full-Text Search uses advanced indexing and query functions to perform complex searches, such as phrase matching and linguistic analysis, with high accuracy and speed. It ranks results based on relevance, ensuring the most pertinent results appear at the top. Search results can be refined with logical operators (AND/OR/NOT) and word weights.

The system supports rich query syntax for natural language queries and is compatible with multiple languages and character sets, making it suitable for global applications. A comprehensive overview and setup instructions for Full-Text Search can be found in the document here.

Key concepts of FULL-TEXT Search are Full Text Catalog and Full-Text Index.

Optimizing Full-Text Search in SQL Server

This blog post explains how to implement Text Search by using a combination of different but complementary built-in features like Full-Text Search, Partitioning, and Indexed views.

In this scenario, we are implementing the following techniques

The base table is large and therefore partitioned into smaller chunks of data
Each Indexed View will be built aligning to a single partition on the base table.
Each Full Text Index will be built aligning to a single Indexed View.
Full Text Catalogs will hold the relevant build information for the Full Text Indexes.

Key Benefits in this architecture for VLDB:

Provides a scalable model for growing data sizes.
Partitioning the base table provides better manageability and optimized performance.
Indexed views aligned with Partitioning Key allows quicker population of Indexed view.
Having Full text catalog aligned with indexed view allows to rebuild / reorganize Full text catalog for specific subset.
Full Text Index aligned with Indexed view provides manageable sized indexes and quicker crawling.
Ability to rebuild / reorganize specific full-text indexes and improved query performance.

Note: Full-Text index population is an asynchronous activity. For a given Index view / table, only one Full-Text index is allowed. With in a full-text index multiple columns can be included.

Below diagram provides a representation of how we can implement Full-Text search on very large tables.

Steps highlighted:

1 Partitioning base table.

2 Indexed views aligned with partition key.

3 Full-Text Catalogs.

4 Full-Text Indexes.

Partitioning

Choosing a partition key that evenly distributes data yields optimal results and is the most important parameter for partitioning. Partition switching/splitting divides a large partition into smaller ones when it grows over time. Here, if the partitioning column is STATE, the base table splits data based on STATE.

Sample Table definition along with partitioning:

--Partition Function Creation CREATE PARTITION FUNCTION [PF_STATE](varchar(2)) 
AS RANGE FOR VALUES 
('AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA', 
'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 
'NE', 'NV', 'NH', 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 
'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY')

--Partition Scheme Creation 
CREATE PARTITION SCHEME [PS_STATE] AS PARTITION [PF_STATE] ALL TO ([PRIMARY])

--Table DDL with partition Key. 

CREATE TABLE [dbo].[CUSTOMERINFO]
( [CUSTID] [int] NOT NULL, [CUSTNAME] [varchar](40), 
[ADDRESS] [char](60), [CITY] [char](20), STATE varchar(2), 
[ZIP] [char](10) , [COUNTRY] [char](15), CUSTINFO nvarchar(max) 
PRIMARY KEY CLUSTERED 
( [CUSTID] ASC, STATE ASC )
WITH (STATISTICS_NORECOMPUTE = OFF, 
IGNORE_DUP_KEY = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ) 
ON [PS_STATE](STATE);

Indexed view

To improve efficiency, multiple indexed views are created and aligned with the partition key column. Indexed views are schema-bound and store the results of the view’s query, so the results are precomputed and stored on disk. For large tables, creating an indexed view that aligns with the partitioned column helps to quickly populate the indexed view and aids in updating the views when the base table changes. Multiple indexed views enable the creation of multiple full-text indexes, optimizing the search process.

Note: Indexed view should have an unique index so this can be utilized during full-text index creation.

Sample Indexed views along with clustered index:

-- Create Indexed view for STATE AL and clustered index which is subsequently used for FullText Index. CREATE VIEW [dbo].[v_CUSTOMERINFO_AL] WITH SCHEMABINDING AS select CUSTID ,CUSTNAME,ADDRESS,CITY,STATE,ZIP,COUNTRY,CUSTINFO From dbo.CUSTOMERINFO where STATE = 'AL' ; GO CREATE UNIQUE CLUSTERED INDEX [CX_CUSTOMERINFO_AL] ON [dbo].[v_CUSTOMERINFO_AL] ( CUSTID ASC )WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] GO -- Create Indexed view for STATE AK and clustered index which is subsequently used for FullText Index. CREATE VIEW [dbo].[v_CUSTOMERINFO_AK] WITH SCHEMABINDING AS select CUSTID ,CUSTNAME,ADDRESS,CITY,STATE,ZIP,COUNTRY,CUSTINFO From dbo.CUSTOMERINFO where STATE = 'AK' ; GO CREATE UNIQUE CLUSTERED INDEX [CX_CUSTOMERINFO_AK] ON [dbo].[v_CUSTOMERINFO_AK] ( CUSTID ASC )WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] GO

Full-Text Search

Full-Text catalog

A Full-Text catalog is a logical container for full-text indexes, independent of any table. One catalog can be used for indexes on different tables. For databases with large or growing text data, use multiple catalogs.

Rebuild the Full-Text catalog after significant data changes (bulk inserts, large updates) to improve performance and reduce index fragmentation. Include this rebuild in regular database maintenance activities.

Sample Full Text Catalog:

--FullText Catalog for STATE AL and AK. CREATE FULLTEXT CATALOG CT_CUSTOMERINFO_AL AS DEFAULT; CREATE FULLTEXT CATALOG CT_CUSTOMERINFO_AK AS DEFAULT;

To change properties of Full-Text Catalog like REORGANIZE and REBUILD provided here

Full-Text Index

A table or indexed view can only have one full-text index, which can cover multiple columns. These indexes are linked to a unique index of the base table and attached to a full-text catalog. They can be created on char-based data types and XML. The process of creating and maintaining full-text indexes is called population or crawl, which is asynchronous and updates with changes to the underlying table or view and can be maintained with the "Change_Tracking" property.

For large tables, aligning full-text indexes with indexed views allows for multiple manageable indexes, improving performance and speeding up data updates. Ensure high I/O file groups are allocated for full-text indexes in SQL Server installations.

Fulltext index comes with an option of adding & removing columns as well as enabling or disabling it as needed.

Note: During Bulk data load (or other unique scenarios), FULL-TEXT index population (i.e. Change_Tracking) can be set to Manual / OFF and post load activity, Populate Full-Text Index and reset Change_Tracking to AUTO.

Sample Full Text Index:

--Full Text Index aligned to Clustered Index on Indexed view and Full-Text Catalog. CREATE FULLTEXT INDEX ON [dbo].[v_CUSTOMERINFO_AL](CUSTINFO) KEY INDEX [CX_CUSTOMERINFO_AL] ON CT_CUSTOMERINFO_AL WITH CHANGE_TRACKING = AUTO; CREATE FULLTEXT INDEX ON [dbo].[v_CUSTOMERINFO_AK](CUSTINFO) KEY INDEX [CX_CUSTOMERINFO_AK] ON CT_CUSTOMERINFO_AK WITH CHANGE_TRACKING = AUTO

Full-Text Index properties that can be changed are listed here.

Sample queries using Full-Text search

Full Text samples with contains and more examples are here

Sample 1: Contains with OR

--To Identify the CUSTINFO column which has words ‘Mobile’ Or ‘Address’. select * from dbo.[v_CUSTOMERINFO_AL] where contains (CUSTINFO,'Mobile or Address')

Sample 2: Contains with AND

--To Identify the CUSTINFO column which has words ‘Mobile’ AND ‘Address’. select CUSTID,CUSTNAME,CUSTINFO from dbo.[v_CUSTOMERINFO_AL] where contains (CUSTINFO,'Mobile and Address')

Details about Ranking are in Full-Text search with Ranking

Note: While SQL optimizer selects best execution plan for a query, for very specific and unique scenarios Hints also can be used to optimize further.

In this specific use-case, Indexed View Matching functionality can be implemented using the NOEXPAND Table Hint which drastically improves performance. By adding this hint, the query optimizer uses the index on the view if a query contains references to columns that are present both in an indexed view and base tables, and the query optimizer determines that using the indexed view provides the best method for executing the query.

Please perform due diligence prior to using Hints.

Full-Text Index Maintenance and validation

Types of Full-Text Index population.

How to Improve the Performance of Full-Text indexes

Sample Queries that helps in Full Text Maintenance and validation:

--How to check Full Text index population status SELECT object_name(object_id) as tablename ,change_tracking_state_desc, has_crawl_completed,crawl_type_desc,crawl_start_date,crawl_end_date FROM sys.fulltext_indexes where object_name(object_id) like 'v_CUSTOMERINFO%'

Additional Information

If more advanced AI based search capabilities are needed, Azure provides solutions like Azure AI Search. These are different offerings from Azure SQL and must be deployed separately and integrated appropriately.
If the use-case involves Vector search, Azure SQL DB can work with vectors and details are here.
As the SQL Server Full Text functionality keeps advancing, there are specific features that get deprecated. These can be found here. When building new applications using Full Text, we recommend that this list is being considered for future proofing your design.

Feedback and suggestions

If you have feedback or suggestions for improving this data migration asset, please contact the Databases SQL Ninja Engineering Team (datasqlninja@microsoft.com). Thanks for your support!

Note: For additional information about migrating various source databases to Azure, see the Azure Database Migration Guide.

Updated Apr 18, 2025

Version 5.0

database platform cse

database platform cse – db2

database platform cse – sql

anilkota

Microsoft

Joined September 02, 2022

View Profile

Modernization Best Practices and Reusable Assets Blog

Follow this blog board to get notified when there's new activity