Educator Developer Blog

[pt1] Choosing the right Data Storage Source (Generally available) for Azure AI Search

Feb 06, 2025

When building AI-powered search solutions with Azure AI Search, selecting the right data source is crucial for efficiency, scalability, and overall search performance. Azure AI Search provides indexers that pull data from various storage sources, transforming and enriching it for a better search experience. This article explores the key generally available data sources and offers best practices to help you choose the right one for your use case.

The available connectors fall into three categories:

Generally Available Data Sources by Azure AI Search: Production-ready indexers that pull data from other Azure services. This part of the guide focuses on these.
Preview Data Sources by Azure AI Search: If you want to explore the newest features, you can sign up for preview data sources and get early access to upcoming capabilities.
Data Sources from Our Partners: Third-party partners such as BA Insight and Accenture provide data connectors for Azure AI Search, with specialized solutions for enterprise needs.

Generally Available Data Sources by Azure AI Search

The sections below describe when to consider each generally available data source and give best practices for integrating it with Azure AI Search.

Choosing the Right Data Source for Your Use Case

Data Source	Best For	Key Benefits	Change Detection	Supported Content
Azure Blob Storage	Unstructured and semi-structured data	Supports metadata extraction, AI enrichment, and various formats	Auto-detects changes	PDFs, Office files, JSON, CSV, images, etc.
Azure Cosmos DB for NoSQL	High-volume JSON-based transactional data	Scheduled incremental indexing, built-in change tracking	_ts timestamp-based tracking	JSON documents, structured NoSQL data
Azure SQL Database	Structured relational data	Uses SQL queries, supports incremental indexing	SQL Change Tracking or High-Water Mark	Tables, Views, JSON-like data
Azure Table Storage	Key-value store, semi-structured data	Simple schema, high scalability	Automatic, based on the Timestamp property	Tabular, JSON-like data
Azure Data Lake Storage Gen2	Hierarchical, large-scale datasets	AI enrichment, hierarchical folder indexing	Auto-detects changes	Large CSVs, JSON, Office files, PDFs, ZIPs

1. Azure Blob Storage – For Unstructured & Semi-structured Data

When to Use Azure Blob Storage

Your data consists of documents, images, PDFs, Office files, HTML, XML, JSON, and CSVs.
You need AI enrichment for extracting text from images, scanned PDFs, or multi-format content.
You want metadata extraction for indexing and filtering content (e.g., file size, content type, last modified).
You need incremental indexing to detect new or modified files automatically.

Configuration Tips

Use AI skillsets to extract text from images, convert documents, and enhance searchability.
Enable content parsing modes to process JSON, Markdown, or other text-based formats.
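Use the file-extension inclusion/exclusion settings (indexedFileNameExtensions, excludedFileNameExtensions) to skip blobs you do not want to search, such as audio files or images that will not go through OCR.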
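
The tips above map onto an indexer definition and a skillset definition. The following is a minimal sketch that only builds the JSON payloads as Python dictionaries and sends nothing. The resource names (docs-indexer, docs-ds, docs-index, docs-skillset) and the excluded extensions are hypothetical, and the property names follow the Azure AI Search REST API as commonly documented, so check them against the current API reference before relying on them.

```python
import json


def build_blob_indexer(name, datasource_name, index_name, skillset_name,
                       excluded_extensions=(".mp3", ".wav")):
    """Return an indexer definition for blob content (all names are hypothetical)."""
    return {
        "name": name,
        "dataSourceName": datasource_name,
        "targetIndexName": index_name,
        "skillsetName": skillset_name,
        "parameters": {
            "configuration": {
                # "default" treats each blob as one document; modes such as
                # "json" or "delimitedText" change how a blob is split up.
                "parsingMode": "default",
                "dataToExtract": "contentAndMetadata",
                # Produce normalized images so an OCR skill can read them.
                "imageAction": "generateNormalizedImages",
                # Comma-separated list of extensions the indexer should skip.
                "excludedFileNameExtensions": ",".join(excluded_extensions),
            }
        },
    }


def build_ocr_skillset(name):
    """Return a skillset with a single OCR skill over the normalized images."""
    return {
        "name": name,
        "skills": [
            {
                "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
                "context": "/document/normalized_images/*",
                "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
                "outputs": [{"name": "text", "targetName": "text"}],
            }
        ],
    }


indexer = build_blob_indexer("docs-indexer", "docs-ds", "docs-index", "docs-skillset")
skillset = build_ocr_skillset("docs-skillset")
print(json.dumps(indexer, indent=2))
```

Keeping the payloads as plain dictionaries makes them easy to review or store in source control before posting them with whichever SDK or HTTP client you prefer.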

Change Detection

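Changes are detected automatically from each blob's last-modified timestamp (exposed as metadata_storage_last_modified).
Supports soft-delete detection through a custom blob metadata property (for example, IsDeleted) configured in a soft-delete column policy. The AzureSearch_Skip and AzureSearch_SkipContent metadata properties serve a different purpose: they tell the indexer to skip a blob or its content.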
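
As a rough illustration of soft-delete detection on blobs, the sketch below builds a data source definition with a soft-delete column policy. It assumes a custom metadata property named IsDeleted (a hypothetical name) that you set to "true" on blobs that should leave the index; the connection string and container name are placeholders.

```python
import json

SOFT_DELETE_POLICY = "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy"


def build_blob_datasource(name, connection_string, container,
                          soft_delete_property="IsDeleted"):
    """Return a blob data source definition with soft-delete detection.

    Change detection needs no policy here: the blob indexer relies on the
    last-modified timestamp of each blob.
    """
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
        "dataDeletionDetectionPolicy": {
            "@odata.type": SOFT_DELETE_POLICY,
            # Hypothetical metadata property set on blobs to be removed.
            "softDeleteColumnName": soft_delete_property,
            "softDeleteMarkerValue": "true",
        },
    }


blob_ds = build_blob_datasource("docs-ds", "<storage-connection-string>", "documents")
print(json.dumps(blob_ds, indent=2))
```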

2. Azure Cosmos DB for NoSQL – For High-Volume JSON-Based Applications

When to Use Azure Cosmos DB

You have high-velocity, JSON-based structured data.
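Your app needs low-latency search over fast-changing data (e.g., e-commerce, IoT logs, user-generated content). Indexers run on a schedule or on demand, so the index is refreshed at intervals rather than instantly.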
You need incremental indexing based on the _ts (timestamp) property.
Your data has complex nested structures that require SQL-like queries for transformation.

Configuration Tips

Use custom queries to flatten JSON structures for indexing.
Enable soft delete tracking using a Boolean flag (IsDeleted field).
Use Azure SDKs or REST APIs to automate index refresh.
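
One way to automate an index refresh over REST is to call the indexer's run endpoint. The sketch below only constructs the HTTP request and does not send it. The service name, indexer name, and key are placeholders, and the default api-version value is an assumption that should be replaced with whichever version your service supports.

```python
import urllib.request


def build_run_indexer_request(service_name, indexer_name, api_key,
                              api_version="2024-07-01"):
    """Build (but do not send) a POST request that triggers an indexer run."""
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexers/{indexer_name}/run?api-version={api_version}"
    )
    # An empty body makes the client send Content-Length: 0 with the POST.
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"api-key": api_key},
    )


request = build_run_indexer_request("my-search-service", "orders-indexer", "<admin-key>")
print(request.get_method(), request.full_url)
```

To actually trigger the run, pass the request to urllib.request.urlopen from a scheduled job or a deployment script, using a real admin key.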

Change Detection

Uses the _ts field for automatic change tracking.
Supports soft delete tracking with a custom Boolean field.
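
The sketch below shows how a flattening query, _ts-based change tracking, and a soft-delete flag could fit together in one Cosmos DB data source definition. The document shape (customer.name), the IsDeleted field, and the resource names are hypothetical. The query keeps the _ts filter and ordering so that incremental indexing can still resume from the last high-water mark.

```python
import json

# Hypothetical query: lifts a nested property to the top level and keeps
# the _ts filter and ordering needed for incremental indexing.
FLATTEN_QUERY = (
    "SELECT c.id, c.customer.name AS customerName, c.IsDeleted, c._ts "
    "FROM c WHERE c._ts >= @HighWaterMark ORDER BY c._ts"
)


def build_cosmos_datasource(name, connection_string, container, query=None):
    """Return a Cosmos DB for NoSQL data source definition."""
    container_def = {"name": container}
    if query:
        container_def["query"] = query
    return {
        "name": name,
        "type": "cosmosdb",
        "credentials": {"connectionString": connection_string},
        "container": container_def,
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts",
        },
        "dataDeletionDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
            "softDeleteColumnName": "IsDeleted",
            "softDeleteMarkerValue": "true",
        },
    }


cosmos_ds = build_cosmos_datasource(
    "orders-ds", "<cosmos-connection-string>", "orders", FLATTEN_QUERY
)
print(json.dumps(cosmos_ds, indent=2))
```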

3. Azure SQL Database – For Structured Relational Data

When to Use Azure SQL Database

You manage relational data in tables or views.
You need SQL-based queries to shape data for indexing.
Your data changes frequently, and you require incremental indexing.
You need structured full-text search with filtering, faceting, and ranking.

Configuration Tips

Use SQL views if your data spans multiple tables.
Enable SQL Change Tracking for incremental indexing on tables; for views, use a high-water mark column instead.
Optimize indexing performance by reducing unnecessary fields.

Change Detection

Uses SQL Change Tracking or High-Water Mark (timestamp-based detection).
Supports soft delete tracking via a Boolean flag (IsDeleted field).
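
The two change detection options can be sketched as a single helper that picks a policy based on whether a high-water mark column is supplied. The table, view, and column names are hypothetical. The helper attaches a soft-delete policy only alongside the high-water mark policy, on the understanding that SQL integrated change tracking already detects deleted rows; confirm this against the current documentation.

```python
import json


def build_sql_datasource(name, connection_string, table_or_view,
                         high_water_mark_column=None, soft_delete_column=None):
    """Return an Azure SQL data source definition.

    Without a high-water mark column the definition uses SQL integrated
    change tracking; with one it uses the high-water mark policy.
    """
    definition = {
        "name": name,
        "type": "azuresql",
        "credentials": {"connectionString": connection_string},
        "container": {"name": table_or_view},
    }
    if high_water_mark_column is None:
        definition["dataChangeDetectionPolicy"] = {
            "@odata.type": "#Microsoft.Azure.Search.SqlIntegratedChangeTrackingPolicy"
        }
    else:
        definition["dataChangeDetectionPolicy"] = {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": high_water_mark_column,
        }
        if soft_delete_column:
            definition["dataDeletionDetectionPolicy"] = {
                "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
                "softDeleteColumnName": soft_delete_column,
                "softDeleteMarkerValue": "true",
            }
    return definition


table_ds = build_sql_datasource("products-ds", "<sql-connection-string>", "Products")
view_ds = build_sql_datasource(
    "catalog-ds", "<sql-connection-string>", "vw_Catalog",
    high_water_mark_column="RowVersion", soft_delete_column="IsDeleted",
)
print(json.dumps(view_ds, indent=2))
```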

4. Azure Table Storage – For Scalable, Key-Value Semi-Structured Data

When to Use Azure Table Storage

You have large-scale key-value data (e.g., logs, IoT telemetry, audit records).
You need a cost-effective way to store and index semi-structured data.
Your schema is flexible but needs basic search capabilities.

Configuration Tips

Define explicit field mappings to match Table Storage schema with the search index.
Use PartitionKey filtering in queries to optimize performance.
Set up custom metadata flags for deletion tracking.

Change Detection

Built-in incremental indexing based on each entity's Timestamp property; no change detection policy needs to be configured.
Use PartitionKey-based queries for optimized incremental indexing.
Supports soft delete tracking via a custom IsDeleted Boolean field.
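
A possible shape for a Table Storage data source that combines a PartitionKey filter with a soft-delete flag is sketched below. The table name, partition value, and IsDeleted property are hypothetical. The small helper doubles single quotes so that the partition value stays a valid string literal in the filter.

```python
import json


def partition_filter(partition_key):
    """Return a filter expression selecting a single partition."""
    escaped = partition_key.replace("'", "''")  # escape embedded quotes
    return f"PartitionKey eq '{escaped}'"


def build_table_datasource(name, connection_string, table, partition_key=None):
    """Return an Azure Table Storage data source definition."""
    container = {"name": table}
    if partition_key is not None:
        # Restrict the indexer to one partition to reduce the rows it scans.
        container["query"] = partition_filter(partition_key)
    return {
        "name": name,
        "type": "azuretable",
        "credentials": {"connectionString": connection_string},
        "container": container,
        "dataDeletionDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
            "softDeleteColumnName": "IsDeleted",
            "softDeleteMarkerValue": "true",
        },
    }


table_storage_ds = build_table_datasource(
    "telemetry-ds", "<storage-connection-string>", "Telemetry", partition_key="device-42"
)
print(json.dumps(table_storage_ds, indent=2))
```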

5. Azure Data Lake Storage Gen2 – For Hierarchical, Large-Scale Data

When to Use Azure Data Lake Storage

You manage big data with a hierarchical folder structure.
You need incremental indexing for text-based and semi-structured files (CSV, JSON, XML, PDFs).
Your search requirements include AI enrichment (e.g., extracting text from scanned PDFs).
Your data sources include large datasets that require indexing for analytics or retrieval.

Configuration Tips

Use folder-based queries to include/exclude specific subdirectories.
Enable metadata extraction to index file properties.
Configure data parsing modes (e.g., CSV row-to-document parsing).

Change Detection

Uses metadata_storage_last_modified for automatic change tracking.
Supports soft delete detection using metadata properties.
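
For Data Lake Storage Gen2, folder scoping and CSV row-to-document parsing could be expressed roughly as follows. The file system name, folder path, and resource names are hypothetical, and the sketch assumes the CSV files carry a header row.

```python
import json


def build_adls_datasource(name, connection_string, filesystem, folder=None):
    """Return an ADLS Gen2 data source, optionally scoped to one folder."""
    container = {"name": filesystem}
    if folder:
        container["query"] = folder  # only index files under this folder
    return {
        "name": name,
        "type": "adlsgen2",
        "credentials": {"connectionString": connection_string},
        "container": container,
    }


def build_csv_indexer(name, datasource_name, index_name):
    """Return an indexer that turns each CSV row into one search document."""
    return {
        "name": name,
        "dataSourceName": datasource_name,
        "targetIndexName": index_name,
        "parameters": {
            "configuration": {
                "parsingMode": "delimitedText",
                "firstLineContainsHeaders": True,
                "indexedFileNameExtensions": ".csv",
            }
        },
    }


adls_ds = build_adls_datasource(
    "lake-ds", "<storage-connection-string>", "raw", folder="sales/2024"
)
csv_indexer = build_csv_indexer("lake-csv-indexer", "lake-ds", "sales-index")
print(json.dumps(csv_indexer, indent=2))
```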

Key Takeaways

For unstructured data → Azure Blob Storage with AI enrichment.
For NoSQL JSON data → Azure Cosmos DB with incremental, timestamp-based indexing.
For relational data → Azure SQL Database with structured search.
For key-value storage → Azure Table Storage for scalability.
For hierarchical big data → Azure Data Lake Storage Gen2 for large-scale search.

Final Thoughts

Selecting the right data source for Azure AI Search depends on data type, query patterns, and indexing needs. Azure AI Search provides indexers that automate data ingestion, while AI skillsets enhance searchability through OCR, NLP, and metadata extraction.

Need frequent updates from transactional JSON data? → Azure Cosmos DB
Need structured SQL-based queries? → Azure SQL Database
Need text extraction from documents? → Azure Blob Storage
Need a lightweight key-value store? → Azure Table Storage
Need big data search with hierarchical structure? → Azure Data Lake Storage Gen2
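
The checklist above can be captured as a small lookup, which may be handy when documenting the choice for several workloads at once. The need labels are hypothetical identifiers invented for this sketch; the recommendations mirror the list.

```python
# Hypothetical need labels mapped to the data sources recommended above.
RECOMMENDATIONS = {
    "transactional_json": "Azure Cosmos DB",
    "relational_sql": "Azure SQL Database",
    "document_text_extraction": "Azure Blob Storage",
    "key_value": "Azure Table Storage",
    "hierarchical_big_data": "Azure Data Lake Storage Gen2",
}


def recommend(needs):
    """Return the distinct recommended data sources, in the order requested."""
    sources = []
    for need in needs:
        source = RECOMMENDATIONS[need]  # raises KeyError for unknown labels
        if source not in sources:
            sources.append(source)
    return sources


print(recommend(["document_text_extraction", "relational_sql"]))
```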

By choosing the right data source, you can maximize performance, reduce costs, and optimize search capabilities for your AI-powered applications.

What’s Next?

Explore Azure AI Search Indexers
Try out the Import & Vectorize Data Wizard in the Azure portal
Set up AI skillsets to enhance your search solution with OCR, NLP, and embeddings.

Updated Jan 29, 2025

Version 1.0
