This guide surveys the data sources you can integrate with Azure AI Search, with a focus on the generally available options. The available connectors fall into three categories:
- Generally Available Data Sources by Azure AI Search
- Preview Data Sources by Azure AI Search
- Data Sources from Our Partners
In This Article:
When building AI-powered search solutions using Azure AI Search, selecting the right data source is crucial for optimizing efficiency, scalability, and overall search performance. Azure AI Search provides indexers that can pull data from various storage sources, transforming and enriching it for a better search experience. This article explores the key data sources available and offers best practices to help you choose the right one based on your use case.
- Generally Available Data Sources by Azure AI Search: These production-ready indexers pull data from other Azure services.
- Preview Data Sources by Azure AI Search: If you're looking to explore the newest features, you can sign up for preview data sources and get early access to future capabilities.
- Data Sources from Our Partners: Additionally, third-party partners provide useful data connectors for integration into Azure AI Search. Partners like BA Insight and Accenture offer specialized solutions for enterprise needs.
Generally Available Data Sources by Azure AI Search
The sections below describe when to consider each generally available data source and give best practices for integrating it with Azure AI Search.
Choosing the Right Data Source for Your Use Case
| Data Source | Best For | Key Benefits | Change Detection | Supported Content |
| --- | --- | --- | --- | --- |
| Azure Blob Storage | Unstructured and semi-structured data | Supports metadata extraction, AI enrichment, and various formats | Automatic, based on the last-modified timestamp | PDFs, Office files, JSON, CSV, images, etc. |
| Azure Cosmos DB for NoSQL | High-volume JSON-based transactional data | Frequent scheduled indexing, built-in change tracking | _ts timestamp-based tracking | JSON documents, structured NoSQL data |
| Azure SQL Database | Structured relational data | Uses SQL queries, supports incremental indexing | SQL Change Tracking or High-Water Mark | Tables, views, JSON-like data |
| Azure Table Storage | Key-value store, semi-structured data | Simple schema, high scalability | Automatic, based on the entity Timestamp property | Tabular, JSON-like data |
| Azure Data Lake Storage Gen2 | Hierarchical, large-scale datasets | AI enrichment, hierarchical folder indexing | Automatic, based on the last-modified timestamp | Large CSVs, JSON, Office files, PDFs, ZIPs |
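Each row of the table corresponds to a `type` value in a data source definition. The sketch below maps the names used in this article to those values and builds the skeleton shared by all five definitions. The `type` strings and property names reflect the Azure AI Search REST API as I understand it and should be checked against the current reference; the helper name, data source name, and connection string are hypothetical placeholders.

```python
# Data source names from the table above, mapped to the `type` string used in a
# data source definition (values assumed from the REST API; verify before use).
DATA_SOURCE_TYPES = {
    "Azure Blob Storage": "azureblob",
    "Azure Cosmos DB for NoSQL": "cosmosdb",
    "Azure SQL Database": "azuresql",
    "Azure Table Storage": "azuretable",
    "Azure Data Lake Storage Gen2": "adlsgen2",
}


def base_data_source(name, source, connection_string, container_name):
    """Return the skeleton shared by every data source definition (a plain dict)."""
    return {
        "name": name,
        "type": DATA_SOURCE_TYPES[source],
        "credentials": {"connectionString": connection_string},
        # "container" names a blob container, Cosmos DB collection,
        # SQL table or view, or storage table, depending on the type.
        "container": {"name": container_name},
    }


# Hypothetical usage with placeholder values.
definition = base_data_source(
    "docs-ds", "Azure Blob Storage", "<connection-string>", "docs"
)
```

The later sections extend this skeleton with the change detection and deletion detection settings that differ per source.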
1. Azure Blob Storage – For Unstructured & Semi-structured Data
When to Use Azure Blob Storage
- Your data consists of documents, images, PDFs, Office files, HTML, XML, JSON, and CSVs.
- You need AI enrichment for extracting text from images, scanned PDFs, or multi-format content.
- You want metadata extraction for indexing and filtering content (e.g., file size, content type, last modified).
- You need incremental indexing to detect new or modified files automatically.
Configuration Tips
- Use AI skillsets to extract text from images, convert documents, and enhance searchability.
- Enable content parsing modes to process JSON, Markdown, or other text-based formats.
- Use inclusion/exclusion rules to avoid indexing non-searchable blobs like images and audio.
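The configuration tips above could translate into an indexer definition along these lines. This is a minimal sketch that only builds the JSON body; the configuration property names (`parsingMode`, `dataToExtract`, `excludedFileNameExtensions`, `imageAction`) are taken from the blob indexer REST settings as I understand them, while the function name, resource names, and the chosen file extensions are assumptions for illustration.

```python
import json


def build_blob_indexer(name, data_source_name, index_name, skillset_name=None):
    """Build an indexer definition (a plain dict) for a blob data source."""
    configuration = {
        # "default" lets the indexer crack documents such as PDF and Office files;
        # other modes (e.g. "json", "delimitedText") target text-based formats.
        "parsingMode": "default",
        # Index both the extracted text and the blob's metadata properties.
        "dataToExtract": "contentAndMetadata",
        # Assumed exclusion list: skip audio and video blobs.
        "excludedFileNameExtensions": ".mp3,.wav,.mp4",
    }
    indexer = {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": index_name,
        "parameters": {"configuration": configuration},
    }
    if skillset_name is not None:
        # When a skillset (for example OCR) is attached, ask the indexer
        # to extract embedded images so the skills have something to read.
        indexer["skillsetName"] = skillset_name
        configuration["imageAction"] = "generateNormalizedImages"
    return indexer


# Hypothetical usage: print the body you would send when creating the indexer.
print(json.dumps(build_blob_indexer("docs-indexer", "docs-ds", "docs-index"), indent=2))
```

Keeping image extraction off unless a skillset is attached avoids paying for image processing that nothing consumes.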
Change Detection
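- Changes are detected automatically from each blob's last-modified timestamp (exposed to the index as metadata_storage_last_modified).
- Deletions can be detected through a soft-delete policy that watches a custom blob metadata property (for example, IsDeleted). The AzureSearch_Skip and AzureSearch_SkipContent metadata properties serve a different purpose: they tell the indexer to skip a blob or its content.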
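A soft-delete policy on a blob data source might look like the sketch below. The policy type and its two properties follow the REST API as I understand it; the metadata property name `IsDeleted`, the marker value, and the function and resource names are assumptions chosen for illustration.

```python
def build_blob_data_source(name, connection_string, container,
                           soft_delete_property="IsDeleted"):
    """Build a blob data source definition with a soft-delete policy."""
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
        # No change detection policy is listed: the blob indexer picks up
        # new and modified blobs from their last-modified timestamp.
        "dataDeletionDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
            # For blobs, the "column" is a custom metadata property on the blob.
            "softDeleteColumnName": soft_delete_property,
            "softDeleteMarkerValue": "true",
        },
    }


# Hypothetical usage with placeholder values.
blob_ds = build_blob_data_source("docs-ds", "<connection-string>", "docs")
```

With this in place, setting the `IsDeleted` metadata property to `true` on a blob marks its search document for removal on the next indexer run, after which the blob itself can be deleted.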
2. Azure Cosmos DB for NoSQL – For High-Volume JSON-Based Applications
When to Use Azure Cosmos DB
- You have high-velocity, JSON-based structured data.
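- Your app requires low-latency search over frequently changing data (e.g., e-commerce, IoT logs, user-generated content). Indexers run on a schedule, so updates appear in near real time rather than instantly.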
- You need incremental indexing based on the _ts (timestamp) property.
- Your data has complex nested structures that require SQL-like queries for transformation.
Configuration Tips
- Use custom queries to flatten JSON structures for indexing.
- Enable soft delete tracking using a Boolean flag (IsDeleted field).
- Use Azure SDKs or REST APIs to automate index refresh.
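As one way to automate a refresh over REST, the sketch below builds (but does not send) a request that runs an indexer on demand. It assumes the Run Indexer endpoint shape `POST /indexers/{name}/run`, the `api-key` header, and API version `2024-07-01`; the service name, indexer name, and key are placeholders.

```python
import urllib.request

API_VERSION = "2024-07-01"  # assumed API version; use the one your service targets


def build_run_indexer_request(service_name, indexer_name, api_key):
    """Build the HTTP request that asks the service to run an indexer now."""
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexers/{indexer_name}/run?api-version={API_VERSION}"
    )
    # The request has an empty body; authentication here uses an admin API key.
    return urllib.request.Request(
        url, data=b"", method="POST", headers={"api-key": api_key}
    )


# Hypothetical usage: send with urllib.request.urlopen(request) when ready.
request = build_run_indexer_request("my-search-service", "orders-indexer", "<admin-key>")
```

For regular refreshes, a `schedule` on the indexer definition is usually simpler; an on-demand run like this suits cases where an upstream job knows new data has just landed.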
Change Detection
- Uses the _ts field for automatic change tracking.
- Supports soft delete tracking with a custom Boolean field.
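Putting the Cosmos DB points together, a data source definition could combine a flattening query, `_ts`-based change detection, and an `IsDeleted` soft-delete flag, roughly as sketched here. The policy types follow the REST API as I understand it; the document shape (`c.customer.name`, `c.customer.city`), the aliases, and all resource names are invented for the example.

```python
# Flatten nested customer fields into top-level columns. For incremental
# indexing with a custom query, the query filters and orders by _ts.
FLATTEN_QUERY = (
    "SELECT c.id, c.customer.name AS customerName, "
    "c.customer.city AS customerCity, c.IsDeleted, c._ts "
    "FROM c WHERE c._ts >= @HighWaterMark ORDER BY c._ts"
)


def build_cosmos_data_source(name, connection_string, collection, query=None):
    """Build a Cosmos DB for NoSQL data source definition (a plain dict)."""
    container = {"name": collection}
    if query is not None:
        container["query"] = query
    return {
        "name": name,
        "type": "cosmosdb",
        # The connection string also names the database (placeholder here).
        "credentials": {"connectionString": connection_string},
        "container": container,
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts",
        },
        "dataDeletionDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
            "softDeleteColumnName": "IsDeleted",
            "softDeleteMarkerValue": "true",
        },
    }


# Hypothetical usage with placeholder values.
cosmos_ds = build_cosmos_data_source(
    "orders-ds", "<connection-string>", "orders", query=FLATTEN_QUERY
)
```

Note that the query selects both `c._ts` and `c.IsDeleted`, since the two policies can only act on fields the query actually returns.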
3. Azure SQL Database – For Structured Relational Data
When to Use Azure SQL Database
- You manage relational data in tables or views.
- You need SQL-based queries to shape data for indexing.
- Your data changes frequently, and you require incremental indexing.
- You need structured full-text search with filtering, faceting, and ranking.
Configuration Tips
- Use SQL views if your data spans multiple tables.
- Enable SQL Change Tracking for incremental indexing.
- Optimize indexing performance by reducing unnecessary fields.
Change Detection
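- Uses SQL integrated change tracking (tables only) or a high-water mark column, such as a rowversion column, which also works with views.
- With a high-water mark, deletions are tracked through a soft-delete flag (for example, a Boolean IsDeleted column); SQL integrated change tracking picks up deleted rows on its own.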
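The choice between the two change detection options can be expressed in the data source definition, as in this sketch. It assumes the policy type names from the REST API as I understand it; the column names `RowVersion` and `IsDeleted`, the view name, and the function itself are hypothetical.

```python
def build_sql_data_source(name, connection_string, table_or_view,
                          high_water_mark_column=None, soft_delete_column=None):
    """Build an Azure SQL data source definition (a plain dict).

    Without a high-water mark column, SQL integrated change tracking is used.
    """
    if high_water_mark_column is None:
        change_policy = {
            "@odata.type": "#Microsoft.Azure.Search.SqlIntegratedChangeTrackingPolicy"
        }
    else:
        change_policy = {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": high_water_mark_column,
        }
    definition = {
        "name": name,
        "type": "azuresql",
        "credentials": {"connectionString": connection_string},
        "container": {"name": table_or_view},
        "dataChangeDetectionPolicy": change_policy,
    }
    if soft_delete_column is not None:
        definition["dataDeletionDetectionPolicy"] = {
            "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
            "softDeleteColumnName": soft_delete_column,
            "softDeleteMarkerValue": "true",
        }
    return definition


# Hypothetical usage: a view spanning several tables, tracked by a rowversion column.
view_ds = build_sql_data_source(
    "products-ds", "<connection-string>", "dbo.vProductSearch",
    high_water_mark_column="RowVersion", soft_delete_column="IsDeleted",
)
```

A view is paired with the high-water mark here because, to my knowledge, integrated change tracking applies to tables rather than views.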
4. Azure Table Storage – For Scalable, Key-Value Semi-Structured Data
When to Use Azure Table Storage
- You have large-scale key-value data (e.g., logs, IoT telemetry, audit records).
- You need a cost-effective way to store and index semi-structured data.
- Your schema is flexible but needs basic search capabilities.
Configuration Tips
- Define explicit field mappings to match Table Storage schema with the search index.
- Use PartitionKey filtering in queries to optimize performance.
- Add a soft-delete property (for example, IsDeleted) to your entities for deletion tracking.
Change Detection
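- Changes are detected automatically from each entity's Timestamp property; no change detection policy needs to be configured.
- Use PartitionKey-based queries to keep incremental indexing efficient.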
- Supports soft delete tracking via a custom IsDeleted Boolean field.
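A Table Storage data source with a PartitionKey filter and an `IsDeleted` soft-delete flag could be defined roughly as follows. This is a sketch under assumptions: the filter uses OData syntax in `container.query` as I understand the table indexer to accept it, and the table name, partition value, and function name are placeholders.

```python
def build_table_data_source(name, connection_string, table_name,
                            partition_key=None, soft_delete_property="IsDeleted"):
    """Build an Azure Table Storage data source definition (a plain dict)."""
    container = {"name": table_name}
    if partition_key is not None:
        # OData string literals escape a single quote by doubling it.
        escaped = partition_key.replace("'", "''")
        container["query"] = f"PartitionKey eq '{escaped}'"
    return {
        "name": name,
        "type": "azuretable",
        "credentials": {"connectionString": connection_string},
        "container": container,
        "dataDeletionDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
            "softDeleteColumnName": soft_delete_property,
            "softDeleteMarkerValue": "true",
        },
    }


# Hypothetical usage: index only the partition holding June 2024 audit records.
table_ds = build_table_data_source(
    "audit-ds", "<connection-string>", "AuditRecords", partition_key="2024-06"
)
```

Scoping each data source to one partition keeps the indexer's scans small; several data sources and indexers can then cover a large table in parallel.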
5. Azure Data Lake Storage Gen2 – For Hierarchical, Large-Scale Data
When to Use Azure Data Lake Storage
- You manage big data with a hierarchical folder structure.
- You need incremental indexing for text-based and semi-structured files (CSV, JSON, XML, PDFs).
- Your search requirements include AI enrichment (e.g., extracting text from scanned PDFs).
- Your data sources include large datasets that require indexing for analytics or retrieval.
Configuration Tips
- Use folder-based queries to include/exclude specific subdirectories.
- Enable metadata extraction to index file properties.
- Configure data parsing modes (e.g., CSV row-to-document parsing).
Change Detection
- Uses metadata_storage_last_modified for automatic change tracking.
- Supports soft delete detection using metadata properties.
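For Data Lake Storage Gen2, the folder scoping and CSV parsing tips might be combined as in the sketch below: one data source limited to a folder, and one scheduled indexer that turns each CSV row into a search document. Property names follow the REST API as I understand it; the filesystem, folder path, schedule interval, and function names are assumptions.

```python
def build_adls_data_source(name, connection_string, filesystem, folder=None):
    """Build an ADLS Gen2 data source, optionally scoped to one folder."""
    container = {"name": filesystem}
    if folder is not None:
        # For blob-style sources, "query" names the folder to index.
        container["query"] = folder
    return {
        "name": name,
        "type": "adlsgen2",
        "credentials": {"connectionString": connection_string},
        "container": container,
    }


def build_csv_indexer(name, data_source_name, index_name, interval="PT2H"):
    """Build an indexer that parses each CSV row into its own document."""
    return {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": index_name,
        # ISO 8601 duration; "PT2H" means every two hours.
        "schedule": {"interval": interval},
        "parameters": {
            "configuration": {
                "parsingMode": "delimitedText",
                "firstLineContainsHeaders": True,
            }
        },
    }


# Hypothetical usage with placeholder values.
adls_ds = build_adls_data_source(
    "sales-ds", "<connection-string>", "datalake", folder="sales/2024"
)
csv_indexer = build_csv_indexer("sales-indexer", "sales-ds", "sales-index")
```

With `firstLineContainsHeaders` set, the CSV header names become the source field names, so they should match the index fields or be bridged with field mappings.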
Key Takeaways
- For unstructured data → Azure Blob Storage with AI enrichment.
- For NoSQL JSON data → Azure Cosmos DB with near-real-time search.
- For relational data → Azure SQL Database with structured search.
- For key-value storage → Azure Table Storage for scalability.
- For hierarchical big data → Azure Data Lake Storage Gen2 for large-scale search.
Final Thoughts
Selecting the right data source for Azure AI Search depends on data type, query patterns, and indexing needs. Azure AI Search provides indexers that automate data ingestion, while AI skillsets enhance searchability through OCR, NLP, and metadata extraction.
- Need near-real-time updates? → Azure Cosmos DB
- Need structured SQL-based queries? → Azure SQL Database
- Need text extraction from documents? → Azure Blob Storage
- Need a lightweight key-value store? → Azure Table Storage
- Need big data search with hierarchical structure? → Azure Data Lake Storage Gen2
By choosing the right data source, you can maximize performance, reduce costs, and optimize search capabilities for your AI-powered applications.
What’s Next?
- Explore Azure AI Search Indexers
- Try out the Import & Vectorize Data Wizard in the Azure portal
- Set up AI skillsets to enhance your search solution with OCR, NLP, and embeddings