Educator Developer Blog

[pt1] Choosing the right Data Storage Source (Generally available) for Azure AI Search

kevin_comba
Feb 06, 2025

This guide provides a comprehensive look at data sources for integrating with Azure AI Search, specifically focusing on generally available options. We break down the available connectors and categorize them into three distinct sections:

  1. Generally Available Data Sources by Azure AI Search
  2. Preview Data Sources by Azure AI Search
  3. Data Sources from Our Partners

When building AI-powered search solutions using Azure AI Search, selecting the right data source is crucial for optimizing efficiency, scalability, and overall search performance. Azure AI Search provides indexers that can pull data from various storage sources, transforming and enriching it for a better search experience. This article explores the key data sources available and offers best practices to help you choose the right one based on your use case.

In This Article:

  • Generally Available Data Sources by Azure AI Search: Production-ready indexers that pull data from other Azure services.
  • Preview Data Sources by Azure AI Search: If you're looking to explore the newest features, you can sign up for preview data sources and get early access to future capabilities.
  • Data Sources from Our Partners: Additionally, third-party partners provide useful data connectors for integration into Azure AI Search. Partners like BA Insight and Accenture offer specialized solutions for enterprise needs.

Generally Available Data Sources by Azure AI Search

This guide explores when to consider each data source and provides best practices for integrating them into Azure AI Search.

Choosing the Right Data Source for Your Use Case

| Data Source | Best For | Key Benefits | Change Detection | Supported Content |
|---|---|---|---|---|
| Azure Blob Storage | Unstructured and semi-structured data | Metadata extraction, AI enrichment, many file formats | Auto-detects changes (last-modified timestamp) | PDFs, Office files, JSON, CSV, images, etc. |
| Azure Cosmos DB for NoSQL | High-volume, JSON-based transactional data | Frequent incremental indexing, built-in change tracking | _ts timestamp-based tracking | JSON documents, structured NoSQL data |
| Azure SQL Database | Structured relational data | Indexes tables or views, supports incremental indexing | SQL Change Tracking or High-Water Mark | Tables, views, JSON-like data |
| Azure Table Storage | Key-value, semi-structured data | Simple schema, high scalability | Automatic (entity Timestamp); soft delete via a custom property | Tabular, JSON-like data |
| Azure Data Lake Storage Gen2 | Hierarchical, large-scale datasets | AI enrichment, hierarchical folder indexing | Auto-detects changes (last-modified timestamp) | Large CSVs, JSON, Office files, PDFs, ZIPs |
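
When you create a data source through the Azure AI Search REST API or SDKs, a `type` string identifies each connector in the table above. The sketch below records those values in a lookup and builds a minimal data source payload. The `build_data_source` helper, the connection string, and the container name are placeholders for illustration only.

```python
import json

# REST "type" values for the generally available connectors covered here.
DATA_SOURCE_TYPES = {
    "Azure Blob Storage": "azureblob",
    "Azure Cosmos DB for NoSQL": "cosmosdb",
    "Azure SQL Database": "azuresql",
    "Azure Table Storage": "azuretable",
    "Azure Data Lake Storage Gen2": "adlsgen2",
}


def build_data_source(name, service, connection_string, container, query=None):
    """Return a minimal data source definition as a dict (hypothetical helper)."""
    container_def = {"name": container}
    if query is not None:
        # Optional: a folder (Blob/ADLS), SQL query (Cosmos DB), or OData filter (Table).
        container_def["query"] = query
    return {
        "name": name,
        "type": DATA_SOURCE_TYPES[service],
        "credentials": {"connectionString": connection_string},
        "container": container_def,
    }


ds = build_data_source(
    "docs-ds", "Azure Blob Storage", "<storage-connection-string>", "documents"
)
print(json.dumps(ds, indent=2))
```

The resulting JSON is the body you would send in a create-or-update request such as `PUT https://<service>.search.windows.net/datasources/docs-ds?api-version=2024-07-01`.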

 

1. Azure Blob Storage – For Unstructured & Semi-structured Data

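When to Use Azure Blob Storage

  • You store unstructured or semi-structured content such as PDFs, Office files, JSON, CSV, or images.
  • You need metadata extraction or AI enrichment to make that content searchable.

Configuration Tips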

  • Use AI skillsets to extract text from images, convert documents, and enhance searchability.
  • Enable content parsing modes to process JSON, Markdown, or other text-based formats.
  • Use the indexedFileNameExtensions and excludedFileNameExtensions settings to skip blobs you don't need to search, such as audio files, or images when you don't run OCR.
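
The tips above map to indexer configuration parameters. Below is a minimal sketch of a blob indexer definition. The names (`docs-indexer`, `docs-ds`, `docs-index`, and the hypothetical `docs-skillset`) are placeholders. `parsingMode`, `dataToExtract`, `imageAction`, and `excludedFileNameExtensions` are standard blob indexer settings, but choose the values to fit your own data.

```python
import json

# Minimal blob indexer definition; every name here is a placeholder.
blob_indexer = {
    "name": "docs-indexer",
    "dataSourceName": "docs-ds",
    "targetIndexName": "docs-index",
    "skillsetName": "docs-skillset",  # hypothetical skillset (e.g., OCR, key phrases)
    "schedule": {"interval": "PT2H"},  # run every two hours
    "parameters": {
        "configuration": {
            # "default" extracts text from PDFs and Office files; switch to "json",
            # "jsonArray", "jsonLines", or "delimitedText" for structured blobs.
            "parsingMode": "default",
            "dataToExtract": "contentAndMetadata",
            # Extract embedded images so the skillset can run OCR on them.
            "imageAction": "generateNormalizedImages",
            # Skip audio and video; indexedFileNameExtensions is the allow-list equivalent.
            "excludedFileNameExtensions": ".mp3,.wav,.mp4",
        }
    },
}

print(json.dumps(blob_indexer, indent=2))
```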

Change Detection

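  • Auto-detection based on each blob's last-modified timestamp (metadata_storage_last_modified).
  • Supports deletion detection through native blob soft delete or a custom soft-delete metadata property. Separately, the AzureSearch_Skip and AzureSearch_SkipContent metadata properties let you skip a blob entirely or index only its metadata.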
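
Deletion detection is configured on the data source. This sketch uses placeholder names and shows two options. The first is native blob soft delete, which assumes soft delete is enabled on the storage account. The second is a custom metadata marker; the `IsDeleted` key here is hypothetical.

```python
import json

# Option 1: use Azure Storage native blob soft delete (enable it on the account first).
NATIVE_SOFT_DELETE = {
    "@odata.type": "#Microsoft.Azure.Search.NativeBlobSoftDeleteDeletionDetectionPolicy"
}

# Option 2: set a custom metadata key on blobs you want removed from the index.
CUSTOM_SOFT_DELETE = {
    "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
    "softDeleteColumnName": "IsDeleted",  # hypothetical metadata key
    "softDeleteMarkerValue": "true",
}

blob_data_source = {
    "name": "docs-ds",
    "type": "azureblob",
    "credentials": {"connectionString": "<storage-connection-string>"},
    "container": {"name": "documents"},  # placeholder container name
    # No dataChangeDetectionPolicy: blob indexers track last-modified times on their own.
    "dataDeletionDetectionPolicy": NATIVE_SOFT_DELETE,
}

# Separately, blob metadata AzureSearch_Skip="true" makes the indexer skip a blob,
# and AzureSearch_SkipContent="true" indexes only its metadata.
print(json.dumps(blob_data_source, indent=2))
```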

2. Azure Cosmos DB for NoSQL – For High-Volume JSON-Based Applications

When to Use Azure Cosmos DB

  • You have high-velocity, JSON-based structured data.
  • Your app requires low-latency search over frequently changing data (e.g., e-commerce, IoT logs, user-generated content).
  • You need incremental indexing based on the _ts (timestamp) property.
  • Your data has complex nested structures that require SQL-like queries for transformation.
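
To make `_ts`-based incremental indexing concrete, here is a small local simulation of one indexer run. It illustrates the idea only and does not reflect how the indexer is implemented. Documents at or after the stored high-water mark are reprocessed, soft-deleted ones are routed to deletion, and the mark advances to the newest `_ts` seen. The `Doc` type and `incremental_run` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    id: str
    ts: int  # stands in for Cosmos DB's _ts (seconds since the Unix epoch)
    is_deleted: bool = False


def incremental_run(docs, high_water_mark):
    """Simulate one run: return (ids_to_upsert, ids_to_delete, new_high_water_mark)."""
    # Mirrors "WHERE c._ts >= @HighWaterMark ORDER BY c._ts".
    changed = sorted((d for d in docs if d.ts >= high_water_mark), key=lambda d: d.ts)
    to_upsert = [d.id for d in changed if not d.is_deleted]
    to_delete = [d.id for d in changed if d.is_deleted]
    # Advance the mark to the newest timestamp seen; keep it if nothing changed.
    new_mark = changed[-1].ts if changed else high_water_mark
    return to_upsert, to_delete, new_mark


docs = [Doc("a", 100), Doc("b", 205), Doc("c", 210, is_deleted=True)]
print(incremental_run(docs, 200))  # (['b'], ['c'], 210)
```

The filter uses `>=` rather than `>`. Documents sharing the boundary timestamp are re-sent, but upserts are idempotent, so nothing is missed or duplicated in the index.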

Configuration Tips

  • Use custom queries to flatten JSON structures for indexing.
  • Enable soft delete tracking using a Boolean flag (IsDeleted field).
  • Use the Azure SDKs or REST APIs to run the indexer on demand or manage its schedule.
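
Here is a sketch of a Cosmos DB data source that applies these tips. It assumes a hypothetical `products` container whose documents have a nested `address` object and an `IsDeleted` flag. The query flattens the nested fields. It also includes the `@HighWaterMark` parameter with `ORDER BY c._ts`, which is the pattern used for incremental indexing with custom queries. The endpoint, key, and database values are placeholders.

```python
import json

# Flatten a nested "address" object; WHERE/ORDER BY on _ts enables incremental runs.
FLATTEN_QUERY = (
    "SELECT c.id, c.name, c.address.city AS city, c.address.country AS country, "
    "c.IsDeleted, c._ts FROM c "
    "WHERE c._ts >= @HighWaterMark ORDER BY c._ts"
)

cosmos_data_source = {
    "name": "products-ds",
    "type": "cosmosdb",
    "credentials": {
        # Placeholders: account endpoint, key, and database name.
        "connectionString": (
            "AccountEndpoint=https://<account>.documents.azure.com;"
            "AccountKey=<key>;Database=<database>"
        )
    },
    "container": {"name": "products", "query": FLATTEN_QUERY},
    "dataChangeDetectionPolicy": {
        "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
        "highWaterMarkColumnName": "_ts",
    },
    "dataDeletionDetectionPolicy": {
        "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
        "softDeleteColumnName": "IsDeleted",
        "softDeleteMarkerValue": "true",
    },
}

print(json.dumps(cosmos_data_source, indent=2))
```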

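Change Detection

  • Uses the _ts timestamp property as a high-water mark for incremental indexing.
  • Supports soft delete tracking via a Boolean flag (IsDeleted field).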

3. Azure SQL Database – For Structured Relational Data

When to Use Azure SQL Database

  • You manage relational data in tables or views.
  • You need SQL-based queries to shape data for indexing.
  • Your data changes frequently, and you require incremental indexing.
  • You need structured full-text search with filtering, faceting, and ranking.

Configuration Tips

  • Use SQL views if your data spans multiple tables.
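  • Enable SQL Change Tracking for incremental indexing. It works on tables but not views; for views, use a High-Water Mark column such as a rowversion column.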
  • Optimize indexing performance by reducing unnecessary fields.

Change Detection

  • Uses SQL Change Tracking or High-Water Mark (timestamp-based detection).
  • With High-Water Mark, supports soft delete tracking via a Boolean flag (IsDeleted field). SQL Change Tracking detects deleted rows on its own.
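
The choice between the two change detection policies can be sketched as a small helper. `sql_data_source` is hypothetical, while the policy `@odata.type` names are the ones the REST API uses. SQL integrated change tracking works only on tables and detects deleted rows itself. For that reason, the sketch adds the `IsDeleted` soft-delete policy only in the high-water mark case, which is also the one views require. The connection string is a placeholder.

```python
import json

SOFT_DELETE_POLICY = {
    "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
    "softDeleteColumnName": "IsDeleted",
    "softDeleteMarkerValue": "true",
}


def sql_data_source(name, object_name, is_view, high_water_mark_column=None):
    """Build an Azure SQL data source definition (hypothetical helper).

    For tables, this sketch uses SQL integrated change tracking, which must be
    enabled on the database and table and also detects deleted rows. Views
    don't support it, so they get a high-water mark plus a soft-delete policy.
    """
    data_source = {
        "name": name,
        "type": "azuresql",
        "credentials": {"connectionString": "<azure-sql-connection-string>"},
        "container": {"name": object_name},  # a table or view name
    }
    if is_view:
        if high_water_mark_column is None:
            raise ValueError("a view needs a high-water mark column")
        data_source["dataChangeDetectionPolicy"] = {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": high_water_mark_column,
        }
        data_source["dataDeletionDetectionPolicy"] = SOFT_DELETE_POLICY
    else:
        data_source["dataChangeDetectionPolicy"] = {
            "@odata.type": "#Microsoft.Azure.Search.SqlIntegratedChangeTrackingPolicy"
        }
    return data_source


view_ds = sql_data_source("products-view-ds", "dbo.vProducts", True, "RowVer")
print(json.dumps(view_ds, indent=2))
```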

4. Azure Table Storage – For Scalable, Key-Value Semi-Structured Data

When to Use Azure Table Storage

  • You have large-scale key-value data (e.g., logs, IoT telemetry, audit records).
  • You need a cost-effective way to store and index semi-structured data.
  • Your schema is flexible and your search needs are basic.

Configuration Tips

  • Define explicit field mappings to match the Table Storage schema to the search index.
  • Use PartitionKey filtering in queries to optimize performance.
  • Add a custom property (for example, IsDeleted) to flag deleted entities for deletion tracking.

Change Detection

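  • No change detection policy is needed: the indexer picks up new and updated entities automatically based on each entity's Timestamp property.
  • Filtering on Timestamp requires a full table scan, so for large tables, use PartitionKey-based queries to speed up incremental indexing.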
  • Supports soft delete tracking via a custom IsDeleted Boolean field.
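
Here is a minimal sketch of a Table Storage data source. It assumes a hypothetical `telemetry` table whose PartitionKey holds a month string such as `2025-01`. The `container.query` field takes an OData filter, so a PartitionKey range keeps each indexer run to a bounded slice of the table. A custom `IsDeleted` property drives soft delete. The `partition_range_query` helper and the connection string are placeholders.

```python
def partition_range_query(start, end):
    """OData filter selecting PartitionKey values in the half-open range [start, end)."""
    return f"PartitionKey ge '{start}' and PartitionKey lt '{end}'"


table_data_source = {
    "name": "telemetry-ds",
    "type": "azuretable",
    "credentials": {"connectionString": "<storage-connection-string>"},
    "container": {
        "name": "telemetry",  # hypothetical table
        # Hypothetical partitioning scheme: one partition per month, e.g. "2025-01".
        "query": partition_range_query("2025-01", "2025-02"),
    },
    "dataDeletionDetectionPolicy": {
        "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
        "softDeleteColumnName": "IsDeleted",  # custom Boolean property on each entity
        "softDeleteMarkerValue": "true",
    },
}

print(table_data_source["container"]["query"])
# PartitionKey ge '2025-01' and PartitionKey lt '2025-02'
```

A half-open range (`ge` … `lt`) lets consecutive windows tile the table without overlap, which is convenient if you later split the table across several indexers.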

5. Azure Data Lake Storage Gen2 – For Hierarchical, Large-Scale Data

When to Use Azure Data Lake Storage

  • You manage big data with a hierarchical folder structure.
  • You need incremental indexing for text-based and semi-structured files (CSV, JSON, XML, PDFs).
  • Your search requirements include AI enrichment (e.g., extracting text from scanned PDFs).
  • Your data sources include large datasets that require indexing for analytics or retrieval.

Configuration Tips

  • Set the data source's container query to a directory path to index only that subdirectory.
  • Enable metadata extraction to index file properties.
  • Configure parsing modes (e.g., delimitedText to turn each CSV row into a separate search document).
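
Folder scoping and CSV parsing can be sketched as follows. The example assumes a hypothetical `datalake` file system with a `sales/2025` directory and a placeholder `sales-index`. The data source `query` restricts indexing to that directory, and the `delimitedText` parsing mode turns each CSV row into its own search document.

```python
import json

adls_data_source = {
    "name": "lake-ds",
    "type": "adlsgen2",
    "credentials": {"connectionString": "<storage-connection-string>"},
    # "name" is the file system; "query" limits indexing to one directory.
    "container": {"name": "datalake", "query": "sales/2025"},
}

csv_indexer = {
    "name": "lake-csv-indexer",
    "dataSourceName": adls_data_source["name"],
    "targetIndexName": "sales-index",
    "parameters": {
        "configuration": {
            "parsingMode": "delimitedText",    # one search document per CSV row
            "firstLineContainsHeaders": True,  # header row supplies field names
            "delimitedTextDelimiter": ",",
            "indexedFileNameExtensions": ".csv",
        }
    },
}

print(json.dumps(csv_indexer, indent=2))
```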

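Change Detection

  • Auto-detection based on each file's last-modified timestamp (metadata_storage_last_modified).
  • Supports soft delete detection via a custom metadata property.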

Key Takeaways

  1. For unstructured data → Azure Blob Storage with AI enrichment.
  2. For NoSQL JSON data → Azure Cosmos DB with incremental, timestamp-based indexing.
  3. For relational data → Azure SQL Database with structured search.
  4. For key-value storage → Azure Table Storage for scalability.
  5. For hierarchical big data → Azure Data Lake Storage Gen2 for large-scale search.

Final Thoughts

Selecting the right data source for Azure AI Search depends on data type, query patterns, and indexing needs. Azure AI Search provides indexers that automate data ingestion, while AI skillsets enhance searchability through OCR, NLP, and metadata extraction.

 

  • Need frequent incremental updates from JSON data? → Azure Cosmos DB
  • Need structured SQL-based queries? → Azure SQL Database
  • Need text extraction from documents? → Azure Blob Storage
  • Need a lightweight key-value store? → Azure Table Storage
  • Need big data search with hierarchical structure? → Azure Data Lake Storage Gen2

 

By choosing the right data source, you can maximize performance, reduce costs, and optimize search capabilities for your AI-powered applications.

What’s Next?

Updated Jan 29, 2025
Version 1.0