We’re excited to announce that Azure Logic Apps connectors are now supported within AI Search as data sources for ingestion into Azure AI Search vector stores. This unlocks the ability to ingest unstructured documents from a variety of systems—including SharePoint, Amazon S3, Dropbox, and many more—into your vector index using a low-code experience.
This new capability is powered by Logic Apps templates, which orchestrate the entire ingestion pipeline—from extracting documents to embedding generation and indexing—so you can build Retrieval-Augmented Generation (RAG) applications with ease.
Grounding AI with RAG: Why Document Ingestion Matters
Retrieval-Augmented Generation (RAG) has become a cornerstone technique for building grounded and trustworthy AI systems. Instead of generating answers from the model’s pretraining alone, RAG applications fetch relevant information from external knowledge bases—giving LLMs access to accurate and up-to-date enterprise data.
To power RAG, enterprises need a scalable way to ingest and index documents into a vector store. Whether you're working with policy documents, legal contracts, support tickets, or financial reports, getting this content into a searchable, semantic format is step one.
Simplified Ingestion with Integrated Vectorization
Azure AI Search’s Integrated Vectorization capability automates the process of turning raw content into semantically indexed vectors:
- Chunking: Documents are split into meaningful text segments
- Embedding: Each chunk is transformed into a vector using an embedding model like text-embedding-3-small or a custom model
- Indexing: Vectors and associated metadata are written into a searchable vector store
- Projection: Metadata is preserved to enable filtering, ranking, and hybrid queries
This eliminates the need to build or maintain custom pipelines, making it significantly easier to adopt RAG in production environments.
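The four stages above can be sketched as a minimal Python pipeline. This is a toy illustration only: the chunker is word-based, `embed` is a placeholder for a real embedding model such as text-embedding-3-small, and the field names are illustrative, not the actual schema Integrated Vectorization generates.

```python
def chunk(text: str, size: int = 40) -> list[str]:
    # Chunking: split the document into fixed-size word segments.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(chunk: str) -> list[float]:
    # Embedding: placeholder for a model such as text-embedding-3-small.
    return [float(len(chunk)), float(len(chunk.split()))]

def to_index_docs(doc_id: str, text: str, metadata: dict) -> list[dict]:
    # Indexing + projection: one record per chunk, metadata carried on
    # every record so it stays available for filtering and ranking.
    return [
        {"id": f"{doc_id}-{i}", "content": c, "vector": embed(c), **metadata}
        for i, c in enumerate(chunk(text))
    ]

docs = to_index_docs("policy-1", " ".join(["term"] * 100), {"title": "HR Policy"})
print(len(docs), docs[0]["title"])  # 3 HR Policy
```

In the real pipeline these stages run inside AI Search itself; the sketch only shows how chunk records, vectors, and projected metadata relate to one another.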
Ingest from Anywhere: Logic Apps + AI Search
With today’s release, we’re extending ingestion to a variety of new data sources by integrating Logic Apps connectors directly with AI Search. This allows you to retrieve unstructured content from enterprise systems and seamlessly ingest it into the vector store.
Here’s how the ingestion process works with Logic Apps:
- Connect to Source Systems
Using prebuilt connectors, Logic Apps can fetch content from various data sources, including SharePoint document libraries, messages from Service Bus or Azure Queues, files from OneDrive or an SFTP server, and more. You can trigger ingestion on demand or on a schedule.
- Parse and Chunk Documents
Next, Logic Apps uses built-in AI-powered document parsing actions to extract raw text. This is followed by the “Chunk Document” action, which:
- Tokenizes the document based on language model-friendly units
- Splits the content into semantically coherent chunks
This ensures optimal chunk size for downstream embedding and retrieval.
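To make the chunking step concrete, here is a simplified sketch. The actual “Chunk Document” action tokenizes in language-model-friendly units; this approximation splits on words and adds a small overlap between chunks, a common technique so that retrieval does not lose context at chunk boundaries.

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word-based chunks (toy approximation)."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap  # advance by chunk_size minus the overlap
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(words):
            break
    return chunks

doc = " ".join(f"word{i}" for i in range(120))
chunks = chunk_text(doc, chunk_size=50, overlap=10)
print(len(chunks))  # 3
```

Each chunk here repeats the last 10 words of the previous one, so a sentence that straddles a boundary is still retrievable from at least one chunk.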
Note – The workflows created for document ingestion currently default to a chunk size of 5000. We’ll reduce this default in an upcoming release; in the meantime, you can edit the workflow if you need smaller chunks.
- Generate Embeddings with Azure OpenAI
The chunked text is then passed to the Azure OpenAI connector, where the text-embedding-3-small or another configured embedding model is used to generate high-dimensional vector representations. These vectors capture the semantic meaning of the content and are key to enabling accurate retrieval in RAG applications.
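The following toy example shows why vectors enable retrieval: the query and each document are mapped to vectors, and the nearest vector by cosine similarity wins. The `toy_embed` function is a stand-in, a simple bag-of-words count vector, not the Azure OpenAI embedding model the connector actually calls.

```python
import math
from collections import Counter

def toy_embed(text: str) -> Counter:
    # Stand-in for text-embedding-3-small: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = {
    "hr-policy": "employees accrue vacation days each month",
    "finance": "quarterly revenue grew across all regions",
}
query = toy_embed("how many vacation days do employees get")
best = max(corpus, key=lambda k: cosine(query, toy_embed(corpus[k])))
print(best)  # hr-policy
```

Real embedding vectors capture meaning rather than word overlap, so they also match paraphrases, which is what makes them effective for RAG retrieval.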
- Write to Azure AI Search
Finally, the embeddings, along with any relevant metadata (e.g., document title, tags, timestamps), are written into the AI Search index. The index schema is created for you and can include fields for filtering, sorting, and semantic ranking.
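As a rough sketch, one indexed record might look like the following. The field names and values here are purely illustrative, not the exact schema the template generates, and the vector is truncated for readability (a real embedding has hundreds or thousands of dimensions).

```python
# Illustrative shape of a single chunk record written to the index.
chunk_record = {
    "id": "contract-001-chunk-3",          # unique key per chunk
    "content": "The supplier shall deliver within thirty days...",
    "contentVector": [0.013, -0.082, 0.144],  # truncated for illustration
    "title": "Master Services Agreement",   # metadata for display/ranking
    "tags": ["legal", "contracts"],         # metadata for filtering
    "lastModified": "2025-05-20T00:00:00Z", # metadata for sorting
}
print(sorted(chunk_record))
```

Because metadata travels with every chunk, queries can combine vector similarity with filters such as `tags` or `lastModified` for hybrid retrieval.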
Logic Apps Templates: Fast Start, Flexible Design
To help you get started, we’ve created Logic Apps templates specifically for RAG ingestion. These templates:
- Include all the steps mentioned above
- Are customizable if you want to update the default configuration
Whether you’re ingesting thousands of PDFs from SharePoint or syncing files from an Amazon S3 bucket, these templates provide a production-grade foundation for building your pipeline.
Getting Started
Here is detailed, step-by-step documentation to help you get started with Integrated Vectorization using Logic Apps data sources:
👉 Get started with Logic Apps data sources for AI Search ingestion
👉 Learn more about Integrated Vectorization in Azure AI Search
We'd Love Your Feedback
We're just getting started. Tell us:
- What other data sources would you like to ingest?
- What enhancements would make ingestion easier for your use case?
- Are there specific industry templates or formats we should support?
👉 Reply to this post or share your ideas through our feedback form
We’re building this with you—so your feedback helps shape the future of AI-powered automation and RAG.
Updated May 20, 2025
Version 1.0
DivSwa
Microsoft
Azure Integration Services Blog