We’re excited to announce that Azure Logic Apps connectors are now supported within AI Search as data sources for ingestion into Azure AI Search vector stores. This unlocks the ability to ingest unstructured documents from a variety of systems—including SharePoint, Amazon S3, Dropbox, and many more—into your vector index using a low-code experience.
This new capability is powered by Logic Apps templates, which orchestrate the entire ingestion pipeline—from extracting documents to embedding generation and indexing—so you can build Retrieval-Augmented Generation (RAG) applications with ease.
Grounding AI with RAG: Why Document Ingestion Matters
Retrieval-Augmented Generation (RAG) has become a cornerstone technique for building grounded and trustworthy AI systems. Instead of generating answers from the model’s pretraining alone, RAG applications fetch relevant information from external knowledge bases—giving LLMs access to accurate and up-to-date enterprise data.
To power RAG, enterprises need a scalable way to ingest and index documents into a vector store. Whether you're working with policy documents, legal contracts, support tickets, or financial reports, getting this content into a searchable, semantic format is step one.
Simplified Ingestion with Integrated Vectorization
Azure AI Search’s Integrated Vectorization capability automates the process of turning raw content into semantically indexed vectors:
- Chunking: Documents are split into meaningful text segments
- Embedding: Each chunk is transformed into a vector using an embedding model like text-embedding-3-small or a custom model
- Indexing: Vectors and associated metadata are written into a searchable vector store
- Projection: Metadata is preserved to enable filtering, ranking, and hybrid queries
This eliminates the need to build or maintain custom pipelines, making it significantly easier to adopt RAG in production environments.
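The four stages above can be sketched as a minimal Python pipeline. This is a toy illustration only: the chunker is word-based, `embed` is a placeholder for a real embedding model such as text-embedding-3-small, and the field names are illustrative, not the actual schema Integrated Vectorization generates.

```python
def chunk(text: str, size: int = 40) -> list[str]:
    # Chunking: split the document into fixed-size word segments.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(chunk: str) -> list[float]:
    # Embedding: placeholder for a model such as text-embedding-3-small.
    return [float(len(chunk)), float(len(chunk.split()))]

def to_index_docs(doc_id: str, text: str, metadata: dict) -> list[dict]:
    # Indexing + projection: one record per chunk, metadata carried on
    # every record so it stays available for filtering and ranking.
    return [
        {"id": f"{doc_id}-{i}", "content": c, "vector": embed(c), **metadata}
        for i, c in enumerate(chunk(text))
    ]

docs = to_index_docs("policy-1", " ".join(["term"] * 100), {"title": "HR Policy"})
print(len(docs), docs[0]["title"])  # 3 HR Policy
```

In the real pipeline these stages run inside AI Search itself; the sketch only shows how chunk records, vectors, and projected metadata relate to one another.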
Ingest from Anywhere: Logic Apps + AI Search
With today’s release, we’re extending ingestion to a variety of new data sources by integrating Logic Apps connectors directly with AI Search. This allows you to retrieve unstructured content from enterprise systems and seamlessly ingest it into the vector store.
Here’s how the ingestion process works with Logic Apps:
- Connect to Source Systems
Using prebuilt connectors, Logic Apps can fetch content from various data sources, including SharePoint document libraries, messages from Service Bus or Azure Queues, files from OneDrive or an SFTP server, and more. You can trigger ingestion on demand or on a schedule.
- Parse and Chunk Documents
Next, Logic Apps uses built-in AI-powered document parsing actions to extract raw text. This is followed by the “Chunk Document” action, which:
- Tokenizes the document based on language model-friendly units
- Splits the content into semantically coherent chunks
This ensures optimal chunk size for downstream embedding and retrieval.
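To make the chunking step concrete, here is a simplified sketch. The actual “Chunk Document” action tokenizes in language-model-friendly units; this approximation splits on words and adds a small overlap between chunks, a common technique so that retrieval does not lose context at chunk boundaries.

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word-based chunks (toy approximation)."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap  # advance by chunk_size minus the overlap
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(words):
            break
    return chunks

doc = " ".join(f"word{i}" for i in range(120))
chunks = chunk_text(doc, chunk_size=50, overlap=10)
print(len(chunks))  # 3
```

Each chunk here repeats the last 10 words of the previous one, so a sentence that straddles a boundary is still retrievable from at least one chunk.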
Note – The workflows created for document ingestion currently default to a chunk size of 5000. We’ll reduce this default in an upcoming release; in the meantime, you can edit the workflow if you need smaller chunks.
- Generate Embeddings with Azure OpenAI
The chunked text is then passed to the Azure OpenAI connector, where the text-embedding-3-small or another configured embedding model is used to generate high-dimensional vector representations. These vectors capture the semantic meaning of the content and are key to enabling accurate retrieval in RAG applications.
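The following toy example shows why vectors enable retrieval: the query and each document are mapped to vectors, and the nearest vector by cosine similarity wins. The `toy_embed` function is a stand-in, a simple bag-of-words count vector, not the Azure OpenAI embedding model the connector actually calls.

```python
import math
from collections import Counter

def toy_embed(text: str) -> Counter:
    # Stand-in for text-embedding-3-small: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = {
    "hr-policy": "employees accrue vacation days each month",
    "finance": "quarterly revenue grew across all regions",
}
query = toy_embed("how many vacation days do employees get")
best = max(corpus, key=lambda k: cosine(query, toy_embed(corpus[k])))
print(best)  # hr-policy
```

Real embedding vectors capture meaning rather than word overlap, so they also match paraphrases, which is what makes them effective for RAG retrieval.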
- Write to Azure AI Search
Finally, the embeddings, along with any relevant metadata (e.g., document title, tags, timestamps), are written into the AI Search index. The index schema is created for you and can include fields for filtering, sorting, and semantic ranking.
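As a rough sketch, one indexed record might look like the following. The field names and values here are purely illustrative, not the exact schema the template generates, and the vector is truncated for readability (a real embedding has hundreds or thousands of dimensions).

```python
# Illustrative shape of a single chunk record written to the index.
chunk_record = {
    "id": "contract-001-chunk-3",          # unique key per chunk
    "content": "The supplier shall deliver within thirty days...",
    "contentVector": [0.013, -0.082, 0.144],  # truncated for illustration
    "title": "Master Services Agreement",   # metadata for display/ranking
    "tags": ["legal", "contracts"],         # metadata for filtering
    "lastModified": "2025-05-20T00:00:00Z", # metadata for sorting
}
print(sorted(chunk_record))
```

Because metadata travels with every chunk, queries can combine vector similarity with filters such as `tags` or `lastModified` for hybrid retrieval.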
Logic Apps Templates: Fast Start, Flexible Design
To help you get started, we’ve created Logic Apps templates specifically for RAG ingestion. These templates:
- Include all the steps mentioned above
- Are customizable if you want to update the default configuration
Whether you’re ingesting thousands of PDFs from SharePoint or syncing files from an Amazon S3 bucket, these templates provide a production-grade foundation for building your pipeline.
Getting Started
Here is detailed, step-by-step documentation to help you get started with Integrated Vectorization using Logic Apps data sources:
👉 Get started with Logic Apps data sources for AI Search ingestion
👉 Learn more about Integrated Vectorization in Azure AI Search
We'd Love Your Feedback
We're just getting started. Tell us:
- What other data sources would you like to ingest?
- What enhancements would make ingestion easier for your use case?
- Are there specific industry templates or formats we should support?
👉 Reply to this post or share your ideas through our feedback form
We’re building this with you—so your feedback helps shape the future of AI-powered automation and RAG.
Updated May 20, 2025
Version 1.0
DivSwa
Microsoft
Azure Integration Services Blog