Azure Integration Services Blog

Announcing Parse & Chunk with Metadata in Logic Apps: Build Context-Aware RAG Agents

shahparth
Microsoft
Oct 01, 2025

The new Parse document with metadata and Chunk text with metadata actions in Logic Apps bring powerful improvements to how you handle documents. Unlike the previously released parse and chunk actions, these new versions provide rich metadata alongside the text:

  • pageNumber — the page a chunk came from
  • totalPages — the total number of pages in the document
  • sentencesAreComplete — whether chunks end on full sentences, so you can avoid broken fragments

This means you don’t just get raw text—you also get the context you need for citations, navigation, and downstream processing. You can also adjust your chunking strategy based on these metadata fields.
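To make that concrete, here is a minimal Python sketch (not part of the Logic App itself) of how downstream code might use these fields; the chunk shape below is illustrative rather than the exact output schema of the actions:

```python
# A minimal sketch of using the chunk metadata downstream (Python).
# The chunk shape here is illustrative, not the exact action output schema.
from typing import TypedDict

class Chunk(TypedDict):
    content: str
    pageNumber: int
    totalPages: int
    sentencesAreComplete: bool

def build_citation(chunk: Chunk, document_name: str) -> str:
    # Page-level citation string for grounded answers.
    return f"{document_name}, page {chunk['pageNumber']} of {chunk['totalPages']}"

def merge_incomplete(chunks: list[Chunk]) -> list[Chunk]:
    # Example chunking-strategy adjustment: if a chunk did not end on a
    # complete sentence, glue the next chunk onto it so no fragment is indexed.
    # The merged chunk keeps the page number of the chunk it started on.
    merged: list[Chunk] = []
    for chunk in chunks:
        if merged and not merged[-1]["sentencesAreComplete"]:
            merged[-1]["content"] += " " + chunk["content"]
            merged[-1]["sentencesAreComplete"] = chunk["sentencesAreComplete"]
        else:
            merged.append(dict(chunk))  # copy so the input list is untouched
    return merged
```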

Once parsed and chunked with metadata, you can embed and index documents in Azure AI Search, and then use an Agent Loop in Logic Apps that calls Vector Search as a Tool to answer questions with precise, page-level references.

In this blog, we’ll walk through a scenario where we index two enterprise contracts (a Master Service Agreement and a Procurement Agreement) and then use an Agent Loop to answer natural language questions with citations.

Pre-requisites

  • Azure Blob Storage for your documents
  • Azure AI Search with an index
  • Azure OpenAI deployment (embeddings + chat model)
  • Logic App (Standard) with the new AI actions 

Here is a sample demo on GitHub you can provision to follow along.

Step 1: Ingestion flow 

Goal: Convert raw PDFs into sentence-aware chunks with metadata, then index them.

📸 Workflow overview

 

 

  • When a blob is added or modified (container with your contracts).

    📸 Blob trigger

     

  • Read blob content
    📸 Read blob content action

     


  • Parse document with metadata
    • Input: File content from previous step
      📸 Parse document with metadata action

       

  • Chunk text with metadata
    • Input: entire parsed text items array
      📸 Chunk text with metadata action

       

  • Get multiple embeddings
    • Input: Embedding model and the text chunks for which embeddings will be generated (a code sketch of this and the next two steps follows the list)
      📸 Get multiple embeddings action

       

  • Select index objects
    • Input: Raw text content, embeddings, documentName and uniqueID to be passed into the index 
      📸 Select array action

       

  • Index multiple documents
    • Input: Array object output from previous Select step
      📸 Index documents action
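
If it helps to picture what these last three actions do, here is a rough Python equivalent using the Azure OpenAI and Azure AI Search SDKs. This is a sketch, not the Logic Apps implementation: the endpoint settings, embeddings deployment name, index name, and field names (uniqueID, documentName, content, embedding, pageNumber) are assumptions to adapt to your own index, and carrying pageNumber as its own field is just one way to keep the citation metadata available at query time.

```python
# Rough Python equivalent of the Get embeddings -> Select -> Index steps.
# Endpoints, deployment name, index name, and field names are assumptions.
import os
import uuid

from openai import AzureOpenAI
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

openai_client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-02-01",
)
search_client = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name="contracts-index",                  # assumed index name
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_KEY"]),
)

def embed_and_index(chunks: list[dict], document_name: str) -> None:
    # Get multiple embeddings: one vector per text chunk.
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",            # your embeddings deployment name
        input=[c["content"] for c in chunks],
    )
    # Select index objects: pair each chunk with its vector plus metadata.
    documents = [
        {
            "uniqueID": str(uuid.uuid4()),
            "documentName": document_name,
            "content": chunk["content"],
            "pageNumber": chunk["pageNumber"],     # keeps citation metadata queryable
            "embedding": item.embedding,
        }
        for chunk, item in zip(chunks, response.data)
    ]
    # Index multiple documents: upload the batch to Azure AI Search.
    search_client.upload_documents(documents=documents)
```
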

Step 2: Agent flow with Vector Search as a tool

Goal: Let the agent answer natural language questions grounded in your indexed contracts.

  • Conversational workflow creation: From the portal, create a new Conversational workflow type 
    📸 Conversational flow creation 

     

  • Agent action
    • Model: gpt-4.1 (Note: please use this model instead of gpt-4 or gpt-4o)
    • System instructions
      You are a helpful assistant, answering questions about specific documents. When a question is asked, follow these steps in order: 
      Use the agent parameter body prompt to pass the user's question to the Document search tool. Use this tool to do a vector search of the user's question; the output of the vector search tool will contain the information needed to answer the question. The output will be a JSON array, and each array object will have a "content" property; use the "content" property to generate an answer. Use only this information to answer the user's question, and cite the source using the page number you found it on. No other data or information should be used to answer the question.

      💡One of the coolest parts is how you can create an Agent Parameter that automatically carries the chat input into the tool call.
      In this case, our body prompt parameter brings the user’s question straight into the tool.


      💡Another highlight: because the tool’s response comes back in content, the agent automatically extracts it, with no expressions required. It’s declarative and effortless.

      📸 Agent action

       

  • Tool: Overview
    • Input description: Details on what the tool achieves
    • Agent parameter: Body prompt to pass in context from the chat prompt to the tool
      📸 Tool action

       

  • Tool: Search vectors with natural language
    • Input index name: the name of your AI Search index
    • Search text: Body prompt parameter containing the query from the chat prompt
    • Nearest neighbors: number of matches to return (an equivalent vector query is sketched in code after these steps)
      📸 Tool: Search vector action
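
Behind the scenes, the tool performs a k-nearest-neighbor vector query against the index. The following Python sketch shows a rough equivalent outside Logic Apps, reusing the same assumed deployment, index, and field names as the ingestion sketch above:

```python
# Rough Python equivalent of the vector search the tool performs.
# Same assumed deployment, index, and field names as the ingestion sketch.
import os

from openai import AzureOpenAI
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

openai_client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-02-01",
)
search_client = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name="contracts-index",
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_KEY"]),
)

def vector_search(question: str, nearest_neighbors: int = 3) -> list[dict]:
    # Embed the question with the same model used at indexing time.
    vector = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=[question],
    ).data[0].embedding
    # k-nearest-neighbor query over the "embedding" vector field.
    results = search_client.search(
        search_text=None,
        vector_queries=[VectorizedQuery(
            vector=vector,
            k_nearest_neighbors=nearest_neighbors,
            fields="embedding",
        )],
        select=["content", "documentName", "pageNumber"],
    )
    # The agent reads the "content" (and page number) of each match to answer.
    return [
        {"content": r["content"], "documentName": r["documentName"], "pageNumber": r["pageNumber"]}
        for r in results
    ]
```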

       

Step 3: Try it out (example end-to-end)

Indexing is automatic whenever a file is added to your storage container. 

The Storage trigger fires, and the document is read, parsed, chunked, embedded, and indexed into AI Search. You can confirm this end-to-end in the run history for the indexing Logic App, where the Parse and Chunk outputs clearly show pageNumber, totalPages, and sentencesAreComplete values.

📸 Screenshot: Indexing flow run history with Parse/Chunk metadata outputs
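
If you prefer to double-check outside the designer, a quick sketch like the following (again using the assumed index and field names from the ingestion sketch) lists a few indexed chunks and their page metadata:

```python
# Quick sanity check (not part of the Logic App): list a few indexed chunks
# and their page metadata, using the assumed index and field names from above.
import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name="contracts-index",
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_KEY"]),
)

for doc in search_client.search(search_text="*", select=["documentName", "pageNumber"], top=5):
    print(doc["documentName"], "page", doc["pageNumber"])
```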

 

Now let's see it in action, using the Chat experience to validate the retrieval flow.

Example question: "What is the standard payment timeline?"

📸 Answer

The answer contains detailed information along with page-number citations, which are drawn from the metadata returned by the new actions.

📸 Run history view of Agent

You can also trace the path the agent followed, with its inputs and outputs, to streamline debugging and ensure the agent responds reliably.

Conclusion

With Parse & Chunk with Metadata, you don’t just split text—you gain page numbers, total pages, and sentence completeness that make answers trustworthy and easy to cite. Combined with Agent Loop + Vector Search as a Tool, this unlocks production-ready contract Q&A in just a few steps.

Updated Oct 01, 2025
Version 1.0