The new Parse document with metadata and Chunk text with metadata actions in Logic Apps bring powerful improvements to how you handle documents. Unlike the previously released parse and chunk actions, these new versions provide rich metadata alongside the text:
- pageNumber — the page a chunk came from
- totalPages — the total number of pages in the document
- sentencesAreComplete — indicates whether a chunk ends on a complete sentence, so you can avoid broken fragments
This means you don’t just get raw text; you also get the context you need for citations, navigation, and downstream processing. You can also adjust your chunking strategy based on these metadata fields.
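To make this concrete, here is a minimal sketch of the shape one chunk might take, with its text paired to the metadata fields above. The property names and sample text are illustrative only, not the exact action payload.

```python
# Illustrative sketch of a single chunk with metadata (field names assumed
# from the list above; the real action output may be shaped differently).
chunk = {
    "content": "Invoices are payable within thirty (30) days of receipt.",
    "pageNumber": 4,
    "totalPages": 12,
    "sentencesAreComplete": True,
}

# Because every chunk carries pageNumber, a downstream answer can cite
# "page 4" rather than just quoting anonymous text.
```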
Once documents are parsed and chunked with metadata, you can embed and index them in Azure AI Search, and then use an Agent Loop in Logic Apps that calls Vector Search as a Tool to answer questions with precise, page-level references.
In this blog, we’ll walk through a scenario where we index two enterprise contracts (a Master Service Agreement and a Procurement Agreement) and then use an Agent Loop to answer natural language questions with citations.
Pre-requisites
- Azure Blob Storage for your documents
- Azure AI Search with an index
- Azure OpenAI deployment (embeddings + chat model)
- Logic App (Standard) with the new AI actions
Here is a sample demo on GitHub you can provision to follow along.
Step 1: Ingestion flow
Goal: Convert raw PDFs into sentence-aware chunks with metadata, then index them.
📸 Workflow overview
- When a blob is added or modified (pointed at the container with your contracts).
📸 Blob trigger
- Read blob content
📸 Read blob content action
- Parse document with Metadata
- Input: File content from previous step
📸 Parse document with metadata action
- Chunk text with metadata
- Input: entire parsed text items array
📸 Chunk text with metadata action
- Get multiple embeddings
- Input: Embedding model and the text chunks for which vector embeddings will be generated
📸 Get multiple embeddings action
- Select index objects
- Input: Raw text content, embeddings, documentName and uniqueID to be passed into the index
📸 Select array action
- Index multiple documents
- Input: Array object output from the previous Select step (a conceptual code sketch of these last few steps follows this list)
📸 Index documents action
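Behind the scenes, the last few steps of this flow amount to: batch the chunk texts into one embeddings call, pair each chunk with its vector and metadata, and upload the resulting documents to the index. The Python sketch below is a conceptual equivalent only, not the Logic Apps implementation; the deployment, index, and field names (text-embedding-3-small, contracts-index, contentVector, and so on) are assumptions you would replace with your own.

```python
# Conceptual equivalent of "Get multiple embeddings", "Select index objects",
# and "Index multiple documents". All names below are assumptions.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

aoai = AzureOpenAI(
    api_key="<azure-openai-key>",
    api_version="2024-02-01",
    azure_endpoint="https://<your-aoai>.openai.azure.com",
)
search = SearchClient(
    endpoint="https://<your-search>.search.windows.net",
    index_name="contracts-index",                     # assumed index name
    credential=AzureKeyCredential("<search-admin-key>"),
)

# Chunks as they might come out of "Chunk text with metadata"
# (same illustrative shape as shown earlier).
chunks = [
    {"content": "Invoices are payable within thirty (30) days of receipt.",
     "pageNumber": 4, "totalPages": 12, "sentencesAreComplete": True},
]

# Get multiple embeddings: one batched call for all chunk texts.
embeddings = aoai.embeddings.create(
    model="text-embedding-3-small",                   # your embeddings deployment name
    input=[c["content"] for c in chunks],
)

# Select index objects: pair each chunk with its vector, name, and ID.
docs = [
    {
        "id": f"msa-{i}",                             # uniqueID
        "documentName": "Master Service Agreement.pdf",
        "content": c["content"],
        "contentVector": e.embedding,                 # must match your index's vector field
        "pageNumber": c["pageNumber"],
    }
    for i, (c, e) in enumerate(zip(chunks, embeddings.data))
]

# Index multiple documents.
search.upload_documents(documents=docs)
```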
Step 2: Agent flow with Vector Search as a tool
Goal: Let the agent answer natural language questions grounded in your indexed contracts.
- Conversational workflow creation: From the portal, create a new Conversational workflow type
📸 Conversational flow creation
- Agent action
- Model: gpt-4.1 (Note: please use this model instead of gpt-4 or gpt-4o)
- System instructions
You are a helpful assistant, answering questions about specific documents. When a question is asked, follow these steps in order: Use the agent parameter body prompt to pass in the user's questions to the Document search tool. Use this tool to do a vector search of the user's question, the output of the vector search tool will have the related information to answer the question. The output will be in the form of a json array. Each array object will have a "content" property, use the "content" property to generate an answer. Use only information to answer the user's question and cite the source using the page number you found it on. No other data or information should be used to answer the question.
💡One of the coolest parts is how you can create an Agent Parameter that automatically carries the chat input into the tool call.
In this case, our body prompt parameter brings the user’s question straight into the tool.
📸 Agent action
💡Another of the coolest parts: because the tool’s response comes back in content, the agent automatically extracts it, with no expressions required. It’s declarative and effortless.
- Tool: Overview
- Input description: Details on what the tool achieves
- Agent parameter: Body prompt to pass in context from the chat prompt to the tool
📸 Tool action
- Tool: Search vectors with natural language
- Input index name: the name of your AI Search index
- Search text: the body prompt agent parameter containing the user's query
- Nearest neighbors: the number of nearest matches to return (a conceptual sketch of this search follows below)
📸 Tool: Search vector action
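Conceptually, this tool runs a vector query (alongside keyword search) against the index and returns the matching content with its metadata. The sketch below approximates that with the Azure AI Search Python SDK; it reuses the assumed index and field names from the ingestion sketch, and it embeds the question itself, whereas the built-in tool handles vectorization for you.

```python
# Rough, conceptual equivalent of the "Search vectors with natural language"
# tool call. Index name, vector field, and deployment name are assumptions.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from openai import AzureOpenAI

aoai = AzureOpenAI(api_key="<azure-openai-key>", api_version="2024-02-01",
                   azure_endpoint="https://<your-aoai>.openai.azure.com")
search = SearchClient(endpoint="https://<your-search>.search.windows.net",
                      index_name="contracts-index",
                      credential=AzureKeyCredential("<search-query-key>"))

question = "What is the standard payment timeline?"  # the body prompt parameter
vector = aoai.embeddings.create(model="text-embedding-3-small",
                                input=[question]).data[0].embedding

results = search.search(
    search_text=question,                             # keyword recall alongside the vector match
    vector_queries=[VectorizedQuery(vector=vector,
                                    k_nearest_neighbors=5,  # "Nearest neighbors" setting
                                    fields="contentVector")],
    select=["content", "documentName", "pageNumber"],
)

# The "content" and "pageNumber" on each result are what the agent uses
# to compose a cited, page-level answer.
for r in results:
    print(r["documentName"], r["pageNumber"], r["content"][:80])
```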
Step 3: Try it out (example end-to-end)
Indexing is automatic whenever a file is added to your storage container.
The Storage trigger fires, and the document is read, parsed, chunked, embedded, and indexed into AI Search. You can confirm this end to end in the run history of the indexing Logic App, where the Parse and Chunk outputs clearly show pageNumber, totalPages, and sentencesAreComplete values.
📸 Screenshot: Indexing flow run history with Parse/Chunk metadata outputs
Now let's see it in action, using the Chat experience to validate the retrieval flow.
Example question: "What is the standard payment timeline?"
📸 Answer
The answer contains detailed information along with page-number citations, which the agent can provide because the new actions surface that metadata.
📸 Run history view of Agent
You can also trace the path the agent followed, with its inputs and outputs, to streamline debugging and ensure the agent responds reliably.
Conclusion
With Parse & Chunk with Metadata, you don’t just split text—you gain page numbers, total pages, and sentence completeness that make answers trustworthy and easy to cite. Combined with Agent Loop + Vector Search as a Tool, this unlocks production-ready contract Q&A in just a few steps.