Reflections on a recent use case involving a large number of documents, requiring OCR, structured output, and Copilot Studio as a low-code option
Setting the Stage: The Challenge
In today’s enterprise landscape, organizations are inundated with vast amounts of unstructured documents—scanned forms, invoices, contracts, and more. Extracting meaningful, structured data from these sources is a critical yet complex task. Recently, I faced just such a challenge: a project requiring the processing of thousands of documents, each needing accurate Optical Character Recognition (OCR), structured data extraction, and seamless integration into business workflows.
In this case study, the customer had been trying to build an end-to-end solution using Copilot Studio, but faced challenges as the volume of data grew and the expected interactions with the service became more analytical in nature.
The Solution Landscape: Why Copilot Studio and Azure AI?
Traditionally, such tasks would demand a combination of specialized OCR engines, custom code, and manual effort. However, with the evolution of AI services and low-code platforms, the opportunity to streamline these processes has never been greater. Microsoft’s Azure AI Services offer robust, scalable tools for document intelligence and data extraction, while Copilot Studio provides a low-code canvas to orchestrate workflows and empower business users.
Bridging the Gap: Combining the Two Worlds
This case study explores how blending Azure AI’s advanced document processing with Copilot Studio’s low-code capabilities can deliver powerful, end-to-end solutions. In the sections that follow, I’ll walk through the architecture, key implementation steps, lessons learned, and the tangible benefits realized by merging these two worlds.
Understanding the Input Data
The project centered around a substantial repository of PDF files—thousands of documents accumulated over several years. These files reflected the diversity and complexity commonly found in enterprise archives: a wide variety of document types, layouts, and formats, all captured in digital form. Importantly, the first page of each document followed a known schema containing certain fields, key dates, and features, and the documents ranged from 10 to 20 pages in length. Without revealing the actual customer's data, let's assume the source documents are safety incident reports in PDF format.
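To make the first-page schema concrete, it could be modeled along the following lines. Note that every field name here is an illustrative assumption for the safety-incident-report stand-in, not the customer's actual schema:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class IncidentReport:
    """Illustrative first-page schema for a safety incident report (hypothetical fields)."""
    report_id: str
    incident_type: str            # e.g. "slip", "equipment failure"
    incident_date: date           # one of the key dates on the first page
    reported_date: date
    location: str
    people_involved: list[str]    # names referenced in the report
    summary: Optional[str] = None

# Example record as it might look after extraction
sample = IncidentReport(
    report_id="IR-2021-0042",
    incident_type="equipment failure",
    incident_date=date(2021, 3, 14),
    reported_date=date(2021, 3, 15),
    location="Warehouse B",
    people_involved=["John Smith"],
)
```

Having a typed record like this is what later makes both the relational inserts and the analytical queries straightforward.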
Defining the Expected Outcomes
At the heart of the project was the customer’s need to unlock actionable insights from their vast collection of unstructured documents. Rather than simply digitizing or archiving these files, the goal was to enable intelligent querying and analysis that would drive business decisions and support compliance requirements.
The customer identified a set of core questions they wanted to answer from the documents, such as:
- Reference and Citation-Based Queries:
  - Example: “What is the source document for this particular data point?”
  - Example: “What are the incidents that John Smith was involved in?”
- Analytical and Statistical Queries:
  - Example: “How many times did a specific type of incident occur within a given time frame?”
  - Example: “Which documents mention a particular individual, and how often are they referenced across the dataset?”
  - Example: “Are there trends or patterns in the types of cases processed over the past few years?”
While many RAG-based approaches can satisfy the first category, the second category of queries needs additional effort, and this is where the customer initially faced challenges with Copilot Studio.
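To illustrate why the analytical category benefits from a relational store, here is a minimal sketch using SQLite as a local stand-in for Azure SQL, with invented sample rows. Once the extracted fields live in a table, a question like "how many incidents of a given type occurred in a time frame" becomes a plain SQL aggregate rather than a retrieval problem:

```python
import sqlite3

# In-memory stand-in for the Azure SQL database holding the extracted fields
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE incidents (
        report_id TEXT PRIMARY KEY,
        incident_type TEXT,
        incident_date TEXT,      -- ISO-8601 date
        person_involved TEXT
    )
""")
conn.executemany(
    "INSERT INTO incidents VALUES (?, ?, ?, ?)",
    [
        ("IR-001", "slip", "2021-02-01", "John Smith"),
        ("IR-002", "equipment failure", "2021-03-14", "John Smith"),
        ("IR-003", "slip", "2021-07-09", "Jane Doe"),
        ("IR-004", "slip", "2022-01-20", "John Smith"),
    ],
)

# "How many times did a specific type of incident occur within a given time frame?"
(count,) = conn.execute(
    "SELECT COUNT(*) FROM incidents "
    "WHERE incident_type = ? AND incident_date BETWEEN ? AND ?",
    ("slip", "2021-01-01", "2021-12-31"),
).fetchone()

# "What are the incidents that John Smith was involved in?"
rows = [r[0] for r in conn.execute(
    "SELECT report_id FROM incidents "
    "WHERE person_involved = ? ORDER BY report_id",
    ("John Smith",),
)]
```

Both queries return exact, auditable answers over the whole dataset—something that pure similarity-based retrieval over chunks cannot guarantee.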
The default capabilities of Copilot Studio for working with unstructured data are explained here.
The diagram below shows how Copilot Studio deals with unstructured data, with the underlying components in Dataverse offered as a SaaS capability: a low-code option with minimal control and customization available out of the box. While it offers great capabilities in many areas, for certain complex use cases you may prefer the additional control and customization available to a developer:
The Solution
As a developer and pro-code solution architect, I focused on the sample queries and expected responses, both citation-based and analytical. I used Python as the programming language, together with Azure services and their SDKs, to solve the problem. These are the high-level steps taken for this use case:
- Designing and creating the schema for relational data and vector data
- Processing the input documents (1)
  - extracting the contents
  - extracting structured output from the contents (2) and pushing it to the relational database (3)
  - chunking and vectorizing (2) and pushing to the vector database (4)
- Creating the client app in Microsoft Copilot Studio and pointing it to the vector DB and relational DB for the relevant intents
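The chunking part of the processing step above can be sketched as a simple fixed-size splitter with overlap. The chunk size and overlap values here are illustrative defaults, not necessarily what the repo uses:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split extracted document text into overlapping chunks for embedding.

    The overlap keeps sentences that straddle a chunk boundary visible in
    both neighbouring chunks, which helps retrieval quality.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# Each chunk would then be vectorized with the embedding model and pushed,
# together with metadata (source file, page range), to the vector database.
```

Keeping the chunker a pure function makes it easy to unit-test locally before wiring it into the Function App.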
In the sample implementation, the following services are used for each step, as shown in the diagram below:
(1) Function App
(2) Azure OpenAI (embedding model and GPT model with the Responses API)
(3) Azure SQL
(4) Azure AI Search
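For step (2), here is a hedged sketch of how the structured-output request to Azure OpenAI could be shaped: a JSON schema constrains the model's response so the extracted fields can be parsed and inserted into Azure SQL directly. The schema and field names are illustrative assumptions, and the API call itself is shown only as a comment, since it requires an Azure OpenAI endpoint, a deployment name, and credentials:

```python
import json

# Illustrative JSON schema the model is asked to conform to (hypothetical fields)
INCIDENT_SCHEMA = {
    "type": "object",
    "properties": {
        "report_id": {"type": "string"},
        "incident_type": {"type": "string"},
        "incident_date": {"type": "string", "description": "ISO-8601 date"},
        "people_involved": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["report_id", "incident_type", "incident_date", "people_involved"],
    "additionalProperties": False,
}

def parse_extraction(model_output: str) -> dict:
    """Parse and minimally validate the model's JSON response before a DB insert."""
    record = json.loads(model_output)
    missing = [k for k in INCIDENT_SCHEMA["required"] if k not in record]
    if missing:
        raise ValueError(f"model output missing fields: {missing}")
    return record

# The actual call (sketch only, not runnable here) would look roughly like:
#   client = openai.AzureOpenAI(azure_endpoint=..., api_key=..., api_version=...)
#   response = client.responses.create(
#       model="<your-gpt-deployment>",
#       input=[{"role": "user", "content": first_page_text}],
#       text={"format": {"type": "json_schema", "name": "incident",
#                        "schema": INCIDENT_SCHEMA, "strict": True}},
#   )
#   record = parse_extraction(response.output_text)
```

Validating the response against the required fields before the insert keeps malformed model outputs from silently corrupting the relational data.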
Source Code
The source code that I produced for this use case has been refactored so that it could be shared publicly. Please feel free to review and use it with care. It covers only the Azure side, not the Copilot Studio part. In my implementation, I used an app registration with proper RBAC access to the resources to connect to these databases from Copilot Studio and interact with them.
The source code (python) can be found here:
https://github.com/azadehgnia/AzureOpenAI_DocProcessing
Remarks
OpenAI's Responses API is a very capable tool for extracting structured output from documents, and since it is generally available, it can be a great addition to any enterprise solution involving document processing and unstructured data.
Microsoft Copilot Studio, as an enterprise low-code solution, remains a strategic service that many organizations can rely on to democratize the use of AI rapidly. New capabilities and features are added to the service regularly, and what is a limitation for certain tasks today can soon become a new feature, so keep an eye on it.