
Apps on Azure Blog

FSI Knowledge Mining and Intelligent Document Processing Reference Architecture

GeoJames
Microsoft
Apr 29, 2025

Financial services industry (FSI) customers such as insurance companies and banks rely on vast amounts of data to provide what are sometimes hundreds of individual products to their customers. From assessing product suitability to underwriting, fraud investigations, and claims handling, many employees and applications depend on this data to do their jobs efficiently. Since the capabilities of GenAI have been realised, we have been helping our customers in this market transform their business with unified systems that simplify access to this data and speed up these core tasks, while remaining compliant with the numerous regulations that govern the FSI space.

Combining Knowledge Mining with Intelligent Document Processing provides a powerful way to reduce the manual effort and inefficiencies involved in ensuring data integrity and retrieving data across the many use cases that most of our customers face daily.

What is Knowledge Mining and Intelligent Document Processing?

Knowledge Mining is a process that transforms large, unstructured data sets into searchable knowledge stores. Traditional search methods often rely on keyword matching, which can miss the context of the information. In contrast, knowledge mining uses techniques such as natural language processing (NLP) to understand the context and meaning behind the data. The result is a robust search mechanism that can look across all these data sources and understand the relationships between them, and therefore return more accurate and relevant results.

 

Intelligent Document Processing (IDP) is a workflow automation technology designed to scan, read, extract, categorise, and organise meaningful information from large streams of data. Its primary function is to extract valuable information from extensive data sets without human input, thereby increasing processing speed and accuracy while reducing costs. By leveraging a combination of Artificial Intelligence (AI), Machine Learning (ML), Optical Character Recognition (OCR), and Natural Language Processing (NLP), IDP handles both structured and unstructured documents. By ensuring that the processed data meets the "gold standard" - structured, complete, and compliant - IDP helps organizations maintain high-quality, reliable, and actionable data.

The Power of Knowledge Mining and Intelligent Document Processing as a Unified Solution

Knowledge Mining excels at quickly responding to natural language queries, providing valuable insights and making previously unsearchable data accessible. At the same time, IDP ensures that the processed data meets the "gold standard"—structured, complete, and compliant—making it both reliable and actionable. Together, these technologies empower organisations to harness the full potential of their data, driving better decision-making and improved efficiency.

__________________________________________________________________

Meet Alex: A Day in the Life of a Fraud Case Worker

 

Responsibilities:
  • Investigate potential fraud cases by manually searching across multiple systems.
  • Read and analyse large volumes of information to filter out relevant data.
  • Ensure compliance with regulatory requirements and maintain data accuracy.
  • Prepare detailed reports on findings and recommendations.
Lost in Data: The Struggles of Manual Fraud Investigation

Alex receives a new fraud case and starts by manually searching through multiple systems to gather information. This process takes several hours, and Alex has to read through numerous documents and emails to filter out relevant data. The inconsistent data formats and locations make it challenging to ensure accuracy. By the end of the day, Alex is exhausted and has only made limited progress on the case.

Effortless Efficiency: Fraud Investigation Transformed with Knowledge Mining and IDP

Alex receives a new fraud case and needs to gather all relevant information quickly. Instead of manually searching through multiple systems, Alex inputs the following natural language query into the unified system:

"Show me all documents, emails, and notes related to the recent transactions of client X that might indicate fraudulent activity."

The system quickly retrieves and presents a comprehensive summary of all relevant documents, emails, and notes, ensuring that the data is structured, complete, and compliant. This allows Alex to focus on analysing the data and making informed decisions, significantly improving the efficiency and accuracy of the investigation.

How have Knowledge Mining and IDP transformed Alex's role?

Before implementing Knowledge Mining and Intelligent Document Processing, Alex faced a manual process of searching across multiple systems to gather information. This was time-consuming and labour-intensive, often leading to delays in investigations. The overwhelming volume of data from various sources made it difficult to filter out relevant information, and the inconsistent data formats and locations increased the risk of errors. This high workload not only reduced Alex's efficiency but also led to burnout and decreased job satisfaction. However, with the introduction of a unified system powered by Knowledge Mining and IDP, these challenges were significantly mitigated. Automated searches using natural language queries allowed Alex to quickly find relevant information, while IDP ensured that the data processed was structured, complete, and compliant. This unified system provided a comprehensive view of the data, enabling Alex to make more informed decisions and focus on higher-value tasks, ultimately improving productivity and job satisfaction.

____________________________________________________________________

Example Architecture

 

Knowledge Mining
  1. Users can interact with the system through a portal on the customer’s front-end of choice. This will serve as the entry point for submitting queries and accessing the knowledge mining service. Front-end options could include web apps, container services or serverless integrations.
  2. Azure AI Search indexes the knowledge base and retrieves the content most relevant to a query, which is the retrieval step of retrieval-augmented generation (RAG). Azure OpenAI provides access to large language models that summarise the retrieved content into a response. Combined, these services take the user’s query, search the knowledge base, and return relevant information, which can be augmented as required. Prompt engineering can be used to customise how the data is returned.
  3. You define which data sources Azure AI Search will consume. These can be Azure storage services or other data repositories. Azure AI Search queries data that meets a pre-defined gold standard and returns the relevant data to the user. The gold standard could be based on compliance or business needs.
  4. Power BI can be used to create analytical reports based on the data retrieved and processed. This step involves visualising the data in an interactive and user-friendly manner, allowing users to gain insights and make data-driven decisions.
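
The query flow in steps 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the reference implementation: the `Document` class, the `gold_standard` flag, and the `retrieve` and `build_prompt` helpers are hypothetical names, and a simple word-overlap ranking stands in for Azure AI Search. In a real deployment the retrieval would be done by the search service and the prompt would be sent to an Azure OpenAI model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str
    gold_standard: bool  # hypothetical flag set by the document-processing pipeline


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into a set of words."""
    return {word.strip(".,?!\"'").lower() for word in text.split()} - {""}


def retrieve(query: str, documents: list[Document], top_k: int = 2) -> list[Document]:
    """Stand-in for the search service: rank gold-standard documents by word overlap."""
    query_terms = tokenize(query)
    candidates = [d for d in documents if d.gold_standard]
    scored = [(len(query_terms & tokenize(d.text)), d) for d in candidates]
    scored = [(score, d) for score, d in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]


def build_prompt(query: str, sources: list[Document]) -> str:
    """Assemble a grounded prompt that a language model could summarise."""
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in sources)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"Sources:\n{context}\n"
        f"Question: {query}"
    )
```

Only documents flagged as gold standard are eligible for retrieval, which mirrors step 3. Asking the model to cite document ids is one example of the prompt engineering mentioned in step 2, and it makes the summarised answer easier to audit.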
Intelligent Document Processing
  1. (Optional) Azure Data Factory is a data integration service that allows you to create workflows for moving and transforming data at scale. Business data can be ingested into your Azure data storage solutions using pre-built connectors. This event-driven approach ensures that new data is processed automatically as it is generated and is ready for use in your knowledge mining solution.
  2. Data can be transformed using Azure Functions apps and Azure OpenAI. Through prompt engineering, the large language model (LLM) can highlight specific issues in the documents, such as grammatical errors, irrelevant content, or incomplete information. The LLM can then be used to rewrite text to improve clarity and accuracy, add missing information, or reformat content to adhere to guidelines. Transformed data is stored as gold standard data.
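
As a rough sketch of the transformation step, the Python below checks a record against a gold standard and builds a rewrite prompt for the issues it finds. Everything here is an assumption made for illustration: the required fields, the minimum description length, and the `review` and `build_rewrite_prompt` helpers are hypothetical. Simple rule-based checks stand in for the LLM-driven review described above, and the prompt is the kind of text a function app might send to an Azure OpenAI model.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical gold standard: the fields a claim record must contain.
REQUIRED_FIELDS = ("policy_number", "claim_date", "description")
MIN_DESCRIPTION_WORDS = 5


@dataclass
class ReviewResult:
    issues: list[str] = field(default_factory=list)

    @property
    def is_gold_standard(self) -> bool:
        """A record meets the standard when no issues were found."""
        return not self.issues


def review(record: dict[str, str]) -> ReviewResult:
    """Check a record against the gold standard and list what is wrong."""
    result = ReviewResult()
    for name in REQUIRED_FIELDS:
        if not record.get(name, "").strip():
            result.issues.append(f"missing field: {name}")
    description = record.get("description", "")
    if description.strip() and len(description.split()) < MIN_DESCRIPTION_WORDS:
        result.issues.append("description too short")
    return result


def build_rewrite_prompt(record: dict[str, str], result: ReviewResult) -> str:
    """Ask the language model to fix only the issues that were found."""
    issue_list = "\n".join(f"- {issue}" for issue in result.issues)
    return (
        "Rewrite the record below so that it meets the guidelines.\n"
        "Do not invent facts; mark anything unknown as UNKNOWN.\n"
        f"Issues found:\n{issue_list}\n"
        f"Record: {record}"
    )
```

Records that pass the review can be stored as gold standard data straight away. Records with issues would go through the rewrite step and be reviewed again before they are stored.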

____________________________________________________________________

Additional Cloud Considerations

Networking
  • VNETs (Virtual Networks) are a fundamental component of cloud infrastructure that enable secure and isolated networking configurations within a cloud environment. They allow different resources, such as virtual machines, databases, and services, to communicate with each other securely. In this architecture, virtual networks ensure that services such as Azure AI Search, Azure OpenAI, and Power BI can communicate with each other securely. This is crucial for maintaining the integrity and confidentiality of sensitive financial data.
  • ExpressRoute or a VPN is expected to be used when connecting on-premises infrastructure to Azure. Azure ExpressRoute provides a private, reliable, and high-speed connection between your data center and Microsoft Azure. It allows you to extend your infrastructure to Azure by providing private access to resources deployed in Azure Virtual Networks, and to services such as App Service through private endpoints. Private peering ensures that your traffic never enters the public Internet, enhancing security and performance. ExpressRoute uses Border Gateway Protocol (BGP) for dynamic routing between your on-premises networks and Azure, ensuring efficient and secure data exchange. It also offers built-in redundancy and high availability, making it a robust solution for critical workloads.
  • Azure Front Door is a cloud-based Content Delivery Network (CDN) and application delivery service provided by Microsoft. It offers several key features, including global load balancing, dynamic site acceleration, SSL offloading, and a web application firewall, making it an ideal solution for optimizing and protecting web applications. We expect to use Front Door in scenarios where the architecture will be used by users outside the organisation.
  • Azure API Management is expected to be used in this scenario when the solution is rolled out to larger groups. It then lets us add further security controls, rate limiting, load balancing, and so on.
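
To make the rate limiting mentioned above concrete, here is a minimal fixed-window limiter in Python. It only illustrates the idea of a per-caller quota. It does not use Azure API Management, which applies such limits through its own policies, and the class name, parameters, and caller keys are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


class FixedWindowRateLimiter:
    """Allow at most `limit` calls per caller in each window of `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # Maps (caller, window index) to the number of calls seen so far.
        # Old windows are never purged in this sketch.
        self._counts: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, caller: str, now: float) -> bool:
        """Return True if the call is within the caller's quota at time `now`."""
        window = int(now // self.window_seconds)
        key = (caller, window)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True
```

With a limit of two calls per 60 seconds, a caller's third call inside the same window is rejected, while other callers and later windows are unaffected. Passing the time in as `now` keeps the sketch deterministic and easy to test.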
Monitoring and Governance
  • Azure Monitor: This service collects and analyses telemetry data from various resources, providing insights into the performance and health of the system. It enables proactive identification and resolution of issues, ensuring the system runs smoothly.
  • Azure Cost Management and Billing: Provides tools for monitoring and controlling costs associated with the solution. It offers insights into spending patterns and resource usage, enabling efficient financial governance.
  • Application Insights: Provides application performance monitoring (APM) designed to help you understand how your applications are performing and to identify issues that may affect their performance and reliability.

 

These components together ensure that the Knowledge Mining and Intelligent Document Processing solution is monitored for performance, secured against threats, compliant with regulations, and managed efficiently from a cost perspective.

____________________________________________________________________

Next steps:

  1. Identify the data and its sources that will feed into your own Knowledge Mine. Consider if you also need to implement Intelligent Document Processing to ensure data quality.
  2. Define your 'gold standards'. These guidelines will determine how your data might be transformed.
  3. Consider how to provide access to the data through an application portal, and choose the right front-end technology for your use case.
  4. Once you have configured Azure AI Search to point to the chosen data, consider how you might augment responses using Azure OpenAI large language models.

 

Useful resources

AI Landing Zone reference architecture

Azure and Open AI with API Manager

Secure connectivity from on-premises to Azure-hosted solutions

 

Updated Mar 27, 2025
Version 1.0