automation
10 TopicsUse generative AI to extract structured data out of emails
One thing we regularly hear from clients is that they receive information that are key to their business such as order requests via email in an unstructured format and sometimes there are structured information within the body of those emails in a variety of table formats. In today’s fast-paced digital world, businesses need a way to automatically extract, structure, and integrate this information into their existing applications. Whether it’s leveraging AI-powered document processing, natural language processing (NLP), or intelligent automation, the right approach can transform email-based orders into structured, actionable data. In this blog, we’ll explore one such scenario where AI can be leveraged to extract information in tabular format that has been provided within an email. The emails contextually belong to a specific domain, but the tables are not with consistent headers or shapes. Sometimes in the body of one email there could be multiple tables. The problem Statement Extract tabular information with varying table formats from emails The typical approach to this problem involves rule-based processing, where individual tables are extracted and merged based on predefined logic. However, given the variety of email formats from hundreds or even thousands of different senders, maintaining such rule-based logic becomes increasingly complex and difficult to manage. A more optimal solution is leveraging the cognitive capabilities of generative AI, which can dynamically adapt to different table structures, column names, and formatting variations—eliminating the need for constant rule updates while improving accuracy and scalability. To create this sample code, I used below email with test data, with two tables with inconsistent column names. It is going to provide some upcoming trainings information. Please note the difference between the column headers: Hi there, Regarding the upcoming trainings, this is the list: Event Date Description of Event Length Grade 2025-01-21 Digital environments 20 hours 5 2025-03-01 AI for Industry A 10 hours 3 and some further events in below list Date Subject Duration Grade 2025-01-21 Digital environments 2 2 days 1 2025-03-01 AI for Industry B 2 weeks 4 These sessions are designed to be interactive and informative, so your timely participation is crucial. Please make sure to log in or arrive on time to avoid missing key insights. If you have any questions or need assistance, feel free to reach out. Looking forward to seeing you there! Thanks, Azadeh These are the two tables within the email, and we need to extract one consistent table format with all the rows from these two tables. Table 1 Event Date Description of Event Length Grade 2025-01-21 Digital environments 20 hours 5 2025-03-01 AI for Industry A 10 hours 3 Table 2 Date Subject Duration Grade 2025-01-21 Digital environments 2 2 days 1 2025-03-01 AI for Industry B 2 weeks 4 To extract the tabular data into one single table in json format, I am using python with below libraries installed in my environment: pandas beautifulsoup4 openai lxml The Code I use azure OpenAI service with a gpt 4o deployment. Below code is just one way of solving this type of problem and can be customized or improved to fit to other similar problems. I have provided some guidelines about merging the tables and column names similarity in the user prompt. This sample code is using an email message that is saved in 'eml' format in a local path, but the email library has other capabilities to help you connect to a mailbox and get the emails. import email import pandas as pd from bs4 import BeautifulSoup import os from openai import AzureOpenAI endpoint = os.getenv("ENDPOINT_URL", "https://....myendpointurl....openai.azure.com/") deployment = os.getenv("DEPLOYMENT_NAME", "gpt-4o") subscription_key = os.getenv("AZURE_OPENAI_API_KEY", "myapikey) # Initialize Azure OpenAI Service client with key-based authentication client = AzureOpenAI( azure_endpoint=endpoint, api_key=subscription_key, api_version="2024-05-01-preview", ) # Process email content with GPT-4 def extract_information(email_body, client): soup = BeautifulSoup(email_body, "html.parser") body = soup.get_text() print(body) #Prepare the chat prompt chat_prompt = [ { "role": "system", "content": [ { "type": "text", "text": "You are an AI assistant that is expert in extracting structured data from emails." } ] }, { "role": "user", "content": [ { "type": "text", "text": f"Extract the required information from the following email and format it as JSON and consolidate the tables using the common column names. For example the columns length and duration are the same and the columns Event and Subject are the same:\n\n{body}" } ] } ] messages = chat_prompt # Generate the completion completion = client.chat.completions.create( model=deployment, messages=messages, max_tokens=800, temperature=0.1, top_p=0.95, frequency_penalty=0, presence_penalty=0, stop=None, stream=False ) return completion.choices[0].message.content email_file_name = r'...path to your file....\Test Email with Tables.eml' with open(email_file_name, "r") as f: msg = email.message_from_file(f) email_body = "" for part in msg.walk(): if part.get_content_type() == "text/plain": email_body = part.get_payload(decode=True).decode() elif part.get_content_type() == "text/html": email_body = part.get_payload(decode=True).decode() extracted_info = extract_information(email_body, client) print(extracted_info) The output is: ``` [ { "Event": "Digital environments", "Date": "2025-01-21", "Length": "20 hours", "Grade": 5 }, { "Event": "AI for Industry A", "Date": "2025-03-01", "Length": "10 hours", "Grade": 3 }, { "Event": "Digital environments 2", "Date": "2025-01-21", "Length": "2 days", "Grade": 1 }, { "Event": "AI for Industry B", "Date": "2025-03-01", "Length": "2 weeks", "Grade": 4 } ] ``` Key points in the code: Read an email and extract the body Use a gen AI model with the right instructions prompt to complete the task Gen AI will follow the instructions and create a combined consistent table Get the output in the right format, e.g. 'json' I hope you find this blog post helpful, and you can apply it to your use case/domain. Or you can simply get the idea of how to use generative AI to solve a problem, instead of building layers of custom logic.1.2KViews7likes1CommentThe Future of AI: Customizing AI agents with the Semantic Kernel agent framework
The blog post Customizing AI agents with the Semantic Kernel agent framework discusses the capabilities of the Semantic Kernel SDK, an open-source tool developed by Microsoft for creating AI agents and multi-agent systems. It highlights the benefits of using single-purpose agents within a multi-agent system to achieve more complex workflows with improved efficiency. The Semantic Kernel SDK offers features like telemetry, hooks, and filters to ensure secure and responsible AI solutions, making it a versatile tool for both simple and complex AI projects.1.1KViews3likes0CommentsThe Future of AI: Power Your Agents with Azure Logic Apps
Building intelligent applications no longer requires complex coding. With advancements in technology, you can now create agents using cloud-based tools to automate workflows, connect to various services, and integrate business processes across hybrid environments without writing any code.2.2KViews2likes1CommentWebNN: Bringing AI Inference to the Browser
Unlock the Future of AI with WebNN: Bringing Machine Learning to Your Browser Discover how the groundbreaking Web Neural Network API (WebNN) is revolutionizing web development by enabling powerful machine learning computations directly in your browser. From real-time AI interactions to privacy-preserving data processing, WebNN opens up a world of possibilities for creating intelligent, responsive web applications. Dive into our comprehensive guide to understand the architecture, see code examples, and explore exciting use cases that showcase the true potential of WebNN. Whether you're a seasoned developer or just curious about the future of web-based AI, this article is your gateway to the cutting-edge of technology. Read on to find out more!7.8KViews1like0CommentsUsing Bing with GenAI: from Sales Strategy to Compete Analysis
In today's data-driven world, understanding and leveraging data is crucial for maintaining a competitive edge. The integration of Bing as a search engine for extracting data from the internet is transforming how businesses interact with external and internal data sources. This approach not only enriches internal data repositories but also enhances decision-making and operational management processes. Key Highlights: Robust Data Extraction: Establish a sophisticated system for data extraction to access a wealth of information from the internet, crucial for understanding market dynamics and opportunities. Advanced Search Capabilities: Develop an internal data search engine using Bing, enabling rapid and reliable information retrieval to better understand the operational environment. Customized Data Handling: Differentiate between handling external and internal data to optimize data usage — external data for strategic insights and internal data for operational intelligence. Leverage Disruptive Technologies: Utilize Bing combined with generative AI to analyze market shares, customer reviews, and social media, providing deeper insights into market trends and consumer preferences.5KViews1like0Comments