Pamela Fox
Entity extraction with Azure OpenAI Structured Outputs
📺 Tune into our live stream on this topic on December 3rd!

Have you ever wanted to extract some details from a large block of text, like to figure out the topics of a blog post or the location of a news article? In the past, I've had to use specialized models and domain-specific packages for entity extraction. But now, we can do entity extraction with large language models and get equally impressive results. 🎉

When we use the OpenAI gpt-4o model along with the structured outputs mode, we can define a schema for the details we'd like to extract and get a response that conforms to that schema.

Here's the most basic example from the Azure OpenAI tutorial about structured outputs:

    from pydantic import BaseModel

    class CalendarEvent(BaseModel):
        name: str
        date: str
        participants: list[str]

    # client is an Azure OpenAI client, created as in the tutorial
    completion = client.beta.chat.completions.parse(
        model="MODEL_DEPLOYMENT_NAME",
        messages=[
            {"role": "system", "content": "Extract the event information."},
            {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
        ],
        response_format=CalendarEvent,
    )

    output = completion.choices[0].message.parsed

The code first defines the CalendarEvent class, a Pydantic model (a subclass of BaseModel). Then it sends a request to the GPT model specifying a response_format of CalendarEvent. The parsed output will contain a name, date, and participants.

To be sure we're holding a validated CalendarEvent instance, we can pass the parsed output through the Pydantic model_validate method:

    event = CalendarEvent.model_validate(output)

With this structured outputs capability, it's easier than ever to use GPT models for "entity extraction" tasks: give it some data, tell it what sorts of entities to extract from that data, and constrain it as needed.

Extracting from GitHub READMEs

Let's see an example of how I actually used structured outputs: to help me summarize the submissions that we got for a recent hackathon.
I can feed the README of a repository to the GPT model and ask it to extract key details like project title and technologies used.

First I define the Pydantic models:

    from enum import Enum

    from pydantic import BaseModel, Field

    class Language(str, Enum):
        JAVASCRIPT = "JavaScript"
        PYTHON = "Python"
        DOTNET = ".NET"

    class Framework(str, Enum):
        LANGCHAIN = "Langchain"
        SEMANTICKERNEL = "Semantic Kernel"
        LLAMAINDEX = "Llamaindex"
        AUTOGEN = "Autogen"
        SPRINGBOOT = "Spring Boot"
        PROMPTY = "Prompty"

    class RepoOverview(BaseModel):
        name: str
        description: str = Field(..., description="A 1-2 sentence description of the project")
        languages: list[Language]
        frameworks: list[Framework]

In the code above, I asked for lists of Python enum values, which constrains the model to return only options from those enums. I could have also asked for a list[str] to give it more flexibility, but I wanted to constrain it in this case.

I also annotated the description field using the Pydantic Field class so that I could specify the length of the description. Without that annotation, the descriptions are often much longer. We can use a Field description whenever we want to give the model additional guidance about a field.
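To see what those constraints look like outside of the request, here is a minimal sketch that prints the JSON schema Pydantic generates for a trimmed copy of the model. It assumes Pydantic v2; the trimmed RepoOverview and the "Ruby" input are made up for illustration, and the exact schema that the openai package sends to the service may differ from this one.

```python
from enum import Enum

from pydantic import BaseModel, Field, ValidationError


class Language(str, Enum):
    JAVASCRIPT = "JavaScript"
    PYTHON = "Python"
    DOTNET = ".NET"


class RepoOverview(BaseModel):
    # Trimmed copy of the model, kept small for illustration
    name: str
    description: str = Field(..., description="A 1-2 sentence description of the project")
    languages: list[Language]


# The JSON schema that Pydantic derives from the model
schema = RepoOverview.model_json_schema()

# The enum members become the only allowed string values
allowed_languages = schema["$defs"]["Language"]["enum"]
print(allowed_languages)

# The Field description travels along with the property
description_hint = schema["properties"]["description"]["description"]
print(description_hint)

# A value outside the enum (hypothetical input) also fails local validation
try:
    RepoOverview.model_validate(
        {"name": "demo", "description": "A demo.", "languages": ["Ruby"]}
    )
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

Swapping list[Language] for list[str] would drop the enum entry from the schema, which is the extra flexibility mentioned above.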
Next, I fetch the GitHub README, storing it as a string:

    import base64
    import os

    import requests

    url = "https://api.github.com/repos/shank250/CareerCanvas-msft-raghack/contents/README.md"
    response = requests.get(url)
    # The GitHub API returns the file contents base64-encoded
    readme_content = base64.b64decode(response.json()["content"]).decode("utf-8")

Finally, I send off the request and convert the result into a RepoOverview instance:

    completion = client.beta.chat.completions.parse(
        model=os.getenv("AZURE_OPENAI_GPT_DEPLOYMENT"),
        messages=[
            {
                "role": "system",
                "content": "Extract info from the GitHub README markdown about this hack submission.",
            },
            {"role": "user", "content": readme_content},
        ],
        response_format=RepoOverview,
    )
    output = completion.choices[0].message.parsed
    repo_overview = RepoOverview.model_validate(output)

You can see the full code in extract_github_repo.py.

That gives back an object like this one:

    RepoOverview(
        name='Job Finder Chatbot with RAG',
        description='This project is a chatbot application aimed at helping users find job opportunities and get relevant answers to questions about job roles, leveraging Retrieval-Augmented Generation (RAG) for personalized recommendations and answers.',
        languages=[<Language.JAVASCRIPT: 'JavaScript'>],
        azure_services=[<AzureService.AISEARCH: 'AI Search'>, <AzureService.POSTGRESQL: 'PostgreSQL'>],
        frameworks=[<Framework.SPRINGBOOT: 'Spring Boot'>]
    )

(The azure_services field in this output comes from an additional AzureService enum that isn't shown in the models above.)

Extracting from PDFs

I talk to many customers who want to extract details from PDFs, like locations and dates, often to store as metadata in their RAG search index. The first step is to extract the PDF as text, and we have a few options: a hosted service like Azure Document Intelligence, or a local Python package like pymupdf. For this example, I'm using the latter, as I wanted to try out their specialized pymupdf4llm package that converts the PDF to LLM-friendly markdown.
First I load in a PDF of an order receipt and convert it to markdown:

    import pymupdf4llm

    md_text = pymupdf4llm.to_markdown("example_receipt.pdf")

Then I define the Pydantic models for a receipt:

    class Item(BaseModel):
        product: str
        price: float
        quantity: int

    class Receipt(BaseModel):
        total: float
        shipping: float
        payment_method: str
        items: list[Item]
        order_number: int

In this example, I'm using a nested Pydantic model Item for each item in the receipt, so that I can get detailed information about each item.

And then, as before, I send the text off to the GPT model and convert the response back to a Receipt instance:

    completion = client.beta.chat.completions.parse(
        model=os.getenv("AZURE_OPENAI_GPT_DEPLOYMENT"),
        messages=[
            {"role": "system", "content": "Extract the information from the receipt"},
            {"role": "user", "content": md_text},
        ],
        response_format=Receipt,
    )
    output = completion.choices[0].message.parsed
    receipt = Receipt.model_validate(output)

You can see the full code in extract_pdf_receipt.py.

Extracting from images

Since the gpt-4o model is also a multimodal model, it can accept both images and text. That means that we can send it an image and ask it for a structured output that extracts details from that image. Pretty darn cool!

First I load in a local image as a base-64 encoded data URI:

    import base64

    def open_image_as_base64(filename):
        with open(filename, "rb") as image_file:
            image_data = image_file.read()
        image_base64 = base64.b64encode(image_data).decode("utf-8")
        # The MIME type is hardcoded, so this helper expects a PNG file
        return f"data:image/png;base64,{image_base64}"

    image_url = open_image_as_base64("example_graph_treecover.png")

For this example, my image is a graph, so I'm going to have it extract details about the graph.
Here are the Pydantic models:

    class Graph(BaseModel):
        title: str
        description: str = Field(..., description="1 sentence description of the graph")
        x_axis: str
        y_axis: str
        legend: list[str]

Then I send off the base-64 image URI to the GPT model, inside an "image_url" type message, and convert the response back to a Graph object:

    completion = client.beta.chat.completions.parse(
        model=os.getenv("AZURE_OPENAI_GPT_DEPLOYMENT"),
        messages=[
            {"role": "system", "content": "Extract the information from the graph"},
            {
                "role": "user",
                "content": [
                    {"image_url": {"url": image_url}, "type": "image_url"},
                ],
            },
        ],
        response_format=Graph,
    )
    output = completion.choices[0].message.parsed
    graph = Graph.model_validate(output)

More examples

You can use this same general approach for entity extraction across many file types, as long as they can be represented in either a text or image form. See more examples in my azure-openai-entity-extraction repository.

As always, remember that large language models are probabilistic next-word-predictors that won't always get things right, so definitely evaluate the accuracy of the outputs before you use this approach for a business-critical task.
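One simple way to start on that evaluation is to hand-label a few documents and compare the extracted fields against them. The sketch below is only an illustration, not part of the repository: the field_accuracy helper and the sample receipts are hypothetical, and it assumes each extraction has been turned into a plain dictionary (for example with Pydantic's model_dump).

```python
from collections import Counter


def field_accuracy(predictions, expected):
    """Return the fraction of exact matches per field across paired examples."""
    correct = Counter()
    total = Counter()
    for predicted, labeled in zip(predictions, expected):
        for field, labeled_value in labeled.items():
            total[field] += 1
            if predicted.get(field) == labeled_value:
                correct[field] += 1
    return {field: correct[field] / total[field] for field in total}


# Hypothetical hand-labeled values for two receipts
expected = [
    {"total": 242.05, "payment_method": "Visa", "order_number": 1001},
    {"total": 18.5, "payment_method": "PayPal", "order_number": 1002},
]

# Hypothetical model outputs for the same two receipts
predictions = [
    {"total": 242.05, "payment_method": "Visa", "order_number": 1001},
    {"total": 18.5, "payment_method": "Paypal", "order_number": 1002},
]

scores = field_accuracy(predictions, expected)
print(scores)
```

Exact matching is strict on purpose: it flags the "Paypal" versus "PayPal" mismatch here. For free-text fields like a description, a looser comparison or a manual review is probably a better fit.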