Procurement in many enterprises remains heavily manual despite decades of digital transformation. Organizations often still rely on legacy systems whose web interfaces lack proper APIs, so procurement teams spend many hours on data entry, contract verification, invoice processing, and compliance checks. This technical deep dive explores how we used the Computer Using Agent (CUA) model in Azure OpenAI to build an agentic solution that automates the entire procure-to-pay workflow without requiring API development or system modifications.
Solution Architecture
The solution leverages a comprehensive stack of Azure technologies:
- **Azure OpenAI Service**: Powers core AI capabilities
  - Responses API: Orchestrates the workflow by calling the tools below and performing actions automatically.
  - Computer Using Agent (CUA) model: Enables browser automation. It is invoked through Function Calling because other steps run between the calls to this model. Those steps use the gpt-4o model for vision-based reasoning, vector search, and evaluation of business rules for anomaly detection.
  - GPT-4o: Processes invoice images with vision capabilities
  - Vector store: Maintains business rules and documentation
- Azure Container Apps: Hosts procurement web applications
- Azure SQL Database: Stores contract and procurement data
- Playwright: Handles browser automation underneath the CUA
Technical Flow: Under the Hood
Let's dive into the step-by-step execution flow to understand how the solution works.
The application merely calls the Responses API and provides instructions in natural language about what needs to be done in what sequence. Based on these instructions, the Responses API orchestrates the call to the other models and tools. It takes care of preparing the data for every next call based on the output from the previous call.
For example, in this case, the instructions are:
instructions = """
This is a Procure to Pay process. You will be provided with the Purchase Invoice image as input.
Note that Step 3 can be performed only after Step 1 and Step 2 are completed.
Step 1: As a first step, you will extract the Contract ID from the Invoice and also all the line items from the Invoice in the form of a table.
Step 2: You will then use the function tool to call the computer using agent with the Contract ID to get the contract details.
Step 3: You will then use the file search tool to retrieve the business rules applicable to detection of anomalies in the Procure to Pay process.
Step 4: Then, apply the retrieved business rules to match the invoice line items with the contract details fetched from in step 2, and detect anomalies if any.
- Perform validation of the Invoice against the Contract and determine if there are any anomalies detected.
- **When giving the verdict, you must call out each Invoice and Invoice line detail where the discrepancy was. Use your knowledge of the domain to interpret the information right and give a response that the user can store as evidence**
- Note that it is ok for the quantities in the invoice to be lesser than the quantities in the contract, but not the other way around.
- When providing the verdict, depict the results in the form of a Markdown table, matching details from the Invoice and Contract side-by-side. Verification of Invoice Header against Contract Header should be in a separate .md table format. That for the Invoice Lines verified against the Contract lines in a separate .md table format.
- If the Contract Data is not provided as an input when evaluating the Business rules, then desist from providing the verdict. State in the response that you could not provide the verdict since the Contract Data was not provided as an input. **DO NOT MAKE STUFF UP**.
**Use chain of thought when processing the user requests**
Step 5: Finally, you will use the function tool to call the computer using agent with the Invoice details to post the invoice header data to the system.
- use the content from step 4 above, under ### Final Verdict, for the value of the $remarks field, after replacing the new line characters with a space.
- The instructions you must pass are: Fill the form with purchase_invoice_no '$PurchaseInvoiceNumber', contract_reference '$contract_reference', supplier_id '$supplierid', total_invoice_value $total_invoice_value (in 2335.00 format), invoice_date '$invoice_data' (string in mm/dd/yyyy format), status '$status', remarks '$remarks'. Save this information by clicking on the 'save' button. If the response message shows a dialog box or a message box, acknowledge it. \n An example of the user_input format you must send is -- 'Fill the form with purchase_invoice_no 'PInv_001', contract_reference 'contract997801', supplier_id 'supplier99010', total_invoice_value 23100.00, invoice_date '12/12/2024', status 'approved', remarks 'invoice is valid and approved'. Save this information by clicking on the 'save' button. If the response message shows a dialog box or a message box, acknowledge it'
"""
Note that the instructions above include a few-shot example, which the CUA model uses to interpret the inputs (e.g., purchase invoice header and lines information, as comma-separated field-value pairs) before navigating to the target web pages.
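To make the expected Step 5 format concrete, here is a minimal sketch of a deterministic function that produces the same kind of instruction string the model is asked to build. The `InvoiceHeader` class and `build_post_instructions` function are hypothetical names used only for illustration; in the solution itself the gpt-4o model composes this string from the prompt.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class InvoiceHeader:
    # Hypothetical container; field names mirror the placeholders in the prompt.
    purchase_invoice_no: str
    contract_reference: str
    supplier_id: str
    total_invoice_value: float
    invoice_date: date
    status: str
    remarks: str


def build_post_instructions(header: InvoiceHeader) -> str:
    """Render the Step 5 instruction string in the format the prompt describes."""
    # Collapse new lines (and any repeated whitespace) into single spaces.
    remarks = " ".join(header.remarks.split())
    return (
        f"Fill the form with purchase_invoice_no '{header.purchase_invoice_no}', "
        f"contract_reference '{header.contract_reference}', "
        f"supplier_id '{header.supplier_id}', "
        # Two decimal places, as in the 2335.00 example from the prompt.
        f"total_invoice_value {header.total_invoice_value:.2f}, "
        # mm/dd/yyyy, as required by the prompt.
        f"invoice_date '{header.invoice_date:%m/%d/%Y}', "
        f"status '{header.status}', remarks '{remarks}'. "
        "Save this information by clicking on the 'save' button. "
        "If the response message shows a dialog box or a message box, acknowledge it."
    )
```

A helper like this can also serve as a reference when checking whether the string generated by the model matches the format the web form expects.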
The tools that the Responses API has access to are:
tools_list = [
    {
        "type": "file_search",
        "vector_store_ids": [vector_store_id_to_use],
        "max_num_results": 20,
    },
    {
        "type": "function",
        "name": "post_purchase_invoice_header",
        "description": "post the purchase invoice header data to the system",
        "parameters": {
            "type": "object",
            "properties": {
                "instructions": {
                    "type": "string",
                    "description": "The instructions to populate and post form data in the purchase invoice header form in the web page",
                },
            },
            "required": ["instructions"],
        },
    },
    {
        "type": "function",
        "name": "retrieve_contract",
        "description": "fetch contract details for the given contractid",
        "parameters": {
            "type": "object",
            "properties": {
                "contractid": {
                    "type": "string",
                    "description": "The contract id registered for the Supplier in the System",
                },
                "instructions": {
                    "type": "string",
                    "description": "The instructions to populate and post form data in the purchase invoice header form in the web page",
                },
            },
            "required": ["contractid", "instructions"],
        },
    },
]
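When the Responses API decides to use one of the two function tools, the application has to run the matching local function and send the result back. The following is a minimal sketch of that routing, not the code from the accompanying repository. It assumes the output items are plain dicts with the `type`, `name`, `arguments`, and `call_id` fields of a function call (the SDK returns typed objects with the same field names), and `run_function_calls` and `registry` are hypothetical names.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable

# A local coroutine that implements one of the function tools and returns a string.
ToolFn = Callable[..., Awaitable[str]]


async def run_function_calls(
    output_items: list[dict[str, Any]],
    registry: dict[str, ToolFn],
) -> list[dict[str, Any]]:
    """Run every function_call item and build the matching function_call_output items."""
    results = []
    for item in output_items:
        if item.get("type") != "function_call":
            continue  # skip other item types, such as messages
        fn = registry.get(item["name"])
        if fn is None:
            # Report unknown tools back to the model instead of raising.
            output = json.dumps({"error": f"unknown tool {item['name']}"})
        else:
            kwargs = json.loads(item["arguments"])  # arguments arrive as a JSON string
            output = await fn(**kwargs)
        results.append({
            "type": "function_call_output",
            "call_id": item["call_id"],  # ties the result to the originating call
            "output": output,
        })
    return results
```

The returned items would be appended to the conversation input for the next call to the Responses API, with `registry` mapping `"retrieve_contract"` and `"post_purchase_invoice_header"` to their implementations.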
1. Invoice Processing with Vision AI
The process begins when a user submits an invoice image for processing. The Responses API uses GPT-4o's vision capabilities to extract structured data from these documents, like the Purchase Invoice Header & lines, including the Contract number.
This step is performed autonomously by the Responses API and does not involve any custom code.
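Although the extraction itself needs no custom code, the application still has to hand the invoice image to the Responses API. Below is a rough sketch of how that request could be assembled. The function name `build_invoice_request`, the user prompt text, the PNG assumption, and the default `"gpt-4o"` deployment name are all illustrative assumptions; the message shape follows the `input_text` / `input_image` format used elsewhere in this post.

```python
import base64
from typing import Any


def build_invoice_request(
    image_bytes: bytes,
    instructions: str,
    tools: list[dict[str, Any]],
    model: str = "gpt-4o",  # assumption: name of the gpt-4o deployment
) -> dict[str, Any]:
    """Assemble keyword arguments for a Responses API call with an invoice image."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "model": model,
        "instructions": instructions,
        "tools": tools,
        "input": [{
            "role": "user",
            "content": [
                # Hypothetical user prompt; the real one is not shown in this post.
                {"type": "input_text", "text": "Process the attached purchase invoice."},
                # Assumes a PNG image, sent inline as a base64 data URL.
                {"type": "input_image", "image_url": f"data:image/png;base64,{encoded}"},
            ],
        }],
    }
```

With a configured client, the call would then look like `client.responses.create(**build_invoice_request(image_bytes, instructions, tools_list))`.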
2. Fetch Contract details using CUA model
The Contract number obtained above is required to navigate to the web page in the Line of Business Application to retrieve the matching Contract Header & lines information. The Responses API, through Function Calling, uses Playwright and the CUA Model to automate this step.
Playwright automatically opens a Chromium browser and navigates to the specific Contract object. It then takes a screenshot of the page, which is sent to the CUA Model.
The CUA Model views the loaded page using its Vision capabilities and returns the contract header and lines information as a JSON document for further processing.
import asyncio
import base64

# LocalPlaywrightComputer and contract_data_url are defined elsewhere in the application.


async def retrieve_contract(contractid: str, instructions: str):
    """
    Asynchronously retrieves the contract header and contract details through web automation.

    This function navigates to a specified URL and follows the given instructions to get the
    data on the page in the form of a JSON document. It uses Playwright for web automation.

    Args:
        contractid (str): The id of the contract for which the data is to be retrieved.
        instructions (str): User instructions for processing the data on this page.

    Returns:
        str: JSON string containing the contract data extracted from the page.

    Raises:
        ValueError: If no output is received from the model.
    """
    async with LocalPlaywrightComputer() as computer:
        tools = [
            {
                "type": "computer-preview",
                "display_width": computer.dimensions[0],
                "display_height": computer.dimensions[1],
                "environment": computer.environment,
            }
        ]
        items = []
        contract_url = contract_data_url + f"/{contractid}"
        print(f"Navigating to contract URL: {contract_url}")
        await computer.goto(contract_url)
        # Wait for the page to load completely
        await computer.wait_for_load_state()
        # Wait an extra 2 seconds to ensure the page is fully rendered
        await asyncio.sleep(2)
        # Take a screenshot to ensure the page content is captured
        screenshot_bytes = await computer.screenshot()
        screenshot_base64 = base64.b64encode(screenshot_bytes).decode('utf-8')
        # ........ more code ....
This is the call made to the CUA Model, with the screenshot, to proceed with the data extraction:
        # Create very clear and specific instructions for the model
        user_input = "You are currently viewing a contract details page. Please extract ALL data visible on this page into a JSON format. Include all field names and values. Format the response as a valid JSON object with no additional text before or after."
        # Start the conversation with the screenshot and clear instructions - format fixed for image_url
        items.append({
            "role": "user",
            "content": [
                {"type": "input_text", "text": user_input},
                {"type": "input_image", "image_url": f"data:image/png;base64,{screenshot_base64}"}
            ]
        })
        # Track if we received JSON data
        json_data = None
        max_iterations = 3  # Limit iterations to avoid infinite loops
        current_iteration = 0
        while json_data is None and current_iteration < max_iterations:
            current_iteration += 1
            print(f"Iteration {current_iteration} of {max_iterations}")
            response = client.responses.create(
                model="computer-use-preview",
                input=items,
                tools=tools,
                truncation="auto",
            )
            # Access the output items directly from response.output
            if not hasattr(response, 'output') or not response.output:
                raise ValueError("No output from model")
            print(f"Response: {response.output}")
            items += response.output
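The excerpt above stops before showing how `json_data` gets set from the model output. One possible approach, sketched here as an assumption rather than the repository's actual code, is to scan the text returned by the model for the first parseable JSON object. The helper name `extract_json_object` is hypothetical.

```python
import json
from typing import Any, Optional


def extract_json_object(text: str) -> Optional[dict[str, Any]]:
    """Return the first JSON object found in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores whatever text follows it.
            value, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        # Not valid JSON at this brace; try the next one.
        start = text.find("{", start + 1)
    return None
```

A helper like this tolerates extra prose around the JSON, which can happen even when the prompt asks for "no additional text before or after".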
3. Vector search to retrieve Business Rules
The Responses API performs this step autonomously: it searches the Vector Index created in Azure OpenAI for the business rules to apply for anomaly detection.
Note that this is not Azure AI Search, but a turnkey Vector (File) Search tool capability in Responses API and Assistants API.
4. Evaluate business rules to detect anomalies
The Responses API performs this step autonomously using the reasoning capabilities of the gpt-4o model. It applies the business rules retrieved above to the Purchase Invoice and the Contract data from the previous steps, then generates a detailed anomaly detection report. Towards the end of the program run, this report is printed to the terminal in VS Code.
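In the solution, the model applies the rules by reasoning over the retrieved text, so there is no rule engine in code. Purely to illustrate one rule stated in the instructions (invoice quantities may be lower than contract quantities, but not higher), here is a small deterministic sketch. The `Line` class, its field names, and the "not in contract" check are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Line:
    # Hypothetical shape of an invoice or contract line.
    item_id: str
    quantity: int


def find_quantity_anomalies(invoice: list[Line], contract: list[Line]) -> list[str]:
    """Flag invoice lines whose quantity exceeds the contracted quantity."""
    contract_by_item = {line.item_id: line for line in contract}
    anomalies = []
    for line in invoice:
        agreed = contract_by_item.get(line.item_id)
        if agreed is None:
            # Assumption: an item missing from the contract is also an anomaly.
            anomalies.append(f"{line.item_id}: not present in contract")
        elif line.quantity > agreed.quantity:
            # Lower quantities are acceptable; only an excess is flagged.
            anomalies.append(
                f"{line.item_id}: invoiced {line.quantity} exceeds contracted {agreed.quantity}"
            )
    return anomalies
```

A check like this could be used to spot-check the verdict produced by the model on known test invoices.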
5. Using CUA Model to post the Purchase Invoice
This step is invoked by the Responses API through Function Calling.
Playwright takes a screenshot of the empty form on the Purchase Invoice creation web page and sends it to the CUA Model. The model responds with instructions for Playwright to fill in the form field by field and, finally, to save it with a mouse click.
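The form filling described above comes down to translating each action suggested by the CUA model into a browser operation. The sketch below shows one way to do that dispatch. The `Computer` protocol and its method names are assumptions modeled on the `computer` object used earlier in this post, and the action dicts are assumed to carry the fields of the computer use tool's actions (`type`, `x`, `y`, `button`, `text`, `keys`, `scroll_x`, `scroll_y`).

```python
import asyncio
from typing import Any, Protocol


class Computer(Protocol):
    # Hypothetical interface; a Playwright-backed class would implement these.
    async def click(self, x: int, y: int, button: str = "left") -> None: ...
    async def type(self, text: str) -> None: ...
    async def keypress(self, keys: list[str]) -> None: ...
    async def scroll(self, x: int, y: int, scroll_x: int, scroll_y: int) -> None: ...
    async def wait(self, ms: int = 1000) -> None: ...


async def execute_action(computer: Computer, action: dict[str, Any]) -> None:
    """Translate one model-suggested action into a call on the computer object."""
    kind = action["type"]
    if kind == "click":
        await computer.click(action["x"], action["y"], action.get("button", "left"))
    elif kind == "type":
        await computer.type(action["text"])
    elif kind == "keypress":
        await computer.keypress(action["keys"])
    elif kind == "scroll":
        await computer.scroll(action["x"], action["y"], action["scroll_x"], action["scroll_y"])
    elif kind == "wait":
        await computer.wait()
    elif kind == "screenshot":
        pass  # nothing to do; the caller is expected to capture a screenshot after each action
    else:
        raise ValueError(f"Unsupported action type: {kind}")
```

After each action, the caller would take a fresh screenshot and send it back to the model so it can decide on the next step, repeating until the form is saved.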
You can view a video demo of this application in action
Here is a link to the GitHub Repositories that this blog accompanies
- This Application: CUA-Automation-P2P
- The Web Application Project: CUA-Automation-P2P-Web