# Azure AI Vision
## Downloading tagged images from Azure Custom Vision
This blog post offers a comprehensive guide to efficiently downloading tagged images from Azure Custom Vision using Microsoft’s RESTful API. It walks readers through the process of authenticating with training keys, retrieving project tags, and automating bulk image downloads—overcoming the platform’s lack of a built-in export feature. Ideal for machine learning practitioners seeking reliable methods to audit, migrate, or back up their valuable training data.
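For a sense of what that automation looks like, here is a rough sketch of the same idea using the Custom Vision Python SDK rather than calling the REST endpoints directly (the post itself uses the REST API; the endpoint, training key, and project ID below are placeholders):

```python
# Sketch: download every tagged image from a Custom Vision project, grouped by tag.
# Endpoint, key, and project ID are placeholders for your own values.
import os
import requests
from msrest.authentication import ApiKeyCredentials
from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient

endpoint = "https://<your-resource>.cognitiveservices.azure.com/"
training_key = os.environ["CUSTOM_VISION_TRAINING_KEY"]
project_id = "<your-project-id>"

trainer = CustomVisionTrainingClient(
    endpoint, ApiKeyCredentials(in_headers={"Training-key": training_key})
)

# Retrieve the project's tags, then page through the tagged images for each tag
for tag in trainer.get_tags(project_id):
    os.makedirs(tag.name, exist_ok=True)
    skip = 0
    while True:
        images = trainer.get_tagged_images(project_id, tag_ids=[tag.id], take=50, skip=skip)
        if not images:
            break
        for image in images:
            # Each image exposes a download URI; save it under a folder named after the tag
            data = requests.get(image.original_image_uri).content
            with open(os.path.join(tag.name, f"{image.id}.jpg"), "wb") as f:
                f.write(data)
        skip += len(images)
```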
## AI Automation in Azure Foundry through turnkey MCP Integration and Computer Use Agent Models

### The Fashion Trends Discovery Scenario

In this walkthrough, we'll explore a sample application that demonstrates the power of combining Computer Use (CUA) models with Playwright browser automation to autonomously compile trend information from the internet, while leveraging MCP integration to intelligently catalog and store insights in Azure Blob Storage.

### The User Experience

A fashion analyst simply provides a query like "latest trends in sustainable fashion" to our command-line interface. What happens next showcases the power of agentic AI—the system requires no further human intervention to:

- **Autonomous Web Navigation**: The agent launches Pinterest, intelligently locates search interfaces, and performs targeted queries
- **Intelligent Content Discovery**: Systematically identifies and interacts with trend images, navigating to detailed pages
- **Advanced Content Analysis**: Applies computer vision to analyze fashion elements, colors, patterns, and design trends
- **Intelligent Compilation**: Consolidates findings into comprehensive, professionally formatted markdown reports
- **Contextual Storage**: Recognizes the value of preserving insights and autonomously offers cloud storage options

### Technical capabilities leveraged

Behind this seamless experience lies a coordination of AI models:

- **Pinterest Navigation**: The CUA model visually understands Pinterest's interface layout, identifying search boxes and navigation elements with pixel-perfect precision
- **Search Results Processing**: Rather than relying on traditional DOM parsing, our agent uses visual understanding to identify trend images and calculate precise interaction coordinates
- **Content Analysis**: Each discovered trend undergoes detailed analysis using GPT-4o's advanced vision capabilities, extracting insights about fashion elements, seasonal trends, and style patterns
- **Autonomous Decision Making**: The agent contextually understands when information should be preserved and automatically engages with cloud storage systems

### Technology Stack Overview

At the heart of this solution lies an orchestration of several AI technologies, each serving a specific purpose in creating a truly autonomous agent.

#### The architecture used

```
┌───────────────────────────────────────────────────────────────┐
│                       Azure AI Foundry                        │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │                      Responses API                      │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐  │  │
│  │  │  CUA Model  │  │   GPT-4o    │  │  Built-in MCP   │  │  │
│  │  │ (Interface) │  │  (Content)  │  │     Client      │  │  │
│  │  └─────────────┘  └─────────────┘  └─────────────────┘  │  │
│  └─────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────┘
                                │
                                ▼
          ┌─────────────────────────────────────────┐
          │         Function Calling Layer          │
          │        (Workflow Orchestration)         │
          └─────────────────────────────────────────┘
                                │
                                ▼
     ┌─────────────────┐               ┌──────────────────┐
     │   Playwright    │◄─────────────►│ Trends Compiler  │
     │   Automation    │               │      Engine      │
     └─────────────────┘               └──────────────────┘
                                │
                                ▼
                     ┌─────────────────────┐
                     │     Azure Blob      │
                     │   Storage (MCP)     │
                     └─────────────────────┘
```

#### Azure OpenAI Responses API

At the core of the agentic architecture in this solution, the Responses API provides intelligent decision-making capabilities that determine when to invoke Computer Use models for web crawling versus when to engage MCP servers for data persistence. This API serves as the brain of our agent, contextually understanding user intent and autonomously choosing the appropriate tools to fulfill complex multi-step workflows.
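To make that orchestration concrete before diving into the full walkthrough, the skeleton below shows roughly how a single Responses API call is handed both a function tool (wrapping the CUA-driven browser work) and an MCP server (for storage), leaving the model to decide which to invoke at each turn. This is a simplified sketch: the endpoint, API version, deployment name, and MCP server URL are placeholders, and the complete versions of these calls appear later in the post.

```python
# Simplified sketch of the orchestration call; endpoint, key, version, and URLs are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-foundry-resource>.openai.azure.com/",
    api_key="<api-key>",
    api_version="2025-03-01-preview",  # assumed preview version supporting the Responses API
)

tools = [
    {   # custom function the app executes locally (wraps the CUA + Playwright workflow)
        "type": "function",
        "name": "compile_trends",
        "description": "Compile fashion trend insights for a user query",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {   # hosted MCP tool the Responses API calls directly, with no relay code in the app
        "type": "mcp",
        "server_label": "azure-storage-mcp-server",
        "server_url": "https://<your-mcp-server>/mcp",
        "require_approval": "never",
    },
]

response = client.responses.create(
    model="gpt-4o",
    instructions="Compile trends for the user's query, then offer to store the report.",
    input=[{"role": "user", "content": [{"type": "input_text",
                                         "text": "latest trends in sustainable fashion"}]}],
    tools=tools,
)

for output in response.output:
    print(output.type)  # e.g. function_call, mcp_tool_call, or message
```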
#### Computer Use (CUA) Model

Our specialized CUA model excels at visual understanding of web interfaces, providing precise coordinate mapping for browser interactions, layout analysis, and navigation planning. Unlike general-purpose language models, the CUA model is specifically trained to understand web page structures, identify interactive elements, and provide actionable coordinates for automated browser control.

#### Playwright Browser Automation

Acting as the hands of our agent, Playwright executes the precise actions determined by the CUA model. This robust automation framework translates AI insights into real-world browser interactions, handling everything from clicking and typing to screenshot capture and page navigation with pixel-perfect accuracy.

#### GPT-4o Vision Model for Content Analysis

While the CUA model handles interface understanding, GPT-4o provides domain-specific content reasoning. This powerful vision model analyzes fashion trends, extracts meaningful insights from images, and provides rich semantic understanding of visual content—capabilities that complement rather than overlap with the CUA model's interface-focused expertise.

#### Model Context Protocol (MCP) Integration

The application showcases the power of agentic AI through its autonomous decision-making around data persistence. The agent intelligently recognizes when compiled information needs to be stored and automatically engages with Azure Blob Storage through MCP integration, without requiring explicit user instruction for each storage operation.

Unlike traditional function calling patterns where custom applications must relay MCP calls through client libraries, the Responses API includes a built-in MCP client that directly communicates with MCP servers. This eliminates the need for complex relay logic, making MCP integration as simple as defining tool configurations.

#### Function Calling Orchestration

Function calling orchestrates the complex workflow between CUA model insights and Playwright actions. Each step is verified and validated before proceeding, ensuring robust autonomous operation without human intervention throughout the entire trend discovery and analysis process.

Let me walk you through the code used in the application.
### Agentic Decision Making in Action

Let's examine how our application demonstrates true agentic behavior through the main orchestrator in `app.py`:

```python
async def main() -> str:
    """Main entry point demonstrating agentic decision making."""
    conversation_history = []
    generated_reports = []

    while True:
        user_query = input("Enter your query for fashion trends:-> ")

        # Add user input to conversation context
        new_user_message = {
            "role": "user",
            "content": [{"type": "input_text", "text": user_query}],
        }
        conversation_history.append(new_user_message)

        # The agent analyzes context and decides on appropriate actions
        response = ai_client.create_app_response(
            instructions=instructions,
            conversation_history=conversation_history,
            mcp_server_url=config.mcp_server_url,
            available_functions=available_functions,
        )

        # Process autonomous function calls and MCP tool invocations
        for output in response.output:
            if output.type == "function_call":
                # Agent decides to compile trends
                function_to_call = available_functions[output.name]
                function_args = json.loads(output.arguments)
                function_response = await function_to_call(**function_args)
            elif output.type == "mcp_tool_call":
                # Agent decides to use MCP tools for storage
                print(f"MCP tool call: {output.name}")
                # MCP calls handled automatically by Responses API
```

**Key Agentic Behaviors Demonstrated:**

- **Contextual Analysis**: The agent examines conversation history to understand whether the user wants trend compilation or storage operations
- **Autonomous Tool Selection**: Based on context, the agent chooses between function calls (for trend compilation) and MCP tools (for storage)
- **State Management**: The agent maintains conversation context across multiple interactions, enabling sophisticated multi-turn workflows

### Function Calling Orchestration: Autonomous Web Intelligence

The `TrendsCompiler` class in `compiler.py` demonstrates sophisticated autonomous workflow orchestration:

```python
class TrendsCompiler:
    """Autonomous trends compilation with multi-step verification."""

    async def compile_trends(self, user_query: str) -> str:
        """Main orchestration loop with autonomous step progression."""
        async with LocalPlaywrightComputer() as computer:
            state = {"trends_compiled": False}
            step = 0

            while not state["trends_compiled"]:
                try:
                    if step == 0:
                        # Step 1: Autonomous Pinterest navigation
                        await self._launch_pinterest(computer)
                        step += 1
                    elif step == 1:
                        # Step 2: CUA-driven search and coordinate extraction
                        coordinates = await self._search_and_get_coordinates(
                            computer, user_query
                        )
                        if coordinates:
                            step += 1
                    elif step == 2:
                        # Step 3: Autonomous content analysis and compilation
                        await self._process_image_results(
                            computer, coordinates, user_query
                        )
                        markdown_report = await self._generate_markdown_report(
                            user_query
                        )
                        state["trends_compiled"] = True
                except Exception as e:
                    print(f"Autonomous error handling in step {step}: {e}")
                    state["trends_compiled"] = True

        return markdown_report
```

**Autonomous Operation Highlights:**

- **Self-Verifying Steps**: Each step validates completion before advancing
- **Error Recovery**: Autonomous error handling without human intervention
- **State-Driven Progression**: The agent maintains its own execution state
- **No User Prompts**: Complete automation from query to final report

### Pinterest's Unique Challenge: Visual Coordinate Intelligence

One of the most impressive demonstrations of CUA model capabilities lies in solving Pinterest's hidden URL challenge:

```python
async def _detect_search_results(self, computer) -> List[Tuple[int, int, int, int]]:
    """Use CUA model to extract image coordinates from search results."""
    # Take screenshot for CUA analysis
    screenshot_bytes = await computer.screenshot()
    screenshot_b64 = base64.b64encode(screenshot_bytes).decode()

    # CUA model analyzes visual layout and identifies image boundaries
    prompt = """
    Analyze this Pinterest search results page and identify all trend/fashion images displayed.
    For each image, provide the exact bounding box coordinates in the format:
    <click>x1,y1,x2,y2</click>
    Focus on the main content images, not navigation or advertisement elements.
    """

    response = await self.ai_client.create_cua_response(
        prompt=prompt,
        screenshot_b64=screenshot_b64
    )

    # Extract coordinates using specialized parser
    coordinates = self.coordinate_parser.extract_coordinates(response.content)
    print(f"CUA model identified {len(coordinates)} image regions")
    return coordinates
```

**The Coordinate Calculation:**

```python
def calculate_centers(self, coordinates: List[Tuple[int, int, int, int]]) -> List[Tuple[int, int]]:
    """Calculate center coordinates for precise clicking."""
    centers = []
    for x1, y1, x2, y2 in coordinates:
        center_x = (x1 + x2) // 2
        center_y = (y1 + y2) // 2
        centers.append((center_x, center_y))
    return centers
```

Key takeaways with this approach:

- **No DOM Dependency**: Pinterest's hover-based URL revelation becomes irrelevant
- **Visual Understanding**: The CUA model sees what humans see—image boundaries
- **Pixel-Perfect Targeting**: Calculated center coordinates ensure reliable clicking
- **Robust Navigation**: Works regardless of Pinterest's frontend implementation changes

### Model Specialization: The Right AI for the Right Job

Our solution demonstrates sophisticated AI model specialization:

```python
async def _analyze_trend_page(self, computer, user_query: str) -> Dict[str, Any]:
    """Use GPT-4o for domain-specific content analysis."""
    # Capture the detailed trend page
    screenshot_bytes = await computer.screenshot()
    screenshot_b64 = base64.b64encode(screenshot_bytes).decode()

    # GPT-4o analyzes fashion content semantically
    analysis_prompt = f"""
    Analyze this fashion trend page for the query: "{user_query}"

    Provide detailed analysis of:
    1. Fashion elements and style characteristics
    2. Color palettes and patterns
    3. Seasonal relevance and trend timing
    4. Target demographics and style categories
    5. Design inspiration and cultural influences

    Format as structured markdown with clear sections.
    """

    # Note: Using GPT-4o instead of CUA model for content reasoning
    response = await self.ai_client.create_vision_response(
        model=self.config.vision_model_name,  # GPT-4o
        prompt=analysis_prompt,
        screenshot_b64=screenshot_b64
    )

    return {
        "analysis": response.content,
        "timestamp": datetime.now().isoformat(),
        "query_context": user_query
    }
```

**Model Selection Rationale:**

- **CUA Model**: Perfect for understanding "Where to click" and "How to navigate"
- **GPT-4o**: Excels at "What does this mean" and "How is this relevant"
- **Specialized Strengths**: Each model operates in its domain of expertise
- **Complementary Intelligence**: Combined capabilities exceed individual model limitations

### Compilation and Consolidation

```python
async def _generate_markdown_report(self, user_query: str) -> str:
    """Consolidate all analyses into comprehensive markdown report."""
    if not self.image_analyses:
        return "No trend data collected for analysis."
    # Intelligent report structuring
    report_sections = [
        f"# Fashion Trends Analysis: {user_query}",
        f"*Generated on {datetime.now().strftime('%B %d, %Y')}*",
        "",
        "## Executive Summary",
        await self._generate_executive_summary(),
        "",
        "## Detailed Trend Analysis"
    ]

    # Process each analyzed trend with intelligent categorization
    for idx, analysis in enumerate(self.image_analyses, 1):
        trend_section = [
            f"### Trend Item {idx}",
            analysis.get('analysis', 'No analysis available'),
            f"*Analysis timestamp: {analysis.get('timestamp', 'Unknown')}*",
            ""
        ]
        report_sections.extend(trend_section)

    # Add intelligent trend synthesis
    report_sections.extend([
        "## Trend Synthesis and Insights",
        await self._generate_trend_synthesis(),
        "",
        "## Recommendations",
        await self._generate_recommendations()
    ])

    return "\n".join(report_sections)
```

**Intelligent Compilation Features:**

- **Automatic Structuring**: Creates professional report formats automatically
- **Content Synthesis**: Combines individual analyses into coherent insights
- **Temporal Context**: Maintains timestamp and query context
- **Executive Summaries**: Generates high-level insights from detailed data

### Autonomous Storage Intelligence

Note that there is no MCP client code that needs to be implemented here. The integration is completely turnkey, through configuration alone.

```python
# In app_client.py - MCP tool configuration
def create_app_tools(self, mcp_server_url: str, available_functions: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Configure tools with automatic MCP integration."""
    tools = [
        {
            "type": "mcp",
            "server_label": "azure-storage-mcp-server",
            "server_url": mcp_server_url,
            "require_approval": "never",  # Autonomous operation
            "allowed_tools": ["create_container", "list_containers", "upload_blob"],
        }
    ]
    return tools

# Agent instructions demonstrate contextual intelligence
instructions = f"""
Step 1: Compile trends based on user query using computer use agent.
Step 2: Prompt user to store trends report in Azure Blob Storage.
Use MCP Server tools to perform this action autonomously.

IMPORTANT: Maintain context of previously generated reports.
If user asks to store a report, use the report generated in this session.
"""
```

**Turnkey MCP Integration:**

- **Direct API Calls**: MCP tools called directly by the Responses API
- **No Relay Logic**: No custom MCP client implementation required
- **Autonomous Tool Selection**: Agent chooses appropriate MCP tools based on context
- **Contextual Storage**: Agent understands what to store and when

### Demo and Code reference

Here is the GitHub repo of the application described in this post. See a demo of this application in action.

### Conclusion: Entering the Age of Practical Agentic AI

The Fashion Trends Compiler Agent represents a practical agentic AI application that works autonomously in real-world scenarios. By combining Azure AI Foundry's turnkey MCP integration with specialized AI models and robust automation frameworks, we've created an agent that doesn't just follow instructions but intelligently navigates complex multi-step workflows with minimal human oversight. Ready to build your own agentic AI solutions? Start exploring Azure AI Foundry's MCP integration and Computer Use capabilities to create the next generation of intelligent automation.

## Configure Embedding Models on Azure AI Foundry with Open Web UI
### Introduction

Let’s take a closer look at an exciting development in the AI space. Embedding models are the key to transforming complex data into usable insights, driving innovations like smarter chatbots and tailored recommendations. With Azure AI Foundry, Microsoft’s powerful platform, you’ve got the tools to build and scale these models effortlessly. Add in Open Web UI, an intuitive interface for engaging with AI systems, and you’ve got a winning combo that’s hard to beat. In this article, we’ll explore how embedding models on Azure AI Foundry, paired with Open Web UI, are paving the way for accessible and impactful AI solutions for developers and businesses. Let’s dive in!

To proceed with configuring the embedding model from Azure AI Foundry on Open Web UI, please first complete the requirements below.

### Requirements

- Set up an Azure AI Foundry hub/project.
- Deploy Open Web UI – refer to my previous article on how you can deploy Open Web UI on an Azure VM.
- Optional: Deploy LiteLLM with Azure AI Foundry models to work with Open Web UI – refer to my previous article on how you can do this as well.

### Deploying Embedding Models on Azure AI Foundry

Navigate to the Azure AI Foundry site and deploy an embedding model from the “Model + Endpoint” section. For the purpose of this demonstration, we will deploy the “text-embedding-3-large” model by OpenAI. You should receive a URL endpoint and an API key for the embedding model you just deployed. Take note of these credentials because we will be using them in Open Web UI.

### Configuring the embedding model on Open Web UI

Now head to the Open Web UI Admin Settings page > Documents and select Azure OpenAI as the Embedding Model Engine. Copy and paste the base URL, API key, the embedding model deployed on Azure AI Foundry, and the API version (not the model version) into the fields below. Click “Save” to reflect the changes.

### Expected Output

Now let us look at the scenario with the embedding model configured on Open Web UI and without it.

- Without embedding models configured.
- With Azure OpenAI embedding models configured.

### Conclusion

And there you have it! Embedding models on Azure AI Foundry, combined with the seamless interaction offered by Open Web UI, are truly revolutionizing how we approach AI solutions. This powerful duo not only simplifies the process of building and deploying intelligent systems but also makes cutting-edge technology more accessible to developers and businesses of all sizes. As we move forward, it’s clear that such integrations will continue to drive innovation, breaking down barriers and unlocking new possibilities in the AI landscape. So, whether you’re a seasoned developer or just stepping into this exciting field, now’s the time to explore what Azure AI Foundry and Open Web UI can do for you. Let’s keep pushing the boundaries of what’s possible!
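As a quick sanity check, you can also verify the Azure AI Foundry embedding deployment and credentials directly before (or after) wiring them into Open Web UI. The following is a minimal sketch; the endpoint, API key, API version, and deployment name are placeholders for the values from your own deployment:

```python
# Minimal check that the embedding deployment responds; values below are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-api-key>",
    api_version="2024-02-01",  # the API version you enter in Open Web UI
)

result = client.embeddings.create(
    model="text-embedding-3-large",  # your deployment name
    input="Open Web UI + Azure AI Foundry embeddings test",
)

print(len(result.data[0].embedding))  # 3072 dimensions for text-embedding-3-large
```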
## From Extraction to Insight: Evolving Azure AI Content Understanding with Reasoning and Enrichment

First introduced in public preview last year, Azure AI Content Understanding enables you to convert unstructured content—documents, audio, video, text, and images—into structured data. The service is designed to support consistent, high-quality output, directed improvements, built-in enrichment, and robust pre-processing to accelerate workflows and reduce cost.

### A New Chapter in Content Understanding

Since our launch we’ve seen customers pushing the boundaries to go beyond simple data extraction, with agentic solutions fully automating decisions. This requires more than just extracting fields. For example, a healthcare insurance provider’s decision to pay a claim requires cross-checking against insurance policies, applicable contracts, and the patient’s medical history and prescription datapoints. To do this, a system needs the ability to interpret information in context and perform more complex enrichments and analysis across various data sources. Beyond field extraction, this requires a custom-designed workflow leveraging reasoning.

In response to this demand, Content Understanding now introduces Pro mode, which enables enhanced reasoning, validation, and information aggregation capabilities. These updates allow the service to aggregate and compare results across sources, enrich extracted data with context, and deliver decisions as output. While Standard mode continues to offer reliable and scalable field extraction, Pro mode extends the service to support more complex content interpretation scenarios—enabling workflows that reflect the way people naturally reason over data. With this update, Content Understanding now solves a much larger component of your data processing workflows, offering new ways to automate, streamline, and enhance decision-making based on unstructured information.

### Key Benefits of Pro Mode

Packed with cutting-edge reasoning capabilities, Pro mode revolutionizes document analysis.

- **Multi-Content Input**: Process and aggregate information across multiple content files in a single request. Pro mode can build a unified schema from distributed data sources, enabling richer insight across documents.
- **Multi-Step Reasoning**: Go beyond basic extraction with a process that supports reasoning, linking, validation, and enrichment.
- **Knowledge Base Integration**: Seamlessly integrate with organizational knowledge bases and domain-specific datasets to enhance field inference. This ensures outputs are generated with the context of your business.

### When to Use Pro Mode

Pro mode, currently limited to documents, is designed for scenarios where content understanding needs to go beyond surface-level extraction—ideal for use cases that traditionally require post-processing, human review, and decision-making based on multiple data points and contextual references. Pro mode enables intelligent processing that not only extracts data, but also validates, links, and enriches it. This is especially impactful when extracted information must be cross-referenced with external datasets or internal knowledge sources to ensure accuracy, consistency, and contextual depth. Examples include:

- Invoice processing that reconciles against purchase orders and contract terms
- Healthcare claims validation using patient records and prescription history
- Legal document review where clauses reference related agreements or precedents
- Manufacturing spec checks against internal design standards and safety guidelines

By automating much of the reasoning, you can focus on higher-value tasks!
Pro mode helps reduce manual effort, minimize errors, and accelerate time to insight—unlocking new potential for downstream applications, including those that emulate higher-order decision-making.

### Simplified Pricing Model

Introducing a simplified pricing structure that significantly reduces costs across all content modalities compared to previous versions, making enterprise-scale deployment more affordable and predictable.

### Expanded Feature Coverage

We are also extending capabilities across various content types:

- **Structured Document Outputs**: Improved handling of tables spanning multiple pages, recognition of selection marks, and support for additional file types like .docx, .xlsx, .pptx, .msg, .eml, .rtf, .html, .md, and .xml.
- **Classifier API**: Automatically categorize/split and route documents to appropriate processing pipelines.
- **Video Analysis**: Extract data across an entire video or break a video into chapters automatically. Enrich metadata with face identification and descriptions that include facial images.
- **Face API Preview**: Detect, recognize, and enroll faces, enabling richer user-aware applications.

Check out the details about each of these capabilities here: What's New for Content Understanding.

### Let's hear it from our customers

Customers all over the globe are using Content Understanding for its powerful one-stop solution capabilities, leveraging advanced modes of reasoning, grounding, and confidence scores across diverse content types.

**ASC**: AI-based analytics in ASC’s Recording Insights platform allows customers to move to 100% compliance review coverage of conversations across multiple channels. ASC’s integration of Content Understanding replaces a previously complex setup—where multiple separate AI services had to be manually connected—with a single multimodal solution that delivers transcription, summarization, sentiment analysis, and data extraction in one streamlined interface. This shift not only simplifies implementation and accelerates time-to-value but also received positive customer feedback for its powerful features and the quick, hands-on support from Microsoft product teams. “With the integration of Content Understanding into the ASC Recording Insights platform, ASC was able to reduce R&D effort by 30% and achieve 5 times faster results than before. This helps ASC drive customer satisfaction and stay ahead of competition.” —Tobias Fengler, Chief Engineering Officer, ASC. To learn more about ASC’s integration, check out “From Complexity to Simplicity: The ASC and Azure AI Partnership.”

**Ramp**: Ramp, the all-in-one financial operations platform, is exploring how Azure AI Content Understanding can help transform receipts, bills, and multi-line invoices into structured data automatically. Ramp is leveraging the pre-built invoice template and experimenting with custom extraction capabilities across various document types. These experiments are helping Ramp evaluate how to further reduce manual entry and enhance the real-time logic that powers approvals, policy checks, and reconciliation. “Content Understanding gives us a single API to parse every receipt and statement we see—then lets our own AI reason over that data in real time. It's an efficient path from image to fully reconciled expense.” — Rahul S, Head of AI, Ramp

**MediaKind**: MK.IO’s cloud-native video platform, available on Azure Marketplace, now integrates Azure AI Content Understanding to make it easy for developers to personalize streaming experiences.
With just a few lines of code, you can turn full game footage into real-time, fan-specific highlight reels using AI-driven metadata like player actions, commentary, and key moments. “Azure AI Content Understanding gives us a new level of control and flexibility—letting us generate insights instantly, personalize streams automatically, and unlock new ways to engage and monetize. It’s video, reimagined.” —Erik Ramberg, VP, MediaKind

Catch the full story from MediaKind in our breakout session at Build 2025 on May 18, “My Game, My Way,” where we walk you through the creation of personalized highlight reels in real time. You’ll never look at your TV in the same way again.

### Getting Started

- For more details about the latest from Content Understanding, check out “Reasoning on multimodal content for efficient agentic AI app building,” Wednesday, May 21 at 2 PM PST.
- Build your own Content Understanding solution in the Azure AI Foundry. Pro mode will be available in the Foundry starting June 1st, 2025.
- Refer to our documentation and sample code on Content Understanding.
- Explore the video series on getting started with Content Understanding.

## Introducing Azure AI Content Understanding for Beginners
Enterprises today face several challenges in processing and extracting insights from multimodal data, like managing diverse data formats, ensuring data quality, and streamlining workflows efficiently. Ensuring the accuracy and usability of extracted insights often requires advanced AI techniques, while inefficiencies in managing large data volumes increase costs and delay results. Azure AI Content Understanding addresses these pain points by offering a unified solution to transform unstructured data into actionable insights, improve data accuracy with schema extraction and confidence scoring, and integrate seamlessly with Azure’s ecosystem to enhance efficiency and reduce costs. Content Understanding makes it easy to extract custom, task-specific output without advanced GenAI skills. It enables a quick path to scale for retrieval-augmented generation (RAG) grounded by multimodal data, or transactional content processing for agent workflows and process automation.

We are excited to announce a new video series to help you get started with Azure AI Content Understanding and extract the task-specific output for your business. Whether you're looking for a well-rounded overview, want to discover how to develop a RAG index over video content, or learn how to build a post-call analytics workflow, this series has something for everyone.

### What is Azure AI Content Understanding?

Azure AI Content Understanding is a new Azure AI service, designed to process and transform content of any type, including documents, images, videos, audio, and text, into a user-defined output schema. This streamlined process allows developers to reason over large amounts of unstructured data, accelerating time-to-value by generating an output that can be easily integrated into agentic, automation, and analytical workflows.

### Video Series Highlights

1. Azure AI Content Understanding: How to Get Started – Vinod Kurpad, Principal GPM, AI Services, shows how you can process content of any modality—audio, video, documents, and text—in a unified workflow in Azure AI Foundry using Azure AI Content Understanding. It's simple, intuitive, and doesn't require any GenAI skills.
2. Post-call Analytics Using Azure AI Content Understanding – Jan Goergen, Senior Program Manager, AI Services, shows how to process any number of video or audio call recordings quickly in Azure AI Foundry by leveraging the Post‑Call Analytics template powered by Content Understanding. The video also introduces the broader concept of templates, illustrating how you can embed Content Understanding into reusable templates that you can build, deploy, and share across projects.
3. RAG on Video Using Azure AI Content Understanding – Joe Filcik, Principal Product Manager, AI Services, shows how you can process videos and ground them on your data with multimodal retrieval-augmented generation (RAG) to derive insights that would otherwise take much longer. Joe demonstrates how this can be achieved using a single Azure AI Content Understanding API in Azure AI Foundry.

### Why Azure AI Content Understanding?

The Azure AI Content Understanding service is ideal for enterprises and developers looking to process large amounts of multimodal content, such as call center recordings and videos for training and compliance, without requiring GenAI skills such as prompt engineering and model selection. Enjoy the video series and start exploring the possibilities with Azure AI Content Understanding.
For additional resources:

- Watch the Video Series
- Try it in Azure AI Foundry
- Content Understanding documentation
- Content Understanding samples

Feedback? Contact us at cu_contact@microsoft.com

## Why Azure AI Is Retail’s Secret Sauce
### Executive Summary

Leading RCG enterprises are standardizing on Azure AI—specifically Azure OpenAI Service, Azure Machine Learning, Azure AI Search, and Azure AI Vision—to increase digital‑channel conversion, sharpen demand forecasts, automate store execution, and accelerate product innovation. Documented results include up to 30 percent uplift in search conversion, 10 percent reduction in stock‑outs, and multimillion‑dollar productivity gains. This roadmap consolidates field data from CarMax, Kroger, Coca‑Cola, Estée Lauder, PepsiCo, and Microsoft reference architectures to guide board‑level investment and technology planning.

### 1 Strategic Value of Azure AI

Azure AI delivers state‑of‑the‑art language (GPT‑4o, GPT-4.1), reasoning (o1, o3, o4-mini), and multimodal (Phi‑3 Vision) models through Azure OpenAI Service while unifying machine‑learning, search, and vision APIs under one security, compliance, and Responsible AI framework. Coca‑Cola validated Azure’s enterprise scale with a $1.1 billion, five‑year agreement covering generative AI across marketing, product R&D, and customer service (Microsoft press release; Reuters).

### 2 Customer‑Experience Transformation

#### 2.1 AI‑Enhanced Search & Recommendations

Microsoft’s Two‑Stage AI‑Enhanced Search pattern—vector search in Azure AI Search followed by GPT reranking—has lifted search conversion by up to 30 percent in production pilots (Tech Community blog). CarMax uses Azure OpenAI to generate concise summaries for millions of vehicle reviews, improving SEO performance and reducing editorial cycles from weeks to hours (Microsoft customer story).

#### 2.2 Conversational Commerce

The GPT‑4o real‑time speech endpoint supports multilingual voice interaction with end‑to‑end latencies below 300 ms—ideal for kiosks, drive‑thrus, and voice‑enabled customer support (Azure AI dev blog).

### 3 Supply‑Chain & Merchandising Excellence

Azure Machine Learning AutoML for Time‑Series automates feature engineering, hyper‑parameter tuning, and back‑testing for SKU‑level forecasts (AutoML tutorial; methodology guide). PepsiCo reported lower inventory buffers and improved promotional accuracy during its U.S. pilot and is scaling globally (PepsiCo case study). In February 2025 Microsoft published an agentic systems blueprint that layers GPT agents on top of forecast outputs to generate replenishment quantities and route optimizations, compressing decision cycles in complex supply chains (Microsoft industry blog).

### 4 Marketing & Product Innovation

Estée Lauder and Microsoft established an AI Innovation Lab that uses Azure OpenAI to accelerate concept development and campaign localization across 20 prestige brands (Estée Lauder press release). Coca‑Cola applies the same foundation models to generate ad copy, packaging text, and flavor concepts, maximizing reuse of trained embeddings across departments. Azure AI Studio provides prompt versioning, automated evaluation, and CI/CD pipelines for generative‑AI applications, reducing time‑to‑production for retail creative teams (Azure AI Studio blog).

### 5 Governance & Architecture

The open‑source Responsible AI Toolbox bundles dashboards for fairness, interpretability, counterfactual analysis, and error inspection, enabling documented risk mitigation for language, vision, and tabular models (Responsible AI overview).
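To ground the governance point, the sketch below shows the typical pattern for wiring a trained model into the Responsible AI Toolbox dashboard using the open-source `responsibleai` and `raiwidgets` packages. The model and dataset here are toy stand-ins purely for illustration:

```python
# Illustrative only: a toy classifier fed into the Responsible AI dashboard.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from responsibleai import RAIInsights
from raiwidgets import ResponsibleAIDashboard

data = load_breast_cancer(as_frame=True)
df = data.frame.rename(columns={"target": "label"})
train, test = train_test_split(df, test_size=0.2, random_state=0)

model = RandomForestClassifier(random_state=0).fit(
    train.drop(columns=["label"]), train["label"]
)

# Build the insights object, add the analyses you want, then compute and render
rai_insights = RAIInsights(model, train, test, target_column="label", task_type="classification")
rai_insights.explainer.add()        # interpretability
rai_insights.error_analysis.add()   # error inspection
rai_insights.compute()

ResponsibleAIDashboard(rai_insights)  # serves the interactive dashboard locally
```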
Microsoft’s Retail Data Solutions Reference Architecture describes how to land POS, loyalty, and supply‑chain data into Microsoft Fabric or Synapse Lakehouses and expose it to Azure AI services through governed semantic models (architecture guide).

### 6 Implementation Roadmap

| Phase | Key Activities | Azure AI Services & Assets |
|---|---|---|
| 0 – Foundation (Weeks 0‑2) | Align business goals, assess data, deploy landing zone | Azure Landing Zone; Retail Data Architecture |
| 1 – Pilot (Weeks 3‑6) | Build one measurable use case (e.g., AI Search or AutoML forecasting) in Azure AI Studio | Azure AI Search; Azure OpenAI; Azure ML AutoML |
| 2 – Industrialize (Months 2‑6) | Integrate with commerce/ERP; add Responsible AI monitoring; CI/CD automation | Responsible AI Toolbox |
| 3 – Scale Portfolio (Months 6‑12) | Extend to smart‑store vision, generative marketing, and agentic supply chain | Azure AI Vision; agentic systems pattern |

Pilots typically achieve < 6‑week time‑to‑value and 3–7 percentage‑point operating‑margin improvement when search conversion gains, inventory precision, and store‑associate efficiency are combined (see CarMax, PepsiCo, and Kroger sources above).

### 7 Key Takeaways for Executives

- **Unified Platform**: Generative, predictive, and vision workloads run under one governance model and SLA.
- **Proven Financial Impact**: Field results confirm double‑digit revenue uplift and meaningful OPEX savings.
- **Future‑Proof Investments**: Continuous model refresh (GPT‑4.1, o3, o4-mini) and clear migration guidance protect ROI.
- **Built‑in Governance**: Responsible AI tooling accelerates compliance and audit readiness.
- **Structured Scale Path**: A phased roadmap de‑risks experimentation and enables enterprise deployment within 12 months.

Bottom line: Azure AI provides the technical depth, operational maturity, and economic model required to deploy AI at scale across RCG value chains—delivering quantifiable growth and efficiency without introducing multi‑vendor complexity.

## Using the CUA Model in Azure OpenAI for Procure-to-Pay Automation
### Solution Architecture

The solution leverages a comprehensive stack of Azure technologies:

- **Azure OpenAI Service**: Powers core AI capabilities
  - Responses API: Orchestrates the workflow by calling the tools below and performing actions automatically.
  - Computer Using Agent (CUA) model: Enables browser automation. This is called through function calling, since there are other steps to be performed between the calls to this model, where the GPT-4o model is used, like reasoning through vision, performing vector search, and evaluating business rules for anomaly detection.
  - GPT-4o: Processes invoice images with vision capabilities
  - Vector store: Maintains business rules and documentation
- **Azure Container Apps**: Hosts procurement web applications
- **Azure SQL Database**: Stores contract and procurement data
- **Playwright**: Handles browser automation underneath the CUA

### Technical Flow: Under the Hood

Let's dive into the step-by-step execution flow to understand how the solution works. The application merely calls the Responses API and provides instructions in natural language about what needs to be done in what sequence. Based on these instructions, the Responses API orchestrates the calls to the other models and tools. It takes care of preparing the data for every next call based on the output from the previous call. For example, in this case, the instructions are:

```python
instructions = """
This is a Procure to Pay process. You will be provided with the Purchase Invoice image as input.
Note that Step 3 can be performed only after Step 1 and Step 2 are completed.
Step 1: As a first step, you will extract the Contract ID from the Invoice and also all the line items from the Invoice in the form of a table.
Step 2: You will then use the function tool to call the computer using agent with the Contract ID to get the contract details.
Step 3: You will then use the file search tool to retrieve the business rules applicable to detection of anomalies in the Procure to Pay process.
Step 4: Then, apply the retrieved business rules to match the invoice line items with the contract details fetched from in step 2, and detect anomalies if any.
 - Perform validation of the Invoice against the Contract and determine if there are any anomalies detected.
 - **When giving the verdict, you must call out each Invoice and Invoice line detail where the discrepancy was. Use your knowledge of the domain to interpret the information right and give a response that the user can store as evidence**
 - Note that it is ok for the quantities in the invoice to be lesser than the quantities in the contract, but not the other way around.
 - When providing the verdict, depict the results in the form of a Markdown table, matching details from the Invoice and Contract side-by-side. Verification of Invoice Header against Contract Header should be in a separate .md table format. That for the Invoice Lines verified against the Contract lines in a separate .md table format.
 - If the Contract Data is not provided as an input when evaluating the Business rules, then desist from providing the verdict. State in the response that you could not provide the verdict since the Contract Data was not provided as an input. **DO NOT MAKE STUFF UP**.
 **Use chain of thought when processing the user requests**
Step 5: Finally, you will use the function tool to call the computer using agent with the Invoice details to post the invoice header data to the system.
 - use the content from step 4 above, under ### Final Verdict, for the value of the $remarks field, after replacing the new line characters with a space.
 - The instructions you must pass are: Fill the form with purchase_invoice_no '$PurchaseInvoiceNumber', contract_reference '$contract_reference', supplier_id '$supplierid', total_invoice_value $total_invoice_value (in 2335.00 format), invoice_date '$invoice_data' (string in mm/dd/yyyy format), status '$status', remarks '$remarks'. Save this information by clicking on the 'save' button. If the response message shows a dialog box or a message box, acknowledge it. \n
 An example of the user_input format you must send is -- 'Fill the form with purchase_invoice_no 'PInv_001', contract_reference 'contract997801', supplier_id 'supplier99010', total_invoice_value 23100.00, invoice_date '12/12/2024', status 'approved', remarks 'invoice is valid and approved'. Save this information by clicking on the 'save' button. If the response message shows a dialog box or a message box, acknowledge it'
"""
```

Note that we are giving few-shot examples above that will be used by the CUA model to interpret the inputs (e.g. purchase invoice header and line information, in comma-separated field-value pairs) before navigating to the target web pages.

The tools that the Responses API has access to are:

```python
tools_list = [
    {
        "type": "file_search",
        "vector_store_ids": [vector_store_id_to_use],
        "max_num_results": 20,
    },
    {
        "type": "function",
        "name": "post_purchase_invoice_header",
        "description": "post the purchase invoice header data to the system",
        "parameters": {
            "type": "object",
            "properties": {
                "instructions": {
                    "type": "string",
                    "description": "The instructions to populate and post form data in the purchase invoice header form in the web page",
                },
            },
            "required": ["instructions"],
        },
    },
    {
        "type": "function",
        "name": "retrieve_contract",
        "description": "fetch contract details for the given contractid",
        "parameters": {
            "type": "object",
            "properties": {
                "contractid": {
                    "type": "string",
                    "description": "The contract id registered for the Supplier in the System",
                },
                "instructions": {
                    "type": "string",
                    "description": "The instructions to populate and post form data in the purchase invoice header form in the web page",
                },
            },
            "required": ["contractid", "instructions"],
        },
    },
]
```

#### 1. Invoice Processing with Vision AI

The process begins when a user submits an invoice image for processing. The Responses API uses GPT-4o's vision capabilities to extract structured data from these documents, like the Purchase Invoice header and lines, including the Contract number. This step is performed autonomously by the Responses API and does not involve any custom code.

#### 2. Fetch Contract details using the CUA model

The Contract number obtained above is required to navigate to the web page in the line-of-business application and retrieve the matching Contract header and lines information. The Responses API, through function calling, uses Playwright and the CUA model to automate this step. A Chromium browser opens up automatically through Playwright commands and the specific Contract object is navigated to. Playwright takes a screenshot of the page, which is then sent to the CUA model. The CUA model views the loaded page using its vision capabilities and returns the contract header and lines information as a JSON document for further processing.

```python
async def retrieve_contract(contractid: str, instructions: str):
    """
    Asynchronously retrieves the contract header and contract details through web automation.
    This function navigates to a specified URL, follows given instructions to get the data
    on the page in the form of a JSON document. It uses Playwright for web automation.

    Args:
        contractid (str): The id of the contract for which the data is to be retrieved.
        instructions (str): User instructions for processing the data on this page.

    Returns:
        str: JSON string containing the contract data extracted from the page.

    Raises:
        ValueError: If no output is received from the model.
    """
    async with LocalPlaywrightComputer() as computer:
        tools = [
            {
                "type": "computer-preview",
                "display_width": computer.dimensions[0],
                "display_height": computer.dimensions[1],
                "environment": computer.environment,
            }
        ]
        items = []
        contract_url = contract_data_url + f"/{contractid}"
        print(f"Navigating to contract URL: {contract_url}")
        await computer.goto(contract_url)

        # Wait for page to load completely
        await computer.wait_for_load_state()
        # wait for 2 seconds to ensure the page is fully loaded
        await asyncio.sleep(2)

        # Take a screenshot to ensure the page content is captured
        screenshot_bytes = await computer.screenshot()
        screenshot_base64 = base64.b64encode(screenshot_bytes).decode('utf-8')
        # ........ more code ....
```

This is the call made to the CUA model with the screenshot to proceed with the data extraction:

```python
        # Create very clear and specific instructions for the model
        user_input = "You are currently viewing a contract details page. Please extract ALL data visible on this page into a JSON format. Include all field names and values. Format the response as a valid JSON object with no additional text before or after."

        # Start the conversation with the screenshot and clear instructions - format fixed for image_url
        items.append({
            "role": "user",
            "content": [
                {"type": "input_text", "text": user_input},
                {"type": "input_image", "image_url": f"data:image/png;base64,{screenshot_base64}"}
            ]
        })

        # Track if we received JSON data
        json_data = None
        max_iterations = 3  # Limit iterations to avoid infinite loops
        current_iteration = 0

        while json_data is None and current_iteration < max_iterations:
            current_iteration += 1
            print(f"Iteration {current_iteration} of {max_iterations}")

            response = client.responses.create(
                model="computer-use-preview",
                input=items,
                tools=tools,
                truncation="auto",
            )

            # Access the output items directly from response.output
            if not hasattr(response, 'output') or not response.output:
                raise ValueError("No output from model")

            print(f"Response: {response.output}")
            items += response.output
```

#### 3. Vector search to retrieve Business Rules

This step is performed autonomously by the Responses API, which searches for the business rules to be applied here for anomaly detection. It uses the vector index created in Azure OpenAI. Note that this is not Azure AI Search, but a turnkey vector (file) search tool capability in the Responses API and Assistants API.

#### 4. Evaluate business rules to detect anomalies

This step is performed autonomously by the Responses API using the reasoning capabilities of the GPT-4o model. It generates a detailed report after performing anomaly detection, applying the business rules retrieved above to the Purchase Invoice and the Contract data from the previous steps. Towards the end of the program run, you will observe this getting printed in the terminal in VS Code.
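The file search step above assumes the business-rules document has already been uploaded to an OpenAI vector store (the `vector_store_id_to_use` referenced in the tools list). A one-time setup along these lines could create that store—this is a sketch, with the file name as a placeholder, and on older versions of the `openai` SDK these helpers live under `client.beta.vector_stores`:

```python
# One-time setup sketch: index the business-rules document for the file_search tool.
# The file name is a placeholder; `client` is the same Azure OpenAI client used elsewhere.
vector_store = client.vector_stores.create(name="p2p-business-rules")

with open("business_rules_p2p.docx", "rb") as f:
    client.vector_stores.files.upload_and_poll(
        vector_store_id=vector_store.id,
        file=f,
    )

print(f"Use this id in the file_search tool: {vector_store.id}")
```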
#### 5. Using the CUA Model to post the Purchase Invoice

This step is invoked by the Responses API through function calling. After Playwright takes a screenshot of the empty form on the Purchase Invoice creation web page, it is sent to the CUA model, which returns instructions for Playwright to perform the form-filling operation, navigating through the fields one by one, filling in values, and finally saving the form with a mouse click.

You can view a video demo of this application in action.

Here are links to the GitHub repositories that accompany this blog:

- This application: CUA-Automation-P2P
- The web application project: CUA-Automation-P2P-Web

## Arizona Department of Transportation Innovates with Azure AI Vision
The Arizona Department of Transportation (ADOT) is committed to providing safe and efficient transportation services to the residents of Arizona. With a focus on innovation and customer service, ADOT’s Motor Vehicle Division (MVD) continually seeks new ways to enhance its services and improve the overall experience for its residents.

### The challenge

ADOT MVD had a tough challenge to ensure the security and authenticity of transactions, especially those involving sensitive information. Every day, the department needs to verify thousands of customers seeking to use its online services to perform activities like updating customer information including addresses, renewing vehicle registrations, ordering replacement driver licenses, and ordering driver and vehicle records. Traditional methods of identity verification, such as manual checks and physical presence, were not only time-consuming and error-prone, but didn’t provide any confidence that the department was dealing with the right customer in remote interactions, such as online using its web portal. With high daily demand and stringent security requirements, the department recognized the need to enhance its digital presence and improve customer engagement.

Facial verification technology has been a longstanding method for verifying a user's identity on-device and for online account login, thanks to its convenience and efficiency. However, challenges are increasing as malicious actors persist in their attempts to manipulate and deceive the system through various spoofing techniques.

### The solution

To address these challenges, ADOT turned to the Azure AI Vision Face API (also known as Azure Face Service) with Liveness Detection. This technology leverages advanced machine learning algorithms to verify the identity of individuals in real time. The Liveness Detection feature aims to verify that the system engages with a physically present, living individual during the verification process. This is achieved by differentiating between a real (live) and fake (spoof) representation, which may include photographs, videos, masks, or other means to mimic a real person. By using facial verification and liveness detection, the system can determine whether the person in front of the camera is a live human being and not a photograph or a video. This cutting-edge technology has transformed the way the department operates to make it more efficient, secure, and reliable.

### Implementation and collaboration

The department worked closely with Microsoft's team to ensure a seamless integration of the technology. "We were extremely excited to partner with Microsoft to use their passive liveness verification and facial verification all in one step," said Grant Hawkes, a contracted partner with the department’s Motor Vehicle Modernization (MvM) Project and its Lead Foundation Architect. "The Microsoft engineers were super receptive and super helpful. They would actually tweak the software a little bit for our use case, making our lives much easier. We have this wonderful working relationship with Microsoft, and they were extremely open with us, extremely receptive to ideas and whatever else it took.
And we've only seen the ease of use get better and better and better.”

### Key benefits

ADOT MVD has realized numerous benefits from the adoption of Azure AI Vision face liveness and verification functionality:

- **Enhanced security**—The technology has helped to reduce the risk of identity theft and fraud by enabling the verification of identities in real time, so the department can ensure that only authorized individuals can access sensitive information and complete transactions.
- **Improved efficiency**—By streamlining the verification process, the time required for identity checks has been reduced. In addition, the department is now able to offer some services online that could previously only be done in an office, such as driver license renewals and title transfers.
- **Accessibility**—The technology has made the process easier for individuals with disabilities and the elderly to complete transactions, as they no longer have to make their way to an office for certain services. In this way, it's more inclusive and user-friendly.
- **Cost-effective**—The Azure AI Vision face technology works seamlessly across different devices, including laptops and smartphones, without requiring expensive hardware, and fits into ADOT’s existing budget.

Verifying mobile driver's licenses (mDLs) is one of the most significant applications of this technology. Arizona was one of the first states to offer ISO 18013-5 compliant mDLs, allowing residents to store their driver's licenses on their mobile devices, making it more convenient and secure. Another notable application is electronic transfer of vehicle titles. Residents can now transfer vehicle titles electronically, eliminating the need for physical presence and paperwork. This will make the process much easier for citizens, while also making it more efficient and secure, reducing the risk of fraud.

### On-demand authentication

ADOT MVD has also developed an innovative solution called on-demand authentication (ODA). This allows residents to verify their identity remotely using their mobile devices. When a resident calls ADOT MVD’s call center, they receive a text message with a link to verify their identity. The system uses Azure AI Vision to perform facial verification and liveness detection, ensuring that the person on the other end of the call is who they claim to be. "This technology has been key in mitigating fraud by increasing our confidence that we're working with the right person," said Grant Hawkes. "The whole process takes maybe a few seconds and is user-friendly for both the call center representative and the customer."

### Future plans

The success of Azure AI Vision has prompted ADOT to explore further applications, and other state agencies are now looking at adopting the technology as well. "We see this growing and growing," said Grant Hawkes. "We're working to roll this technology out to more and more departments within the state as part of a unified identity solution. We see the value in this technology and what can be done with it." ADOT’s adoption of Azure AI Vision Face liveness and verification functionality has transformed the way the department operates. By enhancing security, improving efficiency, and making services more accessible, the technology has brought significant benefits to both the department and the residents of Arizona. As the department continues to innovate and expand the use of this technology, it sets a benchmark for other states and organizations to follow.
### Our commitment to Trustworthy AI

Organizations across industries are leveraging Azure AI and Copilot capabilities to drive growth, increase productivity, and create value-added experiences. We’re committed to helping organizations use and build AI that is trustworthy, meaning it is secure, private, and safe. We bring best practices and learnings from decades of researching and building AI products at scale to provide industry-leading commitments and capabilities that span our three pillars of security, privacy, and safety. Trustworthy AI is only possible when you combine our commitments, such as our Secure Future Initiative and our Responsible AI principles, with our product capabilities to unlock AI transformation with confidence.

Get started:

- Learn more about Azure AI Vision.
- Learn more about Face Liveness Detection, a milestone in identity verification.
- See how face detection works. Try it now.
- Read about Enhancing Azure AI Vision Face API with Liveness Detection.
- Learn how Microsoft empowers responsible AI practices.

## Agentic P2P Automation: Harnessing the Power of OpenAI's Responses API
The Procure-to-Pay (P2P) process is traditionally error-prone and labor-intensive, requiring someone to manually open each purchase invoice, look up contract details in a separate system, and painstakingly compare the two to identify anomalies—a task prone to oversight and inconsistency.

### About the sample application

The "agentic" characteristics demonstrated here using the Responses API are:

- The client application makes a single call to the Responses API, which internally handles all the actions autonomously, processes the information, and returns the response. In other words, the client application does not have to perform those actions itself. The actions that the Responses API uses are hosted tools (file search, vision-based reasoning).
- Function calling is used to invoke a custom action not available in the hosted tools (i.e. calling an Azure Logic App in this case). The Responses API delegates control to the client application, which executes the identified function and hands the response back to the Responses API to complete the rest of the steps in the business process.
- Handling of state across all the tool calls and orchestrating them in the right sequence are handled by the Responses API. It autonomously takes the output from each tool call and uses it to prepare the request for the next one.
- There is no workflow logic implemented in the code to perform these steps. It is all done through natural-language instructions passed when calling the Responses API, and through the tool actions.

The P2P Anomaly Detection system follows this workflow:

1. Processes purchase invoice images using the computer vision capabilities of GPT-4o
2. Extracts critical information like Contract ID, Supplier ID, and line items from the invoice
3. Retrieves corresponding contract details from an external system via Azure Logic App, through the function calling capabilities in the Responses API
4. Performs a vector search for the business rules in the OpenAI vector store, for detection of anomalies in Procure-to-Pay processes
5. Applies the business rules to the invoice details and validates them against the details in the contract data, using GPT-4o for reasoning
6. Generates a detailed report of violations and anomalies using GPT-4o

### Code Walkthrough

#### 1. Tools

The agent (i.e. the application) uses the configuration for file search, and for the function call to invoke the Azure Logic App.

```python
# These are the tools that will be used by the Responses API.
tools_list = [
    {
        "type": "file_search",
        "vector_store_ids": [config.vector_store_id],
        "max_num_results": 20,
    },
    {
        "type": "function",
        "name": "retrieve_contract",
        "description": "fetch contract details for the given contract_id and supplier_id",
        "parameters": {
            "type": "object",
            "properties": {
                "contract_id": {
                    "type": "string",
                    "description": "The contract id registered for the Supplier in the System",
                },
                "supplier_id": {
                    "type": "string",
                    "description": "The Supplier ID registered in the System",
                },
            },
            "required": ["contract_id", "supplier_id"],
        },
    },
]
```

#### 2. Instructions to the Agent

Unlike Chat Completions endpoints that use system prompts, the Responses API uses instructions. This contains the prompt that describes how the agent should go about implementing the use case in its entirety.

```python
instructions = """
This is a Procure to Pay process. You will be provided with the Purchase Invoice image as input.
Note that Step 3 can be performed only after Step 1 and Step 2 are completed.
Step 1: As a first step, you will extract the Contract ID and Supplier ID from the Invoice and also all the line items from the Invoice in the form of a table.
Step 2: You will then use the function tool to call the Logic app with the Contract ID and Supplier ID to get the contract details.
Step 3: You will then use the file search tool to retrieve the business rules applicable to detection of anomalies in the Procure to Pay process.
Step 4: Then, apply the retrieved business rules to match the invoice line items with the contract details fetched from the system, and detect anomalies if any.
Provide the list of anomalies detected in the Invoice, and the business rules that were violated.
"""

3. User input to Responses API
Load the invoice image as a base64-encoded string and add it to the user input payload. For simplicity, the user input is passed as a string literal in 'user_prompt', purely for demonstration purposes.

import base64

user_prompt = """
here are the Purchase Invoice image(s) as input. Detect anomalies in the procure to pay process and give me a detailed report
"""

# read the Purchase Invoice image(s) to be sent as input to the model
image_paths = ["data_files/Invoice-002.png"]

def encode_image_to_base64(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Encode images
base64_images = [encode_image_to_base64(image_path) for image_path in image_paths]

input_messages = [
    {
        "role": "user",
        "content": [
            {"type": "input_text", "text": user_prompt},
            *[
                {
                    "type": "input_image",
                    "image_url": f"data:image/jpeg;base64,{base64_image}",
                    "detail": "high",
                }
                for base64_image in base64_images
            ],
        ],
    }
]

4. Invoking the Responses API
The single call below performs all the steps required to complete the anomaly detection end to end. All of the actions (image-based reasoning over the invoice, the vector search that retrieves the business rules, reasoning over every tool call output, and preparing the input for the next tool call) happen directly within the API, in the cloud.

# The following code is to call the Responses API with the input messages and tools
response = client.responses.create(
    model=config.model,
    instructions=instructions,
    input=input_messages,
    tools=tools_list,
    tool_choice="auto",
    parallel_tool_calls=False,
)
tool_call = response.output[0]

There is only one step, the function call, that requires custom code to run locally in the application. The Responses API response indicates that a function call has to happen before it can complete the process, and provides the function name and the arguments required to make that call. We then make that call, locally in the application, to the Azure Logic App, get the response back, and add it to the payload of the next input message to the Responses API. The API then completes the rest of the steps in the workflow.

# We know this needs a function call, that needs to be executed from here in the application code.
# Let's get hold of the function name and arguments from the Responses API response.
function_response = None
function_to_call = None
function_name = None

# When a function call is entailed, Responses API gives us control so that we can make the call from our application.
# Note that this is because function call is to run our own custom code, it is not a hosted tool that Responses API can directly access and run.
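# --- Hedged sketch (not part of the original sample): what retrieve_contract and the
# available_functions mapping might look like. The original post does not show this code,
# so the Logic App URL, the request payload shape, and the use of the requests library
# below are assumptions made purely for illustration.
import requests

LOGIC_APP_URL = "https://<logic-app-http-trigger-url>"  # placeholder: your Logic App's HTTP trigger callback URL

def retrieve_contract(contract_id: str, supplier_id: str) -> str:
    """Call the Azure Logic App HTTP trigger and return the contract details for this supplier."""
    payload = {"contract_id": contract_id, "supplier_id": supplier_id}
    resp = requests.post(LOGIC_APP_URL, json=payload, timeout=30)
    resp.raise_for_status()
    return resp.text  # assumed: the Logic App returns the contract rows as JSON text

# Map the tool name declared in tools_list to the local Python callable that implements it.
available_functions = {"retrieve_contract": retrieve_contract}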
if response.output[0].type == "function_call":
    function_name = response.output[0].name
    function_to_call = available_functions[function_name]
    function_args = json.loads(response.output[0].arguments)

    # Let's call the Logic app with the function arguments to get the contract details.
    function_response = function_to_call(**function_args)

    # append the response message to the input messages, and proceed with the next call to the Responses API.
    input_messages.append(tool_call)  # append model's function call message
    input_messages.append({  # append result message
        "type": "function_call_output",
        "call_id": tool_call.call_id,
        "output": str(function_response)
    })

    # This is the final call to the Responses API with the input messages and tools
    response_2 = client.responses.create(
        model=config.model,
        instructions=instructions,
        input=input_messages,
        tools=tools_list,
    )
    print(response_2.output_text)

5. Function Call
The custom function dispatched above is retrieve_contract, which invokes the Azure Logic App and returns the relevant contract details from the Azure SQL Database. A hedged sketch of what such a function might look like is shown alongside the dispatch code above.

Code Run outcome
Here is the output from the run of the Responses API call:

## ✅ Contract Line Items (Raw JSON)
```json
[
  {
    "ContractID": "CON000002",
    "LineID": "LINE000003",
    "SupplierID": "SUP0008",
    "ContractDate": "2022-10-19T00:00:00",
    "ExpirationDate": "2023-01-07T00:00:00",
    "TotalAmount": 66543.390625,
    "Currency": "USD",
    "Status": "Expired",
    "ItemID": "ITEM0040",
    "Quantity": 78,
    "UnitPrice": 136.75,
    "TotalPrice": 10666.5,
    "DeliveryDate": "2023-01-01T00:00:00",
    "ItemDescription": "Description for ITEM0040"
  },
  {
    "ContractID": "CON000002",
    "LineID": "LINE000004",
    "SupplierID": "SUP0008",
    "ContractDate": "2022-10-19T00:00:00",
    "ExpirationDate": "2023-01-07T00:00:00",
    "TotalAmount": 66543.390625,
    "Currency": "USD",
    "Status": "Expired",
    "ItemID": "ITEM0082",
    "Quantity": 57,
    "UnitPrice": 479.8699951171875,
    "TotalPrice": 27352.58984375,
    "DeliveryDate": "2022-11-26T00:00:00",
    "ItemDescription": "Description for ITEM0082"
  },
  {
    "ContractID": "CON000002",
    "LineID": "LINE000005",
    "SupplierID": "SUP0008",
    "ContractDate": "2022-10-19T00:00:00",
    "ExpirationDate": "2023-01-07T00:00:00",
    "TotalAmount": 66543.390625,
    "Currency": "USD",
    "Status": "Expired",
    "ItemID": "ITEM0011",
    "Quantity": 21,
    "UnitPrice": 398.0899963378906,
    "TotalPrice": 8359.8896484375,
    "DeliveryDate": "2022-11-29T00:00:00",
    "ItemDescription": "Description for ITEM0011"
  },
  {
    "ContractID": "CON000002",
    "LineID": "LINE000006",
    "SupplierID": "SUP0008",
    "ContractDate": "2022-10-19T00:00:00",
    "ExpirationDate": "2023-01-07T00:00:00",
    "TotalAmount": 66543.390625,
    "Currency": "USD",
    "Status": "Expired",
    "ItemID": "ITEM0031",
    "Quantity": 47,
"UnitPrice": 429.0299987792969, "TotalPrice": 20164.41015625, "DeliveryDate": "2022-12-09T00:00:00", "ItemDescription": "Description for ITEM0031" } ] ## 🧾 Extracted Details from Invoice - **Contract ID:** CON000002 - **Supplier ID:** SUP0008 - **Total Invoice Value:** $113,130.16 USD - **Invoice Date:** 2023-06-15 --- ### 📦 Invoice Line Items | Item ID | Quantity | Unit Price | Total Price | Description | |-----------|----------|------------|-------------|------------------------------| | ITEM0040 | 116 | $136.75 | $15,863.00 | Description for ITEM0040 | | ITEM0082 | 116 | $554.62 | $64,335.92 | Description for ITEM0082 | | ITEM0011 | 36 | $398.09 | $14,331.24 | Description for ITEM0011 | | ITEM0031 | 36 | $475.00 | $17,100.00 | Description for ITEM0031 | | ITEM9999 | 10 | $150.00 | $1,500.00 | Extra item not in contract | --- ## 📄 Contract Details Retrieved ### ITEM0040 - Quantity: 78 - Unit Price: $136.75 - Total Price: $10,666.50 ### ITEM0082 - Quantity: 57 - Unit Price: $479.87 - Total Price: $27,352.59 ### ITEM0011 - Quantity: 21 - Unit Price: $398.09 - Total Price: $8,359.89 ### ITEM0031 - Quantity: 47 - Unit Price: $429.03 - Total Price: $20,164.41 - **Contract Expiration:** 2023-01-07 (Status: Expired) --- ## ❗ Anomalies Detected ### 🔴 Contract Expiry - Invoice dated **2023-06-15** refers to an **expired contract** (expired on **2023-01-07**). ### 🔴 Quantity Exceeds Contract - **ITEM0040:** 116 > 78 - **ITEM0082:** 116 > 57 - **ITEM0011:** 36 > 21 - **ITEM0031:** 36 ≤ 47 (✅ within limit) ### 🔴 Price Discrepancy - **ITEM0082:** Invoiced @ $554.62 vs Contract @ $479.87 - **ITEM0031:** Invoiced @ $475.00 vs Contract @ $429.03 ### 🔴 Extra Item - **ITEM9999** not found in contract records. --- ## 🧩 Conclusion Multiple business rule violations were found: - ❌ Contract expired - ❌ Quantity overrun - ❌ Price discrepancies - ❌ Unauthorized items > **Recommended:** Detailed investigation and corrective action. References: The source code of the application used in this sample - here Read about the Responses API here Read about the availability of this API on Azure here View a video of the demonstration of this sample application below.756Views2likes0CommentsExploring Azure OpenAI Assistants and Azure AI Agent Services: Benefits and Opportunities
In the rapidly evolving landscape of artificial intelligence, businesses are increasingly turning to cloud-based solutions to harness the power of AI. Microsoft Azure offers two prominent services in this domain: Azure OpenAI Assistants and Azure AI Agent Services. While both services aim to enhance user experiences and streamline operations, they cater to different needs and use cases. This blog post will delve into the details of each service, their benefits, and the opportunities they present for businesses.

Understanding Azure OpenAI Assistants

What Are Azure OpenAI Assistants?
Azure OpenAI Assistants are designed to leverage the capabilities of OpenAI's models, such as GPT-3 and its successors. These assistants are tailored for applications that require advanced natural language processing (NLP) and understanding, making them ideal for conversational agents, chatbots, and other interactive applications.

Key Features
Pre-trained Models: Azure OpenAI Assistants utilize pre-trained models from OpenAI, which means they come with a wealth of knowledge and language understanding out of the box. This reduces the time and effort required for training models from scratch.
Customizability: While the models are pre-trained, developers can fine-tune them to meet specific business needs. This allows for the creation of personalized experiences that resonate with users.
Integration with Azure Ecosystem: Azure OpenAI Assistants seamlessly integrate with other Azure services, such as Azure Functions, Azure Logic Apps, and Azure Cognitive Services. This enables businesses to build comprehensive solutions that leverage multiple Azure capabilities.

Benefits of Azure OpenAI Assistants
Enhanced User Experience: By utilizing advanced NLP capabilities, Azure OpenAI Assistants can provide more natural and engaging interactions. This leads to improved customer satisfaction and loyalty.
Rapid Deployment: The availability of pre-trained models allows businesses to deploy AI solutions quickly. This is particularly beneficial for organizations looking to implement AI without extensive development time.
Scalability: Azure's cloud infrastructure ensures that applications built with OpenAI Assistants can scale to meet growing user demands without compromising performance.

Understanding Azure AI Agent Services

What Are Azure AI Agent Services?
Azure AI Agent Services provide a more flexible framework for building AI-driven applications. Unlike Azure OpenAI Assistants, which are limited to OpenAI models, Azure AI Agent Services allow developers to utilize a variety of AI models, including those from other providers or custom-built models.

Key Features
Model Agnosticism: Developers can choose from a wide range of AI models, enabling them to select the best fit for their specific use case. This flexibility encourages innovation and experimentation.
Custom Agent Development: Azure AI Agent Services support the creation of custom agents that can perform a variety of tasks, from simple queries to complex decision-making processes.
Integration with Other AI Services: Like OpenAI Assistants, Azure AI Agent Services can integrate with other Azure services, allowing for the creation of sophisticated AI solutions that leverage multiple technologies.

Benefits of Azure AI Agent Services
Diverse Use Cases: The ability to use any AI model opens a world of possibilities for businesses.
Whether it's a specialized model for sentiment analysis or a custom-built model for a niche application, organizations can tailor their solutions to meet specific needs.
Enhanced Automation: AI agents can automate repetitive tasks, freeing up human resources for more strategic activities. This leads to increased efficiency and productivity.
Cost-Effectiveness: By allowing the use of various models, businesses can choose cost-effective solutions that align with their budget and performance requirements.

Opportunities for Businesses

Improved Customer Engagement
Both Azure OpenAI Assistants and Azure AI Agent Services can significantly enhance customer engagement. By providing personalized and context-aware interactions, businesses can create a more satisfying user experience. For example, a retail company can use an AI assistant to provide tailored product recommendations based on customer preferences and past purchases.

Data-Driven Decision Making
AI agents can analyze vast amounts of data and provide actionable insights. This capability enables organizations to make informed decisions based on real-time data analysis. For instance, a financial institution can deploy an AI agent to monitor market trends and provide investment recommendations to clients.

Streamlined Operations
By automating routine tasks, businesses can streamline their operations and reduce operational costs. For example, a customer support team can use AI agents to handle common inquiries, allowing human agents to focus on more complex issues.

Innovation and Experimentation
The flexibility of Azure AI Agent Services encourages innovation. Developers can experiment with different models and approaches to find the most effective solutions for their specific challenges. This culture of experimentation can lead to breakthroughs in product development and service delivery.

Enhanced Analytics and Insights
Integrating AI agents with analytics tools can provide businesses with deeper insights into customer behavior and preferences. This data can inform marketing strategies, product development, and customer service improvements. For example, a company can analyze interactions with an AI assistant to identify common customer pain points, allowing them to address these issues proactively.

Conclusion
In summary, both Azure OpenAI Assistants and Azure AI Agent Services offer unique advantages that can significantly benefit businesses looking to leverage AI technology. Azure OpenAI Assistants provide a robust framework for building conversational agents using advanced OpenAI models, making them ideal for applications that require sophisticated natural language understanding and generation. Their ease of integration, rapid deployment, and enhanced user experience make them a compelling choice for businesses focused on customer engagement.
Azure AI Agent Services, on the other hand, offer unparalleled flexibility by allowing developers to utilize a variety of AI models. This model-agnostic approach encourages innovation and experimentation, enabling businesses to tailor solutions to their specific needs. The ability to automate tasks and streamline operations can lead to significant cost savings and increased efficiency.
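To make the comparison above more concrete, here is a minimal sketch of the Assistants pattern using the OpenAI Python SDK against an Azure OpenAI resource. The endpoint, API version, deployment name, assistant instructions, and prompt text are placeholders chosen for illustration, not values from this post; treat it as a starting point rather than a definitive implementation.

```python
import os
import time

from openai import AzureOpenAI

# Placeholder configuration: substitute your own Azure OpenAI resource values.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-05-01-preview",
)

# Create an assistant backed by a deployed model, with a hosted tool enabled.
assistant = client.beta.assistants.create(
    name="support-assistant",
    instructions="You answer customer questions about our product catalog.",
    model="gpt-4o",  # name of your model deployment
    tools=[{"type": "code_interpreter"}],
)

# Each conversation runs on a thread; add a user message and start a run.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Summarize my last three orders."
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)

# Poll until the run finishes, then read the assistant's latest reply.
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```

Azure AI Agent Service exposes a similar create-agent, thread, and run pattern through its own SDK, which is one reason skills built on the Assistants model transfer readily between the two services.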
Additional Resources
To further explore Azure OpenAI Assistants and Azure AI Agent Services, consider the following resources:
Agent Service on Microsoft Learn Docs
Watch On-Demand Sessions: Streamlining Customer Service with AI-Powered Agents: Building Intelligent Multi-Agent Systems with Azure AI
Microsoft Learn: Develop AI agents on Azure - Training | Microsoft Learn
Community and Announcements: Tech Community Announcement: Introducing Azure AI Agent Service
Bonus Blog Post: Announcing the Public Preview of Azure AI Agent Service
AI Agents for Beginners 10 Lesson Course: https://aka.ms/ai-agents-beginners