Artificial Intelligence


Black Forest Labs FLUX.2 Visual Intelligence for Enterprise Creative now on Microsoft Foundry
Black Forest Labs’ (BFL) FLUX.2 is now available on Microsoft Foundry. Building on FLUX1.1 [pro] and FLUX.1 Kontext [pro], we’re excited to introduce FLUX.2 [pro], which continues to push the frontier for visual intelligence. FLUX.2 [pro] delivers state-of-the-art quality with pre-optimized settings, matching the best closed models for prompt adherence and visual fidelity while generating faster at lower cost.

Prompt: "Cinematic film still of a woman walking alone through a narrow Madrid street at night, warm street lamps, cool blue shadows, light rain reflecting on cobblestones, moody and atmospheric, shallow depth of field, natural skin texture, subtle film grain and introspective mood"

This prompt shines because it taps into FLUX.2 [pro]'s cinematic-lighting engine, letting the model fuse warm street-lamp glow and cool shadows into a visually striking, film-grade composition.

What’s game-changing about FLUX.2 [pro]?
FLUX.2 is designed for real-world creative workflows where consistency, accuracy, and iteration speed determine whether AI generation can replace traditional production pipelines. The model understands lighting, perspective, materials, and spatial relationships. It keeps characters and products consistent across up to 10 reference images simultaneously. It adheres to brand constraints like exact hex colors and legible text. The result: production-ready assets with fewer touchups and stronger brand fidelity.

What’s New:
Production-grade quality up to 4MP: High-fidelity, coherent scenes with realistic lighting, spatial logic, and fine detail suitable for product photography and commercial use cases.
Multi-reference consistency: Reference up to 10 images simultaneously with the best character, product, and style consistency available today. Generate dozens of brand-compliant assets where identity stays perfectly aligned shot to shot.
Brand-accurate results: Exact hex-color matching, reliable typography, and structured controls (JSON, pose guidance) mean fewer manual fixes and stronger brand compliance.
Strong prompt fidelity for complex directions: Improved adherence to complex, structured instructions including multi-part prompts, compositional constraints, and JSON-based controls. 32K token context supports long, detailed workflows with exact positioning specifications, physics-aware lighting, and precise compositional requirements in a single prompt.
Optimized inference: FLUX.2 [pro] delivers state-of-the-art quality with pre-optimized inference settings, generating faster at lower cost than competing closed models.

FLUX.2 transforms creative production economics by enabling workflows that weren't possible with earlier systems. Teams ship complete campaigns in days instead of weeks, with fewer manual touchups and stronger brand fidelity at scale. This performance stems from FLUX.2's unified architecture, which combines generation and editing in a single latent flow matching model.

How it Works
FLUX.2 combines image generation and editing in a single latent flow matching architecture, coupling a Mistral-3 24B vision-language model (VLM) with a rectified flow transformer. The VLM brings real-world knowledge and contextual understanding, while the flow transformer models spatial relationships, material properties, and compositional logic that earlier architectures struggled to render. FLUX.2’s architecture unifies visual generation and editing, fuses language-grounded understanding with flow-based spatial modeling, and delivers production-ready, brand-safe images with predictable control, especially when you need consistent identity, exact colors, and legible typography at high resolution. Technical details can be found in the FLUX.2 VAE blog post.
Top enterprise scenarios & patterns to try with FLUX.2 [pro]
The addition of FLUX.2 [pro] is the next step toward faster, richer, and more controllable generation, unlocking a new wave of creative potential for enterprises. Bring FLUX.2 [pro] into your workflow and transform your creative pipeline from concept to production by trying out these patterns:

Enterprise scenario: E-commerce hero shots
Patterns to try: Start with a small set of references (product front, material/texture, logo). Prompt for a studio hero shot on a white seamless background, three-quarter view, softbox key + subtle rim light. Include exact hex for brand accents and specify logo placement. Output at 4MP.

Enterprise scenario: Product variants at scale
Patterns to try: Reuse the hero references; ask for specific colorway, angle, and background variants (e.g., “Create {COLOR} variant, {ANGLE} view, {BG} background”). Keep brand hex and logo position constant across variants.

Enterprise scenario: Campaign consistency (character/product identity)
Patterns to try: Provide 5–10 reference images for the character/product (faces, outfits, mood boards). Request the same identity across scenes with consistent lighting/style (e.g., cinematic warm daylight) and defined environments (e.g., urban rooftop).

Enterprise scenario: Marketing templates & localization
Patterns to try: Define a template (e.g., 3-column grid: left image, right text). Set headline/body sizes (e.g., 24pt/14pt), contrast ≥ 4.5:1, and brand font. Swap localized copy per locale while keeping layout and spacing consistent.

Best practices to get to production readiness with Microsoft Foundry
FLUX.2 [pro] brings state-of-the-art image quality to your fingertips. In Microsoft Foundry, you can turn those capabilities into predictable, governed outcomes by standardizing templates, managing references, enforcing brand rules, and controlling spend.
The practices below build on FLUX.2 [pro]’s visual intelligence and turn it into repeatable recipes, auditable artifacts, and cost-controlled processes within a governed Foundry pipeline.

Best practice: Approved templates
What to do: Create 3–5 templates (e.g., hero shot, variant gallery, packaging, social card) with sections for Composition (camera, lighting, environment), Brand (hex colors, logo placement), Typography (font, sizes, contrast), and Output (resolution, format).
Foundry tip: Store templates in Foundry as approved artifacts; version them and restrict edits via RBAC.

Best practice: Versioned reference sets
What to do: Keep 3–10 references per subject (product: front/side/texture; talent: face/outfit/mood) and link them to templates.
Foundry tip: Save references in governed Foundry storage; reference IDs travel with the job metadata.

Best practice: Resolution staging
What to do: Use a three-stage plan: Concept (1–2MP) → Review (2–3MP) → Final (4MP). Use FLUX1.1 [pro] and FLUX.1 Kontext [pro] before the Final stage for fast iteration and cost control.
Foundry tip: Enforce stage-based quotas and cap max resolution per job; require approval to move to 4MP.

Best practice: Automated QA & approvals
What to do: Run post-generation checks for color match, text legibility, and safe-area compliance; gate final renders behind a review step.
Foundry tip: Use Foundry workflows to require sign-off at the Review stage before the Final stage.

Best practice: Telemetry & feedback
What to do: Track latency, success rate, usage, and cost per render; collect reviewer notes and refine templates.
Foundry tip: Use dashboards in Foundry to monitor job health, cost, and template performance.

Foundry Models continues to grow with cutting-edge additions to meet every enterprise need, including models from Black Forest Labs, OpenAI, and more. With models like GPT-image-1, FLUX.2 [pro], and Sora 2, Microsoft Foundry has become the place where creators push the boundaries of what’s possible.
Watch how Foundry transforms creative workflows with this demo:

Customer Stories
As seen at Ignite 2025, real-world customers like Sinyi Realty have already demonstrated the efficiency of Black Forest Labs’ models on Microsoft Foundry by choosing FLUX.1 Kontext [pro] for its superior performance and selective editing. For their new 'Clear All' feature, they preferred a model that preserves the original room structure and simply removes clutter, rather than generating a new space from scratch, saving time and money. Read the story to learn more.

“We wanted to stay in the same workspace rather than having to maintain different platforms,” explains TeWei Hsieh, who works in data engineering and data architecture. “By keeping FLUX Kontext model in Foundry, our data scientists and data engineers can work in the same environment.”

As customers like Sinyi Realty have already shown, BFL FLUX models raise the bar for speed, precision, and operational efficiency. With FLUX.2 now on Microsoft Foundry, organizations can bring that same competitive edge directly into their own production pipelines.

FLUX.2 [pro] Pricing
Foundry Models are fully hosted and managed on Azure. FLUX.2 [pro] is available through pay-as-you-go on the Global Standard deployment type with the following pricing:
Generated image: The first generated megapixel (MP) is charged at $0.03. Each subsequent megapixel is charged at $0.015.
Reference image(s): Each megapixel is charged at $0.015.

Important Notes:
For pricing, resolution is always rounded up to the next megapixel, separately for each reference image and for the generated image.
1 megapixel is counted as 1024x1024 pixels.
For multiple reference images, each reference image is counted as 1 megapixel.
Images exceeding 4 megapixels are resized to 4 megapixels.
See the Foundry Models pricing page for current pricing.
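To make the pricing rules above concrete, here is a minimal Python sketch of a cost estimator. It is an illustration only, not an official calculator: the function names are hypothetical, and it assumes one reading of the notes (a single reference image is rounded up to whole megapixels, while each image in a multi-reference request is billed as 1 MP). Check the pricing page before relying on any figure.

```python
import math

PIXELS_PER_MP = 1024 * 1024   # 1 MP is counted as 1024x1024 pixels
MAX_MP = 4                    # images above 4 MP are resized to 4 MP
FIRST_OUTPUT_MP_USD = 0.03    # first generated megapixel
EXTRA_OUTPUT_MP_USD = 0.015   # each subsequent generated megapixel
REFERENCE_MP_USD = 0.015      # each reference-image megapixel


def billed_megapixels(width: int, height: int) -> int:
    """Round up to the next whole megapixel, then cap at 4 MP."""
    return min(max(1, math.ceil(width * height / PIXELS_PER_MP)), MAX_MP)


def estimate_cost(output_size, reference_sizes=()):
    """Estimated USD cost of one generation request (illustrative only)."""
    out_mp = billed_megapixels(*output_size)
    cost = FIRST_OUTPUT_MP_USD + EXTRA_OUTPUT_MP_USD * (out_mp - 1)
    if len(reference_sizes) > 1:
        # Assumption: with multiple references, each one is billed as 1 MP.
        ref_mp = len(reference_sizes)
    else:
        # Assumption: a single reference is rounded up like the output image.
        ref_mp = sum(billed_megapixels(w, h) for w, h in reference_sizes)
    return round(cost + REFERENCE_MP_USD * ref_mp, 4)


if __name__ == "__main__":
    print(estimate_cost((2048, 2048)))                          # 4 MP output, no references
    print(estimate_cost((2048, 2048), [(1024, 1024)] * 3))      # 4 MP output, three references
```

Under these assumptions, a 4 MP render with no references works out to 0.03 + 3 × 0.015 = $0.075, and adding three reference images adds 3 × $0.015.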
Build Trustworthy AI Solutions
Black Forest Labs models in Foundry Models are delivered under the Microsoft Product Terms, giving you enterprise-grade security and compliance out of the box. Each FLUX endpoint offers Content Safety controls and guardrails. Runtime protections include built-in content-safety filters, role-based access control, virtual-network isolation, and automatic Azure Monitor logging. Governance signals stream directly into Azure Policy, Purview, and Microsoft Sentinel, giving security and compliance teams real-time visibility. Together, Microsoft's capabilities let you create with more confidence, knowing that privacy, security, and safety are woven into every Black Forest Labs deployment from day one.

Getting Started with FLUX.2 in Microsoft Foundry
If you don’t have an Azure subscription, you can sign up for an Azure account here.
Search for the model name, FLUX.2-pro, in the model catalog in Foundry under “Build.”
Open the model card in the model catalog.
Click Deploy to obtain the inference API and key.
View your deployment under Build > Models. You should land on the deployment page that shows you the API and key in less than a minute.
You can try out your prompts in the playground.
You can use the API and key with various clients.

Learn More
▶️ RSVP for the next Model Monday LIVE on YouTube or On-Demand
👩‍💻 Explore FLUX.2 Documentation on Microsoft Learn
👋 Continue the conversation on Discord
Naomi Moneypenny
Dec 19, 2025, Microsoft Foundry Blog
From Large Semi-Structured Docs to Actionable Data: Reusable Pipelines with ADI, AI Search & OpenAI
Problem Space
Large semi-structured documents such as contracts, invoices, hospital tariff/rate cards, multi-page reports, and compliance records often carry essential information that is difficult to extract reliably with traditional approaches. Their layout can span several pages, the structure is rarely consistent, and related fields may appear far apart even though they must be interpreted together. This makes it hard not only to detect the right pieces of information but also to understand how those pieces relate across the document.

LLMs can help, but when documents are long and contain complex cross-references, they may still miss subtle dependencies or generate hallucinated information. That becomes risky in environments where small errors can cascade into incorrect decisions. At the same time, these documents don’t change frequently, while the extracted data is used repeatedly by multiple downstream systems at scale. Because of this usage pattern, a RAG-style pipeline is often not ideal in terms of cost, latency, or consistency. Instead, organizations need a way to extract data once, represent it consistently, and serve it efficiently in a structured form to a wide range of applications, many of which are not conversational AI systems.

At this point, data stewardship becomes critical, because once information is extracted, it must remain accurate, governed, traceable, and consistent throughout its lifecycle. When the extracted information feeds compliance checks, financial workflows, risk models, or end-user experiences, the organization must ensure that the data is not just captured correctly but also maintained with proper oversight as it moves across systems. Any extraction pipeline that cannot guarantee quality, reliability, and provenance introduces long-term operational risk.
The core problem, therefore, is finding a method that handles the structural and relational complexity of large semi-structured documents, minimizes LLM hallucination risk, produces deterministic results, and supports ongoing data stewardship so that the resulting structured output stays trustworthy and usable across the enterprise.

Target Use Cases
The potential applications for an Intelligent Document Processing (IDP) pipeline differ across industries. The industry-specific use cases below are examples to help readers conceptualise and implement solutions tailored to their own requirements.

Hospital Tariff Digitization for Tariff-Invoice Reconciliation in Health Insurance
Document types: Hospital tariff/rate cards, annexures/guidelines, pre-authorization guidelines, etc.
Technical Challenge: Charges for the same service might appear under different sections or for different hospital room types across different versions of tariff/rate cards. Table + free text mix, abbreviations, and cross-page references.
Downstream usage: Reimbursement orchestration, claims adjudication.

Commercial Loan Underwriting in Banking
Document types: Balance sheets, cash-flow statements, auditor reports, collateral documents.
Technical Challenge: Ratios and covenants must be computed from fields located across pages. Contextual dependencies: “Net revenue excluding exceptional items” or footnotes that override values.
Downstream usage: Loan decisioning models, covenant monitoring, credit scoring.

Procurement Contract Intelligence in Manufacturing
Document types: Vendor agreements, SLAs, pricing annexures.
Technical Challenge: Pricing rules defined across clauses that reference each other. Penalty and escalation conditions hidden inside nested sections.
Downstream usage: Automated PO creation, compliance checks.

Regulatory Compliance Extraction
Document types: GDPR/HIPAA compliance docs, audit reports.
Technical Challenge: Requirements and exceptions buried across many sections. Extraction must be deterministic since compliance logic is strict.
Downstream usage: Rule engines, audit workflows, compliance checklists.

Solution Approaches

Problem Statement
Across industries from finance and healthcare to legal and compliance, large semi-structured documents serve as the single source of truth for critical workflows. These documents often span hundreds of pages, mixing free text, tables, and nested references. Before any automation can validate transactions, enforce compliance, or perform analytics, this information must be transformed into a structured, machine-readable format. The challenge isn’t just size; it’s complexity. Rules and exceptions are scattered, relationships span multiple sections, and formatting inconsistencies make naive parsing unreliable. Errors at this stage ripple downstream, impacting reconciliation, risk models, and decision-making. In short, the fidelity of this digitization step determines the integrity of every subsequent process. Solving this problem requires a pipeline that can handle structural diversity, preserve context, and deliver deterministic outputs at scale.

Challenges
Several challenges arise when processing such large, complex documents.
Documents can run to roughly 200-250 pages.
Document structures and layouts can be extremely complex. A document or a page may contain a mix of layouts such as tables, text blocks, and figures.
Sometimes a single table can stretch across multiple pages, but only the first page contains the table header, leaving the remaining pages without column labels.
A topic on one page may be referenced from a different page, so there can be complex inter-relationships among topics in the same document, and these need to be captured in a machine-readable format.
The document can be semi-structured as well (some parts are structured; some parts are unstructured or free text).
The downstream applications might not always be AI-assisted (they can be core analytics dashboards or existing enterprise legacy systems), so the storage structure for the digitized items needs to be thought through carefully before moving ahead with the solution.

Motivation Behind High Level Approach
A large document (around 200 pages) needs to be divided into smaller chunks so that it becomes readable and digestible (within context length) for the LLM.
To make the LLM input truly context-aware, references must be maintained across pages (for example, the headers of long, continuous tables need to be injected into the chunks that contain table rows without headers).
If the documents cover a pre-defined set of topics/entities, information needs to be extracted per topic/entity to make the system truly context-aware.
Different chunks can cover the same topic/entity, which turns grouping into a search problem.
Retrieval needs to happen for every topic/entity so that all information related to one topic/entity ends up in a single place; as a result, the downstream applications become efficient, scalable, and reliable over time.

Sample Architecture and Implementation
Let’s take a possible approach to demonstrate the feasibility of the following architecture, building on the motivation outlined above. The solution divides a large, semi-structured document into manageable chunks, making it easier to maintain context and references across pages. First, the document is split into logical sections. Then, OCR and layout extraction capture both text and structure, followed by structure analysis to preserve semantic relationships. Annotated chunks are labeled and grouped by entity, enabling precise extraction of items such as key-value pairs or table data.
As a result, the system efficiently transforms complex documents into structured, context-rich outputs ready for downstream analytics and automation.

Architecture Components
The key elements of the architecture diagram include components 1-6, which are code modules. Components 7 and 8 represent databases that store data chunks and extracted items, while component 9 refers to potential downstream systems that will use the structured data obtained from extraction.

1. Chunking: Break documents into smaller, logical sections such as pages or content blocks. Enables parallel processing and improves context handling for large files. Technology: Python-based chunking logic using pdf2image and PIL for image handling.
2. OCR & Layout Extraction: Convert scanned images into machine-readable text while capturing layout details like bounding boxes, tables, and reading order for structural integrity. Technology: Azure Document Intelligence or Microsoft Foundry Content Understanding Prebuilt Layout model combining OCR with deep learning for text, tables, and structure extraction.
3. Context Aware Structural Analysis: Analyse the extracted layout to identify document components such as headers, paragraphs, and tables. Preserves semantic relationships for accurate interpretation. Technology: Custom Python logic leveraging OCR output to inject missing headers, summarize layout (row/column counts, sections per page).
4. Labelling: Assign entity-based labels to chunks according to predefined schema or SME input. Helps filter irrelevant content and focus on meaningful sections. Technology: Azure OpenAI GPT-4.1-mini with NLI-style prompts for multi-class classification.
5. Entity-Wise Grouping: Organize chunks by entity type (e.g., invoice number, total amount) for targeted extraction. Reduces noise and improves precision in downstream tasks. Technology: Azure AI Search with Hybrid Search and Semantic Reranking for grouping relevant chunks.
6. Item Extraction: Extract specific values such as key-value pairs, line items, or table data from grouped chunks. Converts semi-structured content into structured fields. Technology: Azure OpenAI GPT-4.1-mini with Set of Marking style prompts using layout clues (row × column, headers, OCR text).
7. Interim Chunk Storage: Store chunk-level data including OCR text, layout metadata, labels, and embeddings. Supports traceability, semantic search, and audit requirements. Technology: Azure AI Search for chunk indexing and Azure OpenAI Embedding models for semantic retrieval.
8. Document Store: Maintain final extracted items with metadata and bounding boxes. Enables quick retrieval, validation, and integration with enterprise systems. Technology: Azure Cosmos DB, Azure SQL DB, Azure AI Search, or Microsoft Fabric depending on downstream needs (analytics, APIs, LLM apps).
9. Downstream Integration: Deliver structured outputs (JSON, CSV, or database records) to business applications or APIs. Facilitates automation and analytics across workflows. Technology: REST APIs, Azure Functions, or Data Pipelines integrated with enterprise systems.

Algorithms
Consider these key algorithms when implementing the components above:

Structural Analysis – Inject headers: Detect tables page by page and compare the last row of a table on page i with the first row of a table on page i+1. If the column counts match and at least 4 of the 5 style features (Font Weight, Background Colour, Font Style, Foreground Colour, Similar Font Family) match, mark it as a continuous table (header missing) and inject the previous page’s header into the next page’s table, repeating across pages.

Labelling – Prompting Guide: Run NLI checks per SOC chunk image (ground on OCR text) across N curated entity labels, return {decision ∈ {ENTAILED, CONTRADICTED, NEUTRAL}, confidence ∈ [0,1]}, and output only labels where decision = ENTAILED and confidence > 0.7.
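As a rough illustration of the header-injection rule described under Structural Analysis above, the sketch below applies it to a simplified in-memory model. The `Row` and `Table` classes, the style keys, and the one-table-per-page simplification are all hypothetical; a real pipeline would derive them from the OCR/layout output. "Similar Font Family" is approximated here by exact equality, which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# The five style features named in the rule; the key names are hypothetical.
STYLE_FEATURES = ("font_weight", "background_colour", "font_style",
                  "foreground_colour", "font_family")


@dataclass
class Row:
    cells: list[str]
    style: dict[str, str]


@dataclass
class Table:
    page: int
    rows: list[Row]
    header: Optional[list[str]] = None   # None when the page's table has no header


def style_matches(a: Row, b: Row) -> int:
    """Count style features that are present in both rows and equal."""
    return sum(f in a.style and a.style[f] == b.style.get(f) for f in STYLE_FEATURES)


def is_continuation(prev: Table, nxt: Table, min_matches: int = 4) -> bool:
    """True if nxt looks like a headerless continuation of prev."""
    if nxt.header is not None or not prev.rows or not nxt.rows:
        return False
    last, first = prev.rows[-1], nxt.rows[0]
    return (len(last.cells) == len(first.cells)
            and style_matches(last, first) >= min_matches)


def inject_headers(tables: list[Table]) -> list[Table]:
    """Propagate headers forward across consecutive pages (one table per page assumed)."""
    ordered = sorted(tables, key=lambda t: t.page)
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt.page == prev.page + 1 and prev.header and is_continuation(prev, nxt):
            nxt.header = list(prev.header)   # copy so tables do not share one list
    return ordered
```

Because pages are processed in order and each injected header is stored on the table, a header found on the first page carries through a run of headerless continuation pages, which matches the "repeating across pages" step.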
Entity-Wise Grouping – Querying Chunks per Entity & Top-50 Handling: Construct the query from the entity text and apply hybrid search with label filters in Azure AI Search, starting with chunks where the target label is the sole label, then expanding to observed co-occurrence combinations under a cap to prevent explosion. If label frequency > 50, run staged queries (sole-label → capped co-label combos); otherwise use a single hybrid search with semantic reranking. Merge the results and deduplicate before scoring.

Entity-Wise Grouping – Chunk to Entity relevance scoring: For each retrieved chunk, split the text into spans, compute the cosine similarity of each span to the entity text, and take the mean s. Boost it with a gated nonlinearity b = σ(k(s − m)) · s, where σ is the sigmoid function and k, m are tunable parameters that emphasize mid-range relevance while suppressing very low s. Min–max normalize the re-ranker score r → r_norm, compute the final score F = α·b + (1 − α)·r_norm, and keep the chunk if and only if F ≥ τ.

Item Extraction – Prompting Guide: Provide the chunk image as an input and ground on visual structure (tables, headers, gridlines, alignment, typography) and document structural metadata to segment and align units; reconcile ambiguities via OCR extracted text, then enumerate associations by positional mapping (header ↔ column, row ↔ cell proximity) and emit normalized objects while filtering narrative/policy text by layout and pattern cues.

Deployment at Scale
There are several ways to implement a document extraction pipeline, each with its own pros and cons. The best deployment model depends on scenario requirements. Below are some common approaches with their advantages and disadvantages.

Host as REST API
Pros: Enables straightforward creation, customization, and deployment across scalable compute services such as Azure Kubernetes Service.
Cons: Processing time and memory usage scale with document size and complexity, potentially requiring multiple iterations to optimize performance.
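For readers implementing the pipeline, the chunk-to-entity relevance scoring described under Algorithms above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes span and entity embeddings are already computed, the dictionary keys (`id`, `span_vectors`, `reranker_score`) are hypothetical, and the default values for k, m, α, and τ are placeholders to be tuned.

```python
import math
from statistics import mean


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def boosted_similarity(span_vectors, entity_vector, k=10.0, m=0.5):
    """b = sigmoid(k * (s - m)) * s, where s is the mean span-to-entity cosine."""
    if not span_vectors:
        return 0.0
    s = mean(cosine(v, entity_vector) for v in span_vectors)
    return s / (1.0 + math.exp(-k * (s - m)))


def min_max(values):
    """Min-max normalize to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)   # assumption: identical scores all map to 1.0
    return [(v - lo) / (hi - lo) for v in values]


def select_chunks(chunks, entity_vector, alpha=0.5, tau=0.4, k=10.0, m=0.5):
    """Return (chunk id, F) for chunks with F = alpha*b + (1-alpha)*r_norm >= tau."""
    if not chunks:
        return []
    r_norm = min_max([c["reranker_score"] for c in chunks])
    kept = []
    for chunk, r in zip(chunks, r_norm):
        b = boosted_similarity(chunk["span_vectors"], entity_vector, k, m)
        final = alpha * b + (1 - alpha) * r
        if final >= tau:
            kept.append((chunk["id"], final))
    return kept
```

The gate multiplies s by a sigmoid centred on m, so similarities well below m contribute almost nothing while those above m pass through nearly unchanged; α then balances this embedding signal against the normalized re-ranker score.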
Deploy as Azure Machine Learning (ML) Pipeline
Pros: Facilitates efficient time and memory management, as Azure ML supports processing large datasets at scale.
Cons: The pipeline may be more challenging to develop, customize, and maintain.

Deploy as Azure Databricks Job
Pros: Offers robust time and memory management similar to Azure ML, with advanced features such as Data Autoloader for detecting data changes and triggering pipeline execution.
Cons: The solution is highly tailored to Azure Databricks and may have limited customization options.

Deploy as Microsoft Fabric Pipeline
Pros: Provides capabilities comparable to Azure ML and Databricks, and features like Fabric Activator replicate Databricks Autoloader functionality.
Cons: Presents similar limitations found in Azure ML and Azure Databricks approaches.

Each method should be carefully evaluated to ensure alignment with technical and operational requirements.

Evaluation
Objective: The aim is to evaluate how accurately a document extraction pipeline extracts information by comparing its output with manually verified data.
Approach: Documents are split into sections, labelled, and linked to relevant entities; then, AI tools extract key items through the pipeline outlined above. The extracted data is checked against expert-curated records using both exact and approximate matching techniques.
Key Metrics:
Individual Item Attribute Match: Assesses the system’s ability to identify specific item attributes using strict and flexible comparison methods.
Combined Item Attribute Match: Evaluates how well multiple attributes are identified together, considering both exact and fuzzy matches.
Precision Calculation: Precision for each metric reflects the proportion of correctly matched entries compared to all reference entries.
Findings for a real-world scenario: Fuzzy matching of item key attributes yields high precision (over 90%), but accuracy drops for key attribute combinations (between 43% and 48%).
These results come from analysis across several datasets to ensure reliability.

How This Addresses the Problem Statement
The sample architecture described integrates sectioning, entity linking, and attribute extraction as foundational steps. Each extracted item is then evaluated against expert-curated datasets using both strict (exact) and flexible (fuzzy) matching algorithms. This approach directly addresses the problem statement by providing measurable metrics, such as individual and combined attribute match rates and precision calculations, that quantify the system’s reliability and highlight areas for improvement. Ultimately, this methodology ensures that the pipeline’s output is systematically validated, and its strengths and limitations are clearly understood in real-world contexts.

Plausible Alternative Approaches
No single approach fits every use case; the best method depends on factors like document complexity, structure, sensitivity, and length, as well as the downstream application types. Consider these alternative approaches for different scenarios.

Using Azure OpenAI alone
Article: Best Practices for Structured Extraction from Documents Using Azure OpenAI

Using Azure OpenAI + Azure Document Intelligence + Azure AI Search: RAG like solution
Article 1: Document Field Extraction with Generative AI
Article 2: Complex Data Extraction using Document Intelligence and RAG
Article 3: Design and develop a RAG solution

Using Azure OpenAI + Azure Document Intelligence + Azure AI Search: Non-RAG like solution
Article: Using Azure AI Document Intelligence and Azure OpenAI to extract structured data from documents
GitHub Repository: Content processing solution accelerator

Conclusion
Intelligent Document Processing for large semi-structured documents isn’t just about extracting data; it’s about building trust in that data.
By combining Azure Document Intelligence for layout-aware OCR with OpenAI models for contextual understanding, we create a carefully designed pipeline that is accurate, scalable, and resilient against complexity. Chunking strategies ensure context fits within model limits, while header injection and structural analysis preserve relationships across pages to keep the pipeline context-aware. Entity-based grouping and semantic retrieval transform scattered content into organized, query-ready data. Finally, rigorous evaluation against a scalable ground-truth strategy, using precision, recall, and fuzzy matching, closes the loop, ensuring reliability for downstream systems. This pattern delivers more than automation; it establishes a foundation for compliance, analytics, and AI-driven workflows at enterprise scale. In short, it’s a blueprint for turning chaotic documents into structured intelligence that is efficient, governed, and ready for any kind of downstream application.

End-to-End Evaluation Approaches Guidance
Given the complexity of this system, it should undergo a thorough end-to-end evaluation to ensure correctness, robustness, and performance across the pipeline. Continuous monitoring and observability of these metrics will enable iterative improvements and help the system scale reliably as requirements evolve. To read more about end-to-end evaluation approaches, please refer to our tech community blog: From Large Semi-Structured Docs to Actionable Data: In-Depth Evaluation Approaches Guidance.

References
Azure Content Understanding in Foundry Tools
Azure Document Intelligence in Foundry Tools
Azure OpenAI in Microsoft Foundry models
Azure AI Search
Azure Machine Learning (ML) Pipelines
Azure Databricks Job
Microsoft Fabric Pipeline
anishganguli
Dec 18, 2025, Azure Architecture Blog
Introducing OpenAI’s GPT-image-1.5 in Microsoft Foundry
Developers building with visual AI often run into the same frustrations: images that drift from the prompt, inconsistent object placement, text that renders unpredictably, and editing workflows that break when iterating on a single asset. That’s why we are excited to announce OpenAI's GPT Image 1.5 is now generally available in Microsoft Foundry. The model brings sharper image fidelity, stronger prompt alignment, and faster image generation that supports iterative workflows. Starting today, customers can request access to the model and start building in the Foundry platform.

Meet GPT Image 1.5
AI-driven image generation began with early models like OpenAI's DALL-E, which introduced the ability to transform text prompts into visuals. Since then, image generation models have been evolving to enhance multimodal AI across industries. GPT Image 1.5 represents continuous improvement in enterprise-grade image generation. Building on the success of GPT Image 1 and GPT Image 1 mini, the new model introduces advanced capabilities that cater to both creative and operational needs. The new image model offers:
Text-to-image: Stronger instruction following and highly precise editing.
Image-to-image: Transform existing images to iteratively refine specific regions.
Improved visual fidelity: More detailed scenes and realistic rendering.
Accelerated creation times: Up to 4x faster generation speed.
Enterprise integration: Deploy and scale securely in Microsoft Foundry.

GPT Image 1.5 delivers stronger image preservation and editing capabilities, maintaining critical details like facial likeness, lighting, composition, and color tone across iterative changes. You’ll see more consistent preservation of branded logos and key visuals, making it especially powerful for marketing, brand design, and ecommerce workflows, from graphics and logo creation to generating full product catalogs (variants, environments, and angles) from a single source image.
Benchmarks

Based on an internal Microsoft dataset, GPT Image 1.5 scores higher than other image generation models in prompt alignment and infographics tasks. It focuses on making clear, strong edits, performing best on single-turn modification and delivering the highest visual quality in both single-turn and multi-turn settings. The following results were found across image generation and editing:

Text to image
Model | Prompt alignment | Diagram / Flowchart
GPT Image 1.5 | 91.2% | 96.9%
GPT Image 1 | 87.3% | 90.0%
Qwen Image | 83.9% | 33.9%
Nano Banana Pro | 87.9% | 95.3%

Image editing
Setting | Model | Modification (BinaryEval) | Preservation (SC, semantic) | Preservation (DINO, visual) | Visual Quality (BinaryEval) | Face Preservation (AuraFace)
Single-turn | GPT Image 1 | 99.2% | 51.0% | 0.14 | 79.5% | 0.30
Single-turn | Qwen Image | 81.9% | 63.9% | 0.44 | 76.0% | 0.85
Single-turn | GPT Image 1.5 | 100% | 56.77% | 0.14 | 89.96% | 0.39
Multi-turn | GPT Image 1 | 93.5% | 54.7% | 0.10 | 82.8% | 0.24
Multi-turn | Qwen Image | 77.3% | 68.2% | 0.43 | 77.6% | 0.63
Multi-turn | GPT Image 1.5 | 92.49% | 60.55% | 0.15 | 89.46% | 0.28

Using GPT Image 1.5 across industries

Whether you’re creating immersive visuals for campaigns, accelerating UI and product design, or producing assets for interactive learning, GPT Image 1.5 gives modern enterprises the flexibility and scalability they need. Image models can allow teams to drive deeper engagement through compelling visuals, speed up design cycles for apps, websites, and marketing initiatives, and support inclusivity by generating accessible, high‑quality content for diverse audiences.

Watch how Foundry enables developers to iterate with multimodal AI across Black Forest Labs, OpenAI, and more:

Microsoft Foundry empowers organizations to deploy these capabilities at scale, integrating image generation seamlessly into enterprise workflows. Explore the use of AI image generation here across industries like:

- Retail: Generate product imagery for catalogs, e-commerce listings, and personalized shopping experiences.
- Marketing: Create campaign visuals and social media graphics.
- Education: Develop interactive learning materials or visual aids.
- Entertainment: Edit storyboards, character designs, and dynamic scenes for films and games.
- UI/UX: Accelerate design workflows for apps and websites.

Microsoft Foundry provides security and compliance with built-in content safety filters, role-based access, network isolation, and Azure Monitor logging. Integrated governance via Azure Policy, Purview, and Sentinel gives teams real-time visibility and control, so privacy and safety are embedded in every deployment. Learn more about responsible AI at Microsoft.

Pricing

Model | Pricing (per 1M tokens) - Global
GPT-image-1.5 | Input Tokens: $8; Cached Input Tokens: $2; Output Tokens: $32

Cost efficiency improves as well: image inputs and outputs are now cheaper compared to GPT Image 1, enabling organizations to generate and iterate on more creative assets within the same budget. For detailed pricing, refer here.

Getting started

Learn more about image generation, explore code samples, and read about responsible AI protections here. Try GPT Image 1.5 in Microsoft Foundry and start building multimodal experiences today. Whether you’re designing educational materials, crafting visual narratives, or accelerating UI workflows, these models deliver the flexibility and performance your organization needs.
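As a rough illustration of the per-token pricing above, the sketch below estimates the cost of one request. The rates are the Global GPT-image-1.5 figures from the pricing section; the token counts are made-up placeholders, the helper name is hypothetical, and it assumes cached input tokens are billed separately from regular input tokens.

```python
# Global GPT-image-1.5 rates from the pricing section, in USD per 1M tokens.
RATES_PER_MILLION = {"input": 8.00, "cached_input": 2.00, "output": 32.00}


def estimate_cost(input_tokens: int, cached_input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: cached input tokens are counted separately from regular
    input tokens rather than being a subset of them.
    """
    usage = {
        "input": input_tokens,
        "cached_input": cached_input_tokens,
        "output": output_tokens,
    }
    return sum(RATES_PER_MILLION[kind] * count / 1_000_000 for kind, count in usage.items())


# Placeholder token counts, chosen only to show the arithmetic.
cost = estimate_cost(input_tokens=1_000, cached_input_tokens=500, output_tokens=4_000)
print(f"Estimated cost: ${cost:.4f}")
```

Because output tokens cost four times as much as input tokens, the output side usually dominates the estimate for image generation requests.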
Naomi Moneypenny
Dec 17, 2025 | Microsoft Foundry Blog | 3.1K views, 0 likes, 1 comment
Integrate Custom Azure AI Agents with Copilot Studio and M365 Copilot
Integrating Custom Agents with Copilot Studio and M365 Copilot

In today's fast-paced digital world, integrating custom agents with Copilot Studio and M365 Copilot can significantly enhance your company's digital presence and extend your Copilot platform to your enterprise applications and data. This blog will guide you through the steps of bringing your custom Azure AI Agent Service agent, hosted within an Azure Function App, into a Copilot Studio solution and publishing it to M365 and Teams applications.

When Might This Be Necessary:

Integrating custom agents with Copilot Studio and M365 Copilot is necessary when you want to extend customization to automate tasks, streamline processes, and provide a better user experience for your end users. This integration is particularly useful for organizations looking to streamline their AI platform, extend out-of-the-box functionality, and leverage existing enterprise data and applications to optimize their operations. Custom agents built on Azure allow you to achieve greater customization and flexibility than using Copilot Studio agents alone.

What You Will Need:

To get started, you will need the following:
- Azure AI Foundry
- Azure OpenAI Service
- Copilot Studio Developer License
- Microsoft Teams Enterprise License
- M365 Copilot License

Steps to Integrate Custom Agents:

Create a Project in Azure AI Foundry: Navigate to Azure AI Foundry and create a project. Select 'Agents' from the 'Build and Customize' menu pane on the left side of the screen and click the blue button to create a new agent.

Customize Your Agent: Your agent will automatically be assigned an Agent ID. Give your agent a name and assign the model your agent will use. Customize your agent with instructions. Add your knowledge source: you can connect to Azure AI Search, load files directly to your agent, link to Microsoft Fabric, or connect to third-party sources like Tripadvisor. In our example, we are only testing the Copilot integration steps of the AI Agent, so we did not build out additional options for grounding knowledge or function calling here.

Test Your Agent: Once you have created your agent, test it in the playground. If you are happy with it, you are ready to call the agent in an Azure Function.

Create and Publish an Azure Function: Use the sample function code from the GitHub repository to call the Azure AI Project and Agent. Publish your Azure Function to make it available for integration.
azure-ai-foundry-agent/function_app.py at main · azure-data-ai-hub/azure-ai-foundry-agent

Connect your AI Agent to your Function: Update the "AIProjectConnString" value to include your Project connection string from the project overview page in AI Foundry.

Role Based Access Controls: We have to add a role for the function app on the OpenAI service.
Role-based access control for Azure OpenAI - Azure AI services | Microsoft Learn
- Enable Managed Identity on the Function App.
- Grant the "Cognitive Services OpenAI Contributor" role to the Function App's system-assigned managed identity in the Azure OpenAI resource.
- Grant the "Azure AI Developer" role to the Function App's system-assigned managed identity in the Azure AI Project resource from AI Foundry.

Build a Flow in Power Platform: Before you begin, make sure you are working in the same environment you will use to create your Copilot Studio agent. To get started, navigate to the Power Platform (https://make.powerapps.com) to build out a flow that connects your Copilot Studio solution to your Azure Function App. When creating a new flow, select 'Build an instant cloud flow' and trigger the flow using 'Run a flow from Copilot'. Add an HTTP action that calls the Function at its URL and passes along the message prompt from the end user. The output of your function is plain text, so you can pass the response from your Azure AI Agent directly to your Copilot Studio solution.
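For readers who want to exercise the function outside Power Automate, the HTTP action in the flow amounts to a request like the one sketched below. The URL, the `message` field name, and the JSON body shape are assumptions for illustration only; match them to whatever your deployed function actually expects.

```python
import json
import urllib.request


def build_agent_request(function_url: str, user_message: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the user's prompt.

    Assumption: the function reads a JSON body with a "message" field.
    """
    body = json.dumps({"message": user_message}).encode("utf-8")
    return urllib.request.Request(
        function_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_agent(function_url: str, user_message: str, timeout: float = 60.0) -> str:
    """Send the request and return the function's plain-text reply."""
    request = build_agent_request(function_url, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Placeholder URL; nothing is sent until call_agent() is invoked.
req = build_agent_request("https://example.invalid/api/agent", "Hello")
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing the flow at the real endpoint.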
Create Your Copilot Studio Agent: Navigate to Microsoft Copilot Studio and select 'Agents', then 'New Agent'. Make sure you are in the same environment you used to create your cloud flow. Now select the ‘Create’ button at the top of the screen.

From the top menu, navigate to ‘Topics’ and ‘System’. We will open up the ‘Conversation boosting’ topic. When you first open the Conversation boosting topic, you will see a template of connected nodes. Delete all but the initial ‘Trigger’ node.

Now we will rebuild the conversation boosting agent to call the Flow you built in the previous step. Select 'Add an Action' and then select the option for an existing Power Automate flow. Pass the response from your Custom Agent to the end user and end the current topic.

My existing Cloud Flow:

Add action to connect to existing Cloud Flow:

When this menu pops up, you should see the option to run the flow you created. Here, mine does not have a very unique name, but you can see my flow 'Run a flow from Copilot' as a Basic action menu item. If you do not see your cloud flow here, add the flow to the default solution in the environment: go to Solutions > select the All pill > Default Solution > then add the Cloud Flow you created to the solution. Then go back to Copilot Studio and refresh, and the flow will be listed there.

Now complete building out the conversation boosting topic:

Make Agent Available in M365 Copilot: Navigate to the 'Channels' menu and select 'Teams + Microsoft 365'. Be sure to select the box to 'Make agent available in M365 Copilot'. Save and re-publish your Copilot Agent. It may take up to 24 hours for the Copilot Agent to appear in the M365 Teams agents list. Once it has loaded, select the 'Get Agents' option from the side menu of Copilot and pin your Copilot Studio Agent to your featured agent list.

Now you can chat with your custom Azure AI Agent directly from M365 Copilot!

Conclusion: By following these steps, you can successfully integrate custom Azure AI Agents with Copilot Studio and M365 Copilot, enhancing the utility of your existing platform and improving operational efficiency. This integration allows you to automate tasks, streamline processes, and provide a better user experience for your end users. Give it a try!

Curious how to bring custom models from your AI Foundry to your Copilot Studio solutions? Check out this blog
hannahabbott
Dec 16, 2025 | Microsoft Foundry Blog | 19K views, 3 likes, 11 comments
From Large Semi-Structured Docs to Actionable Data: In-Depth Evaluation Approaches Guidance
Introduction

Extracting structured data from large, semi-structured documents demands a rigorous evaluation framework. (The detailed solution implementation overview and architecture are provided in this tech community blog: From Large Semi-Structured Docs to Actionable Data: Reusable Pipelines with ADI, AI Search & OpenAI.) The goal is to ensure our pipeline is accurate, reliable, and scalable before we trust it with mission-critical data. This framework breaks evaluation into clear phases, from how we prepare the document, to how we find relevant parts, to how we validate the final output. It provides metrics, examples, and best practices at each step, forming a generic pattern that can be applied to various domains.

Framework Overview

A structured, stepwise approach to evaluation is given below:

1. Establish Ground Truth & Sampling: Define a robust ground truth set and sampling method to fairly evaluate all parts of the document.
2. Preprocessing Evaluation: Verify that OCR, chunking, and any structural augmentation (like adding headers) preserve all content and context.
3. Labelling Evaluation: Check classification of sections/chunks by content based on topic/entity and ensure irrelevant data is filtered out without losing any important context.
4. Retrieval Evaluation: Ensure the system can retrieve the right pieces of information (using search) with high precision@k and recall@k.
5. Extraction Accuracy Evaluation: Measure how well the final structured data matches the expected values (field accuracy, record accuracy, overall precision/recall).
6. Continuous Improvement Loop with SME: Use findings to retrain, tweak, and improve, enabling the framework to be reused for new documents and iterations. SMEs play a huge role in such scenarios.
Detailed Guidance on Evaluation

Below is a step-by-step, in-depth guide to evaluating this kind of IDP (Intelligent Document Processing) pipeline, covering both the overall system and its individual components:

Establish Ground Truth & Sampling

Why: Any evaluation is only as good as the ground truth it’s compared against. Start by assembling a reliable “source of truth” dataset for your documents. This often means manual labelling of some documents by domain experts (e.g., a legal team annotating key clauses in a contract, or accountants verifying invoice fields). Because manual curation is expensive, be strategic in what and how we sample.

Ground Truth Preparation: Identify the critical fields and sections we need to extract, and create an annotated set of documents with those values marked correct. For example, if processing financial statements, we might mark the ground truth values for Total Assets, Net Income, Key Ratios, etc. This ground truth should be the baseline to measure accuracy against. Although creating it is labour-intensive, it yields a precise benchmark for model performance.

Stratified Sampling: Documents like contracts or policies have diverse sections. To evaluate thoroughly, use stratified sampling: ensure your test set covers all major content types and difficulty levels. For instance, if 15% of pages in a set of contracts are annexes or addendums, then ~15% of your evaluation pages should come from annexes, not just the main body. This prevents the evaluation from overlooking challenging or rare sections. In practice, we might partition a document by section type (clauses, tables, schedules, footnotes) and sample a proportion from each. This way, metrics reflect performance on each type of content, not just the easiest portions.

Multi-Voter Agreement (Consensus): It’s often helpful to have multiple automated voters on the outputs before involving humans. For example, suppose we extracted an invoice amount; we can have:
- A regex/format checker/fuzzy matching voter
- A cross-field logic checker/embedding based matching voter
- An ML model confidence score/LLM as a judge voter

If all signals are strong, we label that extraction as Low Risk; if they conflict, mark it High Risk for human review. By tallying such “votes”, we create tiers of confidence. Why? Because in many cases, a large portion of outputs will be obviously correct (e.g., over 80% might have unanimous high confidence), and we can safely assume those are right, focusing manual review on the remainder. This strategy effectively reduces the human workload while maintaining quality.

Preprocessing Evaluation

Before extracting meaning, make sure the raw text and structure are captured correctly. Any loss here breaks the whole pipeline. Key evaluation checks:

OCR / Text Extraction Accuracy
- Character/Error Rate: Sample pages to see how many words are recognized correctly (use per-word confidence to spot issues).
- Layout Preservation: Ensure reading order isn’t scrambled, especially in multi-column pages or footnotes.
- Content Coverage: Verify every sentence from a sample page appears in the extracted text. Missing footers or sidebars count as gaps.

Chunking
- Completeness: Combined chunks should reconstruct the full document. Word counts should match.
- Segment Integrity: Chunks should align to natural boundaries (paragraphs, tables). Track a metric like “95% clean boundaries.”
- Context Preservation: If a table or section spans chunks, mark relationships so downstream logic sees them as connected.

Multi-page Table Handling
- Header Insertion Accuracy: Validate that continued pages get the correct header (aim for accuracy in the high 90s, in percent, to maintain context across documents).
- No False Headers: Ensure new tables aren’t mistakenly treated as continuations. Track a False Continuation Rate and push it to near zero.
- Practical Check: Sample multi-page tables across docs to confirm consistent extraction and no missed rows.

Structural Links / References
- Link Accuracy: Confirm references (like footnotes or section anchors) map to the right targets (e.g., 98%+ correct).
- Ontology / Category Coverage: If content is pre-grouped, check precision (no mis-grouping) and recall (nothing left uncategorized).

Implication
The goal is to ensure the pre-processed chunks are a faithful, complete, and structurally coherent representation of the original document. Metrics like content coverage, boundary cleanliness, and header accuracy help catch issues early. Fixing them here saves significant downstream debugging.

Labelling Evaluation – “Did we isolate the right pieces?”

Once we chunk the document, we label those chunks (with ML or rules) to map them to the right entities and throw out the noise. Think of this step as sorting useful clauses from filler.

Section/Entity Labelling Accuracy
Treat labelling as a multi-class or multi-label classification problem.
- Precision (Label Accuracy): Of the chunks we labelled as X, how many were actually X? Example: Model tags 40 chunks as “Financial Data.” If 5 are wrong, precision is 35/40 = 87.5%. High precision avoids polluting a category (topic/entity) with junk.
- Recall (Coverage): Of the chunks that truly belong to category X, how many did we catch? Example: Ground truth has 50 Financial Data chunks, model finds 45. Recall is 45/50 = 90%. High recall prevents missing important sections.

Example: A model labels paper sections as Introduction, Methods, Results, etc. It marks 100 sections as Results and 95 are correct (95% precision). It misses 5 actual Results (slightly lower recall). That’s acceptable if downstream steps can still recover some items. But low precision means the labelling logic needs tightening.

Implication
Low precision means wrong info contaminates the category. Low recall means missing crucial bits.
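The per-label precision and recall described above can be computed with a few counters. This is a minimal sketch under a simplifying assumption: each chunk carries exactly one label (the multi-label case would need set-based comparison). The function name and the toy data are illustrative, not taken from the pipeline.

```python
from collections import Counter


def per_label_precision_recall(true_labels, predicted_labels):
    """Return {label: (precision, recall)} for single-label chunk classification."""
    true_positive, predicted, actual = Counter(), Counter(), Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        predicted[guess] += 1
        actual[truth] += 1
        if truth == guess:
            true_positive[truth] += 1
    labels = set(predicted) | set(actual)
    return {
        label: (
            true_positive[label] / predicted[label] if predicted[label] else 0.0,
            true_positive[label] / actual[label] if actual[label] else 0.0,
        )
        for label in labels
    }


# Toy data: 50 true "Financial" chunks (45 caught, 5 missed) and
# 50 true "Other" chunks (5 of them wrongly tagged as "Financial").
true_labels = ["Financial"] * 50 + ["Other"] * 50
predicted_labels = ["Financial"] * 45 + ["Other"] * 5 + ["Financial"] * 5 + ["Other"] * 45

metrics = per_label_precision_recall(true_labels, predicted_labels)
for label, (precision, recall) in sorted(metrics.items()):
    print(f"{label}: precision={precision:.1%}, recall={recall:.1%}")
```

Reporting the pair per label, rather than one aggregate accuracy, shows whether a category is being polluted (low precision) or under-collected (low recall).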
Use these metrics to refine definitions or adjust the labelling logic. Don’t just report one accuracy number; precision and recall per label tell the real story.

Retrieval Evaluation – “Can we find the right info when we ask?”

Many document pipelines use retrieval to narrow a huge file down to the few chunks most likely to contain the answer corresponding to a topic/entity. If we need a “termination date,” we first fetch chunks about dates or termination, then extract from those. Retrieval must be sharp, or everything downstream suffers.

Precision@K
How many of the top K retrieved chunks are actually relevant? If we grab 5 chunks for “Key Clauses” and 4 are correct, Precision@5 is 80%. We usually set K to whatever the next stage consumes (3 or 5). High precision keeps extraction clean. Average it across queries or fields. Critical fields may demand very high Precision@K.

Recall@K
Did we retrieve enough of the relevant chunks? If there are 2 relevant chunks in the doc but the top 5 results include only 1, recall is 50%. Good recall means we aren’t missing mentions in other sections or appendices. Increasing K improves recall but can dilute precision. Tune both together.

Ranking Quality (MRR, NDCG)
If order matters, use rank-aware metrics.
- MRR: Measures how early the first relevant result appears. Perfect if it’s always at rank 1.
- NDCG@K: Rewards having the most relevant chunks at the top. Useful when relevance isn’t binary.
Most pipelines can get away with Precision@K and maybe MRR.

Implication
Test 50 QA pairs from policy documents, retrieving 3 passages per query. Average Precision@3: 85%. Average Recall@3: 92%. MRR: 0.8. Suppose we notice “data retention” answers appear in appendices that sometimes rank low. We increase K to 5 for that query type. Precision@3 rises to 90%, and Recall@5 hits roughly 99%. Retrieval evaluation is a sanity check. If retrieval fails, extraction recall will tank no matter how good the extractor is.
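The retrieval metrics above reduce to a few lines each. The sketch below assumes binary relevance and unique chunk IDs in each ranked list; the function names and sample IDs are illustrative only.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved chunk IDs that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for chunk_id in retrieved[:k] if chunk_id in relevant) / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant chunk IDs that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (retrieved, relevant) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Mirrors the worked numbers in the text: 4 of 5 relevant, then 1 of 2 found.
key_clauses = (["c1", "c2", "c3", "c4", "c5"], {"c1", "c2", "c3", "c5"})
termination = (["a", "b", "c", "d", "e"], {"b", "z"})

print(precision_at_k(*key_clauses, k=5))
print(recall_at_k(*termination, k=5))
print(mean_reciprocal_rank([key_clauses, termination]))
```

Averaging these per query type, rather than only overall, makes it easier to spot fields whose answers tend to rank low.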
Measure both retrieval and extraction so we know where the leak is. Also keep an eye on latency and cost if fancy re-rankers slow things down.

Extraction Accuracy Evaluation – “Did we get the right answers?”

Look at each field and measure how often we got the right value.
- Precision: Of the values we extracted, what percent are correct? Use exact match or a lenient version if small format shifts don’t matter. Report both when useful.
- Recall: Out of all ground truth values, how many did we actually extract?
- Per-field breakdown: Some fields will be easy (invoice numbers, dates), others messy (vendor names, free text). A simple table makes this obvious and shows where to focus improvements.

Error Analysis
Numbers don’t tell the whole story. Look at patterns:
- OCR mix-ups
- Bad date or amount formats
- Wrong chunk retrieved upstream
- Misread tables
Find the recurring mistakes. That’s where the fixes live.

Holistic Metrics
If needed, compute overall precision/recall across all extracted fields. But per-field and record-level are usually what matter to stakeholders.

Implication
Precision protects against wrong entries. Recall protects against missing data. Choose your balance based on risk:
- If false positives hurt more (wrong financial numbers), favour precision.
- If missing items hurt more (missing red-flag clauses), favour recall.

Continuous Improvement Loop with SME

Continuous improvement means treating evaluation as an ongoing feedback loop rather than a one-time check. Each phase’s errors point to concrete fixes, and every fix is re-measured to ensure accuracy moves in the right direction without breaking other components. The same framework also supports A/B testing alternative methods and monitoring real production data to detect drift or new document patterns. Because the evaluation stages are modular, they generalize well across domains such as contracts, financial documents, healthcare forms, or academic papers with only domain-specific tweaks. Over time, this creates a stable, scalable and measurable path toward higher accuracy, better robustness, and easier adaptation to new document types.

Conclusion

Building an end-to-end evaluation framework isn’t just about measuring accuracy; it’s about creating trust in the entire pipeline. By breaking the process into clear phases, defining robust ground truth, and applying precision/recall-driven metrics at every stage, we ensure that document processing systems are reliable, scalable, and adaptable. This structured approach not only highlights where improvements are needed but also enables continuous refinement through SME feedback and iterative testing. Ultimately, such a framework transforms evaluation from a one-time exercise into a sustainable practice, paving the way for higher-quality outputs across diverse domains.
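Returning to the multi-voter agreement idea from the ground-truth stage, a minimal sketch of tallying automated votes into risk tiers might look like the following. The three voters, the field names (`amount`, `subtotal`, `tax`, `model_confidence`), and the 0.9 threshold are all hypothetical stand-ins; a real pipeline would plug in its own checks.

```python
import re


def format_voter(record):
    """Vote yes if the amount looks like a plain decimal number."""
    return re.fullmatch(r"\d+(\.\d{2})?", record["amount"]) is not None


def cross_field_voter(record):
    """Vote yes if amount equals subtotal + tax (hypothetical business rule)."""
    try:
        total = float(record["subtotal"]) + float(record["tax"])
        return abs(float(record["amount"]) - total) < 0.005
    except ValueError:
        return False


def confidence_voter(record, threshold=0.9):
    """Vote yes if the extractor's own confidence clears the threshold."""
    return float(record["model_confidence"]) >= threshold


def risk_tier(record, voters):
    """Low Risk only when every voter agrees; anything else goes to review."""
    votes = [voter(record) for voter in voters]
    return "Low Risk" if all(votes) else "High Risk"


VOTERS = [format_voter, cross_field_voter, confidence_voter]

clean = {"amount": "120.00", "subtotal": "100.00", "tax": "20.00", "model_confidence": "0.97"}
ocr_slip = {"amount": "12O.00", "subtotal": "100.00", "tax": "20.00", "model_confidence": "0.97"}

print(risk_tier(clean, VOTERS))
print(risk_tier(ocr_slip, VOTERS))
```

Requiring unanimity is the conservative choice; a majority rule would send fewer items to human review at the cost of more undetected errors.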
anishganguli
Dec 15, 2025 | Azure Architecture Blog | 156 views, 1 like, 0 comments
Streamline Azure NetApp Files Management—Right from Your IDE
The Azure NetApp Files VS Code Extension is designed to streamline storage provisioning and management directly within the developer’s IDE. Traditional workflows often require extensive portal navigation, manual configuration, and policy management, leading to inefficiencies and context switching. The extension addresses these challenges by enabling AI-powered automation through natural language commands, reducing provisioning time from hours to minutes while minimizing errors and improving compliance. Key capabilities include generating production-ready ARM templates, validating resources, and delivering optimization insights—all without leaving the coding environment.
GeertVanTeylingen
Dec 15, 2025 | Azure Architecture Blog | 85 views, 0 likes, 0 comments
How Azure NetApp Files Object REST API powers Azure and ISV Data and AI services – on YOUR data
This article introduces the Azure NetApp Files Object REST API, a transformative solution for enterprises seeking seamless, real-time integration between their data and Azure's advanced analytics and AI services. By enabling direct, secure access to enterprise data—without costly transfers or duplication—the Object REST API accelerates innovation, streamlines workflows, and enhances operational efficiency. With S3-compatible object storage support, it empowers organizations to make faster, data-driven decisions while maintaining compliance and data security. Discover how this new capability unlocks business potential and drives a new era of productivity in the cloud.
GeertVanTeylingen
Dec 15, 2025 | Azure Architecture Blog | 720 views, 0 likes, 0 comments
Accelerating HPC and EDA with Powerful Azure NetApp Files Enhancements
High-Performance Computing (HPC) and Electronic Design Automation (EDA) workloads demand uncompromising performance, scalability, and resilience. Whether you're managing petabyte-scale datasets or running compute intensive simulations, Azure NetApp Files delivers the agility and reliability needed to innovate without limits.
GeertVanTeylingen
Dec 15, 2025 | Azure Architecture Blog | 521 views, 1 like, 0 comments
Context-Aware RAG System with Azure AI Search to Cut Token Costs and Boost Accuracy
🚀 Introduction

As AI copilots and assistants become integral to enterprises, one question dominates architecture discussions: “How can we make large language models (LLMs) provide accurate, source-grounded answers without blowing up token costs?”

Retrieval-Augmented Generation (RAG) is the industry’s go-to strategy for this challenge. But traditional RAG pipelines often use static document chunking, which breaks semantic context and drives inefficiencies. To address this, we built a context-aware, cost-optimized RAG pipeline using Azure AI Search and Azure OpenAI, leveraging AI-driven semantic chunking and intelligent retrieval. The result: accurate answers with up to 85% lower token consumption.

This blog focuses mainly on two topics:
- Tokenization
- Chunking

The Problem with Naive Chunking

Most RAG systems split documents by token or character count (e.g., every 1,000 tokens). This is easy to implement but introduces real-world problems:
- 🧩 Loss of context: sentences or concepts get split mid-idea.
- ⚙️ Retrieval noise: irrelevant fragments appear in top results.
- 💸 Higher cost: you often send 5× more text than necessary.

These issues degrade both accuracy and cost efficiency.

🧠 Context-Aware Chunking: Smarter Document Segmentation

Instead of breaking text arbitrarily, our system uses an LLM-powered preprocessor to identify semantic boundaries, meaning each chunk represents a complete and coherent concept.

Example

Naive chunking: “Azure OpenAI Service offers… [cut] …integrates with Azure AI Search for intelligent retrieval.”

Context-aware chunking: “Azure OpenAI Service provides access to models like GPT-4o, enabling developers to integrate advanced natural language understanding and generation into their applications. It can be paired with Azure AI Search for efficient, context-aware information retrieval.”

✅ The chunk is self-contained and semantically meaningful.
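To make the naive-chunking problem concrete, here is a minimal sketch of fixed-size splitting. It counts whitespace-separated words as a stand-in for real tokens (an assumption; a production pipeline would use the model's tokenizer), and the function name is hypothetical.

```python
def fixed_size_chunks(text: str, chunk_size: int) -> list[str]:
    """Split text into chunks of chunk_size words, ignoring sentence boundaries."""
    words = text.split()
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), chunk_size)
    ]


text = (
    "Azure OpenAI Service provides access to models like GPT-4o. "
    "It can be paired with Azure AI Search for retrieval."
)

chunks = fixed_size_chunks(text, chunk_size=6)
for chunk in chunks:
    print(repr(chunk))
```

The second chunk here ends one sentence and begins the next, so neither idea is complete on its own. That is the loss of context a boundary-aware chunker is meant to avoid.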
Self-contained chunks allow the retriever to match queries with conceptually complete information rather than partial sentences, leading to higher precision and fewer chunks needed per query.

Architecture Diagram

Chunking Service:

Purpose: Transforms messy enterprise data (wikis, PDFs, transcripts, repos, images) into structured, model-friendly chunks for Retrieval-Augmented Generation (RAG).

Challenge | Chunking Fix
LLM context limits | Breaks docs into smaller pieces
Embedding size | Keeps within token bounds
Retrieval accuracy | Granular, relevant sections only
Noise | Removes irrelevant blocks
Traceability | Chunk IDs for auditability
Cost/latency | Re-embed only changed chunks

The Chunking Flow (End-to-End)

The Chunking Service sits in the ingestion pipeline and follows this sequence:

1. Ingestion: Raw text arrives from sources (wiki, repo, transcript, PDF, image description).
2. Token-aware splitting: Large text is cut into manageable pre-chunks with a 100-token overlap, ensuring no semantic drift across boundaries.
3. Semantic segmentation: Each pre-chunk is passed to an Azure OpenAI Chat model with a structured prompt. Output = JSON array of semantic chunks (sectiontitle, speaker, content).
4. Optional overlap injection: Character-level overlap can be applied across chunks for discourse-heavy text like meeting transcripts.
5. Embedding generation: Each chunk is passed to the Azure OpenAI Embeddings API (text-embedding-3-small), producing a 1536-dimension vector.
6. Indexing: Chunks (text + vectors) are uploaded to Azure AI Search.
7. Retrieval: During question answering or document generation, the system pulls top-k chunks, concatenates them, and enriches the prompt for the LLM.

Resilience & Traceability

The service is built to handle real-world pipeline issues. It retries once on rate limits, validates JSON outputs, and fails fast on malformed data instead of silently dropping chunks.
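The retry-once and fail-fast behaviour described here could be sketched roughly as follows. `call_model` is a hypothetical callable standing in for the Azure OpenAI chat call, and `RateLimitError` is a placeholder exception defined locally rather than any SDK's real class. Treating all three fields as required keys is also an assumption.

```python
import json
import time


class RateLimitError(Exception):
    """Placeholder for whatever rate-limit error the real client raises."""


REQUIRED_KEYS = {"sectiontitle", "speaker", "content"}


def parse_chunks(raw: str) -> list:
    """Validate model output; raise ValueError rather than dropping chunks."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of chunks")
    for index, item in enumerate(data):
        if not isinstance(item, dict) or not REQUIRED_KEYS.issubset(item):
            raise ValueError(f"chunk {index} is missing required fields")
        if not isinstance(item["content"], str):
            raise ValueError(f"chunk {index} content must be a string")
    return data


def chunk_with_retry(call_model, text: str, wait_seconds: float = 1.0) -> list:
    """Call the model, retrying exactly once if it reports a rate limit."""
    try:
        raw = call_model(text)
    except RateLimitError:
        time.sleep(wait_seconds)
        raw = call_model(text)  # a second failure propagates to the caller
    return parse_chunks(raw)
```

Raising on the first malformed item keeps a bad batch out of the index, which matches the fail-fast intent at the cost of reprocessing the whole pre-chunk.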
Each chunk is assigned a unique ID (chunk_<sequence>_<sourceTag>), making retrieval auditable and enabling selective re-embedding when only parts of a document change. ☁️ Why Azure AI Search Matters Here Azure AI Search (formerly Cognitive Search) is the heart of the retrieval pipeline. Key Roles: Vector Search Engine: Stores embeddings of chunks and performs semantic similarity search. Hybrid Search (Keyword + Vector): Combines lexical and semantic matching for high precision and recall. Scalability: Supports millions of chunks with blazing-fast search latency. Metadata Filtering: Enables fine-grained retrieval (e.g., by document type, author, section). Native Integration with Azure OpenAI: Allows a seamless, end-to-end RAG pipeline without third-party dependencies. In short, Azure AI Search provides the speed, scalability, and semantic intelligence to make your RAG pipeline enterprise-grade. 💡 Importance of Azure OpenAI Azure OpenAI complements Azure AI Search by providing: High-quality embeddings (text-embedding-3-large) for accurate vector search. Powerful generative reasoning (GPT-4o or GPT-4.1) to craft contextually relevant answers. Security and compliance within your organization’s Azure boundary — critical for regulated environments. Together, these two services form the retrieval (Azure AI Search) and generation (Azure OpenAI) halves of your RAG system. 💰 Token Efficiency By limiting the model’s input to only the most relevant, semantically meaningful chunks, you drastically reduce prompt size and cost. Approach Tokens per Query Typical Cost Accuracy Full-document prompt ~15,000–20,000 Very high Medium Fixed-size RAG chunks ~5,000–8,000 Moderate Medium-high Context-aware RAG (this approach) ~2,000–3,000 Low High 💰 Token Cost Reduction Analysis Let’s quantify it: Step Naive Approach (no RAG) Your Approach (Context-Aware RAG) Prompt context size Entire document (e.g., 15,000 tokens) Top 3 chunks (e.g., 2,000 tokens) Tokens per query ~16,000 (incl. 
user + system) ~2,500 Cost reduction — ~84% reduction in token usage Accuracy Often low (hallucinations) Higher (targeted retrieval) That’s roughly an 80–85% reduction in token usage while improving both accuracy and response speed. 🧱 Tech Stack Overview Component Service Purpose Chunking Engine Azure OpenAI (GPT models) Generate context-aware chunks Embedding Model Azure OpenAI Embedding API Create high-dimensional vectors Retriever Azure AI Search Perform hybrid and vector search Generator Azure OpenAI GPT-4o Produce final answer Orchestration Layer Python / FastAPI / .NET c# Handle RAG pipeline 🔍 The Bottom Line By adopting context-aware chunking and Azure AI Search-powered RAG, you achieve: ✅ Higher accuracy (contextually complete retrievals) 💸 Lower cost (token-efficient prompts) ⚡ Faster latency (smaller context per call) 🧩 Scalable and secure architecture (fully Azure-native) This is the same design philosophy powering Microsoft Copilot and other enterprise AI assistants today. 🧪 Real-Life Example: Context-Aware RAG in Action To bring this architecture to life, let’s walk through a simple example of how documents can be chunked, embedded, stored in Azure AI Search, and then queried to generate accurate, cost-efficient answers. Imagine you want to build an internal knowledge assistant that answers developer questions from your company’s Azure documentation. ⚙️ Step 1: Intelligent Document Chunking We’ll use a small LLM call to segment text into context-aware chunks — rather than fixed token counts //Context Aware Chunking //text can be your retrieved text from any page/ document private async Task<List<SemanticChunk>> AzureOpenAIChunk(string text) { try { string prompt = $@" Divide the following text into logical, meaningful chunks. Each chunk should represent a coherent section, topic, or idea. 
    Return the result as a JSON array, where each object contains:
    - sectiontitle
    - speaker (if applicable, otherwise leave empty)
    - content
    Do not add any extra commentary or explanation.
    Only output the JSON array.
    Do not return content as an array; keep it as a single string.
    TEXT:
    {text}";

            var client = GetAzureOpenAIClient();
            var chatCompletionsOptions = new ChatCompletionOptions
            {
                Temperature = 0,
                FrequencyPenalty = 0,
                PresencePenalty = 0
            };
            var messages = new List<OpenAI.Chat.ChatMessage>
            {
                new SystemChatMessage("You are a text processing assistant."),
                new UserChatMessage(prompt)
            };
            var chatClient = client.GetChatClient(deploymentName: _appSettings.Agent.Model);
            var response = await chatClient.CompleteChatAsync(messages, chatCompletionsOptions);
            string responseText = response.Value.Content[0].Text;

            // Strip a surrounding ```json ... ``` fence if the model added one.
            string cleaned = Regex.Replace(responseText, @"```[\s\S]*?```", match =>
            {
                var match1 = match.Value.Replace("```json", "").Trim();
                return match1.Replace("```", "").Trim();
            });

            // Try to parse the response as a JSON array of chunks.
            return CreateChunkArray(cleaned);
        }
        catch (JsonException ex)
        {
            _logger.LogError("Failed to parse GPT response: " + ex.Message);
            throw;
        }
        catch (Exception ex)
        {
            _logger.LogError("Error in AzureOpenAIChunk: " + ex.Message);
            throw;
        }
    }

🧠 Step 2: Adding Overlap for Better Results
We add overlap between consecutive chunks so that context spanning a chunk boundary is not lost. The overlap window (overlapChars) can be tuned per document type; note that the default of 0 adds no overlap.

    public List<SemanticChunk> AddOverlap(List<SemanticChunk> chunks, string IDText, int overlapChars = 0)
    {
        var overlappedChunks = new List<SemanticChunk>();
        for (int i = 0; i < chunks.Count; i++)
        {
            var current = chunks[i];

            // Tail of the previous chunk (at most overlapChars characters); empty for the first chunk.
            string previousOverlap = i > 0
                ? chunks[i - 1].Content[^Math.Min(overlapChars, chunks[i - 1].Content.Length)..]
                : "";
            string combinedText = previousOverlap + "\n" + current.Content;

            // ID format: chunk_<sequence>_<sourceTag>
            var id = $"chunk_{i}_{IDText}";

            overlappedChunks.Add(new SemanticChunk
            {
                // Replace characters that are not allowed in an Azure AI Search document key.
                Id = Regex.Replace(id, @"[^A-Za-z0-9_\-=]", "_"),
                Content = combinedText,
                SectionTitle = current.SectionTitle
            });
        }
        return overlappedChunks;
    }

🧠 Step 3: Generate and Store Embeddings in Azure AI Search
We convert each chunk into an embedding vector and push it to an Azure AI Search index.

    public async Task<List<SemanticChunk>> AddEmbeddings(List<SemanticChunk> chunks)
    {
        var client = GetAzureOpenAIClient();
        var embeddingClient = client.GetEmbeddingClient("text-embedding-3-small");
        foreach (var chunk in chunks)
        {
            // Generate the embedding for this chunk using the EmbeddingClient.
            var embeddingResult = await embeddingClient.GenerateEmbeddingAsync(chunk.Content).ConfigureAwait(false);
            chunk.Embedding = embeddingResult.Value.ToFloats();
        }
        return chunks;
    }

    public async Task UploadDocsAsync(List<SemanticChunk> chunks)
    {
        try
        {
            var indexClient = GetSearchindexClient();
            var searchClient = indexClient.GetSearchClient(_indexName);
            var result = await searchClient.UploadDocumentsAsync(chunks);
        }
        catch (Exception ex)
        {
            _logger.LogError("Failed to upload documents: " + ex);
            throw;
        }
    }

🤖 Step 4: Generate the Final Answer with Azure OpenAI
Now we combine the top chunks with the user query to create a cost-efficient, context-rich prompt.
Note: this example uses a Semantic Kernel agent; in practice, any agent framework can be used and the prompt can be adapted.

    // Get the most relevant chunks from Azure AI Search.
    // UserQuery is the question asked by the user.
    var context = await _aiSearchService.GetSemanticSearchresultsAsync(UserQuery);

    string questionWithContext = $@"Answer the question briefly in short relevant words based on the context provided.
    Context : {context}.
    Question : {UserQuery}?";

    var _agentModel = new AgentModel()
    {
        Model = _appSettings.Agent.Model,
        AgentName = "Answering_Agent",
        Temperature = _appSettings.Agent.Temperature,
        TopP = _appSettings.Agent.TopP,
        AgentInstructions =
            "You are a cloud Migration Architect. " +
            "Analyze all the details from top to bottom in context based on the details provided for the Migration of APP app using Azure Services. Do not assume anything. " +
            "There can be conflicting details for a question, please verify all details of the context. If there are any conflicts, please start your answer with the word - **Conflict**. " +
            "There might not be answers for all the questions, please verify all details of the context. If there is no answer for a question, just mention - **No Information**"
    };
    _agentModel = await _agentService.CreateAgentAsync(_agentModel);
    _agentModel.QuestionWithContext = questionWithContext;
    var modelWithResponse = await _agentService.GetAnswerAsync(_agentModel);

🧠 Final Thoughts
Context-aware RAG isn’t just a performance optimization; it’s an architectural evolution. It shifts the focus from feeding LLMs more data to feeding them the right data. By letting Azure AI Search handle intelligent retrieval and Azure OpenAI handle reasoning, you create an efficient, explainable, and scalable AI assistant.

The outcome: Smarter answers, lower costs, and a pipeline that scales with your enterprise.

Wiki Link: Tokenization and Chunking
IP Link: AI Migration Accelerator
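The token savings quoted in the cost analysis above follow from simple arithmetic. As a minimal sketch in Python (one of the orchestration options listed in the tech stack), the helper below reproduces the ~84% figure from the article's example values of ~16,000 and ~2,500 tokens per query. The function name `token_reduction` is an illustrative assumption and is not part of the article's code.

```python
def token_reduction(baseline_tokens: int, rag_tokens: int) -> float:
    """Return the fraction of prompt tokens saved relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 1 - rag_tokens / baseline_tokens


# Example values taken from the cost analysis table:
# ~16,000 tokens per query without RAG, ~2,500 with context-aware RAG.
saving = token_reduction(16_000, 2_500)
print(f"Token reduction: {saving:.1%}")  # Token reduction: 84.4%
```

The same helper can be applied to measured token counts from your own workload; the table values are the article's estimates, not benchmarks.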
Shikhaghildiyal
Dec 12, 2025, Microsoft Foundry Blog
Introducing Cohere Rerank 4.0 in Microsoft Foundry
These new retrieval models deliver state-of-the-art accuracy, multilingual coverage across 100+ languages, and breakthrough performance for enterprise search and retrieval-augmented generation (RAG) systems. With Rerank 4.0, customers can dramatically improve the quality of search, reduce hallucinations in RAG applications, and strengthen the reasoning capabilities of their AI agents, all with just a few lines of code.

Why Rerank Models Matter for Enterprise AI
Retrieval is the foundation of grounded AI systems. Whether you are building an internal assistant, a customer-facing chatbot, or a domain-specific knowledge engine, the quality of the retrieved documents determines the quality of the final answer. Traditional embeddings get you close, but reranking is what gets you the right answer. Rerank improves this step by reading both the query and document together (cross-encoding), producing highly precise semantic relevance scores. This means:
- More accurate search results
- More grounded responses in RAG pipelines
- Lower generative model usage, reducing cost
- Higher trust and quality across enterprise workloads

Introducing Cohere Rerank 4.0 Fast and Rerank 4.0 Pro
Microsoft Foundry now offers two versions of Rerank 4.0 to meet different enterprise needs:

Rerank 4.0 Fast
- Best balance of speed and accuracy
- Same latency as Cohere Rerank 3.5, with significantly higher accuracy
- Ideal for high-traffic applications and real-time systems

Rerank 4.0 Pro
- Highest accuracy across all benchmarks
- Excels at complex, reasoning-heavy, domain-specific retrieval
- Tuned for industries like finance, healthcare, manufacturing, government, and energy

Multilingual & Cross-Domain Performance
Rerank 4.0 delivers unmatched multilingual and cross-domain performance, supporting more than 100 languages and enabling powerful cross-lingual search across complex enterprise datasets.
The models achieve state-of-the-art accuracy in 10 of the world’s most important business languages, including Arabic, Chinese, French, German, Hindi, Japanese, Korean, Portuguese, Russian, and Spanish, making them exceptionally well suited for global organizations with multilingual knowledge bases, compliance archives, or international operations.

Effortless Integration: Add Rerank to Any System
One of the biggest benefits of Rerank 4.0 is how easy it is to adopt. You can add reranking to:
- Existing enterprise search
- Vector DB pipelines
- Keyword search systems
- Hybrid retrieval setups
- RAG architectures
- Agent workflows

No infrastructure changes required. Just a few lines of code. This makes it one of the fastest ways to meaningfully upgrade grounding, precision, and search quality in enterprise AI systems.

Better RAG, Better Agents, Better Outcomes
In Foundry, customers can pair Cohere Rerank 4.0 with Azure Search, vector databases, Agent Service, Azure Functions, Foundry orchestration, and any LLM (including GPT-4.1, Claude, DeepSeek, and Mistral) to deliver more grounded copilots, higher-fidelity agent actions, and better reasoning from cleaner context windows. This reduces hallucinations, lowers LLM spend, and provides a foundational upgrade for mission-critical AI systems.

Built for Enterprise: Security, Observability, Governance
As a direct from Azure model, Rerank 4.0 is fully integrated with:
- Azure role-based access control (RBAC)
- Virtual network isolation
- Customer-managed keys
- Logging & observability
- Entra ID authentication
- Private deployments

You can run Rerank 4.0 in environments that meet the strictest enterprise security and compliance needs.

Optimized for Enterprise Models & High-Value Industries
Rerank 4.0 is built for sectors where accuracy matters:
- Finance: Delivers precise retrieval for complex disclosures, compliance documents, and regulatory filings.
- Healthcare: Accurately retrieves clinical notes, biomedical literature, and care protocols for safer, more reliable insights.
- Manufacturing: Surfaces the right engineering specs, manuals, and parts data to streamline operations and reduce downtime.
- Government & Public Sector: Improves access to policy documents, case archives, and citizen service information with semantic precision.
- Energy: Understands industrial logs, safety manuals, and technical standards to support safer and more efficient operations.

Pricing

| Model Name | Deployment Type | Azure Resource Region | Price / 1K Search Units | Availability |
| --- | --- | --- | --- | --- |
| Cohere Rerank 4.0 Pro | Global Standard | All regions (Check this page for region details) | $2.50 | Public Preview, Dec 11, 2025 |
| Cohere Rerank 4.0 Fast | Global Standard | All regions (Check this page for region details) | $2.00 | Public Preview, Dec 11, 2025 |

Get Started Today
Cohere Rerank 4.0 Fast and Rerank 4.0 Pro are now available in Microsoft Foundry. Rerank 4.0 is one of the simplest and highest impact upgrades you can make to your enterprise AI stack, bringing better retrieval, better agents, and more trustworthy AI to every application.
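To make the reranking step concrete, here is a minimal, hypothetical sketch in Python of where a reranker sits in a retrieval pipeline: candidates from a first-stage search are scored against the query, and only the top few are passed on to the LLM. The `rerank` helper and `toy_overlap_score` are illustrative assumptions. The toy scorer merely stands in for a call to a deployed Rerank 4.0 model and does not reflect the Cohere or Foundry API.

```python
from typing import Callable, Sequence


def rerank(
    query: str,
    documents: Sequence[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 3,
) -> list[tuple[str, float]]:
    """Score each (query, document) pair and return the top_n documents, best first."""
    scored = [(doc, score_fn(query, doc)) for doc in documents]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def toy_overlap_score(query: str, document: str) -> float:
    """Placeholder scorer (assumption): fraction of query words that appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0


# Candidates as they might come back from a keyword, vector, or hybrid search.
candidates = [
    "Quarterly revenue grew in the energy sector.",
    "Reset your password from the account settings page.",
    "Password policy: rotate credentials every 90 days.",
]

top = rerank("how do I reset my password", candidates, toy_overlap_score, top_n=2)
for doc, score in top:
    print(f"{score:.2f}  {doc}")
```

In a real system, `score_fn` would wrap the deployed rerank endpoint, which scores the query and document together instead of counting shared words. The surrounding pipeline shape (retrieve, rerank, keep the top N) stays the same.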
Naomi Moneypenny
Dec 11, 2025, Microsoft Foundry Blog