microsoft foundry
21 TopicsBlack Forest Labs FLUX.2 Visual Intelligence for Enterprise Creative now on Microsoft Foundry
Black Forest Labs’ (BFL) FLUX.2 is now available on Microsoft Foundry. Building on FLUX1.1 [pro] and FLUX.1 Kontext [pro], we’re excited to introduce FLUX.2 [pro] which continues to push the frontier for visual intelligence. FLUX.2 [pro] delivers state-of-the-art quality with pre-optimized settings, matching the best closed models for prompt adherence and visual fidelity while generating faster at lower cost. Prompt: "Cinematic film still of a woman walking alone through a narrow Madrid street at night, warm street lamps, cool blue shadows, light rain reflecting on cobblestones, moody and atmospheric, shallow depth of field, natural skin texture, subtle film grain and introspective mood" This prompt shines because it taps into FLUX.2 [pro]'s cinematic‑lighting engine, letting the model fuse warm street‑lamp glow and cool shadows into a visually striking, film‑grade composition. What’s game-changing about FLUX.2 [pro]? FLUX.2 is designed for real-world creative workflows where consistency, accuracy, and iteration speed determine whether AI generation can replace traditional production pipelines. The model understands lighting, perspective, materials, and spatial relationships. It maintains characters and products consistent across up to 10 reference images simultaneously. It adheres to brand constraints like exact hex colors and legible text. The result: production-ready assets with fewer touchups and stronger brand fidelity. What’s New: Production‑grade quality up to 4MP: High‑fidelity, coherent scenes with realistic lighting, spatial logic, and fine detail suitable for product photography and commercial use cases. Multi‑reference consistency: Reference up to 10 images simultaneously with the best character, product, and style consistency available today. Generate dozens of brand-compliant assets where identity stays perfectly aligned shot to shot. Brand‑accurate results: Exact hex‑color matching, reliable typography, and structured controls (JSON, pose guidance) mean fewer manual fixes and stronger brand compliance. Strong prompt fidelity for complex directions: Improved adherence to complex, structured instructions including multi-part prompts, compositional constraints, and JSON-based controls. 32K token context supports long, detailed workflows with exact positioning specifications, physics-aware lighting, and precise compositional requirements in a single prompt. Optimized inference: FLUX.2 [pro] delivers state-of-the-art quality with pre-optimized inference settings, generating faster at lower cost than competing closed models. FLUX.2 transforms creative production economics by enabling workflows that weren't possible with earlier systems. Teams ship complete campaigns in days instead of weeks, with fewer manual touchups and stronger brand fidelity at scale. This performance stems from FLUX.2's unified architecture, which combines generation and editing in a single latent flow matching model. How it Works FLUX.2 combines image generation and editing in a single latent flow matching architecture, coupling a Mistral‑3 24B vision‑language model (VLM) with a rectified flow transformer. The VLM brings real‑world knowledge and contextual understanding, while the flow transformer models spatial relationships, material properties, and compositional logic that earlier architectures struggled to render. FLUX.2’s architecture unifies visual generation and editing, fuses language‑grounded understanding with flow‑based spatial modeling, and delivers production‑ready, brand‑safe images with predictable control especially when you need consistent identity, exact colors, and legible typography at high resolution. Technical details can be found in the FLUX.2 VAE blog post. Top enterprise scenarios & patterns to try with FLUX.2 [pro] The addition of FLUX.2 [pro] is the next step in the evolution for delivering faster, richer, and more controllable generation unlocking a new wave of creative potential for enterprises. Bring FLUX.2 [pro] into your workflow and transform your creative pipeline from concept to production by trying out these patterns: Enterprise scenarios Patterns to try E‑commerce hero shots Start with a small set of references (product front, material/texture, logo). Prompt for a studio hero shot on a white seamless background, three‑quarter view, softbox key + subtle rim light. Include exact hex for brand accents and specify logo placement. Output at 4MP. Product variants at scale Reuse the hero references; ask for specific colorway, angle, and background variants (e.g., “Create {COLOR} variant, {ANGLE} view, {BG} background”). Keep brand hex and logo position constant across variants. Campaign consistency (character/product identity) Provide 5–10 reference images for the character/product (faces, outfits, mood boards). Request the same identity across scenes with consistent lighting/style (e.g., cinematic warm daylight) and defined environments (e.g., urban rooftop). Marketing templates & localization Define a template (e.g., 3‑column grid: left image, right text). Set headline/body sizes (e.g., 24pt/14pt), contrast ≥ 4.5:1, and brand font. Swap localized copy per locale while keeping layout and spacing consistent. Best practices to get to production readiness with Microsoft Foundry FLUX.2 [pro] brings state-of-the-art image quality to your fingertips. In Microsoft Foundry, you can turn those capabilities into predictable, governed outcomes by standardizing templates, managing references, enforcing brand rules, and controlling spend. These practices below leverage FLUX.2 [pro]’s visual intelligence and turn them into repeatable recipes, auditable artifacts, and cost‑controlled processes within a governed Foundry pipeline. Best Practice What to do Foundry tip Approved templates Create 3–5 templates (e.g., hero shot, variant gallery, packaging, social card) with sections for Composition (camera, lighting, environment), Brand (hex colors, logo placement), Typography (font, sizes, contrast), and Output (resolution, format). Store templates in Foundry as approved artifacts; version them and restrict edits via RBAC. Versioned reference sets Keep 3–10 references per subject (product: front/side/texture; talent: face/outfit/mood) and link them to templates. Save references in governed Foundry storage; reference IDs travel with the job metadata. Resolution staging Use a three‑stage plan: Concept (1–2MP) → Review (2–3MP) → Final (4MP). Leverage FLUX.1 [pro] and FLUX1.1 Kontext [pro] before the Final stage for fast iteration and cost control Enforce stage‑based quotas and cap max resolution per job; require approval to move to 4MP. Automated QA & approvals Run post‑generation checks for color match, text legibility, and safe‑area compliance; gate final renders behind a review step. Use Foundry workflows to require sign‑off at the Review stage before Final stage. Telemetry & feedback Track latency, success rate, usage, and cost per render; collect reviewer notes and refine templates. Dashboards in Foundry: monitor job health, cost, and template performance. Foundry Models continues to grow with cutting-edge additions to meet every enterprise need—including models from Black Forest Labs, OpenAI, and more. From models like GPT‑image‑1, FLUX.2 [pro], and Sora 2, Microsoft Foundry has become the place where creators push the boundaries of what’s possible. Watch how Foundry transforms creative workflows with this demo: Customer Stories As seen at Ignite 2025, real‑world customers like Sinyi Realty have already demonstrated the efficiency of Black Forest Lab’s models on Microsoft Foundry by choosing FLUX.1 Kontext [pro] for its superior performance and selective editing. For their new 'Clear All' feature, they preferred a model that preserves the original room structure and simply removes clutter, rather than generating a new space from scratch, saving time and money. Read the story to learn more. “We wanted to stay in the same workspace rather than having to maintain different platforms,” explains TeWei Hsieh, who works in data engineering and data architecture. “By keeping FLUX Kontext model in Foundry, our data scientists and data engineers can work in the same environment.” As customers like Sinyi Realty have already shown, BFL FLUX models raise the bar for speed, precision, and operational efficiency. With FLUX.2 now on Microsoft Foundry, organizations can bring that same competitive edge directly into their own production pipelines. FLUX.2 [pro] Pricing Foundry Models are fully hosted and managed on Azure. FLUX.2 [pro] is available through pay-as-you-go and on Global Standard deployment type with the following pricing: Generated image: The first generated megapixel (MP) is charged $0.03. Each subsequent megapixel is charged $0.015. Reference image(s): We charge $0.015 for each megapixel. Important Notes: For pricing, resolution is always rounded up to the next megapixel, separately for each reference image and for the generated image. 1 megapixel is counted as 1024x1024 pixels For multiple reference images, each reference image is counted as 1 megapixel Images exceeding 4 megapixels are resized to 4 megapixels Reference the Foundry Models pricing page for pricing. Build Trustworthy AI Solutions Black Forest Labs models in Foundry Models are delivered under the Microsoft Product Terms, giving you enterprise-grade security and compliance out of the box. Each FLUX endpoint offers Content Safety controls and guardrails. Runtime protections include built-in content-safety filters, role-based access control, virtual-network isolation, and automatic Azure Monitor logging. Governance signals stream directly into Azure Policy, Purview, and Microsoft Sentinel, giving security and compliance teams real-time visibility. Together, Microsoft's capabilities let you create with more confidence, knowing that privacy, security, and safety are woven into every Black Forest Labs deployment from day one. Getting Started with FLUX.2 in Microsoft Foundry If you don’t have an Azure subscription, you can sign up for an Azure account here. Search for the model name in the model catalog in Foundry under “Build.” FLUX.2-pro Open the model card in the model catalog. Click on deploy to obtain the inference API and key. View your deployment under Build > Models. You should land on the deployment page that shows you the API and key in less than a minute. You can try out your prompts in the playground. You can use the API and key with various clients. Learn More ▶️ RSVP for the next Model Monday LIVE on YouTube or On-Demand 👩💻 Explore FLUX.2 Documentation on Microsoft Learn 👋 Continue the conversation on Discord1KViews0likes2CommentsUnable to publish Foundry agent to M365 copilot or Teams
I’m encountering an issue while publishing an agent in Microsoft Foundry to M365 Copilot or Teams. After creating the agent and Foundry resource, the process automatically created a Bot Service resource. However, I noticed that this resource has the same ID as the Application ID shown in the configuration. Is this expected behavior? If not, how should I resolve it? I followed the steps in the official documentation: https://learn.microsoft.com/en-us/azure/ai-foundry/agents/how-to/publish-copilot?view=foundry Despite this, I keep getting the following error: There was a problem submitting the agent. Response status code does not indicate success: 401 (Unauthorized). Status Code: 401 Any guidance on what might be causing this and how to fix it would be greatly appreciated.Solved117Views0likes3CommentsOpen AI’s GPT-5.1-codex-max in Microsoft Foundry: Igniting a New Era for Enterprise Developers
Announcing GPT-5.1-codex-max: The Future of Enterprise Coding Starts Now We’re thrilled to announce the general availability of OpenAI's GPT-5.1-codex-max in Microsoft Foundry Models; a leap forward that redefines what’s possible for enterprise-grade coding agents. This isn’t just another model release; it’s a celebration of innovation, partnership, and the relentless pursuit of developer empowerment. At Microsoft Ignite, we unveiled Microsoft Foundry: a unified platform where businesses can confidently choose the right model for every job, backed by enterprise-grade reliability. Foundry brings together the best from OpenAI, Anthropic, xAI, Black Forest Labs, Cohere, Meta, Mistral, and Microsoft’s own breakthroughs, all under one roof. Our partnership with Anthropic is a testament to our commitment to giving developers access to the most advanced, safe, and high-performing models in the industry. And now, with GPT-5.1-codex-max joining the Foundry family, the possibilities for intelligent applications and agentic workflows have never been greater. GPT 5.1-codex-max is available today in Microsoft Foundry and accessible in Visual Studio Code via the Foundry extension . Meet GPT-5.1-codex-max: Enterprise-Grade Coding Agent for Complex Projects GPT-5.1-codex-max is engineered for those who build the future. Imagine tackling complex, long-running projects without losing context or momentum. GPT-5.1-codex-max delivers efficiency at scale, cross-platform readiness, and proven performance with top scores on SWE-Bench (77.9), the gold standard for AI coding. With GPT-5.1-codex-max, developers can focus on creativity and problem-solving, while the model handles the heavy lifting. GPT-5.1-codex-max isn’t just powerful; it’s practical, designed to solve real challenges for enterprise developers: Multi-Agent Coding Workflows: Automate repetitive tasks across microservices, maintaining shared context for seamless collaboration. Enterprise App Modernization: Effortlessly refactor legacy .NET and Java applications into cloud-native architectures. Secure API Development: Generate and validate secure API endpoints, with `compliance checks built-in for peace of mind. Continuous Integration Support: Integrate GPT-5.1-codex-max into CI/CD pipelines for automated code reviews and test generation, accelerating delivery cycles. These use cases are just the beginning. GPT-5.1-codex-max is your partner in building robust, scalable, and secure solutions. Foundry: Platform Built for Developers Who Build the Future Foundry is more than a model catalog—it’s an enterprise AI platform designed for developers who need choice, reliability, and speed. • Choice Without Compromise: Access the widest range of models, including frontier models from leading model providers. • Enterprise-Grade Infrastructure: Built-in security, observability, and governance for responsible AI at scale. • Integrated Developer Experience: From GitHub to Visual Studio Code, Foundry connects with tools developers love for a frictionless build-to-deploy journey. Start Building Smarter with GPT-5.1-codex-max in Foundry The future is here, and it’s yours to shape. Supercharge your coding workflows with GPT-5.1-codex-max in Microsoft Foundry today. Learn more about Microsoft Foundry: aka.ms/IgniteFoundryModels. Watch Ignite sessions for deep dives and demos: ignite.microsoft.com. Build faster, smarter, and with confidence on the platform redefining enterprise AI.4KViews3likes5CommentsBeyond the Model: Empower your AI with Data Grounding and Model Training
Discover how Microsoft Foundry goes beyond foundational models to deliver enterprise-grade AI solutions. Learn how data grounding, model tuning, and agentic orchestration unlock faster time-to-value, improved accuracy, and scalable workflows across industries.443Views5likes3CommentsIntroducing OpenAI’s GPT-image-1.5 in Microsoft Foundry
Developers building with visual AI can often run into the same frustrations: images that drift from the prompt, inconsistent object placement, text that renders unpredictably, and editing workflows that break when iterating on a single asset. That’s why we are excited to announce OpenAI's GPT Image 1.5 is now generally available in Microsoft Foundry. This model can bring sharper image fidelity, stronger prompt alignment, and faster image generation that supports iterative workflows. Starting today, customers can request access to the model and start building in the Foundry platform. Meet GPT Image 1.5 AI driven image generation began with early models like OpenAI's DALL-E, which introduced the ability to transform text prompts into visuals. Since then, image generation models have been evolving to enhance multimodal AI across industries. GPT Image 1.5 represents continuous improvement in enterprise-grade image generation. Building on the success of GPT Image 1 and GPT Image 1 mini, these enhanced models introduce advanced capabilities that cater to both creative and operational needs. The new image models offer: Text-to-image: Stronger instruction following and highly precise editing. Image-to-image: Transform existing images to iteratively refine specific regions Improved visual fidelity: More detailed scenes and realistic rendering. Accelerated creation times: Up to 4x faster generation speed. Enterprise integration: Deploy and scale securely in Microsoft Foundry. GPT Image 1.5 delivers stronger image preservation and editing capabilities, maintaining critical details like facial likeness, lighting, composition, and color tone across iterative changes. You’ll see more consistent preservation of branded logos and key visuals, making it especially powerful for marketing, brand design, and ecommerce workflows—from graphics and logo creation to generating full product catalogs (variants, environments, and angles) from a single source image. Benchmarks Based on an internal Microsoft dataset, GPT Image 1.5 performs higher than other image generation models in prompt alignment and infographics tasks. It focuses on making clear, strong edits – performing best on single-turn modification, delivering the higher visual quality in both single and multi-turn settings. The following results were found across image generation and editing: Text to image Prompt alignment Diagram / Flowchart GPT Image 1.5 91.2% 96.9% GPT Image 1 87.3% 90.0% Qwen Image 83.9% 33.9% Nano Banana Pro 87.9% 95.3% Image editing Evaluation Aspect Modification Preservation Visual Quality Face Preservation Metrics BinaryEval SC (semantic) DINO (Visual) BinaryEval AuraFace Single-turn GPT image 1 99.2% 51.0% 0.14 79.5% 0.30 Qwen image 81.9% 63.9% 0.44 76.0% 0.85 GPT Image 1.5 100% 56.77% 0.14 89.96% 0.39 Multi-turn GPT Image 1 93.5% 54.7% 0.10 82.8% 0.24 Qwen image 77.3% 68.2% 0.43 77.6% 0.63 GPT image 1.5 92.49% 60.55% 0.15 89.46% 0.28 Using GPT Image 1.5 across industries Whether you’re creating immersive visuals for campaigns, accelerating UI and product design, or producing assets for interactive learning GPT Image 1.5 gives modern enterprises the flexibility and scalability they need. Image models can allow teams to drive deeper engagement through compelling visuals, speed up design cycles for apps, websites, and marketing initiatives, and support inclusivity by generating accessible, high‑quality content for diverse audiences. Watch how Foundry enables developers to iterate with multimodal AI across Black Forest Labs, OpenAI, and more: Microsoft Foundry empowers organizations to deploy these capabilities at scale, integrating image generation seamlessly into enterprise workflows. Explore the use of AI image generation here across industries like: Retail: Generate product imagery for catalogs, e-commerce listings, and personalized shopping experiences. Marketing: Create campaign visuals and social media graphics. Education: Develop interactive learning materials or visual aids. Entertainment: Edit storyboards, character designs, and dynamic scenes for films and games. UI/UX: Accelerate design workflows for apps and websites. Microsoft Foundry provides security and compliance with built-in content safety filters, role-based access, network isolation, and Azure Monitor logging. Integrated governance via Azure Policy, Purview, and Sentinel gives teams real-time visibility and control, so privacy and safety are embedded in every deployment. Learn more about responsible AI at Microsoft. Pricing Model Pricing (per 1M tokens) - Global GPT-image-1.5 Input Tokens: $8 Cached Input Tokens: $2 Output Tokens: $32 Cost efficiency improves as well: image inputs and outputs are now cheaper compared to GPT Image 1, enabling organizations to generate and iterate on more creative assets within the same budget. For detailed pricing, refer here. Getting started Learn more about image generation, explore code samples, and read about responsible AI protections here. Try GPT Image 1.5 in Microsoft Foundry and start building multimodal experiences today. Whether you’re designing educational materials, crafting visual narratives, or accelerating UI workflows, these models deliver the flexibility and performance your organization needs.4.7KViews1like1CommentFine-tuning at Ignite 2025: new models, new tools, new experience
Fine‑tuning isn’t just “better prompts.” It’s how you tailor a foundation model to your domain and tasks to get higher accuracy, lower cost, and faster responses -- then run it at scale. As Agents become more critical to businesses, we’re seeing growing demand for fine tuning to ensure agents are low latency, low cost, and call the right tools and the right time. At Ignite 2025, we saw how Docusign fine-tuned models that powered their document management system to achieve major gains: more than 50% cost reduction per document, 2x faster inference time, and significant improvements in accuracy. At Ignite, we launched several new features in Microsoft Foundry that make fine‑tuning easier, more scalable, and more impactful than ever with the goal of making agents unstoppable in the real world: New Open-Source models – Qwen3 32B, Ministral 3B, GPT-OSS-20B and Llama 3.3 70B – to give users access to Open-Source models in the same low friction experience as OpenAI Synthetic data generation to jump start your training journey – just upload your documents and our multi-agent system takes care of the rest Developer Training tier to reduce the barrier to entry by offering discounted training (50% off global!) on spot capacity Agentic Reinforcement Fine-tuning with GPT-5: leverage tool calling during chain of thought to teach reasoning models to use your tools to solve complex problems And if that wasn’t enough, we also released a re-imagined fine tuning experience in Foundry (new), providing access to all these capabilities in a simplified and unified UI. New Open-Source Models for Fine-tuning (Public Preview): Bringing open-source innovation to your fingertips We’ve expanded our model lineup to new open-source models you can fine-tune without worrying about GPUs or compute. Ministral-3B and Qwen3 32B are now available to fine-tune with Supervised Fine-Tuning (SFT) in Microsoft Foundry, enabling developers to adapt open-source models to their enterprise-specific domains with ease. Look out for Llama 3.3 70B and GPT-OSS-20B, coming next week! These OSS models are offered through a unified interface with OpenAI via the UI or Foundry SDK which means the same experience, regardless of model choice. These models can be used alongside your favorite Foundry tools, from AI Search to Evaluations, or to power your agents. Note: New OSS models are only available in "New" Foundry – so upgrade today! Like our OpenAI models, Open-Source models in Foundry charge per-token for training, making it simple to forecast and estimate your costs. All models are available on Global Standard tier, making discoverability easy. For more details on pricing, please see our Microsoft Foundry Models pricing page. Customers like Co-Star Group have already seen success leveraging fine tuning with Mistral models to power their home search experience on Homes.com. They selected Ministral-3B as a small, efficient model to power high volume, low latency processing with lower costs and faster deployment times than Frontier models – while still meeting their needs for accuracy, scalability, and availability thanks to fine tuning in Foundry. Synthetic data generation (Public Preview): Create high-quality training data automatically Developers can now generate high-quality, domain-specific synthetic datasets to close those persistent data gaps with synthetic data generation. One of the biggest challenges we hear teams face during fine-tuning is not having enough data or the right kind of data because it’s scarce, sensitive, or locked behind compliance constraints (think healthcare and finance). Our new synthetic data generation capability solves this by giving you a safe, scalable way to create realistic, diverse datasets tailored to your use case so you can fine-tune and evaluate models without waiting for perfect real-world data. Now, you can produce realistic question–answer pairs from your documents, or simulate multi‑turn tool‑use dialogues that include function calls without touching sensitive production data. How it works: Fine‑tuning datasets: Upload a reference file (PDF/Markdown/TXT) and Foundry converts it into SFT‑formatted Q&A pairs that reflect your domain’s language and nuances so your model learns from the right examples. Agent tool‑use datasets: Provide an OpenAPI (Swagger) spec, and Foundry simulates multi‑turn assistant–user conversations with tool calls, producing SFT‑ready examples that teach models to call your APIs reliably. Evaluation datasets: Generate distinct test queries tailored to your scenarios so you can measure model and agent quality objectively—separate from your training data to avoid false confidence. Agents succeed when they reliably understand domain intent and call the right tools at the right time. Foundry’s synthetic data generation does exactly that: it creates task‑specific training and test data so your agent learns from the right examples and you can prove it works before you go live so they are reliable in the real world. Developer Training Tier (Public Preview): 50% discount on training jobs Fine-tuning can be expensive, especially when you may need to run multiple experiments to create the right model for your production agents. To make it easier than ever to get started, we’re introducing Developer Training tier – providing users with a 50% discount when they choose to run workloads on pre-emptible capacity. It also lets users iterate faster: we support up to 10 concurrent jobs on Developer tier, making it ideal for running experiments in parallel. Because it uses reclaimable capacity, jobs may be pre‑empted and automatically resumed, so they may take longer to complete. When to use Developer Training tier: When cost matters - great for early experimentation or hyperparameter tuning thanks to 50% lower training cost. When you need high concurrency - supports up to 10 simultaneous jobs, ideal for running multiple experiments in parallel. When the workload is non‑urgent - suitable for jobs that can tolerate pre-emption and longer, capacity-dependent runtimes. Agentic Reinforcement Fine‑Tuning (RFT) (Private Preview): Train reasoning models to use your tools through outcome based optimization Building reliable AI agents requires more than copying correct behavior; models need to learn which reasoning paths lead to successful outcomes. While supervised fine-tuning trains models to imitate demonstrations, reinforcement fine-tuning optimizes models based on whether their chain of thought actually generates a successful outcome. It teaches them to think in new ways, about new domains – to solve complex problems. Agentic RFT applies this to tool-using workflows: the model generates multiple reasoning traces (including tool calls and planning steps), receives feedback on which attempts solved the problem correctly, and updates its reasoning patterns accordingly. This helps models learn effective strategies for tool sequencing, error recovery, and multi-step planning—behaviors that are difficult to capture through demonstrations alone. The difference now is that you can provide your own custom tools for use during chain of thought: models can interact with your own internal systems, retrieve the data they need, and access your proprietary APIs to solve your unique problems. Agentic RFT is currently available in private preview for o4-mini and GPT-5, with configurable reasoning effort, sampling rates, and per-run telemetry. Request access at aka.ms/agentic-rft-preview. What are customers saying? Fine-tuning is critical to achieve the accuracy and latency needed for enterprise agentic workloads. Decagon is used by many of the world’s most respected enterprises to build, manage and scale AI agents that can resolve millions of customer inquiries across chat, email, and voice – 24 hours a day, seven days a week. This experience is powered by fine-tuning: “Providing accurate responses with minimal latency is fundamental to Decagon’s product experience. We saw an opportunity to reduce latency while improving task-specific accuracy by fine-tuning models using our proprietary datasets. Via fine-tuning, we were able to exceed the performance of larger state of the art models with smaller, lighter-weight models which could be served significantly faster.” -- Cyrus Asgari, Lead Research Engineer for fine-tuning at Decagon But it’s not just agent-first startups seeing results. Companies like Discover Bank are using fine tuned models to provide better customer experiences with personal banking agents: We consolidated three steps into one, response times that were previously five or six seconds came down to one and a half to two seconds on average. This approach made the system more efficient and the 50% reduction in latency made conversations with Discovery AI feel seamless. - Stuart Emslie, Head of Actuarial and Data Science at Discovery Bank Fine-tuning has evolved from an optimization technique to essential infrastructure for production AI. Whether building specialized agents or enhancing existing products, the pattern is clear: custom-trained models deliver the accuracy and speed that general-purpose models can't match. As techniques like Agentic RFT and synthetic data generation mature, the question isn't whether to fine-tune, but how to build the systems to do it systematically. Learn More 🧠 Get Started with fine-tuning with Azure AI Foundry on Microsoft Learn Docs ▶️ Watch On-Demand: https://ignite.microsoft.com/en-US/sessions/BRK188?source=sessions 👩 Try the demos: aka.ms/FT-ignite-demos 👋 Continue the conversation on Discord592Views0likes0CommentsIntroducing Cohere Rerank 4.0 in Microsoft Foundry
These new retrieval models deliver state-of-the-art accuracy, multilingual coverage across 100+ languages, and breakthrough performance for enterprise search and retrieval-augmented generation (RAG) systems. With Rerank 4.0, customers can dramatically improve the quality of search, reduce hallucinations in RAG applications, and strengthen the reasoning capabilities of their AI agents, all with just a few lines of code. Why Rerank Models Matter for Enterprise AI Retrieval is the foundation of grounded AI systems. Whether you are building an internal assistant, a customer-facing chatbot, or a domain-specific knowledge engine, the quality of the retrieved documents determines the quality of the final answer. Traditional embeddings get you close, but reranking is what gets you the right answer. Rerank improves this step by reading both the query and document together (cross-encoding), producing highly precise semantic relevance scores. This means: More accurate search results More grounded responses in RAG pipelines Lower generative model usage , reducing cost Higher trust and quality across enterprise workloads Introducing Cohere Rerank 4.0 Fast and Rerank 4.0 Pro Microsoft Foundry now offers two versions of Rerank 4.0 to meet different enterprise needs: Rerank 4.0 Fast Best balance of speed and accuracy Same latency as Cohere Rerank 3.5, with significantly higher accuracy Ideal for high-traffic applications and real-time systems Rerank 4.0 Pro Highest accuracy across all benchmarks Excels at complex, reasoning-heavy, domain-specific retrieval Tuned for industries like finance, healthcare, manufacturing, government, and energy Multilingual & Cross-Domain Performance Rerank 4.0 delivers unmatched multilingual and cross-domain performance, supporting more than 100 languages and enabling powerful cross-lingual search across complex enterprise datasets. The models achieve state-of-the-art accuracy in 10 of the world’s most important business languages, including Arabic, Chinese, French, German, Hindi, Japanese, Korean, Portuguese, Russian, and Spanish, making them exceptionally well suited for global organizations with multilingual knowledge bases, compliance archives, or international operations. Effortless Integration: Add Rerank to Any System One of the biggest benefits of Rerank 4.0 is how easy it is to adopt. You can add reranking to: Existing enterprise search Vector DB pipelines Keyword search systems Hybrid retrieval setups RAG architectures Agent workflows No infrastructure changes required. Just a few lines of code.This makes it one of the fastest ways to meaningfully upgrade grounding, precision, and search quality in enterprise AI systems. Better RAG, Better Agents, Better Outcomes In Foundry, customers can pair Cohere Rerank 4.0 with Azure Search, vector databases, Agent Service, Azure Functions, Foundry orchestration, and any LLM—including GPT-4.1, Claude, DeepSeek, and Mistral—to deliver more grounded copilots, higher-fidelity agent actions, and better reasoning from cleaner context windows. This reduces hallucinations, lowers LLM spend, and provides a foundational upgrade for mission-critical AI systems. Built for Enterprise: Security, Observability, Governance As a direct from Azure model, Rerank 4.0 is fully integrated with: Azure role-based access control (RBAC) Virtual network isolation Customer-managed keys Logging & observability Entra ID authentication Private deployments You can run Rerank 4.0 in environments that meet the strictest enterprise security and compliance needs. Optimized for Enterprise Models & High-Value Industries Rerank 4.0 is built for sectors where accuracy matters: Finance - Delivers precise retrieval for complex disclosures, compliance documents, and regulatory filings. Healthcare- Accurately retrieves clinical notes, biomedical literature, and care protocols for safer, more reliable insights. Manufacturing- Surfaces the right engineering specs, manuals, and parts data to streamline operations and reduce downtime. Government & Public Sector - Improves access to policy documents, case archives, and citizen service information with semantic precision. Energy- Understands industrial logs, safety manuals, and technical standards to support safer and more efficient operations. Pricing Model Name Deployment Type Azure Resource Region Price /1K Search Units Availability Cohere Rerank 4.0 Pro Global Standard All regions (Check this page for region details) $2.50 Public Preview, Dec 11, 2025 Cohere Rerank 4.0 Fast Global Standard All regions (Check this page for region details) $2.00 Public Preview, Dec 11, 2025 Get Started Today Cohere Rerank 4.0 Fast and Rerank 4.0 Pro are now available in Microsoft Foundry. Rerank 4.0 is one of the simplest and highest impact upgrades you can make to your enterprise AI stack, bringing better retrieval, better agents, and more trustworthy AI to every application.2.7KViews2likes0CommentsUnlocking Efficient and Secure AI for Android with Foundry Local
The ability to run advanced AI models directly on smartphones is transforming the mobile landscape. Foundry Local for Android simplifies the integration of generative AI models, allowing teams to deliver sophisticated, secure, and low-latency AI experiences natively on mobile devices. This post highlights Foundry Local for Android as a compelling solution for Android developers, helping them efficiently build and deploy powerful on-device AI capabilities within their applications. The Challenges of Deploying AI on Mobile Devices On-device AI offers the promise of offline capabilities, enhanced privacy, and low-latency processing. However, implementing these capabilities on mobile devices introduces several technical obstacles: Limited computing and storage: Mobile devices operate with constrained processing power and storage compared to traditional PCs. Even the most compact language models can occupy significant space and demand substantial computational resources. Efficient solutions for model and runtime optimization are critical for successful deployment. Concerns about the app size: Integrating large AI models and libraries can dramatically increase application size, reducing install rates and degrading other app features. It remains a challenge to provide advanced AI capabilities while keeping the application compact and efficient. Complexity of development and integration: Most mobile development teams are not specialized in machine learning. The process of adapting, optimizing, and deploying models for mobile inference can be resource intensive. Streamlined APIs and pre-optimized models simplify integration and accelerate time to market. Introducing Foundry Local for Android Foundry Local is designed as a comprehensive on-device AI solution, featuring pre-optimized models, a cross-platform inference engine, and intuitive APIs for seamless integration. Initially announced at //Build 2025 with support for Windows and MacOS desktops, Foundry Local now extends its capabilities to Android in private preview. You can sign up for the private preview https://aka.ms/foundrylocal-androidprp for early evaluation and feedback. To meet the demands of production deployments, Foundry Local for Android is architected as a dedicated Android app paired with an SDK. The app manages model distribution, hosts the AI runtime, and operates as a specialized background service. Client applications interface with this service using a lightweight Foundry Local Android SDK, ensuring minimal overhead and streamlined connectivity. One Model, Multiple Apps: Foundry Local centralizes model management, ensuring that if multiple applications utilize the same model in Foundry Local, it is downloaded and stored only once. This approach optimizes storage and streamlines resource usage. Minimal App Footprint: Client applications are freed from embedding bulky machine learning libraries and models. This avoids ballooning app size and memory usage. Run Separately from Client Apps: The Foundry Local operates independently of client applications. Developers benefit from continuous enhancements without the need for frequent app releases. Customer Story: PhonePe PhonePe, one of India's largest consumer payments platforms that enables access to payments and financial services to hundreds of millions of people across the country. With Foundry Local, PhonePe is enabling AI that allows their users to gain deeper insights into their transactions and payments behavior directly on their mobile device. And because inferencing happens locally, all data stays private and secure. This collaboration addresses PhonePe's key priority of delivering an AI experience that upholds privacy. Foundry Local enables PhonePe to differentiate their app experience in a competitive market using AI while ensuring compliance with privacy commitments. Explore their journey here: PhonePe Product Showcase at Microsoft Ignite 2025 Call to Action Foundry Local equips Android apps with on-device AI, supporting the development of smarter applications for the future. Developers are able to build efficient and secure AI capabilities into their apps, even without extensive expertise in artificial intelligence. See more about Foundry Local in action in this episode of Microsoft Mechanics: https://aka.ms/FL_IGNITE_MSMechanics We look forward to seeing you light up AI capabilities in your Android app with Foundry Local. Don’t miss our private preview: https://aka.ms/foundrylocal-androidprp. We appreciate your feedback, as it will help us make our product better. Thanks to the contribution from NimbleEdge which delivers real-time, on-device personalization for millions of mobile devices. NimbleEdge's mobile technology expertise helps Foundry Local deliver a better experience for Android users.304Views0likes0CommentsPantone’s Palette Generator enhances creative exploration with agentic AI on Azure
Color can be powerful. When creative professionals shape the mood and direction of their work, color plays a vital role because it provides context and cues for the end product or creation. For more than 60 years, creatives from all areas of design—including fashion, product, and digital—have turned to Pantone color guides to translate inspiration into precise, reproducible color choices. These guides offer a shared language for colors, as well as inspiration and communication across industries. Once rooted in physical tools, Pantone has evolved to meet the needs of modern creators through its trend forecasting, consulting services, and digital platform. Today, Pantone Connect and its multi-agent solution called the Pantone Palette Generator seamlessly bring color inspiration and accuracy into everyday design workflows (as well as the New York City mayoral race). Simply by typing in a prompt, designers can generate palettes in seconds. Available in Pantone Connect, the tool uses Azure services like Microsoft Foundry, Azure AI Search, and Azure Cosmos DB to serve up the company’s vast collection of trend and color research from the color experts at the Pantone Color Institute. reached in seconds instead of days. Now, with Microsoft Foundry, creatives can use agents to get instant color palettes and suggestions based on human insights and trend direction.” Turning Pantone’s color legacy into an AI offering The Palette Generator accelerates the process of researching colors and helps designers find inspiration or validate some of their ideas through trend-backed research. “Pantone wants to be where our customers are,” says Rohani Jotshi, Director of Software Engineering and Data at Pantone. “As workflows become increasingly digital, we wanted to give our customers a way to find inspiration while keeping the same level of accuracy and trust they expect from Pantone.” The Palette Generator taps into thousands of articles from Pantone’s Color Insider library, as well as trend guides and physical color books in a way that preserves the company’s color standards science while streamlining the creative process. Built entirely on Microsoft Foundry, the solution uses Azure AI Search for agentic retrieval-augmented generation (RAG) and Azure OpenAI in Foundry Models to reason over the data. It quickly serves up palette options in response to questions like “Show me soft pastels for an eco-friendly line of baby clothes” or “I want to see vibrant metallics for next spring.” Over the course of two months, the Pantone team built the initial proof of concept for the Palette Generator, using GitHub Copilot to streamline the process and save over 200 hours of work across multiple sprints. This allowed Pantone’s engineers to focus on improving prompt engineering, adding new agent capabilities, and refining orchestration logic rather than writing repetitive code. Building a multi-agent architecture that accelerates creativity The Pantone team worked with Microsoft to develop the multi-agent architecture, which is made up of three connected agents. Using Microsoft Agent Framework—an open source development kit for building AI orchestration systems—it was a straightforward process to bring the agents together into one workflow. “The Microsoft team recommended Microsoft Agent Framework and when we tried it, we saw how it was extremely fast and easy to create architectural patterns,” says Kristijan Risteski, Solutions Architect at Pantone. “With Microsoft Agent Framework, we can spin up a model in five lines of code to connect our agents.” When a user types in a question, they interact with an orchestrator agent that routes prompts and coordinates the more specialized agents. Behind the scenes an additional agent retrieves contextually relevant insights from Pantone’s proprietary Color Insider dataset. Using Azure AI Search with vectorized data indexing, this agent interprets the semantics of a user’s query rather than relying solely on keywords. A third agent then applies rules from color science to assemble a balanced palette. This agent ensures the output is a color combination that meets harmony, contrast, and accessibility standards. The result is a set of Pantone-curated colors that match the emotional and aesthetic tone of the request. “All of this happens in seconds,” says Risteski. To manage conversation flow and achieve long-term data persistence, Pantone uses Azure Cosmos DB, which stores user sessions, prompts, and results. The database not only enables designers to revisit past palette explorations but also provides Pantone with valuable usage intelligence to refine the system over time. “We use Azure Cosmos DB to track inputs and outputs,” says Risteski. “That data helps us fine-tune prompts, measure engagement, and plan how we’ll train future models.” Improving accuracy and performance with Azure AI Search With Azure AI Search, the Palette Generator can understand the nuance of color language. Instead of relying solely on keyword searches that might miss the complexity of words like “vibrant” or “muted,” Pantone’s team decided to use a vectorized index for more accurate palette results. Using the built-in vectorization capability of Azure AI Search, the team converted their color knowledge base—including text-based color psychology and trend articles—into numerical embeddings. “Overall, vector search gave us better results because it could understand the intent of the prompt, not just the words,“ says Risteski. “If someone types, ‘Show me colors that feel serene and oceanic,’ the system understands intent. It finds the right references across our color psychology and trend archives and delivers them instantly.” The team also found ways to reduce latency as they evolved their proof of concept. Initially, they encountered slow inference times and performance lags when retrieving search results. By switching from GPT-4.1 to GPT-5, latency improved. And using Azure AI Search to manage ranking and filtering results helped reduce the number of calls to the large language model (LLM). “With Azure, we just get the articles, put them in a bucket, and say ‘index it now,’ says Risteski. “It takes one or two minutes—and that’s it. The results are so much better than traditional search.” Moving from inspiration to palettes faster The Palette Generator has transformed how designers and color enthusiasts interact with Pantone’s expertise. What once took weeks of research and review can now be done in seconds. “Typically, if someone wanted to develop a palette for a product launch, it might take many months of research,” says Jotshi. “Now, they can type one sentence to describe their inspiration then immediately find Pantone-backed insight and options. Human curation will still be hugely important, but a strong set of starting options can significantly accelerate the palette development process.” Expanding the palette: The next phase for Pantone’s design agent Rapidly launching the Palette Generator in beta has redefined what the Pantone engineering team thought was possible. “We’re a small development team, but with Azure we built an enterprise-grade AI system in a matter of weeks,” says Risteski. “That’s a huge win for us.” Next up, the team plans to migrate the entire orchestration layer to Azure Functions, moving to a fully scalable, serverless deployment. This will allow Pantone to run its agents more efficiently, handle variable workloads automatically, and integrate seamlessly with other Azure products such as Microsoft Foundry and Azure Cosmos DB. At the same time, Pantone plans to expand its multi-agent system to include new specialized agents, including one focused on palette harmony and another focused on trend prediction.515Views1like0Comments