model inference
10 TopicsHow Do We Know AI Isn’t Lying? The Art of Evaluating LLMs in RAG Systems
🔍 1. Why Evaluating LLM Responses is Hard In classical programming, correctness is binary. Input Expected Result 2 + 2 4 ✔ Correct 2 + 2 5 ✘ Wrong Software is deterministic — same input → same output. LLMs are probabilistic. They generate one of many valid word combinations, like forming sentences from multiple possible synonyms and sentence structures. Example: Prompt: "Explain gravity like I'm 10" Possible responses: Response A Response B Gravity is a force that pulls everything to Earth. Gravity bends space-time causing objects to attract. Both are correct. Which is better? Depends on audience. So evaluation needs to look beyond text similarity. We must check: ✔ Is the answer meaningful? ✔ Is it correct? ✔ Is it easy to understand? ✔ Does it follow prompt intent? Testing LLMs is like grading essays — not checking numeric outputs. 🧠 2. Why RAG Evaluation is Even Harder RAG introduces an additional layer — retrieval. The model no longer answers from memory; it must first read context, then summarise it. Evaluation now has multi-dimensions: Evaluation Layer What we must verify Retrieval Did we fetch the right documents? Understanding Did the model interpret context correctly? Grounding Is the answer based on retrieved data? Generation Quality Is final response complete & clear? A simple story makes this intuitive: Teacher asks student to explain Photosynthesis. Student goes to library → selects a book → reads → writes explanation. We must evaluate: Did they pick the right book? → Retrieval Did they understand the topic? → Reasoning Did they copy facts correctly without inventing? → Faithfulness Is written explanation clear enough for another child to learn from? → Answer Quality One failure → total failure. 🧩 3. Two Types of Evaluation 🔹 Intrinsic Evaluation — Quality of the Response Itself Here we judge the answer, ignoring real-world impact. We check: ✔ Grammar & coherence ✔ Completeness of explanation ✔ No hallucination ✔ Logic flow & clarity ✔ Semantic correctness This is similar to checking how well the essay is written. Even if the result did not solve the real problem, the answer could still look good — that’s why intrinsic alone is not enough. 🔹 Extrinsic Evaluation — Did It Achieve the Goal? This measures task success. If a customer support bot writes a beautifully worded paragraph, but the user still doesn’t get their refund — it failed extrinsically. Examples: System Type Extrinsic Goal Banking RAG Bot Did user get correct KYC procedure? Medical RAG Was advice safe & factual? Legal search assistant Did it return the right section of the law? Technical summariser Did summary capture key meaning? Intrinsic = writing quality. Extrinsic = impact quality. A production-grade RAG system must satisfy both. 📏 4. Core RAG Evaluation Metrics (Explained with Very Simple Analogies) Metric Meaning Analogy Relevance Does answer match question? Ask who invented C++? → model talks about Java ❌ Faithfulness No invented facts Book says started 2004, response says 1990 ❌ Groundedness Answer traceable to sources Claims facts that don’t exist in context ❌ Completeness Covers all parts of question User asks Windows vs Linux → only explains Windows Context Recall / Precision Correct docs retrieved & used Student opens wrong chapter Hallucination Rate Degree of made-up info “Taj Mahal is in London” 😱 Semantic Similarity Meaning-level match “Engine died” = “Car stopped running” 💡 Good evaluation doesn’t check exact wording. It checks meaning + truth + usefulness. 🛠 5. Tools for RAG Evaluation 🔹 1. RAGAS — Foundation for RAG Scoring RAGAS evaluates responses based on: ✔ Faithfulness ✔ Relevance ✔ Context recall ✔ Answer similarity Think of RAGAS as a teacher grading with a rubric. It reads both answer + source documents, then scores based on truthfulness & alignment. 🔹 2. LangChain Evaluators LangChain offers multiple evaluation types: Type What it checks String or regex Basic keyword presence Embedding based Meaning similarity, not text match LLM-as-a-Judge AI evaluates AI (deep reasoning) LangChain = testing toolbox RAGAS = grading framework Together they form a complete QA ecosystem. 🔹 3. PyTest + CI for Automated LLM Testing Instead of manually validating outputs, we automate: Feed preset questions to RAG Capture answers Run RAGAS/LangChain scoring Fail test if hallucination > threshold This brings AI closer to software-engineering discipline. RAG systems stop being experiments — they become testable, trackable, production-grade products. 🚀 6. The Future: LLM-as-a-Judge The future of evaluation is simple: LLMs will evaluate other LLMs. One model writes an answer. Another model checks: ✔ Was it truthful? ✔ Was it relevant? ✔ Did it follow context? This enables: Benefit Why it matters Scalable evaluation No humans needed for every query Continuous improvement Model learns from mistakes Real-time scoring Detect errors before user sees them This is like autopilot for AI systems — not only navigating, but self-correcting mid-flight. And that is where enterprise AI is headed. 🎯 Final Summary Evaluating LLM responses is not checking if strings match. It is checking if the machine: ✔ Understood the question ✔ Retrieved relevant knowledge ✔ Avoided hallucination ✔ Provided complete, meaningful reasoning ✔ Grounded answer in real source text RAG evaluation demands multi-layer validation — retrieval, reasoning, grounding, semantics, safety. Frameworks like RAGAS + LangChain evaluators + PyTest pipelines are shaping the discipline of measurable, reliable AI — pushing LLM-powered RAG from cool demo → trustworthy enterprise intelligence. Useful Resources What is Retrieval-Augmented Generation (RAG) : https://azure.microsoft.com/en-in/resources/cloud-computing-dictionary/what-is-retrieval-augmented-generation-rag/ Retrieval-Augmented Generation concepts (Azure AI) : https://learn.microsoft.com/en-us/azure/ai-services/content-understanding/concepts/retrieval-augmented-generation RAG with Azure AI Search – Overview : https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview Evaluate Generative AI Applications (Microsoft Learn – Learning Path) : https://learn.microsoft.com/en-us/training/paths/evaluate-generative-ai-apps/ Evaluate Generative AI Models in Microsoft Foundry Portal : https://learn.microsoft.com/en-us/training/modules/evaluate-models-azure-ai-studio/ RAG Evaluation Metrics (Relevance, Groundedness, Faithfulness) : https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/evaluation-evaluators/rag-evaluators RAGAS – Evaluation Framework for RAG Systems : https://docs.ragas.io/69Views0likes0CommentsAnnouncing Fireworks AI on Microsoft Foundry
We’re excited to announce that starting today, Microsoft Foundry customers can access high performance, low latency inference performance of popular open models hosted on the Fireworks cloud from their Foundry projects, and even deploy their own customized versions, too! As part of the Public Preview launch, we’re offering the most popular open models for serverless inference in both pay-per-token (US Data Zone) and provisioned throughput (Global Provisioned Managed) deployments. This includes: Minimax M2.5 🆕 OpenAI’s gpt-oss-120b MoonshotAI’s Kimi-K2.5 DeepSeek-v3.2 For customers that have been looking for a path to production with models they’ve post-trained, you can now import your own fine-tuned versions of popular open models and deploy them at production scale with Fireworks AI on Microsoft Foundry. Serverless (pay-per-token) For customers wanting per-token pricing, we’re launching with Data Zone Standard in the United States. You can make model deployments for Foundry resources in the following regions: East US East US 2 Central US North Central US West US West US 3 Depending on your Azure subscription type, you’ll automatically receive either a 250K or 25K tokens per minute (TPM) quota limit per region and model. (Azure Student and Trial subscriptions will not receive quota at this time.) Per-token pricing rates include input, cached input, and output tokens priced per million tokens. Model Input Tokens ($/1M tokens) Cached Tokens ($/1M tokens) Output Tokens ($/1M tokens) gpt-oss-120b $0.17 $0.09 $0.66 kimi-k2.5 $0.66 $0.11 $3.30 deepseek-v3.2 $0.62 $0.31 $1.85 minimax-m2.5 $0.33 $0.03 $1.32 As we work together with Fireworks to launch the latest OSS models, the supported models will evolve as popular research labs push the frontier! Provisioned Throughput For customers looking to shift or scale production workloads on these models, we’re launching with support for Global provisioned throughput. (Data Zone support will be coming soon!) Provisioned throughput for Fireworks models works just like it does for Foundry models: PTUs are designed to deliver consistent performance in terms of time between token latency. Your existing quota for Global PTUs works as does any reservation commitments! gpt-oss-120b Kimi-K2.5 DeepSeek-v3.2 MiniMax-M2.5 Global provisioned minimum deployment 80 800 1,200 400 Global provisioned scale increment 40 400 600 200 Input TPM per PTU 13,500 530 1,500 3,000 Latency Target Value 99% > 50 Tokens Per Second^ 99% > 50 Tokens Per Second^ 99% > 50 Tokens Per Second^ 99% > 50 Tokens Per Second^ ^ Calculated as p50 request latency on a per 5 minute basis. Custom Models Have you post-trained a model like gpt-oss-120b for your particular use case? With Fireworks on Foundry you can deploy, govern, and scale your custom models all within your Foundry project. This means full fine-tuned versions of models from the following families can be imported and deployed as part of preview: Qwen3-14B OpenAI gpt-oss-120b Kimi K2 and K2.5 DeepSeek v3.1 and v3.2 The new Custom Models page in the Models experience lets you initiate the import process for copying your model weights into your Foundry project. > Models -> Custom Models. For performing a high-speed transfer of the files into Foundry, we’ve added a new feature to Azure Developer CLI (azd) for facilitating the transfer of a directory of model weights. The Foundry UI will give you cli arguments to copy and paste for quickly running azd ai models create pointed to your Foundry project. Enabling Fireworks AI on Microsoft Foundry in your Subscription While in preview, customers must opt-in to integrate their Microsoft Foundry resources with the Fireworks inference cloud to perform model deployments and send inference requests. Opt-in is self-service and available in the Preview features panel within your Azure portal. For additional details on finding and enabling the preview feature, please see the new product documentation for Fireworks on Foundry. Frequently Asked Questions How are Fireworks AI on Microsoft Foundry models different than Foundry Models? Models provided direct from Azure include some open-source models as well as proprietary models from labs like Black Forest Labs, Cohere, and xAI, and others. These models undergo rigorous model safety and risks assessments based on Microsoft’s Responsible AI standard. For customers needing the latest open-source models from emerging frontier labs, break-neck speed, or the ability to deploy their own post-trained custom models, Fireworks delivers best-in-class inference performance. Whether you’re focused on minimizing latency or just staying ahead of the trends, Fireworks AI on Microsoft Foundry gives you additional choice in the model catalog. Still need to quantify model safety and risk? Foundry provides a suite of observability tools with built-in risk and safety evaluators, letting you build AI systems confidently. How is model retirement handled? Customers using serverless per-token offers of models via Fireworks on Foundry will receive notice no less than 30 days before potential model retirement. You’ll be recommended to upgrade to either an equivalent, longer-term supported Azure Direct model or a newer model provided by Fireworks. For customers looking to use models beyond the retirement period, they may do so via Provisioned throughput deployments. How can I get more quota? For TPM quota, you may submit requests via our current Fireworks on Foundry quota form. For PTU quota, please contact your Microsoft account team. Can you support my custom model? Let’s talk! In general, if your model meets Fireworks’ current requirements, we have you covered. You can either reach out to your Microsoft account team or your contacts you may already have with Fireworks.1.8KViews1like0CommentsIntroducing GPT-5.4 in Microsoft Foundry
Today, we’re thrilled to announce that OpenAI’s GPT‑5.4 is now generally available in Microsoft Foundry: a model designed to help organizations move from planning work to reliably completing it in production environments. As AI agents are applied to longer, more complex workflows; consistency and follow‑through become as important as raw intelligence. GPT‑5.4 combines stronger reasoning with built in computer use capabilities to support automation scenarios, and dependable execution across tools, files, and multi‑step workflows at scale. GPT-5.4: Enhanced Reliability in Production AI GPT-5.4 is designed for organizations operating AI in real production environments, where consistency, instruction adherence, and sustained context are critical to success. The model brings together advances in reasoning, coding, and agentic workflows to help AI systems not only plan tasks but complete them with fewer interruptions and reduced manual oversight. Compared with earlier generations, GPT-5.4 emphasizes stability across longer interactions, enabling teams to deploy agentic AI with greater confidence in day-to-day production use. GPT-5.4 introduces advancements that aim for production grade AI: More consistent reasoning over time, helping maintain intent across multi‑turn and multi‑step interactions Enhanced instruction alignment to reduce prompt tuning and oversight Latency improved performance for responsive, real-time workflows Integrated computer use capabilities for structured orchestration of tools, file access, data extraction, guarded code execution, and agent handoffs More dependable tool invocation reducing prompt tuning and human oversight Higher‑quality generated artifacts, including documents, spreadsheets, and presentations with more consistent structure Together, these improvements support AI systems that behave more predictably as tasks grow in length and complexity. From capability to real-world outcomes GPT‑5.4 delivers practical value across a wide range of production scenarios where follow‑through and reliability are essential: Agent‑driven workflows, such as customer support, research assistance, and business process automation Enterprise knowledge work, including document drafting, data analysis, and presentation‑ready outputs Developer workflows, spanning code generation, refactoring, debugging support, and UI scaffolding Extended reasoning tasks, where logical consistency must be preserved across longer interactions Teams benefit from reduced task drift, fewer mid‑workflow failures, and more predictable outcomes when deploying GPT‑5.4 in production. GPT-5.4 Pro: Deeper analysis for complex decision workflows GPT‑5.4 Pro, a premium variant designed for scenarios where analytical depth and completeness are prioritized over latency. Additional capabilities include: Multi‑path reasoning evaluation, allowing alternative approaches to be explored before selecting a final response Greater analytical depth, supporting problems with trade‑offs or multiple valid solutions Improved stability across long reasoning chains, especially in sustained analytical tasks Enhanced decision support, where rigor and thoroughness outweigh speed considerations Organizations typically select GPT‑5.4 Pro when deeper analysis is required such as scientific research and complex problems, while GPT‑5.4 remains the right choice for workloads that prioritize reliable execution and agentic follow‑through. Microsoft Foundry: Enterprise‑Grade Control from Day One GPT‑5.4 and GPT‑5.4 Pro are available through Microsoft Foundry, which provides the operational controls organizations need to deploy AI responsibly in production environments. Foundry supports policy enforcement, monitoring, version management, and auditability, helping teams manage AI systems throughout their lifecycle. By deploying GPT‑5.4 through Microsoft Foundry, organizations can integrate advanced agentic capabilities into existing environments while aligning with security, compliance, and operational requirements from day one. Customer Spotlight Get Started with GPT-5.4 in Microsoft Foundry GPT‑5.4 sets a new bar for production‑ready AI by combining stronger reasoning with dependable execution. Through enterprise‑grade deployment in Microsoft Foundry, organizations can move beyond experimentation and confidently build AI systems that complete complex work at scale. Computer use capabilities will be introduced shortly after launch. GPT‑5.4 <272K input tokens context length in Microsoft Foundry is priced at $2.50 per million input tokens, $0.25 per million cached input tokens, and $15.00 per million output tokens. The GPT‑5.4 >272K input tokens context length in Microsoft Foundry is priced at $5.00 per million input tokens, $0.50 per million cached input tokens, and $22.50 per million output tokens. The GPT-5.4 is available at launch in Standard Global and Standard Data Zone (US), with additional deployment options coming soon. GPT‑5.4 Pro is priced at $30.00 per million input tokens, and $180.00 per million output tokens, and is available at launch in Standard Global. Build agents for real-world workloads. Start building with GPT‑5.4 in Microsoft Foundry today.18KViews3likes2CommentsIntroducing GPT-5.3 Chat in Microsoft Foundry: A more grounded way to chat at enterprise scale
OpenAI’s GPT‑5.3 Chat marks the next step in the GPT‑5 series, designed to deliver more dependable, context‑aware chat experiences for enterprise workloads. The model emphasizes steadier instruction handling and clearer responses, supporting high‑volume, real‑world conversations with greater consistency. GPT‑5.3 Chat is now available via API in Microsoft Foundry, where teams will be able to deploy production‑ready chat and agent experiences that are standardized, governed, and built to scale across the enterprise. What’s new in GPT‑5.3 Chat GPT‑5.3 Chat centers on predictable behavior, relevance, and response quality, helping teams build chat experiences that operate reliably across end‑to‑end workflows while aligning with enterprise safety and compliance expectations. Fewer dead ends, more resolved conversations Reduces unnecessary refusals by responding more proportionately when safe context is available Supports compliant reformulation to keep interactions moving forward Enables end‑to‑end resolution in support, IT, and policy‑driven workflows Grounded answers you can operationalize Combines built‑in web search with model reasoning to surface relevant, actionable information Prioritizes relevance and context over long lists of loosely related results Keeps responses actionable while maintaining enterprise controls and traceability Consistent outputs at scale Improved tone, explanation quality, and instruction following Easier to template, govern, and monitor across apps Less downstream cleanup as usage scales Built for production in Microsoft Foundry Production‑grade infrastructure Observability, failover, quota management, and performance monitoring Designed for real workloads—not experiments Consistent behavior across regions and use cases without re‑architecting Smarter scaling with quota tiers Automatic quota increases with sustained usage Fewer rate‑limit interruptions as demand grows Flexible tiers from Free through Tier 6 Security and compliance by default Identity, access controls, policy enforcement, and data boundaries built in Meets regulated‑industry requirements out of the box Teams can move fast without compromising trust GPT-5.3 Chat in Microsoft Foundry is priced at $1.75 per million input tokens, $0.175 per million cached input tokens, and $14.00 per million output tokens. Ready to build with GPT‑5.3 Chat in Foundry? Start turning reliable conversations into real applications. Explore GPT-5.3 Chat in Microsoft Foundry and begin building production ready‑ chat and agent experiences today.6.4KViews1like1CommentUnlocking Efficient and Secure AI for Android with Foundry Local
The ability to run advanced AI models directly on smartphones is transforming the mobile landscape. Foundry Local for Android simplifies the integration of generative AI models, allowing teams to deliver sophisticated, secure, and low-latency AI experiences natively on mobile devices. This post highlights Foundry Local for Android as a compelling solution for Android developers, helping them efficiently build and deploy powerful on-device AI capabilities within their applications. The Challenges of Deploying AI on Mobile Devices On-device AI offers the promise of offline capabilities, enhanced privacy, and low-latency processing. However, implementing these capabilities on mobile devices introduces several technical obstacles: Limited computing and storage: Mobile devices operate with constrained processing power and storage compared to traditional PCs. Even the most compact language models can occupy significant space and demand substantial computational resources. Efficient solutions for model and runtime optimization are critical for successful deployment. Concerns about the app size: Integrating large AI models and libraries can dramatically increase application size, reducing install rates and degrading other app features. It remains a challenge to provide advanced AI capabilities while keeping the application compact and efficient. Complexity of development and integration: Most mobile development teams are not specialized in machine learning. The process of adapting, optimizing, and deploying models for mobile inference can be resource intensive. Streamlined APIs and pre-optimized models simplify integration and accelerate time to market. Introducing Foundry Local for Android Foundry Local is designed as a comprehensive on-device AI solution, featuring pre-optimized models, a cross-platform inference engine, and intuitive APIs for seamless integration. Initially announced at //Build 2025 with support for Windows and MacOS desktops, Foundry Local now extends its capabilities to Android in private preview. You can sign up for the private preview https://aka.ms/foundrylocal-androidprp for early evaluation and feedback. To meet the demands of production deployments, Foundry Local for Android is architected as a dedicated Android app paired with an SDK. The app manages model distribution, hosts the AI runtime, and operates as a specialized background service. Client applications interface with this service using a lightweight Foundry Local Android SDK, ensuring minimal overhead and streamlined connectivity. One Model, Multiple Apps: Foundry Local centralizes model management, ensuring that if multiple applications utilize the same model in Foundry Local, it is downloaded and stored only once. This approach optimizes storage and streamlines resource usage. Minimal App Footprint: Client applications are freed from embedding bulky machine learning libraries and models. This avoids ballooning app size and memory usage. Run Separately from Client Apps: The Foundry Local operates independently of client applications. Developers benefit from continuous enhancements without the need for frequent app releases. Customer Story: PhonePe PhonePe, one of India's largest consumer payments platforms that enables access to payments and financial services to hundreds of millions of people across the country. With Foundry Local, PhonePe is enabling AI that allows their users to gain deeper insights into their transactions and payments behavior directly on their mobile device. And because inferencing happens locally, all data stays private and secure. This collaboration addresses PhonePe's key priority of delivering an AI experience that upholds privacy. Foundry Local enables PhonePe to differentiate their app experience in a competitive market using AI while ensuring compliance with privacy commitments. Explore their journey here: PhonePe Product Showcase at Microsoft Ignite 2025 Call to Action Foundry Local equips Android apps with on-device AI, supporting the development of smarter applications for the future. Developers are able to build efficient and secure AI capabilities into their apps, even without extensive expertise in artificial intelligence. See more about Foundry Local in action in this episode of Microsoft Mechanics: https://aka.ms/FL_IGNITE_MSMechanics We look forward to seeing you light up AI capabilities in your Android app with Foundry Local. Don’t miss our private preview: https://aka.ms/foundrylocal-androidprp. We appreciate your feedback, as it will help us make our product better. Thanks to the contribution from NimbleEdge which delivers real-time, on-device personalization for millions of mobile devices. NimbleEdge's mobile technology expertise helps Foundry Local deliver a better experience for Android users.563Views0likes0CommentsFoundry Agent Service at Ignite 2025: Simple to Build. Powerful to Deploy. Trusted to Operate.
The upgraded Foundry Agent Service delivers a unified, simplified platform with managed hosting, built-in memory, tool catalogs, and seamless integration with Microsoft Agent Framework. Developers can now deploy agents faster and more securely, leveraging one-click publishing to Microsoft 365 and advanced governance features for streamlined enterprise AI operations.9.7KViews3likes1CommentRosettaFold3 Model at Ignite 2025: Extending Frontier of Biomolecular Modeling in Microsoft Foundry
Today at Microsoft Ignite 2025, we are excited to launch RosettaFold3 (RF3) on Microsoft Foundry - making a new generation of multi-molecular structure prediction models available to researchers, biotech innovators, and scientific teams worldwide. RF3 was developed by the Baker lab and DiMaio lab from the Institute for Protein Design (IPD) at the University of Washington, in collaboration with Microsoft’s AI for Good lab and other research partners. RF3 is now available in Foundry Models, offering scalable access to a new generation of biomolecular modeling capabilities. Try RF3 now in Foundry Models A new multi-molecular modeling system, now accessible in Foundry Models RF3 represents a leap forward in biomolecular structure prediction. Unlike previous generation models focused narrowly on proteins, RF3 can jointly model: Proteins (enzymes, antibodies, peptides) Nucleic acids (DNA, RNA) Small molecules/ligands Multi-chain complexes This unified modeling approach allows researchers to explore entire interaction systems—protein–ligand docking, protein–RNA assembly, protein–DNA binding, and more—in a single end-to-end workflow. Key advances in RF3 RF3 incorporates several advancements in protein and complex prediction, making it the state-of-the-art open-source model. Joint atom-level modeling across molecular types RF3 can simultaneously model all atom types across proteins, nucleic acids, and ligands—enabled by innovations in multimodal transformers and generative diffusion models. Unprecedented control: atom-level conditioning Users can provide the 3D structure of a ligand or compound, and RF3 will fold a protein around it. This atom-level conditioning unlocks: Targeted drug-design workflows Protein pocket and surface engineering Complex interaction modeling Example showing how RF3 allows conditioning on user inputs offering greater control of the model’s predictions. Broad templating support for structure-guided design RF3 allows users to guide structure prediction using: Distance constraints Geometric templates Experimental data (e.g., cryo-EM) This flexibility is limited in other models and makes RF3 ideal for hybrid computation–wet-lab workflows. Extensible foundation for scientific and industrial research RF3 can be adapted to diverse application areas—including enzyme engineering, materials science, agriculture, sustainability, and synthetic biology. Use cases RF3’s multimolecular modeling capabilities have broad applicability beyond fundamental biology. The model enables breakthroughs across medicine, materials science, sustainability, and defense—where structure-guided design directly translates into measurable innovation. Sector Illustrative Use Cases Medicine Gene therapy research: RF3 enables the design of custom proteins that bind specific DNA sequences for targeted genome repair. Materials Science Inspired by natural protein fibers such as wool and silk, IPD researchers are designing synthetic fibers with tunable mechanical properties and texture—enabling sustainable textiles and advanced materials. Sustainability RF3 supports enzyme design for plastic degradation and waste recycling, contributing to circular bioeconomy initiatives. Disease & Vaccine Development RF3-powered workflows will contribute to structure-guided vaccine design, building on IPD’s prior success with the SKYCovione COVID-19 nanoparticle vaccine developed with SK Bioscience and GSK. Crop Science and Food security Support for gene-editing technology (due to protein-DNA binding prediction capabilities) for agricultural research, design of small proteins called Anti-Microbial Peptides or Anti-Fungal peptides to fight crop diseases and tree diseases such as citrus greening. Defense & Biosecurity Enables detection and rapid countermeasure design against toxins or novel pathogens; models of this class are being studied for biosafety applications (Horvitz et al., Science, 2025). Aerospace & Extreme Environments Supports design of lightweight, self-healing, and radiation-resistant biomaterials capable of functioning under non-terrestrial conditions (e.g., high temperature, pressure, or radiation exposure). RF3 has the potential to lower the cost of exploratory modeling, raise success rates in structure-guided discovery, and expand biomolecular AI into domains that were previously limited by sparse experimental structures or difficult multimolecular interactions. Because the model and training framework are open and extensible, partners can also adapt RF3 for their own research, making it a foundation for the next generation of biomolecular AI on Microsoft Foundry. Get started today RosettaFold3 (RF3) brings advanced multimolecular modeling capabilities into Foundry Models, enabling researchers and biotech teams to run structure-guided workflows with greater flexibility and speed. Within Microsoft Foundry, you can integrate RF3 into your existing scientific processes—combining your data, templates, and downstream analysis tools in one connected environment. Start exploring the next frontier of biomolecular modeling with RosettaFold3 in the Foundry Models. You can also discover other early-stage AI innovations in Foundry Labs. If you’re attending Microsoft Ignite 2025, or watching on demand, be sure to check out our session: Session: AI Frontier in Foundry Labs: Experiment Today, Lead Tomorrow About the session: “Curious about the next wave of AI breakthroughs? Get a sneak peek into the future of AI with Azure AI Foundry Labs—your front door to experimental models, multi-agent orchestration prototypes, Agent Factory blueprints, and edge innovations. If you’re a researcher eager to test, validate, and influence what’s next in enterprise AI, this session is your launchpad. See how Labs lets you experiment fast, collaborate with innovators, and turn new ideas into real impact.”748Views0likes0CommentsThe Future of AI: "Wigit" for computational design and prototyping
Discover how AI is revolutionizing software prototyping. Learn how Wigit, an internal AI-powered tool created with Azure AI Foundry, enables anyone—from designers to product managers—to create live, interactive prototypes in minutes. This blog explores how AI democratizes tool creation, accelerates innovation, and transforms static workflows into dynamic, collaborative environments.2.1KViews0likes0CommentsBuilding AI Apps with the Foundry Local C# SDK
What Is Foundry Local? Foundry Local is a lightweight runtime designed to run AI models directly on user devices. It supports a wide range of hardware (CPU, GPU, NPU) and provides a consistent developer experience across platforms. The SDKs are available in multiple languages, including Python, JavaScript, Rust, and now C#. Why a C# SDK? The C# SDK brings Foundry Local into the heart of the .NET ecosystem. It allows developers to: Download and manage models locally. Run inference using OpenAI-compatible APIs. Integrate seamlessly with existing .NET applications. This means you can build intelligent apps that run offline, reduce latency, and maintain data privacy—all without sacrificing developer productivity. Bootstrap Process: How the SDK Gets You Started One of the most developer-friendly aspects of the C# SDK is its automatic bootstrap process. Here's what happens under the hood when you initialise the SDK: Service Discovery and Startup The SDK automatically locates the Foundry Local installation on the device and starts the inference service if it's not already running. Model Download and Caching If the specified model isn't already cached locally, the SDK will download the most performant model variant (e.g. GPU, CPU, NPU) for the end user's hardware from the Foundry model catalog. This ensures you're always working with the latest optimised version. Model Loading into Inference Service Once downloaded (or retrieved from cache), the model is loaded into the Foundry Local inference engine, ready to serve requests. This streamlined process means developers can go from zero to inference with just a few lines of code—no manual setup or configuration required. Leverage Your Existing AI Stack One of the most exciting aspects of the Foundry Local C# SDK is its compatibility with popular AI tools such as: OpenAI SDK - Foundry local provides an OpenAI compliant chat completions (and embedding) API meaning. If you’re already using `OpenAI` chat completions API, you can reuse your existing code with minimal changes. Semantic Kernel - Foundry Local also integrates well with Semantic Kernel, Microsoft’s open-source framework for building AI agents. You can use Foundry Local models as plugins or endpoints within Semantic Kernel workflows—enabling advanced capabilities like memory, planning, and tool calling. Quick Start Example Follow these three steps: 1. Create a new project Create a new C# project and navigate to it: dotnet new console -n hello-foundry-local cd hello-foundry-local 2. Install NuGet packages Install the following NuGet packages into your project: dotnet add package Microsoft.AI.Foundry.Local --version 0.1.0 dotnet add package OpenAI --version 2.2.0-beta.4 3. Use the OpenAI SDK with Foundry Local The following example demonstrates how to use the OpenAI SDK with Foundry Local. The code initializes the Foundry Local service, loads a model, and generates a response using the OpenAI SDK. Copy-and-paste the following code into a C# file named Program.cs: using Microsoft.AI.Foundry.Local; using OpenAI; using OpenAI.Chat; using System.ClientModel; using System.Diagnostics.Metrics; var alias = "phi-3.5-mini"; var manager = await FoundryLocalManager.StartModelAsync(aliasOrModelId: alias); var model = await manager.GetModelInfoAsync(aliasOrModelId: alias); ApiKeyCredential key = new ApiKeyCredential(manager.ApiKey); OpenAIClient client = new OpenAIClient(key, new OpenAIClientOptions { Endpoint = manager.Endpoint }); var chatClient = client.GetChatClient(model?.ModelId); var completionUpdates = chatClient.CompleteChatStreaming("Why is the sky blue'"); Console.Write($"[ASSISTANT]: "); foreach (var completionUpdate in completionUpdates) { if (completionUpdate.ContentUpdate.Count > 0) { Console.Write(completionUpdate.ContentUpdate[0].Text); } } Run the code using the following command: dotnet run Final thoughts The Foundry Local C# SDK empowers developers to build intelligent, privacy-preserving applications that run anywhere. Whether you're working on desktop, mobile, or embedded systems, this SDK offers a robust and flexible way to bring AI closer to your users. Ready to get started? Dive into the official documentation: Getting started guide C# Reference documentation You can also make contributions to the C# SDK by creating a PR on GitHub: Foundry Local on GitHub1.1KViews0likes0CommentsAzure AI Foundry Models: Futureproof Your GenAI Applications
Years of Rapid Growth and Innovation The Azure AI Foundry Models journey started with the launch of Models as a Service (MaaS) in partnership with Meta Llama at Ignite 2023. Since then, we’ve rapidly expanded our catalog and capabilities: 2023: General Availability of the model catalog and launch of MaaS 2024: 1800+ models available including Cohere, Mistral, Meta, G42, AI21, Nixtla and more, with 250+ OSS models deployed on managed compute 2025 (Build): 10000+ models, new models sold directly by Microsoft, more managed compute models and expanded partnerships, introduction of advanced tooling like Model Leaderboard, Model Router, MCP Server, and Image Playground GenAI Trends Reshaping the Model Landscape To stay ahead of the curve, Azure AI Foundry Models is designed to support the most important trends in GenAI: Emergence of Reasoning-Centric Models Proliferation of Agentic AI and Multi-agent systems Expansion of Open-Source Ecosystems Multimodal Intelligence Becoming Mainstream Rise of Small, Efficient Models (SLMs) These trends are shaping a future where enterprises need not just access to models—but smart tools to pick, combine, and deploy the best ones for each task. A Platform Built for Flexibility and Scale Azure AI Foundry is more than a catalog—it’s your end-to-end platform for building with AI. You can: Explore over 10000+ models, including foundation, industry, multimodal, and reasoning models along with agents. Deploy using flexible options like PayGo, Managed Compute, or Provisioned Throughput (PTU) Monitor and optimize performance with integrated observability and compliance tooling Whether you're prototyping or scaling globally, Foundry gives you the flexibility you need. Two Core Model Categories 1. Models Sold Directly by Microsoft These models are hosted and billed directly by Microsoft under Microsoft Product Terms. They offer: Enterprise-grade SLAs and reliability Deep Azure service integration Responsible AI standards Flexible usage of reserved quota by using Azure AI Foundry Provisioned Throughput (PTU) across direct models including OpenAI, Meta, Mistral, Grok, DeepSeek and Black Forest Labs. Reduce AI workload costs on predictable consumption patterns with Azure AI Foundry Provisioned Throughput reservations. Learn more here Coming to the family of direct models from Azure: Grok 3 / Grok 3 Mini (from xAI) Flux Pro 1.1 Ultra (from Black Forest Labs) Llama 4 Scout & Maverick (from Meta) Codestral 2501, OCR (from Mistral) 2. Models from Partners & Community These models come from the broader ecosystem, including open-source and monetized partners. They are deployed as Managed Compute or Standard PayGo, and include models from Cohere, Paige and Saifr. We also have new industry models joining this ecosystem of partner and community models NVIDIA NIMs: ProteinMPNN, RFDiffusion, OpenFold2, MSA Paige AI: Virchow 2G, Virchow 2G-mini Microsoft Research: EvoDiff, BioEmu-1 Expanded capabilities that make model choice simpler and faster Azure AI Foundry Models isn’t just about more models. We’re introducing tools to help developers intelligently navigate model complexity: 1. Model Leaderboard Easily compare model performance across real-world tasks with: Transparent benchmark scores Task-specific rankings (summarization, RAG, classification, etc.) Live updates as new models are evaluated Whether you want the highest accuracy, fastest throughput, or best price-performance ratio—the leaderboard guides your selection. 2. Model Router Don’t pick just one—let Azure do the heavy lifting. Automatically route queries to the best available model Optimize based on speed, cost, or quality Supports dynamic fallback and load balancing This capability is a game-changer for agents, copilots, and apps that need adaptive intelligence. 3. Image/Video Playground A new visual interface for: Testing image generation models side-by-side Tuning prompts and decoding settings Evaluating output quality interactively This is particularly useful for multimodal experimentation across marketing, design, and research use cases. 4. MCP Server Enables model-aware orchestration, especially for agentic workloads: Tool use integration Multi-model planning and reasoning Unified coordination across model APIs A Futureproof Foundation With Azure AI Foundry Models, you're not just selecting from a list of models—you’re stepping into a full-stack, flexible, and future-ready AI environment: Choose the best model for your needs Deploy on your terms—serverless, managed, or reserved Rely on enterprise-grade performance, security, and governance Stay ahead with integrated innovation from Microsoft and the broader ecosystem The AI future isn’t one-size-fits-all—and neither is Azure AI Foundry. Explore Today : Azure AI Foundry9KViews1like0Comments