model catalog
87 TopicsRAFT: A new way to teach LLMs to be better at RAG
In this article, we will look at the limitations of RAG and domain-specific Fine-tuning to adapt LLMs to existing knowledge and how a team of UC Berkeley researchers, Tianjun Zhang and Shishir G. Patil, may have just discovered a better approach.110KViews7likes5CommentsMeta’s next generation model, Llama 3.1 405B is now available on Azure AI
Microsoft, in collaboration with Meta, is launching Llama 3.1 405B, now available via Azure AI’s Models as a Service. Also introducing fine-tuned versions of Llama 3.1 8B and 70B. Leverage powerful AI for synthetic data generation and distillation. Access these models and more through Azure AI Studio and popular developer tools like prompt flow, OpenAI, LangChain, LiteLLM, and more. Streamline development and enhance efficiency with Azure AI.48KViews3likes7CommentsGPT‑5.1 in Foundry: A Workhorse for Reasoning, Coding, and Chat
The pace of AI innovation is accelerating, and developers—across startups and global enterprises—are at the heart of this transformation. Today marks a significant moment for enterprise AI innovation: Azure AI Foundry is unveiling OpenAI’s GPT-5.1 series, the next generation of reasoning, analytics, and conversational intelligence. The following models will be rolling out in Foundry today: GPT-5.1: adaptive, more efficient reasoning GPT-5.1-chat: chat with new chain-of-thought for end-users GPT-5.1-codex: optimized for long-running conversations with enhanced tools and agentic workflows GPT-5.1-codex-mini: a compact variant for resource-constrained environments What’s new with GPT-5.1 series The GPT-5.1 series is built to respond faster to users in a variety of situations with adaptive reasoning, improving latency and cost efficiency across the series by varying thinking time more significantly. This, combined with other tooling improvements, enhanced stepwise reasoning visibility, multimodal intelligence, and enterprise-grade compliance. GPT-5.1: Adaptive and Efficient Reasoning GPT-5.1 is the mainline model engineered to deliver adaptive, stepwise reasoning that adjusts its approach based on the complexity of each task. Core capabilities included: Adaptive reasoning for nuanced, context-aware thinking time Multimodal intelligence: supporting text, image, and audio inputs/outputs Enterprise-grade performance, security, and compliance This model’s flexibility empowers developers to tackle a wide spectrum of tasks—from simple queries to deep, multi-step workflows for enterprise-grade solutions. With its ability to intelligently balance speed, cost, and intelligence, GPT-5.1 sets a new standard for both performance and efficiency in AI-powered development. GPT-5.1-chat: Elevating Interactive Experiences with Smart, Safe Conversations GPT-5.1-chat powers fast, context-aware chat experiences with adaptive reasoning and robust safety guardrails. With chain-of-thought added in the chat for the first time, it brings an interactive experience to the next level. It’s tuned for safety and instruction-following, making it ideal for customer support, IT helpdesk, HR, and sales enablement. Multimodal chat (text, image, and audio) improves long-turn consistency for real problem solving, delivering brand-aligned, safe conversations, and supporting next-best-action recommendations. GPT-5.1-codex and GPT-5.1-codex-mini: Frontier Models for Agentic Coding GPT-5.1-codex builds on the foundation set by GPT-5-codex, advancing developer tooling with: Enhanced reasoning frameworks for stepwise, context-aware code analysis and generation; plus Enhanced tool handling for certain development scenario's Multimodal intelligence for richer developer experiences when coding With Foundry’s enterprise-grade security and governance, GPT-5.1-codex is ideal for automated code generation and review, accelerating development cycles with intelligent code suggestions, refactoring, and bug detection. GPT-5.1-codex-mini is a compact, efficient variant optimized for resource-constrained environments. It maintains near state-of-the-art performance, multimodal intelligence, and the same safety stack and tool access as GPT-5.1-codex, making it best for cost-effective, scalable solutions in education, startups, and cost-conscience settings. Together, these Codex models empower teams to innovate faster and with greater confidence. Selecting Your AI Engine: Match Model Strengths to Your Business Goals One of the advantages of the GPT-5.1 series is unified access to deep reasoning, adaptive chat, and advanced coding—all in one place. Here’s how to match model strengths to your needs: Opt for GPT-5.1 for general ai application use—tasks like analytics, research, legal/financial review, or consolidating large documents and codebases. It’s the model of choice for reliability and high-impact outputs. Go with GPT-5.1-chat for interactive assistants and product UX, especially when adaptive reasoning is required for complex cases. Reasoning hints and adaptive reasoning help with customer latency perception. Leverage GPT-5.1-codex for deep, stepwise reasoning in complex code generation, refactoring, or multi-step analysis—ideal for demanding agentic workflows and enterprise automation. Utilize GPT-5.1-codex-mini for efficient, cost-effective coding intelligence in broad-scale deployment, education, or resource-constrained environments—delivering near-mainline performance in a compact model. Deployment and Pricing Model Deployment Available Regions Pricing ($/million tokens) Input Cached Input Output GPT-5.1 Standard Global Global $1.25 $0.125 $10.00 Standard Data Zone Data Zone (US & EU) $1.38 $0.14 $11.00 GPT-5.1-chat Standard Global Global $1.25 $0.125 $10.00 GPT-5.1-codex Standard Global Global $1.25 $0.125 $10.00 GPT-5.1-codex-mini Standard Global Global $0.25 $0.025 $2.00 Start Building Today The GPT-5.1 series is now available in Foundry Models. Whether you’re building for enterprise, small and medium-sized business, or launching the next digital-native app, these models and the Foundry platform are designed to help you innovate faster, safer, and at scale.32KViews3likes22CommentsNew Hugging Face Models on Azure AI: Multilingual, SLM and BioMed- July 2024 Update
In May, we announced a deepened partnership with Hugging Face and we continue to add more leading-edge Hugging Face models to the Azure AI model catalog on a monthly basis. We have 1550+ models in the Hugging Face collection and we added 20+ models in July including highly-ranked multilingual models, tuned for Asian languages like Mandarin, Japanese, Indonesian, Thai, Malay, Vietnamese.28KViews2likes0CommentsIntroducing GPT-5.4 in Microsoft Foundry
Today, we’re thrilled to announce that OpenAI’s GPT‑5.4 is now generally available in Microsoft Foundry: a model designed to help organizations move from planning work to reliably completing it in production environments. As AI agents are applied to longer, more complex workflows; consistency and follow‑through become as important as raw intelligence. GPT‑5.4 combines stronger reasoning with built in computer use capabilities to support automation scenarios, and dependable execution across tools, files, and multi‑step workflows at scale. GPT-5.4: Enhanced Reliability in Production AI GPT-5.4 is designed for organizations operating AI in real production environments, where consistency, instruction adherence, and sustained context are critical to success. The model brings together advances in reasoning, coding, and agentic workflows to help AI systems not only plan tasks but complete them with fewer interruptions and reduced manual oversight. Compared with earlier generations, GPT-5.4 emphasizes stability across longer interactions, enabling teams to deploy agentic AI with greater confidence in day-to-day production use. GPT-5.4 introduces advancements that aim for production grade AI: More consistent reasoning over time, helping maintain intent across multi‑turn and multi‑step interactions Enhanced instruction alignment to reduce prompt tuning and oversight Latency improved performance for responsive, real-time workflows Integrated computer use capabilities for structured orchestration of tools, file access, data extraction, guarded code execution, and agent handoffs More dependable tool invocation reducing prompt tuning and human oversight Higher‑quality generated artifacts, including documents, spreadsheets, and presentations with more consistent structure Together, these improvements support AI systems that behave more predictably as tasks grow in length and complexity. From capability to real-world outcomes GPT‑5.4 delivers practical value across a wide range of production scenarios where follow‑through and reliability are essential: Agent‑driven workflows, such as customer support, research assistance, and business process automation Enterprise knowledge work, including document drafting, data analysis, and presentation‑ready outputs Developer workflows, spanning code generation, refactoring, debugging support, and UI scaffolding Extended reasoning tasks, where logical consistency must be preserved across longer interactions Teams benefit from reduced task drift, fewer mid‑workflow failures, and more predictable outcomes when deploying GPT‑5.4 in production. GPT-5.4 Pro: Deeper analysis for complex decision workflows GPT‑5.4 Pro, a premium variant designed for scenarios where analytical depth and completeness are prioritized over latency. Additional capabilities include: Multi‑path reasoning evaluation, allowing alternative approaches to be explored before selecting a final response Greater analytical depth, supporting problems with trade‑offs or multiple valid solutions Improved stability across long reasoning chains, especially in sustained analytical tasks Enhanced decision support, where rigor and thoroughness outweigh speed considerations Organizations typically select GPT‑5.4 Pro when deeper analysis is required such as scientific research and complex problems, while GPT‑5.4 remains the right choice for workloads that prioritize reliable execution and agentic follow‑through. Microsoft Foundry: Enterprise‑Grade Control from Day One GPT‑5.4 and GPT‑5.4 Pro are available through Microsoft Foundry, which provides the operational controls organizations need to deploy AI responsibly in production environments. Foundry supports policy enforcement, monitoring, version management, and auditability, helping teams manage AI systems throughout their lifecycle. By deploying GPT‑5.4 through Microsoft Foundry, organizations can integrate advanced agentic capabilities into existing environments while aligning with security, compliance, and operational requirements from day one. Customer Spotlight Get Started with GPT-5.4 in Microsoft Foundry GPT‑5.4 sets a new bar for production‑ready AI by combining stronger reasoning with dependable execution. Through enterprise‑grade deployment in Microsoft Foundry, organizations can move beyond experimentation and confidently build AI systems that complete complex work at scale. Computer use capabilities will be introduced shortly after launch. GPT‑5.4 <272K input tokens context length in Microsoft Foundry is priced at $2.50 per million input tokens, $0.25 per million cached input tokens, and $15.00 per million output tokens. The GPT‑5.4 >272K input tokens context length in Microsoft Foundry is priced at $5.00 per million input tokens, $0.50 per million cached input tokens, and $22.50 per million output tokens. The GPT-5.4 is available at launch in Standard Global and Standard Data Zone (US), with additional deployment options coming soon. GPT‑5.4 Pro is priced at $30.00 per million input tokens, and $180.00 per million output tokens, and is available at launch in Standard Global. Build agents for real-world workloads. Start building with GPT‑5.4 in Microsoft Foundry today.21KViews4likes2Comments