copilot stack

15 Topics

Introducing Microsoft Agent Factory
Microsoft Agent Factory is a new program designed for organizations that want to move from experimentation to execution faster. With a single plan, organizations can build agents with Work IQ, Fabric IQ, and Foundry IQ using Microsoft Foundry and Copilot Studio. They can also deploy their agents anywhere, including Microsoft 365 Copilot, with no upfront licensing and provisioning required. Eligible organizations can also tap into hands-on engagement from top AI Forward Deployed Engineers (FDEs) and access tailored role-based training to boost AI fluency across teams.
CyrilBelikoff
Nov 18, 2025 Place Microsoft Foundry Blog
33KViews
13likes
0Comments
The Future of AI: Exploring Multi-Agent AI Systems
Image generated by Microsoft Designer
Marco_Casalaina
Sep 03, 2024 Place Microsoft Foundry Blog
26KViews
4likes
2Comments
The Future of AI: Computer Use Agents Have Arrived
Discover the groundbreaking advancements in AI with Computer Use Agents (CUAs). In this blog, Marco Casalaina shares how to use the Responses API from Azure OpenAI Service, showcasing how CUAs can launch apps, navigate websites, and reason through tasks. Learn how CUAs utilize multimodal models for computer vision and AI frameworks to enhance automation. Explore the differences between CUAs and traditional Robotic Process Automation (RPA), and understand how CUAs can complement RPA systems. Dive into the future of automation and see how CUAs are set to revolutionize the way we interact with technology.
Marco_Casalaina
Apr 07, 2025 Place Microsoft Foundry Blog
14KViews
6likes
0Comments
The Future of AI: Single Agent or Multi-Agent - How Should I Choose?
Image generated by Microsoft Designer
Marco_Casalaina
Oct 01, 2024 Place Microsoft Foundry Blog
8KViews
2likes
2Comments
Ignite 2024: Streamlining AI Development with an Enhanced User Interface, Accessibility, and Learning Experiences in Azure AI Foundry portal
Announcing Azure AI Foundry, a unified platform that simplifies AI development and management. The platform portal (formerly Azure AI Studio) features a revamped user interface, enhanced model catalog, new management center, improved accessibility and learning, making it easier than ever for Developers and IT Admins to design, customize, and manage AI apps and agents efficiently.
udimilo
Nov 19, 2024 Place Microsoft Foundry Blog
6.5KViews
2likes
0Comments
The Future of AI: Generative AI for...Time Series Forecasting?!? A Look at Nixtla TimeGEN-1
Have you ever wondered how meteorologists predict tomorrow's weather, or how businesses anticipate future sales? These predictions rely on analyzing patterns over time, known as time series forecasts. With advancements in artificial intelligence, forecasting the future has become more accurate and accessible than ever before. Understanding Time Series Forecasting Time series data is a collection of observations recorded at specific time intervals. Examples include daily temperatures, monthly sales figures, or hourly website visitors. By examining this data, we can identify trends and patterns that help us predict future events. Forecasting involves using mathematical models to analyze past data and make informed guesses about what comes next. Traditional Forecasting Methods: ARIMA and Prophet Two of the most popular traditional methods for doing time series forecasting are ARIMA and Prophet. ARIMA, which stands for AutoRegressive Integrated Moving Average, predicts future values based on past data. It involves making the data stationary by removing trends and seasonal effects, then applying statistical techniques. However, ARIMA requires manual setup of parameters like trends and seasonality, which can be complex and time-consuming. It's best suited for simple, one-variable data with minimal seasonal changes. Prophet, a forecasting tool developed by Facebook (now Meta), automatically detects trends, seasonality, and holiday effects in the data, making it more user-friendly than ARIMA. Prophet works well with data that has strong seasonal patterns and doesn't need as much historical data. However, it may struggle with more complex patterns or irregular time intervals. Introducing Nixtla TimeGEN-1: A New Era in Forecasting Nixtla TimeGEN-1 represents a significant advancement in time series forecasting. Unlike traditional models, TimeGEN-1 is a generative pretrained transformer model, much like the GPT models, but rather than working with language, it's specifically designed for time series data. It has been trained on over 100 billion data points from various fields such as finance, weather, energy, and web data. This extensive training allows TimeGEN-1 to handle a wide range of data types and patterns. One of the standout features of TimeGEN-1 is its ability to perform zero-shot inference. This means it can make accurate predictions on new datasets without needing additional training. It can also be fine-tuned on specific datasets for even better accuracy. TimeGEN-1 handles irregular data effortlessly, working with missing timestamps or uneven intervals. Importantly, it doesn't require users to manually specify trends or seasonal components, making it accessible even to those without deep technical expertise. The transformer architecture of TimeGEN-1 enables it to capture complex patterns in data that traditional models might miss. It brings the power of advanced machine learning to time series forecasting – and related tasks like anomaly detection – making the process more efficient and accurate. Real-World Comparison: TimeGEN-1 vs. ARIMA and Prophet To test these claims, I decided to run an experiment to compare the performance of TimeGEN-1 with ARIMA and Prophet. I used a retail dataset where the actual future values were known, which in data science parlance is known as a "backtest." In my dataset, ARIMA struggled to predict future values accurately due to its limitations with complex patterns. Prophet performed better than ARIMA by automatically detecting some patterns, but its predictions still didn't quite hit the mark. TimeGEN-1, however, delivered predictions that closely matched the actual data, significantly outperforming both ARIMA and Prophet. The accuracy of these models was measured using metrics like Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). TimeGEN-1 had the lowest MAE and RMSE, indicating higher accuracy. This experiment highlights how TimeGEN-1 can provide more precise forecasts, even when compared to established methods. The Team Behind TimeGEN-1: Nixtla Nixtla is a company dedicated to making advanced predictive insights accessible to everyone. It was founded by a team of experts passionate about simplifying forecasting processes while maintaining high accuracy and efficiency. The team includes Max Mergenthaler Canseco, CEO; Azul Garza, CTO; and Cristian Challu, CSO, experts in the forecasting field with extensive experience in machine learning and software engineering.< Their collective goal is to simplify the forecasting process, making powerful tools available to users with varying levels of technical expertise. By integrating TimeGEN-1 into easy-to-use APIs, they ensure that businesses and individuals can leverage advanced forecasting without needing deep machine learning knowledge. The Azure AI Model Catalog TimeGEN-1 is one of the 1700+ models that are now available in the Azure AI model catalog. The model catalog is continuously updated with the latest advancements, like TimeGEN-1, ensuring that users have access to the most cutting-edge tools. Its user-friendly interface makes it easy to navigate and deploy models, and Azure's cloud infrastructure provides the scalability needed to run these models, allowing users to handle large datasets and complex computations efficiently. In the following video, I show how Data Scientists and Developers can build time series forecasting models using data stored in Microsoft Fabric paired with the Nixtla TimeGEN-1 model. The introduction of Nixtla TimeGEN-1 marks a transformative moment in time series forecasting. Whether you're a data scientist, a business owner, or a student interested in AI, TimeGEN-1 opens up new possibilities for understanding and predicting future trends. Explore TimeGEN-1 and thousands of other models through the Azure AI model catalog today!
Marco_Casalaina
Nov 11, 2024 Place Microsoft Foundry Blog
5.1KViews
3likes
0Comments
The Future of AI: Oh! One! Applying o1 Models to Business Data
Image generated by Microsoft Designer
Marco_Casalaina
Oct 14, 2024 Place Microsoft Foundry Blog
4.8KViews
4likes
0Comments
The Future of AI: Power Your Agents with Azure Logic Apps
Building intelligent applications no longer requires complex coding. With advancements in technology, you can now create agents using cloud-based tools to automate workflows, connect to various services, and integrate business processes across hybrid environments without writing any code.
Marco_Casalaina
Jan 28, 2025 Place Microsoft Foundry Blog
3.7KViews
2likes
1Comment
The Future of AI: Customizing AI agents with the Semantic Kernel agent framework
The blog post Customizing AI agents with the Semantic Kernel agent framework discusses the capabilities of the Semantic Kernel SDK, an open-source tool developed by Microsoft for creating AI agents and multi-agent systems. It highlights the benefits of using single-purpose agents within a multi-agent system to achieve more complex workflows with improved efficiency. The Semantic Kernel SDK offers features like telemetry, hooks, and filters to ensure secure and responsible AI solutions, making it a versatile tool for both simple and complex AI projects.
TaoChen
Mar 12, 2025 Place Microsoft Foundry Blog
2.2KViews
3likes
0Comments
Three tiers of Agentic AI - and when to use none of them
Every enterprise has an AI agent. Almost none of them work in production. Walk into any enterprise technology review right now and you will find the same thing. Pilots running. Demos recorded. Steering committees impressed. And somewhere in the background, a quiet acknowledgment that the thing does not actually work at scale yet. OutSystems surveyed nearly 1,900 global IT leaders and found that 96% of organizations are already running AI agents in some capacity. Yet only one in nine has those agents operating in production at scale. The experiments are everywhere. The production systems are not. That gap is not a capability problem. The infrastructure has matured. Tool calling is standard across all major models. Frameworks like LangGraph, CrewAI, and Microsoft Agent Framework abstract orchestration logic. Model Context Protocol standardizes how agents access external tools and data sources. Google's Agent-to-Agent protocol now under Linux Foundation governance with over 50 enterprise technology partners including Salesforce, SAP, ServiceNow, and Workday standardizes how agents coordinate with each other. The protocols are in place. The frameworks are production ready. The gap is a selection and governance problem. Teams are building agents on problems that do not need them. Choosing the wrong tier for the ones that do. And treating governance as a compliance checkbox to add after launch, rather than an architectural input to design in from the start. The same OutSystems research found that 94% of organizations are concerned that AI sprawl is increasing complexity, technical debt, and security risk and only 12% have a centralized approach to managing it. Teams are deploying agents the way shadow IT spread through enterprises a decade ago: fast, fragmented, and without a shared definition of what production-ready actually means. I've built agentic systems across enterprise clients in logistics, retail, and B2B services. The failures I keep seeing are not technology failures. They are architecture and judgment failures problems that existed before the first line of code was written, in the conversation where nobody asked the prior question. This article is the framework I use before any platform conversation starts. What has genuinely shifted in the agentic landscape Three changes are shaping how enterprise agent architecture should be designed today and they are not incremental improvements on what existed before. The first is the move from single agents to multi-agent systems. Databricks' State of AI Agents report drawing on data from over 20,000 organizations, including more than 60% of the Fortune 500 found that multi-agent workflows on their platform grew 327% in just four months. This is not experimentation. It is production architecture shifting. A single agent handling everything routing, retrieval, reasoning, execution is being replaced by specialized agents coordinating through defined interfaces. A financial organization, for example, might run separate agents for intent classification, document retrieval, and compliance checking each narrow in scope, each connected to the next through a standardized protocol rather than tightly coupled code. The second is protocol standardization. MCP handles vertical connectivity how agents access tools, data sources, and APIs through a typed manifest and standardized invocation pattern. A2A handles horizontal connectivity how agents discover peer agents, delegate subtasks, and coordinate workflows. Production systems today use both. The practical consequence is that multi-agent architectures can be composed and governed as a platform rather than managed as a collection of one-off integrations. The third is governance as the differentiating factor between teams that ship and teams that stall. Databricks found that companies using AI governance tools get over 12 times more AI projects into production compared to those without. The teams running production agents are not running more sophisticated models. They built evaluation pipelines, audit trails, and human oversight gates before scaling not after the first incident. Tier 1 - Low-code agents: fast delivery with a defined ceiling The low-code tier is more capable than it was eighteen months ago. Copilot Studio, Salesforce Agentforce, and equivalent platforms now support richer connector libraries, better generative orchestration, and more flexible topic models. The ceiling is higher than it was. It is still a ceiling. The core pattern remains: a visual topic model drives a platform-managed LLM that classifies intent and routes to named execution branches. Connectors abstract credential management and API surface. A business team — analyst, citizen developer, IT operations — can build, deploy, and iterate without engineering involvement on every change. For bounded conversational problems, this is the fastest path from requirement to production. The production reality is documented clearly. Gartner data found that only 5% of Copilot Studio pilots moved to larger-scale deployment. A European telecom with dedicated IT resources and a full Microsoft enterprise agreement spent six months and did not deliver a single production agent. The visual builder works. The path from prototype to production, production-grade integrations, error handling, compliance logging, exception routing is where most enterprises get stuck, because it requires Power Platform expertise that most business teams do not have. The platform ceiling shows up predictably at four points. Async processing anything beyond a synchronous connector call, including approval chains, document pipelines, or batch operations cannot be handled natively. Full payload audit logs platform logs give conversation transcripts and connector summaries, not structured records of every API call and its parameters. Production volume concurrency limits and message throughput budgets bind faster than planning assumptions suggest. Root cause analysis in production you cannot inspect the LLM's confidence score or the alternatives it considered, which makes diagnosing misbehavior significantly harder than it should be. The correct diagnostic: can this use case be owned end-to-end by a business team, covered by standard connectors, with no latency SLA below three seconds and no payload-level compliance requirement? Yes, low code is the correct tier. Not a compromise. If no on any point, continue. If low-code is the right call for your use case: Copilot Studio quickstart Tier 2 - Pro-code agents: the architecture the current landscape demands The defining pattern in production pro-code architecture today is multi-agent. Specialized agents per domain, coordinating through MCP for tool access and A2A for peer-to-peer delegation, with a governance layer spanning the entire system. What this looks like in practice: a financial organization handling incoming compliance queries runs separate agents for intent classification, document retrieval, and the compliance check itself. None of these agents tries to do all three jobs. Each has a narrow responsibility, a defined input/output contract typed against a JSON Schema, and a clear handoff boundary. The 327% growth in multi-agent workflows reflects production teams discovering that the failure modes of monolithic agents topic collision, context overflow, degraded classification as scope expands are solved by specialization, not by making a single agent more capable. The discipline that makes multi-agent systems reliable is identical to what makes single-agent systems reliable, just enforced across more boundaries: the LLM layer reasons and coordinates; deterministic tool functions enforce. In a compliance pipeline, no LLM decides whether a document satisfies a regulatory requirement. That evaluation runs in a deterministic tool with a versioned rule set, testable outputs, and an immutable audit log. The LLM orchestrates the sequence. The tool produces the compliance record. Mixing these letting an LLM evaluate whether a rule pass collapses the audit trail and introduces probabilistic outputs on questions that have regulatory answers. MCP is the tool interface standard today. An MCP server exposes a typed manifest any compliant agent runtime can discover at startup. Tools are versioned, independently deployable, and reusable across agents without bespoke integration code. A2A extends this horizontally: agents advertise capability cards, discover peers, and delegate subtasks through a standardised protocol. The practical consequence is that multi-agent systems built on both protocols can be composed and governed as a platform rather than managed as a collection of one-off integrations. Observability is the architectural element that separates teams shipping production agents from teams perpetually in pilot. Build evaluation pipelines, distributed traces across all agent boundaries, and human review gates before scaling. The teams that add these after the first production incident spend months retrofitting what should have been designed in. If pro-code is the right call for your use case: Foundry Agent Service The hybrid pattern: still where production deployments land The shift to multi-agent architecture does not change the hybrid pattern it deepens it. Low-code at the conversational surface, pro-code multi-agent systems behind it, with a governance layer spanning both. On a logistics client engagement, the brief was a sales assistant for account managers shipment status, account health, and competitive context inside Teams. The business team wanted everything in Copilot Studio. Engineering wanted a custom agent runtime. Both were wrong. What we built: Copilot Studio handled all high-frequency, low-complexity queries shipment tracking, account status, open cases through Power Platform connectors. Zero custom code. That covered roughly 78% of actual interaction volume. Requests requiring multi-source reasoning competitive positioning on a specific lane, churn risk across an account portfolio, contract renewal analysis delegated via authenticated HTTP action to a pro-code multi-agent service on Azure. A retrieval agent pulled deal history and market intelligence through MCP-exposed tools. A synthesis agent composed the recommendation with confidence scoring. Structured JSON back to the low-code layer, rendered as an adaptive card in Teams. The HITL gate was non-negotiable and designed before deployment, not added after the first incident. No output reached a customer without a manager approval step. The agent drafts. A human sends. This boundary low-code for conversational volume, pro-code for reasoning depth maps directly to what the research shows separates teams that ship from teams that stall. The organizations running agents in production drew the line correctly between what the platform can own and what engineering needs to own. Then they built governance into both sides before scaling. The four gates - the prior question that still gets skipped Run every candidate use case through these four checks before the platform conversation begins. None of the recent infrastructure improvements change what they are checking, because none of them change the fundamental cost structure of agentic reasoning. Gate 1 - is the logic fully deterministic? If every valid output for every valid input can be enumerated in unit tests, the problem does not need an LLM. A rules engine executes in microseconds at zero inference cost and cannot produce a plausible-but-wrong answer. NeuBird AI's production ops agents which have resolved over a million alerts and saved enterprises over $2 million in engineering hours work because alert triage logic that can be expressed as rules runs in deterministic code, and the LLM only handles cases where pattern-matching is insufficient. That boundary is not incidental to the system's reliability. It is the reason for it. Gate 2 - is zero hallucination tolerance required? With over 80% of databases now being built by AI agents per Databricks' State of AI Agents report the surface area for hallucination-induced data errors has grown significantly. In domains where a wrong answer is a compliance event financial calculation, medical logic, regulatory determinations irreducible LLM output uncertainty is disqualifying regardless of model version or prompt engineering effort. Exit to deterministic code or classical ML with bounded output spaces. Gate 3 - is a sub-100ms latency SLA required? LLM inference is faster than it was eighteen months ago. It is not fast enough for payment transaction processing, real-time fraud scoring, or live inventory management. A three-agent system with MCP tool calls has a P50 latency measured in seconds. These problems need purpose-built transactional architecture. Gate 4 - is regulatory explainability required? A2A enables complex agent coordination and delegation. It does not make LLM reasoning reproducible in a regulatory sense. Temperature above zero means the same input produces different outputs across invocations. Regulators in financial services, healthcare, and consumer credit require deterministic, auditable decision rationale. Exit to deterministic workflow with structured audit logging at every Five production failure modes - one of them new The four original anti-patterns are still showing up in production. A fifth has been added by scale. Routing data retrieval through a reasoning loop. A direct API call returns account status in under 10ms. Routing the same request through an LLM reasoning step adds hundreds of milliseconds, consumes tokens on every call, and introduces output parsing on data that is already structured. The agent calls a structured tool. The tool calls the API. The agent never acts as the integration layer. Encoding business rules in prompts. Rules expressed in prompt text drift as models update. They produce probabilistic output across invocations and fail in ways that are difficult to reproduce and diagnose. A rule that must evaluate correctly every time belongs in a deterministic tool function unit-tested, version-controlled, independently deployable via MCP. No approval gate on CRUD operations. CRUD operations without a human approval step will eventually misfire on the input that testing did not cover. The gate needs to be designed before deployment, not added after the first incident involving a financial posting, a customer-facing communication, or a data deletion. Monolithic agent for all domains. A single agent accumulating every domain leads predictably to topic collision, context overflow, and maintenance that becomes impossible as scope expands. Specialized agents per domain, coordinating through A2A, is the architecture that scales. Ungoverned agent sprawl. This is the new one and currently the most prevalent. OutSystems found 94% of organizations concerned about it, with only 12% having a centralized response. Teams building agents independently across fragmented stacks, without shared governance, evaluation standards, or audit infrastructure, produce exactly the same organizational debt that shadow IT created but with higher stakes, because these systems make autonomous decisions rather than just storing and retrieving data. The fix is treating governance as an architectural input before deployment, not a compliance requirement after something breaks. The infrastructure is ready. The judgment is not. The tier decision sequence has not changed. Does the problem need natural language understanding or dynamic generation? No — deterministic system, stop. Can a business team own it through standard connectors with no sub-3-second latency SLA and no payload-level compliance requirement? Yes — low-code. Does it need custom orchestration, multi-agent coordination, or audit-grade observability? Yes — pro-code with MCP and A2A. Does it need both a conversational surface and deep backend reasoning? Hybrid, with a governance layer spanning both. What has changed is that governance is no longer optional infrastructure to add when you have time. The data is unambiguous. Companies with governance tools get over 12 times more AI projects into production than those without. Evaluation pipelines, distributed tracing across agent boundaries, human oversight gates, and centralised agent lifecycle management are not overhead. They are what converts experiments into production systems. The teams still stuck in pilot are not stuck because the technology failed them. They are stuck because they skipped this layer. The protocols are standardised. The frameworks are mature. The infrastructure exists. None of that is what is holding most enterprise agent programmes back. What is holding them back is a selection problem disguised as a technology problem — teams building agents before asking whether agents are warranted, choosing platforms before running the four gates, and treating governance as a checkpoint rather than an architectural input. I have built agents that should have been workflow engines. Not because the technology was wrong, but because nobody stopped early enough to ask whether it was necessary. The four gates in this article exist because I learned those lessons at clients' expense, not mine. The most useful thing I can offer any team starting an agentic AI project is not a framework selection guide. It is permission to say no — and a clear basis for saying it. Take the four gates framework to your next architecture review. If you have already shipped agents to production, I would like to hear what worked and what did not - comment below What to do next Three concrete steps depending on where you are right now. If you have pilots that have not reached production: Run them through the four gates in this article before the next sprint. Gate 1 alone will eliminate a meaningful percentage of them. The ones that survive all four are your real candidates for production investment. Download the attached file for gated checklist and take it into your next architecture review. If you are starting a new agent project: Do not open a platform before you have answered the gate questions. Once you have confirmed an agent is warranted and identified the tier, start here: Copilot Studio guided setup for low-code scenarios, or Foundry Agent Service for pro-code patterns with MCP and multi-agent coordination built in. Build governance infrastructure - evaluation pipeline, distributed tracing, HITL gates - before you scale, not after. If you have already shipped agents to production: Share what worked and what did not in the Azure AI Tech Community — tag posts with #AgentArchitecture. The most useful signal for teams still in pilot is hearing from practitioners who have been through production, not vendors describing what production should look like. References OutSystems — State of AI Development Report - https://www.outsystems.com/1/state-ai-development-report Databricks — State of AI Agents Report - https://www.databricks.com/resources/ebook/state-of-ai-agents Gartner — 2025 Microsoft 365 and Copilot Survey - https://www.gartner.com/en/documents/6548002 (Paywalled primary source — publicly reported via techpartner.news: https://www.techpartner.news/news/gartner-microsoft-copilot-hype-offset-by-roi-and-readiness-realities-618118) Anthropic — Model Context Protocol (MCP) - https://modelcontextprotocol.io Google Cloud — Agent-to-Agent Protocol (A2A) . https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability NeuBird AI — Production Operations Deployment Announcement NeuBird AI Closes $19.3M Funding Round to Scale Agentic AI Across Enterprise Production Operations ReAct: Synergizing Reasoning and Acting in Language Models — Yao et al. https://arxiv.org/abs/2210.03629 Enterprise Integration Patterns — Gregor Hohpe & Bobby Woolf, Addison-Wesley https://www.enterpriseintegrationpatterns.com
sgangaramani
Apr 22, 2026 Place Microsoft Foundry Blog
2.1KViews
4likes
1Comment