agents
213 Topics

Microsoft Power Platform community call - June 2026
💡 Power Platform monthly community call focuses on different extensibility options for builders, makers and developers within the Power Platform. Typically demos are from our awesome community members who showcase the art of the possible within the Power Platform capabilities.

👏 If you're looking to catch up on the latest news and updates, including cool community demos, this call is for you!

📅 On the 17th of June we'll have the following agenda:
- Power Platform Updates & Events
- Latest on Power Platform samples
- Elliot Margot (Witivio) - Process Mining + Copilot Studio: Stop Reading Dashboards, Start Asking Questions
- Sailaja Mantripragada (Low Code Power) - From Prompt to a Filled-In Word Template: Automating Deep Customer Research with Copilot Studio and Agent Flows
- John Liu (Rapid Circle) - Using Copilot Cowork with MCP to build Power Automate flows

📅 Download the recurring invite from https://aka.ms/powerplatformcommunitycall
📞 & 📺 Join the Microsoft Teams meeting live at https://aka.ms/PowerPlatformMonthlyCall

💡 Building something cool for Microsoft 365 or Power Platform (Copilot, SharePoint, Power Apps, etc)? We are always looking for presenters - Volunteer for a community call demo at https://aka.ms/community/request/demo

👋 See you in the call!

📖 Resources:
- Previous community call recordings and demos from the Microsoft 365 & Power Platform community YouTube channel at https://aka.ms/community/videos
- Microsoft 365 & Power Platform samples from Microsoft and community - https://aka.ms/community/samples
- Microsoft 365 & Power Platform community details - https://aka.ms/community/home

Agents That Test Agents: A Cloud-Native Skill-Eval Harness on Foundry Hosted Agents
Skills are an agent's must-have. So test them.

A skill is the lightest way to give an agent durable, reusable behavior: a SKILL.md file you author once, store centrally in Foundry's versioned Skills API, and inject into a Hosted Agent's context, with no code change and no redeploy. That's why skills have quietly become standard equipment for production agents.

But the moment a skill carries real behavior, a hard question follows: how do you know it still works? When you edit a skill, you can't feel whether you improved it or just changed it. It might stop triggering, skip a required section, or quietly produce a worse result on one model than another. The cure is the same discipline we use for any prompt: evaluate it. Run the agent, capture what happened, and grade it against a small set of checks.

This is exactly what azure_skill_eval does for one concrete skill: edu-video-script, which writes an education short-video script for a given knowledge point (the sample's smoke test asks it to script the "P vs NP problem"). And it does the whole thing cloud-native, on Foundry Hosted Agents.

The scenario: one skill, two models, four hosted agents

The skill under test is edu-video-script. The clever part of the harness is that it doesn't just check one run: it puts the skill on a stand and stresses it from three sides, using four Foundry Hosted Agents wired together by the Agent Framework FoundryAgent:

    Hosted agent                          Role
    skill-eval-business-agent-gpt         System under test (SUT), running edu-video-script on gpt-5.5
    skill-eval-business-agent-deepseek    The same skill, running on DeepSeek-V4-Pro
    skill-eval-attacker-agent             Multi-turn adversarial prompt generator
    skill-eval-judge-agent                LLM-as-judge that returns a rubric score as JSON

Two business agents run the same skill on different models, so every case becomes an apples-to-apples comparison: which model executes this skill better? The attacker and judge are the graders.
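To make the apples-to-apples comparison concrete, here is a minimal sketch of how per-case results from the two business agents could be rolled up per model. The records, the `compare_models` helper, and the numbers are all hypothetical; only the `overall_pass` and `score` field names come from the judge output described in this post. It is not the sample's actual aggregation code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-case results: one record per (case, model) pair.
# The numbers are invented for illustration only.
results = [
    {"case": "edge-01", "model": "gpt-5.5", "overall_pass": True, "score": 90},
    {"case": "edge-01", "model": "DeepSeek-V4-Pro", "overall_pass": True, "score": 80},
    {"case": "edge-02", "model": "gpt-5.5", "overall_pass": False, "score": 40},
    {"case": "edge-02", "model": "DeepSeek-V4-Pro", "overall_pass": True, "score": 70},
]


def compare_models(records):
    """Group records by model and report pass rate and mean judge score."""
    by_model = defaultdict(list)
    for record in records:
        by_model[record["model"]].append(record)
    return {
        model: {
            "pass_rate": sum(r["overall_pass"] for r in rows) / len(rows),
            "mean_score": mean(r["score"] for r in rows),
        }
        for model, rows in by_model.items()
    }


summary = compare_models(results)
for model, stats in summary.items():
    print(f"{model}: pass_rate={stats['pass_rate']:.2f}, mean_score={stats['mean_score']}")
```

Because both models see the same case IDs, any difference in these two numbers can be attributed to the model rather than to the test set.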
What we measure (define "done" first)

Good evals start from a checkable definition of done: outcome, process, style, efficiency. For an education-video script that means:
- Did it produce a valid script (outcome)?
- Did it actually follow the edu-video-script template (process/style)?
- Does it hold up when a user pushes on it across turns (robustness)?

The harness answers these with three grading layers.

1. Deterministic checks first (validator.py)

The cheapest, most explainable signal: does the output match the script template the skill is supposed to produce? validator.py runs fixed, deterministic template checks, with no model needed. These catch the obvious regressions instantly and never cost a token.

2. The LLM judge (skill-eval-judge-agent)

Template checks answer "did it do the basics?" but not "is the script any good?": pacing, clarity, whether it teaches the concept. For that, a dedicated judge hosted agent grades the result and returns structured JSON so scores compare cleanly across runs and models:

    { "overall_pass": true, "score": 100, "checks": [] }

Structured output is the point: stable fields (overall_pass, score, checks) diff cleanly between GPT and DeepSeek, and between today's skill version and last week's.

3. The multi-turn attacker (test_agent.py + skill-eval-attacker-agent)

A skill that looks great on a clean prompt can still fall apart when a user pushes on it. The attacker agent generates adversarial prompts for a knowledge point using a chosen strategy (for example, extreme length) and keeps the pressure on across multiple turns (max_turns, default 3). This is where you find out whether edu-video-script stays on-template under stress, not just on the happy path.

    # the attacker takes a knowledge point + a strategy, emits one user prompt
    azd ai agent invoke skill-eval-attacker-agent \
      "Topic: P vs. NP problem
    Recommended attack strategy: Extreme length
    Please output the unique user prompt text."
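The first two grading layers can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the contents of validator.py: the section names in `REQUIRED_SECTIONS`, the helper names, and the choice to skip the judge when the template checks fail are all hypothetical. Only the three judge fields (`overall_pass`, `score`, `checks`) come from the post.

```python
import json
from typing import Optional

# Hypothetical section names; the real edu-video-script template may differ.
REQUIRED_SECTIONS = ("Hook", "Explanation", "Summary")


def run_template_checks(script: str) -> list:
    """Deterministic checks: no model call, same answer on every run."""
    lowered = script.lower()
    checks = [
        {"name": f"has_section:{name}", "passed": name.lower() in lowered}
        for name in REQUIRED_SECTIONS
    ]
    checks.append({"name": "non_empty", "passed": bool(script.strip())})
    return checks


def parse_judge_verdict(raw: str) -> dict:
    """Parse the judge's JSON reply and verify the three stable fields exist."""
    verdict = json.loads(raw)
    for field in ("overall_pass", "score", "checks"):
        if field not in verdict:
            raise ValueError(f"judge reply is missing '{field}'")
    return verdict


def grade(script: str, judge_reply: Optional[str] = None) -> dict:
    """Run the cheap checks first; read the judge reply only if they pass."""
    template_checks = run_template_checks(script)
    template_ok = all(check["passed"] for check in template_checks)
    verdict = parse_judge_verdict(judge_reply) if (template_ok and judge_reply) else None
    # With no judge verdict, the result rests on the template checks alone.
    passed = template_ok and (verdict is None or bool(verdict["overall_pass"]))
    return {"template_ok": template_ok, "judge": verdict, "passed": passed}


sample_script = "Hook: Can every puzzle be solved quickly?\nExplanation: ...\nSummary: ..."
sample_reply = '{"overall_pass": true, "score": 100, "checks": []}'
report = grade(sample_script, sample_reply)
print(report["passed"], report["judge"]["score"])
```

Validating the judge's reply before trusting it keeps a malformed response from silently counting as a pass or a fail.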
The eval loop, end to end

runner.py is a ghcsdk-style pipeline that runs cases × models, with each side toggleable: pick all models / GPT only / DeepSeek only, run a single case (e.g. edge-03), and switch adversarial mode, single-turn vs multi-turn, and judge grading on or off. The same switches are query parameters on POST /api/run: model, only_case, use_attack, single_turn, use_judge, max_turns.

The test set lives in shared/test_cases.py: 10 built-in edge cases (edge-01 … edge-10) exported to evals/evals.json. You don't need a giant benchmark; a small, sharp set catches regressions, and you grow it whenever a real failure shows up:

    python -m evals.export_evals   # regenerate evals/evals.json from shared/test_cases.py

Every SUT call goes through runtime.py, which follows the official Agent Framework hosted-agent sample: it opens a fresh hosted session per turn, invokes via Responses, and tears the session down afterward.

    # shared/runtime.py: the documented Foundry hosted-agent pattern
    project = AIProjectClient(endpoint=FOUNDRY_PROJECT_ENDPOINT, credential=cred, allow_preview=True)
    agent = FoundryAgent(project_client=project,
                         name=agent_name,  # e.g. skill-eval-business-agent-gpt
                         allow_preview=True)
    session = project.beta.agents.create_session(agent_name=agent_name)
    # ... send the (possibly adversarial) prompt, collect the Responses output ...

So a single case flows: runner → business agent (skill runs) → validator → judge, optionally with the attacker driving multiple turns first.

Cloud-native by design, and why that matters for eval

This is the part that makes the harness production-grade rather than a laptop script. The hard parts of an eval harness (provisioning agents, recording every run, scaling trials, governing access) are handled by Azure, not by you.

Foundry Hosted Agents are the runtime. The SUT, attacker, and judge all run as managed hosted agents in your Foundry project.
You bring the skill and the cases; Foundry hosts the agents, models, and sessions. The business agents deploy with host: azure.ai.agent and docker.remoteBuild: true, so azd deploy builds the containers in Azure Container Registry, and local Docker doesn't even need to be running.

The UI is serverless. A FastAPI app on Azure Container Apps lets you upload evals.json, watch progress live, and browse the dashboard, with scale-to-zero when no one's running evals.

Every run is durable. Results land in Azure Blob Storage (skill-eval-runs), one yymmdd-XXXXXX/ folder per run, with a newest-first runs.json index. Nothing lives only in a terminal scrollback.

Access is identity-based. In the cloud, a user-assigned Managed Identity carries exactly two roles: Storage Blob Data Contributor + Azure AI User. Locally it's AzureCliCredential. No keys in env files.

It's reproducible infra. azd up runs infra/main.bicep to stand up Storage, the container, Log Analytics, the Container Apps environment, the identity, and the role assignments in one shot.

The payoff: the scores you read came from the same hosted runtime you actually ship to, not a local approximation, and the run that produced them is sitting in Blob, comparable against every run before it.

Run it

Local (no deploy):

    conda activate agentdev
    cd Skill_eval/azure_skill_eval
    pip install -r requirements.txt
    cp .env.example .env   # FOUNDRY_PROJECT_ENDPOINT + AZURE_STORAGE_*
    uvicorn webapp.app:app --reload --port 8000

Open http://localhost:8000, upload evals/evals.json, pick your models and modes, and click Run.
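The run switches are also exposed as query parameters on POST /api/run, so a run can in principle be started without the UI. The sketch below only builds the request URL. The parameter names are the ones listed in the post, but the `build_run_url` helper, the `"gpt"` model value, and the `true`/`false` encoding of booleans are assumptions; check the sample's webapp code for the values it actually accepts.

```python
from urllib.parse import urlencode


def build_run_url(base_url, *, model, only_case=None, use_attack=False,
                  single_turn=False, use_judge=True, max_turns=3):
    """Build a POST /api/run URL from the run switches (hypothetical helper)."""
    params = {
        "model": model,
        # Assumption: booleans travel as lowercase "true"/"false" strings.
        "use_attack": str(use_attack).lower(),
        "single_turn": str(single_turn).lower(),
        "use_judge": str(use_judge).lower(),
        "max_turns": max_turns,
    }
    if only_case:
        params["only_case"] = only_case  # e.g. "edge-03" to run a single case
    return f"{base_url.rstrip('/')}/api/run?{urlencode(params)}"


# "gpt" is a placeholder value for the model switch, not a confirmed option.
url = build_run_url("http://localhost:8000", model="gpt",
                    only_case="edge-03", use_attack=True)
print(url)
```

Keeping the URL construction in one function makes it easy to script the same case against both models from a shell loop or a CI job.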
Cloud (azd):

    azd auth login
    azd env new skill-eval-dev
    azd env set FOUNDRY_PROJECT_ENDPOINT https://<project>.services.ai.azure.com/api/projects/<project>
    azd env set MODEL_GPT gpt-5.5
    azd env set MODEL_DEEPSEEK DeepSeek-V4-Pro
    azd up

Provision the skill once, deploy the four hosted agents, then smoke-test them:

    python -m hosted_agent.provision_skills   # upload edu-video-script to Foundry Skills
    azd deploy skill-eval-business-agent-gpt
    azd deploy skill-eval-business-agent-deepseek
    azd deploy skill-eval-attacker-agent
    azd deploy skill-eval-judge-agent
    azd ai agent invoke skill-eval-business-agent-gpt "Here is a script for an educational short video on the P vs. NP problem."

Read the results

Each run is self-contained on Blob: summary.json gives you the headline (pass rate and judge averages), and the per-{case}__{model}.json files let you open any single result and see exactly what the skill produced and why it passed or failed. The dashboard streams these straight from Blob via /api/runs/{run_id}/files/{filename}. Because GPT and DeepSeek ran the same cases, the comparison is right there in one folder.

Takeaways

- A skill you can't evaluate is a skill you can't trust. edu-video-script is treated like code: versioned in Foundry, run, and graded.
- Stack your graders cheap-to-expensive. Deterministic template checks first (validator.py), then an LLM judge for quality, then a multi-turn attacker for robustness.
- Make the judge return structured JSON. overall_pass / score / checks compare cleanly across models and skill versions.
- Compare models on the same skill. Running GPT-5.5 and DeepSeek-V4-Pro side by side turns "which model?" from a guess into a measured answer.
- Let the platform carry the harness. Foundry Hosted Agents are the runtime; Azure Container Apps, Blob Storage, Managed Identity, and azd/Bicep make the whole loop reproducible and durable.

Write the skill. Then build the harness that proves it.
On Foundry, that second step is mostly configuration, and the result is a skill you can actually trust in production.

Conclusion

Skills moved agent behavior out of code and into versioned Markdown. That is a huge win for reuse, but only if you can prove a skill still works after every edit. azure_skill_eval answers that for edu-video-script by treating evaluation as a first-class, repeatable step rather than a gut check. The shape is simple and worth copying for any skill of your own:

- Pin down "done" as checkable criteria, then encode a small set of sharp cases (here, 10 edge cases).
- Grade in layers, cheap to expensive: deterministic template checks, then a structured LLM-judge rubric, then a multi-turn adversarial pass.
- Run the same cases across models (GPT-5.5 vs DeepSeek-V4-Pro) so model choice becomes a measurement, not a guess.
- Let the cloud carry it: Foundry Hosted Agents as the runtime, FastAPI on Azure Container Apps for the UI, Blob Storage for durable runs, Managed Identity for access, and azd/Bicep so the whole thing is reproducible.

The result is a feedback loop where every skill change is confirmed, every regression is visible, and every score traces back to the same hosted runtime you ship to. That's the difference between building skills and being able to trust them, and on Foundry the gap between the two is mostly configuration.

Sample Code: https://github.com/kinfey/Multi-AI-Agents-Cloud-Native/tree/main/code/Skill_eval

Copilot, Microsoft 365 & Power Platform Community call
💡 Copilot, Microsoft 365 & Power Platform weekly community call focuses on different use cases and features within Copilot, Microsoft 365 and Power Platform, across Microsoft 365 Copilot, Copilot Studio, SharePoint, Power Apps and more.

👏 If you're looking to catch up on the latest news and updates, including cool community demos, this call is for you!

📅 On the 18th of June we'll have the following agenda:
- Copilot prompt of the week
- CommunityDays.org update
- Microsoft 365 Maturity model
- Latest on PnP Framework and Core SDK extension
- Latest on PnP PowerShell
- Latest on script samples
- Latest Copilot pro dev samples
- Latest on Power Platform samples
- Picture time with the Together Mode!
- Reshmee Auckloo (Avanade) – Insurance Claims Assist using AI in SharePoint with Copilot Studio
- Garry Trinder (Microsoft) – No API, No Problem: Building Declarative Agents with Dev Proxy
- David Warner (Quisitive) – Powerful Animations - VS Code Extension Updates for M365 and Power Apps

📅 Download the recurring invite from https://aka.ms/community/m365-powerplat-dev-call-invite
📞 & 📺 Join the Microsoft Teams meeting live at https://aka.ms/community/m365-powerplat-dev-call-join

👋 See you in the call!

💡 Building something cool for Microsoft 365 or Power Platform (Copilot, SharePoint, Power Apps, etc)? We are always looking for presenters - Volunteer for a community call demo at https://aka.ms/community/request/demo

📖 Resources:
- Previous community call recordings and demos from the Microsoft Community Learning YouTube channel at https://aka.ms/community/youtube
- Microsoft 365 & Power Platform samples from Microsoft and community - https://aka.ms/community/samples
- Microsoft 365 & Power Platform community details - https://aka.ms/community/home

🧡 Sharing is caring!

Copilot, Microsoft 365 & Power Platform product updates call
💡 Copilot, Microsoft 365 & Power Platform product updates call concentrates on the different use cases and features within Microsoft 365 and Power Platform. The call includes topics like Microsoft 365 Copilot, Copilot Studio, Microsoft Teams, Power Platform, Microsoft Graph, Microsoft Viva, Microsoft Search, Microsoft Lists, SharePoint, Power Automate, Power Apps and more.

👏 The weekly Tuesday call is for all community members to see Microsoft PMs, engineering and Cloud Advocates showcasing the art of the possible with Microsoft 365 and Power Platform.

📅 On the 16th of June we'll have the following agenda:
- News and updates from Microsoft
- Together mode group photo
- Vesa Juvonen – How to share and reuse SharePoint Skills - Introducing open-source SharePoint Skills
- Sahil Baid – Introduction to List Agent in Microsoft 365 Copilot
- Vesa Juvonen & Bert Jansen – Introduction to SPFx Copilot Apps

📞 & 📺 Join the Microsoft Teams meeting live at https://aka.ms/community/ms-speakers-call-join
🗓️ Download the recurring invite for this weekly call from https://aka.ms/community/ms-speakers-call-invite

👋 See you in the call!

💡 Building something cool for Microsoft 365 or Power Platform (Copilot, SharePoint, Power Apps, etc)? We are always looking for presenters - Volunteer for a community call demo at https://aka.ms/community/request/demo

📖 Resources:
- Previous community call recordings and demos from the Microsoft Community Learning YouTube channel at https://aka.ms/community/youtube
- Microsoft 365 & Power Platform samples from Microsoft and community - https://aka.ms/community/samples
- Microsoft 365 & Power Platform community details - https://aka.ms/community/home

🧡 Sharing is caring!

Agents League: The Esports-Inspired Hackathon Where AI Agents Battle for Glory
Ready to put your AI skills to the ultimate test? Agents League is here: a dynamic, esports-inspired developer challenge that brings the thrill of live competition to the world of agentic AI. Whether you're a seasoned AI developer or just getting started, this is your chance to build, compete, and win.

What is Agents League?

Agents League is a week-long hackathon running as part of AI Skills Fest (June 4–14, 2026). Unlike traditional hackathons, Agents League combines live AI coding battles, asynchronous project submissions, and a thriving Discord community, with participants competing for a total prize pool of $55,000 USD. This isn't just about building; it's about showcasing what's possible with agentic AI in a format that's fast, competitive, and globally accessible.

Three Challenge Tracks: Pick One or Compete in All

1. Creative Apps
Build innovative applications using GitHub Copilot for AI-assisted development. Show off your creativity and demonstrate how AI can accelerate app creation from concept to code.

2. Reasoning Agents
Create intelligent agents using Microsoft Foundry that solve complex problems through multi-step reasoning. This track is all about building agents that can think, plan, and execute.

3. Enterprise Agents
Build business-ready knowledge agents integrated with Microsoft 365 Copilot, authored in Copilot Studio. Perfect for developers focused on real-world enterprise solutions.

Live Microsoft Reactor Events: Don't Miss the Battles!

The heart of Agents League beats through live Microsoft Reactor events.
Watch experts go head-to-head in live coding battles, learn cutting-edge techniques, and get inspired for your own submissions:

    Event                        What You'll Learn
    Creative Apps Battle         See GitHub Copilot in action as experts build innovative apps live
    Reasoning Agents Battle      Watch multi-step reasoning agents come to life with Microsoft Foundry
    Enterprise Agents Battle     Learn to build M365-integrated agents with Copilot Studio

👉 View the full event series

Key Dates

- Registration Deadline: June 12, 2026, 12:00 PM PT
- Hacking Period: June 4–14, 2026
- Submission Deadline: June 14, 2026, 11:59 PM PT

What You Get

- Live coding battles with expert demonstrations
- Curated technical experiences and on-demand content
- Learning resources on Microsoft Learn and AI Skills Navigator
- Community support through Discord
- GitHub-based submissions for transparent, collaborative judging

Why Participate?

Agents League isn't just another hackathon. It's designed as a streamlined, competitive format that:
✅ Fits into your schedule with focused, time-boxed challenges
✅ Provides real-world product innovation experience
✅ Offers global accessibility, so you can participate from anywhere
✅ Demonstrates the latest capabilities of agentic AI, including new IQ tools
✅ Connects you with a passionate developer community

Ready to Enter the Arena?

Register Now for Agents League

Before you register:
- Review the Hackathon Rules and Regulations for prize categories and judging criteria
- Join the Microsoft Reactor event series for live battles and learning
- Check out the Microsoft Event Code of Conduct

Join the Conversation

Have questions? Want to connect with fellow competitors? Join the Agents League community on Discord and start strategizing with developers from around the world.

Whether you're building creative apps, reasoning agents, or enterprise solutions, the arena awaits. May the best agent win! 🏆

Agents League hackathon is open to the public and offered at no cost.
Government employees should check with their employers to ensure participation is permitted in accordance with applicable policies.

Related Links:
- Agents League Hackathon Registration
- Microsoft Reactor Series
- AI Skills Fest

We Gave Ourselves 20 Minutes to Build an AI Agent for a Lumber Company. The Timer's Still on Screen.
Here's a confession: most "build with AI" webinars are 60 minutes of slides, 5 minutes of a polished demo someone rehearsed for a week, and a closing CTA. You leave inspired but not really sure what you saw.

So we tried something different. We put a visible countdown timer on the screen and gave ourselves 20 minutes to do two things, live:
- Build an AI agent that solves a real business problem
- Deploy a working AI application to Azure

No edits to hide the awkward parts. No "and here's one I prepared earlier." Just the timer, the screen, and a working app at the end.

The on-demand recording is up now. Here's what's in it and why you should carve out 20 minutes for it this week.

The setup: why lumber? 🏘️

We needed a real business problem, not a toy one. So for the demo, we role-play as the owner of Contoso Lumber, a regional lumber business with a very specific, very real headache: Should we sell our inventory now, or hold it longer?

Sell too early, miss a better price. Hold too long, eat storage costs. Lumber prices fluctuate with global competition, macro shifts, even the weather. In the past, decisions like this came from morning meetings and gut instinct, or maybe the occasional ad-hoc spreadsheet that nobody could reuse a month later.

It's the kind of decision that should have an analyst behind it, except most growing businesses can't afford to hire one full-time. So we build the AI agent that does.

(Yes, lumber. We know. Stick with us: the boring industry is exactly the point. If it works here, it works for your business too.)

What we actually build (in 20 minutes flat)

The webinar walks through the entire flow, end to end:

Part 1: The agent.
We open Microsoft Foundry at ai.azure.com, browse the model leaderboard (there are over 11,000 models to choose from; we compare a few on the cost-vs-quality chart), pick one, write a plain-English instruction for the agent, upload a CSV of historical lumber pricing, and ask it a real question:

"If I cannot sell one of my products today unless I offer my clients a 35% discount, and knowing the historical pricing data, should I still sell it?"

The agent runs a break-even analysis and comes back with a reasoned recommendation: hold for 3–6 months, here's the math on why, here's where storage costs start eating the upside.

Then we add voice mode (now you can ask the agent for pricing recs from a coffee shop on your phone), and lock down guardrails to block jailbreaks, prompt injection, data leakage, and, because we're feeling fancy, profanity in responses.

Part 2: The app.

With the agent done, we pivot to deploying a full AI chat application to Azure. From scratch. Using exactly five commands in Azure Cloud Shell:

    azd auth login
    git clone <repo>
    cd <folder>
    azd up
    azd down   # (this one's for when you're done: kills everything to avoid surprise bills)

That's it. The template handles the Container Apps setup, the architecture-aligned-to-Well-Architected-Framework stuff, all the boilerplate that usually eats half a sprint. By the end of the segment, there's a working AI chatbot running on a real Azure URL.

We even pause the timer when we're just explaining things, so you know the 20-minute clock is honest about build time, not talk time.

Why this format is more useful than another slide deck

A few things this webinar shows that a written tutorial can't:

The Foundry UI is super navigable. You watch someone do it. You see where the buttons are. You see what the leaderboard looks like when you're comparing GPT-5.3 Codex against Kimi K2.5 on a cost-to-quality chart. (Spoiler: Kimi wins this particular trio. Your mileage will vary depending on your workload.)
The "no-stitching" claim is real. Models, data, agents, guardrails, deployment: all in one place. You don't need to leave Foundry to wire seven products together. The webinar makes that concrete by showing you the actual flow without cutting.

Five commands really is five commands. This is the part people are most skeptical about until they see it. azd up does the work. The infrastructure provisioning, the container app, the AI service hookup: all of it.

You can delete it just as fast. azd down tears everything back down. Useful when you're experimenting and don't want a $40 surprise on your Azure bill next month.

What's on screen at the end

By the 20-minute mark:
- A published AI agent named for the lumber business, with guardrails, voice mode enabled, ready to be called from Teams, Microsoft 365 Copilot, or any application via endpoint
- A separate AI chat application deployed to Azure Container Apps, with a live URL
- Logs, observability, the full Foundry control plane, all available out of the box

And in the closing minutes, four very concrete next steps for what you do next if this sparked an idea for your own business, including Azure Accelerate (if you want Microsoft experts in the room with you), the partner network, and the Microsoft marketplace if you'd rather buy than build.

Watch the recording

The on-demand recording is available now. Block 20 minutes (that's literally all it takes) and ideally watch with your Azure portal open in another tab so you can follow along. If you're the kind of person who learns by doing, pause at the agent-building section and try it yourself in parallel. Foundry is free to explore; the agent we build in the webinar costs cents to run.

→ Watch the on-demand webinar

A few things we'd love feedback on

If you watch it, we'd genuinely love to know:

Did the timer help or distract? (We thought it would feel gimmicky. It turned out to be the most-mentioned thing in early feedback.)
What use case from your business would you want to see in the next one? We're picking the next demo problem from comments.

Was the lumber thing weirdly compelling or were you just here for the Azure parts?

Drop a comment, tag us, or grab a partner and try building your own version this week. The timer's reset. Your 20 minutes start whenever you press play.

Want to go deeper than the webinar? Two companion reads: From Idea to Impact: How Growing Businesses Scale with Azure (five real customer stories with the full architectures) and AI Made Simple: 3 Practical Moves for Growing Businesses (the structured playbook for figuring out what to build first).

What’s New in Microsoft 365 Copilot | May 2026
Welcome to the May 2026 edition of What's New in Microsoft 365 Copilot! Every month, we highlight new features and enhancements to keep Microsoft 365 admins up to date with Copilot features that help your users be more productive and efficient in the apps they use every day.