Cloud Native Platforms: Evolve
Audience: Engineering leaders, platform architects, senior developers exploring how to operationalise AI in their teams
Reading time: 8 minutes
Series: Cloud Native Platforms. Build, Run, Evolve. This is Part 3 of 3.

Cloud helped us scale infrastructure. AI is starting to do the same thing for the work around the code: the planning, the testing, the release communication, the incident triage, the writing that surrounds writing software.

The conversation about AI in software has narrowed too quickly to "Copilot in the editor". The bigger story is happening across the lifecycle. Planning, design, development, testing, release, and operations are all being augmented at once. The platforms that adopt AI well are not the ones with the most usage. They are the ones with the clearest discipline around how it is used. This post is about that discipline.

AI is changing how we engineer, not how we type

AI is not changing how we write code. It is changing how we engineer software. Code generation is the surface. Underneath it, AI is reshaping the unit of leverage. The question is no longer how fast a developer can type. It is how well a workflow can be expressed as a reusable engineering asset.

Six disciplines determine whether AI moves the needle on outcomes or just adds another tool to the stack.

Figure 1. AI across the SDLC. Each phase has clear AI assist points and clear human-owned validations. The boundary is not negotiable. It is the design.

1. From assistance to augmentation

Early AI tools focused on assisting individual developers. Code suggestions. Autocomplete. Quick refactors. The value was real but bounded by the editor.

The shift now is into structured workflows that span the lifecycle. The unit of leverage is no longer a single suggestion. It is a sequence of actions executed reliably across phases. ("Agentic" later in this post means a system that makes its own next-step decisions inside guardrails.
A workflow follows a fixed sequence; an agent chooses the path.)

- Code generation has become baseline, not differentiator
- Workflow generation is where the largest gains live
- Multi-step assistance with explicit human checkpoints
- Context that travels across tools, not just within one

In practice

The pattern that works: start with the single highest-volume writing task on the team (commit messages, code review comments, release notes, postmortem first drafts) and turn the AI assist for that task into a shared workflow rather than each individual's private trick. The cost is one engineer's afternoon documenting the workflow and the eval set. The return is that every engineer on the team inherits the work, and the task that used to consume an engineer's morning every two weeks becomes a background step in the release process. Workflow generation, not faster typing, is where the gains compound across a team. Code suggestions help one developer. Reusable workflows help the next ten.

2. AI across the SDLC, with guardrails

AI now has a useful role at every phase of delivery. The role is different at each phase, and the guardrails are different too.

| Phase | What AI helps with | What humans must validate |
|---|---|---|
| Plan | Breaking down requirements, drafting acceptance criteria | Domain context, business priorities, customer impact |
| Build | Code generation, refactoring, scaffolding | Architectural fit, security boundaries, performance |
| Test | Test case generation, edge case discovery | Coverage of business-critical paths, regulatory cases |
| Release | Release notes, changelog summaries, communication drafts | Accuracy, tone, customer-facing claims |
| Operate | Log triage, incident summaries, runbook drafts | Root cause attribution, action item ownership |

The guardrails are not optional decoration. They are the design.
In practice The pattern that works: stage AI assists for release communication (changelog drafting, customer-facing release notes, internal release announcements) and require a human review before anything goes out. The draft arrives consistently, faster than a human could produce, and easier to compare across releases. The reviewer is not eliminated; the reviewer is moved from author to editor, which is where their judgment actually matters. Teams that adopt this pattern stop missing release-note deadlines and stop publishing inconsistent communication across products. 3. From prompts to reusable assets Many teams begin with prompt experimentation. Individuals find techniques that work for their tasks. The result is a patchwork of personal practices that do not survive a team change. The compounding value comes when prompts mature into reusable engineering assets. Figure 2. The maturity model from prompts to agents. The value compounds at the workflow stage and accelerates at the agent stage. The disciplines that make agents safe are the same ones that made workflows reliable. The maturity stages, in order of leverage: Prompts: ad-hoc, individual, hard to share Templates: parameterised prompts versioned with the project Workflows: multi-step sequences with clear inputs, outputs, checkpoints Agents: autonomous task chains operating within explicit guardrails The diagram is a maturity ladder, not a graduation. In practice teams operate at all four stages simultaneously for different tasks. A senior engineer may use a one-off prompt to explore a refactor, run a versioned template for commit messages, hand off to a workflow for release notes, and trigger an agent for routine PR triage, all in the same hour. The point of the ladder is not to leave earlier stages behind. It is to know which stage a given task belongs to and to invest accordingly. 
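To make the "Templates" stage concrete, here is a minimal sketch of a parameterised prompt kept as a versioned, owned artefact. Everything in it is an assumption for illustration: the `PromptTemplate` class, the `commit-message` template, its version, owner, and parameter names are hypothetical, and a real team would load the template body from a reviewed file in the repository rather than define it inline.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class PromptTemplate:
    """A prompt stored as a reviewed, versioned artefact (hypothetical shape)."""

    name: str
    version: str
    owner: str
    body: Template

    def render(self, **params: str) -> str:
        # Template.substitute raises KeyError when a parameter is missing,
        # so an incomplete call fails loudly instead of sending a
        # half-filled prompt to the model.
        return self.body.substitute(**params)


# Hypothetical template; in practice the body would live in a file that is
# versioned alongside the application code.
COMMIT_MESSAGE = PromptTemplate(
    name="commit-message",
    version="1.2.0",
    owner="platform-team",
    body=Template(
        "Write a commit message for the diff below.\n"
        "Style: $style\n"
        "Max subject length: $max_subject characters\n"
        "Diff:\n$diff"
    ),
)

prompt = COMMIT_MESSAGE.render(
    style="imperative",
    max_subject="72",
    diff="+ added retry to fetch()",
)
print(prompt)
```

The design choice worth noting is the strict `substitute` call: a template that silently tolerates missing parameters behaves differently for each caller, which is the variance the template stage is meant to remove.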
In practice The pattern that works: pick the three prompts your team uses every week, codify them as parameterised templates in the same repository as the application code, and treat them as engineering artefacts (reviewed, versioned, owned). New engineers inherit the team's accumulated practice instead of building their own from scratch. Quality becomes consistent because the variance between individuals shrinks. Investment pays back in weeks, not quarters, and the maturity ladder keeps producing returns as the team moves from templates to workflows to agents. 4. Agentic delivery, with guardrails that survive a security review The next stage is agentic. AI executes sequences of tasks within a defined scope. The risk is not that the agent will fail. It is that the system around the agent will not catch the failure, and that the failure modes are different in kind from traditional automation. Agents are non-deterministic, they can be manipulated through their inputs, and their actions can have side effects in systems the team does not own. Five guardrails make agentic delivery safe. The first four are necessary. The fifth is what carries the agent through a security review at a regulated enterprise. Identity and scope: the agent runs as a managed identity (or scoped service principal) with the smallest set of permissions that lets it do its job. Permissions are expressed as allowlists, not denylists. Tools fetched at runtime are subject to the same identity boundary as the agent itself. Input quarantine: anything the agent reads from a user-controlled source (work item bodies, PR descriptions, customer tickets) is treated as untrusted text. The agent does not execute instructions found in fetched content, and tool calls are validated against an output schema before execution. This is the prompt-injection mitigation, and it is the most common gap in agentic systems shipped today. 
Cost and blast-radius caps: every run has a maximum token budget, a maximum number of tool calls, and a maximum spend. Exceeding any cap aborts the run cleanly. Without caps, scoped credentials are not enough to bound the damage. Evaluations and traceability: agents are evaluated against a fixed test set before deployment, and on every prompt or model change. Every action is logged with inputs, outputs, the model and prompt versions used, and the reasoning trace where the model exposes one. Logs are redacted for secrets and personally identifiable information at write time. Reversibility taxonomy: actions are categorised by reversibility, not asserted to be reversible in general. A draft write to a private store is reversible. A post to a customer-facing channel is not reversible (deletion does not unsend). A database update may be reversible by a compensating transaction or not at all. Irreversible actions require human approval at the boundary, before they happen, not after. The agent is allowed to draft and stage. The human is the only one who is allowed to make the move that cannot be undone. In practice The pattern that works: start with one low-risk agent (release-notes drafter, PR triage assistant) running on read-only inputs, write-only-to-drafts permissions, and a hard cost cap per run. Require explicit human approval at the irreversible step. Wire up an evaluation set on day one, and rerun it on every prompt or model change. Treat regressions as failures, not warnings. The first agent the team ships is rarely the most valuable; it is the rehearsal that establishes the controls every later agent inherits. Teams that skip this rehearsal end up with an agent in production that no one feels safe extending. Implementation note An agent without a reversibility taxonomy and a regression eval set is a liability. 
The discipline is the same one that made workflows reliable: scoped identity, idempotency, traceability, and a clear boundary between machine action and human decision. The YAML below is illustrative, not a runtime contract; it is meant to show the shape of the controls a real agent definition would carry, not the syntax of any specific platform.

    # Agent run definition (illustrative; not a specific platform's syntax)
    name: release-notes-drafter
    trigger: pre-release

    identity:
      type: managed-identity
      scope: tenant=<tenant-id> resource=release-tools/<app-id>

    permissions:
      allow:
        # Values containing ": " are quoted so the file parses as YAML.
        - read: "work-items in milestone (filter: state=Done)"
        - read: "pull-requests in milestone (filter: merged)"
        - write: drafts/release-notes/${run-id}
      # Production channels are NOT in the allowlist. The agent cannot post.

    limits:
      max_tokens_per_run: 80000
      max_tool_calls_per_run: 20
      max_runtime_seconds: 300
      max_cost_usd: 0.40
      on_exceeded: abort_with_partial_artifact

    input_handling:
      treat_fetched_content_as: untrusted
      # Indirect prompt injection is mitigated by the layered discipline below,
      # not by a single feature flag. Each item is a separate control.
      enforce_instruction_hierarchy: true
      validate_tool_args_against_schema: true
      validate_outputs_against_schema: true

    steps:
      - fetch: completed work items in milestone
      - draft: release notes from items
      - validate: required fields present
      - request-review:
          from: release-manager
          idempotency_key: ${milestone-id}-${draft-hash}
      - on-approval:
          action: post-to-internal-channel
          reversibility: not-reversible
          requires: explicit-human-click  # the agent does NOT click this

    audit:
      log_inputs: true
      log_outputs: true
      redact:
        - secrets
        # Pattern-based: handles structured PII like emails, phones, IDs.
        - pii_patterns: [email, phone, national-id, payment-card, ip-address]
        # Entity-based: required for unstructured PII like names. Pattern alone
        # cannot redact a customer name without an entity-recognition step.
        - pii_entities: ner-based  # names, locations, organisations
      retain: 365_days  # tune to your audit policy, not to the demo

    evaluation:
      test_set: tests/release-notes/eval-v3.jsonl
      on_prompt_change: rerun
      on_model_change: rerun
      fail_threshold: 5_percent_regression

5. Where AI still needs human judgment

AI has clear boundaries. The boundaries are not embarrassing. They are the design.

What must stay human-owned:

- Architectural trade-offs and design decisions
- Security validation and threat modelling
- Correctness for business-critical and regulatory paths
- Domain context that has not been written down
- Accountability for outcomes, not just outputs

The goal is collaboration, not replacement. The teams that get the most value from AI are not the ones with the most automation. They are the ones with the clearest sense of where automation ends and judgment begins.

In practice

The pattern that works: name the human-owned items explicitly in the team's working agreement (architecture, security, regulatory correctness, accountability) and audit every AI workflow against that list. When a workflow asks the AI to make a decision in any of those categories, redesign it so the AI prepares the analysis and a human makes the call. Most teams over-trust AI for one of these areas in their first six months and learn the hard way. Naming the boundary up front prevents the lesson from being paid in production. The clarity is the value; the model behind the workflow is interchangeable.

6. Responsible AI is engineering work

The first five disciplines decide whether AI moves the needle. The sixth decides whether the platform can defend the choices it makes with AI. Responsible AI is the engineering practice of building systems whose AI behaviour is fair, transparent, accountable, and safe by design, not by audit after the fact. Treating it as a compliance checkbox at the end of the project is how teams end up shipping AI workflows that fail security review, embarrass the company, or harm users.
Six controls turn responsible AI from a policy into engineering work. These map directly onto the practices Microsoft and the broader industry have converged on, but the names matter less than the practice they enable. Fairness in inputs and outputs. The training data, eval set, and prompts are reviewed for systematic bias against any group the system serves. The eval set covers under-represented cases by design, not by accident, and regressions on those cases fail the build. Transparency to end users. When a user sees AI-generated content, they are told. When a decision is AI-assisted, the path from input to output is explainable in plain language, not just in a model card buried in documentation. Content safety filters. Inputs and outputs pass through safety classifiers (prompt injection, prohibited content, jailbreak patterns) before reaching the model and before reaching the user. Filtering decisions are logged and reviewable. Accountability ownership. Every AI workflow has a named owner who is accountable for its outcomes, not just its uptime. The owner has the authority to pause or roll back the workflow when harm is detected. Data minimisation and residency. The AI sees only the data it needs to do the task. Personally identifiable information and customer data are scoped, redacted, and kept inside the boundary the customer agreed to. Cross-tenant leakage is treated as a P1 incident, not a feature request. Harm evaluation alongside quality evaluation. The eval set measures harm potential (toxicity, hallucination on factual queries, leakage of confidential context) with the same rigour as it measures correctness. Both must pass for a release to ship. Figure 3. Responsible AI as a set of engineering controls around the AI workflow. The six controls fall into four categories: data discipline (fairness, data minimisation), model discipline (content safety, harm evaluation), deployment discipline (transparency to users), and governance (accountability ownership). 
All six are necessary; none is sufficient on its own. In practice The pattern that works: write the responsible AI plan before the first agent ships, not after the first incident. Pick one workflow that touches user data or generates customer-facing content, and use it as the reference implementation: fairness review on the eval set, content safety filters wrapping the model call, transparency annotation in the UI, redaction of identifying details in logs, harm evals running alongside quality evals on every change, and a named owner with explicit pause authority. The first such workflow takes longer to ship than the unconstrained version. Every workflow after it inherits the controls and ships faster than it would have without them. Teams that defer responsible AI to a future quarter end up retrofitting it under pressure, which is the most expensive way to do it. A scenario that ties it together Picture a platform team several months into using Copilot. Adoption is high. Productivity dashboards show gains. But defect rates are not improving and lead time is flat. Leadership asks the obvious question: is AI actually helping, or just feeling like help? The answer is not to stop using AI. It is to change how AI is measured. Move adoption metrics to the background. Move outcome metrics to the front: defect escape rate, lead time for change, change failure rate, mean time to recovery. In parallel, promote the individual prompts that have proved themselves to shared templates, and the templates to versioned workflows. Retrofit responsible AI controls onto the workflows that shipped first: content safety filters, harm evaluations alongside quality evaluations, transparency annotations on customer-facing output, and a named owner for each workflow. Six months later, the picture is different. Defect rate improves on the parts of the codebase where reusable workflows were introduced. Onboarding for new engineers is visibly faster. Release notes are consistent across teams. 
The shift is from celebrating use to tracking outcomes, and once the team measures what matters, the tooling decisions start making themselves. What teams get wrong The common pattern is measuring AI by usage, not by outcome. Adoption metrics tell you who tried Copilot. They do not tell you whether defects dropped, lead time improved, or release notes got better. The fix is not less AI. It is better measurement. The four metrics named in the scenario above (defect escape rate, lead time for change, change failure rate, mean time to recovery) come from the DORA research on software delivery performance and have become a useful default. Two warnings travel with them. First, attribution is hard: an AI workflow rolled out alongside a test refactor and a CI pipeline change cannot claim credit cleanly. Second, baselines matter more than headlines: a single quarter's improvement is not a trend, and a single team's gain is not the platform's gain. Outcome measurement done well needs a baseline window, an attribution discipline, and a kill criterion for workflows that are not paying back. Done poorly, it is just adoption metrics with better names. There is also the question of cost. AI usage carries a per-run token bill, an evaluation bill on every change, and (for agents) a cost cap that limits damage when something goes wrong. None of these are large compared to the engineering time saved when the workflow works. All of them are visible enough that a finance-aware reader will ask. Track them. Where to start The most concrete starter from this post: promote one personal prompt to a shared template. Pick the prompt that gets used most often (commit messages, code reviews, release notes, debugging assist), move it from someone's notes into the repository where the team versions everything else, and watch what changes when the next person on the team runs it. 
That is the smallest unit of the workflow shift this post argues for, and it is the step where prompts stop being individual practice and start becoming engineering assets.

The shift

The shift is from building systems to building smarter systems:

- AI does not replace engineers. It changes what an engineer's leverage looks like.
- The unit of value is the workflow, not the suggestion.
- The discipline that made platforms operable is the same discipline that makes AI useful.
- Responsible AI is not a compliance step. It is the sixth engineering discipline that lets the other five compound safely.

The series ends here, but the arc is consistent across all three posts. The disciplines that make platforms scale are the same disciplines that make AI useful. Build with discipline. Run with discipline. Evolve with discipline. The tools change. The disciplines do not.

Want to discuss? Where has AI moved the needle most in your delivery, and where has it disappointed you? Drop a comment with patterns you have seen in your environment. Every reply gets read.

Previously in this series:

- Building Cloud Native Platforms That Scale: Patterns That Actually Work. Part 1 covered the design choices that make scale possible.
- Running Cloud Native Platforms: Why Day 2 Decides Everything. Part 2 covered the operational disciplines that decide production outcomes.

This is the third and final post in the series.

From Prompt to Production: Building Azure Architecture Diagrams with AI
Author: Arturo Quiroga, Senior Partner Solutions Architect — Microsoft Cloud architects spend significant time translating ideas into architecture diagrams. They toggle between Visio, draw.io, pricing calculators, and documentation. According to the 2024 Stack Overflow Developer Survey, 61% of developers spend more than 30 minutes a day searching for answers or solutions, time lost to context-switching rather than design. What if you could describe your architecture in plain English and get a diagram, cost estimate, and deployment guide in minutes? The Challenge: Fragmented Architecture Workflows Designing Azure architectures today typically involves multiple disconnected steps: Sketch the architecture in a diagramming tool Look up official Azure icons and drag them into place Research pricing across regions using the Azure Pricing Calculator Validate the design against the Well-Architected Framework (WAF) Write deployment documentation and Infrastructure as Code templates Compare alternative designs manually Each step lives in a different tool, and keeping them in sync as designs evolve is costly. The Azure Architecture Diagram Builder brings these workflows together in a single browser-based experience. How It Works Describe your architecture in natural language, for example "A HIPAA-compliant healthcare platform with FHIR APIs, event-driven processing, and multi-region disaster recovery", and the AI generates a diagram with grouped services, data flow connections, and logical organization. Figure 1. Enter a natural-language prompt describing your architecture. Curated example prompts help you get started, and you can optionally upload an existing diagram for the AI to analyze. The tool uses Azure OpenAI to power generation across multiple models, enabling you to choose the model that best fits your scenario — from fast iterations to deeper reasoning. 
Key Features AI-Powered Architecture Generation Describe what you need in plain English, and the AI creates an architecture diagram with: 714 official Azure service icons across 29 categories Smart grouping: services are logically organized (Frontend, Backend, Data, Security) Data flow connections: labeled edges showing how data moves through the system 13 curated example prompts: from simple web apps to complex enterprise scenarios like Zero Trust networks, Industrial IoT with 5,000+ sensors, and global multiplayer gaming backends Figure 2. A generated industrial IoT architecture. Top: the clean diagram view as initially produced. Bottom: the same diagram with per-service monthly cost overlays toggled on, plus a running subscription total in the toolbar. Architecture Image Import Already have an architecture on a whiteboard or in a screenshot? Upload the image and let the AI analyze it, mapping services to official Azure icons and recreating the architecture as an editable, interactive diagram. Figure 3. Upload a photo of a whiteboard sketch (top-right reference panel) and the AI recreates it as an editable diagram with official Azure service icons and labeled data flow connections. ARM Template Import Import existing ARM templates to visualize your current infrastructure. The AI parses resource definitions and dependencies, groups related resources into logical layers, and produces a meaningful diagram of what you actually have deployed — a fast way to document an inherited environment or sanity-check a template before deployment. Figure 4. ARM template import in action. Top: the parser status banner while resources and dependencies are being analyzed. Bottom: the resulting diagram, with resources auto-grouped into logical layers (Web Tier, Data Layer, Container Platform, Observability & Logging) and a Generated from: ARM Template badge linking the diagram back to its source file. 
Well-Architected Framework Validation Validate your architecture against all five WAF pillars — Security, Reliability, Performance Efficiency, Cost Optimization, and Operational Excellence. The validator provides: An overall WAF score with pillar-level breakdowns Specific findings with severity levels Actionable recommendations you can select and apply Select the recommendations you agree with, and the AI regenerates an improved architecture incorporating those changes. Figure 5. WAF validation results showing the overall score, per-pillar breakdowns, and individual findings with severity badges. Tick the recommendations you want and the AI rebuilds the diagram with those changes applied. Multi-Model Comparison Run the same architecture prompt through multiple AI models side-by-side and compare: Architecture Comparison: service counts, connection counts, groups, token usage, and latency Validation Comparison: WAF scores across models, severity breakdowns, and finding counts Apply Winner: pick the best result and apply it to the canvas with one click Present Critique: a talking avatar narrates the AI-generated ranking with live closed captions Figure 6. Multi-model comparison. Top: select the models and reasoning effort, then enter the prompt. Bottom: side-by-side results across all selected models with service counts, latency, token usage, and Fastest / Cheapest / Most Thorough badges. Multi-Region Cost Estimation Get cost estimates from the Azure Retail Prices API across 8 Azure regions: East US 2, Australia East, Canada Central, Brazil South, Mexico Central, West Europe, Sweden Central, and Southeast Asia. Features include: Color-coded cost legend (green / yellow / red thresholds) SKU and tier information for each service Export options: CSV, JSON, plain-text summary, and an analysis report with top cost drivers, Reserved Instance flags, and a ranked multi-region comparison table Figure 7. 
The cost legend overlay shows per-service pricing with color-coded thresholds. The region selector in the toolbar lets you re-price the entire architecture in any of eight Azure regions. Deployment Guide Generation with Bicep Generate step-by-step deployment documentation including: Prerequisites and Azure resource requirements Step-by-step deployment instructions Bicep templates for each service (Infrastructure as Code) Post-deployment verification steps Security configuration recommendations Figure 8. Each generated Deployment Guide opens with the architecture name, an estimated deployment time, and a prerequisites checklist covering subscription roles, CLI versions, Microsoft Entra ID permissions, and region requirements, followed by numbered, copy-ready deployment steps. Figure 9. The Infrastructure as Code section produces a main.bicep orchestrator plus a per-service module (Log Analytics, Key Vault, Cosmos DB, SQL Database, Event Hubs, Azure Functions, and more). The Download All Templates button packages everything into a ready-to-deploy folder. Workflow Animation & Avatar Presenter Visualize how data flows through your architecture with step-by-step animations that highlight services on the canvas as each step plays. When the Azure Speech Service is configured, a photorealistic talking avatar can narrate the workflow or present model comparison results, with live word-by-word closed captions in a draggable, resizable panel. Figure 10. A workflow step is highlighted on the canvas as the Avatar Presenter narrates that step. Live word-by-word closed captions appear in a draggable, resizable panel, useful for accessibility and stakeholder demos. Export Options Figure 11. A single-slide PowerPoint export, available in dark or light theme, ready to drop straight into a stakeholder deck. 
| Format | Use Case |
|---|---|
| PNG | Documentation, presentations |
| SVG | Scalable vector graphics |
| PPTX | Single PowerPoint slide (dark or light theme) |
| Draw.io | Edit in diagrams.net |
| JSON | Backup, version control |
| CSV / ZIP | Cost analysis with multi-region comparison |

Highlights

The Azure Architecture Diagram Builder unifies the architecture design lifecycle in a single tool:

- End-to-end workflow: from natural-language description to deployable Bicep templates without tool switching
- Official Azure icons: 714 icons across 29 categories, mapped directly from the Azure service catalog
- Live pricing: queries the Azure Retail Prices API at design time rather than relying on static estimates
- WAF-integrated validation: architectural best practices built into the design loop rather than applied after the fact
- Multi-model flexibility: choose the AI model that best suits each task, with fast models for iteration and reasoning models for complex designs
- Open source: the source code is available for customization and contribution

One-Command Deploy with Azure Developer CLI

The fastest way to get your own instance running is with azd:

    # Install azd (once)
    brew tap azure/azd && brew install azd   # macOS
    winget install microsoft.azd             # Windows

    # Clone, configure, and deploy
    git clone https://github.com/Arturo-Quiroga-MSFT/azure-architecture-diagram-builder
    cd azure-architecture-diagram-builder
    azd auth login
    azd env set AZURE_OPENAI_ENDPOINT "https://your-resource.openai.azure.com/"
    azd env set AZURE_OPENAI_API_KEY "your-key"
    azd up   # Provisions infrastructure + builds + deploys (~8 min)

azd up provisions the following via Bicep:

| Resource | Purpose |
|---|---|
| Azure Container Registry | Stores the Docker image |
| Azure Container Apps | Runs the app (nginx + token server) |
| Log Analytics + Application Insights | Monitoring and telemetry |
| Azure Speech (S0) | Avatar Presenter (optional, keyless auth via managed identity) |

Try It Today

The Azure Architecture Diagram Builder is available now:

- Live demo: https://aka.ms/diagram-builder
- Source code: GitHub repository
- Documentation: See the Getting Started Guide for detailed setup instructions

We welcome feedback and contributions. Use the GitHub Issues page to report bugs, suggest features, or share your experience.

Tags: artificial intelligence · application · apps & devops · well architected · infrastructure

WAR, Azure Advisor, and Us (Azure Arch Diagram Builder): Three Ways to Score an Azure Architecture
Author: Arturo Quiroga, Azure AI services Engineer - Senior Partner Solutions Architect — Microsoft A few days ago I published From Prompt to Production: Building Azure Architecture Diagrams with AI, introducing the open-source Azure Architecture Diagram Builder. One feature got more follow-up questions than any other: the Well-Architected Framework (WAF) validation. Architects from partners and customers — many of whom already use Azure Advisor and the Well-Architected Review — wanted to know exactly what scoring algorithm we use, how it compares to Microsoft's official tools, and whether they should be using all three. This post is that answer. It's a deep dive into how design-time WAF validation works, how Microsoft's two official WAF assessment algorithms work, and where each fits in the architecture lifecycle. TL;DR. Microsoft ships two WAF assessment vehicles — the Well-Architected Review (questionnaire, scored from human answers) and the Azure Advisor score (healthy-resources-÷-applicable-resources weighted per subcategory, with Defender Secure Score for Security and cost-weighted math for Cost). Both require either a human filling in a form or live Azure telemetry. Our app runs at design time on a diagram, before anything is deployed, using a hybrid pipeline: a deterministic rule pre-scan followed by an LLM refinement pass. Same five WAF pillars, different lifecycle stage. Complementary, not competitive. Why design-time validation matters Every cost overrun, reliability gap, and security incident I've ever debugged was cheaper to fix on a whiteboard than in production. Yet most WAF tooling assumes the architecture already exists — either because there are deployed resources to scan (Advisor) or because someone has built enough of it to answer 60 specific questions about it (WAR). That leaves a gap. Between "rough sketch" and "deployed resource group" there is no algorithmic WAF feedback loop. That's the gap the Diagram Builder fills. 
Microsoft's two official WAF assessment algorithms

Before describing our approach, it's worth being precise about what Microsoft already ships, because the term "WAF assessment algorithm" can mean either of two very different things.

1. Azure Well-Architected Review (WAR): questionnaire-based

The Well-Architected Review is a free self-assessment hosted on Microsoft Learn.

| Aspect | Detail |
| --- | --- |
| Input | Human answers to ~60 questions mapped to the WAF pillar checklists |
| Workload variants | Core WAR, plus AI/ML, IoT, SAP on Azure, Azure Stack Hub, SaaS, Mission Critical |
| Scoring | Derived from the answers: each "no" or unanswered question subtracts from the pillar score |
| Output | Per-pillar maturity score + prioritized recommendations + optional Advisor integration |
| Improvement tracking | "Milestones" (point-in-time snapshots) |
| When to use | Periodic deep reviews; greenfield design baselining; brownfield audits |

WAR is human-driven. The algorithm is essentially "how many of the recommended practices have you confirmed you do?", which is exactly the right algorithm when the assessor is the workload team itself.

2. Azure Advisor Score: telemetry-based

The Advisor score is the closest thing Microsoft ships to a real, deterministic WAF algorithm. It runs continuously over your deployed Azure resources.

The math: for each pillar, healthy resources ÷ total applicable resources, weighted per subcategory.

Pillar-specific overrides:
- Security uses Microsoft Defender for Cloud's Secure Score model.
- Cost weights by retail $ cost of healthy resources, plus age-of-recommendation weighting; postponed/dismissed items are removed from the denominator.
- Reliability / Performance / Operational Excellence use the healthy-resources ratio above.

Key terms:
- Healthy resource: a deployed resource with no open Advisor recommendation against it for that pillar.
- Total applicable: resources Advisor was able to evaluate (excludes dismissed/snoozed).

Advisor is the right tool once you're in production. It cannot help you before deployment, because there is nothing to count as "healthy" or "applicable."
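To make the healthy-resources ratio concrete, here is a minimal sketch of a weighted per-subcategory calculation. It is not Advisor's implementation: the subcategory names, the weights, and the `pillarScore` helper are all assumptions made for illustration, and the Security and Cost overrides are left out.

```typescript
// Hypothetical tally for one subcategory within a pillar.
// Names and weights are illustrative, not Advisor's published values.
interface SubcategoryTally {
  name: string;
  healthy: number;    // resources with no open recommendation
  applicable: number; // resources that could be evaluated
  weight: number;     // assumed relative importance
}

// Weighted average of healthy/applicable ratios on a 0-100 scale.
// Returns undefined when nothing can be evaluated (e.g. before deployment).
function pillarScore(tallies: SubcategoryTally[]): number | undefined {
  const scored = tallies.filter((t) => t.applicable > 0 && t.weight > 0);
  if (scored.length === 0) return undefined;
  const totalWeight = scored.reduce((sum, t) => sum + t.weight, 0);
  const weighted = scored.reduce(
    (sum, t) => sum + (t.healthy / t.applicable) * t.weight,
    0,
  );
  return (weighted / totalWeight) * 100;
}

// (0.75 * 2 + 0.5 * 1) / 3 * 100, roughly 66.7
const reliability = pillarScore([
  { name: "zone-redundancy", healthy: 3, applicable: 4, weight: 2 },
  { name: "backup", healthy: 1, applicable: 2, weight: 1 },
]);

// No deployed resources: there is nothing to score.
const preDeployment = pillarScore([]);
```

The `undefined` result for an empty input mirrors the point made above: without deployed resources there is no denominator, so a ratio-based score cannot exist at design time.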
The missing stage: design time

Here's the lifecycle, with the stage each tool covers:
- Design / Diagram: Diagram Builder validation runs here.
- Operate / Observe: Azure Advisor runs here continuously.
- Periodic Review: WAR runs here, typically quarterly or at major milestones.

These three stages are sequential and complementary. Our app does not replace Advisor or WAR; it adds a feedback loop earlier in the lifecycle, where corrections are cheapest.

How design-time validation works in the Azure Architecture Diagram Builder

The validator is a two-phase hybrid pipeline: deterministic local rules first, then LLM refinement. The full source lives in three files:
- src/services/architectureValidator.ts: orchestrator and prompt
- src/services/wafPatternDetector.ts: topology + service rule engine
- src/data/wafRules.ts: the rule knowledge base

Phase 1: Deterministic rule pre-scan (~1 ms, no LLM)

When you click Validate Architecture, the validator runs a fully client-side rule engine against the diagram's services, connections, and groups.
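As a rough illustration of what a client-side rule over services and connections could look like, the sketch below checks two topology patterns. The `Diagram` model, the `ServiceKind` values, and `detectPatterns` are hypothetical; the real types in wafPatternDetector.ts may be quite different.

```typescript
// Hypothetical diagram model, simplified for illustration.
type ServiceKind = "frontend" | "compute" | "database" | "monitoring" | "other";

interface DiagramService { id: string; kind: ServiceKind; }
interface DiagramConnection { from: string; to: string; }
interface Diagram { services: DiagramService[]; connections: DiagramConnection[]; }

// Each rule inspects the diagram and reports whether its anti-pattern is present.
interface PatternRule { pattern: string; matches: (d: Diagram) => boolean; }

const rules: PatternRule[] = [
  {
    // No monitoring service anywhere in the diagram.
    pattern: "no-monitoring",
    matches: (d) => !d.services.some((s) => s.kind === "monitoring"),
  },
  {
    // An edge that goes straight from a frontend service into a database.
    pattern: "direct-db-access",
    matches: (d) => {
      const kindOf = new Map<string, ServiceKind>();
      for (const s of d.services) kindOf.set(s.id, s.kind);
      return d.connections.some(
        (c) => kindOf.get(c.from) === "frontend" && kindOf.get(c.to) === "database",
      );
    },
  },
];

function detectPatterns(d: Diagram): string[] {
  return rules.filter((r) => r.matches(d)).map((r) => r.pattern);
}

const sample: Diagram = {
  services: [
    { id: "web", kind: "frontend" },
    { id: "api", kind: "compute" },
    { id: "db", kind: "database" },
  ],
  connections: [
    { from: "web", to: "db" },
    { from: "api", to: "db" },
  ],
};

const detected = detectPatterns(sample);
```

Because each rule is a pure function of the diagram, the same input always yields the same findings, which is the property that makes a pre-scan like this reproducible.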
There are two kinds of rules:

Architecture-pattern rules

These fire when a topology anti-pattern is detected:

| Pattern | Detection trigger |
| --- | --- |
| single-region | No global LB (Traffic Manager / Front Door) with ≥3 services |
| single-database | Exactly one database service, no replication signal |
| no-cache | Compute + database present, no Redis/CDN |
| no-monitoring | No Azure Monitor / App Insights / Log Analytics |
| no-identity | No Microsoft Entra ID |
| no-waf | Public web tier without WAF / Front Door / App Gateway |
| direct-db-access | An edge from a frontend service directly into a database |
| no-key-vault | 4+ services and no Key Vault |
| no-backup | Database present, no Azure Backup / Recovery Services |
| no-api-gateway | 2+ compute services and no APIM / App Gateway / Front Door |

Service-specific rules

Every service in the generated Azure Architecture diagram is matched against SERVICE_SPECIFIC_RULES by normalized type: App Service, Functions, AKS, Cosmos DB, SQL Database, Storage, Key Vault, and 22 more.

The knowledge base at a glance

| Metric | Count |
| --- | --- |
| Total rules | 73 |
| Architecture-pattern rules | 10 |
| Service-specific rules | 63 |
| Distinct Azure services covered | 29 |
| Rules tagged Reliability | 18 |
| Rules tagged Security | 34 |
| Rules tagged Cost Optimization | 5 |
| Rules tagged Operational Excellence | 7 |
| Rules tagged Performance Efficiency | 9 |

The preliminary score

Each finding has a severity, and severity drives a fixed point deduction from a starting score of 100:

| Severity | Deduction |
| --- | --- |
| critical | −12 |
| high | −7 |
| medium | −3 |
| low | −1 |

The result is floored at 10 (so even a deliberately bad architecture scores at least 10) and capped at 95 (no findings ≠ perfect; there's always something the model might still catch). This is the deterministic baseline before the LLM ever sees the architecture, and it's what makes the pipeline reproducible.
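The deduction scheme above is small enough to sketch directly. The deductions, floor, and ceiling come from the post; the function name `preliminaryScore` and the way findings are passed in are assumptions for illustration, not the repository's actual code.

```typescript
type Severity = "critical" | "high" | "medium" | "low";

// Deductions, floor, and ceiling as stated in the post.
const DEDUCTION: Record<Severity, number> = { critical: 12, high: 7, medium: 3, low: 1 };
const FLOOR = 10;
const CEILING = 95;

// Start at 100, subtract per finding, then clamp into [FLOOR, CEILING].
function preliminaryScore(severities: Severity[]): number {
  const raw = severities.reduce((score, s) => score - DEDUCTION[s], 100);
  return Math.min(CEILING, Math.max(FLOOR, raw));
}

// No findings still scores 95, not 100.
const noFindings = preliminaryScore([]);

// 1 critical, 1 high, 3 medium, 3 low: 100 - 12 - 7 - 9 - 3 = 69
const mixed = preliminaryScore([
  "critical", "high", "medium", "medium", "medium", "low", "low", "low",
]);

// Ten criticals would be -20 unclamped, so the floor of 10 applies.
const terrible = preliminaryScore(new Array<Severity>(10).fill("critical"));
```

Clamping after the subtraction, rather than per finding, keeps the result independent of the order in which findings are produced.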
Phase 2 — LLM contextual refinement The pre-scan output, the topology, and the optional natural-language description are folded into a focused prompt sent to one of seven Azure OpenAI models (GPT-5.1 through 5.4, GPT-5.x Codex variants, DeepSeek V3.2 Speciale, Grok 4.1 Fast). The system prompt gives the model explicit scoring guardrails: Score based on what IS present, not what COULD be added. A well-connected architecture with appropriate services should score 60–80. Score below 50 only for critical gaps (no auth, no monitoring, single points of failure). Findings are improvement suggestions, not reasons to penalize the score severely. The model returns strict JSON: { "overallScore": 0-100, "summary": "2–3 sentence assessment", "pillars": [ { "pillar": "Reliability | Security | Cost Optimization | Operational Excellence | Performance Efficiency", "score": 0-100, "findings": [ { "severity": "critical | high | medium | low", "category": "...", "issue": "...", "recommendation": "...", "resources": ["service-name-1", "service-name-2"], "source": "rule-based | ai-analysis" } ] } ], "quickWins": [ /* same shape as findings */ ] } Two things to call out: Every finding is tagged rule-based or ai-analysis . That tag is the credibility lever. You can always see what the deterministic engine produced versus what the model contributed on top. If you don't trust the AI layer, you can ignore it entirely — the rule layer still stands. The LLM is given pattern hints, not the entire rule catalog. The prompt stays small and focused, which is roughly 3–5× faster and cheaper than asking the LLM to do everything from scratch. 
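One way to consume that JSON on the client is to give it a type and a light runtime check before trusting it. The interfaces below follow the field names shown in the JSON above; `parseValidation`, `findingsBySource`, and the sample category and recommendation strings are hypothetical and not taken from the repository.

```typescript
type Severity = "critical" | "high" | "medium" | "low";
type FindingSource = "rule-based" | "ai-analysis";

interface Finding {
  severity: Severity;
  category: string;
  issue: string;
  recommendation: string;
  resources: string[];
  source: FindingSource;
}
interface PillarResult { pillar: string; score: number; findings: Finding[]; }
interface ValidationResult {
  overallScore: number;
  summary: string;
  pillars: PillarResult[];
  quickWins: Finding[];
}

const isScore = (n: unknown): boolean => typeof n === "number" && n >= 0 && n <= 100;

// Minimal shape check on model output; a real validator would check more fields.
function parseValidation(raw: string): ValidationResult {
  const data = JSON.parse(raw) as ValidationResult;
  if (!isScore(data.overallScore) || !Array.isArray(data.pillars)) {
    throw new Error("Malformed validation response");
  }
  for (const p of data.pillars) {
    if (!isScore(p.score) || !Array.isArray(p.findings)) {
      throw new Error("Malformed pillar entry");
    }
  }
  return data;
}

// Separate the deterministic layer from what the model added on top.
function findingsBySource(result: ValidationResult, source: FindingSource): Finding[] {
  const out: Finding[] = [];
  for (const p of result.pillars) {
    for (const f of p.findings) if (f.source === source) out.push(f);
  }
  return out;
}

// Illustrative payload; category and recommendation text are made up.
const raw = JSON.stringify({
  overallScore: 71,
  summary: "Sample response for illustration.",
  pillars: [
    { pillar: "Security", score: 52, findings: [
      { severity: "critical", category: "Identity", issue: "No Microsoft Entra ID for authentication",
        recommendation: "Add Entra ID", resources: [], source: "rule-based" },
      { severity: "high", category: "Secrets", issue: "No Key Vault for secret management",
        recommendation: "Add Key Vault", resources: [], source: "rule-based" },
    ] },
    { pillar: "Operational Excellence", score: 70, findings: [
      { severity: "medium", category: "Deployment", issue: "App Service slots not used for safe deploys",
        recommendation: "Use deployment slots", resources: [], source: "ai-analysis" },
    ] },
  ],
  quickWins: [],
});

const parsed = parseValidation(raw);
const ruleBased = findingsBySource(parsed, "rule-based");
const aiOnly = findingsBySource(parsed, "ai-analysis");
```

Filtering on `source` is what lets a reader ignore the AI layer entirely and still keep the rule-based findings.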
What the user sees

On every run the modal reports:
- Overall WAF score (0–100)
- Per-pillar score × 5 (0–100 each)
- Severity breakdown: counts of critical / high / medium / low across all findings
- Quick wins: high-impact, low-effort items the model surfaces separately
- Hybrid metadata: local findings count, patterns detected, KB rules used, preliminary score, local elapsed ms
- AI metrics: model used, reasoning effort, prompt/completion/total tokens, elapsed time
- App Insights telemetry: an Architecture_Validated event with model, overall score, finding count, elapsed time

Worked example

Take this prompt, which I've used in demos with partners:

"A multi-region web application: Azure Front Door in front of two App Service instances in West US 2 and East US 2, both reading from an Azure SQL Database with geo-replication, with Application Insights for telemetry. No Entra ID, no Key Vault."

After generation, Validate Architecture runs:

Phase 1: pre-scan (deterministic), ~1 ms
- Patterns detected: no-identity, no-key-vault
- Findings produced: 8 (1 critical, 1 high, 3 medium, 3 low)
- Preliminary score: 100 − 12 − 7 − (3×3) − (3×1) = 69

Phase 2: LLM refinement, ~6–9 s depending on model

The model accepts the two pattern hints, validates them in context, and adds three more findings of its own:

| Finding | Source | Pillar | Severity |
| --- | --- | --- | --- |
| No Microsoft Entra ID for authentication | rule-based | Security | critical |
| No Key Vault for secret management | rule-based | Security | high |
| App Service slots not used for safe deploys | ai-analysis | Operational Excellence | medium |
| SQL DB geo-replication present but RTO/RPO not documented | ai-analysis | Reliability | medium |
| No CDN for static assets behind Front Door | ai-analysis | Performance Efficiency | low |

Final scores returned by the model:

| Pillar | Score |
| --- | --- |
| Reliability | 78 |
| Security | 52 |
| Cost Optimization | 80 |
| Operational Excellence | 70 |
| Performance Efficiency | 75 |
| Overall | 71 |

The Security score is the lowest because two of the highest-severity findings landed there, exactly what a human
reviewer would flag first.

Multi-model comparison

Because the deterministic floor is identical across runs, the Validation Comparison view becomes a fair shootout of what each LLM adds on top of the same baseline. The same diagram is scored by all seven models, and the UI surfaces:
- Overall score per model
- Per-pillar score per model
- Severity-count deltas
- Number of ai-analysis findings each model contributed
- Quick wins each model identified

This is genuinely useful for two reasons. First, it shows that LLM scores vary (typically by ±5–10 points on the same architecture), which is exactly why we publish the rule-based vs ai-analysis tag. Second, it lets architects pick the model whose review style matches their own.

How we align with Microsoft's algorithms

| Alignment point | What it means |
| --- | --- |
| Same five pillars | Identical names and scope to the official WAF |
| Same source material | Rules derived from WAF docs and Azure Architecture Center service guides |
| Severity-graded findings | Map conceptually to Advisor's high/medium/low impact recommendations |
| Per-pillar + overall scoring | Mirrors WAR/Advisor output shape, so the results feel familiar |

Where we deliberately differ, and why

| Concern | Microsoft | Diagram Builder | Why we differ |
| --- | --- | --- | --- |
| Needs deployed resources | Advisor: yes | No; works on a diagram | We're a design-time tool; the architecture doesn't exist yet |
| Needs human Q&A | WAR: yes | No; derived from the diagram | One-click validation inside the design flow |
| Healthy/Applicable ratio | Advisor: yes | No | No resource-health signal exists pre-deployment |
| Subcategory fixed weights | Advisor: yes | No explicit weights | Severity is the de-facto weight (12/7/3/1) |
| Defender Secure Score for Security | Advisor: yes | No | Defender requires deployed resources |
| Cost-weighted scoring | Advisor: yes | No (separate Cost Estimation feature) | Cost is a separate pipeline in our app |
| AI/LLM refinement | Neither | Yes | Catches context-specific issues a static catalog misses, and explains findings in natural language |
| Multi-model comparison | Neither | Yes | Lets architects see scoring variance across models |

Honest limitations

I'd rather you hear these from me than discover them in production:
- LLM scores drift. ±5–10 points across models on the same diagram is normal. Treat the score as directional, the findings as actionable. The rule-based tag is your anchor.
- No live telemetry. We can't know if your App Service is actually using availability zones, only that you have App Service in the diagram. Advisor will tell you the truth post-deployment.
- Generic ruleset. No specialized workload branches yet (AI/ML, IoT, SAP, SaaS). WAR has those.
- No milestone tracking. Each validation run is independent. Compare runs manually using the Validation Comparison view.
- Rule coverage is finite. 29 services and 73 rules is a strong start but not exhaustive; the LLM layer exists in part to compensate for that gap.

How to use all three together

A lifecycle that actually works:
1. Design: Use the Diagram Builder to sketch the architecture and validate at design time. Iterate until the per-pillar scores look reasonable and the critical/high findings are addressed.
2. Deploy: Generate Bicep from the diagram, deploy, and let Azure Advisor start scoring real resources.
3. Operate: Use Azure Advisor continuously. Use Defender Secure Score for security posture.
4. Periodic review: Run a Core WAR every quarter or at major milestones to capture the things only humans know (business context, tradeoffs, planned debt).

None of these three replaces the others. They cover different stages of the same loop.

What's next

A few things on the roadmap I'd love feedback on:
- Milestone tracking so design-time scores can be compared over time the way WAR milestones work.
- Workload-specific rulesets mirroring WAR's branches, starting with AI/ML.
- Direct Advisor handoff: once a diagram is deployed, surface the corresponding Advisor recommendations in the same UI to close the loop.
Try it, fork it, tell me where it's wrong

Live app: https://aka.ms/diagram-builder
Source: github.com/Arturo-Quiroga-MSFT/azure-architecture-diagram-builder

Useful references:
- Azure Well-Architected Framework pillars
- Azure Well-Architected Review tool
- Azure Advisor score: calculation
- Use Azure WAF assessments (Advisor)
- Complete an Azure Well-Architected Review assessment

If you're a partner or customer architect who's already living in Advisor and WAR, I'd genuinely value your reaction: does the design-time stage feel like a real gap to you, or are you already covering it some other way? Open an issue on the repo or reply on LinkedIn.

Posted on the Azure Architecture Blog · Comments and issues welcome on the repo.

Governing Agent Sprawl: A Multi‑Region AI Agent Landing Zone on Azure (Reference Architecture)
It doesn’t take long for AI agents to get out of hand. In most enterprises, the first few agents are celebrated. A chatbot here. A document summarizer there. Then another team ships an agent that calls APIs. Someone else connects one to internal data. Within months, IT is staring at dozens—or hundreds—of autonomous systems running across subscriptions, regions, and tools. At that point, the questions stop being about model quality and start being uncomfortable operational ones: Who owns this agent? What data can it access? What happens if it misbehaves? Why did it just consume half our monthly token budget in a day? Developers can build an AI agent in minutes—the difficult part is understanding what agents are doing, how they perform, and whether they comply with organizational policy. Signals scatter across tools, context is lost, and governance becomes reactive. This reference architecture exists to solve that problem. It describes a multi‑region AI agent landing zone on Azure that treats agents as first‑class, governable workloads—provisioned automatically, constrained by policy, and observable from day one. The architectural principle: separate control from execution The design starts with a simple but non‑negotiable rule: Control plane concerns must be separated from runtime concerns. Azure landing zones already follow this model. Management groups, Azure Policy, and RBAC are global constructs. Workloads run in regions. This architecture applies the same discipline to AI agents. The runtime plane is where agents execute, models infer, and data flows—often in multiple Azure regions. The control plane is where identity, policy, safety, evaluation, and oversight live—independent of region. This separation is what allows teams to scale agents without losing control. Layer 1: Azure AI Gateway — governing every request The first control layer sits directly in the request path. 
The AI gateway in Azure API Management provides a policy‑enforcement and observability layer in front of AI models, agents, and tools. It is not a separate service—it extends Azure API Management. Everything flows through it: Microsoft Foundry model deployments Azure AI Model Inference API endpoints OpenAI‑compatible third‑party models Self‑hosted models MCP servers and A2A agent APIs (preview) What the gateway actually enforces This layer is intentionally narrow and operational: Token quotas and rate limits The llm-token-limit policy (GA) enforces tokens‑per‑minute or quota ceilings per consumer before requests reach the backend. This prevents one application—or one agent—from exhausting shared capacity. Content safety at ingress The llm-content-safety policy (GA) integrates Azure AI Content Safety to moderate prompts automatically. Unsafe requests never reach the model. Traffic routing and resiliency Azure API Management supports multi‑region gateway deployment (Premium tier). If a region fails, traffic routes to the next closest gateway automatically. Token usage, prompts, and completions are logged to Azure Monitor and Application Insights using built‑in policies such as llm-emit-token-metric. The gateway does not understand agent intent or business context. That is by design. It governs traffic, not behavior. Layer 2: Azure AI Foundry Control Plane — governing behavior at scale The second layer governs what agents do, not just how requests flow. Azure AI Foundry Control Plane provides a unified management surface for AI agents, models, and tools across projects and subscriptions. It is designed specifically for agentic systems. Foundry Control Plane is currently in public preview. What Foundry Control Plane adds Fleet‑wide inventory Every agent, model, and tool appears in a single, searchable view across projects. 
Continuous evaluation on production traffic Foundry runs evaluations that measure task adherence, groundedness, tool‑call accuracy, sensitive data exposure, and other agent‑specific risk dimensions. Centralized guardrails Policy is enforced across inputs, outputs, and tool interactions—not just prompts. Bulk remediation can be applied across the fleet. Security integration Foundry integrates with: Microsoft Entra for agent identity (Entra Agent ID) Microsoft Defender for threat signals Microsoft Purview for data protection and compliance visibility Foundry Control Plane also requires an AI Gateway to be configured for advanced governance scenarios—reinforcing the layered approach. Layer 3: Microsoft Agent 365 — enterprise oversight, not just Azure oversight The third layer exists because Azure governance alone is not enough. Agents don’t just call APIs. They act on behalf of users. They access enterprise data. They operate inside Microsoft 365 workflows. Microsoft Agent 365 is the tenant‑level control plane for AI agents. It brings agents under the same administrative model used for users and applications. Status: Frontier Preview General availability: May 1, 2026 Why this layer matters Agent 365 introduces controls that Azure alone cannot provide: Agent registry A single inventory of all agents in the tenant—including sanctioned and shadow agents. Unsanctioned agents can be quarantined. Identity‑first access control Every agent is issued an Entra agent ID. Conditional Access policies apply to agents the same way they do to users. Human‑in‑the‑loop oversight Agents surface in Microsoft 365 admin workflows, not just Azure portals. Security and compliance Defender and Purview extend threat detection and data protection policies to agent activity. Agent 365 does not replace Foundry Control Plane. It complements it—connecting agent operations to enterprise identity, compliance, and productivity systems. 
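Before moving on to how the layers combine, the Layer 1 idea of a per-consumer tokens-per-minute ceiling can be made concrete with a small sketch. This is not how the llm-token-limit policy is implemented; it is a plain fixed-window counter, and the class name, method name, and consumer IDs are assumptions for illustration only.

```typescript
// Illustrative fixed-window limiter: one token budget per consumer per minute.
class TokenWindowLimiter {
  private readonly tokensPerMinute: number;
  private readonly usage = new Map<string, { windowStart: number; tokens: number }>();

  constructor(tokensPerMinute: number) {
    this.tokensPerMinute = tokensPerMinute;
  }

  // Returns true and records the usage if the consumer stays within quota;
  // returns false (request rejected before reaching the backend) otherwise.
  tryConsume(consumerId: string, tokens: number, nowMs: number): boolean {
    const windowStart = Math.floor(nowMs / 60000) * 60000;
    const entry = this.usage.get(consumerId);
    const used = entry !== undefined && entry.windowStart === windowStart ? entry.tokens : 0;
    if (used + tokens > this.tokensPerMinute) return false;
    this.usage.set(consumerId, { windowStart, tokens: used + tokens });
    return true;
  }
}

const limiter = new TokenWindowLimiter(1000);
const first = limiter.tryConsume("agent-a", 800, 0);           // within quota
const second = limiter.tryConsume("agent-a", 300, 1000);       // would reach 1100, rejected
const other = limiter.tryConsume("agent-b", 300, 1000);        // separate consumer, separate budget
const nextWindow = limiter.tryConsume("agent-a", 300, 60000);  // new minute, budget resets
```

Keeping the budget keyed by consumer is the part that matters for sprawl: one noisy agent exhausts its own allowance, not the shared capacity.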
How the pieces work together Individually, these services are powerful. The architecture works because they are deliberately layered. External approval → automated provisioning When a use case is approved in an external governance system, it triggers an Azure DevOps pipeline using the REST API. That pipeline: Provisions subscriptions and resource groups Deploys Foundry projects Configures Azure API Management with AI Gateway policies Enables monitoring and logging Governance is applied before the first request is made. One policy model, many regions Azure landing zones are region‑agnostic at the governance layer. This architecture follows that guidance. Policies and RBAC apply globally AI Gateway enforces limits locally in each region Runtime services scale region by region Expanding to a new region does not introduce a new governance model—only new capacity. A single operational view Signals flow upward: AI Gateway emits traffic and usage metrics Foundry Control Plane correlates evaluations, guardrail enforcement, and security alerts Agent 365 aggregates tenant‑level identity, compliance, and threat signals Operations teams no longer hunt across dashboards. They work from one prioritized view, with context intact. What this architecture deliberately does not promise This is a reference architecture, not a silver bullet. It does not eliminate the need for: Clear agent ownership Business‑level approval processes Ongoing evaluation of agent usefulness What it does provide is a foundation—one that lets organizations scale agentic AI without accepting chaos as the cost of innovation. Closing thoughts Agent sprawl is not a tooling failure. It’s an architectural one. By separating control from execution, layering governance where it belongs, and aligning AI operations with existing Azure and Microsoft 365 control planes, this architecture gives enterprises a way to move fast without losing sight of what their agents are doing. 
That’s the difference between experimentation and production.

Co-Contributor: Jorge Pena Alarcon, Sr. Cloud & AI Specialist

References (official Microsoft sources)
- Azure AI Gateway in Azure API Management
- Configure AI Gateway for Foundry
- Foundry Control Plane overview
- Microsoft Agent 365 announcement
- Agent 365 GA announcement
- Azure landing zones and regions
- Azure DevOps pipeline REST API

How to Modernise a Microsoft Access Database (Forms + VBA) to Node.JS, OpenAPI and SQL Server
Microsoft Access has played a significant role in enterprise environments for over three decades. Released in November 1992, its flexibility and ease of use made it a popular choice for organizations of all sizes, from FTSE250 companies to startups and the public sector. The platform enables rapid development of graphical user interfaces (GUIs) paired with relational databases, allowing users to quickly create professional-looking applications. Developers, data architects, and power users have all leveraged Microsoft Access to address various enterprise challenges. Its integration with Microsoft Visual Basic for Applications (VBA), an object-based programming language, ensured that Access solutions often became central to business operations.

Unsurprisingly, modernizing these applications is a common requirement in contemporary IT engagements, because these solutions lead to data fragmentation, a lack of integration with master data systems, multiple copies of the same data replicated across Access databases, and so on.

At first glance, upgrading a Microsoft Access application may seem simple, given its reliance on forms, VBA code, queries, and tables. However, substantial complexity often lurks beneath this straightforward exterior. Modernization efforts must consider whether to retain the familiar user interface to reduce staff retraining, how to accurately re-implement business logic, strategies for seamless data migration, and whether to introduce an API layer for data access. These factors can significantly increase the scope and effort required to deliver a modern equivalent, especially when dealing with numerous web forms, making manual rewrites a daunting task.

This is where GitHub Copilot can have a transformative impact, dramatically reducing redevelopment time. By following a defined migration path, it is possible to deliver a modernized solution in as little as two weeks.
In this blog post, I’ll walk you through each tier of the application and give you example prompts used at each stage.

🏛️Architecture Breakdown: The N-Tier Approach

Breaking down the application architecture reveals a classic N-Tier structure, consisting of a presentation layer, business logic layer, data access layer, and data management layer.

💫First-Layer Migration: Migrating a Microsoft Access Database to SQL Server

The migration process began with the database layer, which is typically the most straightforward to move from Access to another relational database management system (RDBMS). In this case, SQL Server was selected to leverage the SQL Server Migration Assistant (SSMA) for Microsoft Access, a free tool from Microsoft that streamlines database migration to SQL Server, Azure SQL Database, or Azure SQL Managed Instance (SQL MI). While GitHub Copilot could generate new database schemas and insert scripts, the availability of a specialized tool made the process more efficient.

Using SSMA, the database was migrated to SQL Server with minimal effort. However, it is important to note that relationships in Microsoft Access may lack explicit names. In such cases, SSMA appends a GUID or uses one entirely to create unique foreign key names, which can result in confusing relationship names post-migration. Fortunately, GitHub Copilot can batch-rename these relationships in the generated SQL scripts, applying more meaningful naming conventions. By dropping and recreating the constraints, relationships become easier to understand and maintain.

SSMA handles the bulk of the migration workload, allowing you to quickly obtain a fully functional SQL Server database containing all original data. In practice, renaming and recreating constraints often takes longer than the data migration itself.

Prompt Used:

# Context
I want to refactor the #file:script.sql SQL script.
Your task is to follow the below steps to analyse it and refactor it according to the specified rules. You are allowed to create / run any python scripts or terminal commands to assist in the analysis and refactoring process.

# Analysis Phase
Identify:
- Any warning comments
- Relations between tables
- Foreign key creation
- References to these foreign keys in 'MS_SSMA_SOURCE' metadata

# Refactor Phase
Refactor any SQL matching the following rules:
- Create a new script file with the same name as the original but with a `.refactored.sql` extension
- Rename any primary key constraints to follow the format PK_{table_name}_{column_name}
- Rename any foreign key constraints like [TableName]${GUID} to FK_{child_table}_{parent_table}
- Rename any indexes like [TableName]${GUID} to IDX_{table_name}_{column_name}
- Ensure any updated foreign keys are updated elsewhere in the script
- Identify which warnings flagged by the migration assistant need addressed

# Summary Phase
Create a summary file in markdown format with the following sections:
- Summary of changes made
- List of warnings addressed
- List of foreign keys renamed
- Any other relevant notes

🤖Bonus: Introduce Database Automation and Change Management

With a SQL Server database in place, we needed a way to roll out future changes to it, so we introduced a formal change management tool, Liquibase, into the solution.

Prompt Used:

# Context
I want to refactor #file:db.changelog.xml. Your task is to follow the below steps to analyse it and refactor it according to the specified rules. You are allowed to create / run any python scripts or terminal commands to assist in the analysis and refactoring process.

# Analysis Phase
Analyse the generated changelog to identify the structure and content. Identify the tables, columns, data types, constraints, and relationships present in the database. Identify any default values, indexes, and foreign keys that need to be included in the changelog.
Identify any vendor specific data types / functions that need to be converted to common Liquibase types.

# Refactor Phase
DO NOT modify the original #file:db.changelog.xml file in any way. Instead, create a new changelog file called `db.changelog-1-0.xml` to store the refactored changesets. The new file should follow the structure and conventions of Liquibase changelogs. You can fetch https://docs.liquibase.com/concepts/data-type-handling.html to get available Liquibase types and their mappings across RDBMS implementations.

Copy the original changesets from the `db.changelog.xml` file into the new file

Refactor the changesets according to the following rules:
- The main changelog should only include child changelogs and not directly run migration operations
- Child changelogs should follow the convention db.changelog-{version}.xml and start at 1-0
- Ensure data types are converted to common Liquibase data types. For example:
  - `nvarchar(max)` should be converted to `TEXT`
  - `datetime2` should be converted to `TIMESTAMP`
  - `bit` should be converted to `BOOLEAN`
- Ensure any default values are retained but ensure that they are compatible with the liquibase data type for the column.
- Use standard SQL functions like `CURRENT_TIMESTAMP` instead of vendor-specific functions.
- Only use vendor specific data types or functions if they are necessary and cannot be converted to common Liquibase types. These must be documented in the changelog and summary.

Ensure that the original changeset IDs are preserved for traceability.
Ensure that the author of all changesets is "liquibase (generated)"

# Validation Phase
Validate the new changelog file against the original #file:db.changelog.xml to ensure that all changesets are correctly refactored and that the structure is maintained. Confirm no additional changesets are added that were not present in the original changelog.

# Finalisation Phase
Provide a summary of the changes made in the new changelog file.
Document any vendor specific data types or functions that were used and why they could not be converted to common Liquibase types. Ensure the main changelog file (`db.changelog.xml`) is updated to include the new child changelog file (`db.changelog-1-0.xml`). 🤖Bonus: Synthetic Data Generation Since the legacy system lacked synthetic data for development or testing, GitHub Copilot was used to generate fake seed data. Care was taken to ensure all generated data was clearly fictional—using placeholders like ‘Fake Name’ and ‘Fake Town’—to avoid any confusion with real-world information. This step greatly improved the maintainability of the project, enabling developers to test features without handling sensitive or real data. 💫Second-Layer Migration: OpenAPI Specifications With data migration complete, the focus shifted to implementing an API-driven approach for data retrieval. Adopting modern standards, OpenAPI specifications were used to define new RESTful APIs for creating, reading, updating, and deleting data. Because these APIs mapped directly to underlying entities, GitHub Copilot efficiently generated the required endpoints and services in Node.js, utilizing a repository pattern. This approach not only provided robust APIs but also included comprehensive self-describing documentation, validation at the API boundary, automatic error handling, and safeguards against invalid data reaching business logic or database layers. 💫Third-Layer Migration: Business Logic The business logic, originally authored in VBA, was generally straightforward. GitHub Copilot translated this logic into its Node.js equivalent and created corresponding tests for each method. These tests were developed directly from the code, adding a layer of quality assurance that was absent in the original Access solution. The result was a set of domain services mirroring the functionality of their VBA predecessors, successfully completing the migration of the third layer. 
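To illustrate the repository pattern and boundary validation described for the API layer, here is a minimal sketch. The `Customer` entity, its fields, and every function name are hypothetical, since the post does not name the real tables. An in-memory repository stands in for SQL Server and is synchronous for brevity; a real data access layer would be asynchronous.

```typescript
// Hypothetical entity for illustration.
interface Customer { id: number; name: string; town: string; }
type NewCustomer = Omit<Customer, "id">;

// Handlers depend on this abstraction, not on SQL Server directly.
interface CustomerRepository {
  create(input: NewCustomer): Customer;
  findById(id: number): Customer | undefined;
}

// Stand-in implementation; a SQL Server-backed class would satisfy the same interface.
class InMemoryCustomerRepository implements CustomerRepository {
  private readonly rows = new Map<number, Customer>();
  private nextId = 1;

  create(input: NewCustomer): Customer {
    const row: Customer = { id: this.nextId++, ...input };
    this.rows.set(row.id, row);
    return row;
  }

  findById(id: number): Customer | undefined {
    return this.rows.get(id);
  }
}

// Validation at the API boundary: invalid payloads never reach the repository.
function validateNewCustomer(body: unknown): NewCustomer {
  const candidate = body as { name?: unknown; town?: unknown } | null;
  const name = candidate?.name;
  const town = candidate?.town;
  if (typeof name !== "string" || name.trim() === "" || typeof town !== "string") {
    throw new Error("400: name and town are required strings");
  }
  return { name: name.trim(), town };
}

function createCustomerHandler(repo: CustomerRepository, body: unknown): Customer {
  return repo.create(validateNewCustomer(body));
}

const repo = new InMemoryCustomerRepository();
const created = createCustomerHandler(repo, { name: " Fake Name ", town: "Fake Town" });

let rejected = false;
try {
  createCustomerHandler(repo, { name: "" });
} catch {
  rejected = true;
}
```

Swapping the in-memory class for a database-backed one leaves the handler and its tests unchanged, which is the main reason to put the interface between them.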
At this stage, the project had a new database, a fresh API tier, and updated business logic, all conforming to the latest organizational standards. The final major component was the user interface, an area where advances in GitHub Copilot’s capabilities became especially evident. 💫Fourth Layer: User Interface The modernization of the Access Forms user interface posed unique challenges. To minimize retraining requirements, the new system needed to retain as much of the original layout as possible, ensuring familiar placement of buttons, dropdowns, and other controls. At the same time, it was necessary to meet new accessibility standards and best practices. Some Access forms were complex, spanning multiple tabs and containing numerous controls. Manually describing each interface for redevelopment would have been time-consuming. Fortunately, newer versions of GitHub Copilot support image-based prompts, allowing screenshots of Access Forms to serve as context. Using these screenshots, Copilot generated Government Digital Service Views that closely mirrored the original application while incorporating required accessibility features, such as descriptive labels and field selectors. Although the automatically generated UI might not fully comply with all current accessibility standards, prompts referencing WCAG guidelines helped guide Copilot’s improvements. The generated interfaces provided a strong starting point for UX engineers to further refine accessibility and user experience to meet organizational requirements. 🤖Bonus: User Story Generation from the User Interface For organizations seeking a specification-driven development approach, GitHub Copilot can convert screenshots and business logic into user stories following the “As a … I want to … So that …” format. While not flawless, this capability is invaluable for systems lacking formal requirements, giving business analysts a foundation to build upon in future iterations. 
🤖Bonus: Introducing MongoDB

Towards the end of the modernization engagement, there was interest in demonstrating migration from SQL Server to MongoDB. GitHub Copilot can facilitate this migration, provided it is given adequate context. As with all NoSQL databases, the design should be based on application data access patterns, typically reading and writing related data together. Copilot’s ability to automate this process depends on a comprehensive understanding of the application’s data relationships and patterns.

# Context
The `<business_entity>` entity from the existing system needs to be added to the MongoDB schema. You have been provided with the following:
- #file:documentation - System documentation to provide domain / business entity context
- #file:db.changelog.xml - Liquibase changelog for SQL context
- #file:mongo-erd.md - Contains the current Mongo schema Mermaid ERD. Create this if it does not exist.
- #file:stories - Contains the user stories that the system will be built around

# Analysis Phase
Analyse the available documentation and changelog to identify the structure, relationships, and business context of the `<business_entity>`.
Identify:
- All relevant data fields and attributes
- Relationships with other entities
- Any specific data types, constraints, or business rules
Determine how this entity fits into the overall MongoDB schema:
- Should it be a separate collection?
- Should it be embedded in another document?
- Should it be a reference to another collection for lookups or relationships?
- Explore the benefit of denormalization for performance and business needs
Consider the data access patterns and how this entity will be used in the application.

# MongoDB Schema Design
Using the analysis, suggest how the `<business_entity>` should be represented in MongoDB:
- The name of the MongoDB collection that will represent this entity
- List each field in the collection, its type, any constraints, and what it maps to in the original business context
- For fields that are embedded, document the parent collection and how the fields are nested. Nested fields should follow the format `parentField->childField`.
- For fields that are referenced, document the reference collection and how the lookup will be performed.
- Provide any additional notes on indexing, performance considerations, or specific MongoDB features that should be used
- Always use pascal case for collection names and camel case for field names

# ERD Creation
Create or update the Mermaid ERD in `mongo-erd.md` to include the results of your analysis. The ERD should reflect:
- The new collection or embedded document structure
- Any relationships with other collections/entities
- The data types, constraints, and business rules that are relevant for MongoDB
- Ensure the ERD is clear and follows best practices for MongoDB schema design

Each entity in the ERD should have the following layout:
**Entity Name**: The name of the MongoDB collection / schema
**Fields**: A list of fields in the collection, including:
- Field Name (in camel case)
- Data Type (e.g., String, Number, Date, ObjectId)
- Constraints (e.g. indexed, unique, not null, nullable)

In this example, Liquibase was used as a changelog to supply the necessary context, detailing entities, columns, data types, and relationships. Based on this, Copilot could offer architectural recommendations for new document or collection types, including whether to embed documents or use separate collections with cache references for lookup data. Copilot can also generate an entity relationship diagram (ERD), allowing for review and validation before proceeding.
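The embed-or-reference decision that the prompt asks Copilot to reason about can be illustrated with plain Python dictionaries standing in for MongoDB documents. This is a sketch under stated assumptions: the `Order` and `Customer` entities, their fields, and the helper functions are hypothetical and do not come from the migrated system. No database driver is used.

```python
# Hypothetical "Customer" collection: shared by many orders, so orders hold a reference.
customer_collection = {
    "cust-1": {"_id": "cust-1", "fullName": "Fake Name", "town": "Fake Town"},
}

# Hypothetical "Order" document: line items are always read and written
# together with the order, so they are embedded rather than stored separately.
order = {
    "_id": "order-1",
    "customerId": "cust-1",  # reference: resolved with a second lookup
    "orderLines": [          # embedded: addressed as orderLines->quantity, etc.
        {"productCode": "P-100", "quantity": 2, "unitPrice": 4.50},
        {"productCode": "P-200", "quantity": 1, "unitPrice": 10.00},
    ],
}


def order_total(order_doc: dict) -> float:
    """Embedded data needs no extra lookup: one document read is enough."""
    return sum(line["quantity"] * line["unitPrice"] for line in order_doc["orderLines"])


def resolve_customer(order_doc: dict, customers: dict) -> dict:
    """Referenced data costs a second lookup against the other collection."""
    return customers[order_doc["customerId"]]


if __name__ == "__main__":
    print(order_total(order))
    print(resolve_customer(order, customer_collection)["fullName"])
```

The trade-off shown here is the one the prompt's analysis phase is probing: embedding favours data that is read together, while references favour data shared across many parent documents.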
From there, a new data access layer can be generated, configurable to switch between SQL Server and MongoDB as needed. While production environments typically standardize on a single database model, this demonstration showcased the speed and flexibility with which strategic architectural components can be introduced using GitHub Copilot.

👨💻Conclusion

This modernization initiative demonstrated how strategic use of automation and best practices can transform legacy Microsoft Access solutions into scalable, maintainable architectures utilizing Node.js, SQL Server, MongoDB, and OpenAPI. By carefully planning each migration layer, from database and API specifications to business logic, the team preserved core functionality while introducing modern standards and enhanced capabilities. GitHub Copilot played a pivotal role, not only speeding up redevelopment but also improving code quality through automated documentation, test generation, and meaningful naming conventions.

The result was a significant reduction in development time, with a robust, standards-compliant system delivered in just two weeks compared to an estimated six to eight months using traditional manual methods. This project serves as a blueprint for organizations seeking to modernize their Access-based applications, highlighting the efficiency gains and quality improvements that can be achieved by leveraging AI-powered tools and well-defined migration strategies. The approach ensures future scalability, easier maintenance, and alignment with contemporary enterprise requirements.

Architecture to Resilience: A Decision Guide
Start with the framework, accelerate with the tool

Watch the video walkthrough

The Application Resilience Framework originated from a practical gap we saw in resilience reviews: teams had architecture diagrams, monitoring data, incident history, and runbooks, but no consistent way to connect them into a measurable resilience model. The framework is intended to close that gap by turning architecture context into a structured lifecycle for risk identification, mitigation validation, health modeling, and governance. It aligns closely with the Reliability pillar of the Azure Well-Architected Framework, especially the guidance around identifying critical flows, performing Failure Mode Analysis, defining reliability targets, and building health models.

The Application Resilience Framework Tool helps teams apply this framework faster by starting with artifacts they already have, such as data flow diagrams or sequence diagrams in Mermaid or image format. From those artifacts, the tool creates the first version of a resilience model by extracting workflows, application components, platform components, dependencies, and initial failure modes. It then guides the team through the decisions needed to make resilience measurable, in one import step followed by four phases:

Import Artifacts -> Phase 1: Failure Mode Analysis -> Phase 2: Mitigation and Validation -> Phase 3: Health Model Mapping -> Phase 4: Operations and Governance

It is not a replacement for WAF guidance or Resilience Hub style assessments. It is a practical way to operationalize those concepts at the workload and workflow level, producing prioritized risks, mitigation plans, validation paths, health signals, dashboards, reports, and governance ownership.

How to use this guide

This guide follows the same flow as the tool. For each step, it covers:
- The decision: What needs to be decided?
- The options: What paths are available?
- The guidance: When each option fits

Use this with the video walkthrough. The video shows the tool in action. This guide explains the choices behind each step.

Question 1: What artifact should you import first?

The import step creates the starting point for the model. Regardless of the input path, the output is the same: workflows that move into Phase 1: Failure Mode Analysis.

Options

| Import option | Best for | What happens |
| --- | --- | --- |
| Data flow diagram | System, module, data movement, and dependency views | If imported as an image, the tool breaks it into sequence-style flows. Selected flows become workflows. |
| Sequence diagram | Transaction flow and service interaction views | Converted directly into workflows. |
| Mermaid input | Diagrams maintained as code in Mermaid format | Converted directly into workflows. |
| Image input | JPG or PNG diagrams | Azure Foundry Vision models interpret the image and convert it into workflows. |
| Manual entry | Missing or incomplete diagrams | User creates or corrects workflows manually. |

When to pick which

Use data flow for system and dependency views. Use sequence diagrams for transaction or interaction views. Whichever path is used, the result is workflows, components, dependencies, and initial failure modes ready for Phase 1.

Question 2: Which workflows should be analyzed first?

Phase 1 is Failure Mode Analysis. This is where the tool identifies what can fail and how important each failure is.

Options
- Critical user flows: Login, checkout, payment, onboarding, request processing.
- High-risk platform flows: Database writes, queue processing, storage access, identity, messaging, external APIs.
- Known issue areas: Workflows with recent incidents, recurring alerts, or customer impact.

When to pick which

Start where failure creates the highest customer or business impact. The goal is not to model everything at once. The goal is to model the right thing first.
Deliverables
- Failure Mode Analysis catalog
- RPV risk scores
- Criticality classification

Question 3: How should failure modes be prioritized?

After workflows and components are imported, the tool helps score each failure mode using Risk Priority Value (RPV), which combines four factors: Impact, Likelihood, Detectability, and Outage severity.

Options
- Use generated failure modes and scores: Best for a fast first pass.
- Tune the RPV scores with engineering input: Best when workload context matters.
- Add custom failure modes: Best when known risks come from incidents, reviews, or customer experience.

When to pick which

Use the generated model to accelerate the first pass, then adjust it with real system knowledge. The goal is not to create the longest list of risks. The goal is to identify the risks that deserve attention first.

Deliverables
- Failure Mode Catalog
- RPV Risk Scores
- Prioritized criticality list

Question 4: Are mitigations defined or validated?

Phase 2 is Mitigation and Validation. This is where each failure mode gets a response plan.

Options
- Detection only: The team can detect the failure, but the response is not defined.
- Defined mitigation: The response is documented, such as retry, fallback, failover, scaling, restore, or rebalance.
- Validated mitigation: The response has been tested through a controlled validation or chaos test.

When to pick which

For low-risk items, documented mitigation may be enough. For critical and high-risk items, validation is the key. A mitigation that has not been tested is still an assumption.

Deliverables
- Mitigation playbooks
- Chaos test plans
- Support playbooks

Question 5: Which risks need health signals?

Phase 3 is Health Model Mapping. This is where the tool connects risks to observability. A failure mode should not just sit in a document. It should map to a signal that can show whether the system is healthy, degraded, or unhealthy.

Options
- Map all failure modes: Best for small systems or highly critical workloads.
- Map critical and high-risk failure modes first: Best for large systems.
- Track unmapped risks as gaps: Best when observability coverage is still improving.

When to pick which

Start with the highest RPV items. Every critical failure mode should have at least one signal, such as a metric, log, alert, availability check, or dependency signal.

Deliverables
- Health model
- Signal definitions
- Coverage report
- Bicep templates

Question 6: Should the health model be exported or deployed?

Once the health model is built, the next decision is how to use it.

Options
- Export for review: Best when the team needs to validate the model first.
- Generate monitoring templates: Best when the team wants repeatable implementation.
- Deploy to Azure: Best when the model is ready to become part of operations.
- Use outputs in downstream tools: Best when support, SRE, or incident response workflows need structured playbooks.

When to pick which

Export first if the model is still being reviewed. Deploy when component relationships, signals, and coverage are accurate enough for operational use.

Question 7: How will governance keep the model current?

Phase 4 is Operations and Governance. This is where the resilience model becomes an ongoing practice.

Options
- One-time assessment: Useful for quick discovery but limited long term.
- Recurring review: Best for production workloads that change regularly.
- Closed-loop governance: Best when incidents, failed validations, and monitoring gaps feed back into the model.

When to pick which

For production systems, use a recurring governance cadence. Assign owners, track gaps, review dashboards, and update the model as the system changes.

Deliverables
- Governance model
- Dashboards
- Reports and exports
- Runbooks

Putting it together: three adoption patterns

Once governance is defined, the tool can be used in different ways depending on the team’s maturity and objective.
The three common adoption patterns are:

Pattern A: Quick resilience review
- Import one critical workflow
- Generate failure modes
- Review RPV scores
- Identify top risks
- Export findings

Best for fast architecture reviews or early customer conversations.

Pattern B: Full workload assessment
- Import multiple workflows
- Build a full Failure Mode Catalog
- Define mitigations and recovery steps
- Create chaos test plans
- Map risks to signals
- Produce coverage reports

Best for structured resilience assessments.

Pattern C: Operational health model
- Build and tune the health model
- Export or deploy monitoring artifacts
- Track risk and signal coverage
- Review mitigation effectiveness
- Assign governance ownership
- Feed findings back into the model

Best when the goal is continuous operational improvement.

A short checklist before using the tool
- Which workflow should we import first?
- Do we have a data flow diagram, sequence diagram, or Mermaid file?
- What components and dependencies should be included?
- Which failure modes matter most?
- How should RPV be adjusted for this workload?
- Do critical failure modes have mitigations?
- Have those mitigations been validated?
- Are failure modes mapped to health signals?
- What coverage gaps remain?
- Should the health model be exported or deployed?
- Who owns ongoing review?
- How often should the model be updated?

Closing thought

The Application Resilience Framework Tool provides a practical way to move from architecture artifacts to measurable, continuously improving resilience. It starts with data flow or sequence diagrams, builds a structured view of the system, and guides teams through the decisions that matter: what can fail, how severe it is, how it is mitigated, how it is detected, and how it is governed.

Tool repo: Application Resilience Framework Tool

How MS Discovery Is Empowering Scientists to Do More
Research and development has traditionally been a slow, sequential, and largely manual endeavour. Scientists formulate hypotheses, design experiments, run computations in constrained environments, and document results, each stage dependent on the last, each transition requiring human review and intervention. Knowledge is fragmented across systems, insights are bottlenecked by individual capacity, and the gap between hypothesis and actionable outcome can span weeks or months. For organisations tackling complex scientific and operational challenges, from drug discovery to industrial process optimisation, this pace of iteration is simply no longer acceptable.

At Microsoft, we recently introduced Microsoft Discovery, a platform that I believe fundamentally changes this model. Much like Microsoft 365 transformed the way knowledge workers collaborate and create, Microsoft Discovery is designed to simplify and empower the way scientists and researchers work. It provides a unified, end-to-end platform that integrates advanced artificial intelligence, high-performance computing, and knowledge management to support the full scientific reasoning lifecycle: knowledge gathering, hypothesis generation, experiment design, simulation, results analysis, and documentation.

In this article, I want to share how we used Microsoft Discovery to automate a real-world simulation workflow for a mining organisation and what that experience taught our team about the future of AI-augmented science.

What Is Microsoft Discovery?

Microsoft Discovery is Microsoft's scientific AI platform, a solution designed to accelerate research and experimentation across the full innovation lifecycle. Rather than replacing scientific judgement, Discovery is designed to amplify human expertise, embedding AI assistance at each stage of the R&D process while maintaining governance, traceability, and scientific rigour.
From Traditional R&D to AI-Augmented Science

To appreciate what Discovery enables, it is important to understand where it fits in.

In the traditional R&D model, knowledge discovery centres on manual literature reviews and historical data analysis. Researchers individually search, read, and synthesise information, a time-intensive process in which discovery is limited by each person's capacity to locate and interpret relevant material. Hypothesis generation and experimental design are expert-led and largely manual. Computational experimentation, where it exists, runs in fixed or constrained environments with limited parallelism. Analysis and iteration follow the same sequential pattern: execute, review, document, repeat.

Microsoft Discovery changes this fundamentally. In the AI-cloud-enabled model it provides:
- Knowledge synthesis at scale: Researchers can explore literature, historical experiments, and organisational knowledge through a single interface, with intelligent indexing surfacing insights faster than manual search could ever achieve.
- AI-assisted hypothesis generation: Collaborative human-and-AI workflows support hypothesis exploration and feasibility assessment, while final decisions remain with the scientist.
- Cloud-scale experimentation: Elastic compute and parallel processing allow simulations and experiments to run at scale, with integrated tracking and reproducibility built in.
- Continuous feedback and human-in-the-loop governance: Results are analysed and compared more rapidly, enabling faster iteration, with AI-generated insights reviewed and validated by researchers before action.
- Governed knowledge assets: Experiment lineage, outcomes, and best practices are captured as reusable, governed assets, supporting long-term organisational learning.
The net effect is a transition from slow, manual, and fragmented research processes to an agile, automated, and data-driven R&D model, one that improves research efficiency, increases the return on innovation investment, and enables faster, higher-impact solutions to complex challenges. At a high level, the research and development loop discussed above, and how Microsoft Discovery enriches it, is shown in the following diagram.

The Real-World Problem: Screening Thousands of Molecules

To bring this to life, let me walk you through a real-world use case we worked on recently. A mining organisation needed to identify the best-performing oxidant compounds for a chemical reaction central to their operations. We will focus on a single workflow that sits squarely in the simulation phase of the scientific loop, and it is a perfect example of the kind of work that Microsoft Discovery can transform.

How Scientists Did It Before

In the traditional process, scientists would begin by selecting candidate molecules from established molecular libraries based on characteristics identified through literature review. These libraries can contain thousands of molecules, each defined in standard molecular file formats (such as XYZ or CIF files) that describe their three-dimensional atomic structures. From there, a researcher would manually work through a multi-step pipeline:

1. Pre-processing and preparation: The selected molecular files are processed and prepared for quantum mechanical (QM) calculations. This involves filtering molecules based on properties like the types of metals present, electron count, and atomic weight, criteria that directly affect both the scientific relevance and the computational cost of the simulations. The output is a set of prepared input files (known as GJF files) ready for simulation.
2. Running quantum mechanical simulations: The prepared input files are submitted to a computational chemistry tool (Gaussian 16) to perform Density Functional Theory (DFT) calculations. These simulations compute the electronic structure and energy states of each molecule across different charge and multiplicity configurations. Crucially, each molecule requires multiple independent simulation runs, and the computational cost scales rapidly with molecular complexity. With thousands of candidate molecules, this step alone can involve thousands of individual simulation jobs.
3. Collecting and post-processing results: Once all simulations complete, the output log files are collected and processed. For each molecule, the lowest-energy charge and multiplicity combination is identified, and a set of quantum mechanical descriptors and classical molecular descriptors are extracted. These descriptors are then fed into a trained machine learning model to predict the redox potential of each compound, a key metric that indicates how effectively a molecule can act as an oxidant in the target reaction.
4. Summarisation and filtering: Finally, the predicted redox potentials and other relevant characteristics are compiled into a summary, enabling researchers to identify the most promising candidates for further investigation and experimental validation.

Every step in this pipeline required manual intervention: writing and adjusting scripts, verifying input and output files, monitoring job queues, handling failures, and stitching results together. A single researcher could easily spend days or weeks moving through this process, and any error at one stage meant going back and re-running subsequent steps.

How We Automated This with Microsoft Discovery Agents

When we looked at this workflow through the lens of Microsoft Discovery, the opportunity was clear.
The scientific reasoning (selecting which molecules to test, interpreting redox potential results, deciding what to investigate next) should remain with the researcher. But the operational overhead of preparing files, submitting simulations, monitoring jobs, collecting results, and assembling summaries? That could be orchestrated by a team of AI agents.

A Team of Agents, Working Together

We designed a multi-agent architecture within Microsoft Discovery to automate this simulation workflow end to end. Here is how the team of agents operates:

- Router Agent: The entry point. When a researcher submits a request, for example asking to run QM calculations on a set of candidate molecules, the Router Agent interprets the intent and orchestrates the downstream workflow.
- Planner Agent: Once the Router Agent identifies the task, the Planner Agent examines the input files provided by the researcher and formulates a step-by-step execution plan. It determines what needs to happen, in what order, and with what parameters, much like a project manager scoping out a piece of work.
- Gaussian Prep Agent: This agent handles the preparation step. It inspects the current molecular files, applies the necessary filtering criteria, and prepares them for simulation, generating the input files that the computational chemistry tool requires. What previously involved manual scripting and file-by-file verification is now handled autonomously. This agent uses Microsoft Discovery tools for the underlying execution.
- MPI Gaussian Agent: This is where the power of cloud-scale computing comes in. The MPI Gaussian Agent submits the prepared simulation jobs and manages their execution using an MPI-based master-worker pattern. This approach enables massively parallel execution, scaling out across the cloud to run thousands of simulations concurrently rather than sequentially. Given that the candidate molecule libraries can contain thousands of entries, and each molecule may require multiple simulation runs, this parallel execution capability is transformative. What might have taken days in a constrained local environment can now complete in a fraction of the time.
- Redox Potential Agent: Once the simulations are complete, this agent takes over. It processes the simulation outputs, identifies the optimal charge and multiplicity state for each molecule, extracts the relevant QM and classical descriptors, and runs them through the trained machine learning model to predict redox potentials.
- Summariser Agent: The final agent in the chain. It maps the predicted redox potentials back to the original molecules, applies any additional filtering criteria, and produces a clean, structured summary: a JSON file that the researcher can immediately use to identify the most promising candidates and take them forward into the next phase of their work.

What the Researcher Experiences

From the scientist's perspective, the transformation is striking. Instead of spending days writing scripts, babysitting job queues, and manually stitching results together, they provide their input files and describe what they need. The agents take it from there (planning, preparing, executing, processing, and summarising) and deliver a curated output ready for scientific interpretation.

The researcher's time is freed to focus on what matters most: thinking critically about the science. Which molecules look most promising? What does the redox potential distribution tell us? Should we adjust the filtering criteria and run another round? These are the high-value questions that require human expertise, and now scientists can spend their time on exactly that, rather than on operational mechanics.

The Bigger Picture: Accelerating the Entire Scientific Loop

It is important to note that this simulation workflow is just one piece of the broader scientific loop.
The full cycle of scientific research, from initial knowledge gathering and literature review, through hypothesis generation, experimental design, simulation, results analysis, and documentation, involves many stages, each of which can benefit from the same kind of AI-augmented approach. Microsoft Discovery is designed to support this entire cycle.

In our project, we did not stop at simulation. We also explored how agents can accelerate the knowledge gathering phase, helping researchers navigate vast bodies of literature and surface relevant prior work more efficiently. We looked at how AI can assist with hypothesis generation and evaluation, helping scientists reason about which directions are most promising before committing to expensive computations. And we examined how agents can support the analysis and reporting phases: comparing results against hypotheses, generating visualisations, and even assisting with drafting research documents.

What excites me most about Microsoft Discovery is not any single capability, but the cumulative effect of embedding AI assistance across every stage of the research process. Each phase that gets faster and more efficient creates a multiplier effect on the phases that follow. When knowledge gathering takes hours instead of weeks, researchers generate better hypotheses sooner. When simulations run at cloud scale in parallel, results arrive faster. When analysis is augmented by AI, iteration cycles tighten. The entire loop accelerates.

Conclusion

The way we approach scientific research is undergoing a fundamental shift. Large language models and the AI agents built from them are not replacing scientists; they are empowering them to work at a pace and scale that was previously unimaginable. Microsoft Discovery represents a new operating model for R&D.
By combining advanced AI, high-performance cloud computing, and intelligent workflow orchestration, it enables researchers to offload the repetitive, time-consuming operational work to agents and invest their expertise where it has the greatest impact: in asking better questions, interpreting complex results, and pushing the boundaries of what we know.

In the use case I have shared here, a team of six AI agents automated a simulation pipeline that would have taken a single researcher days of manual work. They prepared molecular input files, scaled out thousands of quantum mechanical simulations in parallel across the cloud, processed the results, predicted redox potentials using machine learning, and delivered a structured summary, all with minimal human intervention.

This is just the beginning. As AI agents become more capable and the tools surrounding them more mature, the potential to accelerate discovery across every scientific domain is immense. Whether you are in materials science, pharmaceuticals, energy, agriculture, or any field where complex R&D is central to progress, Microsoft Discovery offers a platform to do more, faster, and with greater confidence.

The future of science is not about working harder. It is about working smarter, with AI as your partner in discovery.

Transforming Video Content into Structured SOPs Using Graph-based RAG
Introduction

In today’s digital-first environments, a large portion of enterprise knowledge lives inside video content, training sessions, onboarding walkthroughs, and recorded operational procedures. While videos are great for learning, they are not ideal for quick reference, compliance, or repeatable processes. Converting that knowledge into structured documentation like Standard Operating Procedures (SOPs) is often manual and time-consuming. What if this process could be automated using AI?

The Problem

Transcripts alone don’t solve the problem. When videos are converted into text, the output typically lacks:
- Clear structure (sections, headings, hierarchy)
- Context (relationships between steps, tools, and roles)
- Completeness (definitions and dependencies spread across the content)

This leads to a common challenge: teams spend significant effort manually reading transcripts, interpreting context, and restructuring them into usable documentation. As seen in modern architecture challenges, manual and repetitive configurations don’t scale well and increase maintenance effort.

Enter Graph-based RAG (GraphRAG)

GraphRAG extends traditional RAG by building a knowledge graph instead of treating content as disconnected chunks.
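To make the contrast with disconnected chunks concrete, here is a toy sketch of what a knowledge graph over a transcript looks like. It does not use the GraphRAG library or its API. The triples are invented for illustration, and the grouping uses plain connected components as a crude stand-in for the graph analysis a real pipeline would perform.

```python
from collections import defaultdict

# Hypothetical (entity, relation, entity) triples, as if extracted from transcript chunks.
triples = [
    ("Operator", "uses", "Mixing Tank"),
    ("Operator", "records in", "Batch Log"),
    ("Mixing Tank", "requires", "Cleaning Procedure"),
    ("QA Reviewer", "approves", "Batch Log"),
    ("Trainer", "presents", "Onboarding Deck"),
]


def build_graph(edges):
    """Build an undirected adjacency map from the triples."""
    graph = defaultdict(set)
    for source, _relation, target in edges:
        graph[source].add(target)
        graph[target].add(source)
    return graph


def degree_ranking(graph):
    """Entities ordered by number of connections, ties broken alphabetically."""
    return sorted(graph, key=lambda node: (-len(graph[node]), node))


def connected_groups(graph):
    """Group entities that are linked, directly or indirectly."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    graph = build_graph(triples)
    print(degree_ranking(graph)[:3])
    print(connected_groups(graph))
```

Even in this toy form, entities mentioned in different chunks end up linked, so a question about the batch log can pull in the operator and the reviewer rather than only the chunk where the log was named.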
What GraphRAG Does Extracts entities (tools, systems, roles, concepts) Maps relationships between them Groups related concepts into logical sections Preserves context across the entire document Architecture Overview Below is the high-level pipeline: Video → Transcription → Knowledge Graph → LLM Generation → Structured SOP Implementation Approach (Step-by-Step) Stage 1: Knowledge Graph Construction Convert video to transcript Split transcript into chunks Feed chunks into GraphRAG GraphRAG performs: Text Unit Extraction Entity Recognition Relationship Mapping Community Detection Result: A structured knowledge graph representation of the transcript Stage 2: Structure Extraction From the knowledge graph: Sequential Steps Preserve procedural flow from transcript order Logical Sections Derived using community detection Key Concepts Identified using graph centrality (importance via connections) This creates a framework for the SOP Stage 3: Intelligent Document Generation Using Azure OpenAI, each SOP section is generated: Section Generated From Title & Purpose High-level concepts Scope Entity boundaries Definitions Entity descriptions Responsibilities Role-based entities Procedures Sequential steps References Linked content The key advantage: LLM is grounded in graph structure not raw text Key Benefits Context Preservation - Relationships between concepts are maintained across sections. Comprehensive Coverage - Community detection ensures important topics are not missed. Reduced Hallucination - LLM generation is grounded in structured knowledge. 
- Scalability: Works for 30-minute tutorials, 3-hour training sessions, and enterprise knowledge bases.

Real-World Impact (Example)
In enterprise scenarios like pharmaceutical SOP generation:
- Processing time: ~15–20 minutes for a multi-hour video
- Output quality: 8–10 structured SOP sections
- Consistency: Terminology and relationships preserved
- Coverage: Minimal missing topics

Where This Approach Works Best
- Training videos → SOPs
- Meeting recordings → action summaries
- Technical demos → documentation
- Interview recordings → knowledge bases
- Tutorials → reference guides

Key Takeaway
This approach represents a shift from text processing → knowledge understanding. By combining:
- Knowledge graphs (structure)
- LLMs (language generation)
we can transform raw, unstructured content into usable, enterprise-grade knowledge assets.

Resources
https://microsoft.github.io/graphrag/index/overview/

Final Thoughts
Have you explored GraphRAG or similar approaches in your projects? What challenges did you face? How did you handle unstructured knowledge? Share your experiences. Let’s learn together.

Centralizing Enterprise API Access for Agent-Based Architectures
Problem Statement
When building AI agents or automation solutions, calling enterprise APIs directly often means configuring individual HTTP actions within each agent for every API. While this works for simple scenarios, it quickly becomes repetitive and difficult to manage as complexity grows. The challenge becomes more pronounced when a single business domain exposes multiple APIs, or when the same APIs are consumed by multiple agents. This leads to duplicated configurations, higher maintenance effort, inconsistent behavior, and increased governance and security risks.

A more scalable approach is to centralize and reuse API access. By grouping APIs by business domain using an API management layer, shaping those APIs through a Model Context Protocol (MCP) server, and exposing the MCP server as a standardized tool or connector, agents can consume business capabilities in a consistent, reusable, and governable manner. This pattern not only reduces duplication and configuration overhead but also enables stronger versioning, security controls, observability, and domain-driven ownership, making agent-based systems easier to scale and operate in enterprise environments.

Designing Agent-Ready APIs with Azure API Management, an MCP Server, and Copilot Studio
As enterprises increasingly adopt AI-powered assistants and Copilots, API design must evolve to meet the needs of intelligent agents. Traditional APIs, often designed for user interfaces or backend integrations, can expose excessive data, lack intent-level abstraction, and increase security risk when consumed directly by AI systems. This document outlines a practical, enterprise-ready approach to organize APIs in Azure API Management (APIM), introduce a Model Context Protocol (MCP) server to shape and control context, and integrate the solution with Microsoft Copilot Studio. The goal is to make APIs truly agent-ready: secure, scalable, reusable, and easy to govern.
Architecture at a glance
- Back-end services expose domain APIs.
- Azure API Management (APIM) groups and governs those APIs (products, policies, authentication, throttling, versions).
- An MCP server calls APIM, orchestrates/filters responses, and returns concise, model-friendly outputs.
- Copilot Studio connects to the MCP server and invokes a small set of predictable operations to satisfy user intents.

Why Traditional API Designs Fall Short for AI Agents
Enterprise APIs have historically been built around CRUD operations and service-to-service integration patterns. While this works well for deterministic applications, AI agents work best with intent-driven operations and context-aware responses. When agents consume traditional APIs directly, common issues include overly verbose payloads, multiple calls to satisfy a single user intent, and insufficient guardrails for read vs. write operations. The result can be unpredictable agent behavior that is difficult to test, validate, and govern.

Structuring APIs Effectively in Azure API Management
Azure API Management (APIM) is the control plane between enterprise systems and AI agents. A well-structured APIM instance improves security, discoverability, and governance through products, policies, subscriptions, and analytics.

Key design principles for agent consumption
- Organize APIs by business capability (for example, Customer, Orders, Billing) rather than technical layers.
- Expose agent-facing APIs via dedicated APIM products to enable controlled access, throttling, versioning, and independent lifecycle management.
- Prefer read-only operations where possible; scope write operations narrowly and protect them with explicit checks, approvals, and least-privilege identities.
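To make the "overly verbose payloads" issue above concrete, here is a minimal Python sketch of the kind of shaping an MCP operation could apply before handing data to an agent. Everything in it is hypothetical: the `shape_order_status` function, the field names, and the sample record are illustrative assumptions, not part of APIM, MCP, or Copilot Studio.

```python
from typing import Any

# Hypothetical allow-list: the only fields an "order status" intent needs.
ORDER_STATUS_FIELDS = ("orderId", "status", "estimatedDelivery")


def shape_order_status(raw: dict[str, Any]) -> dict[str, Any]:
    """Reduce a verbose back-end order record to a small, stable shape.

    Fields missing from the back-end record come back as None, so the
    output always has the same keys.
    """
    return {field: raw.get(field) for field in ORDER_STATUS_FIELDS}


# Illustrative back-end response (invented for this sketch).
raw_order = {
    "orderId": "12345",
    "status": "Shipped",
    "estimatedDelivery": "2025-01-15",
    "internalRoutingCode": "WH-7/B2",
    "customer": {"id": "C-9", "email": "someone@example.com"},
    "auditTrail": [{"event": "created"}, {"event": "packed"}],
}

shaped = shape_order_status(raw_order)
print(shaped)
```

An allow-list is used here rather than a deny-list so that fields added to the back end later do not leak to the agent by default.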
The Role of the MCP Server in Agent-Based Architectures
APIM provides governance and security, but agents also need an intent-level interface and model-friendly responses. A Model Context Protocol (MCP) server fills this gap by acting as a mediator between Copilot Studio and APIM-exposed APIs. Instead of exposing many back-end endpoints directly to the agent, the MCP server can orchestrate multiple API calls, filter irrelevant fields, enforce business rules, enrich results with additional business context, and emit concise, predictable JSON outputs in LLM-friendly schemas. This makes agent behavior more reliable and easier to validate.

By introducing this abstraction layer, Copilot interactions become simpler, safer, and more deterministic. The agent interacts with a small number of well-defined MCP operations that encapsulate enterprise logic without exposing internal complexity.

Designing an Effective MCP Server
An MCP server should have a clear and focused responsibility: shaping context for AI models. It should not replace core back-end services; it should adapt enterprise capabilities and data for agent consumption.

Note that MCP itself, as a protocol, does not orchestrate enterprise workflows or apply business logic. It standardizes how agents discover and invoke external tools and APIs by exposing them through a structured protocol interface. Agent-level orchestration, intent resolution, and policy-driven execution are handled by the agent runtime or host framework.

What MCP should do
- Call APIM-managed APIs and orchestrate multi-step retrieval when needed.
- Apply security checks and business rules consistently.
- Filter and minimize payloads (return only fields needed for the intent).
- Normalize and reshape responses into stable, predictable JSON schemas.
- Handle errors and edge cases with safe, descriptive messages.

What MCP should not do
It is equally important to understand what does not belong in MCP. Avoid implementing complex transactional workflows, long-running processes, or UI-specific formatting in MCP; these should remain in back-end systems. Keep it lightweight so it remains scalable, testable, and easy to maintain.

Step by step guide
1) Create an MCP server in Azure API Management (APIM)
1. Open the Azure portal (portal.azure.com).
2. Go to your API Management instance.
3. In the left navigation, expand APIs.
4. Create (or select) an API group for the business domain you want to expose (for example, Orders or Customers).
5. Add the relevant APIs/operations to that API group.
6. Create or select an APIM product dedicated to agent usage, and ensure the product requires a subscription (subscription key).
7. Create an MCP server in APIM and map it to the API (or API group) you want to expose as MCP operations.
8. In the MCP server settings, ensure Subscription key required is enabled.
9. From the product’s Subscriptions page, copy the subscription key you will use in Copilot Studio.
Screenshot placeholders: APIM API group, product configuration, MCP server mapping, subscription settings, subscription key location.

* Note: Using an API Management subscription key to access MCP operations is one supported way to authenticate and consume enterprise APIs. However, this approach is best suited for initial setups, demos, or scenarios where key-based access is explicitly required. For production-grade enterprise solutions, Microsoft recommends using managed identity–based access control.
Managed identities for Azure resources eliminate the need to manage secrets such as subscription keys or client secrets, integrate natively with Microsoft Entra ID, and support fine-grained role-based access control (RBAC). This approach improves security posture while significantly reducing operational and governance overhead for agent and service-to-service integrations. Wherever possible, agents and MCP servers should authenticate using managed identities to ensure secure, scalable, and compliant access to enterprise APIs.

2) Create a Copilot Studio agent and connect to the APIM MCP server using a subscription key
Copilot Studio natively supports Model Context Protocol (MCP) servers as tools. When an agent is connected to an MCP server, the tool metadata, including operation names, inputs, and outputs, is automatically discovered and kept in sync, reducing manual configuration and maintenance overhead.
1. Sign in to Copilot Studio.
2. Create a new agent and add clear instructions describing when to use the MCP tool and how to present results (for example, concise summaries plus key fields).
3. Open Tools > Add tool > Model Context Protocol, then choose Create.
4. Enter the MCP server details:
   - Server endpoint URL: copy this from your MCP server in APIM.
   - Authentication: select API Key.
   - Header name: use the subscription key header required by your APIM configuration.
5. Select Create new connection, paste the APIM subscription key, and save.
6. Test the tool in the agent by prompting for a domain-specific task (for example, “Get order status for 12345”). Validate that responses are concise and that errors are handled safely.
Screenshot placeholders: MCP tool creation screen, endpoint + auth configuration, connection creation, test prompt and response.

Operational best practices and guardrails
- Least privilege by default: create separate APIM products and identities for agent scenarios; avoid broad access to internal APIs.
- Prefer intent-level operations: expose fewer, higher-level MCP operations instead of many low-level endpoints.
- Protect write operations: require explicit parameters, validation, and (when appropriate) approval flows; keep “read” and “write” tools separate.
- Stable schemas: return predictable JSON shapes and limit optional fields to reduce prompt brittleness.
- Observability: log MCP requests/responses (with sensitive fields redacted), monitor APIM analytics, and set alerts for failures and throttling.
- Versioning: version MCP operations and APIM APIs; deprecate safely.
- Security hygiene: treat subscription keys as secrets, rotate regularly, and avoid exposing them in prompts or logs.

Summary
As organizations scale agent-based and Copilot-driven solutions, directly exposing enterprise APIs to AI agents quickly becomes complex and risky. Centralizing API access through Azure API Management, shaping agent-ready context via a Model Context Protocol (MCP) server, and consuming those capabilities through Copilot Studio establishes a clean and governable architecture. This pattern reduces duplication, enforces consistent security controls, and enables intent-driven API consumption without exposing unnecessary backend complexity. By combining domain-aligned API products, lightweight MCP operations, and least-privilege identity-based access, enterprises can confidently scale AI agents while maintaining strong governance, observability, and operational control.

References
- Azure API Management (APIM) – Overview
- Azure API Management – Key Concepts
- Azure MCP Server Documentation (Model Context Protocol)
- Extend your agent with Model Context Protocol
- Managed identities for Azure resources – Overview

Enabling Agentic Data Governance with Hybrid Cloud Flexibility in Azure
The “Why”
Do you manage data in a complex multi-cloud environment? Are you struggling with data silos, evolving regulations, and the pressure to maintain control and compliance across on-prem and multiple clouds? Do you ever wish an intelligent assistant could help shoulder the load of data governance? If so, I can relate. Let me tell you a story that might sound familiar.

Meet Mark (pictured above). He is a data governance officer at Contoso (a fictional but very representative enterprise). Mark’s day job is ensuring data governance and compliance across his company’s vast hybrid cloud estate: think roughly 2 million data assets sprawled across 12+ datacenters on-premises and in different public clouds. Regulatory requirements are constantly shifting. Customer data is increasingly sensitive. Each department and region has its own way of doing things. Mark is fighting an uphill battle with data silos and disconnected cloud operations. He bounces between a patchwork of tools (spreadsheets, cloud consoles, governance portals), trying to answer basic questions: Where is our data? Who’s using it? Are we in compliance? Armed with an old desk calculator and a pile of paper-based reports (a perfect 1990s backdrop), he is trying to keep up with data that has exploded in volume and complexity. What if Mark had a single pane of glass, one that both reflects and acts? It would reflect the governance state and enforce compliance: a self-hydrating pane of glass accompanied by a conversational AI.

And he’s not alone. We’re all living in a data overload era. Every day, organizations generate and ingest more information than ever before. Transistors and mainframes gave way to the internet boom of the ’90s, then an explosion of mobile devices in the 2000s, social media in the 2010s, and now widespread cloud computing, all funneling data into our systems at an exponential rate.
On top of that, a new wave of AI and conversational interfaces has arrived in the mid-2020s, making data more accessible but also increasing expectations for real-time insight. It’s no wonder modern IT leaders feel overwhelmed. But these challenges are also opportunities. The way I see it, the incredible growth of data and cloud capabilities means we have a chance to reimagine data governance. The fact that I’m writing about this right now is no coincidence. My customers are looking to resolve problems in this space. In my conversations with them, I hear the same needs: We want better governance, more visibility, streamlined oversight… and, cherry on top, we want it in an “agentic” fashion. In other words, they want to delegate the grunt work to the platform toolset augmented by AI, so they can focus on higher-value tasks.

The “What”
That vision, agentic data governance with hybrid cloud flexibility, became the driver for this work. This is a modular solution: building-block-style components (cloud services, governance tools, AI agents) that you can snap together into the solution you need. Think of it as a jumpstart kit for continuous data governance across multiple clouds, with autonomous (“agentic”) assistance baked in that you can leverage and build upon. It’s not the final, productized solution; it is more a vision of what’s possible.
Contoso’s Requirements
These are the high-level requirements from Contoso:
- Data governance across clouds under one roof
- A single pane of glass dashboard consolidating reporting on the 5 governance domains:
  o Visibility on data residency and lineage
  o PII (Personally Identifiable Information) must run on a CC (Confidential Compute)
  o Security software (Defender) compliance
  o Resource tagging compliance (foundational for a good governance posture)
  o OS updates compliance
- Ability to enforce compliance in an agentic manner with a human in the loop
- Agentic enforcement of compliance pertaining to residency and confidential compute

Solution – The breakdown
The solution is comprised of 8 modules addressing these requirements. These solution modules are:
1. Foundational (Landing zones, Data Sources, Operational setup, Policies, etc.)
2. Dashboard Hydration + Agentic Reporting – Residency Compliance
3. Dashboard Hydration + Agentic Reporting – Confidential Compute for PII Compliance
4. Dashboard Hydration + Agentic Reporting – MS Defender Compliance
5. Dashboard Hydration + Agentic Reporting – Resource Tag Compliance
6. Dashboard Hydration + Agentic Reporting – OS Updates/Patch Compliance
7. Enforce Compliance via Copilot Agent – Residency Compliance
8. Enforce Compliance via Copilot Agent – CC PII Compliance

Solution – The architecture view
These are the main technical components that make up the solution architecture:
- Data sources of all shapes and sizes on the left, governed by the native Azure or the Arc plane
- Additional Azure services across the bottom layer for the foundational governance posture
- Microsoft Purview, in the top middle, as the unified data governance platform
- Microsoft Fabric, in the bottom middle, as the end-to-end ingestion and analytics platform
- Microsoft Power Platform, on the right, as the low code/no code business flow and the copilot agent experience

Solution – The end user view
So how does Mark see this solution as a data governance officer?
He doesn’t see all the intricacies of the solution integration and the logic execution. He sees two things:
- A Power BI dashboard running on Microsoft Fabric with:
  o A compliance dashboard with an overall score in each of the five compliance domains, alongside scores for each of the data products across these domains
  o Additional reporting views for more granular reporting
  o A Fabric-based pipeline that hydrates the underlying semantic models from various sources to keep the reports fresh and current
- A Copilot agent (in Teams) for both:
  o Reporting on all compliance domains
  o Enforcing in-scope compliance across selected domains
The agent takes care of the rest: it queries Fabric’s semantic model, calls Azure Function endpoints, updates Purview glossary terms, applies Azure tags, and sends Teams notifications.

The “How” – Residency Compliance
Let’s pick a few modules to walk through how they work together to give Mark a cohesive agentic governance experience. It’s Monday morning, and Mark logs into the Contoso governance portal with a cup of coffee in hand. Instead of a dozen browser tabs, he has two main tools open: the Data Governance Dashboard and the Contoso Governance Copilot agent. To address some inquiries assigned to him as actions, he chats with the agent. During this interaction, he not only checks whether any residency information is missing in the unified data governance platform (Purview), but also resolves a mismatch between Purview and an Azure resource, based on the designed principles. Here is the snippet of the chat:

Now, under the hood, several components have worked on behalf of the agent to perform this governance check and apply the necessary course of action. Even before Mark's conversation with the agent, an ongoing hydration process keeps the Fabric Power BI dashboard up to date.
Dashboard Hydration + Agentic Reporting – Residency Compliance
- A Fabric notebook runs the residency scorecard code block through a pipeline.
- It reads two Lakehouse tables containing the latest residency information from Purview and the approved region list.
- Then, the notebook gets a Microsoft Entra bearer token.
- Once acquired, the notebook calls an Azure Function endpoint.
- This endpoint then searches for the Azure resources associated with the data products in Purview using an Azure resource tag.
- The notebook then compares the declared Purview residency with the approved region list and the associated resource’s region.
- The notebook then calculates the final 0 / 25 / 50 / 75 / 100 residency compliance score and a reason. For example: a data product without an associated Azure resource gets a 0, while a data product whose residency in Purview is a Contoso-approved region and also matches the associated Azure resource gets a 100.
- It then writes the results to the relevant residency compliance Lakehouse tables.
- The dedicated compliance table then feeds the semantic model for reporting.
- The compliance Power BI dashboard is hydrated.

Enforce Compliance via Copilot Agent – Residency Compliance
With the dashboard data regularly updated, the agent used the following logic, the updated reporting data, and the actions at its disposal during the earlier conversation with Mark:
- Mark initiates the conversation with the agent.
- The agent calls a Power Automate flow.
- This flow retrieves Purview’s residency information stored in the Fabric semantic model.
- Steps 5, 6, 7 and 8: When Mark asks to investigate a data product further, the agent carries the conversation using a topic, which leverages a flow, which in turn uses a Power Automate custom connector to access an Azure Function endpoint.
- This endpoint then retrieves the latest glossary (residency) information about the data product in question from Purview and provides a preview back to the user.
- Steps 10, 11, 12 and 13: If the update criteria are met, there is no conflict, and Mark gives his blessing, the topic calls another flow to access the Functions Purview Update endpoint and make the glossary (residency) update in Purview for that data product.

The “How” – Confidential Compute for PII Compliance
Dashboard Hydration + Agentic Reporting – Confidential Compute for PII Compliance
The following snippet shows how Mark addresses the compliance risk with a critical data product (application), S/4 HANA, and performs the necessary compliance actions, such as tagging the associated resources and notifying the data product owners via a Teams channel. The following diagram shows the under-the-hood hydration flow for confidential compute compliance:

Enforce Compliance via Copilot Agent – CC PII Compliance
Finally, the diagram below shows how Mark’s conversation flows through the main solution components:

Outcome
Stepping back, what did we accomplish for Mark and Contoso? We turned an onslaught of governance challenges into an opportunity to modernize how data is managed. This gave Mark:
- Centralized visibility into data assets across the landscape through Purview and a unified dashboard
- Proactive compliance enabled with automated checks, controlled with Purview exports and Fabric pipeline schedules
- Compliance enforcement using an agent
- Hybrid cloud consistency, by using Azure Arc and a foundational data plane management setup
- Reduced operational overhead with agentic reporting and compliance
Though the solution is comprised of a wide variety of components/services, it is built from standard building blocks and is relatively simple to implement. In total, the solution combined around a dozen Azure services and over 40 distinct components (from Purview catalogs to data pipelines to custom functions and flows). You can choose to implement some or all of the compliance domains. Or, better yet, build upon them, create new domains, and pave new paths.
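Returning to the residency scorecard from the hydration steps earlier, the tiered scoring can be sketched in a few lines of Python. Only the two end points come from the description above (0 when there is no associated Azure resource, 100 when the declared residency is approved and matches the resource). The 25, 50, and 75 tiers, the function name `residency_score`, and the sample region names are assumptions made for illustration; the actual notebook may assign them differently.

```python
from typing import Optional


def residency_score(
    declared_region: Optional[str],
    resource_region: Optional[str],
    approved_regions: set[str],
) -> tuple[int, str]:
    """Return (score, reason) for one data product."""
    if resource_region is None:
        # Described in the post: no associated Azure resource scores 0.
        return 0, "No associated Azure resource found"
    if declared_region is None:
        # Assumed tier (not specified in the post).
        return 25, "No residency declared in Purview"
    if declared_region not in approved_regions:
        # Assumed tier (not specified in the post).
        return 50, "Declared residency is not an approved region"
    if declared_region != resource_region:
        # Assumed tier (not specified in the post).
        return 75, "Declared residency differs from the resource region"
    # Described in the post: approved and matching scores 100.
    return 100, "Approved residency matches the resource region"


# Hypothetical approved-region list for the example.
approved = {"westeurope", "germanywestcentral"}

print(residency_score("westeurope", "westeurope", approved))
print(residency_score("westeurope", None, approved))
```

Returning a reason alongside the score mirrors the "score and a reason" output described in the hydration steps, which is what lets a dashboard or agent explain why a data product is non-compliant.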
Wrap-up
I believe many enterprises could take a similar journey. If you’re facing these issues, consider this an invitation to think differently about data governance. Start with the pieces you already have (your own building blocks of cloud services and data) and imagine what you could build. Chances are that a lot of the heavy lifting can be orchestrated with today’s technology. And with the rise of AI copilots, the dream of agentic data governance, where your policies are continuously enforced by smart agents, is no longer science fiction. It’s here, right now, waiting for you to take it for a spin.

Next steps
- Watch the video narrative on the SAP on Azure YouTube channel:
- Build it with the GitHub Repository: https://github.com/moazmirza/data-sov-and-hyb-cloud
- Comments/questions: Here, or @ LinkedIn /moazmirza

Solution Selfies
- Azure Policy Compliance - Foundational Governance Posture
- Purview Data Product Catalog and Data Lineage
- Purview Governance Metadata → Fabric Lakehouse
- Fabric Semantic Model
- Additional Fabric Power BI Dashboard
- Copilot Studio Topic Flow
- Azure Function Endpoints