best practices
1672 TopicsSunderland City Profile: Frontier transformation in practice
Download the SmartCitiesWorld City Profile – Sunderland Cities everywhere are facing the same pressure: modernize infrastructure, grow the economy, and improve quality of life, without widening inequality. Sunderland offers a credible path forward. Once defined by shipbuilding and coal mining, Sunderland has spent the last four decades deliberately reinventing itself. Today, it is positioning itself as the UK’s leading smart city by investing in digital infrastructure, data, and low‑carbon innovation to drive inclusive, long‑term growth. The latest City Profile from SmartCitiesWorld captures how this strategy is being executed and why it matters for city leaders globally. A digital backbone built for outcomes, not optics Sunderland’s progress starts with a clear foundation: connectivity and data designed with purpose. Full‑fibre connectivity across the city Citywide 5G and LoRaWAN coverage A secure, cloud‑based smart city data platform Together, this stack enables real‑time visibility across transport, environment, and public services. More importantly, it shifts the city from reactive decision‑making to proactive, evidence‑led operations. The impact is measurable. Data and analytics now support: Safer, more predictable event planning Smarter traffic and mobility management Earlier environmental interventions More targeted social and health services From digital health hubs that reduce exclusion to intelligent transport pilots that cut emissions and improve safety, Sunderland is applying technology where it delivers the highest public value—not where it looks most impressive on a slide. What comes next: two opportunities to scale impact The City Profile also highlights where cities like Sunderland can go further. Two opportunities stand out. Move from smart services to predictive city operations With real‑time data already in place, the next step is predictive modeling—anticipating demand across social care, transport, energy, and public safety before pressure points emerge. Done right, this enables earlier investment decisions, lower long‑term costs, and better outcomes across services. Turn digital inclusion into a workforce engine Sunderland’s digital health hubs create a foundation for something bigger: linking access and digital skills directly to workforce development. By aligning inclusion efforts with local demand in advanced manufacturing, data, and clean energy, cities can convert access into sustained economic mobility. Why Sunderland’s approach matters Sunderland’s experience reinforces a critical point: smart city transformation is not about technology in isolation. It is about aligning infrastructure, data, governance, and community priorities around a shared vision for inclusive growth. For public‑sector leaders moving from ambition to execution, the full City Profile provides practical insight into the partnerships, operating models, and decisions behind Sunderland’s approach. It’s a useful reference for anyone looking to translate a digital‑first strategy into measurable impact—for people, place, and long‑term resilience.24Views0likes0CommentsUsing Azure API Management with Azure Front Door for Global, Multi‑Region Architectures
Modern API‑driven applications demand global reach, high availability, and predictable latency. Azure provides two complementary services that help achieve this: Azure API Management (APIM) as the API gateway and Azure Front Door (AFD) as the global entry point and load balancer. Going over the available documentation available, my team and I found this article on how to front a single-region APIM with an Azure Front Door , but we wanted to extend this to a multi-region APIM as well. That led us to design the solution detailed in this article which explains how to configure multi‑regional, active‑active APIM behind Azure Front Door using Custom origins and regional gateway endpoints. (I have also covered topics like why organizations commonly pair APIM with Front Door, when to use internal vs. external APIM modes, etc. but main topic first! Scroll down to the bottom for more info). Configuring Multi‑Regional APIM with Azure Front Door WHAT TO KNOW: If using APIM Premium with multi‑region gateways, each region exposes its own regional gateway endpoint, formatted as: https://<service-name>-<region>-01.regional.azure-api.net Examples: https://mydemo-apim-westeurope-01.regional.azure-api.net https://mydemo-apim-eastus-01.regional.azure-api.net where 'mydemo' is the name of the APIM instance. You will use these regional endpoints and configure them as a separate origin in Azure Front Door—using the Custom origin type. Solution Architecture Azure Front Door Configuration Steps 1. Create an Origin Group Inside your Front Door profile, define a group (Settings -> Origin Groups - > Add -> Add an origin) that will contain all APIM regional gateways. See images below: 2. Add Each APIM Region as a Custom Origin Use the Custom origin type: Origin type: Custom Host name: Use the APIM regional endpoint Example: mydemo-apim-westeurope-01.regional.azure-api.net Origin host header: Same as the host name. Enable certificate subject name validation (Recommended when private link or TLS integrity is required.) Priority: Lower value = higher priority (for failover). Weight: Controls how traffic is distributed across equally prioritized origins. Status: Enable origin. And repeat the same steps for additional APIM regions giving them priority and weightage as you feel appropriate. How to Know Which Region is being Invoked To test this setup, create 2 Virtual Machines (VMs) in Azure - one for each region. For this guide, we chose to create the VMs in West Europe and East US. Open up a Command Prompt from the VM and do a curl on the sample Echo API that comes with every new APIM deployment: Example: curl -v "afd-blah.b01.azurefd.net/echo/resource?param1=sample" Your results should show the region being hit as shown below: How AFD Routes Traffic Across Multiple APIM Regions AFD evaluates origins in this order: Available instances — the Health Probe removes unhealthy origins Priority — selects highest‑priority available origins Latency — optionally selects lowest‑latency pool Weight — round‑robin distribution across selected origins Example When origins are configured as below: West Europe (priority 1, weight 1000) East US (priority 1, weight 500) Central US (priority 2, weight 1000) AFD will: Use West Europe + East US in a 1000:500 ratio. Only use Central US if both West Europe & East US become unavailable. For more information on this nice algorithm, see here: Traffic routing methods to origin - Azure Front Door | Microsoft Learn More Info (as promised) Why Use Azure API Management? Azure API Management is a fully managed service providing: 1. Centralized API Gateway Enforces policies such as authentication, rate limiting, transformations, and caching. Acts as a single façade for backend services, enabling modernization without breaking existing clients. 2. Security & Governance Integrates with Azure AD, OAuth2, and mTLS (mutual TLS). Provides threat protection and schema validation. 3. Developer Ecosystem Developer portal, API documentation, testing console, versioning, and releases. 4. Multi‑Region Gateways (Premium Tier) Allows deployment of additional regional gateways for active‑active, low‑latency global experiences. APIM Deployment Modes: Internal vs. External External Mode The APIM gateway is reachable publicly over the internet. Common when: Exposing APIs to partners, mobile apps, or public clients. You can easily front this with an Azure Front Door for reasons listed in the next section. Internal Mode APIM gateway is deployed inside a VNet, accessible only privately. Used when: APIs must stay private to an enterprise network. Only internal consumers/VPN/VNet peered systems need access. To make your APIM publicly accessible, you need to front it with both an Application Gateway and an Azure Front Door because: Azure Front Door (AFD) cannot directly reach an internal‑mode APIM because AFD requires a publicly routable origin. Application Gateway is a Layer‑7 reverse proxy that can expose a public frontend while still reaching internal private backends (like APIM gateway). [Ref] But Why Put Azure Front Door in Front of API Management? Azure Front Door provides capabilities that APIM alone does not offer: 1. Global Load Balancing As discussed above. 2. Edge Security Web Application Firewall, TLS termination at the edge, DDoS absorption. Reduces load on API gateways. 3. Faster Global Performance Anycast network and global POPs reduce round‑trip latency before requests hit APIM. A POP (Point of Presence) is an Azure Front Door edge location—a physical site in Microsoft’s global network where incoming user traffic first lands. Azure Front Door uses numerous global and local POPs strategically placed close to end‑users (both enterprise and consumer) to improve performance. Anycast is a networking protocol Azure Front Door uses to improve global connectivity. Ref: Traffic acceleration - Azure Front Door | Microsoft Learn 4. Unified Global Endpoint A single public endpoint (e.g., https://api.contoso.com) that intelligently distributes traffic across multiple APIM regions. With all of the above features, it is best to pair API Management with a Front Door, especially when dealing with multi-region architectures. Credits: Junee Singh, Senior Solution Engineer at Microsoft Isiah Hudson, Senior Solution Engineer at MicrosoftWriting Effective Prompts for Testing Scenarios: AI Assisted Quality Engineering
AI-assisted testing is no longer an experiment confined to innovation labs. Across enterprises, quality engineering teams are actively shifting from manual-heavy testing approaches to AI-first QA, where tools like GitHub Copilot participate throughout the SDLC—from requirement analysis to regression triage. Yet, despite widespread adoption, most teams are only scratching the surface. They use AI to “generate test cases” or “write automation,” but struggle with inconsistent outputs, shallow coverage, and trust issues. The root cause is rarely the model, it’s prompt design. This blog moves past basic prompting tips to cover QA practices, focusing on effective prompt design and common pitfalls. It notes that adopting AI in testing is a gradual process of ongoing transformation rather than a quick productivity gain. Why Effective Prompting Is Necessary in Testing At its core, testing is about asking the right questions of a system. When AI enters the picture, prompts become the mechanism through which those questions are asked. A vague or incomplete prompt is no different from an ambiguous test requirement—it leads to weak coverage and unreliable results. Poorly written prompts often result in generic or shallow test cases, incomplete UI or API coverage, incorrect automation logic, or superficial regression analysis. This increases rework and reduces trust in AI-generated outputs. In contrast, well-crafted prompts dramatically improve outcomes. They help expand UI and API test coverage, accelerate automation development, and enable faster interpretation of regression results. More importantly, they allow testers to focus on risk analysis and quality decisions instead of repetitive tasks. In this sense, effective prompting doesn’t replace testing skills—it amplifies them. Industry Shift: Manual QA to AI-First Testing Lifecycle Modern QA organizations are undergoing three noticeable shifts. First, there is a clear move away from manual test authoring toward AI-augmented test design. Testers increasingly rely on AI to generate baseline coverage, allowing them to focus on risk analysis, edge cases, and system behavior rather than repetitive documentation. Second, enterprises are adopting agent-based and MCP-backed testing, where AI systems are no longer isolated prompt responders. They operate with access to application context—OpenAPI specs, UI flows, historical regressions, and even production telemetry—making outputs significantly more accurate and actionable. Third, teams are seeing tangible SDLC impact. Internally reported metrics across multiple organizations show faster test creation, reduced regression cycle time, and earlier defect detection when Copilot-style tools are used correctly. The key phrase here is correct. Poor prompt neutralizes these benefits almost immediately. Prerequisites GitHub Copilot access in a supported IDE (VS Code, JetBrains, Visual Studio) An appropriate model (advanced reasoning models for workflows and analysis) Basic testing fundamentals (AI amplifies skill; it does not replace it) (Optional but powerful) Context providers / MCP servers for specs, docs, and reports Prompting - A Designing skill with Examples Most testers treat prompts as instructions. Mature teams treat them as design artifacts. Effective prompts should be intentional, layered, and defensive. They should not just ask for output, but control how the AI reasons, what assumptions it can make, and how uncertainty is handled. Pattern 1: Role-Based Prompting Assigning a role fundamentally changes the AI’s reasoning depth. Instead of: “Generate test cases for login.” Use: This pattern consistently results in better prioritization, stronger negative scenarios, and fewer superficial cases. Pattern 2: Few-Shot Prompting with Test Examples AI aligns faster when shown what “good” looks like. Providing even a single example test case or automation snippet dramatically improves consistency in AI-generated outputs, especially when multiple teams are involved. Concrete examples help align the AI with expected automation structure, enforce naming conventions, influence the depth and quality of assertions, and standardize reporting formats. By showing what “good” looks like, teams reduce variation, improve maintainability, and make AI-generated assets far easier to review and extend. Pattern 3: Provide Rich Context and Clear Instructions Copilot works best when it understands the surrounding context of what you are testing. The richer the context, the higher the quality of the output—whether you are generating manual test cases, automation scripts, or regression insights. When writing prompts clearly describe the application type (web, mobile, UI, API), the business domain, the feature or workflow under test, and the relevant user roles or API consumers. Business rules, constraints, assumptions, and exclusions should also be explicitly stated. Where possible, include structured instructions in an Instructions .md file and pass it as context to the Copilot agent. You can also attach supporting assets—such as Swagger screenshots or UI flow diagrams—to further ground the AI’s understanding. The result is more concise, accurate output that aligns closely with your system’s real behavior and constraints. Below is an example of how rich context can aid in efficient output Below example shows how to give clear instructions to GHCP that helps AI to handle the uncertainty and exceptions to adhere Prompt Anti-Patterns to Avoid Most AI failures in QA are self-inflicted. The following anti-patterns show up repeatedly in enterprise teams. Overloaded prompts that request UI tests, API tests, automation, and analysis in one step Natural language overuse where structured output (tables, JSON, code templates) is required Automation prompts without environment details (browser, framework, auth, data) Contradictory instructions, such as asking for “detailed coverage” and “keep it minimal” simultaneously The AI-Assisted QA Maturity Model Prompting is not a one-time tactic—it is a capability that matures over time. The levels below represent how increasing sophistication in prompt design directly leads to more advanced, reliable, and impactful testing outcomes. Level 1 – Prompt-Based Test Generation AI is primarily used to generate manual test cases, scenarios, and edge cases from requirements or user stories. This level improves test coverage and speeds up test design but still relies heavily on human judgment for validation, prioritization, and execution. Level 2 – AI-Assisted Automation AI moves beyond documentation and actively supports automation by generating framework-aligned scripts, page objects, and assertions. Testers guide the AI with clear constraints and patterns, resulting in faster automation development while retaining full human control over architecture and execution. Level 3 – AI-Led Regression Analysis At this stage, AI assists in analyzing regression results by clustering failures, identifying recurring patterns, and suggesting likely root causes. Testers shift from manually triaging failures to validating AI-generated insights, significantly reducing regression cycle time. Level 4 – MCP-Integrated, Agentic Testing AI operates with deep system context through MCP servers, accessing specifications, historical test data, and execution results. It can independently generate, refine, and adapt tests based on system changes, enabling semi-autonomous, context-aware quality engineering with human oversight. Best Practices for Prompt-Based Testing Prioritize context over brevity Treat prompts as test specifications Iterate instead of rewriting from scratch Experiment with models when outputs miss intent Always validate AI-generated automation and analysis Maintain reusable prompt templates for UI testing, API testing, automation, and regression analysis Final Thoughts: Prompting as a Core QA Capability Effective prompt improves coverage, accelerates delivery, and elevates QA from execution to engineering. It turns Copilot from a code generator into a quality partner. The next use case in line is going beyond functional flows and understanding how AI prompting can aid for – Automation framework enhancements, Performance testing prompts, Accessibility testing prompts, Data quality testing prompts. Stay tuned for upcoming blogs!!Optimising AI Costs with Microsoft Foundry Model Router
Microsoft Foundry Model Router analyses each prompt in real-time and forwards it to the most appropriate LLM from a pool of underlying models. Simple requests go to fast, cheap models; complex requests go to premium ones, all automatically. I built an interactive demo app so you can see the routing decisions, measure latencies, and compare costs yourself. This post walks through how it works, what we measured, and when it makes sense to use. The Problem: One Model for Everything Is Wasteful Traditional deployments force a single choice: Strategy Upside Downside Use a small model Fast, cheap Struggles with complex tasks Use a large model Handles everything Overpay for simple tasks Build your own router Full control Maintenance burden; hard to optimise Most production workloads are mixed-complexity. Classification, FAQ look-ups, and data extraction sit alongside code analysis, multi-constraint planning, and long-document summarisation. Paying premium-model prices for the simple 40% is money left on the table. The Solution: Model Router Model Router is a trained language model deployed as a single Azure endpoint. For each incoming request it: Analyses the prompt — complexity, task type, context length Selects an underlying model from the routing pool Forwards the request and returns the response Exposes the choice via the response.model field You interact with one deployment. No if/else routing logic in your code. Routing Modes Mode Goal Trade-off Balanced (default) Best cost-quality ratio General-purpose Cost Minimise spend May use smaller models more aggressively Quality Maximise accuracy Higher cost for complex tasks Modes are configured in the Foundry Portal, no code change needed to switch. Building the Demo To make routing decisions tangible, we built a React + TypeScript app that sends the same prompt through both Model Router and a fixed standard deployment (e.g. GPT-5-nano), then compares: Which model the router selected Latency (ms) Token usage (prompt + completion) Estimated cost (based on per-model pricing) Select a prompt, choose a routing mode, and hit Run Both to compare side-by-side What You Can Do 10 pre-built prompts spanning simple classification to complex multi-constraint planning Custom prompt input enter any text and benchmarks run automatically Three routing modes switch and re-run to see how distribution changes Batch mode run all 10 prompts in one click to gather aggregate stats API Integration The integration is a standard Azure OpenAI chat completion call. The only difference is the deployment name ( model-router instead of a specific model): const response = await fetch( `${endpoint}/openai/deployments/model-router/chat/completions?api-version=2024-10-21`, { method: 'POST', headers: { 'Content-Type': 'application/json', 'api-key': apiKey, }, body: JSON.stringify({ messages: [{ role: 'user', content: prompt }], max_completion_tokens: 1024, }), } ); const data = await response.json(); // The key insight: response.model reveals the underlying model const selectedModel = data.model; // e.g. "gpt-5-nano-2025-08-07" That data.model field is what makes cost tracking and distribution analysis possible. Results: What the Data Shows We ran all 10 prompts through both Model Router (Balanced mode) and a fixed standard deployment. Note: Results vary by run, region, model versions, and Azure load. These numbers are from a representative sample run. Side-by-side comparison across all 10 prompts in Balanced mode Summary Metric Router (Balanced) Standard (GPT-5-nano) Avg Latency ~7,800 ms ~7,700 ms Total Cost (10 prompts) ~$0.029 ~$0.030 Cost Savings ~4.5% — Models Used 4 1 Model Distribution The router used 4 different models across 10 prompts: Model Requests Share Typical Use gpt-5-nano 5 50% Classification, summarisation, planning gpt-5-mini 2 20% FAQ answers, data extraction gpt-oss-120b 2 20% Long-context analysis, creative tasks gpt-4.1-mini 1 10% Complex debugging & reasoning Routing distribution chart — the router favours efficient models for simple prompts Across All Three Modes Metric Balanced Cost-Optimised Quality-Optimised Cost Savings ~4.5% ~4.7% ~14.2% Avg Latency (Router) ~7,800 ms ~7,800 ms ~6,800 ms Avg Latency (Standard) ~7,700 ms ~7,300 ms ~8,300 ms Primary Goal Balance cost + quality Minimise spend Maximise accuracy Model Selection Mixed (4 models) Prefers cheaper Prefers premium Cost-optimised mode — routes more aggressively to nano/mini models Quality-optimised mode — routes to larger models for complex tasks Analysis What Worked Well Intelligent distribution The router didn't just default to one model. It used 4 different models and mapped prompt complexity to model capability: simple classification → nano, FAQ answers → mini, long-context documents → oss-120b, complex debugging → 4.1-mini. Measurable cost savings across all modes 4.5% in Balanced, 4.7% in Cost, and 14.2% in Quality mode. Quality mode was the surprise winner by choosing faster, cheaper models for simple prompts, it actually saved the most while still routing complex requests to capable models. Zero routing logic in application code One endpoint, one deployment name. The complexity lives in Azure's infrastructure, not yours. Operational flexibility Switch between Balanced, Cost, and Quality modes in the Foundry Portal without redeploying your app. Need to cut costs for a high-traffic period? Switch to Cost mode. Need accuracy for a compliance run? Switch to Quality. Future-proofing As Azure adds new models to the routing pool, your deployment benefits automatically. No code changes needed. Trade-offs to Consider Latency is comparable, not always faster In Balanced mode, Router averaged ~7,800 ms vs Standard's ~7,700 ms nearly identical. In Quality mode, the Router was actually faster (~6,800 ms vs ~8,300 ms) because it chose more efficient models for simple prompts. The delta depends on which models the router selects. Savings scale with workload diversity Our 10-prompt test set showed 4.5–14.2% savings. Production workloads with a wider spread of simple vs complex prompts should see larger savings, since the router has more opportunity to route simple requests to cheaper models. Opaque routing decisions You can see which model was picked via response.model , but you can't see why. For most applications this is fine; for debugging edge cases you may want to test specific prompts in the demo first. Custom Prompt Testing One of the most practical features of the demo is testing your own prompts before committing to Model Router in production. Enter any prompt `the quantum computing example is a medium-complexity educational prompt` Benchmarks execute automatically, showing the selected model, latency, tokens, and cost Workflow: Click ✏️ Custom in the prompt selector Enter your production-representative prompt Click ✓ Use This Prompt — Router and Standard run automatically Compare results — repeat with different routing modes Use the data to inform your deployment strategy This lets you predict costs and validate routing behaviour with your actual workload before going to production. When to Use Model Router Great Fit Mixed-complexity workloads — chatbots, customer service, content pipelines Cost-sensitive deployments — where even single-digit percentage savings matter at scale Teams wanting simplicity — one endpoint beats managing multi-model routing logic Rapid experimentation — try new models without changing application code Consider Carefully Ultra-low-latency requirements — if you need sub-second responses, the routing overhead matters Single-task, single-model workloads — if one model is clearly optimal for 100% of your traffic, a router adds complexity without benefit Full control over model selection — if you need deterministic model choice per request Mode Selection Guide Is accuracy critical (compliance, legal, medical)? Is accuracy critical (compliance, legal, medical)? └─ YES → Quality-Optimised └─ NO → Strict budget constraints? └─ YES → Cost-Optimised └─ NO → Balanced (recommended) Best Practices Start with Balanced mode — measure actual results, then optimise Test with your real prompts — use the Custom Prompt feature to validate routing before production Monitor model distribution — track which models handle your traffic over time Compare against a baseline — always keep a standard deployment to measure savings Review regularly — as new models enter the routing pool, distributions shift Technical Stack Technology Purpose React 19 + TypeScript 5.9 UI and type safety Vite 7 Dev server and build tool Tailwind CSS 4 Styling Recharts 3 Distribution and comparison charts Azure OpenAI API (2024-10-21) Model Router and standard completions Security measures include an ErrorBoundary for crash resilience, sanitised API error messages, AbortController request timeouts, input length validation, and restrictive security headers. API keys are loaded from environment variables and gitignored. Source: leestott/router-demo-app: An interactive web application demonstrating the power of Microsoft Foundry Model Router - an intelligent routing system that automatically selects the optimal language model for each request based on complexity, reasoning requirements, and task type. ⚠️ This demo calls Azure OpenAI directly from the browser. This is fine for local development. For production, proxy through a backend and use Managed Identity. Try It Yourself Quick Start git clone https://github.com/leestott/router-demo-app/ cd router-demo-app # Option A: Use the setup script (recommended) # Windows: .\setup.ps1 -StartDev # macOS/Linux: chmod +x setup.sh && ./setup.sh --start-dev # Option B: Manual npm install cp .env.example .env.local # Edit .env.local with your Azure credentials npm run dev Open http://localhost:5173 , select a prompt, and click ⚡ Run Both. Get Your Credentials Go to ai.azure.com → open your project Copy the Project connection string (endpoint URL) Navigate to Deployments → confirm model-router is deployed Get your API key from Project Settings → Keys Configuration Edit .env.local : VITE_ROUTER_ENDPOINT=https://your-resource.cognitiveservices.azure.com VITE_ROUTER_API_KEY=your-api-key VITE_ROUTER_DEPLOYMENT=model-router VITE_STANDARD_ENDPOINT=https://your-resource.cognitiveservices.azure.com VITE_STANDARD_API_KEY=your-api-key VITE_STANDARD_DEPLOYMENT=gpt-5-nano Ideas for Enhancement Historical analysis — persist results to track routing trends over time Cost projections — estimate monthly spend based on prompt patterns and volume A/B testing framework — compare modes with statistical significance Streaming support — show model selection for streaming responses Export reports — download benchmark data as CSV/JSON for further analysis Conclusion Model Router addresses a real problem: most AI workloads have mixed complexity, but most deployments use a single model. By routing each request to the right model automatically, you get: Cost savings (~4.5–14.2% measured across modes, scaling with volume) Intelligent distribution (4 models used, zero routing code) Operational simplicity (one endpoint, mode changes via portal) Future-proofing (new models added to the pool automatically) The latency trade-off is minimal — in Quality mode, the Router was actually faster than the standard deployment. The real value is flexibility: tune for cost, quality, or balance without touching your code. Ready to try it? Clone the demo repository, plug in your Azure credentials, and test with your own prompts. Resources Model Router Benchmark Sample Sample App Model Router Concepts Official documentation Model Router How-To Deployment guide Microsoft Foundry Portal Deploy and manage Model Router in the Catalog Model listing Azure OpenAI Managed Identity Production auth Built to explore Model Router and share findings with the developer community. Feedback and contributions welcome, open an issue or PR on GitHub.From AI pilots to public decisions: what it really takes to close the intelligence gap
Across the public sector, the conversation about AI has shifted. The question is no longer whether AI can generate insight—most leaders have already seen impressive pilots. The harder question is whether those insights survive the realities of government: public scrutiny, auditability, cross‑department delivery, and the need to explain decisions in plain language. That challenge was recently articulated by Sadaf Mozaffarian, writing in Smart Cities World, in the context of city‑scale AI deployments. Governments don’t need more experiments. They need decision‑ready intelligence—intelligence that can be acted on safely, governed consistently, and defended when outcomes are questioned. What’s emerging now is a more operational lens on AI adoption, one that exposes two issues many pilots quietly avoid. Decision latency is the real enemy In government, decision latency is not about slow analytics, it’s the time lost between having a signal and being able to act on it with confidence. Much of the focus in AI discussions is on accuracy, bias, or model performance. But in cities, the more damaging problem is often this latency. When data is fragmented across departments, policies live in PDFs, and institutional knowledge walks out the door at 5pm, leaders may have insight but still can’t decide fast enough. AI pilots often demonstrate answers in isolation, but they don’t reduce the friction between insight, approval, and execution. Decision‑ready intelligence directly attacks this problem. It brings together: Operational data already trusted by the organization Policy and regulatory context that constrains decisions Human checkpoints that reflect how accountability actually works The result isn’t faster answers—it’s faster decisions that stick, because they align with how governments are structured to operate. Institutional memory is infrastructure Cities invest heavily in physical infrastructure—roads, pipes, facilities—but far less deliberately in institutional memory. Yet planning rationales, inspection notes, precedent cases, and prior decisions are often what make or break today’s choices. Consider a routine enforcement or permitting decision that looks reasonable on current data, but quietly contradicts a prior settlement, a regulator’s interpretation, or a lesson learned during a past inquiry. AI systems that don’t account for this history don’t just miss context, they create risk. Decision‑ready intelligence treats institutional memory as a first‑class asset. It ensures that when AI supports a decision, it does so with: Access to relevant historical records and prior outcomes Clear lineage back to source documents and policies Logging that preserves not just what was decided, but why This is what allows governments to move faster without relearning the same lessons under audit pressure. Why this matters now Public sector AI initiatives rarely fail because of a lack of ambition. They stall because trust questions—governance, records, explainability—arrive too late. By the time leaders ask, “Can we stand behind this decision?” the system was never designed to answer. Decision‑ready intelligence flips that sequence. Governance is not bolted on after the pilot; it’s built into the operating model from the start. That’s what allows agencies to scale from a single use case to repeatable patterns across departments. A practical starting point The cities making progress aren’t trying to transform everything at once. They start small but visible: Identify one cross‑department “moment of truth” Define what must be logged, retained, and explainable Connect just enough data, policy, and work context to support that decision From there, they reuse the same patterns—governed data products, policy knowledge bases, and human‑in‑the‑loop workflows—to scale responsibly. AI in government will ultimately be judged the same way every public investment is judged: by outcomes, fairness, and public confidence. Closing the intelligence gap isn’t about smarter models. It’s about designing decision systems that reflect how governments actually work—and are held accountable. Learn more by reading Sadaf's full article: Closing the intelligence gap: how cities turn AI experiments into operational impact87Views0likes0CommentsUnifying Scattered Observability Data from Dynatrace + Azure for Self-Healing with SRE Agent
What if your deployments could fix themselves? The Deployment Remediation Challenge Modern operations teams face a recurring nightmare: A deployment ships at 9 AM Errors spike at 9:15 AM By the time you correlate logs, identify the bad revision, and execute a rollback—it's 10:30 AM Your users felt 75 minutes of degraded experience The data to detect and fix this existed the entire time—but it was scattered across clouds and platforms: Error logs and traces → Dynatrace (third-party observability cloud) Deployment history and revisions → Azure Container Apps API Resource health and metrics → Azure Monitor Rollback commands → Azure CLI Your observability data lives in one cloud. Your deployment data lives in another. Stitching together log analysis from Dynatrace with deployment correlation from Azure—and then executing remediation—required a human to manually bridge these silos. What if an AI agent could unify data from third-party observability platforms with Azure deployment history and act on it automatically—every week, before users even notice? Enter SRE Agent + Model Context Protocol (MCP) + Subagents Azure SRE Agent doesn't just work with Azure. Using the Model Context Protocol (MCP), you can connect external observability platforms like Dynatrace directly to your agent. Combined with subagents for specialized expertise and scheduled tasks for automation, you can build an automated deployment remediation system. Here's what I built/configured for my Azure Container Apps environment inside SRE Agent: Component Purpose Dynatrace MCP Connector Connect to Dynatrace's MCP gateway for log queries via DQL 'Dynatrace' Subagent Log analysis specialist that executes DQL queries and identifies root causes 'Remediation' Subagent Deployment remediation specialist that correlates errors with deployments and executes rollbacks Scheduled Task Weekly Monday 9 AM health check for the 'octopets-prod-api' Container App Subagent workflow: The subagent workflow in SRE Agent Builder: 'OctopetsScheduledTask' triggers 'RemediationSubagent' (12 tools), which hands off to 'DynatraceSubagent' (3 MCP tools) for log analysis. How I Set It Up: Step by Step Step 1: Connect Dynatrace via MCP SRE Agent supports the Model Context Protocol (MCP) for connecting external data sources. Dynatrace exposes an MCP gateway that provides access to its APIs as first-class tools. Connection configuration: { "name": "dynatrace-mcp-connector", "dataConnectorType": "Mcp", "dataSource": "Endpoint=https://<your-tenant>.live.dynatrace.com/platform-reserved/mcp-gateway/v0.1/servers/dynatrace-mcp/mcp;AuthType=BearerToken;BearerToken=<your-api-token>" } Once connected, SRE Agent automatically discovers Dynatrace tools. 💡 Tip: When creating your Dynatrace API token, grant the `entities.read`, `events.read`, and `metrics.read` scopes for comprehensive access. Step 2: Build Specialized Subagents Generic agents are good. Specialized agents are better. I created two subagents that work together in a coordinated workflow—one for Dynatrace log analysis, the other for deployment remediation. DynatraceSubagent This subagent is the log analysis specialist. It uses the Dynatrace MCP tools to execute DQL queries and identify root causes. Key capabilities: Executes DQL queries via MCP tools (`create-dql`, `execute-dql`, `explain-dql`) Fetches 5xx error counts, request volumes, and spike detection Returns consolidated analysis with root cause, affected services, and error patterns 👉 View full DynatraceSubagent configuration here RemediationSubagent This is the deployment remediation specialist. It correlates Dynatrace log analysis with Azure Container Apps deployment history, generates correlation charts, and executes rollbacks when confidence is high. Key capabilities: Retrieves Container Apps revision history (`GetDeploymentTimes`, `ListRevisions`) Generates correlation charts (`PlotTimeSeriesData`, `PlotBarChart`, `PlotAreaChartWithCorrelation`) Computes confidence score (0-100%) for deployment causation Executes rollback and traffic shift when confidence > 70% 👉 View full RemediationSubagent configuration here The power of specialization: Each agent focuses on its domain—DynatraceSubagent handles log analysis, RemediationSubagent handles deployment correlation and rollback. When the workflow runs, RemediationSubagent hands off to DynatraceSubagent (bi-directional handoff) for analysis, gets the findings back, and continues with remediation. Simple delegation, not a single monolithic agent trying to do everything. Step 3: Create the Weekly Scheduled Task Now the automation. I configured a scheduled task that runs every Monday at 9:30 AM to check whether deployments in the last 4 hours caused any issues—and automatically remediate if needed. Scheduled task configuration: Setting Value Task Name OctopetsScheduledTask Frequency Weekly Day of Week Monday Time 9:30 AM Response Subagent RemediationSubagent Scheduled Task Configuration Configuring the OctopetsScheduledTask in the SRE Agent portal The key insight: the scheduled task is just a coordinator. It immediately hands off to the RemediationSubagent, which orchestrates the entire workflow including handoffs to DynatraceSubagent. Step 4: See It In Action Here's what happens when the scheduled task runs: The scheduled task triggering and initiating Dynatrace analysis for octopets-prod-api The DynatraceSubagent analyzes the logs and identifies the root cause: executing DQL queries and returning consolidated log analysis The RemediationSubagent then generates correlation charts: Finally, with a 95% confidence score, SRE agent executes the rollback autonomously: executing rollback and traffic shift autonomously. The agent detected the bad deployment, generated visual evidence, and automatically shifted 100% traffic to the last known working revision—all without human intervention. Why This Matters Before After Manually check Dynatrace after incidents Automated DQL queries via MCP Stitch together logs + deployments manually Subagents correlate data automatically Rollback requires human decision + execution Confidence-based auto-remediation 75+ minutes from deployment to rollback Under 5 Minutes with autonomous workflow Reactive incident response Proactive weekly health checks Try It Yourself Connect your observability tool via MCP (Dynatrace, Datadog, New Relic, Prometheus—any tool with an MCP gateway) Build a log analysis subagent that knows how to query your observability data Build a remediation subagent that can correlate logs with deployments and execute fixes Wire them together with handoffs so the subagents can delegate log analysis Create a scheduled task to trigger the workflow automatically Learn More Azure SRE Agent documentation Model Context Protocol (MCP) integration guide Building subagents for specialized workflows Scheduled tasks and automation SRE Agent Community Azure SRE Agent pricing SRE Agent Blogs554Views0likes0CommentsUpcoming webinar: Maximize the Cost Efficiency of AI Agents on Azure
AI agents are quickly becoming central to how organizations automate work, engage customers, and unlock new insights. But as adoption accelerates, so do questions about cost, ROI, and long-term sustainability. That’s exactly what the Maximize the Cost Efficiency of AI Agents on Azure webinar is designed to address. The webinar will provide practical guidance on building and scaling AI agents on Azure with financial discipline in mind. Rather than focusing only on technology, the session helps learners connect AI design decisions to real business outcomes—covering everything from identifying high-impact use cases and understanding cost drivers to forecasting ROI. Whether you’re just starting your AI journey or expanding AI agents across the enterprise, the session will equip you with strategies to make informed, cost-conscious decisions at every stage—from architecture and model selection to ongoing optimization and governance. Who should attend? If you are in one of these roles and are a decision maker or can influence decision makers in AI decisions or need to show ROI metrics on AI, this session is for you. Developer Administrator Solution Architect AI Engineer Business Analyst Business User Technology Manager Why attending the webinar? In the webinar, you’ll hear how to translate theory into real-world scenarios, walk through common cost pitfalls, and show how organizations are applying these principles in practice. Most importantly, the webinar helps you connect the dots faster, turning what you’ve learned into actionable insights you can apply immediately, ask questions live, and gain clarity on how to maximize ROI while scaling AI responsibly. If you care about building AI agents that are not only innovative but also efficient, governable, and financially sustainable, this training—and this webinar that complements it—are well worth your time. Register for the free webinar today for the event on March 5, 2026, 8:00 AM - 9:00 AM (UTC-08:00) Pacific Time (US & Canada). Who will speak at the webinar? Your speakers will be: Carlotta Castelluccio: Carlotta is a Senior AI Advocate with the mission of helping every developer to succeed with AI, by building innovative solutions responsibly. To achieve this goal, she develops technical content, and she hosts skilling sessions, enabling her audience to take the most out of AI technologies and to have an impact on Microsoft AI products’ roadmap. Nitya Narasimhan: Nitya is a PhD and Polyglot with 25+ years of software research & development experience spanning mobile, web, cloud and AI. She is an innovator (12+ patents), a visual storyteller (@sketchtedocs), and an experienced community builder in the Greater New York area. As a senior AI Advocate on the Core AI Developer Relations team, she acts as "developer 0" for the Microsoft Foundry platform, providing product feedback and empowering AI developers to build trustworthy AI solutions with code samples, open-source curricula and content-initiatives like Model Mondays. Prior to joining Microsoft, she spent a decade in Motorola Labs working on ubiquitous & mobile computing research, founded Google Developer Groups in New York, and consulted for startups building real-time experiences for enterprise. Her current interests span Model understanding & customization, E2E Observability & Safety, and agentic AI workflows for maintainable software. Moderator Lee Stott is a Principal Cloud Advocate at Microsoft, working in the Core AI Developer Relations Team. He helps developers and organizations build responsibly with AI and cloud technologies through open-source projects, technical guidance, and global developer programs. Based in the UK, Lee brings deep hands-on experience across AI, Azure, and developer tooling. .Autofilter Channel Posts
We manage a variety of digital products for a large user base across many different countries and want to curate our comms to limit irrelevant posts reaching users unnecessarily. For example a post about "Product A affecting Country X" should not reach "Country Y". What is the best practice here? Currently we've opted for a central channel with tags to assist users in managing their notifications, but they will still see all posts when visiting the channel and will need to self-filter. Is there a way for us to configure the channel so they only see posts they are tagged in? I know we could set up individual channels for each Product+Country combination but then any common posts we make would need to be duplicated into each channel separately. Thinking out loud, I suppose if we automate our posts with a workflow/AI then discrete channels is a good solution to keep things organised. Any ideas or feedback please?47Views0likes1Comment