Best practices for Infrastructure as Code CI/CD on Azure
Hello Folks! If your IaC repo has a dev folder, a test folder, and a prod folder that all started out identical and have since drifted in three different directions, this session is for you. At the Microsoft Azure Infrastructure Summit 2026, Jack Tracey and Jared Holgate (the team behind Azure Landing Zones and Azure Verified Modules) laid out, in plain language, how to ship Infrastructure as Code on Azure without leaking secrets, blowing up production, or duplicating thousands of lines of module code across folders. Here are the bits that matter most for IT Pros and platform engineers.
📺 Watch the session:
Why IT Pros Should Care
You are the one paged at 2am when a pipeline rolls out a broken NSG rule. You are the one carrying the cert that the deploy service principal still uses. You are the one explaining to audit why the prod plan and the prod apply ran with the same Owner-scoped identity. So this session is squarely in your lane. It covers:
Why hand-rolled modules are slowly becoming an anti-pattern on Azure.
A repo layout that scales to dozens of environments without copy-paste.
How to get rid of static client secrets and certificates, for good.
Where approvals actually need to live in GitHub vs. Azure DevOps so they cannot be bypassed.
The three-layer Terraform state model that Microsoft uses inside Azure Landing Zones.
In short, this is the practitioner version of “do IaC properly,” from the people who write the platform code Microsoft ships.
The IaC CI/CD problem
Jack opened with a slide that gets a knowing laugh from anyone who has been doing this for more than a year. You start with one repo, one Bicep file, one happy team. Eighteen months later, you have a landingzone-prod-v2-final-USE-THIS-ONE folder, a service principal whose secret expired two days ago, and a pipeline nobody dares touch. The drivers of that pain are consistent:
Modules written from scratch, never tested the same way twice.
Per-environment folders that diverge silently over time.
Long-lived secrets and certificates sitting in pipeline variables.
One identity doing both plan and apply, with Owner on the management group.
No approvals, or approvals in the wrong place.
No tests until the deploy fails in prod.
The good news is that none of these problems are new, and the patterns to fix them are well understood. The session walks through them in the order you would actually adopt them.
Patterns that work in production
1. Don’t write modules. Consume Azure Verified Modules.
This is best practice number one, and Jack and Jared spent a full chapter on it for a reason. Azure Verified Modules (AVM) is the official Microsoft initiative that consolidates IaC modules for Azure into a single, supported, Well-Architected-aligned library, available in both Bicep and Terraform. The Bicep versions live in the Public Bicep Registry under the avm/ namespace. The Terraform versions live on the HashiCorp Terraform Registry under Azure/avm-*. What you get for free when you consume an AVM module:
Defaults that line up with the Well-Architected Framework (RBAC over access policies, TLS 1.2, private endpoint support out of the box).
Semantic versioning, so you can pin a version and review the diff before upgrading.
Deployment tests on every module, run by the AVM team.
A real Microsoft support path, not a random GitHub issue.
A great backchannel question came up about brownfield. Jared’s answer: AVM is just standard IaC, with no special tooling. In Bicep, brownfield adoption is straightforward because there is no state file. In Terraform, import blocks make it much less painful than it used to be.
2. One folder, one source of truth
Repo layout is where most teams go wrong, and the fix is simple. You should have one set of module code, and per-environment differences should be expressed as data, not as duplicated code. In Bicep, that means a single main.bicep and one .bicepparam file per environment.
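To make “differences as data” concrete, here is a minimal Python sketch. It assumes a hypothetical main.bicep that takes four parameters (location, skuName, instanceCount, enableZoneRedundancy); the parameter names and environment values are illustrative only. The sketch keeps one shared list of required parameters, holds each environment’s values as plain data, refuses any environment that adds or drops a parameter, and renders each environment as a .bicepparam file that points at the same template.

```python
"""Render one .bicepparam file per environment from plain data.

Hypothetical example: the template path and parameter names are
illustrative assumptions, not taken from any real module.
"""

# Every environment must supply exactly these parameters; only values may differ.
REQUIRED_PARAMS = {"location", "skuName", "instanceCount", "enableZoneRedundancy"}

ENVIRONMENTS = {
    "dev": {"location": "westeurope", "skuName": "Standard_LRS",
            "instanceCount": 1, "enableZoneRedundancy": False},
    "test": {"location": "westeurope", "skuName": "Standard_LRS",
             "instanceCount": 2, "enableZoneRedundancy": False},
    "prod": {"location": "westeurope", "skuName": "Standard_ZRS",
             "instanceCount": 3, "enableZoneRedundancy": True},
}


def bicep_literal(value):
    """Render a bool, int, or str as a Bicep literal."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Bicep single-quoted strings escape backslash, quote, and interpolation.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'").replace("${", "\\${")
        return f"'{escaped}'"
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


def render_bicepparam(params, template="./main.bicep"):
    """Return .bicepparam text; fail if the environment's shape has drifted."""
    keys = set(params)
    missing, extra = REQUIRED_PARAMS - keys, keys - REQUIRED_PARAMS
    if missing or extra:
        raise ValueError(f"drift: missing={sorted(missing)} extra={sorted(extra)}")
    lines = [f"using '{template}'", ""]
    lines += [f"param {name} = {bicep_literal(params[name])}" for name in sorted(keys)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for env, params in ENVIRONMENTS.items():
        print(f"--- {env}.bicepparam ---")
        print(render_bicepparam(params))
```

Adding a fourth environment becomes a new dictionary entry rather than a new folder, and an environment that quietly grows an extra parameter fails the render instead of drifting.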
In Terraform, it is the same idea: one main.tf with one .tfvars file per environment.
If you find yourself copying a module folder to dev, test, and prod, stop. Within six months those three folders will not look the same, and at that point you no longer have IaC; you have three handwritten environments that happen to be checked into Git.
3. Kill static secrets. Use Workload Identity Federation.
This was the chat highlight. The question came in: “So in short, replace all service principals with credential secrets with user-assigned managed identity?” Jack and Jared both replied within seconds: yes, 10 points to you.
Workload Identity Federation (OIDC) lets your GitHub Actions or Azure DevOps pipeline exchange a short-lived token from its own OIDC provider for a Microsoft Entra ID token. No client secrets, no certificates to rotate, no Key Vault dance to retrieve them. A couple of things to know:
The subject claim format differs by platform. GitHub uses claims like repo:org/repo:environment:prod; Azure DevOps uses sc://org/project/connection. The federated credential has to match the subject exactly, so pick the right format or the token exchange fails.
Use a user-assigned managed identity as the target. It survives the pipeline being deleted and gives you one place to manage role assignments.
The Azure Bicep Deploy GitHub Action and the official AzureRM and AzAPI Terraform providers all support OIDC natively.
4. Split plan from apply
Even with OIDC, a single Owner-scoped identity that does both terraform plan and terraform apply is a problem. Plan needs Reader (plus a few extra read permissions). Apply needs Contributor or Owner, depending on what you deploy. Split them into two identities, federated to two different stages of your pipeline, and you have a real least-privilege story to take to your security team.
Securing the pipeline
Auth is half the story. The other half is making sure only the right pipelines, with the right approvals, can use those identities at all.
Governed templates. Keep reusable pipeline templates in a separate, locked-down repo.
Pin federated credentials or service connections to those templates via the job_workflow_ref claim on GitHub or required template checks on Azure DevOps. If someone forks or edits the workflow outside the governed template, the OIDC exchange refuses to issue a token.
Approvals in the right place. On GitHub, use Environments and require reviewers on prod. On Azure DevOps, put the approval on the Service Connection, not the Environment. The Environment approval can be bypassed by a clever YAML author. The Service Connection approval cannot.
Shift left, hard. Pre-commit hooks for bicep format and terraform fmt, linting on every PR, GitHub Advanced Security for secret and code scanning, automated tests on PRs, and ephemeral test environments spun up per PR and torn down at the end. One attendee mentioned using Pester for end-to-end infra tests against a sandbox subscription. That is exactly the pattern.
Three-layer state. For Terraform on Azure Landing Zones, the recommended split is:
Platform landing zone: one state.
Application landing zones / subscription vending: one state per landing zone.
Application workloads: one state per workload.
Never collapse all subscriptions into one state file. You will regret it the first time someone runs apply at the wrong time.
Getting Started
You do not have to do all of this at once. Pick the highest-pain item first:
Still using client secrets in pipelines? Fix that this sprint. Wire up OIDC and a user-assigned managed identity.
Drifting per-environment folders? Consolidate to one module plus per-environment parameter files.
Writing your own storage account module for the fifth time? Try the matching AVM module from the registry.
Put approvals on the Service Connection (Azure DevOps) or the Environment (GitHub) for prod.
Add linting and pre-commit hooks.
Split plan and apply identities.
Layer your Terraform state.
It is a roadmap, not a weekend project. Every step pays back the moment you take it.
Resources
Azure Verified Modules portal.
The official AVM home, with module indexes for Bicep and Terraform, specs, and FAQ.
Azure Verified Modules on GitHub. The tracking repo and source of truth for module proposals.
Bicep on Microsoft Learn. Official language docs, deployment guidance, and references for the public registry.
Azure Bicep Deploy GitHub Action. The OIDC-friendly action for deploying Bicep from GitHub Actions.
GitHub Actions for Azure on Microsoft Learn. Workload Identity Federation setup for GitHub Actions targeting Azure.
Configuring OpenID Connect in Azure (GitHub Docs). The canonical OIDC subject claims and federated credential walkthrough for GitHub.
Azure Pipelines documentation. Service connections, approvals and checks, required templates, and YAML reference.
Watch the rest of the Summit
This session was one of many at the Microsoft Azure Infrastructure Summit 2026. If you want the keynotes, the Bicep deep dives, the AKS sessions, and the storage track, the full playlist is here: Microsoft Azure Infra Summit 2026 playlist
Cheers!
Pierre Roman
Pierre_Roman
Jun 26, 2026 · ITOps Talk Blog
Building Secure, Well-Architected Azure Workloads with Azure Verified Modules and GitHub Copilot
Hello Folks! If you have been writing Bicep or Terraform for Azure over the last few years, you have probably lived this story. You pick a community module, it works great for six months, then the maintainer moves on, issues stop getting answered, and you are stuck owning code you never wrote. At the Microsoft Azure Infra Summit 2026, Jack Tracey and Jared Holgate (tech leads on the Azure Verified Modules project) walked us through how AVM solves that, and how pairing it with GitHub Copilot and Spec Kit changes the way IT pros build Azure workloads.
📺 Watch the session:
Why IT Pros Should Care
This is not a developer-only topic. If you are the person responsible for landing zones, platform engineering, or the IaC pipelines that other teams ship through, this hits you directly:
You stop owning home-grown storage account and VNet modules that no two teams write the same way.
You get secure-by-default resources without having to draft a 40-page internal coding standard.
You can let application teams move fast without sacrificing the Well-Architected Framework guardrails you care about.
You get a supported, Microsoft-backed module library with a clear lifecycle, instead of betting on an abandoned repo.
You finally have a structured, predictable way to put AI to work on infrastructure code without it inventing things you do not want in production.
If any of that sounds like a Tuesday for you, this session is worth 40 minutes.
What are Azure Verified Modules
Azure Verified Modules (AVM) is the official Microsoft infrastructure-as-code module library for both Bicep and Terraform. Jack put it plainly in the session: AVM is the one-time solution that is not going to go away, with ownership, a defined lifecycle, structure, and well-defined specifications. Here is what makes AVM different from the previous landscape of community repos:
It is supported in multiple IaC languages today (Bicep and Terraform), with consistent specifications across both.
Modules are aligned to the Azure Well-Architected Framework by default: zone redundancy on, public network access off, sensible TLS minimums, right out of the box. Everything is still flexible: you can override any of it via a parameter or variable.
It is open source. People inside and outside Microsoft can contribute to and maintain modules.
It consolidates the older CARML and Terraform Verified Modules efforts under one roof, owned by Microsoft FTEs and backed by the AVM core team.
AVM has three module classifications, and understanding them is half the battle:
Resource modules. A one-to-one mapping to a single resource type, like a storage account or a virtual network. Need ten of them? Loop the module ten times.
Pattern modules. A collection of resources, usually built on top of resource modules, that delivers a bigger slice of an architecture. The Azure Landing Zone is roughly five pattern modules behind the scenes.
Utility modules. Helpers you probably never call directly, but that the library uses for things like region lookups, SKU availability, and naming standards.
One thing that gets undersold: AVM is not just for you. The Azure Developer CLI templates use it. Azure Landing Zone and Sovereign Landing Zone are built on it. Internal Microsoft service teams use it. When you adopt AVM, you are using the same building blocks Microsoft uses.
Pairing AVM with GitHub Copilot
This is where the session gets interesting. AVM gives you the trusted Lego bricks. GitHub Copilot gives you a coding assistant. The problem, as Jack called out, is that AI is non-deterministic by default. It is great at solving ambiguous problems, but you cannot just point it at a blank repo and trust it to stamp out production infrastructure. That is the gap spec-driven development is designed to fill.
Spec-driven development is a documentation-first approach.
Instead of telling Copilot “write me a Terraform module for a hub-spoke network,” you write a structured specification up front that captures intent, quality bar, security requirements, and coding standards. The AI then uses that spec as the contract: it generates code, validates it against the spec, and loops until the output matches what you asked for.
Jared walked through Spec Kit, the open source toolkit maintained by GitHub and Microsoft, which formalizes this into eight steps:
Constitution. The non-negotiables: “We must use AVM. We must comply with PCI. Optimize for cost.” This is your project DNA.
Specify. What you actually want to build, focused on user goals and outcomes, not implementation details.
Clarify. Copilot scans the spec, finds ambiguities, and asks you targeted questions (IP ranges, Bastion SKUs, anything that is fuzzy).
Plan. A technical plan that maps the spec to your standards and constraints.
Checklist. A quality checklist the agent uses later to validate its own work.
Tasks. The plan broken down into small, reviewable steps.
Analyze. A consolidated report across the spec, plan, and tasks so you can sanity-check the whole package.
Implement. Copilot finally writes the code, validating against everything above as it goes.
The critical detail: at every one of those gates, you review. You are still the human in the loop. The AI is not flying solo, and you are not signing off on a thousand-line code dump.
When you wire AVM into the constitution (“use AVM modules wherever possible”), Copilot stops trying to hand-roll raw resource declarations. It composes solutions out of trusted, tested, WAF-aligned modules. That is what makes the combination so powerful.
Spec Kit is not the only option. Jack mentioned two others worth knowing about:
OpenSpec. Leaner than Spec Kit, brownfield-first, aimed at smaller, experienced teams.
Squad. A completely different model built by a Microsoft team. No specs.
Instead, a virtual team of agent personas (IaC specialist, UX, deployment, and an orchestrator called Ralph) collaborates to deliver work. Worth a look if your style is more agent-team than document-first.
Real-world value
So what does this actually buy you when Monday morning hits?
Speed without lowering the bar. Application teams stop writing storage account boilerplate. They focus on what the workload needs to do, and the AVM modules handle the resilient, compliant defaults.
Compliance becomes additive, not a rewrite. If you need to add HIPAA or NIST compliance later, you add another spec on top of your existing constitution and iterate. You do not throw out your modules.
Fewer clarification loops, fewer tokens burned. A good spec up front means fewer Copilot iterations. You get to a working answer faster, with less back and forth.
Trust in the AI output. Because AVM modules are tested, supported, and WAF-aligned, what Copilot stitches together is built on solid foundations. You can focus your review on the spec rather than on every line of Terraform.
Your developers shift up the stack. They stop writing IaC primitives and start designing architectures and requirements. That is where the business value lives anyway.
A note on tradeoffs. AVM modules are intentionally generic and flexible, so you sometimes get parameters you do not need, and the Well-Architected defaults can be opinionated for your scenario. The fix is simple: override the parameter. You are trading some control for a lot of consistency, and for most teams that trade is the right one.
Getting Started
If you want to try this for yourself, here is the path I would take:
Go to aka.ms/AVM and bookmark it. Everything starts there.
Browse the Bicep and Terraform module indexes. Find the resource you would normally hand-write and try the AVM version in a dev subscription.
Read the AVM specifications so you understand the contract every module follows. It makes the parameter sets a lot less surprising.
Install Spec Kit via the Specify CLI (the GitHub repo has the instructions) and try the AVM example under the experimental “AI-Assisted Solution Development” section on the AVM site.
Run the eight-step Spec Kit flow against a small workload. Do not start with your production landing zone. Pick something contained, like a single app with a web tier, a database, and a Key Vault.
Keep the human in the loop. Review every spec gate. That is where the quality comes from.
Resources
Azure Verified Modules portal (aka.ms/AVM)
Azure Verified Modules on GitHub
Azure Verified Modules on Microsoft Learn
GitHub Spec Kit
Spec-driven development with AI (GitHub Blog)
Implement spec-driven development with Spec Kit (Microsoft Learn)
GitHub Copilot
Azure Well-Architected Framework
Watch the rest of the Summit
If you found this useful, there is a lot more where it came from. The Microsoft Azure Infra Summit 2026 playlist covers landing zones, deployment stacks, AKS networking, storage, and the AI side of platform operations. Block out an afternoon and binge it: Microsoft Azure Infra Summit 2026 on YouTube
Cheers!
Pierre Roman
Pierre_Roman
Jun 25, 2026 · ITOps Talk Blog
Deploy an Azure Landing Zone in About Twelve Minutes with the ALZ IaC Accelerator
Hello Folks! Welcome back to my coverage of the Microsoft Azure Infra Summit 2026. This session is one I have been looking forward to, because if you have ever stood up an Azure Landing Zone (ALZ) by hand, you know it can eat weeks: management groups, policy assignments, Hub-and-Spoke networking, Log Analytics, Defender for Cloud, identities, pipelines, governed branches. There is a lot of plumbing. In this session Jack Tracey (he leads the Azure Landing Zones team) and Jared Holgate (tech lead on Azure Landing Zones and Azure Verified Modules) walk through the ALZ Infrastructure as Code Accelerator. Then they actually run it, and a bootstrap that used to be a multi-week journey wraps up in about twelve minutes of typing and ticking boxes.
📺 Watch the session:
Why IT Pros Should Care
If you are the person who has to deliver a secure, governed Azure platform before your dev teams can land their first workload, this matters to you. Here is the short version of why:
It bakes in the Cloud Adoption Framework “start right, stay right” pattern so you do not have to invent it.
It supports both Bicep and Terraform, and it bootstraps GitHub or Azure DevOps for you (with a local file system option for GitLab, Bitbucket, or whatever else you run).
It covers roughly 80% of common customer scenarios out of the box. You do not have to write modules from scratch.
It is open source, every module is published, and you can fork or compose as you see fit.
It is now built entirely on Azure Verified Modules (AVM), so what you deploy is aligned with the Well-Architected Framework by default.
In short, if you have been hand-crafting management group hierarchies and policy assignments in the portal, stop. There is a better way, and the team that designs ALZ ships it as code you can actually read.
What is the ALZ IaC Accelerator
A quick recap, because it is worth getting the vocabulary right. The Azure Landing Zone lives inside the CAF Ready methodology.
It is the shared platform (networking, identity, logging, policy, management groups) that supports the many application landing zones your workload teams consume. Jack uses a great analogy in the session: think of a metropolis. Before residents and businesses can move in, you need water, gas, electricity, and roads. The platform landing zone is the utilities layer. The application landing zones are the buildings.
The ALZ IaC Accelerator is the tooling that deploys and manages that platform layer using declarative infrastructure as code. It is composed of:
A set of IaC modules in Bicep and Terraform (all of them built on AVM).
A bootstrap layer for GitHub or Azure DevOps (or a local file system).
The ALZ PowerShell module, published to the PowerShell Gallery, which orchestrates everything.
Comprehensive docs covering prereqs, scenarios, and options.
The accelerator is a Microsoft-supported, open source path to a production-grade landing zone. You should look at it before you decide to roll your own.
How it works
The accelerator runs in four phases. Jared walks through each of them in the demo.
Phase 0: Plan. You make decisions: Bicep or Terraform, GitHub or Azure DevOps, single or multi-region, Hub-and-Spoke or Virtual WAN, Azure Firewall or NVA, DDoS on or off, and so on.
Phase 1: Prereqs. Before the accelerator runs, you need two things in place: an identity to run the bootstrap, and the platform subscriptions. Traditionally this meant four subscriptions (connectivity, identity, management, security). There is now a lighter option that needs only two subscriptions for smaller environments.
Phase 2: Bootstrap. This is where the magic happens. You feed it a bootstrap configuration file plus a platform landing zone configuration file, then run the Deploy-Accelerator command.
The PowerShell module deploys identities, optional Terraform state storage with private networking, and optional self-hosted container-instance runners, then sets up your repositories, pipelines, environments, governed pipeline templates, and OIDC-based service connections using Workload Identity Federation. No manual wiring is needed after Phase 2.
Phase 3: Deploy. Run the CD pipeline. The platform landing zone deploys. Done.
A few things worth highlighting about the bootstrap:
The accelerator deploys two identities: one with read-only access for plan / what-if, and one with write access for apply / deploy. Least privilege, out of the box.
Pipelines are governed. The actual deployment pipeline lives in a separate template repository, so changes to it require an approval.
A CI pipeline runs on pull requests automatically. You get the engineering hygiene without configuring it.
Real-world scenarios and when to use it
Jared distinguishes between “scenarios” and “options”: a scenario is the starting pattern you pick, and options are how you tune it.
Scenarios. There are 11 of them out of the box. Pick the one that matches your starting state:
Single region, Hub-and-Spoke, Azure Firewall.
Multi-region, Hub-and-Spoke, Azure Firewall.
Single or multi-region with Virtual WAN.
Single or multi-region with a third-party NVA.
No-connectivity (governance only, no Hub networking) for organizations that are not ready for centralized networking yet.
New scenarios 10 and 11, which are cost-optimized for small and medium businesses with around 10 workloads. Same modules, same orchestration, just a smaller, cheaper starting shape.
Sovereign landing zone for customers with data sovereignty and confidential compute requirements.
Options. Once you pick a scenario, you can tune it.
The 16 documented options are the ones the team sees customers ask about most often: customizing resource names, customizing management group names, turning the DDoS protection plan on or off, choosing the sovereign baseline, and more. Behind those, Terraform alone exposes hundreds of variables.
Honest tradeoffs (because Pierre always tells you about the rough edges):
OpenTofu is not supported today. Just Bicep and Terraform.
Personal Access Tokens are still required for Azure DevOps and self-hosted agents at the time of the session. The team has confirmed CLI / managed identity support is on the roadmap.
Brownfield is “it depends”. The accelerator is greenfield-friendly. Retrofitting an existing tenant is possible, but it will depend on your current state and your risk appetite.
You still own the decisions. The Lady Justice slide in the session is a great reminder: balancing dev team freedom with central governance is your job. The accelerator gives you the controls; it does not pick your policy posture for you.
Getting Started
If you want to try this without waiting, here is the path Jared actually demoed:
Install the ALZ PowerShell module from the PowerShell Gallery.
Create your platform subscriptions (two minimum, four for the classic layout) and an identity for the bootstrap.
Run Deploy-Accelerator with no parameters. It will prompt you interactively for everything: region, parent management group, subscriptions, naming convention, self-hosted agents yes or no, private networking yes or no, PAT, project name, and approvers.
Review the two generated configuration files: the bootstrap config and the platform landing zone tfvars (or Bicep params).
Confirm. The bootstrap runs Terraform behind the scenes and wires up Azure plus your repos.
Run the CD pipeline. Approve at the apply stage. Your platform deploys.
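Why do the two bootstrap identities plus an apply-stage approval actually hold up? Here is a toy Python model, assuming GitHub Actions with one environment per stage. The org, repo, environment names, identity names, and roles are all hypothetical, and the token check is simplified to the subject comparison only (real Entra ID federated credentials also validate issuer and audience). Each identity trusts exactly one subject, so a job running in the plan environment cannot obtain the write identity, and the apply job does not get a token until a reviewer approves it.

```python
"""Toy model: separate plan and apply identities, each federated to one subject.

Hypothetical names throughout (org, repo, environments, identities, roles).
Only the subject match is modeled; Entra ID also validates issuer and audience.
"""
from dataclasses import dataclass
from typing import List, Optional


def github_subject(org: str, repo: str, environment: str) -> str:
    """Default GitHub Actions OIDC subject for a job that targets an environment."""
    return f"repo:{org}/{repo}:environment:{environment}"


def ado_subject(org: str, project: str, connection: str) -> str:
    """Subject used by an Azure DevOps service connection with workload identity federation."""
    return f"sc://{org}/{project}/{connection}"


@dataclass(frozen=True)
class FederatedIdentity:
    name: str
    role: str     # RBAC role at the platform scope (illustrative)
    subject: str  # the single token subject this identity trusts


ORG, REPO = "contoso", "alz-mgmt"
PLAN = FederatedIdentity("id-alz-plan", "Reader", github_subject(ORG, REPO, "alz-plan"))
# Contributor for illustration; deployments that create role assignments need more.
APPLY = FederatedIdentity("id-alz-apply", "Contributor", github_subject(ORG, REPO, "alz-apply"))


def exchange_token(identity: FederatedIdentity, token_subject: str) -> str:
    """Mimic the exact-match subject check done before Entra ID issues a token."""
    if token_subject != identity.subject:
        raise PermissionError(f"{identity.name} does not trust {token_subject!r}")
    return f"access-token:{identity.name}"


def run_pipeline(approver: Optional[str]) -> List[str]:
    """Plan always runs read-only; apply runs only after a reviewer approves."""
    log = []
    exchange_token(PLAN, github_subject(ORG, REPO, "alz-plan"))
    log.append(f"plan/what-if as {PLAN.name} ({PLAN.role})")
    if not approver:
        # An environment with required reviewers holds the job, so no token is minted.
        log.append("apply waiting for approval")
        return log
    exchange_token(APPLY, github_subject(ORG, REPO, "alz-apply"))
    log.append(f"apply as {APPLY.name} ({APPLY.role}), approved by {approver}")
    return log


if __name__ == "__main__":
    print(run_pipeline(None))
    print(run_pipeline("platform-lead"))
```

Swapping github_subject for ado_subject gives the same shape for the Azure DevOps variant, where the bootstrap creates service connections instead of GitHub environments.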
If you are not ready to drive Terraform directly, the Azure Migrate AI agent (in preview) wraps the exact same accelerator codebase behind a guided chat experience. You answer questions, and it produces a zip with the same two config files plus a design document explaining the decisions it made. Then you hand that off to the same pipeline.
The Azure MCP server has matching tooling for VS Code, so a day-two change like “turn off the DDoS protection plan” knows to also uncomment the dependent policy assignments in the archetype files. That is the kind of context-aware editing that saves you from breaking your own deployment.
Resources
Azure Landing Zone in the Cloud Adoption Framework
ALZ Accelerator hub (entry point for docs, scenarios, options)
ALZ Terraform Accelerator on GitHub
ALZ-Bicep on GitHub
Azure Landing Zones Library (policies and archetypes)
Azure Verified Modules
Raise issues or feedback for the ALZ team
Watch the rest of the Summit
If you found this useful, the full Microsoft Azure Infra Summit 2026 playlist has a lot more: deployment stacks, Bicep beyond the basics, IaC CI/CD best practices, AVM with GitHub Copilot, and plenty of AKS and storage sessions. Grab the playlist here: Microsoft Azure Infra Summit 2026 on YouTube.
Reach the ALZ team in the comments on the session, or open an issue on the repo. The team is genuinely active there.
Cheers!
Pierre Roman
Pierre_Roman
Jun 24, 2026 · ITOps Talk Blog
From Prompt to Provisioned: A Closer Look at the Azure Deployment Agent
Hello Folks! If you sat through this session during the Microsoft Azure Infra Summit 2026, you already know that Anand Guruswami and Arun Rabindar from the Cloud Native Experiences team showed us something I have been waiting to see for a while: an AI agent that does not just spit out a Terraform file from a vague prompt, but actually thinks about your workload, talks to you about it, and then hands you something you can put in front of a pull request reviewer without holding your nose. This is the Azure Deployment Agent. At the time of broadcast it was still in preview inside Azure Copilot, with the same brains shipping as an open source skill you can plug into GitHub Copilot, Claude Code, Cursor, or whatever your team uses. In this post I want to break down what they showed, why it matters for IT pros, and how you can get hands-on with it.
📺 Watch the session:
Why IT Pros Should Care
Let’s be honest about the day-to-day. Most of the time we are not building a brand-new workload from a blank canvas. We are stitching resources together one at a time, copying patterns from a previous project, hunting down the right SKU, checking quotas, then arguing with policy on the way out the door. Different admins do it different ways, and that inconsistency is where risk lives.
Here is what the Deployment Agent changes for us:
It moves the conversation up a level, from “which resource do I click” to “what am I actually trying to build.”
It grounds the architecture in the Azure Well-Architected Framework, so the output is not a generic LLM guess; it has reasoning behind it.
It separates the plan from the code, so you and your team get to review the architecture before any Terraform or Bicep gets written.
It plugs into the tools we already use: the Azure portal for the guided path, GitHub Copilot and Claude Code for the power-user path.
In short, it’s about taking the boring, repetitive parts off our plate so we can focus on the parts that need human judgment.
What is the Azure Deployment Agent
The Deployment Agent is a capability inside the Agents (preview) experience in Azure Copilot. Think of it as a virtual cloud solution architect that lives in your Copilot chat. You describe the workload in natural language, and it walks you through a multi-step process to land on a production-ready deployment.
A few things that stood out from Anand’s portion of the session:
It supports multi-turn conversation. You can clarify scale, security posture, resilience, SKU preferences, and region constraints, and the agent will fold those into the plan.
It produces a human-readable infrastructure plan first, complete with trade-offs and the reasoning for each resource choice, before it ever writes infrastructure as code.
Today it generates Terraform inside the portal, with Bicep support landing in the portal experience shortly. In the GitHub Copilot flow you can already pick Bicep or Terraform.
Once the plan is approved, you get a real artifact. You can open it in VS Code for the Web, or have Copilot open a pull request straight into your GitHub repo.
The deployment itself still goes through Azure Resource Manager. That is important. Your tenant policies, RBAC, naming conventions, and existing guardrails all still apply. The agent is not bypassing your governance; it is generating code that flows through it.
How it Works
Arun did a great job pulling back the curtain on the internals. The agent follows a two-phase pattern, plan first and code second, with a checkpoint for you at every step:
Intent capture. The agent takes your prompt and clarifies the scope, the constraints, and what success looks like. No guessing, no jumping straight to code.
Plan generation. It produces a structured infrastructure plan with inputs, sub-goals, a full resource list, configurations, SKUs, and a per-resource reasoning section.
Validation in a loop.
The plan runs through evaluators backed by the Well-Architected Framework pillars (reliability, security, cost optimization, operational excellence, performance efficiency). If something fails, the agent regenerates and tries again until the plan is solid.
Human review. The plan is presented to you in plain language. You can iterate. You can say “prioritize West US 2,” or “swap that SKU,” and the agent will update the plan in place.
Code generation. Only after you approve the plan does the agent emit Terraform or Bicep. The generated code goes through syntactic validation as well, again in a loop, so it actually parses before it reaches you.
Under the hood, in the GitHub Copilot and Claude Code path, the team has decomposed all of this into an open source skill (the Azure Enterprise Infrastructure Planner) plus the Azure Well-Architected Framework as an MCP tool. The base agent in your editor picks up the skill, runs the phases, calls the MCP tool to ground the output, and then writes the IaC. Same workflow, different host.
When to Use It / Real-World Scenarios
This is not just a toy for greenfield demos. A few places where I see this paying real dividends:
New workload bootstrapping. A team needs a web app, SQL backend, secrets in Key Vault, monitoring, and a sane region strategy. Instead of three days of clicking and copy-pasting, you describe it and review the plan.
CSV ingestion to SQL automation. The Claude Code demo Arun ran was exactly this: a CSV lands, gets processed, and rows update in SQL. The agent picked sensible resources, justified each one, and produced Bicep ready to commit.
Standardizing across teams. Different admins ending up with different shapes for the same workload is the silent killer of operational consistency. A shared agent with a shared planner skill drags everyone toward the same Well-Architected baseline.
Skill leverage for smaller teams. Not every team has a deep Azure architect on staff.
The agent encodes a lot of that experience and surfaces it as conversation. Open source customization. Because the skill and MCP tooling are open, platform teams in regulated environments can fork it, add their policy context, their tagging rules, their naming conventions, and ship a tuned version internally. One honest tradeoff. Right now the agent is greenfield first. The team is actively working on brownfield scenarios, pulling insights from existing workloads and referencing existing resources. If you live entirely in a complex existing estate, expect the experience to keep getting better over the next couple of releases. Getting Started If you want to try it this week, here is the short list: Ask your Azure tenant administrator to enable Agents (preview) in Azure Copilot. The toggle lives in the Azure Copilot admin center, and without it you will not see agent mode in chat. In the Azure portal, open Copilot, expand to full screen, and switch on Agent mode at the bottom of the chat panel. Describe a workload in plain language. Be specific about region, scale expectations, and any compliance constraints you care about. Review the generated plan before approving. Look at the trade offs section, that is where the agent shows its work. For the editor path, install the open source Azure Skills plugin from the microsoft/azure-skills repo, point your IDE at the Azure MCP Server, and run the same workflow inside GitHub Copilot or Claude Code. Send feedback. The team is shipping fast and the roadmap (brownfield support, reference workloads, scoped agent permissions, richer architecture diagrams) is shaped by what you tell them. 
Resources

Deployment agent capabilities in Agents (preview) in Azure Copilot: https://learn.microsoft.com/en-us/azure/copilot/deployment-agent
microsoft/azure-skills, the open source skill plugin shown in the session: https://github.com/microsoft/azure-skills
Azure MCP Server on the GitHub MCP Registry: https://github.com/mcp/com.microsoft/azure
Azure MCP Server tools for the Well-Architected Framework: https://learn.microsoft.com/en-us/azure/developer/azure-mcp-server/tools/azure-well-architected-framework
Azure Well-Architected Framework documentation: https://learn.microsoft.com/en-us/azure/well-architected/
Agents (preview) in Azure Copilot overview: https://learn.microsoft.com/en-us/azure/copilot/agents-preview

Watch the rest of the Summit

If you enjoyed this session, the full Microsoft Azure Infra Summit 2026 playlist is up on YouTube. Sessions on Deployment Stacks, the SRE Agent, Azure Local, AKS networking, and a lot more are all in there. Bookmark this one and share it with your team: https://aka.ms/MAIS/2026-Playlist

Drop your questions, your war stories, and your wish list for the Deployment Agent in the comments. I read them, the product team reads them, and your scenarios are exactly what shapes the next preview drop. What would you build with it first?

Cheers!
Pierre Roman
Pierre_Roman
Jun 23, 2026, ITOps Talk Blog
Build a Sovereign Private Cloud with Azure Local
Hello Folks!

Picture this. A regulator hands you a one-pager that says, in essence, "this data does not leave the building." Or your link to Azure decides to take a nap during a critical batch run. Or you are standing up infrastructure in a remote site where connectivity is a coin flip on a good day. For a long time, our answer to that conversation was a stack of Azure Stack boxes plus a lot of wishful thinking. That story has changed, and it has changed quite a bit.

At Microsoft Azure Infra Summit 2026, Thomas Maurer (Global Black Belt for Sovereign Cloud) walked us through what is now called the Microsoft Sovereign Private Cloud, with Azure Local as its foundation. In this post, I want to unpack the session for the IT Pros in the room, the folks who have to actually run this stuff on Monday morning. Let us dig in.

📺 Watch the session:

Why IT Pros Should Care

Sovereignty is no longer a niche conversation. Thomas was very clear that there is no one-size-fits-all answer, and that is exactly why this matters to us as operators. The drivers landing on our desks now include:

Regulatory requirements that demand data residency or full operator isolation.
Sovereign AI workloads where the model and the data both need to stay in-country.
Disconnected and air-gapped sites by design (think defense, manufacturing floors, retail backrooms, ships, mines).
Business continuity, meaning a workable Plan B if the public cloud is unreachable for hours or days.
Latency-sensitive workloads where the round trip to a region is just too slow.

If you build or operate infrastructure that touches any of those bullets, Azure Local is now a first-class option, not a sidecar. And it gets you a cloud-consistent control plane on top of hardware you can put your hands on.

What is Azure Local and the Sovereign Private Cloud

Let us level-set on the stack, from the metal up.

Hardware. Validated and certified through the Azure Local solution catalog, delivered by the OEMs you already buy from. Form factors range from single-node edge boxes up to multi-rack deployments. There is a Premier tier with extra testing, packaged firmware and driver updates, and AI-ready GPU configurations built with NVIDIA.
Software-defined data center. Compute, storage, networking, and high availability. As of April 2026, supported SAN storage is GA alongside the existing hyperconverged Storage Spaces Direct model. Per instance, that gets you up to 64 nodes in disaggregated mode and 16 nodes in hyperconverged mode.
Workload plane. Linux and Windows VMs, custom images, your own Kubernetes distribution, or AKS enabled by Arc with the same management experience you have in Azure today.
Arc-enabled control plane. This is where Azure Local stops being "another on-prem stack" and starts feeling like Azure. Defender, Azure Monitor, Azure Update Manager, Policy, RBAC, and Resource Manager all surface against your on-prem instance.
Disconnected operations. Microsoft packaged a subset of the control plane (portal, Resource Manager, key management services) into an appliance you deploy on-premises. Connect your Azure Local infrastructure to the local appliance instead of public Azure, and you have a fully air-gapped deployment with a familiar API surface.

On top of that base, the Sovereign Private Cloud bundles workloads you can run locally: Foundry Local for AI inferencing, Microsoft 365 Local (Exchange Server, SharePoint Server, Skype for Business Server) for productivity fallback, Azure Virtual Desktop on Azure Local for VDI, and GitHub Enterprise Local (in private preview at the time of the session) for source control and CI/CD.

How it works in production

In the demo, Thomas drove the whole show from the Azure Arc Center in the Azure portal. A few things stood out for me as someone who has spent too many late nights patching clusters.

One pane, many sites. The overview page rolls up every Azure Local instance you own. Thomas mentioned customers running thousands of these things, and the Azure Local Lens workbook in Azure Monitor is built to manage at that scale.
Resources feel like Azure resources. An instance, a node, a VM, an AKS cluster: they all live inside Azure Resource Manager. RBAC, activity logs, tags, ARM templates, everything you expect.
Update is a single button. The Solution Builder Extension packages OS, management software, drivers, and firmware into one validated update. You hit "update," it orchestrates live migrations node by node, and it blocks the operation if something is not ready. No more cherry-picking driver bundles at 2 AM.
Security defaults are real. BitLocker on OS and data volumes, SMB signing, App Control on the hypervisor hosts, and drift detection that flags configuration changes back to the portal.
Resiliency is layered. Storage Spaces Direct two-way or three-way mirroring, rack-aware clustering, live migration for maintenance, and Azure Site Recovery for site-to-cloud replication (currently in preview). Site-to-site ASR between two Azure Local instances is in development. Veeam, Rubrik, and Commvault all integrate for backup.

In short, the boring operational moments are the ones that benefit the most. Patching, monitoring, identity, and alerting collapse into the tools you already use in Azure.

When to use it and real-world scenarios

This is not a "rip everything out of Azure" pitch. Thomas was very honest: Azure is still the right home for the vast majority of workloads. Azure Local earns its keep in a few specific places.

Regulated or sovereign workloads. Government, defense, financial services, and healthcare, where the law or the contract says the data stays put.
Disconnected or air-gapped sites. Field operations, classified networks, ships, mines, and remote infrastructure where reliable connectivity is not in scope.
Business continuity for productivity. Microsoft 365 Local as a fallback for Exchange and SharePoint if the cloud service is unreachable. From the session Q&A, M365 Local is GA, and it is the Exchange / SharePoint / Skype for Business trio. Entra ID and Intune are not in scope of the local bundle.
Edge and latency-bound workloads. Manufacturing line control, retail in-store inference, healthcare imaging, anywhere a 30-millisecond round trip is a problem.
Sovereign AI. Foundry Local on Azure Local lets you serve models on local GPUs without round-tripping to the cloud. Models stay local, data stays local, inference stays fast.
Bidirectional workload mobility. With Sovereign Private Landing Zones, you design once and keep workloads portable between Azure and Azure Local based on a service-compatible subset.

Getting Started

If you are picking this up cold, here is a sensible on-ramp:

Start with the official docs on Sovereign Private Cloud and Azure Local. Read them with your architect hat on, not just your operator hat. Design matters here.
Browse the Azure Local solution catalog and filter by Premier solutions and by your target scenario (disconnected operations, M365 Local, AI workloads, GPU support). The hardware shape drives a lot of downstream decisions.
Talk to your OEM about a validated node, and talk to your Microsoft account team or a sovereign partner. The partner ecosystem in this space is mature, and they will save you weeks.
Stand up a small connected instance first to learn the Arc Center experience, the update flow, and Azure Monitor integration. Even a one-node or two-node lab is enough to internalize the model.
For disconnected operations, size for the extra capacity the control plane appliance needs, and plan your local identity (Active Directory with AD FS) and your local monitoring integration up front.
If you live in Azure today and need workload portability, look at Sovereign Private Landing Zones so you do not paint yourself into a corner with services that have no on-prem equivalent.

Resources

What is Sovereign Private Cloud? on Microsoft Learn
Azure Local documentation
Disconnected operations for Azure Local
Azure Arc product page
Azure Site Recovery product page
Foundry Local documentation on Microsoft Learn
Foundry Local on GitHub
Sovereign Landing Zones on GitHub

Watch the rest of the Summit

This was just one of the sessions at the Microsoft Azure Infra Summit 2026. If you want more peer-to-peer technical content from the Azure infrastructure community, grab a coffee and queue up the full playlist here: https://aka.ms/MAIS/2026-Playlist

There is plenty of good stuff covering Bicep, AKS networking, storage, IaC, and more.

If you spin up an Azure Local instance after watching the session, or if you are already running one in anger, drop a comment and let me know how it goes. What works, what hurts, what you wish was better. That is how we all level up.

Cheers!
Pierre Roman
Pierre_Roman
Jun 22, 2026, ITOps Talk Blog
How to use Instance Mix with Azure Virtual Machine Scale Sets
When a Virtual Machine Scale Set needs to scale out, it normally uses a single VM size (also known as a SKU). Whilst that is simple, it can also become a constraint. For example, scale-out might be limited if that specific size has limited regional capacity, quota pressure, or a price profile that is not ideal for every scale-out event.

Instance Mix for Azure Virtual Machine Scale Sets gives your scale-out more options by letting you define multiple compatible VM sizes in a single scale set. When scaling out, Azure can choose from your list during provisioning based on the allocation strategy you select.

What is Instance Mix?

Instance Mix is a Virtual Machine Scale Sets capability that lets you specify up to five VM sizes for a single scale set that uses Flexible orchestration mode. Instead of setting the scale set to a single size such as Standard_D2s_v5, you set the scale set SKU name to Mix and define the real VM sizes in the skuProfile. At provisioning time, Azure uses the skuProfile.vmSizes list and the selected allocation strategy to decide which VM size to deploy.

The high-level model looks like this:

```json
{
  "sku": {
    "name": "Mix",
    "capacity": 2
  },
  "properties": {
    "skuProfile": {
      "vmSizes": [
        { "name": "Standard_D2s_v5" },
        { "name": "Standard_D2as_v5" },
        { "name": "Standard_D2s_v4" }
      ],
      "allocationStrategy": "CapacityOptimized"
    }
  }
}
```

You can use this if your workload can run on more than one VM size and you want to improve provisioning success, optimize for cost, or align allocation with reservations or savings plans.

When should you use Instance Mix?

Instance Mix is a good fit when:

Your application can run correctly on several similar VM sizes.
You want scale-out to have more than one capacity option in a region.
You run Spot VMs and want Azure to prefer lower-priced capacity when available.
You have reservations for specific sizes, or want to favor VM sizes with better savings-plan economics, and need predictable priority.
You want one scale set instead of several separate scale sets for equivalent worker capacity.

It is usually best for stateless or horizontally scalable workloads, such as web front ends, API tiers, queue workers, batch workers, and other scale-out services where each instance performs the same role.

Avoid using Instance Mix as a way to combine very different machines in one pool. For example, mixing a small general-purpose VM with a much larger memory-optimized VM can make capacity planning, load distribution, and performance troubleshooting harder. Best practice is to use sizes with similar vCPU and memory for balanced load distribution, and similar VM types for consistent performance.

Key requirements and limits

Before you deploy, check these requirements:

| Requirement | Detail |
|---|---|
| Orchestration mode | Instance Mix is available only for Virtual Machine Scale Sets using Flexible orchestration mode. |
| Number of VM sizes | You can specify up to five VM sizes. |
| VM family support | The Instance Mix skuProfile supports A, B, D, E, and F VM families. |
| Architecture | Do not mix CPU architectures. For example, do not mix Arm64 and x64 sizes in the same Instance Mix. |
| Storage interface | Do not mix incompatible storage interfaces such as SCSI and NVMe. |
| Premium storage capability | Do not mix VM SKUs that support premium storage with SKUs that do not. |
| Security profile | The selected sizes must use a compatible security profile. |
| Local disk configuration | The selected sizes must have a consistent local disk configuration. |
| Quota | You must already have quota for the VM sizes you include. Instance Mix does not request quota for you. |
| Unsupported scenarios | Instance Mix currently does not support Standby Pools, Azure Dedicated Host, Proximity Placement Groups, on-demand capacity reservations, or diffDiskSettings on the OS disk. |

Microsoft Learn currently states that all public Azure regions support Instance Mix. VM size availability and quota still vary by region and subscription, so check the exact sizes you plan to use before deploying.

Choose an allocation strategy

Instance Mix supports three allocation strategies.

| Strategy | Best for | Behavior |
|---|---|---|
| LowestPrice | Cost-sensitive and fault-tolerant workloads, especially Spot | Azure prefers the lowest-priced VM sizes in the list while considering available capacity. It deploys as many of the lowest-priced VMs as capacity allows before moving to higher-priced sizes, so higher-cost sizes may be selected to secure capacity. This is the default strategy if you do not specify one. |
| CapacityOptimized | Production workloads where provisioning success is more important than the lowest possible price | Azure prioritizes VM sizes that have the highest likelihood of available capacity in the target region. Cost is not considered, and higher-cost sizes may be selected to secure capacity. |
| Prioritized | Predictable ordering, reservations, or savings-plan alignment | Azure follows user-defined rank values on VM sizes when provided, subject to quota and regional capacity. Lower rank values have higher priority. This strategy is currently documented as preview. |

Rank note: Use rank only with Prioritized. Ranks are optional with Prioritized; when specified, lower numbers have higher priority, and rank values can be duplicated or non-sequential. Omit ranks for LowestPrice and CapacityOptimized.

For many operations teams, CapacityOptimized is a practical starting point for production scale sets because it is designed to improve the chance of successful provisioning. For Spot-heavy or highly cost-sensitive workloads, LowestPrice may be a better first choice. If you have reservations for specific sizes, or want to favor VM sizes with better savings-plan economics, use Prioritized and rank those reservation-backed or preferred sizes first.

Walkthrough

Prerequisites

The commands below are Bash-specific. Use Azure Cloud Shell in Bash mode or another Bash environment with Azure CLI installed. You need:

An Azure subscription.
Permission to create resource groups, networking, and Virtual Machine Scale Sets.
Azure CLI version 2.66.0 or later.
An SSH-capable client if you plan to connect to the Linux instances.

Check your Azure CLI version:

```bash
az version --query '"azure-cli"' --output tsv
```

Sign in and select the subscription you want to use:

```bash
az login
az account set --subscription "<subscription-id-or-name>"
```

Set variables for the example:

```bash
RG="rg-vmss-instancemix-demo"
LOCATION="australiaeast"
VMSS="vmss-instancemix-demo"
ADMIN_USER="azureuser"
```

You can change LOCATION to another public Azure region if you don't feel like sharing a datacenter rack with drop bears and the occasional kangaroo. If you change regions or VM families, make sure the sizes in the examples are still available and compatible.

Step 1: Check VM size availability in the region

Before creating the scale set, check that the candidate VM sizes are available in your target region and subscription.

```bash
az vm list-skus \
  --location "$LOCATION" \
  --resource-type virtualMachines \
  --size Standard_D \
  --all \
  --output table
```

For this walkthrough, we will use these D-series sizes:

Standard_D2s_v5
Standard_D2as_v5
Standard_D2s_v4

In the az vm list-skus output, check the Restrictions column. If a size is restricted in your region or subscription, choose another compatible size or deploy in a different region.

Step 2: Check quota

Instance Mix does not automatically request quota. If one size in the mix lacks quota, Azure can try another size from the list that has quota. If none of the eligible sizes has enough quota, deployment or scale-out can fail.
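To make that fallback behavior concrete, here is a minimal Bash sketch of a deliberately simplified model: a size counts as eligible only if its VM family still has enough free vCPUs for one more instance. The function name eligible_sizes and its "size:vcpus:free" input format are hypothetical, and the headroom figures are placeholders. Real values would come from the limit minus the current value in the az vm list-usage output. Azure's actual decision also weighs Total Regional vCPUs, regional capacity, and the allocation strategy, none of which this sketch models.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the sizes in a mix whose VM family still has
# enough free vCPU quota for one more instance.
# Input format (an assumption for this sketch): "size:vcpus_per_vm:family_free_vcpus".
# Returns 1 when no size is eligible, the case where deployment or scale-out fails.
# Simplification: ignores Total Regional vCPUs, regional capacity, and strategy.
eligible_sizes() {
  local entry size vcpus free status=1
  for entry in "$@"; do
    IFS=':' read -r size vcpus free <<< "$entry"
    if (( free >= vcpus )); then
      echo "$size"
      status=0
    fi
  done
  return "$status"
}

# Placeholder headroom figures, not real quota data.
eligible_sizes "Standard_D2s_v5:2:0" "Standard_D2as_v5:2:6" "Standard_D2s_v4:2:1" \
  || echo "No size in the mix has enough family quota." >&2
```

With these placeholder figures, only Standard_D2as_v5 is printed, because the other two families have fewer than 2 free vCPUs.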
Check current regional VM quota and usage:

```bash
az vm list-usage \
  --location "$LOCATION" \
  --output table
```

Look for:

Total Regional vCPUs
The family quota rows that match your selected VM sizes, such as DSv5, DASv5, or DSv4 family vCPUs

If quota is too low, either request a quota increase or choose sizes that have available quota in the target region.

Step 3: Create a resource group

Create a resource group for the demo:

```bash
az group create \
  --name "$RG" \
  --location "$LOCATION"
```

Step 4: Create a VM Scale Set with Instance Mix

The key settings are:

--orchestration-mode Flexible
--vm-sku Mix
--skuprofile-vmsizes
--skuprofile-allocation-strategy

Create the scale set:

```bash
az vmss create \
  --resource-group "$RG" \
  --name "$VMSS" \
  --location "$LOCATION" \
  --orchestration-mode Flexible \
  --image Ubuntu2204 \
  --vm-sku Mix \
  --skuprofile-vmsizes Standard_D2s_v5 Standard_D2as_v5 Standard_D2s_v4 \
  --skuprofile-allocation-strategy CapacityOptimized \
  --instance-count 2 \
  --admin-username "$ADMIN_USER" \
  --generate-ssh-keys
```

This creates a Flexible orchestration scale set with two instances. Because the scale set uses Instance Mix, the scale set SKU name is Mix, and the allowed VM sizes are stored in skuProfile.vmSizes.

Do not expect every VM size in the list to appear immediately. Azure may deploy all initial instances using the same VM size if that satisfies the selected allocation strategy and capacity is available. The point of Instance Mix is to give Azure more eligible choices during provisioning and scaling.
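Azure may place every initial instance on one size, so it can help to summarize which sizes were actually chosen, both now and after the scale-out in Step 6. Here is a minimal Bash sketch. It assumes the size list comes from the same az vm list query used in Steps 5 and 6, and the function name count_sizes is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read one VM size per line on stdin and print
# "<size> <instance count>", sorted by size name.
count_sizes() {
  sort | uniq -c | awk '{ print $2, $1 }'
}

# Example with literal input. In the walkthrough you would instead pipe in:
#   az vm list --resource-group "$RG" \
#     --query "[?virtualMachineScaleSet.id=='$VMSS_ID'].hardwareProfile.vmSize" \
#     --output tsv | count_sizes
printf '%s\n' Standard_D2s_v5 Standard_D2as_v5 Standard_D2s_v5 | count_sizes
```

For the literal input above, this prints one line per size: Standard_D2as_v5 with a count of 1 and Standard_D2s_v5 with a count of 2.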
Step 5: Verify the Instance Mix configuration

View the complete skuProfile:

```bash
az vmss show \
  --resource-group "$RG" \
  --name "$VMSS" \
  --query "skuProfile" \
  --output jsonc
```

Example output:

```json
{
  "allocationStrategy": "CapacityOptimized",
  "vmSizes": [
    { "name": "Standard_D2s_v5" },
    { "name": "Standard_D2as_v5" },
    { "name": "Standard_D2s_v4" }
  ]
}
```

View only the configured VM sizes:

```bash
az vmss show \
  --resource-group "$RG" \
  --name "$VMSS" \
  --query "skuProfile.vmSizes[].name" \
  --output tsv
```

View only the allocation strategy:

```bash
az vmss show \
  --resource-group "$RG" \
  --name "$VMSS" \
  --query "skuProfile.allocationStrategy" \
  --output tsv
```

List the current instances and their VM sizes. Flexible orchestration instances are standard Azure VMs, so use az vm list for full instance details and filter by the scale set resource ID:

```bash
VMSS_ID=$(az vmss show \
  --resource-group "$RG" \
  --name "$VMSS" \
  --query id \
  --output tsv)

az vm list \
  --resource-group "$RG" \
  --query "[?virtualMachineScaleSet.id=='$VMSS_ID'].{Name:name, VMSize:hardwareProfile.vmSize, ProvisioningState:provisioningState}" \
  --output table
```

Step 6: Scale out and observe allocation

Scale the set from two instances to five:

```bash
az vmss scale \
  --resource-group "$RG" \
  --name "$VMSS" \
  --new-capacity 5
```

List the instances again. The first command sets VMSS_ID again in case you are in a new shell session:

```bash
VMSS_ID=$(az vmss show \
  --resource-group "$RG" \
  --name "$VMSS" \
  --query id \
  --output tsv)

az vm list \
  --resource-group "$RG" \
  --query "[?virtualMachineScaleSet.id=='$VMSS_ID'].{Name:name, VMSize:hardwareProfile.vmSize, ProvisioningState:provisioningState}" \
  --output table
```

You may see one VM size or several VM sizes. The result depends on the allocation strategy, quota, regional capacity, and the sizes in your skuProfile.

Step 7: Add autoscale

Instance Mix does not require autoscale, but it becomes especially useful when a workload scales out under demand. Autoscale asks the scale set to add instances; Instance Mix gives Azure multiple eligible VM sizes to use for those new instances.
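Autoscale can only add instances that fit in quota, so a rough worst-case check before you choose --max-count may be worth doing. The Bash sketch below assumes every additional instance lands on the largest size in the mix. It compares that demand with the free Total Regional vCPUs you read from az vm list-usage in Step 2. The function name and argument order are hypothetical and the example numbers are placeholders. Family-level quotas still need a separate check.

```bash
#!/usr/bin/env bash
# Hypothetical worst-case check for an autoscale maximum.
# Args: MAX_COUNT CURRENT_COUNT LARGEST_SIZE_VCPUS REGIONAL_VCPUS_USED REGIONAL_VCPUS_LIMIT
# REGIONAL_VCPUS_USED should already include the scale set's current instances.
# Prints the extra vCPUs needed and the free regional vCPUs; returns 0 if it fits.
fits_regional_quota() {
  local max_count=$1 current_count=$2 vcpus=$3 used=$4 limit=$5
  local additional=$(( (max_count - current_count) * vcpus ))
  local free=$(( limit - used ))
  echo "additional=${additional} free=${free}"
  (( additional <= free ))
}

# Placeholder values: grow from 2 to 10 instances of a 2-vCPU size,
# with 10 of 20 regional vCPUs already in use.
fits_regional_quota 10 2 2 10 20 \
  || echo "Raise quota or lower --max-count before enabling autoscale." >&2
```

Here the sketch reports 16 additional vCPUs against 10 free, so it prints the warning. The same numbers with only 4 vCPUs in use would pass.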
Create an autoscale profile:

```bash
az monitor autoscale create \
  --resource-group "$RG" \
  --resource "$VMSS" \
  --resource-type Microsoft.Compute/virtualMachineScaleSets \
  --name "vmss-autoscale" \
  --min-count 2 \
  --max-count 10 \
  --count 2
```

Create a scale-out rule that adds two instances when average CPU is greater than 70 percent for five minutes:

```bash
az monitor autoscale rule create \
  --resource-group "$RG" \
  --autoscale-name "vmss-autoscale" \
  --condition "Percentage CPU > 70 avg 5m" \
  --scale out 2
```

Create a scale-in rule that removes one instance when average CPU is less than 30 percent for five minutes:

```bash
az monitor autoscale rule create \
  --resource-group "$RG" \
  --autoscale-name "vmss-autoscale" \
  --condition "Percentage CPU < 30 avg 5m" \
  --scale in 1
```

Tune these thresholds for your workload. Scale out aggressively enough to protect availability, and scale in conservatively enough to avoid unnecessary churn.

Step 8: Update the VM sizes in the mix

You can change the VM sizes in an existing Instance Mix configuration. The important detail is that updating --skuprofile-vmsizes replaces the full list; it does not add or remove a single size incrementally. For example, this command replaces the mix with four sizes:

```bash
az vmss update \
  --resource-group "$RG" \
  --name "$VMSS" \
  --skuprofile-vmsizes Standard_D2s_v5 Standard_D2as_v5 Standard_D2s_v4 Standard_D2as_v4
```

To remove a size, specify the full list of sizes you want to keep. To add a size, specify the full existing list plus the new size.

Verify the new list:

```bash
az vmss show \
  --resource-group "$RG" \
  --name "$VMSS" \
  --query "skuProfile.vmSizes[].name" \
  --output tsv
```

Step 9: Change the allocation strategy

You can also update the allocation strategy. For example, this changes the walkthrough scale set from its original CapacityOptimized strategy to LowestPrice. Use this as a real update example, not as a blanket production recommendation:

```bash
az vmss update \
  --resource-group "$RG" \
  --name "$VMSS" \
  --set skuProfile.allocationStrategy=LowestPrice
```

When you change the allocation strategy, existing VMs are not immediately reshaped. The new strategy takes effect the next time the scale set scales in or out. If you change from Prioritized to another allocation strategy, Microsoft Learn notes that you must first nullify the priority ranks associated with the VM sizes.

Step 10: Enable Instance Mix on an existing scale set

You can enable Instance Mix on a separate existing scale set if it uses Flexible orchestration mode, does not already use Instance Mix, and the selected VM sizes are compatible. Do not run this procedure against the $VMSS scale set created earlier, because that scale set is already Instance Mix-enabled.

The required settings are:

Set sku.name to Mix.
Set sku.tier explicitly to null unless you have confirmed it is already null or absent.
Provide at least one VM size in skuProfile.vmSizes.
Provide an allocation strategy, or let Azure use the default LowestPrice strategy.

Set a separate variable for the existing scale set you want to convert:

```bash
EXISTING_VMSS="my-existing-flex-vmss"
```

Example:

```bash
az vmss update \
  --resource-group "$RG" \
  --name "$EXISTING_VMSS" \
  --set sku.name=Mix sku.tier=null \
  --skuprofile-vmsizes Standard_D2as_v4 Standard_D2s_v5 Standard_D2as_v5 \
  --set skuProfile.allocationStrategy=CapacityOptimized
```

As with other Instance Mix updates, existing VM instances are not necessarily changed immediately. The configuration is used for subsequent scale actions.

Portal procedure

If you prefer the Azure portal:

Go to Virtual machine scale sets.
Select Create.
On the Basics tab, choose the subscription, resource group, scale set name, region, image, and administrator settings.
Set Orchestration mode to Flexible.
In the Size section, choose Select up to 5 sizes.
Select up to five compatible VM sizes.
Choose the Allocation strategy. If you choose Prioritized (preview), a Rank size section appears below Allocation strategy. Select Rank priority to open the prioritization blade and order the VM sizes based on your preferred priority.
Complete the remaining tabs for networking, management, health, scaling, and advanced settings.
Select Review + create, then create the scale set.

After deployment, open the scale set and check the Overview blade. In the Properties section, check Size for the configured VM sizes and Management for the allocation strategy.

Operational tips

Use similar sizes for predictable application behavior
For web, API, and worker tiers, choose sizes with similar vCPU and memory. This makes autoscale behavior easier to reason about and helps keep load distribution balanced.

Do not rely on Instance Mix as a quota workaround
Instance Mix can use another listed size if one size lacks quota, but it does not request quota for you. Check quota before production deployment, especially if your autoscale maximum is high.

Align reservations and savings plans with Prioritized
Reserved instance pricing and savings-plan discounts can apply with Instance Mix. If you want to consume reservation-backed sizes first, or favor sizes with better savings-plan economics, use the Prioritized strategy and rank those sizes first.

Use Spot Priority Mix for Spot plus Standard
Instance Mix can be used with both Spot and Standard VMs. The Instance Mix FAQ says to use Spot Priority Mix when you need a defined split between Spot and Standard capacity.

Caveat: Instance Mix supports B-family sizes for regular capacity, but Azure Spot VMs do not support B-series or promo versions of any size. If your scale set will use Spot capacity, choose Spot-supported sizes for the mix.
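If you plan to run the mix on Spot capacity, a name-based check can catch the B-series and promo caveat before deployment. Here is a minimal Bash sketch. It assumes B-series sizes start with Standard_B and promo versions end in _Promo, which you should confirm against az vm list-skus for your region. The function name check_spot_sizes is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical name-based check: Azure Spot VMs do not support B-series sizes
# or promo versions of any size. The naming patterns below are assumptions.
# Prints each size that fails the check; returns 1 if any size fails.
check_spot_sizes() {
  local size status=0
  for size in "$@"; do
    case "$size" in
      Standard_B*|*_Promo)
        echo "Not Spot-eligible: $size"
        status=1
        ;;
    esac
  done
  return "$status"
}

check_spot_sizes Standard_D2s_v5 Standard_D2as_v5 Standard_D2s_v4 \
  && echo "All sizes pass the name-based Spot check."
```

This only checks names. It does not replace checking that each size is actually offered as Spot in your region.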
Be careful with unsupported properties
If a deployment template includes unsupported properties such as Azure Dedicated Host, on-demand capacity reservation, Standby Pools, Proximity Placement Groups, or OS disk diffDiskSettings, remove those settings before using Instance Mix.

Troubleshooting common deployment errors

| Error code | Meaning | Fix |
|---|---|---|
| SkuProfileAllocationStrategyInvalid | The allocation strategy is not valid. | Use CapacityOptimized, Prioritized, or LowestPrice for the allocation strategy. |
| SkuProfileVMSizesCannotBeNullOrEmpty | No VM sizes were provided. | Add at least one VM size to skuProfile.vmSizes. |
| SkuProfileHasTooManyVMSizesInRequest | More than five VM sizes were specified. | Reduce the list to no more than five VM sizes. |
| SkuProfileVMSizesCannotHaveDuplicates | The same VM size appears more than once. | Remove duplicate VM sizes. |
| SkuNameMustBeMixIfSkuProfileIsSpecified | A skuProfile was provided, but sku.name is not Mix. | Set sku.name to Mix. |
| SkuTierMustNotBeSetIfSkuProfileIsSpecified | sku.tier is set when using Instance Mix. | Set sku.tier to null or omit it. |
| SkuProfileScenarioNotSupported | The template includes a property not supported with Instance Mix. | Remove unsupported properties such as host groups, capacity reservations, or standby pool settings. |

Instance Mix gives your operations team a practical way to make scale sets more flexible. By defining several compatible VM sizes and selecting the right allocation strategy, you can improve scale-out success, give Azure more capacity choices, and better align VM allocation with cost or reservation goals.

For most production workloads, start with similar VM sizes, verify regional availability and quota, use Flexible orchestration mode, and test scale-out behavior before relying on the configuration in a critical environment.
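Several error codes in the troubleshooting table can be caught locally before a template or CLI call reaches Azure. The Bash sketch below is a hypothetical pre-flight check. The function name validate_instance_mix is an assumption and is not part of Azure CLI. It covers only the allocation strategy and empty, oversized, or duplicate size lists. It does not replace ARM validation of family, architecture, storage, or security-profile compatibility. An empty strategy is accepted because Azure uses LowestPrice when none is specified. Step 8 replaces the whole size list, so running a check like this on the full list before az vmss update is a reasonable habit.

```bash
#!/usr/bin/env bash
# Hypothetical local pre-flight check for an Instance Mix configuration.
# Usage: validate_instance_mix STRATEGY SIZE...
# Prints "OK" or the matching documented error code; returns 1 on failure.
validate_instance_mix() {
  local strategy=${1-}
  shift || true
  case "$strategy" in
    ""|LowestPrice|CapacityOptimized|Prioritized) ;;  # "" means Azure's default (LowestPrice)
    *) echo "SkuProfileAllocationStrategyInvalid"; return 1 ;;
  esac
  if (( $# == 0 )); then
    echo "SkuProfileVMSizesCannotBeNullOrEmpty"; return 1
  fi
  if (( $# > 5 )); then
    echo "SkuProfileHasTooManyVMSizesInRequest"; return 1
  fi
  # uniq -d prints only lines that occur more than once after sorting.
  if [[ -n "$(printf '%s\n' "$@" | sort | uniq -d)" ]]; then
    echo "SkuProfileVMSizesCannotHaveDuplicates"; return 1
  fi
  echo "OK"
}

validate_instance_mix CapacityOptimized Standard_D2s_v5 Standard_D2as_v5 Standard_D2s_v4
```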
Further reading on Microsoft Learn:

Use multiple Virtual Machine sizes with instance mix
Create a scale set using instance mix
Orchestration modes for Virtual Machine Scale Sets
Update instance mix settings on an existing scale set
Instance Mix FAQ and troubleshooting
OrinThomas
May 24, 2026, ITOps Talk Blog
The Microsoft Azure Infra Summit 2026 Schedule Is Live.
Hello Folks,

I promised the full agenda would drop soon. Today's the day. The schedule is locked in, the approved sessions are on the board, and I want to walk you through what three days of deep-technical, engineering-led Azure content looks like.

A quick refresher before we get into the content: this event is free, it's virtual, and it's built by engineering for engineering. Most sessions are at the L300–L400 level, which means we're skipping the marketing slides and getting straight to the architecture, the gotchas, and the "here's what actually happens in production" stories you came for. We're starting at 8:00 AM Pacific each day and running solid technical content through the afternoon. You can still register here: https://aka.ms/MAIS-reg

We organized the three days around the pillars our community keeps coming back to: Build, Operate, and Optimize. Day 1 leans into Build so you leave the keynote with momentum. Day 2 bridges Build into Operate, where most of us actually spend our workdays. Day 3 is pure Optimize (resiliency, cost, performance, and networking) before we close things out.
The full 3-day agenda (all times Pacific)

Online Schedule Here

Day 1, Tue, May 19 · BUILD
8:00 KEYNOTE: Welcome & Azure Infrastructure Vision
9:00 Build a Sovereign Private Cloud with Azure Local
9:45 The Azure Deployment Agent: How AI Turns a Prompt into a Production-Ready Workload
10:15 ALZ IaC Accelerator: Deploy Your Azure Platform Landing Zone with IaC
11:00 Building Secure, Well-Architected Azure Workloads by Default with Azure Verified Modules and GitHub Copilot
11:45 Best Practices for Infrastructure as Code CI/CD on Azure
12:30 Modern Ingress for AKS: Introducing Application Gateway for Containers (AGC)
13:15 End-to-End Security on AKS Using Azure Application Gateway for Containers with Managed Cilium
14:00 Deployment Stacks: Getting Started
14:30 Accelerating Automated VM Image Pipelines with Azure Image Builder and Azure Compute Gallery
15:00 Troubleshooting Kubernetes Networking with an AI Diagnostic Assistant

Day 2, Wed, May 20 · BUILD + OPERATE
8:00 Build and Optimize a Data Lakehouse for Unified Data Intelligence
8:45 Designing Azure Networks That Scale: From Small Deployments to Enterprise-Grade
9:30 From Alert to Resolved: Building a Self-Healing Azure Platform with SRE Agent
10:15 Agentic Migrations & Modernization
10:45 Simplifying File Share Management and Control for Azure Files
11:30 Marketplace Image Protection: Safeguarding Workloads Through Patching and Graceful Deprecation
12:00 Operating Hybrid at Scale: Real-World Azure Arc Patterns for Governance, Security, and Cost Control
12:45 Run At-Scale On-Premises and Cloud Assessments and Migrations to Azure Storage
13:30 Modernize VDI with Azure Files and Entra Cloud-Native Identities
14:15 Operating Azure Backup at Scale: Day-2 Excellence for IaaS, PaaS, and Storage Workloads

Day 3, Thu, May 21 · OPTIMIZE + Closing
8:00 Achieving Zonal Resiliency in Azure Infrastructure
8:30 Architecting Resilient Azure Platforms: Durable Functions, Cosmos DB, and DR by Design
9:00 Optimizing EDA & HPC Pipelines on Azure: High-Performance Shared Storage with Azure NetApp Files
9:30 Elastic SAN for AVS Datastores: Best Price-Performance External Storage
10:00 Premium SSD v2 Disk: Best Price-Performance Block Storage for VMs and Containers
10:45 Optimizing File Storage for AI and Cloud-Native Workloads on Azure
11:30 Cut Storage Costs, Boost ROI: Optimizing Your Storage TCO on Azure Object Storage
12:15 How to Build Resilient Networks Using Azure Networking, What's New in Azure Software Load Balancing
13:00 AKS Networking at Scale, CNI, Security, and Multi-Cluster Networking with Accelerated Performance
13:45 Kubenet Deprecation, Futureproofing AKS IPAM and Dataplane Configurations
14:15 Implement Zero-Tolerance Downtime Web Apps with Azure Front Door
14:45 Closing: Azure Infrastructure Applied Skills and Certifications

What to do right now

Block your calendar: May 19, 20, and 21, with an 8:00 AM PT start each day.
Check out www.azureinfrasummit.com for more information.
Register; it's free.
Pick your sessions. The online schedule has ICS files for each session, so you can build your personal track across Build, Operate, and Optimize.
Bring your team. The agenda is deliberately wide: platform engineers, SREs, storage folks, network folks, AKS operators, IaC builders, and backup/DR owners will all find their sessions.

We put a lot of work into making sure every slot earned its place. These are engineering-delivered, production-grounded, no-fluff sessions. The speakers are the people shipping the features you're using in Azure.

Can't wait to see you online May 19–21. Until then,

Cheers!
Pierre Roman
Pierre_Roman
Apr 22, 2026 · ITOps Talk Blog
Join us at Microsoft Azure Infra Summit 2026 for deep technical Azure infrastructure content
Microsoft Azure Infra Summit 2026 is a free, engineering-led virtual event created for IT professionals, platform engineers, SREs, and infrastructure teams who want to go deeper on how Azure really works in production. It will take place May 19-21, 2026. This event is built for the people responsible for keeping systems running, making sound architecture decisions, and dealing with the operational realities that show up long after deployment day. Over the past year, one message has come through clearly from the community: infrastructure and operations audiences want more in-depth technical content. They want fewer surface-level overviews and more practical guidance from the engineers and experts who build, run, and support these systems every day. That is exactly what Azure Infra Summit aims to deliver. All content is created AND delivered by engineering, targeting folks working with Azure infrastructure and operating production environments.
Who is this for: IT professionals, platform engineers, SREs, and infrastructure teams
When: May 19-21, 2026, 8:00 AM–1:00 PM Pacific Time, all 3 days
Where: Online (virtual)
Cost: Free
Level: Most sessions are advanced (L300-400)
Register here: https://aka.ms/MAIS-Reg
Built for the people who run workloads on Azure
Azure Infra Summit is for the people who do more than deploy to Azure. It is for the people who run it. If your day involves uptime, patching, governance, monitoring, reliability, networking, identity, storage, or hybrid infrastructure, this event is for you. Whether you are an IT professional managing enterprise environments, a platform engineer designing landing zones, an Azure administrator, an architect, or an SRE responsible for resilience and operational excellence, you will find content built with your needs in mind. We are intentionally shaping this event around peer-to-peer technical learning.
That means engineering-led sessions, practical examples, and candid discussion about architecture, failure modes, operational tradeoffs, and what breaks in production. The promise here is straightforward: less fluff, more infrastructure.
What to expect
Azure Infra Summit will feature deep technical content in the 300 to 400 level range, with sessions designed by engineering to help you build, operate, and optimize Azure infrastructure more effectively. The event will include a mix of live and pre-recorded sessions and live Q&A. Throughout the three days, we will dig into topics such as:
Hybrid operations and management
Networking at scale
Storage, backup, and disaster recovery
Observability, SLOs, and day-2 operations
Confidential compute
Architecture, automation, governance, and optimization in Azure Core environments
And more…
The goal is simple: to give you practical guidance you can take back to your environment and apply right away. We want attendees to leave with stronger mental models, a better understanding of how Azure behaves in the real world, and clearer patterns for designing and operating infrastructure with confidence.
Why this event matters
Infrastructure decisions have a long tail. The choices we make around architecture, operations, governance, and resilience show up later in the form of performance issues, outages, cost, complexity, and recovery challenges. That is why deep technical learning matters, and why events like this matter.
Join us
I hope you will join us for Microsoft Azure Infra Summit 2026, happening May 19-21, 2026. If you care about how Azure infrastructure behaves in the real world, and you want practical, engineering-led guidance on how to build, operate, and optimize it, this event was built for you. Register here: https://aka.ms/MAIS-Reg
Cheers! Pierre Roman
Pierre_Roman
Apr 07, 2026 · ITOps Talk Blog
Overview of Azure Workload Modernization
Azure workload modernization generally means shifting from traditional deployment options, such as running a workload within a VM, to more cloud-native components, such as functions, PaaS services, and other cloud architecture components.
Shift from VMs to PaaS and Cloud-Native Services: By replatforming to services like Azure App Service for web apps, managed databases (e.g. Azure SQL Database), or container platforms (e.g. Azure Kubernetes Service (AKS)), you offload infrastructure management to Azure. Azure takes on much of the patching, scaling, and high-availability work, so your team can focus on code and features. (Learn more: https://learn.microsoft.com/azure/app-modernization-guidance/plan/plan-an-application-modernization-strategy#iaas-vs-paas)
Immediately Leverage Azure’s Built-in Capabilities: You can light up Azure’s ecosystem features for security, compliance, monitoring, and more. For example, without changing any code, you can enable Azure Monitor for telemetry and alerting, use Azure’s compliance certifications to meet regulatory needs, and turn on governance controls. Modernizing a workload is about unlocking capabilities such as autoscaling, backup/DR, and patch management that the platform handles for you. (See: https://learn.microsoft.com/azure/well-architected/framework/platform-automation)
Treat Modernization as a Continuous Journey: Modernizing isn’t a single “big bang” rewrite; it’s an ongoing process. Once on Azure, plan to iteratively improve your applications as new services and best practices emerge. Implement DevOps pipelines (CI/CD) to regularly deliver updates and refactor parts of the system over time. This allows you to adopt new Azure capabilities (such as improved instance types, updated frameworks, or new managed services) with minimal disruption. By continually integrating improvements, from code enhancements to architecture changes, you ensure your workloads keep getting more efficient, secure, and scalable. (See: https://learn.microsoft.com/azure/app-modernization-guidance/get-started/application-modernization-life-cycle, continuous improvement approach)
Use Containers and Event-Driven Architectures to Evolve Legacy Apps: Breaking apart large, tightly coupled applications into smaller components can significantly improve agility and resilience. Containerize parts of your app and deploy them to a managed orchestrator like Azure Kubernetes Service (AKS) for better scalability and fault isolation. In an AKS cluster, each microservice or module runs independently, so you can update or scale one component without impacting the whole system. In addition, consider introducing serverless functions (via Azure Functions) or event-driven services for specific tasks and background jobs. These approaches enable on-demand scaling and cost efficiency: on consumption-based plans, Azure runs your code only when an event or request triggers it. Adopting microservices and serverless architectures helps your application become more modular, easier to maintain, and automatically scalable to meet demand. (Learn more: https://learn.microsoft.com/azure/architecture/guide/architecture-styles/microservices and https://learn.microsoft.com/azure/azure-functions/functions-overview)
Modernize Security and Identity: Update your application’s security posture to align with cloud best practices. Integrate your apps with Microsoft Entra ID for modern authentication and single sign-on, rather than custom or legacy auth methods. This provides immediate enhancements like multi-factor authentication, token-based access, and easier user management across cloud services. Additionally, take advantage of Azure’s global networking and security services. For example, use Azure Front Door to improve performance for users worldwide, and enable its built-in Web Application Firewall (WAF) to help protect against DDoS and other web attacks. By using cloud-native security services (such as Azure Key Vault to manage app secrets and certificates, or Microsoft Defender for Cloud for threat protection), you can significantly strengthen your workload’s security while reducing the operational burden on your team. (See: https://learn.microsoft.com/entra/identity/intro and https://learn.microsoft.com/azure/frontdoor/front-door-overview)
OrinThomas
Mar 15, 2026 · ITOps Talk Blog
Azure Migration Challenges (and how to resolve them)
Moving workloads to Azure is rarely plug-and-play. Here are some workarounds for challenges organizations encounter when planning and executing migrations.
Server Migration
Legacy OS & Software Compatibility: Old, out-of-support operating systems may not run in Azure or may perform poorly. Tightly coupled apps tied to specific hardware or OS versions are hard to replicate. Fix: Run compatibility assessments early. Upgrade or patch the OS before migrating, or refactor the workload to run on a supported OS.
Performance Sizing: On-prem VMs may rely on fast local SSDs or low-latency network links you won't get by default in Azure. Undersizing means poor performance; oversizing means wasted spend. Fix: Use Azure Migrate's performance-based recommendations to right-size your VMs.
Network & Identity Integration: Migrated servers still need to communicate with on-prem resources and authenticate users. Splitting app servers and auth servers across environments breaks things fast. Fix: Design network topology and identity infrastructure before you move anything. Move workloads that have interdependencies together.
Governance & Cloud Sprawl: On-prem controls (naming conventions, equipment tags) don't automatically follow you to the cloud, and being able to spin up resources with a click leads to sprawl. Fix: Set up Azure Policy from day one. Enforce tagging, naming, and compliance rules as part of the migration project, not after.
Skills Gaps: On-prem server experts aren't automatically fluent in Azure operations. Fix: Invest in cloud operations training before and during the migration.
Database Migration
Compatibility: Not every database engine or version maps cleanly to an Azure equivalent. Fix: Run the Azure Data Migration Assistant early to verify feature and functionality support.
Post-Migration Performance: Performance depends on the hosting ecosystem; what worked on-prem may not translate directly. Fix: Revisit indexing and configuration after migration. Use SQL Intelligent Insights and Performance Recommendations for tuning guidance.
Choosing the Right Service Tier: Azure offers elastic pools, managed instances, Hyperscale, and sharding, and picking the wrong option can be costly. Fix: Profile your workload with your DBA and use Azure Migrate's Database Assessment for sizing suggestions.
Security Configuration: User logins, roles, and encryption settings must migrate with the data. Fix: Map every layer of your on-prem security configuration and implement corresponding controls post-migration.
Data Integrity: Data types, constraints, and triggers must come over intact with zero loss or corruption. Fix: Use reliable migration tools, test multiple times, and validate row counts and key constraints. Plan cutover during low-usage windows and always have a rollback plan.
Application Migration
Legacy App Complexity: Custom and legacy apps carry years of accumulated config files, hard-coded paths, IP addresses, and environment-specific logging. Each app can feel like its own mini migration project. Fix: Use Azure Migrate's app dependency analysis to map what each app needs before you touch it.
Dependency Conflicts: Apps may depend on specific framework versions, libraries, or OS features that aren't available or supported in Azure. Fix: Identify and resolve dependency gaps early. Consider containerizing or refactoring apps to isolate them from environment differences.
Scale of Effort: Dozens or hundreds of apps, each with unique characteristics, create a massive manual workload. Fix: Automate everything you can. Use porting assistants and batch migration tooling to reduce repetitive tasks.
Key Takeaway
Start assessments early, automate aggressively, set up governance from day one, and train your team before the move, not after. Most migration failures trace back to skipped prep work.
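The “validate row counts and key constraints” step under Data Integrity can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not an Azure tool: two in-memory SQLite databases stand in for the on-prem source and the Azure target, and the helper names (`validate_table`, `is_clean`) are hypothetical. For one table it compares row counts, the set of primary-key values, and a SHA-256 fingerprint of every row read in key order, so silently dropped, extra, or altered rows all show up before cutover.

```python
import hashlib
import sqlite3


def validate_table(source, target, table, key):
    """Compare one table between a source and a target database.

    Hypothetical helper. `table` and `key` are interpolated into SQL, so they
    must come from a trusted list of names, never from user input.
    """

    def row_count(conn):
        return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

    def key_values(conn):
        return {row[0] for row in conn.execute(f"SELECT {key} FROM {table}")}

    def fingerprint(conn):
        # Hash every row in key order; assumes both sides share column order.
        digest = hashlib.sha256()
        for row in conn.execute(f"SELECT * FROM {table} ORDER BY {key}"):
            digest.update(repr(row).encode("utf-8"))
        return digest.hexdigest()

    src_keys, tgt_keys = key_values(source), key_values(target)
    return {
        "table": table,
        "source_rows": row_count(source),
        "target_rows": row_count(target),
        "missing_in_target": sorted(src_keys - tgt_keys),
        "unexpected_in_target": sorted(tgt_keys - src_keys),
        "content_matches": fingerprint(source) == fingerprint(target),
    }


def is_clean(report):
    """True only if counts, keys, and row content all match."""
    return (
        report["source_rows"] == report["target_rows"]
        and not report["missing_in_target"]
        and not report["unexpected_in_target"]
        and report["content_matches"]
    )


if __name__ == "__main__":
    # In-memory SQLite databases stand in for the on-prem source and Azure target.
    src = sqlite3.connect(":memory:")
    tgt = sqlite3.connect(":memory:")
    for conn in (src, tgt):
        conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    src.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0), (3, 7.25)])
    # Simulate a migration run that silently dropped order 3.
    tgt.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

    report = validate_table(src, tgt, "orders", "id")
    print(report)
    print("clean" if is_clean(report) else "mismatch: investigate before cutover")
```

Against real engines you would run equivalent queries through each platform's own driver. Value representations can differ between engines (decimal and datetime types are common culprits), so normalize values before hashing, or the fingerprints will disagree even when the data is identical.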
OrinThomas
Mar 04, 2026 · ITOps Talk Blog