<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>rss.livelink.threads-in-node</title>
    <link>https://techcommunity.microsoft.com/t5/startups-at-microsoft/ct-p/StartupsatMicrosoft</link>
    <description>rss.livelink.threads-in-node</description>
    <pubDate>Tue, 28 Apr 2026 17:13:23 GMT</pubDate>
    <dc:creator>StartupsatMicrosoft</dc:creator>
    <dc:date>2026-04-28T17:13:23Z</dc:date>
    <item>
      <title>The flat-subscription problem</title>
      <link>https://techcommunity.microsoft.com/t5/startups-at-microsoft/the-flat-subscription-problem/ba-p/4513777</link>
      <description>&lt;P&gt;&lt;EM&gt;A real design review: management groups, policies, break-glass accounts, and the five things I'd tweak before going to production.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Here's what I see at most startups when they first show up on Azure: one subscription, one Global Admin, everything in the same resource group, and everyone's an Owner.&lt;/P&gt;
&lt;P&gt;That works when you have three engineers and one environment. It stops working around the time you have a production workload, a dev environment, shared infrastructure, and an engineer who accidentally deleted the wrong resource group on a Friday afternoon.&lt;/P&gt;
&lt;P&gt;The next step is usually "let's create more subscriptions." That's the right instinct. But without management groups and policies tying them together, you end up with four subscriptions, four sets of inconsistent RBAC assignments, no shared tagging strategy, and no audit trail showing who deployed what.&lt;/P&gt;
&lt;P&gt;If you're at this stage and want a starting point, the &lt;A class="lia-external-url" href="https://aka.ms/sslz" target="_blank"&gt;Startup-Scale Landing Zone&lt;/A&gt; gives you an opinionated Bicep template with management groups, policies, and RBAC already wired together. This post goes deeper: what happens when a team takes those concepts and customizes them for their own environment.&lt;/P&gt;
&lt;H2&gt;The design&lt;/H2&gt;
&lt;P&gt;A startup VP of Engineering sent me their proposed management group hierarchy and asked me to review it before going to production. They'd done their homework: read the &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/landing-zone/design-area/resource-org-management-groups" target="_blank"&gt;Cloud Adoption Framework&lt;/A&gt; docs, researched config options, and put together a three-level hierarchy with specific policies and RBAC at each level.&lt;/P&gt;
&lt;P&gt;Here's the breakdown:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Tenant Root Group&lt;/STRONG&gt; is the automatic top-level MG that Azure creates in every tenant. Be very selective about what you assign here. Anything at this level affects every subscription you'll ever create, including ones that don't exist yet. Some organizations do assign enterprise-wide "must have" policies at root, but for a startup still figuring out its governance posture, keeping root clean and pushing baselines to a company MG one level down gives you more flexibility.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Company MG&lt;/STRONG&gt; sits directly below and carries the baseline that applies to everything: required tags on all resources (env, owner, cost-center, app), allowed regions locked to three US regions, Defender for Cloud enabled everywhere, and all diagnostic logs routed to a central Log Analytics workspace. Engineering gets Reader at this level, so everyone can see everything but can't change anything by default.&lt;/P&gt;
&lt;P&gt;Three child MGs below that:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Nonprod MG&lt;/STRONG&gt; is the relaxed zone. Tags are audited but not denied, so engineers can experiment without being blocked by policy. Public IPs are allowed. Engineering gets Contributor. This is where you iterate fast without filing PIM requests.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Prod MG&lt;/STRONG&gt; is the strict zone. Tags are denied if missing. Public IPs are blocked. Encryption at rest is required. VM SKUs are restricted. Engineering gets Reader by default, and Contributor access is available through &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/entra/id-governance/privileged-identity-management/pim-configure" target="_blank"&gt;PIM&lt;/A&gt; (just-in-time, time-limited activation). You have to explicitly request write access, and it expires.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Platform MG&lt;/STRONG&gt; protects the shared infrastructure that everything depends on. The Terraform state storage account, central Log Analytics workspace, and shared Key Vault all live here. Platform team gets Contributor; everyone else gets Reader. Critical resources are protected from deletion.&lt;/P&gt;
&lt;P&gt;Under each MG, the subscriptions:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;MG&lt;/th&gt;&lt;th&gt;Subscription&lt;/th&gt;&lt;th&gt;Purpose&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Nonprod&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;dev&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Development and testing&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Nonprod&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;devtest&lt;/STRONG&gt; (MSDN)&lt;/td&gt;&lt;td&gt;Engineer's personal scratch (MSDN-bound)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Prod&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;prod&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Production workloads&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Platform&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;cloud-infra&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Terraform state, Log Analytics, Key Vault, workload identity&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
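&lt;P&gt;To make the inheritance concrete, here is a minimal Python sketch of how the tag effect resolves for each subscription in the table above. It is an illustration, not how Azure Policy is implemented: the dictionary names are made up, and I'm assuming the company MG assigns the tag policy with an audit effect while the Prod MG adds a deny assignment on top. Assignments are inherited and cumulative, so the strictest effect on the path applies.&lt;/P&gt;

```python
# Hypothetical model of the hierarchy described above; names are illustrative.
PARENT = {
    "company": "tenant-root",
    "nonprod": "company",
    "prod": "company",
    "platform": "company",
}
SUBSCRIPTION_MG = {
    "dev": "nonprod",
    "devtest": "nonprod",
    "prod": "prod",
    "cloud-infra": "platform",
}
# Assumed tag-policy effect assigned at each management group.
TAG_EFFECT = {"company": "audit", "prod": "deny"}
STRICTNESS = {"audit": 1, "deny": 2}


def management_group_path(subscription):
    """Return the management groups above a subscription, nearest first."""
    path = []
    group = SUBSCRIPTION_MG[subscription]
    while group is not None:
        path.append(group)
        group = PARENT.get(group)
    return path


def effective_tag_effect(subscription):
    """Inherited assignments are cumulative, so the strictest effect applies."""
    effects = [TAG_EFFECT[g] for g in management_group_path(subscription) if g in TAG_EFFECT]
    return max(effects, key=STRICTNESS.get, default=None)


for name in SUBSCRIPTION_MG:
    print(name, management_group_path(name), effective_tag_effect(name))
```

&lt;P&gt;One consequence worth noticing: if the company-level assignment were deny instead of audit, the Nonprod MG could not relax it, because a child scope can add restrictions but cannot remove inherited ones.&lt;/P&gt;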
&lt;H2&gt;The parts that nail it&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;The hierarchy is flat and functional.&lt;/STRONG&gt; CAF recommends keeping the hierarchy to no more than &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/landing-zone/design-area/resource-org-management-groups" target="_blank"&gt;three to four levels deep&lt;/A&gt; and not creating management groups just for the sake of structure. This design does exactly that: a company MG for baselines, then Nonprod/Prod/Platform for the policy gradient. It's not "the one CAF pattern" (CAF deliberately avoids prescribing a single topology), but it's a clean startup pattern that scales to dozens of subscriptions without restructuring.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Audit in dev, deny in prod.&lt;/STRONG&gt; Dev environments that deny everything become unusable. Engineers stop experimenting. Prod environments that only audit become insecure. The split is the right trade-off: visibility without friction in dev, enforcement without exceptions in prod.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;The platform subscription for shared services.&lt;/STRONG&gt; Centralizing Terraform state, the Log Analytics workspace, and shared Key Vault into a separate subscription (with its own RBAC) means application teams can't accidentally delete the infrastructure that manages their infrastructure. This is the "trust boundary" pattern, and most startups skip it until they learn the hard way.&lt;/P&gt;
&lt;H2&gt;What I'd change before going live&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;PIM licensing isn't one-seat-fits-all.&lt;/STRONG&gt; They mentioned having "1 P2 seat" for PIM. &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/entra/id-governance/licensing-fundamentals" target="_blank"&gt;PIM requires an Entra ID P2 (or Governance) license&lt;/A&gt; per user who's eligible for activation, plus anyone who approves or reviews PIM access. If four engineers need just-in-time Contributor access to production and one manager approves, that's five P2 licenses (~$9/user/month). Still cheap insurance compared to "everyone has standing Contributor," but budget for it correctly.&lt;/P&gt;
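&lt;P&gt;The licensing arithmetic is simple enough to script. This sketch counts distinct people across the eligible, approver, and reviewer groups, using the approximate $9 price quoted above. The user names are hypothetical, and you should confirm current pricing before budgeting.&lt;/P&gt;

```python
P2_PRICE_PER_USER_MONTH = 9.0  # approximate price quoted above; verify before budgeting


def pim_license_count(eligible_users, approvers, reviewers=()):
    """Count distinct people who need a license; someone in two roles counts once."""
    return len(set(eligible_users) | set(approvers) | set(reviewers))


engineers = ["ana", "ben", "chen", "dee"]  # hypothetical users eligible for activation
managers = ["eve"]  # hypothetical approver
seats = pim_license_count(engineers, managers)
print(seats, seats * P2_PRICE_PER_USER_MONTH)
```

&lt;P&gt;For the four-engineers-plus-one-approver case this comes to five seats, or about $45 a month.&lt;/P&gt;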
&lt;P&gt;&lt;STRONG&gt;Think about SKU restrictions as a trade-off.&lt;/STRONG&gt; Their prod MG had "restrict to approved SKUs." An allow-list gives you strict standardization (only pre-approved SKUs work), but every time Azure launches a new VM series, someone has to update it. A deny-list ("block these specific expensive or unnecessary SKUs") is easier to maintain since new SKUs are available by default. The right choice depends on your team: if you need tight control over what runs in prod, keep the allow-list. If you move fast and want less policy maintenance, a deny-list with periodic reviews is simpler.&lt;/P&gt;
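&lt;P&gt;Here is a small sketch of the difference in behavior, written as plain Python instead of a policy definition. The SKU names and patterns are examples I picked for illustration, not recommendations, and the functions are hypothetical stand-ins for what the policy would evaluate.&lt;/P&gt;

```python
from fnmatch import fnmatchcase

# Hypothetical lists; the SKU names are examples, not recommendations.
ALLOWED_SKUS = ["Standard_D2s_v5", "Standard_D4s_v5"]
DENIED_PATTERNS = ["Standard_M*", "Standard_ND*"]


def allowed_by_allow_list(sku):
    """Allow-list: anything not explicitly approved is rejected."""
    return sku in ALLOWED_SKUS


def allowed_by_deny_list(sku):
    """Deny-list: anything not explicitly blocked is accepted."""
    return not any(fnmatchcase(sku, pattern) for pattern in DENIED_PATTERNS)


# A SKU from a newer series that nobody has reviewed yet.
new_sku = "Standard_D4s_v6"
print(allowed_by_allow_list(new_sku), allowed_by_deny_list(new_sku))
```

&lt;P&gt;The unreviewed SKU is rejected by the allow-list and accepted by the deny-list, which is the whole trade-off: the first fails closed and needs upkeep, the second fails open and needs periodic review.&lt;/P&gt;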
&lt;P&gt;&lt;STRONG&gt;Resource locks beat policy for protecting critical infra.&lt;/STRONG&gt; Their Platform MG had "deny deletion of state storage / log workspace" as a policy. &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/lock-resources" target="_blank"&gt;Azure Resource Locks&lt;/A&gt; (CanNotDelete) are simpler and more visible for this. A lock shows up right on the resource in the portal, so engineers see it immediately. A deny-delete policy is invisible until it blocks you, and the error message doesn't always make it obvious why. Locks are also easier to temporarily remove when you legitimately need to rotate or replace a resource.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Add cost alerts on every subscription from day one.&lt;/STRONG&gt; Their design didn't mention budget alerts. &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/cost-management-billing/costs/tutorial-acm-create-budgets" target="_blank"&gt;Azure Cost Management&lt;/A&gt; lets you set budget thresholds per subscription with email and webhook notifications. Set them before any workloads deploy, not after the first surprise bill. Start with 80% and 100% of expected monthly spend. It takes 5 minutes and can save thousands.&lt;/P&gt;
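&lt;P&gt;If you want to sanity-check the numbers before creating the budgets, the threshold math looks like this. The $2,000 monthly figure is a made-up example, and the function names are mine, not part of any Azure SDK.&lt;/P&gt;

```python
def budget_thresholds(expected_monthly_spend, percentages=(80, 100)):
    """Return the spend amount at which each alert percentage fires."""
    return {pct: round(expected_monthly_spend * pct / 100, 2) for pct in percentages}


def crossed_thresholds(current_spend, expected_monthly_spend, percentages=(80, 100)):
    """List the alert percentages that the current spend has reached."""
    limits = budget_thresholds(expected_monthly_spend, percentages)
    return [pct for pct, amount in limits.items() if current_spend >= amount]


# Hypothetical numbers: a subscription expected to cost $2,000 per month.
print(budget_thresholds(2000))
print(crossed_thresholds(1700, 2000))
```

&lt;P&gt;With a $2,000 budget the alerts fire at $1,600 and $2,000, so a subscription sitting at $1,700 has already tripped the 80% alert.&lt;/P&gt;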
&lt;P&gt;&lt;STRONG&gt;Cap the MSDN subscription.&lt;/STRONG&gt; Their devtest sub was MSDN-bound, described as "personal scratch." MSDN subscriptions come with a monthly credit ($50-$150 depending on the license tier), but the &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/devtest/offer/how-to-manage-the-spending-limit" target="_blank"&gt;spending limit can be removed&lt;/A&gt;, which means charges hit a valid payment method with no cap. Keep the spending limit ON for scratch subs. If it's been removed, set a budget alert at the credit amount. Also note that some Marketplace and external services may bill separately regardless of the spending limit.&lt;/P&gt;
&lt;H2&gt;The break-glass question&lt;/H2&gt;
&lt;P&gt;This team was federating their primary domain with Google Workspace as the SAML identity provider (their whole company runs on Google). They asked: "Can I use my .onmicrosoft.com account as a break-glass account while my federated @company.com is my daily driver?"&lt;/P&gt;
&lt;P&gt;Yes. This is exactly the pattern Microsoft recommends.&lt;/P&gt;
&lt;P class="lia-clear-both"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/security/benchmark/azure/mcsb-privileged-access#pa-5-set-up-emergency-access" target="_blank"&gt;Microsoft's security benchmark (PA-5)&lt;/A&gt; specifically calls for cloud-only break-glass accounts that bypass external IdP dependencies. If your Google SAML federation goes down (Google outage, misconfigured SAML cert, domain issues), all federated accounts fail to sign in. Cloud-only .onmicrosoft.com accounts authenticate directly against Entra ID with no external dependency.&lt;/P&gt;
&lt;P&gt;How to harden them:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Create two break-glass accounts.&lt;/STRONG&gt; Microsoft recommends at least two. Store credentials in separate physical locations. One person alone shouldn't be able to access both. Docs: &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/entra/identity/role-based-access-control/security-emergency-access" target="_blank"&gt;Manage emergency access accounts&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Use phishing-resistant auth.&lt;/STRONG&gt; Passkeys (FIDO2 security keys) are the strongest option: phishing-resistant and no dependency on a phone or authenticator app that might be unavailable during an emergency. If you already run PKI, certificate-based auth is another viable option. The key is diversity across your two accounts so a single authentication method failure doesn't lock out both. Docs: &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/entra/identity/authentication/howto-authentication-passwordless-security-key" target="_blank"&gt;Enable FIDO2 security key sign-in&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Exclude at least one account from ALL Conditional Access policies.&lt;/STRONG&gt; This is the account that guarantees access if a bad CA policy locks everyone out. Microsoft recommends excluding at least one break-glass account from every CA policy. The second account can optionally have phishing-resistant MFA enforced via CA, giving you a safer fallback for non-federation emergencies.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Assign Global Administrator permanently.&lt;/STRONG&gt; Not through PIM. Break-glass accounts need immediate access. PIM activation requires the normal auth flow, which defeats the purpose in an emergency.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Monitor every sign-in.&lt;/STRONG&gt; Set up alerts in Azure Monitor or Microsoft Sentinel for any authentication from a break-glass account. If these accounts show activity outside an emergency, investigate immediately.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Test quarterly.&lt;/STRONG&gt; Actually sign in with the break-glass accounts on a schedule. Verify the credentials work, the FIDO2 keys work, and the monitoring alert fires. Don't wait for a real emergency to discover something is broken.&lt;/P&gt;
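&lt;P&gt;The Conditional Access exclusion is the easiest of these to get wrong quietly, because every new policy has to remember it. Below is a minimal sketch of a check you could run against an export of your policies. I'm assuming the export is a list of dictionaries shaped like the Microsoft Graph conditional access policy resource (&lt;CODE&gt;conditions.users.excludeUsers&lt;/CODE&gt;); the object ID and policy names are invented for the example.&lt;/P&gt;

```python
def policies_missing_exclusion(policies, break_glass_id):
    """Return display names of policies that do not exclude the given account."""
    missing = []
    for policy in policies:
        users = policy.get("conditions", {}).get("users", {})
        if break_glass_id not in users.get("excludeUsers", []):
            missing.append(policy.get("displayName", "(unnamed policy)"))
    return missing


# Hypothetical export: two policies, one of which forgot the exclusion.
BREAK_GLASS_ID = "00000000-0000-0000-0000-000000000001"
sample_policies = [
    {
        "displayName": "Require MFA for all users",
        "conditions": {"users": {"includeUsers": ["All"], "excludeUsers": [BREAK_GLASS_ID]}},
    },
    {
        "displayName": "Block legacy authentication",
        "conditions": {"users": {"includeUsers": ["All"], "excludeUsers": []}},
    },
]
print(policies_missing_exclusion(sample_policies, BREAK_GLASS_ID))
```

&lt;P&gt;This only looks at direct user exclusions. If you exclude the break-glass account through a group instead, the check would need to resolve group membership as well.&lt;/P&gt;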
&lt;H2&gt;The pre-production governance checklist&lt;/H2&gt;
&lt;P&gt;Before deploying workloads into your new hierarchy, verify:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;All subscriptions are nested under the correct MG (not dangling under Tenant Root Group)&lt;/LI&gt;
&lt;LI&gt;Baseline policies applied at the company MG and verified with &lt;CODE&gt;Get-AzPolicyAssignment&lt;/CODE&gt;&lt;/LI&gt;
&lt;LI&gt;PIM configured with appropriate activation duration (4-8 hours max)&lt;/LI&gt;
&lt;LI&gt;P2 licenses assigned to every user eligible for PIM activation, plus approvers and reviewers&lt;/LI&gt;
&lt;LI&gt;Two break-glass accounts exist, tested, and monitored&lt;/LI&gt;
&lt;LI&gt;At least one break-glass account excluded from all Conditional Access policies&lt;/LI&gt;
&lt;LI&gt;Budget alerts set on every subscription (80% and 100% thresholds)&lt;/LI&gt;
&lt;LI&gt;Resource locks on Terraform state, Log Analytics workspace, and Key Vault&lt;/LI&gt;
&lt;LI&gt;MSDN spending limit verified ON (or budget alert set if removed)&lt;/LI&gt;
&lt;LI&gt;Diagnostic settings routing all activity logs to the central Log Analytics workspace&lt;/LI&gt;
&lt;/UL&gt;
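&lt;P&gt;The first item on that list is easy to automate once you have the current placement exported in any form. This sketch compares an expected mapping against an actual one. Both dictionaries are hypothetical, and how you produce the actual mapping (portal export, CLI, script) is up to you.&lt;/P&gt;

```python
# Expected placement, taken from the subscription table earlier in this post.
EXPECTED = {
    "dev": "nonprod",
    "devtest": "nonprod",
    "prod": "prod",
    "cloud-infra": "platform",
}


def misplaced_subscriptions(actual, expected=EXPECTED):
    """Map each misplaced subscription to (where it is, where it should be)."""
    return {
        sub: (actual.get(sub, "not found"), target)
        for sub, target in expected.items()
        if actual.get(sub) != target
    }


# Hypothetical current state: devtest was never moved out of the root group.
actual_placement = {
    "dev": "nonprod",
    "devtest": "tenant-root",
    "prod": "prod",
    "cloud-infra": "platform",
}
print(misplaced_subscriptions(actual_placement))
```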
&lt;H2&gt;Where this fits in the governance journey&lt;/H2&gt;
&lt;P&gt;If you're building Azure governance from zero, here's my recommended reading order:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/demystifying-microsoft-entra-id-tenants-and-azure-subscriptions/4155261" data-lia-auto-title="Demystifying Microsoft Entra ID, Tenants and Azure Subscriptions" data-lia-auto-title-active="0" target="_blank"&gt;Demystifying Microsoft Entra ID, Tenants and Azure Subscriptions&lt;/A&gt; - understand what tenants, subscriptions, and Entra ID actually are&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/azure-has-three-permission-systems-and-youre-probably-confusing-them/4471854" data-lia-auto-title="Azure has three permission systems, and you're probably confusing them" data-lia-auto-title-active="0" target="_blank"&gt;Azure has three permission systems, and you're probably confusing them&lt;/A&gt; - the identity, resource, and billing planes&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;This post&lt;/STRONG&gt; - design your management group hierarchy&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/role-structures-anti-patterns-and-the-10-governance-principles/4510070" data-lia-auto-title="Role Structures, Anti-Patterns, and the 10 Governance Principles" data-lia-auto-title-active="0" target="_blank"&gt;Role Structures, Anti-Patterns, and the 10 Governance Principles&lt;/A&gt; - RBAC patterns and what not to do&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/introducing-the-startup-scale-landing-zone-get-azure-right-from-day-one/4501566" data-lia-auto-title="Introducing the Startup-Scale Landing Zone" data-lia-auto-title-active="0" target="_blank"&gt;Introducing the Startup-Scale Landing Zone&lt;/A&gt; - the full reference architecture&lt;/LI&gt;
&lt;/OL&gt;</description>
      <pubDate>Wed, 22 Apr 2026 19:06:12 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/startups-at-microsoft/the-flat-subscription-problem/ba-p/4513777</guid>
      <dc:creator>rmmartins</dc:creator>
      <dc:date>2026-04-22T19:06:12Z</dc:date>
    </item>
    <item>
      <title>Your Azure VM went down and nobody knew why. Here's how to fix that.</title>
      <link>https://techcommunity.microsoft.com/t5/startups-at-microsoft/your-azure-vm-went-down-and-nobody-knew-why-here-s-how-to-fix/ba-p/4513733</link>
      <description>&lt;P&gt;If you've ever had a production VM go unhealthy on Azure and found yourself scrambling to figure out what happened, you're not alone. I work with startups running production workloads on Azure, and this is one of the most common patterns I see: something goes wrong, the team opens a support ticket, and then everyone waits for a root cause while the CTO asks "how do we make sure we know about this before our customers do next time?"&lt;/P&gt;
&lt;P&gt;The good news: Azure already gives you the tools to answer both questions. Most teams just haven't set them up yet.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;Scope note:&lt;/STRONG&gt; This post covers &lt;STRONG&gt;platform health and maintenance signals&lt;/STRONG&gt; for Azure VMs. We're not covering guest OS metrics, application telemetry, or Azure Monitor/VM Insights here. If you don't have a dedicated SRE team, these are the highest-leverage Azure-native checks to set up first.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Let's get into it.&lt;/P&gt;
&lt;H2&gt;Step 1: Figure out what actually happened (Resource Health)&lt;/H2&gt;
&lt;P&gt;Before you open a support ticket, check &lt;STRONG&gt;Resource Health&lt;/STRONG&gt;. It's the fastest way to determine whether your VM went down because of something Azure did (platform event) or something on your side (user-initiated or config issue).&lt;/P&gt;
&lt;P&gt;Go to your VM in the Azure portal &amp;gt; &lt;STRONG&gt;Resource Health&lt;/STRONG&gt; blade. You'll see:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Current status&lt;/STRONG&gt;: Available, Unavailable, Degraded, or Unknown&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Health history&lt;/STRONG&gt;: 30 days of state transitions with annotations explaining each one&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Root cause&lt;/STRONG&gt;: For platform-initiated outages on VMs, Azure automatically publishes root cause details within 72 hours, directly in this blade&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The annotations often tell you what kind of event occurred: live migration, host reboot, planned maintenance, degraded hardware, etc. In many cases, you get this information without filing a support ticket.&lt;/P&gt;
&lt;P&gt;If your VM was affected by a live migration, the annotation will show it was a platform-initiated event. Live migration is a memory-preserving operation that causes a brief pause, typically no more than 5 seconds (&lt;A href="https://learn.microsoft.com/en-us/azure/virtual-machines/maintenance-and-updates#maintenance-that-doesnt-require-a-reboot" target="_blank" rel="noopener"&gt;docs&lt;/A&gt;). But if your application is sensitive to even short freezes, or if you're seeing them frequently, that's worth investigating further.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Docs:&lt;/STRONG&gt; &lt;A href="https://learn.microsoft.com/en-us/azure/service-health/resource-health-overview" target="_blank" rel="noopener"&gt;Resource Health overview&lt;/A&gt;&lt;/P&gt;
&lt;H2&gt;Step 2: Get notified when it happens (Service Health + Resource Health Alerts)&lt;/H2&gt;
&lt;P&gt;Checking the portal after an incident is fine. Getting an alert &lt;EM&gt;when&lt;/EM&gt; the incident happens is better.&lt;/P&gt;
&lt;H3&gt;Service Health Alerts&lt;/H3&gt;
&lt;P&gt;These notify you about service issues, planned maintenance, health advisories, and security advisories for the Azure services and regions you're actually using. Service Health is best for subscription-level and region-level awareness. If there's a regional maintenance wave driving elevated live migrations, this is how you'd know about it proactively.&lt;/P&gt;
&lt;P&gt;Set them up to notify your ops channel via email, SMS, webhook (Slack, PagerDuty, Teams), or automation via Logic Apps or Azure Functions.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Docs:&lt;/STRONG&gt; &lt;A href="https://learn.microsoft.com/en-us/azure/service-health/alerts-activity-log-service-notifications-portal" target="_blank" rel="noopener"&gt;Create Service Health alerts&lt;/A&gt; | &lt;A href="https://learn.microsoft.com/en-us/azure/service-health/service-health-alert-webhook-pagerduty" target="_blank" rel="noopener"&gt;PagerDuty integration&lt;/A&gt;&lt;/P&gt;
&lt;H3&gt;Resource Health Alerts&lt;/H3&gt;
&lt;P&gt;These fire when a specific resource (or all resources in a resource group) changes health status. The alert includes health-change details such as status, cause type (platform vs. user-initiated), and descriptive event text, so you get more than a generic "VM is unhealthy" notification.&lt;/P&gt;
&lt;P&gt;This is the "never be surprised again" alert. If you only set up one thing from this post, make it this.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Docs:&lt;/STRONG&gt; &lt;A href="https://learn.microsoft.com/en-us/azure/service-health/resource-health-alert-monitor-guide" target="_blank" rel="noopener"&gt;Create Resource Health alerts&lt;/A&gt;&lt;/P&gt;
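&lt;P&gt;If you route these alerts to a webhook, the receiver can pull out the fields that matter and post a one-line summary to your ops channel. The sketch below parses a trimmed, hypothetical payload. The field names are my assumption about the activity log alert webhook format, so compare them against a real alert from your own subscription before relying on them.&lt;/P&gt;

```python
import json

# Trimmed, hypothetical payload. Field names are assumed to follow the
# activity log alert webhook format; verify against a real alert.
SAMPLE_ALERT = """
{
  "data": {
    "context": {
      "activityLog": {
        "resourceId": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/prod-rg/providers/Microsoft.Compute/virtualMachines/my-production-vm",
        "properties": {
          "title": "Virtual machine is unavailable",
          "previousHealthStatus": "Available",
          "currentHealthStatus": "Unavailable",
          "cause": "PlatformInitiated"
        }
      }
    }
  }
}
"""


def summarize_health_alert(raw_body):
    """Pull out the fields an on-call engineer needs first."""
    log = json.loads(raw_body)["data"]["context"]["activityLog"]
    props = log.get("properties", {})
    return {
        "resource": log.get("resourceId", "").rsplit("/", 1)[-1],
        "previous": props.get("previousHealthStatus"),
        "current": props.get("currentHealthStatus"),
        "platform_initiated": props.get("cause") == "PlatformInitiated",
        "title": props.get("title"),
    }


summary = summarize_health_alert(SAMPLE_ALERT)
print(summary)
```

&lt;P&gt;The platform-versus-user flag is the useful part: it tells you right away whether to look at your own change history or at Azure's.&lt;/P&gt;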
&lt;H2&gt;Step 3: See it coming (Scheduled Events API)&lt;/H2&gt;
&lt;P&gt;This is the part most teams don't know about, and it's the most powerful tool for handling live migrations gracefully.&lt;/P&gt;
&lt;P&gt;Azure exposes an &lt;STRONG&gt;Instance Metadata Service (IMDS)&lt;/STRONG&gt; endpoint on every VM that gives your application advance notice of upcoming maintenance events. Live migrations show up as &lt;CODE&gt;EventType: "Freeze"&lt;/CODE&gt;. In typical cases, you get up to ~15 minutes between the event appearing and Azure proceeding with the operation, though exact timing varies and some failures (like hardware issues) can bypass the advance notification entirely.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;Note:&lt;/STRONG&gt; Most Azure VM families support live migration, but G, L, N, and H series VMs do not. If you run GPU or HPC workloads on these SKUs, you won't see &lt;CODE&gt;Freeze&lt;/CODE&gt; events. You'll still get &lt;CODE&gt;Reboot&lt;/CODE&gt; or &lt;CODE&gt;Redeploy&lt;/CODE&gt; events for other maintenance types.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;The endpoint is available from inside the VM at:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Here's an example response when a live migration is scheduled:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;{
  "DocumentIncarnation": 1,
  "Events": [
    {
      "EventId": "602d9444-d2cd-49c7-8624-8643e7171297",
      "EventType": "Freeze",
      "ResourceType": "VirtualMachine",
      "Resources": ["my-production-vm"],
      "EventStatus": "Scheduled",
      "NotBefore": "Wed, 22 Apr 2026 19:17:47 GMT",
      "Description": "Virtual machine is being paused for a memory-preserving Live Migration operation.",
      "EventSource": "Platform",
      "DurationInSeconds": 5
    }
  ]
}&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;You can poll this endpoint and use the lead time to:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Drain connections&lt;/STRONG&gt; so active users aren't affected&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Checkpoint application state&lt;/STRONG&gt; to recover faster&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Remove the VM from your load balancer&lt;/STRONG&gt; temporarily&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Log the event&lt;/STRONG&gt; so you have a record of migration frequency&lt;/LI&gt;
&lt;/UL&gt;
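&lt;P&gt;To use the lead time you first need to know how much of it is left. This sketch turns a &lt;CODE&gt;NotBefore&lt;/CODE&gt; string into seconds remaining, using only the standard library. The function name and the sample clock reading are mine, and treating a blank value as "no time left" is an assumption about how you'd want to handle an event that has already started.&lt;/P&gt;

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def seconds_until(not_before, now=None):
    """Seconds left before Azure may start the event; 0 if the time has passed."""
    if not not_before:
        return 0.0  # assumption: a blank value means there is no lead time left
    deadline = parsedate_to_datetime(not_before)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (deadline - now).total_seconds())


# Hypothetical clock reading, 12 minutes before the NotBefore time.
now = datetime(2026, 4, 22, 19, 5, 47, tzinfo=timezone.utc)
print(seconds_until("Wed, 22 Apr 2026 19:17:47 GMT", now))
```

&lt;P&gt;A drain routine can compare this number against how long it actually needs and decide whether to checkpoint fully or just stop accepting new connections.&lt;/P&gt;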
&lt;P&gt;Here's a simple polling script in Python:&lt;/P&gt;
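&lt;PRE&gt;&lt;CODE&gt;import json
import time

import requests

ENDPOINT = "http://169.254.169.254/metadata/scheduledevents"
HEADERS = {"Metadata": "true"}
PARAMS = {"api-version": "2020-07-01"}

def get_scheduled_events():
    # The endpoint is local to the VM, so a short timeout is enough.
    response = requests.get(ENDPOINT, headers=HEADERS, params=PARAMS, timeout=5)
    response.raise_for_status()
    return response.json()

def handle_events(data):
    for event in data.get("Events", []):
        print(f"[{event['EventType']}] {event.get('Description', 'No description')}")
        print(f"  Status: {event['EventStatus']}, Not Before: {event['NotBefore']}")
        print(f"  Duration: {event['DurationInSeconds']}s, Source: {event['EventSource']}")
        # Your graceful drain/checkpoint logic here.
        # When it finishes, call approve_event(event["EventId"]) to let Azure proceed early.

def approve_event(event_id):
    """Acknowledge the event so Azure can proceed immediately."""
    payload = json.dumps({"StartRequests": [{"EventId": event_id}]})
    response = requests.post(ENDPOINT, headers=HEADERS, params=PARAMS, data=payload, timeout=5)
    response.raise_for_status()

# Poll frequently - the official docs recommend every 1 second for production.
# Adjust based on your workload sensitivity.
last_incarnation = None
while True:
    try:
        data = get_scheduled_events()
        # DocumentIncarnation changes whenever the event list changes,
        # so only handle a document we have not seen yet.
        if data.get("DocumentIncarnation") != last_incarnation:
            last_incarnation = data.get("DocumentIncarnation")
            handle_events(data)
    except (requests.RequestException, ValueError) as exc:
        # Keep polling through transient failures instead of crashing the watcher.
        print(f"Scheduled Events poll failed: {exc}")
    time.sleep(1)&lt;/CODE&gt;&lt;/PRE&gt;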
&lt;P&gt;Or a quick check in Bash:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;curl -s -H "Metadata:true" --noproxy "*" \
  "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01" | jq .&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;STRONG&gt;Event approval:&lt;/STRONG&gt; Once your application has drained connections or checkpointed state, it can approve the event by POSTing back with the &lt;CODE&gt;EventId&lt;/CODE&gt;. This tells Azure your app is ready, and the platform can proceed without waiting for the full timeout. If you don't explicitly approve, Azure proceeds when the &lt;CODE&gt;NotBefore&lt;/CODE&gt; time is reached.&lt;/P&gt;
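&lt;P&gt;The drain-then-approve order is the part worth getting right, so here is a sketch of that flow on its own. The &lt;CODE&gt;drain&lt;/CODE&gt; and &lt;CODE&gt;approve&lt;/CODE&gt; callables are hypothetical and supplied by the caller, which lets you test the sequencing without touching the metadata endpoint. Skipping events that are no longer in the &lt;CODE&gt;Scheduled&lt;/CODE&gt; state is my assumption about what is still worth approving.&lt;/P&gt;

```python
def process_events(document, drain, approve):
    """Drain for each event still in the Scheduled state, then approve it.

    drain(event) and approve(event_id) are supplied by the caller.
    Returns the list of event IDs that were approved.
    """
    approved = []
    for event in document.get("Events", []):
        if event.get("EventStatus") != "Scheduled":
            continue  # assumed: nothing left to approve once an event has started
        drain(event)
        approve(event["EventId"])
        approved.append(event["EventId"])
    return approved


# Hypothetical document and stand-in callables that just record what happened.
calls = []
sample_document = {
    "DocumentIncarnation": 2,
    "Events": [
        {"EventId": "event-1", "EventType": "Freeze", "EventStatus": "Scheduled"},
        {"EventId": "event-2", "EventType": "Reboot", "EventStatus": "Started"},
    ],
}
approved_ids = process_events(
    sample_document,
    drain=lambda event: calls.append(("drain", event["EventId"])),
    approve=lambda event_id: calls.append(("approve", event_id)),
)
print(approved_ids, calls)
```

&lt;P&gt;In a real deployment you would pass your own drain routine and a function that POSTs the approval, like the &lt;CODE&gt;approve_event&lt;/CODE&gt; helper in the polling script.&lt;/P&gt;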
&lt;P&gt;If you're seeing elevated frequency of live migrations, this data lets you quantify the pattern (how often, what times, what durations) and bring hard numbers to a support conversation instead of "it feels like it's happening a lot."&lt;/P&gt;
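&lt;P&gt;Quantifying the pattern can be as simple as the sketch below, assuming your poller writes one record per event. The &lt;CODE&gt;seen_on&lt;/CODE&gt; field is a hypothetical date stamp added by your own logging, the sample records are invented, and I'm assuming a duration of -1 means the duration was unknown and should be left out of the average.&lt;/P&gt;

```python
from collections import Counter
from statistics import mean


def summarize_freezes(records):
    """Count Freeze events per day and average their known durations."""
    freezes = [r for r in records if r["EventType"] == "Freeze"]
    per_day = Counter(r["seen_on"] for r in freezes)
    # Assumption: -1 marks an unknown duration, so it is excluded from the mean.
    durations = [r["DurationInSeconds"] for r in freezes if r["DurationInSeconds"] >= 0]
    return {
        "total": len(freezes),
        "per_day": dict(per_day),
        "mean_duration": mean(durations) if durations else None,
    }


# Hypothetical log of events recorded by the polling script.
sample_log = [
    {"EventType": "Freeze", "seen_on": "2026-04-20", "DurationInSeconds": 5},
    {"EventType": "Freeze", "seen_on": "2026-04-20", "DurationInSeconds": 9},
    {"EventType": "Reboot", "seen_on": "2026-04-21", "DurationInSeconds": -1},
    {"EventType": "Freeze", "seen_on": "2026-04-22", "DurationInSeconds": -1},
]
report = summarize_freezes(sample_log)
print(report)
```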
&lt;P&gt;&lt;STRONG&gt;Docs:&lt;/STRONG&gt; &lt;A href="https://learn.microsoft.com/en-us/azure/virtual-machines/windows/scheduled-events" target="_blank" rel="noopener"&gt;Scheduled Events for VMs&lt;/A&gt;&lt;/P&gt;
&lt;H2&gt;Step 4: Check your overall posture (Azure Advisor)&lt;/H2&gt;
&lt;P&gt;While you're at it, check &lt;STRONG&gt;Azure Advisor's Reliability recommendations&lt;/STRONG&gt; for your VMs. It flags things like:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;VMs not deployed in availability zones&lt;/LI&gt;
&lt;LI&gt;Deprecated VM images that need updating&lt;/LI&gt;
&lt;LI&gt;Missing backup configurations&lt;/LI&gt;
&lt;LI&gt;Other resiliency gaps that make you more susceptible to availability issues&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Advisor won't explain a past incident, but it can help prevent the next one.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Docs:&lt;/STRONG&gt; &lt;A href="https://learn.microsoft.com/en-us/azure/advisor/advisor-reference-reliability-recommendations" target="_blank" rel="noopener"&gt;Azure Advisor Reliability recommendations&lt;/A&gt;&lt;/P&gt;
&lt;H2&gt;A quick note on resilience&lt;/H2&gt;
&lt;P&gt;These tools improve your visibility and response time, but they don't eliminate downtime by themselves. If a VM is truly critical, pair this monitoring with basic resilience patterns: multiple instances behind a load balancer, availability zones, health probes, regular backups, and cross-region recovery where needed. Monitoring tells you what's happening. Architecture determines whether it matters.&lt;/P&gt;
&lt;H2&gt;The setup checklist&lt;/H2&gt;
&lt;H3&gt;Quick wins (15 minutes)&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-none" border="1" style="width: 100%; height: 246px; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr style="height: 34.8px;"&gt;&lt;th style="height: 34.8px;"&gt;#&lt;/th&gt;&lt;th style="height: 34.8px;"&gt;What&lt;/th&gt;&lt;th style="height: 34.8px;"&gt;Why&lt;/th&gt;&lt;th style="height: 34.8px;"&gt;Time&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr style="height: 58.8px;"&gt;&lt;td style="height: 58.8px;"&gt;1&lt;/td&gt;&lt;td style="height: 58.8px;"&gt;Check Resource Health on your production VMs&lt;/td&gt;&lt;td style="height: 58.8px;"&gt;See if there are past events you didn't know about&lt;/td&gt;&lt;td style="height: 58.8px;"&gt;2 min&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 58.8px;"&gt;&lt;td style="height: 58.8px;"&gt;2&lt;/td&gt;&lt;td style="height: 58.8px;"&gt;Create a Service Health alert for your regions/services&lt;/td&gt;&lt;td style="height: 58.8px;"&gt;Get notified about platform issues proactively&lt;/td&gt;&lt;td style="height: 58.8px;"&gt;3 min&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 58.8px;"&gt;&lt;td style="height: 58.8px;"&gt;3&lt;/td&gt;&lt;td style="height: 58.8px;"&gt;Create Resource Health alerts for your VM resource groups&lt;/td&gt;&lt;td style="height: 58.8px;"&gt;Get notified when any VM changes health state&lt;/td&gt;&lt;td style="height: 58.8px;"&gt;3 min&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 34.8px;"&gt;&lt;td style="height: 34.8px;"&gt;4&lt;/td&gt;&lt;td style="height: 34.8px;"&gt;Review Azure Advisor Reliability tab&lt;/td&gt;&lt;td style="height: 34.8px;"&gt;Fix any posture gaps&lt;/td&gt;&lt;td style="height: 34.8px;"&gt;2 min&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3&gt;Advanced hardening (1+ hours depending on your app)&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-none" border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;#&lt;/th&gt;&lt;th&gt;What&lt;/th&gt;&lt;th&gt;Why&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;Deploy the Scheduled Events polling script on critical VMs&lt;/td&gt;&lt;td&gt;Get advance notice of live migrations and maintenance&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;6&lt;/td&gt;&lt;td&gt;Implement drain/checkpoint logic tied to Scheduled Events&lt;/td&gt;&lt;td&gt;Gracefully handle maintenance with zero user impact&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;7&lt;/td&gt;&lt;td&gt;Wire event approvals into your automation&lt;/td&gt;&lt;td&gt;Control the timing of when Azure proceeds with maintenance&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H2&gt;Wrapping up&lt;/H2&gt;
&lt;P&gt;The pattern I keep seeing is teams treating Azure VM monitoring as something they'll get to "later." Then an incident happens, the RCA takes longer than anyone wants, and everyone wishes they had visibility sooner.&lt;/P&gt;
&lt;P&gt;The tools are already there. Resource Health tells you what happened. Service Health and Resource Health alerts tell you when it's happening. Scheduled Events tells you before it happens. And Advisor helps you make sure your setup is resilient in the first place.&lt;/P&gt;
&lt;P&gt;Fifteen minutes of setup for the quick wins, and you're in a fundamentally better place than most teams running VMs on Azure today.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Apr 2026 15:49:46 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/startups-at-microsoft/your-azure-vm-went-down-and-nobody-knew-why-here-s-how-to-fix/ba-p/4513733</guid>
      <dc:creator>rmmartins</dc:creator>
      <dc:date>2026-04-22T15:49:46Z</dc:date>
    </item>
    <item>
      <title>$17,493 in Undisclosed Marketplace Charges with No Cost Visibility, No Recourse, No Accountability</title>
      <link>https://techcommunity.microsoft.com/t5/startups-at-microsoft/17-493-in-undisclosed-marketplace-charges-with-no-cost/m-p/4510844#M116</link>
      <description>&lt;P&gt;I'm a co-founder of a 13-person startup in the Microsoft for Startups Founders Hub program. I'm posting here because after two months of support tickets, calls, and emails across both Microsoft and Anthropic, I have been unable to get anyone with decision-making authority to address this issue. I'm hoping this reaches someone at Microsoft who can help, and that other affected founders in the program will share their experiences.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;What happened:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;In February 2026, we deployed Claude Opus 4.6 and Sonnet 4.6 through Azure AI Foundry as part of a migration of our AI infrastructure. We are active users of our Azure sponsorship credits and assumed, as anyone would, that these models were covered the same way Azure OpenAI models are. There was no indication otherwise during deployment.&lt;/P&gt;&lt;P&gt;In early March, we received our first invoice: $1,078.07 (invoice G144899694, billing period 02/01–02/28). We were shocked, but we paid it immediately and removed all Anthropic model deployments from our account to prevent further charges.&lt;/P&gt;&lt;P&gt;It didn't matter. On April 9, we received a second invoice: $16,414.94 (invoice G151890529, billing period 03/01–03/31). Despite removing the deployments in mid-March, charges had already accumulated for the first half of the month. We are unable to pay this invoice. Our total exposure across both invoices is $17,493.01.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Why we had no way to prevent this:&lt;/STRONG&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;No billing distinction at deployment. Azure AI Foundry presents all models (Microsoft-native and third-party) in the same unified interface. There is no warning, label, or confirmation step indicating that certain models are excluded from sponsorship credits.&lt;/LI&gt;&lt;LI&gt;No cost visibility whatsoever. 
The Azure AI Foundry monitoring dashboard has an "Estimated Cost" section that is completely blank for these models, with a disclaimer: "Cost monitoring is available for Foundry Models sold directly by Azure only." We could see token counts but had zero visibility into what we were being charged.&lt;/LI&gt;&lt;LI&gt;Token counts that don't explain the charges. The dashboards show our Claude Opus 4.6 deployment used 63.4M tokens and our Sonnet 4.6 deployments used roughly 170M tokens combined. At published rates, that should be in the low thousands, not $17,500. My analysis shows the dashboard hides billions of cached tokens (prompt caching reads and writes) that are invisible in the monitoring UI but account for the vast majority of the bill. There is no view in Azure that provides a breakdown of these charges by token type.&lt;/LI&gt;&lt;LI&gt;No alerts or notifications. There were no cost alerts, no threshold warnings, and no notifications at any point.&lt;/LI&gt;&lt;LI&gt;No indication in any Azure portal that charges were hitting our credit card. There was no line item, no pending charge, no Marketplace spend summary - nothing anywhere in the Azure ecosystem that showed dollars accumulating against our payment method for these deployments.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;STRONG&gt;What happened when we asked for help:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Azure Support (TrackingID#2603090040002936): After a month-long wait, a support engineer told us Microsoft cannot issue credits for Marketplace charges and directed us to Anthropic. The first version of the response email referenced "Azure DDoS Protection Standard" instead of our actual issue, which suggests a high volume of similar cases in the queue.&lt;/LI&gt;&lt;LI&gt;Anthropic: Their AI support bot responded within one minute with a blanket statement and closed the ticket four hours later. 
I escalated, but have still not received a response over a month later.&lt;/LI&gt;&lt;LI&gt;Microsoft for Startups Team (TrackingID#2604070040009778): Told us they cannot apply Marketplace charges against sponsorship credits and referred us to a Marketplace billing contact.&lt;/LI&gt;&lt;LI&gt;Azure Marketplace billing contact: Pending response.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;The pattern is clear&lt;/STRONG&gt;: Microsoft directs us to Anthropic. Anthropic directs us to Microsoft. The Microsoft for Startups team directs us to Azure Marketplace billing. No one takes responsibility.&lt;/P&gt;&lt;P&gt;What I'm asking for:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;A full refund of $17,493.01 across invoices G144899694 and G151890529&lt;/LI&gt;&lt;LI&gt;That Microsoft implement clear billing warnings in Azure AI Foundry before deploying models that are excluded from sponsorship credits.&lt;/LI&gt;&lt;LI&gt;That Microsoft provide actual cost visibility in the monitoring dashboard for all models deployed through AI Foundry, not just first-party models.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;STRONG&gt;To other founders in the program&lt;/STRONG&gt;:&lt;/P&gt;&lt;P&gt;If you have experienced this same issue, please reply to this thread. I know I'm not alone because this has been covered by The Register, InfoWorld, and Computerworld, and at least 20 founders have signed a Change.org petition about it. The more founders who come forward with specific amounts and case numbers, the harder this is to ignore.&lt;/P&gt;&lt;P&gt;We joined Microsoft for Startups because the program was supposed to help early-stage companies manage infrastructure costs during the most financially vulnerable period of our growth. Instead, the program's own platform generated charges within 2 weeks that exceed the total sponsorship credits we've consumed over the past year, with no visibility, no warning, and no path to resolution.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;With Further Inc. 
| Microsoft for Startups Founders Hub&lt;/P&gt;&lt;P&gt;Azure Support TrackingID #2603090040002936&lt;/P&gt;&lt;P&gt;Startup Support TrackingID #2604070040009778&lt;/P&gt;</description>
      <pubDate>Mon, 13 Apr 2026 14:01:05 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/startups-at-microsoft/17-493-in-undisclosed-marketplace-charges-with-no-cost/m-p/4510844#M116</guid>
      <dc:creator>chrisbaker2000</dc:creator>
      <dc:date>2026-04-13T14:01:05Z</dc:date>
    </item>
    <item>
      <title>Role Structures, Anti-Patterns, and the 10 Governance Principles</title>
      <link>https://techcommunity.microsoft.com/t5/startups-at-microsoft/role-structures-anti-patterns-and-the-10-governance-principles/ba-p/4510070</link>
      <description>&lt;P&gt;Part 3 of 3: The implementation playbook for engineering, finance, and security teams&lt;/P&gt;
&lt;P&gt;In&amp;nbsp;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/azure-has-three-permission-systems-and-youre-probably-confusing-them/4471854" target="_blank" rel="noopener" data-lia-auto-title="Part 1" data-lia-auto-title-active="0"&gt;Part 1&lt;/A&gt;, we established Azure's three-plane model: Entra for identity, RBAC for resources, Commerce for billing. In&amp;nbsp;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/marketplace-governance-and-the-cross-plane-bridge/4510067" target="_blank" rel="noopener" data-lia-auto-title="Part 2" data-lia-auto-title-active="0"&gt;Part 2&lt;/A&gt;, we explored where those planes collide: Marketplace governance, Managed Identity, and ABAC.&lt;/P&gt;
&lt;P&gt;Now it's time to get practical. This post covers the patterns that work, the anti-patterns that don't, and the governance principles that every digital-native company should adopt&amp;nbsp;&lt;EM&gt;before&lt;/EM&gt;&amp;nbsp;they're forced to adopt them after an incident.&lt;/P&gt;
&lt;H2&gt;7 anti-patterns to avoid&lt;/H2&gt;
&lt;P&gt;These seven anti-patterns appear repeatedly across AI, SaaS, and digital-native customers. Every one of them has caused real incidents — surprise invoices, accidental deletions, compliance failures, or governance breakdowns.&lt;/P&gt;
&lt;H3&gt;❌ Anti-Pattern 1: Giving engineers billing permissions&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;What happens:&lt;/STRONG&gt; Engineers are given Billing Reader or Billing Contributor roles "so they can see costs." They can now see MACC credits, private offer terms, commercial discounts, and Marketplace purchase history, none of which they need.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Symptoms:&lt;/STRONG&gt;&amp;nbsp;Engineers purchasing Marketplace SaaS without oversight. Surprise invoices. Procurement loses visibility into vendor commitments.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Fix:&lt;/STRONG&gt;&amp;nbsp;Engineers need&amp;nbsp;&lt;STRONG&gt;Cost Management Reader&lt;/STRONG&gt;&amp;nbsp;(RBAC) for usage-based cost visibility. They do&amp;nbsp;&lt;EM&gt;not&lt;/EM&gt; need billing roles. If they need to understand MACC impact, create a reporting process; don't give them the keys.&lt;/P&gt;
&lt;H3&gt;❌ Anti-Pattern 2: Giving finance subscription owner access&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;What happens:&lt;/STRONG&gt;&amp;nbsp;Finance teams are given Owner or Contributor roles on subscriptions "so they can track spending." They now have the ability to deploy, modify, and delete production resources.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Symptoms:&lt;/STRONG&gt; Massive over-permissioning. Finance can accidentally delete production resources. Audit risk: regulators will flag this.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Fix:&lt;/STRONG&gt;&amp;nbsp;Finance roles belong in the&amp;nbsp;&lt;STRONG&gt;Billing plane&lt;/STRONG&gt;, not the resource plane. Give finance&amp;nbsp;&lt;STRONG&gt;Billing Reader&lt;/STRONG&gt;&amp;nbsp;for credit and invoice visibility. If they also need resource cost data, add&amp;nbsp;&lt;STRONG&gt;Cost Management Reader&lt;/STRONG&gt;&amp;nbsp;(RBAC) scoped to the appropriate subscriptions — that's a read-only, resource-plane role.&lt;/P&gt;
&lt;H3&gt;❌ Anti-Pattern 3: Too many subscription owners&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;What happens:&lt;/STRONG&gt;&amp;nbsp;Every senior engineer, team lead, and sometimes product managers get Owner on subscriptions. The logic: "they need to unblock themselves."&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Symptoms:&lt;/STRONG&gt; No accountability: when everyone is Owner, nobody is. High blast radius. Hard to trace role assignments when troubleshooting.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Fix:&lt;/STRONG&gt;&amp;nbsp;Maximum&amp;nbsp;&lt;STRONG&gt;2–3 Owners&lt;/STRONG&gt;&amp;nbsp;per subscription: Platform Lead, SRE Lead, and optionally the Cloud Architect. Everyone else gets Contributor or scoped roles. Use PIM for emergency elevation.&lt;/P&gt;
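&lt;P&gt;One way to spot this anti-pattern is to count Owner assignments per subscription. The sketch below is illustrative only: it assumes role assignments have already been exported into a list of dictionaries, and the keys (subscription, role, principal) and the function name find_over_owned are hypothetical, not part of any Azure API.&lt;/P&gt;

```python
from collections import defaultdict

MAX_OWNERS = 3  # upper bound suggested in the fix above


def find_over_owned(assignments, max_owners=MAX_OWNERS):
    """Return a dict of subscription to sorted Owner principals, for
    subscriptions that exceed the limit.

    `assignments` is a hypothetical export: dicts with 'subscription',
    'role', and 'principal' keys. Adapt the keys to your real export.
    """
    owners = defaultdict(set)
    for assignment in assignments:
        if assignment["role"] == "Owner":
            owners[assignment["subscription"]].add(assignment["principal"])
    return {
        sub: sorted(principals)
        for sub, principals in owners.items()
        if len(principals) > max_owners
    }


# Made-up sample data: prod has four Owners, dev has two.
sample = [
    {"subscription": "prod", "role": "Owner", "principal": "platform-lead"},
    {"subscription": "prod", "role": "Owner", "principal": "sre-lead"},
    {"subscription": "prod", "role": "Owner", "principal": "cloud-architect"},
    {"subscription": "prod", "role": "Owner", "principal": "senior-dev"},
    {"subscription": "prod", "role": "Contributor", "principal": "dev-1"},
    {"subscription": "dev", "role": "Owner", "principal": "platform-lead"},
    {"subscription": "dev", "role": "Owner", "principal": "sre-lead"},
]

print(find_over_owned(sample))
```

&lt;P&gt;Only subscriptions above the limit are returned, so an empty result means every subscription is within the 2–3 Owner guideline.&lt;/P&gt;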
&lt;H3&gt;❌ Anti-Pattern 4: Believing Entra Global Admin = Azure Owner&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;What happens:&lt;/STRONG&gt; Leadership assumes Global Admin has universal access: subscriptions, resources, billing. It doesn't. By default, Global Admin controls the identity plane only.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Symptoms:&lt;/STRONG&gt;&amp;nbsp;Security teams thinking they can see all resources (they can't). Incorrect governance designs that assume Entra = RBAC.&lt;/P&gt;
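&lt;P&gt;&lt;STRONG&gt;Fix:&lt;/STRONG&gt;&amp;nbsp;Train leadership explicitly:&amp;nbsp;&lt;STRONG&gt;Entra ≠ RBAC ≠ Billing&lt;/STRONG&gt;. Three planes, three sets of roles, no implicit overlap. A Global Admin has no Azure resource access by default and must be separately granted RBAC roles. The one exception is the explicit "elevate access" setting, which grants User Access Administrator at root scope and should be used sparingly and audited.&lt;/P&gt;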
&lt;H3&gt;❌ Anti-Pattern 5: Deploying Marketplace SaaS without finance&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;What happens:&lt;/STRONG&gt;&amp;nbsp;Engineers purchase Marketplace tools directly because they have billing permissions (see Anti-Pattern 1) or because the org hasn't restricted Marketplace purchases.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Symptoms:&lt;/STRONG&gt;&amp;nbsp;Incorrect MACC burn. Licensing duplicates. Vendor lock-in without legal review. Private offer terms not applied.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Fix:&lt;/STRONG&gt;&amp;nbsp;Require finance approval for all paid Marketplace purchases. Follow the five-step workflow from&amp;nbsp;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/marketplace-governance-and-the-cross-plane-bridge/4510067" target="_blank" rel="noopener" data-lia-auto-title="Part 2" data-lia-auto-title-active="0"&gt;Part 2&lt;/A&gt;: Engineer requests → Finance reviews → Billing executes → Engineering deploys → Cost monitoring activated.&lt;/P&gt;
&lt;H3&gt;❌ Anti-Pattern 6: Mixed dev/test/prod in one subscription&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;What happens:&lt;/STRONG&gt;&amp;nbsp;To save time, teams put all environments in one subscription.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Symptoms:&lt;/STRONG&gt;&amp;nbsp;Can't isolate production costs. A Contributor on the sub can modify both dev and prod. Can't enforce stricter policies on prod without affecting dev. Compliance teams can't get clean boundaries.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Fix:&lt;/STRONG&gt;&amp;nbsp;Separate subscriptions by environment. Pattern:&amp;nbsp;&lt;STRONG&gt;1 subscription per environment per workload&lt;/STRONG&gt;&amp;nbsp;(or at minimum per environment). Use cross-subscription networking via Hub &amp;amp; Spoke or Landing Zones.&lt;/P&gt;
&lt;H3&gt;❌ Anti-Pattern 7: Not using Azure Policy&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;What happens:&lt;/STRONG&gt;&amp;nbsp;Teams deploy freely with no guardrails. Over time: VMs in unapproved regions, GPU SKUs in non-production, storage accounts without encryption, missing tags, public IP drift.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Symptoms:&lt;/STRONG&gt;&amp;nbsp;Inconsistent regions. Wrong VM families. Missing tags make cost attribution impossible. Non-compliant configurations.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Fix:&lt;/STRONG&gt; Adopt Azure Policy early, at Management Group scope. Critical policies: allowed locations, allowed VM SKUs, enforce HTTPS, enforce private endpoints, enforce tagging (environment, owner, cost-center).&lt;/P&gt;
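&lt;P&gt;To make two of those critical policies concrete, here is a minimal sketch that builds policy rule bodies as Python dictionaries and prints one as JSON. It assumes the standard if/then shape of an Azure Policy rule and a parameter named allowedLocations; the helper names and the rule keys are hypothetical, and the output should be checked against the Azure Policy definition structure documentation before it is assigned anywhere.&lt;/P&gt;

```python
import json

# Tags named in the fix above.
REQUIRED_TAGS = ["environment", "owner", "cost-center"]


def allowed_locations_rule():
    """Deny any resource whose location is not in the allowedLocations parameter."""
    return {
        "if": {"not": {"field": "location", "in": "[parameters('allowedLocations')]"}},
        "then": {"effect": "deny"},
    }


def required_tag_rule(tag_name):
    """Deny any resource that is missing the given tag."""
    return {
        "if": {"field": f"tags['{tag_name}']", "exists": "false"},
        "then": {"effect": "deny"},
    }


# Hypothetical rule names, used here only as dictionary keys.
rules = {"allowed-locations": allowed_locations_rule()}
for tag in REQUIRED_TAGS:
    rules[f"require-tag-{tag}"] = required_tag_rule(tag)

print(json.dumps(rules["require-tag-owner"], indent=2))
```

&lt;P&gt;A common rollout approach is to start with the audit effect instead of deny, review the compliance results, and switch to deny once existing resources are cleaned up.&lt;/P&gt;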
&lt;H2&gt;Recommended role structure&lt;/H2&gt;
&lt;P&gt;Based on experience with dozens of digital-native customers, here's the role structure that works across the three planes.&lt;/P&gt;
&lt;H3&gt;Engineering plane (RBAC)&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;2–3 subscription Owners:&amp;nbsp;&lt;/STRONG&gt;Platform Lead, SRE Lead, Cloud Architect&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Platform/SRE team&lt;/STRONG&gt;&amp;nbsp;as&amp;nbsp;&lt;STRONG&gt;Contributors:&amp;nbsp;&lt;/STRONG&gt;deploy and manage infrastructure&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Developers&lt;/STRONG&gt;&amp;nbsp;as&amp;nbsp;&lt;STRONG&gt;RG-scoped Contributors or Readers:&amp;nbsp;&lt;/STRONG&gt;limited to their workload's resource group&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Cost Management Reader&lt;/STRONG&gt; for budget owners: usage visibility without deployment rights&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Azure Policy&lt;/STRONG&gt; for guardrails: VM SKUs, regions, encryption, tags&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Management Groups&lt;/STRONG&gt;&amp;nbsp;for organizational structure&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Finance plane (Commerce)&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Billing Account Owner&lt;/STRONG&gt;&amp;nbsp;= CFO or Finance Director&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Billing Contributor&lt;/STRONG&gt;&amp;nbsp;= Finance Operations&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Billing Reader&lt;/STRONG&gt;&amp;nbsp;= FP&amp;amp;A and financial analysts&lt;/LI&gt;
&lt;LI&gt;All Marketplace-paid offers require finance approval&lt;/LI&gt;
&lt;LI&gt;MACC visibility restricted to finance roles&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Identity/Security plane (Entra)&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;2–4 Global Admins&lt;/STRONG&gt;&amp;nbsp;(break-glass accounts included)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;PIM enforced&lt;/STRONG&gt; for all privileged roles; no permanent admin access&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Conditional Access&lt;/STRONG&gt;&amp;nbsp;for all admin roles (MFA, compliant device, block legacy auth)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Groups&lt;/STRONG&gt; used for RBAC assignment; never assign RBAC roles to individual users&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Workload identities&lt;/STRONG&gt;&amp;nbsp;(Managed Identity) preferred over service principals&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Role mapping templates&lt;/H2&gt;
&lt;P&gt;Copy these into your onboarding documentation.&lt;/P&gt;
&lt;H3&gt;Engineering Team&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-none" border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Role&lt;/th&gt;&lt;th&gt;Azure Role&lt;/th&gt;&lt;th&gt;Plane&lt;/th&gt;&lt;th&gt;Allowed actions&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Cloud Architect&lt;/td&gt;&lt;td&gt;Owner (2–3 per sub)&lt;/td&gt;&lt;td&gt;RBAC&lt;/td&gt;&lt;td&gt;Govern workloads, assign roles, manage infrastructure&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Platform / SRE&lt;/td&gt;&lt;td&gt;Contributor&lt;/td&gt;&lt;td&gt;RBAC&lt;/td&gt;&lt;td&gt;Deploy and manage infrastructure&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Developer&lt;/td&gt;&lt;td&gt;Contributor or Reader (RG-scoped)&lt;/td&gt;&lt;td&gt;RBAC&lt;/td&gt;&lt;td&gt;Deploy to (Contributor) or view (Reader) specific resource groups&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Budget Owner&lt;/td&gt;&lt;td&gt;Cost Management Reader&lt;/td&gt;&lt;td&gt;RBAC&lt;/td&gt;&lt;td&gt;View usage-based cost and budgets (editing budgets requires Cost Management Contributor); no billing access&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3&gt;Finance Team&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-none" border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Role&lt;/th&gt;&lt;th&gt;Azure Role&lt;/th&gt;&lt;th&gt;Plane&lt;/th&gt;&lt;th&gt;Allowed actions&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Finance Lead&lt;/td&gt;&lt;td&gt;Billing Account Owner&lt;/td&gt;&lt;td&gt;Billing&lt;/td&gt;&lt;td&gt;View and manage credits, invoices, MACC, payment methods&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Finance Analyst&lt;/td&gt;&lt;td&gt;Billing Reader&lt;/td&gt;&lt;td&gt;Billing&lt;/td&gt;&lt;td&gt;Read-only billing visibility&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;FP&amp;amp;A&lt;/td&gt;&lt;td&gt;Billing Reader&lt;/td&gt;&lt;td&gt;Billing&lt;/td&gt;&lt;td&gt;Read-only; no deployments, no resource access&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3&gt;Leadership&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-none" border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Role&lt;/th&gt;&lt;th&gt;Azure Role&lt;/th&gt;&lt;th&gt;Plane&lt;/th&gt;&lt;th&gt;Allowed actions&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;CTO / VP Engineering&lt;/td&gt;&lt;td&gt;Reader or Cost Mgmt Reader&lt;/td&gt;&lt;td&gt;RBAC&lt;/td&gt;&lt;td&gt;Visibility into platform and resource costs&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;CFO&lt;/td&gt;&lt;td&gt;Billing Reader&lt;/td&gt;&lt;td&gt;Billing&lt;/td&gt;&lt;td&gt;Visibility into credits, invoices, MACC, commitments&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H2&gt;RACI Matrix&lt;/H2&gt;
&lt;P&gt;Adapted from the Microsoft&amp;nbsp;&lt;A href="https://learn.microsoft.com/azure/cloud-adoption-framework/organize/raci-alignment" target="_blank" rel="noopener"&gt;Cloud Adoption Framework&lt;/A&gt;.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-none" border="1" style="width: 72.1296%; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Function&lt;/th&gt;&lt;th&gt;Accountable&lt;/th&gt;&lt;th&gt;Responsible&lt;/th&gt;&lt;th&gt;Consulted&lt;/th&gt;&lt;th&gt;Informed&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Billing account roles &amp;amp; access&lt;/td&gt;&lt;td&gt;Finance Lead&lt;/td&gt;&lt;td&gt;Finance Ops&lt;/td&gt;&lt;td&gt;Cloud Architect&lt;/td&gt;&lt;td&gt;Engineering&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Subscription role assignments&lt;/td&gt;&lt;td&gt;Cloud Architect&lt;/td&gt;&lt;td&gt;Platform / SRE&lt;/td&gt;&lt;td&gt;Finance, Security&lt;/td&gt;&lt;td&gt;Engineering&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Cost monitoring &amp;amp; budgets&lt;/td&gt;&lt;td&gt;Finance&lt;/td&gt;&lt;td&gt;Engineering&lt;/td&gt;&lt;td&gt;Leadership&lt;/td&gt;&lt;td&gt;All teams&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Marketplace purchases&lt;/td&gt;&lt;td&gt;Finance Lead&lt;/td&gt;&lt;td&gt;Finance Ops&lt;/td&gt;&lt;td&gt;Engineering, Legal&lt;/td&gt;&lt;td&gt;CFO&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;IaC / Deployment governance&lt;/td&gt;&lt;td&gt;Platform Lead&lt;/td&gt;&lt;td&gt;Engineers&lt;/td&gt;&lt;td&gt;Security&lt;/td&gt;&lt;td&gt;Finance&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Policies &amp;amp; guardrails&lt;/td&gt;&lt;td&gt;Security / Cloud Architect&lt;/td&gt;&lt;td&gt;Platform Team&lt;/td&gt;&lt;td&gt;Engineering&lt;/td&gt;&lt;td&gt;Leadership&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Identity &amp;amp; access governance&lt;/td&gt;&lt;td&gt;Security Lead&lt;/td&gt;&lt;td&gt;Identity Admin&lt;/td&gt;&lt;td&gt;Cloud Architect&lt;/td&gt;&lt;td&gt;All teams&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;PIM &amp;amp; Conditional Access&lt;/td&gt;&lt;td&gt;Security Lead&lt;/td&gt;&lt;td&gt;Identity Admin&lt;/td&gt;&lt;td&gt;Platform 
Lead&lt;/td&gt;&lt;td&gt;Engineering&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;MACC tracking &amp;amp; credit visibility&lt;/td&gt;&lt;td&gt;Finance Lead&lt;/td&gt;&lt;td&gt;Finance Ops&lt;/td&gt;&lt;td&gt;Cloud Architect&lt;/td&gt;&lt;td&gt;Leadership&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Include this template in your onboarding documentation and review it quarterly.&lt;/P&gt;
&lt;H2&gt;Best Practices&lt;/H2&gt;
&lt;H3&gt;Use Entra groups for RBAC assignment; never assign roles directly to users&lt;/H3&gt;
&lt;P&gt;Benefits: clear separation of identity and resource planes, easy onboarding/offboarding, predictable RBAC inheritance, enables PIM for group-based elevation.&lt;/P&gt;
&lt;P&gt;Naming pattern:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;grp-sub-&amp;lt;SubscriptionName&amp;gt;-Owner&lt;/LI&gt;
&lt;LI&gt;grp-sub-&amp;lt;SubscriptionName&amp;gt;-Contributor&lt;/LI&gt;
&lt;LI&gt;grp-rg-&amp;lt;WorkloadName&amp;gt;-Reader&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Assign the&amp;nbsp;&lt;STRONG&gt;group&lt;/STRONG&gt;&amp;nbsp;to the role, not individual users.&lt;/P&gt;
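&lt;P&gt;If group names are generated by onboarding scripts, a small helper keeps them consistent with the naming pattern above. This is a sketch under stated assumptions: the function name group_name, the allowed scopes and roles, and the rule of stripping everything except letters and digits are illustrative choices, not an Entra requirement.&lt;/P&gt;

```python
import re

VALID_SCOPES = {"sub", "rg"}                      # subscription or resource group
VALID_ROLES = {"Owner", "Contributor", "Reader"}  # roles used in the pattern above


def group_name(scope, target, role):
    """Build a group name such as 'grp-sub-Prod-Owner'."""
    if scope not in VALID_SCOPES:
        raise ValueError(f"unknown scope: {scope}")
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role}")
    # Keep letters and digits only so the name stays easy to split on hyphens.
    cleaned = re.sub(r"[^A-Za-z0-9]", "", target)
    if not cleaned:
        raise ValueError("target must contain at least one letter or digit")
    return f"grp-{scope}-{cleaned}-{role}"


print(group_name("sub", "Prod", "Owner"))
print(group_name("rg", "Checkout API", "Reader"))
```

&lt;P&gt;Rejecting unknown scopes and roles up front means a typo fails at onboarding time rather than producing a group that silently falls outside the convention.&lt;/P&gt;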
&lt;H3&gt;Enforce PIM + Conditional Access for all privileged roles&lt;/H3&gt;
&lt;P&gt;Key CA policies: MFA required for all admins, compliant device requirement, block legacy authentication, block sign-in from high-risk locations, require phishing-resistant MFA.&lt;/P&gt;
&lt;P&gt;No permanent admin access. Use time-based elevation for every privileged operation.&lt;/P&gt;
&lt;H3&gt;Separate subscriptions by environment and workload&lt;/H3&gt;
&lt;P&gt;Subscriptions are a security boundary. Pattern: 1 subscription per environment per workload. Platform teams get their own subscription. Use Hub &amp;amp; Spoke or Landing Zones for cross-subscription networking.&lt;/P&gt;
&lt;H3&gt;Keep billing data confidential&lt;/H3&gt;
&lt;P&gt;Only Billing roles should see credits, commitments, discounts, invoices, and MACC balance. Engineers should never have access to commercial data.&lt;/P&gt;
&lt;H2&gt;The 10 Principles of Azure Governance&lt;/H2&gt;
&lt;P&gt;After working with digital natives across AI, SaaS, and infrastructure companies, I can summarize Azure governance into these principles:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-none" border="1" style="width: 100%; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;#&lt;/th&gt;&lt;th&gt;Principle&lt;/th&gt;&lt;th&gt;Summary&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;1&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Separate identity, resources, and billing. Always.&lt;/td&gt;&lt;td&gt;Never mix roles across planes. An engineer should never hold billing roles. A finance analyst should never hold subscription Owner.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;2&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Engineering owns the resource plane.&lt;/td&gt;&lt;td&gt;Give them Contributor and Cost Management Reader. Don't burden them with billing or identity administration.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;3&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Finance owns the billing plane.&lt;/td&gt;&lt;td&gt;Credits, MACC, invoices, private offers. Every Marketplace purchase flows through Finance.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;4&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Security owns identity and governance.&lt;/td&gt;&lt;td&gt;PIM, Conditional Access, Azure Policy. Identity decisions should not be made by engineering or finance.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;5&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Keep subscription Owners scarce.&lt;/td&gt;&lt;td&gt;Maximum 2–3 per subscription. Use PIM for emergency elevation. Everyone else gets Contributor or scoped roles.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;6&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Lock down Marketplace.&lt;/td&gt;&lt;td&gt;Every SaaS purchase approved by Finance. No exceptions. 
Use the five-step workflow.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;7&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Use Infrastructure as Code.&lt;/td&gt;&lt;td&gt;Manual deployments don't scale and can't be audited. Use Bicep, Terraform, or Pulumi.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;8&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Use budgets early.&lt;/td&gt;&lt;td&gt;Set budgets at Management Group, Subscription, and Resource Group levels. Configure alerts to email, Teams, or automation.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;9&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Use Management Groups from day one.&lt;/td&gt;&lt;td&gt;Every startup that scales beyond a single subscription regrets not using them. Recommended hierarchy: Tenant Root → OrgName → Platform / Production / NonProduction / Sandbox / Shared Services.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;10&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Build governance before scale.&lt;/td&gt;&lt;td&gt;The companies that scale successfully treat Azure governance as infrastructure, not bureaucracy.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 5.46701%" /&gt;&lt;col style="width: 34.5824%" /&gt;&lt;col style="width: 59.932%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H2&gt;References&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/role-based-access-control/overview" target="_blank" rel="noopener"&gt;Azure RBAC Overview&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/role-based-access-control/rbac-and-directory-admin-roles" target="_blank" rel="noopener"&gt;Entra Directory &amp;amp; Admin Roles&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/cost-management-billing/manage/understand-mca-roles" target="_blank" rel="noopener"&gt;Billing Roles (Microsoft Customer Agreement)&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/cost-management-billing/costs/assign-access-acm-data" target="_blank" rel="noopener"&gt;Assign Access to Cost Management Data&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/marketplace/azure-purchasing-invoicing" target="_blank" rel="noopener"&gt;Marketplace Purchases &amp;amp; Invoicing&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/marketplace/private-offers" target="_blank" rel="noopener"&gt;Private Offers in Azure Marketplace&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/role-based-access-control/conditions-overview" target="_blank" rel="noopener"&gt;Azure RBAC Conditions (ABAC)&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/governance/policy/overview" target="_blank" rel="noopener"&gt;Azure Policy Overview&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/cloud-adoption-framework/organize/raci-alignment" target="_blank" rel="noopener"&gt;Cloud Adoption Framework RACI&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/active-directory/managed-identities-azure-resources/overview" target="_blank" rel="noopener"&gt;Managed Identities Overview&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/aks/workload-identity-overview" target="_blank" rel="noopener"&gt;AKS Workload Identity&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Closing thoughts&lt;/H2&gt;
&lt;P&gt;Azure's three permission planes aren't a problem to solve; they're a framework to leverage.&lt;/P&gt;
&lt;P&gt;The confusion happens when teams try to treat Azure as if it has a single permission system. It doesn't, and it never will, because identity, billing, and resource deployment are fundamentally different domains that must be operated and secured differently.&lt;/P&gt;
&lt;P&gt;But when organizations understand these three planes and structure their roles accordingly, something powerful happens:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Engineering moves faster.&lt;/STRONG&gt;&amp;nbsp;Clear RBAC scopes mean teams deploy without waiting for approvals they don't need.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Finance gains real oversight.&lt;/STRONG&gt;&amp;nbsp;Billing roles provide full commercial visibility without the risk of touching production resources.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Security gets a clean, enforceable boundary model.&lt;/STRONG&gt;&amp;nbsp;Entra controls identity; PIM and Conditional Access control elevation; Azure Policy controls the guardrails.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Leadership sees clarity instead of chaos.&lt;/STRONG&gt;&amp;nbsp;The right roles in the right planes mean dashboards, reports, and alerts actually reflect what each stakeholder needs.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Good governance doesn't slow down innovation.&amp;nbsp;&lt;STRONG&gt;Bad governance does.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The companies that scale successfully, whether AI-native, SaaS platforms, or global digital-first organizations, are the ones that adopt a clean, intentional model early. They treat Azure governance as infrastructure, not bureaucracy.&lt;/P&gt;
&lt;P&gt;The model is simple:&amp;nbsp;&lt;STRONG&gt;Entra for who. RBAC for what. Commerce for how you pay.&lt;/STRONG&gt;&amp;nbsp;Start with that, and everything else becomes easier.&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;This concludes the 3-part series on Azure Governance for Digital Natives. For the full model, start with&amp;nbsp;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/azure-has-three-permission-systems-and-youre-probably-confusing-them/4471854" target="_blank" rel="noopener" data-lia-auto-title="Part 1: The Three Permission Planes" data-lia-auto-title-active="0"&gt;Part 1: The Three Permission Planes&lt;/A&gt;. For collision points and Managed Identity, read&amp;nbsp;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/marketplace-governance-and-the-cross-plane-bridge/4510067" target="_blank" rel="noopener" data-lia-auto-title="Part 2: Marketplace Governance and the Cross-Plane Bridge" data-lia-auto-title-active="0"&gt;Part 2: Marketplace Governance and the Cross-Plane Bridge&lt;/A&gt;.&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 09 Apr 2026 21:25:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/startups-at-microsoft/role-structures-anti-patterns-and-the-10-governance-principles/ba-p/4510070</guid>
      <dc:creator>rmmartins</dc:creator>
      <dc:date>2026-04-09T21:25:00Z</dc:date>
    </item>
    <item>
      <title>Marketplace governance and the cross-plane bridge</title>
      <link>https://techcommunity.microsoft.com/t5/startups-at-microsoft/marketplace-governance-and-the-cross-plane-bridge/ba-p/4510067</link>
      <description>&lt;P&gt;&lt;EM&gt;Part 2 of 3: Where resource deployment meets financial authority and how to govern it&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;In&amp;nbsp;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/azure-has-three-permission-systems-and-youre-probably-confusing-them/4471854" target="_blank" rel="noopener" data-lia-auto-title="Part 1" data-lia-auto-title-active="0"&gt;Part 1&lt;/A&gt;, we established the foundational model: Azure operates on three completely separate permission planes, &lt;STRONG&gt;Entra&lt;/STRONG&gt;&amp;nbsp;(identity),&amp;nbsp;&lt;STRONG&gt;RBAC&lt;/STRONG&gt;&amp;nbsp;(resources), and&amp;nbsp;&lt;STRONG&gt;Commerce&lt;/STRONG&gt;&amp;nbsp;(billing). A role in one plane grants zero access in the others.&lt;/P&gt;
&lt;P&gt;That model is clean in theory. But in practice, the planes collide. And when they do, teams get confused, purchases stall, and governance gaps appear.&lt;/P&gt;
&lt;P&gt;This post covers the biggest collision point:&amp;nbsp;&lt;STRONG&gt;Marketplace,&amp;nbsp;&lt;/STRONG&gt;where resource deployment meets financial authority. We'll also dig into&amp;nbsp;&lt;STRONG&gt;Managed Identity&lt;/STRONG&gt;&amp;nbsp;(the one construct that genuinely bridges two planes),&amp;nbsp;&lt;STRONG&gt;ABAC&lt;/STRONG&gt;&amp;nbsp;(advanced conditional governance within the resource plane), and the five-step Marketplace approval workflow every digital-native company should adopt.&lt;/P&gt;
&lt;H2&gt;Marketplace: Where the resource and billing planes intersect&lt;/H2&gt;
&lt;P&gt;Marketplace is the most common collision point between Azure's permission planes. Here's why: deploying an Azure resource and purchasing a Marketplace SaaS product feel like the same action from the Portal, but they are governed by completely different permission systems.&lt;/P&gt;
&lt;H3&gt;Deploying resources ≠ Purchasing SaaS&lt;/H3&gt;
&lt;P&gt;A Contributor can deploy any native Azure resource: VMs, Storage, AKS, Networking, Databases, Azure OpenAI. These are&amp;nbsp;&lt;STRONG&gt;resource plane&lt;/STRONG&gt;&amp;nbsp;operations governed by RBAC.&lt;/P&gt;
&lt;P&gt;But purchasing a third-party SaaS product through Marketplace (Datadog, Snowflake, Elastic, Confluent, MongoDB Atlas) is a&amp;nbsp;&lt;STRONG&gt;commercial transaction&lt;/STRONG&gt;. It creates a financial obligation between your organization and a vendor. That's the billing plane.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Deploying&lt;/STRONG&gt;&amp;nbsp;→ RBAC (Resource Plane)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Purchasing&lt;/STRONG&gt;&amp;nbsp;→ Commerce (Financial Plane)&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;The marketplace permission model&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-none" border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Action&lt;/th&gt;&lt;th&gt;Requires RBAC?&lt;/th&gt;&lt;th&gt;Requires billing role?&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Deploy a VM&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Deploy AKS cluster&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Deploy Azure OpenAI&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Deploy Datadog agent extension&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Deploy Confluent cluster (Azure-native)&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Purchase Datadog SaaS plan&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Purchase Snowflake SaaS&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Accept Confluent SaaS contract&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;View Snowflake private offer&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Approve Marketplace private offer&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;This is why engineers often ask:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;"Why can't I buy Snowflake? I'm an Owner."&lt;BR /&gt;&lt;BR /&gt;Because&amp;nbsp;&lt;STRONG&gt;Owner&lt;/STRONG&gt;&amp;nbsp;has no financial authority. Owner is the highest role in the resource plane, but Marketplace SaaS purchases are commercial transactions that require billing plane permissions. These are different systems.&lt;/EM&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
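&lt;P&gt;As a rough illustration of the table above, the Python sketch below models the two planes as separate lookups. The action names, the can_perform helper, and the exact billing role strings are assumptions made for this example; it is not how Azure evaluates permissions internally.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# Hypothetical model of the Marketplace permission table: each action
# belongs to exactly one plane, and only a role from that plane counts.
RESOURCE_PLANE = "rbac"
BILLING_PLANE = "commerce"

ACTION_PLANE = {
    "deploy_vm": RESOURCE_PLANE,
    "deploy_datadog_agent_extension": RESOURCE_PLANE,
    "purchase_datadog_saas_plan": BILLING_PLANE,
    "approve_private_offer": BILLING_PLANE,
}

# Assumed role names for illustration only.
DEPLOY_ROLES = ("Owner", "Contributor")
PURCHASE_ROLES = ("Billing Account Owner", "Billing Account Contributor")


def can_perform(action, rbac_roles, billing_roles):
    """Return True if the caller holds a role in the plane that governs the action."""
    if ACTION_PLANE[action] == RESOURCE_PLANE:
        return any(role in DEPLOY_ROLES for role in rbac_roles)
    # Billing plane: RBAC roles are never consulted here.
    return any(role in PURCHASE_ROLES for role in billing_roles)


engineer = {"rbac_roles": ["Owner"], "billing_roles": []}
print(can_perform("deploy_vm", **engineer))                   # True
print(can_perform("purchase_datadog_saas_plan", **engineer))  # False&lt;/LI-CODE&gt;
&lt;P&gt;The second call returns False even though the caller is an Owner, which is the situation described in the quote above.&lt;/P&gt;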
&lt;H3&gt;The subtlety: Azure-Native vs. SaaS&lt;/H3&gt;
&lt;P&gt;Some vendors have both Azure-native integrations and SaaS offerings, which makes this even more confusing:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Datadog agent extension:&amp;nbsp;&lt;/STRONG&gt;deploys as an Azure resource → RBAC ✅&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Datadog SaaS plan:&amp;nbsp;&lt;/STRONG&gt;creates a billing relationship → Commerce ✅&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Confluent for Azure:&amp;nbsp;&lt;/STRONG&gt;deploys Kafka as an Azure resource → RBAC ✅&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Confluent Cloud SaaS contract:&amp;nbsp;&lt;/STRONG&gt;financial commitment → Commerce ✅&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;When an engineer deploys a Datadog agent via the Portal, everything works. When they try to subscribe to the Datadog SaaS plan, they hit a wall. Same vendor, same Portal, different permission plane.&lt;/P&gt;
&lt;H2&gt;The five-step marketplace purchase workflow&lt;/H2&gt;
&lt;P&gt;For digital natives operating with financial governance, every Marketplace purchase should follow this workflow:&lt;/P&gt;
&lt;H3&gt;Step 1: Engineer requests a SaaS or marketplace resource&lt;/H3&gt;
&lt;P&gt;The request should include: why it's needed, expected cost, impact on MACC, preferred vendor, and alternatives considered.&lt;/P&gt;
&lt;H3&gt;Step 2: Finance reviews commercial implications&lt;/H3&gt;
&lt;P&gt;Finance checks: MACC impact (does this purchase count toward the commitment?), budget alignment, available discounts (private offers), vendor validation, and contract terms.&lt;/P&gt;
&lt;H3&gt;Step 3: Billing role executes the purchase&lt;/H3&gt;
&lt;P&gt;Billing Account Owner or Contributor completes the transaction in the Portal. This is a billing plane operation.&lt;/P&gt;
&lt;H3&gt;Step 4: Engineering deploys or configures the resource&lt;/H3&gt;
&lt;P&gt;SaaS connector setup, private offer entitlement, RBAC for workload integration, data pipelines and integration. This is a resource plane operation.&lt;/P&gt;
&lt;H3&gt;Step 5: Cost monitoring activated&lt;/H3&gt;
&lt;P&gt;Alerts configured, budgets set, tagging applied, forecasting enabled.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;This five-step workflow is simple, but most digital natives skip it&lt;/STRONG&gt; and end up with surprise invoices, unapproved vendor commitments, or MACC burn they didn't plan for.&lt;/EM&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
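&lt;P&gt;One way to keep the workflow from being skipped is to encode the order explicitly. The sketch below is a minimal, hypothetical example: the step names and the MarketplaceRequest class are invented for illustration and do not correspond to any Azure API.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# Illustrative only: the five steps above as an ordered sequence.
STEPS = [
    "requested",         # Step 1: engineer files the request
    "finance_reviewed",  # Step 2: finance checks MACC impact, budget, terms
    "purchased",         # Step 3: a billing role executes the purchase
    "deployed",          # Step 4: engineering deploys or configures
    "monitored",         # Step 5: alerts, budgets, and tags are in place
]


class MarketplaceRequest:
    def __init__(self, product):
        self.product = product
        self.completed = []

    def advance(self, step):
        """Record a step, refusing to skip ahead of the expected order."""
        if len(self.completed) == len(STEPS):
            raise ValueError("workflow already complete")
        expected = STEPS[len(self.completed)]
        if step != expected:
            raise ValueError("expected step " + repr(expected) + ", got " + repr(step))
        self.completed.append(step)

    def is_done(self):
        return self.completed == STEPS


req = MarketplaceRequest("Datadog SaaS plan")
req.advance("requested")
try:
    req.advance("purchased")  # skips the finance review
except ValueError as err:
    print(err)&lt;/LI-CODE&gt;
&lt;P&gt;In practice this ordering would live in a ticketing or approval tool; the point is only that the purchase step is unreachable until finance has reviewed the request.&lt;/P&gt;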
&lt;H2&gt;The one cross-plane bridge: Managed Identity&lt;/H2&gt;
&lt;P&gt;If the three-plane model is about separation, Managed Identity is the one construct that genuinely bridges two of those planes.&lt;/P&gt;
&lt;P&gt;A Managed Identity is an&amp;nbsp;&lt;STRONG&gt;Entra identity&lt;/STRONG&gt;&amp;nbsp;tied to an&amp;nbsp;&lt;STRONG&gt;Azure resource&lt;/STRONG&gt;&amp;nbsp;and authorized via&amp;nbsp;&lt;STRONG&gt;RBAC&lt;/STRONG&gt;. It lets Azure workloads authenticate to other Azure services without storing credentials in code, environment variables, or configuration files.&lt;/P&gt;
&lt;H3&gt;The cross-plane flow&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-none" border="1" style="width: 75%; height: 139.2px; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr style="height: 34.8px;"&gt;&lt;th style="height: 34.8px;"&gt;Step&lt;/th&gt;&lt;th style="height: 34.8px;"&gt;Plane&lt;/th&gt;&lt;th style="height: 34.8px;"&gt;What happens&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr style="height: 34.8px;"&gt;&lt;td style="height: 34.8px;"&gt;&lt;STRONG&gt;1. Identity created&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 34.8px;"&gt;Entra (Identity)&lt;/td&gt;&lt;td style="height: 34.8px;"&gt;A service principal is registered in the directory&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 34.8px;"&gt;&lt;td style="height: 34.8px;"&gt;&lt;STRONG&gt;2. Access authorized&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 34.8px;"&gt;RBAC (Resource)&lt;/td&gt;&lt;td style="height: 34.8px;"&gt;Role assignments grant access to specific resources&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 34.8px;"&gt;&lt;td style="height: 34.8px;"&gt;&lt;STRONG&gt;3. Identity used&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 34.8px;"&gt;Runtime (Resource)&lt;/td&gt;&lt;td style="height: 34.8px;"&gt;The workload requests a token from Entra and calls the target service&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;No secrets. No passwords. No key rotation. The identity lifecycle is managed by Azure itself.&lt;/P&gt;
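&lt;P&gt;The authorization step can be pictured with a small in-memory model. This is a sketch under stated assumptions: the principal ID, subscription ID, and resource names are made up, and assign_role and is_authorized are hypothetical helpers. The one behavior it borrows from Azure is that a role assignment applies to its scope and to everything beneath that scope.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# Hypothetical in-memory model; real role assignments live in Azure.
assignments = []  # list of (principal_id, role, scope)


def assign_role(principal_id, role, scope):
    """Step 2: grant a role to the identity at a given scope."""
    assignments.append((principal_id, role, scope.rstrip("/").lower()))


def is_authorized(principal_id, role, resource_id):
    """Step 3: an assignment covers its scope and any child resource."""
    target = resource_id.rstrip("/").lower()
    for pid, assigned_role, scope in assignments:
        if pid == principal_id and assigned_role == role:
            if target == scope or target.startswith(scope + "/"):
                return True
    return False


# Made-up resource IDs for illustration.
rg = "/subscriptions/0000/resourceGroups/rg-ai"
vault = rg + "/providers/Microsoft.KeyVault/vaults/kv-demo"

assign_role("app-identity", "Key Vault Secrets User", vault)
print(is_authorized("app-identity", "Key Vault Secrets User", vault))  # True
print(is_authorized("app-identity", "Key Vault Secrets User", rg))     # False&lt;/LI-CODE&gt;
&lt;P&gt;Assigning the role on the vault rather than on the resource group keeps the identity's reach as narrow as the workload needs.&lt;/P&gt;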
&lt;H3&gt;AI workload examples&lt;/H3&gt;
&lt;P&gt;For digital natives building AI applications, Managed Identity is essential:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-none" border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Scenario&lt;/th&gt;&lt;th&gt;Source&lt;/th&gt;&lt;th&gt;Target&lt;/th&gt;&lt;th&gt;RBAC role needed&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;App calls Azure OpenAI&lt;/td&gt;&lt;td&gt;App Service / Container App&lt;/td&gt;&lt;td&gt;Azure OpenAI&lt;/td&gt;&lt;td&gt;Cognitive Services OpenAI User&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;App reads secrets&lt;/td&gt;&lt;td&gt;App Service / Container App&lt;/td&gt;&lt;td&gt;Key Vault&lt;/td&gt;&lt;td&gt;Key Vault Secrets User&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;App reads/writes blobs&lt;/td&gt;&lt;td&gt;App Service / Container App&lt;/td&gt;&lt;td&gt;Storage Account&lt;/td&gt;&lt;td&gt;Storage Blob Data Contributor&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AKS pod calls AOAI&lt;/td&gt;&lt;td&gt;AKS (Workload Identity)&lt;/td&gt;&lt;td&gt;Azure OpenAI&lt;/td&gt;&lt;td&gt;Cognitive Services OpenAI User&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AKS pod reads secrets&lt;/td&gt;&lt;td&gt;AKS (Workload Identity)&lt;/td&gt;&lt;td&gt;Key Vault&lt;/td&gt;&lt;td&gt;Key Vault Secrets User&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Function processes events&lt;/td&gt;&lt;td&gt;Azure Function&lt;/td&gt;&lt;td&gt;Event Hub&lt;/td&gt;&lt;td&gt;Azure Event Hubs Data Receiver&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Pipeline reads training data&lt;/td&gt;&lt;td&gt;ML Workspace&lt;/td&gt;&lt;td&gt;Storage Account&lt;/td&gt;&lt;td&gt;Storage Blob Data Reader&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3&gt;System-Assigned vs. User-Assigned&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;System-assigned:&lt;/STRONG&gt;&amp;nbsp;Tied to a single resource. When the resource is deleted, the identity is deleted. Best for simple scenarios with one resource accessing one or a few target services.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;User-assigned:&lt;/STRONG&gt;&amp;nbsp;Created as a standalone resource. Can be assigned to multiple resources. Best for shared identity across microservices, AKS Workload Identity, or when the identity must persist independently.&lt;/P&gt;
&lt;H3&gt;AKS Workload Identity&lt;/H3&gt;
&lt;P&gt;AKS Workload Identity deserves special mention: it's the most common Managed Identity pattern in digital-native companies running Kubernetes.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;A&amp;nbsp;&lt;STRONG&gt;User-Assigned Managed Identity&lt;/STRONG&gt;&amp;nbsp;is created in Azure&lt;/LI&gt;
&lt;LI&gt;A&amp;nbsp;&lt;STRONG&gt;Kubernetes Service Account&lt;/STRONG&gt;&amp;nbsp;is annotated with the identity's client ID&lt;/LI&gt;
&lt;LI&gt;A&amp;nbsp;&lt;STRONG&gt;Federated Identity Credential&lt;/STRONG&gt;&amp;nbsp;links the K8s service account to the Managed Identity&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;RBAC role assignments&lt;/STRONG&gt;&amp;nbsp;grant the Managed Identity access to target resources&lt;/LI&gt;
&lt;LI&gt;At runtime, the pod uses the service account to get an Entra token via workload identity federation&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;This is Entra + RBAC + Kubernetes working together: identity plane creates the trust, resource plane authorizes the access, and the workload uses it at runtime.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;Key insight:&lt;/STRONG&gt; Managed Identity bridges Entra and RBAC, but never touches the third plane (billing). No identity, managed or otherwise, can see MACC credits or approve Marketplace purchases.&lt;/EM&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H2&gt;Advanced: Attribute-Based Access Control (ABAC)&lt;/H2&gt;
&lt;P&gt;ABAC extends RBAC with conditions based on resource attributes (tags), principal attributes, and request context. It is&amp;nbsp;&lt;STRONG&gt;not&lt;/STRONG&gt; a separate permission system; it's an enhancement to the resource plane.&lt;/P&gt;
&lt;P&gt;For example, you can write a role assignment that says:&amp;nbsp;&lt;EM&gt;"Allow Contributor access only to resources tagged&amp;nbsp;Environment = Dev"&lt;/EM&gt;&amp;nbsp;or&amp;nbsp;&lt;EM&gt;"Allow read access only to storage blobs under a specific path prefix."&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;ABAC is particularly useful for:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Multi-tenant SaaS applications&lt;/STRONG&gt;&amp;nbsp;that need tenant isolation at the resource layer&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Regulated workloads&lt;/STRONG&gt;&amp;nbsp;that require fine-grained access control beyond what standard RBAC scopes provide&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;What ABAC cannot do:&lt;/STRONG&gt;&amp;nbsp;grant billing access, override Entra roles, access MACC, or purchase Marketplace products. It operates entirely within the RBAC resource plane.&lt;/P&gt;
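&lt;P&gt;The idea behind a tag-based condition can be sketched in a few lines of Python. This is not Azure's condition syntax; the allowed function and its parameters are hypothetical and only show how a condition narrows an existing role assignment.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# Sketch of a tag-based condition layered on top of a role assignment.
def allowed(role_grants_action, resource_tags, required_tags):
    """Grant access only if RBAC allows the action and every required tag matches."""
    if not role_grants_action:
        # A condition can narrow a role assignment, never widen it.
        return False
    return all(
        resource_tags.get(key) == value
        for key, value in required_tags.items()
    )


condition = {"Environment": "Dev"}
print(allowed(True, {"Environment": "Dev"}, condition))   # True
print(allowed(True, {"Environment": "Prod"}, condition))  # False
print(allowed(False, {"Environment": "Dev"}, condition))  # False&lt;/LI-CODE&gt;
&lt;P&gt;The third call shows why ABAC stays inside the resource plane: without an underlying role assignment, no condition can grant anything.&lt;/P&gt;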
&lt;P&gt;For implementation details, see:&amp;nbsp;&lt;A href="https://learn.microsoft.com/azure/role-based-access-control/conditions-overview" target="_blank" rel="noopener"&gt;Azure RBAC Conditions (ABAC)&lt;/A&gt;&lt;/P&gt;
&lt;H2&gt;References&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/role-based-access-control/overview" target="_blank" rel="noopener"&gt;Azure RBAC Overview&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/active-directory/managed-identities-azure-resources/overview" target="_blank" rel="noopener"&gt;Managed Identities Overview&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/aks/workload-identity-overview" target="_blank" rel="noopener"&gt;AKS Workload Identity&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/marketplace/azure-purchasing-invoicing" target="_blank" rel="noopener"&gt;Marketplace Purchases &amp;amp; Invoicing&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/marketplace/private-offers" target="_blank" rel="noopener"&gt;Private Offers in Azure Marketplace&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/cost-management-billing/manage/understand-mca-roles" target="_blank" rel="noopener"&gt;Billing Roles (Microsoft Customer Agreement)&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/role-based-access-control/conditions-overview" target="_blank" rel="noopener"&gt;Azure RBAC Conditions (ABAC)&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;What's Next →&lt;/STRONG&gt;&amp;nbsp;We've now covered the three-plane model (&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/azure-has-three-permission-systems-and-youre-probably-confusing-them/4471854" target="_blank" rel="noopener" data-lia-auto-title="Part 1" data-lia-auto-title-active="0"&gt;Part 1&lt;/A&gt;) and the biggest collision points: Marketplace, Managed Identity, and ABAC. In&amp;nbsp;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/role-structures-anti-patterns-and-the-10-governance-principles/4510070" data-lia-auto-title="Part 3" data-lia-auto-title-active="0" target="_blank"&gt;&lt;STRONG&gt;Part 3&lt;/STRONG&gt;&lt;/A&gt;, we get tactical: the&amp;nbsp;&lt;STRONG&gt;7 anti-patterns&lt;/STRONG&gt;&amp;nbsp;to avoid, recommended&amp;nbsp;&lt;STRONG&gt;role structures&lt;/STRONG&gt;&amp;nbsp;for Engineering, Finance, and Security teams,&amp;nbsp;&lt;STRONG&gt;RACI templates&lt;/STRONG&gt;, and the&amp;nbsp;&lt;STRONG&gt;10 core governance principles&lt;/STRONG&gt; every scaling organization should adopt.&amp;nbsp;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;</description>
      <pubDate>Thu, 09 Apr 2026 21:17:44 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/startups-at-microsoft/marketplace-governance-and-the-cross-plane-bridge/ba-p/4510067</guid>
      <dc:creator>rmmartins</dc:creator>
      <dc:date>2026-04-09T21:17:44Z</dc:date>
    </item>
    <item>
      <title>No Decision After 7 Working Days + Portal Loop Issue – Rain Stella Technology</title>
      <link>https://techcommunity.microsoft.com/t5/startups-at-microsoft/no-decision-after-7-working-days-portal-loop-issue-rain-stella/m-p/4505230#M115</link>
      <description>&lt;P&gt;Hi Microsoft for Startups Community &amp;amp; Team,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am following up on my Microsoft for Startups application submitted on behalf of Rain Stella Technology, and would also like to flag a portal issue I experienced after the application was submitted.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;--- Application Status ---&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I received a confirmation email acknowledging receipt of my application, however it has now been 7 working days with no further communication — no approval, no decline, and no request for additional information.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;According to Microsoft's own guidelines, applications are typically reviewed within 3 business days. We are now more than double that timeframe with no update.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Application Details:&lt;/P&gt;&lt;P&gt;• Applicant Name: Khalid Adine&lt;/P&gt;&lt;P&gt;• Startup Name: Rain Stella Technology&lt;/P&gt;&lt;P&gt;• Status: Receipt confirmation received, no decision communicated since&lt;/P&gt;&lt;P&gt;• Days Elapsed: 7 working days&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;--- Portal Issue ---&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;During and after submitting my application, I encountered a frustrating portal bug where, instead of displaying a confirmation screen or application status, the portal kept redirecting me back to the "Apply Now" screen — as if my submission had not been recorded.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I received the receipt confirmation email, which confirmed the application went through, but the portal still does not reflect any application status. 
This caused significant confusion, and I wanted to flag it in case it is a widely known issue affecting other applicants.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I also attempted to contact support via email (email address removed for privacy reasons), but the address returned a bounce error, leaving the forum and contact form as my only available channels.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;--- My Requests ---&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. Can someone from the team confirm the current status of our application?&lt;/P&gt;&lt;P&gt;2. Is any additional information or documentation required from our side?&lt;/P&gt;&lt;P&gt;3. Can the portal loop bug be investigated and fixed for other applicants facing the same issue?&lt;/P&gt;&lt;P&gt;4. What is the correct and currently working support email or channel for urgent queries?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We are actively building our product and are eager to move forward with the program. Any update would be sincerely appreciated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for your time and support.&lt;/P&gt;&lt;P&gt;Khalid Adine&lt;/P&gt;&lt;P&gt;Founder, Rain Stella Technology&lt;/P&gt;</description>
      <pubDate>Tue, 24 Mar 2026 16:14:45 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/startups-at-microsoft/no-decision-after-7-working-days-portal-loop-issue-rain-stella/m-p/4505230#M115</guid>
      <dc:creator>khalidadine</dc:creator>
      <dc:date>2026-03-24T16:14:45Z</dc:date>
    </item>
    <item>
      <title>Azure credit and Azure Portal successful, but login to Startups Portal fails</title>
      <link>https://techcommunity.microsoft.com/t5/startups-at-microsoft/azure-credit-and-azure-portal-successful-but-login-to-startups/m-p/4502233#M114</link>
      <description>&lt;P&gt;Hello&lt;BR /&gt;I have successfully registered for the Microsoft for Startups basic offering using a new Microsoft Account [1].&lt;BR /&gt;Using my Microsoft account, I can log into the Azure Portal successfully and can see my "Azure for Startups" subscription with free credit.&lt;BR /&gt;&lt;BR /&gt;However, when I try to log into the Microsoft for Startups Portal using LinkedIn,&amp;nbsp;I see the error "No user found. Please sign up or try a different LinkedIn account".&lt;BR /&gt;&lt;BR /&gt;Initially I thought this was because my primary LinkedIn email address was not the same as the email address of my Microsoft Account [1].&lt;BR /&gt;In LinkedIn, I changed my primary email address to my Microsoft Account email address.&lt;BR /&gt;I tried again to log into the Microsoft for Startups Portal using LinkedIn - same error as before.&lt;BR /&gt;I created a new LinkedIn account using my new Microsoft Account email address - same error as before.&lt;BR /&gt;&lt;BR /&gt;Please can someone help?&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 14 Mar 2026 10:13:52 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/startups-at-microsoft/azure-credit-and-azure-portal-successful-but-login-to-startups/m-p/4502233#M114</guid>
      <dc:creator>MessageStack-HB</dc:creator>
      <dc:date>2026-03-14T10:13:52Z</dc:date>
    </item>
    <item>
      <title>Introducing the Startup-Scale Landing Zone: Get Azure right from day one</title>
      <link>https://techcommunity.microsoft.com/t5/startups-at-microsoft/introducing-the-startup-scale-landing-zone-get-azure-right-from/ba-p/4501566</link>
      <description>&lt;H3&gt;&lt;SPAN style="color: rgb(30, 30, 30); font-size: 16px;"&gt;If you've been following this blog, you may recall the post&amp;nbsp;&lt;/SPAN&gt;&lt;A style="font-style: normal; font-weight: 400; background-color: rgb(255, 255, 255); font-size: 16px;" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/from-zero-to-hero-with-azure-landing-zones/4229195" target="_blank" rel="noopener" data-href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/from-zero-to-hero-with-azure-landing-zones/4229195"&gt;From Zero to Hero with Azure Landing Zones&lt;/A&gt;&lt;SPAN style="color: rgb(30, 30, 30); font-size: 16px;"&gt;, where we walked through the full Azure Landing Zone journey, from identity and RBAC to Platform and Application Landing Zones. That guide covered the&amp;nbsp;&lt;/SPAN&gt;&lt;EM style="color: rgb(30, 30, 30); font-size: 16px;"&gt;what&lt;/EM&gt;&lt;SPAN style="color: rgb(30, 30, 30); font-size: 16px;"&gt;&amp;nbsp;and the&amp;nbsp;&lt;/SPAN&gt;&lt;EM style="color: rgb(30, 30, 30); font-size: 16px;"&gt;why&lt;/EM&gt;&lt;SPAN style="color: rgb(30, 30, 30); font-size: 16px;"&gt;. This post introduces the&amp;nbsp;&lt;/SPAN&gt;&lt;EM style="color: rgb(30, 30, 30); font-size: 16px;"&gt;how,&amp;nbsp;&lt;/EM&gt;&lt;SPAN style="color: rgb(30, 30, 30); font-size: 16px;"&gt;a deployable, open-source project that distills those principles into something a startup can actually ship in an afternoon:&lt;/SPAN&gt;&lt;/H3&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2 data-line="4"&gt;The problem: Cloud foundations shouldn't take two months&lt;/H2&gt;
&lt;P data-line="6"&gt;Every startup building on Azure faces the same fork in the road:&lt;/P&gt;
&lt;P data-line="8"&gt;&lt;STRONG&gt;Option A:&lt;/STRONG&gt;&amp;nbsp;Follow the&amp;nbsp;&lt;A href="https://aka.ms/alz" target="_blank" rel="noopener" data-href="https://aka.ms/alz"&gt;Azure Landing Zone (ALZ)&lt;/A&gt; guidance. It's comprehensive, battle-tested, and designed for organizations with thousands of users. It's also 100+ modules, a multi-layered management group hierarchy, and months of work to understand, let alone implement. For a 10-person startup, it's like buying a commercial kitchen to make breakfast.&lt;/P&gt;
&lt;P data-line="10"&gt;&lt;STRONG&gt;Option B:&lt;/STRONG&gt; Skip governance entirely. One subscription, no policies, no budgets, no RBAC strategy. Ship fast now, deal with security debt later. This is what most startups actually do, and it works until the first security questionnaire from an enterprise customer, the first runaway cost incident, or the first az group delete that hits production.&lt;/P&gt;
&lt;P data-line="12"&gt;Neither option is right. Startups need a third path: just enough governance to be secure and cost-aware from day one, without the operational overhead that slows them down.&lt;/P&gt;
&lt;P data-line="14"&gt;That's exactly what the&amp;nbsp;&lt;A class="lia-external-url" href="https://startupscalelanding.zone" target="_blank" rel="noopener"&gt;Startup-Scale Landing Zone (SSLZ)&lt;/A&gt;&amp;nbsp;provides.&lt;/P&gt;
&lt;H2 data-line="16"&gt;What is the Startup-Scale Landing Zone?&lt;/H2&gt;
&lt;P data-line="18"&gt;SSLZ is an opinionated, production-ready Azure infrastructure template that deploys in&amp;nbsp;&lt;STRONG&gt;under one hour&lt;/STRONG&gt; using Bicep or Terraform. It's built for teams of 5–50 engineers, typically pre-seed to Series A, who don't have a dedicated platform team but need to get Azure right from the start.&lt;/P&gt;
&lt;P data-line="20"&gt;It takes the core principles from the Azure Landing Zone architecture and strips them to the essentials:&lt;/P&gt;
&lt;UL data-line="22"&gt;
&lt;LI data-line="22"&gt;&lt;STRONG&gt;One management group, two subscriptions&lt;/STRONG&gt;&amp;nbsp;(prod + non-prod). That's it. No six-layer hierarchy.&lt;/LI&gt;
&lt;LI data-line="23"&gt;&lt;STRONG&gt;Security built-in.&lt;/STRONG&gt; Defender for Cloud, RBAC groups, NSG deny-all defaults, and policy enforcement, all automated.&lt;/LI&gt;
&lt;LI data-line="24"&gt;&lt;STRONG&gt;Cost controls from day one.&lt;/STRONG&gt;&amp;nbsp;Budget alerts at 50%, 80%, and 100%, mandatory tagging, and reservation guidance.&lt;/LI&gt;
&lt;LI data-line="25"&gt;&lt;STRONG&gt;An explicit graduation path.&lt;/STRONG&gt;&amp;nbsp;When you outgrow SSLZ, there's a step-by-step guide to migrate to the full ALZ architecture.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="27"&gt;&lt;STRONG&gt;Important:&lt;/STRONG&gt; SSLZ is not a replacement for Azure Landing Zones. It targets a different profile: very early-stage startups with a single workload, a single region, and no hybrid connectivity. For those teams, the realistic alternative isn't ALZ, it's usually&amp;nbsp;&lt;EM&gt;no governance at all&lt;/EM&gt;.&lt;/P&gt;
&lt;H2 data-line="29"&gt;Architecture: Simplicity as a design principle&lt;/H2&gt;
&lt;P data-line="31"&gt;The architecture is deliberately minimal:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;Tenant Root Group
└── mg-&amp;lt;yourcompany&amp;gt;              ← Policies applied here
    ├── sub-&amp;lt;yourcompany&amp;gt;-prod    ← Production workloads
    └── sub-&amp;lt;yourcompany&amp;gt;-nonprod ← Dev, staging, QA&lt;/LI-CODE&gt;
&lt;P&gt;Each subscription gets its own VNet with a standardized subnet layout:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;vnet-&amp;lt;co&amp;gt;-prod (10.0.0.0/16)
├── snet-aks         10.0.0.0/20    (4,091 IPs — AKS nodes + pods)
├── snet-app         10.0.16.0/22   (1,019 IPs — App Service / Container Apps)
├── snet-data        10.0.20.0/22   (1,019 IPs — Private Endpoints)
└── snet-shared      10.0.24.0/24   (251 IPs — CI/CD agents, jump boxes)&lt;/LI-CODE&gt;
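&lt;P&gt;The usable counts in the layout above follow from one rule: Azure reserves five addresses in every subnet, so a subnet with a given prefix length has 2^(32 - prefix) - 5 usable IPs. A minimal sketch of that arithmetic, where the helper name usable_ips is an illustration and not part of SSLZ:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Usable IPs in an Azure subnet: total addresses minus the 5 that Azure reserves.
usable_ips() {
  prefix="$1"
  total=1
  bits=$((32 - prefix))
  # Double once per host bit to get 2^(32 - prefix) without relying on bash-only operators.
  while [ "$bits" -gt 0 ]; do
    total=$((total * 2))
    bits=$((bits - 1))
  done
  echo $((total - 5))
}

# Prefix lengths used by the SSLZ subnet layout.
for p in 20 22 24; do
  echo "/$p: $(usable_ips "$p") usable IPs"
done&lt;/LI-CODE&gt;
&lt;P&gt;Running the same helper against a candidate prefix is a quick way to check whether a resized subnet still fits your AKS node and pod count before you edit the parameters file.&lt;/P&gt;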
&lt;P data-line="50"&gt;No hub network. No Azure Firewall. No VNet peering. Each subscription is a self-contained island.&lt;/P&gt;
&lt;H3 data-line="52"&gt;Why no hub?&lt;/H3&gt;
&lt;P data-line="54"&gt;A hub-spoke topology costs a minimum of ~$1,500/month. Azure Firewall alone runs $900+/month. For a startup with a single workload in a single region, that's cost and complexity with no return. NSGs provide L3/L4 filtering for free and handle 95% of startup networking use cases. When compliance or hybrid connectivity demands centralized egress control, the graduation guide walks you through adding a hub, without touching existing resources.&lt;/P&gt;
&lt;H3 data-line="56"&gt;Why two subscriptions?&lt;/H3&gt;
&lt;P data-line="58"&gt;Two subscriptions give you isolation that resource groups can't:&lt;/P&gt;
&lt;UL data-line="60"&gt;
&lt;LI data-line="60"&gt;&lt;STRONG&gt;Cost isolation for free:&amp;nbsp;&lt;/STRONG&gt;no tagging gymnastics to separate prod from dev spend.&lt;/LI&gt;
&lt;LI data-line="61"&gt;&lt;STRONG&gt;RBAC without custom roles:&amp;nbsp;&lt;/STRONG&gt;developers get Contributor on non-prod and Reader on prod.&lt;/LI&gt;
&lt;LI data-line="62"&gt;&lt;STRONG&gt;Blast radius containment:&amp;nbsp;&lt;/STRONG&gt;az group delete&amp;nbsp;in dev can't touch production.&lt;/LI&gt;
&lt;LI data-line="63"&gt;&lt;STRONG&gt;Quota isolation:&amp;nbsp;&lt;/STRONG&gt;non-prod experiments don't consume prod quotas.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="65"&gt;This is a habit that's cheap to form early and expensive to retrofit later. One primary workload per subscription; when you deploy a second independent workload, create a new subscription.&lt;/P&gt;
&lt;H2 data-line="67"&gt;What you get out of the box&lt;/H2&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Component&lt;/th&gt;&lt;th&gt;What's deployed&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Management Groups&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Single MG with two subscriptions&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Azure Policy&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Microsoft Cloud Security Benchmark (audit mode), required tags (environment,&amp;nbsp;team), allowed locations, diagnostic settings&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Networking&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;VNet + 4 subnets per subscription, NSGs with deny-all-inbound default&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Monitoring&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Log Analytics workspace, Activity Log forwarding, 90-day retention&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Security&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Defender for Cloud CSPM (free), Defender for Servers P2 (prod), security contact alerts&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Cost Management&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Budget alerts at 50/80/100% thresholds, tag enforcement via policy&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;CI/CD&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;GitHub Actions workflows for both Bicep and Terraform, Workload Identity Federation (no secrets)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3 data-line="79"&gt;Security without friction&lt;/H3&gt;
&lt;P data-line="81"&gt;The security model avoids compliance theater. Instead of buying Entra ID P2 "to check a box," SSLZ enables Security Defaults, free MFA that blocks 99.9% of identity attacks. Instead of enforcing MCSB in Deny mode on day one (which blocks legitimate deployments and frustrates developers), it starts in Audit mode so you can understand your posture first, then selectively move to Deny as your team matures.&lt;/P&gt;
&lt;P data-line="83"&gt;RBAC follows three rules:&lt;/P&gt;
&lt;OL data-line="85"&gt;
&lt;LI data-line="85"&gt;&lt;STRONG&gt;Never assign roles to individuals:&amp;nbsp;&lt;/STRONG&gt;always use security groups.&lt;/LI&gt;
&lt;LI data-line="86"&gt;&lt;STRONG&gt;Developers don't get Contributor on prod:&amp;nbsp;&lt;/STRONG&gt;deployments go through CI/CD.&lt;/LI&gt;
&lt;LI data-line="87"&gt;&lt;STRONG&gt;No Owner at subscription level for non-admins:&amp;nbsp;&lt;/STRONG&gt;a compromised account with Owner can grant itself anything.&lt;/LI&gt;
&lt;/OL&gt;
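&lt;P&gt;As a rough sketch of how the first two rules translate into role assignments, the snippet below prints the Azure CLI commands for a developer security group instead of running them, so you can review them first. The group object ID and subscription IDs are placeholder assumptions, and the helper name assignment_cmd is hypothetical; this is not the script SSLZ ships.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Placeholder values (assumptions): replace with your own IDs.
DEV_GROUP_ID="00000000-0000-0000-0000-000000000000"
PROD_SUB_ID="11111111-1111-1111-1111-111111111111"
NONPROD_SUB_ID="22222222-2222-2222-2222-222222222222"

# Print one group-based role assignment at subscription scope.
# Roles go to the security group, never to an individual user.
assignment_cmd() {
  role="$1"
  sub_id="$2"
  echo "az role assignment create --assignee-object-id $DEV_GROUP_ID --assignee-principal-type Group --role $role --scope /subscriptions/$sub_id"
}

# Developers: Contributor on non-prod, Reader on prod.
assignment_cmd Contributor "$NONPROD_SUB_ID"
assignment_cmd Reader "$PROD_SUB_ID"&lt;/LI-CODE&gt;
&lt;P&gt;Printing rather than executing keeps the example safe to run anywhere; paste the output into a shell with an active az login once the IDs are correct.&lt;/P&gt;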
&lt;P data-line="89"&gt;For CI/CD, SSLZ uses Workload Identity Federation (WIF) instead of client secrets. No credentials to store, rotate, or accidentally commit. Short-lived OIDC tokens scoped to specific repos and branches.&lt;/P&gt;
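&lt;P&gt;To make "scoped to specific repos and branches" concrete: with GitHub Actions, the federated credential matches on the subject claim of the OIDC token, which for a branch has the form repo:ORG/REPO:ref:refs/heads/BRANCH. The sketch below builds that subject and a credential definition; the organization, repository, and credential name are assumptions for illustration, not values from the SSLZ repo.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Hypothetical repository and branch (assumptions).
GH_ORG="your-org"
GH_REPO="your-repo"
BRANCH="main"

# Subject claim GitHub issues for a workflow running on a branch.
federated_subject() {
  echo "repo:$1/$2:ref:refs/heads/$3"
}

SUBJECT="$(federated_subject "$GH_ORG" "$GH_REPO" "$BRANCH")"

# Credential definition: only tokens with this exact subject are accepted.
printf '{"name":"github-%s","issuer":"https://token.actions.githubusercontent.com","subject":"%s","audiences":["api://AzureADTokenExchange"]}\n' "$BRANCH" "$SUBJECT"&lt;/LI-CODE&gt;
&lt;P&gt;The printed JSON is the shape accepted by az ad app federated-credential create via its --parameters argument. Because the subject pins one repo and one branch, a workflow on a fork or a feature branch cannot obtain a token for this identity.&lt;/P&gt;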
&lt;H3 data-line="91"&gt;Cost transparency&lt;/H3&gt;
&lt;P data-line="93"&gt;Every recommendation includes real numbers:&lt;/P&gt;
&lt;UL data-line="95"&gt;
&lt;LI data-line="95"&gt;&lt;EM&gt;"Azure Firewall: $900+/month. Skip until compliance or hybrid demands it."&lt;/EM&gt;&lt;/LI&gt;
&lt;LI data-line="96"&gt;&lt;EM&gt;"DDoS Protection Standard: $2,944/month. Azure's free basic DDoS + Front Door WAF handles most cases."&lt;/EM&gt;&lt;/LI&gt;
&lt;LI data-line="97"&gt;&lt;EM&gt;"Defender for App Service: ~$15/month. Limited value compared to other plans. Revisit later."&lt;/EM&gt;&lt;/LI&gt;
&lt;LI data-line="98"&gt;&lt;EM&gt;"Standard_D4s_v5 VM: $140/month on-demand → $90/month with 1-year RI. 36% savings."&lt;/EM&gt;&lt;/LI&gt;
&lt;/UL&gt;
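&lt;P&gt;The reservation figure in the last bullet is simple to reproduce, which makes it easy to rerun with your own prices. A small sketch, assuming monthly prices in whole dollars; the helper name savings_pct is hypothetical:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Percentage saved when moving from an on-demand price to a reserved price,
# rounded to the nearest whole percent using integer arithmetic.
savings_pct() {
  on_demand="$1"
  reserved="$2"
  echo $(( ((on_demand - reserved) * 100 + on_demand / 2) / on_demand ))
}

# Figures quoted above for a Standard_D4s_v5 VM (USD per month).
echo "Savings: $(savings_pct 140 90)%"
echo "Saved per year: \$$(( (140 - 90) * 12 ))"&lt;/LI-CODE&gt;
&lt;P&gt;With the quoted prices this gives 36% and $600 per year for a single VM, which is usually enough to justify a reservation for anything that runs around the clock in prod.&lt;/P&gt;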
&lt;P data-line="100"&gt;The documentation also covers the six most common cost mistakes startups make: forgotten dev VMs, over-provisioned databases, ignoring Reserved Instances, premium storage where standard works, not using Spot VMs, and missing Dev/Test pricing. Each mistake comes with a concrete fix and code example.&lt;/P&gt;
&lt;H2 data-line="102"&gt;Starter examples: Three startup archetypes&lt;/H2&gt;
&lt;P data-line="104"&gt;SSLZ ships with three production-grade example architectures, each with Bicep + Terraform implementations, deployment instructions, and realistic cost estimates:&lt;/P&gt;
&lt;H3 data-line="106"&gt;SaaS Startup (~$330–440/month)&lt;/H3&gt;
&lt;P data-line="108"&gt;Container Apps + Azure SQL Elastic Pool + Redis + Key Vault. Multi-tenant with shared schema and&amp;nbsp;tenant_id&amp;nbsp;column. Container Apps scale to zero in non-prod. Elastic pools are 50–70% cheaper than individual databases.&lt;/P&gt;
&lt;H3 data-line="110"&gt;AI Startup (~$1,150–1,250/month)&lt;/H3&gt;
&lt;P data-line="112"&gt;AKS with GPU Spot node pools (60–90% savings) + Azure OpenAI + Blob Storage + Redis for inference caching. Covers model serving framework choices (vLLM vs Triton vs TGI) and GPU node management with taints and KEDA autoscaling.&lt;/P&gt;
&lt;H3 data-line="114"&gt;API-First Startup (~$163–345/month)&lt;/H3&gt;
&lt;P data-line="116"&gt;App Service with deployment slots (zero-downtime swaps) + API Management (Consumption tier, pay-per-call) + Cosmos DB + Application Insights. Includes API versioning strategy, rate limiting tiers, and Cosmos DB partitioning guidance.&lt;/P&gt;
&lt;H2 data-line="118"&gt;When to graduate&lt;/H2&gt;
&lt;P data-line="120"&gt;SSLZ is explicit about its limits. You'll outgrow it when 2–3 of these signals appear simultaneously:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Signal&lt;/th&gt;&lt;th&gt;Why it matters&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Second independent workload&lt;/td&gt;&lt;td&gt;Each workload gets its own subscription&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Engineering team &amp;gt; 50 people&lt;/td&gt;&lt;td&gt;Different teams need different permissions and cost boundaries&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Regulatory compliance (SOC2, HIPAA, PCI)&lt;/td&gt;&lt;td&gt;Requires specific controls SSLZ doesn't cover&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Multi-region deployment&lt;/td&gt;&lt;td&gt;Needs centralized network management&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Hybrid connectivity (VPN, ExpressRoute)&lt;/td&gt;&lt;td&gt;Requires a Connectivity subscription with gateways&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;5+ subscriptions&lt;/td&gt;&lt;td&gt;Policy and RBAC at scale needs MG hierarchy&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-line="131"&gt;The&amp;nbsp;&lt;A href="https://github.com/ricmmartins/sslz/blob/main/docs/graduation-guide.md" target="_blank" rel="noopener" data-href="https://github.com/ricmmartins/sslz/blob/main/docs/graduation-guide.md"&gt;Graduation Guide&lt;/A&gt; provides a five-phase migration path to full ALZ: management group hierarchy, hub network + firewall, management subscription, policy hardening, and identity hardening, with a risk assessment for each phase. It also includes the cost of the full platform layer ($1,500–3,000/month), so you can make an informed decision about when the investment makes sense.&lt;/P&gt;
&lt;H2 data-line="133"&gt;Quick start: From zero to production-ready in under 1 hour&lt;/H2&gt;
&lt;H3 data-line="135"&gt;Prerequisites (5 min)&lt;/H3&gt;
&lt;UL data-line="137"&gt;
&lt;LI data-line="137"&gt;Azure CLI installed&lt;/LI&gt;
&lt;LI data-line="138"&gt;Two Azure subscriptions (prod + non-prod)&lt;/LI&gt;
&lt;LI data-line="139"&gt;Owner permissions on both subscriptions&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI-CODE lang="bash"&gt;git clone https://github.com/ricmmartins/sslz.git
cd sslz
az login
./scripts/validate-prerequisites.sh&lt;/LI-CODE&gt;
&lt;H3 data-line="148"&gt;Deploy with Bicep (20 min)&lt;/H3&gt;
&lt;LI-CODE lang="bash"&gt;cd infra/bicep
cp parameters/prod.bicepparam parameters/prod.local.bicepparam
# Edit prod.local.bicepparam with your values

az deployment sub create \
  --location eastus2 \
  --template-file main.bicep \
  --parameters parameters/prod.local.bicepparam&lt;/LI-CODE&gt;
&lt;H3 data-line="161"&gt;Or Deploy with Terraform (20 min)&lt;/H3&gt;
&lt;LI-CODE lang="bash"&gt;cd infra/terraform
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your values

terraform init
terraform plan -out=tfplan
terraform apply tfplan&lt;/LI-CODE&gt;
&lt;H3 data-line="173"&gt;Verify (5 min)&lt;/H3&gt;
&lt;LI-CODE lang="bash"&gt;az group list --query "[?contains(name, 'yourcompany')].name" -o tsv
az policy assignment list --query "[].displayName" -o tsv
az security pricing list --query "value[?pricingTier=='Standard'].{Name:name, Tier:pricingTier}" -o table&lt;/LI-CODE&gt;
&lt;H2 data-line="181"&gt;Design philosophy&lt;/H2&gt;
&lt;P data-line="183"&gt;Three principles guided every decision in SSLZ:&lt;/P&gt;
&lt;OL&gt;
&lt;LI data-line="185"&gt;&lt;STRONG&gt; Opinionated over flexible. &lt;/STRONG&gt;"It depends" isn't helpful when you have five engineers and no platform team. SSLZ makes the call (two subscriptions, no hub, deny-all NSGs, MCSB in audit mode) and tells you when to revisit.&lt;/LI&gt;
&lt;LI data-line="187"&gt;&lt;STRONG&gt; Reversible over perfect. &lt;/STRONG&gt;Every architectural decision is designed to be easy to change later. Moving subscriptions between management groups is a 10-second operation. Adding a hub VNet requires only a new deployment, not changes to existing resources. Policies can move from Audit to Deny on a schedule. Multi-region is a future add-on, not a prerequisite.&lt;/LI&gt;
&lt;LI data-line="189"&gt;&lt;STRONG&gt; Honest about trade-offs. &lt;/STRONG&gt;Instead of claiming "enterprise-grade," SSLZ says: &lt;EM&gt;"You'll outgrow this when..."&lt;/EM&gt;&amp;nbsp;and&amp;nbsp;&lt;EM&gt;"Here's exactly what it costs to add the next layer."&lt;/EM&gt;&amp;nbsp;That transparency is what separates it from frameworks that are either overkill for startups or under-engineered for production.&lt;/LI&gt;
&lt;/OL&gt;
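&lt;P&gt;As an illustration of the "reversible" claim, moving a subscription under a different management group is a single Azure CLI call. The sketch below wraps it in a dry-run helper so it prints the command by default; the management group name, the subscription ID, and the run helper are assumptions for the example, not part of SSLZ.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Hypothetical target management group and subscription (assumptions).
TARGET_MG="mg-yourcompany-workloads"
SUB_ID="22222222-2222-2222-2222-222222222222"

# Print the command unless DRY_RUN=0 is set, in which case execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Re-parent the subscription under the target management group.
run az account management-group subscription add --name "$TARGET_MG" --subscription "$SUB_ID"&lt;/LI-CODE&gt;
&lt;P&gt;Set DRY_RUN=0 to execute for real. Policies and role assignments inherited from the old management group stop applying after the move, so check what the target group assigns before you run it against prod.&lt;/P&gt;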
&lt;H2 data-line="191"&gt;Get involved&lt;/H2&gt;
&lt;P data-line="193"&gt;SSLZ is open source under the MIT license. The project welcomes contributions, especially real-world configurations from startup CTOs and platform engineers who've battle-tested the patterns.&lt;/P&gt;
&lt;UL data-line="195"&gt;
&lt;LI data-line="195"&gt;&lt;STRONG&gt;GitHub:&lt;/STRONG&gt;&amp;nbsp;&lt;A href="https://github.com/ricmmartins/sslz" target="_blank" rel="noopener" data-href="https://github.com/ricmmartins/sslz"&gt;github.com/ricmmartins/sslz&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="196"&gt;&lt;STRONG&gt;Documentation site:&lt;/STRONG&gt;&amp;nbsp;&lt;A href="https://startupscalelanding.zone/" target="_blank" rel="noopener" data-href="https://startupscalelanding.zone"&gt;startupscalelanding.zone&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="197"&gt;&lt;STRONG&gt;Previous post:&lt;/STRONG&gt;&amp;nbsp;&lt;A href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/from-zero-to-hero-with-azure-landing-zones/4229195" target="_blank" rel="noopener" data-href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/from-zero-to-hero-with-azure-landing-zones/4229195"&gt;From Zero to Hero with Azure Landing Zones&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="199"&gt;If you're a startup building on Azure, give SSLZ a try. Deploy it, break it, and tell us what your real infrastructure looks like, so the next team doesn't have to figure it out from scratch.&lt;/P&gt;</description>
      <pubDate>Mon, 16 Mar 2026 18:25:26 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/startups-at-microsoft/introducing-the-startup-scale-landing-zone-get-azure-right-from/ba-p/4501566</guid>
      <dc:creator>rmmartins</dc:creator>
      <dc:date>2026-03-16T18:25:26Z</dc:date>
    </item>
    <item>
      <title>Unable to see Azure credits for verified startup business</title>
      <link>https://techcommunity.microsoft.com/t5/startups-at-microsoft/unable-to-see-azure-credits-for-verified-startup-business/m-p/4499556#M113</link>
      <description>&lt;P&gt;TL;DR: Business verification only completed a few weeks ago (Feb 2026), can't see Azure for Startup credits anywhere on my account, would like to activate my $5000 credits and put them to use building my application which will leverage Azure in a number of ways. Help needed!&lt;/P&gt;&lt;P&gt;Details:&lt;/P&gt;&lt;P&gt;I have an early stage startup, pre-money, pre-product.&amp;nbsp; Building a native Microsoft/Windows 11 application for AI data analysis with cloud hosted private LLMs.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I signed up for a business M365 account back in September 2025 which was fine for my M365 Office Suite access (OneDrive, OneNote, Word, Excel, PowerPoint, SharePoint), however my business verification ran into mysterious obstacles (it is a Delaware C Corp, incorporated in 2025), so I couldn't setup my corporate developer accounts.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Fast forward to February 2026, and I finally push through the business verification steps, and am now working full time on my business (as of late January 2026).&amp;nbsp; I am "in" the Azure for Startups program, but I can't access this "Founders Hub" area that I'm reading about, and I can't see in my Azure Billing/Invoicing area anything other than my corporate credit card for payments -- no evidence of $1000 or $5000 startup credits for Azure.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What am I missing?&amp;nbsp;&lt;/P&gt;&lt;P&gt;What did I do wrong?&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there any way I can activate these now that I'm actually in a position where I need them, now that I have my business verified with Microsoft?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you in advance for any assistance anyone can provide on this point!&lt;/P&gt;</description>
      <pubDate>Wed, 04 Mar 2026 21:28:53 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/startups-at-microsoft/unable-to-see-azure-credits-for-verified-startup-business/m-p/4499556#M113</guid>
      <dc:creator>IanSR</dc:creator>
      <dc:date>2026-03-04T21:28:53Z</dc:date>
    </item>
    <item>
      <title>Production-grade API Gateway patterns for Microsoft Foundry</title>
      <link>https://techcommunity.microsoft.com/t5/startups-at-microsoft/production-grade-api-gateway-patterns-for-microsoft-foundry/ba-p/4490494</link>
      <description>&lt;P data-start="62" data-end="263"&gt;Most startup teams start with the simplest thing that can work. One or two apps call Microsoft Foundry model endpoints directly, traffic is predictable, and “routing” is just a config value in the app.&lt;/P&gt;
&lt;P data-start="265" data-end="424"&gt;The gateway pattern becomes necessary when Foundry stops being “an integration” and becomes “a shared platform”. That shift shows up in a few reliable signals:&lt;/P&gt;
&lt;UL&gt;
&lt;LI data-start="265" data-end="424"&gt;You do not fully control client code, or updating client configuration is riskier than updating a central routing configuration.&lt;/LI&gt;
&lt;LI data-start="265" data-end="424"&gt;You need blue-green rollouts for model versions or fine-tuned variants without forcing every client to redeploy.&lt;/LI&gt;
&lt;LI data-start="265" data-end="424"&gt;You need server-side retry and circuit breaking semantics to handle throttling and availability without duplicating logic across every app.&lt;/LI&gt;
&lt;LI data-start="265" data-end="424"&gt;You need consistent token governance and usage visibility across multiple apps and consumers.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;On Azure, this is commonly implemented with Azure API Management (APIM) using GenAI-aware “AI Gateway” capabilities, and it can be configured from the Foundry portal and applied per project.&lt;/P&gt;
&lt;H3 data-start="1306" data-end="1339"&gt;&lt;U&gt;What problems a gateway solves&lt;/U&gt;&lt;/H3&gt;
&lt;P data-start="1341" data-end="1505"&gt;A production gateway in front of Foundry is not about adding a hop. It is about centralizing cross-cutting concerns that otherwise get reimplemented inconsistently:&lt;/P&gt;
&lt;UL&gt;
&lt;LI data-start="1341" data-end="1505"&gt;&lt;STRONG data-start="1509" data-end="1531"&gt;Stable API surface&lt;/STRONG&gt; while deployments and backends evolve.&lt;/LI&gt;
&lt;LI data-start="1341" data-end="1505"&gt;&lt;STRONG data-start="1573" data-end="1604"&gt;Consistent auth termination&lt;/STRONG&gt; at the gateway, then reestablish trust from the gateway to the model backend (for example with Azure RBAC).&lt;/LI&gt;
&lt;LI data-start="1341" data-end="1505"&gt;&lt;STRONG data-start="1755" data-end="1792"&gt;Token-based throttling and quotas&lt;/STRONG&gt; for fairness and cost control across consumers.&lt;/LI&gt;
&lt;LI data-start="1341" data-end="1505"&gt;&lt;STRONG data-start="1883" data-end="1909"&gt;Operational resiliency&lt;/STRONG&gt; via backend pools, priority and weight routing, retry, and circuit breaker behavior that honors throttling signals like Retry-After.&lt;/LI&gt;
&lt;LI data-start="1341" data-end="1505"&gt;&lt;STRONG data-start="2087" data-end="2108"&gt;Unified telemetry&lt;/STRONG&gt; at the choke point, even when you have multiple underlying instances.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;&lt;U&gt;Decoupling clients from backend topology&lt;/U&gt;&lt;/H3&gt;
&lt;P&gt;One secondary but important effect of introducing a gateway is that it shifts backend-specific details out of application code. Clients call a stable API owned by your platform team, while routing, credentials, and failover semantics live behind that boundary. This does not make models interchangeable, and it does not eliminate platform dependencies. What it does is contain them. As backend topology evolves, whether that means new deployments, additional subscriptions, or additional regions, those changes become operational updates rather than coordinated application rewrites.&lt;/P&gt;
&lt;P&gt;In practice, this means your platform team owns the API contract and operational semantics, while backend providers remain an implementation detail behind that contract.&lt;/P&gt;
&lt;H3 data-start="2225" data-end="2251"&gt;&lt;U&gt;One simple mental model&lt;/U&gt;&lt;/H3&gt;
&lt;img /&gt;
&lt;H3 data-start="2733" data-end="2761"&gt;&lt;U&gt;Concrete gateway patterns&lt;/U&gt;&lt;/H3&gt;
&lt;H4 data-start="421" data-end="458"&gt;Choosing the right gateway pattern&lt;/H4&gt;
&lt;P data-start="460" data-end="560"&gt;The table below summarizes when each pattern is most appropriate, and what trade-offs it introduces.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-solid" border="1" style="width: 100%; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Pattern&lt;/th&gt;&lt;th&gt;Primary goal&lt;/th&gt;&lt;th&gt;Isolation level&lt;/th&gt;&lt;th&gt;Throughput scaling&lt;/th&gt;&lt;th&gt;Resiliency impact&lt;/th&gt;&lt;th&gt;Operational complexity&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Single Foundry, multi-deployment routing&lt;/td&gt;&lt;td&gt;Decouple clients from models and enable safe rollouts&lt;/td&gt;&lt;td&gt;Logical only (same resource boundary)&lt;/td&gt;&lt;td&gt;Limited to single resource quotas&lt;/td&gt;&lt;td&gt;Low to moderate (deployment-level)&lt;/td&gt;&lt;td&gt;Low&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Multi-resource, same region, same subscription&lt;/td&gt;&lt;td&gt;Security segmentation, reliability, backend pooling&lt;/td&gt;&lt;td&gt;Resource-level&lt;/td&gt;&lt;td&gt;Not increased for standard tier&lt;/td&gt;&lt;td&gt;Moderate (backend failover)&lt;/td&gt;&lt;td&gt;Medium&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Prioritized failover, spillover (PTU → standard)&lt;/td&gt;&lt;td&gt;Cost control and burst protection&lt;/td&gt;&lt;td&gt;Resource-level&lt;/td&gt;&lt;td&gt;Controlled spillover&lt;/td&gt;&lt;td&gt;High (explicit failover semantics)&lt;/td&gt;&lt;td&gt;Medium to high&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Multi-subscription, same region&lt;/td&gt;&lt;td&gt;Quota expansion, org boundaries, central AI service&lt;/td&gt;&lt;td&gt;Subscription-level&lt;/td&gt;&lt;td&gt;Scales with number of subscriptions&lt;/td&gt;&lt;td&gt;High&lt;/td&gt;&lt;td&gt;High&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Multi-region&lt;/td&gt;&lt;td&gt;Regional resilience, data residency, global access&lt;/td&gt;&lt;td&gt;Region-level&lt;/td&gt;&lt;td&gt;Region-bounded&lt;/td&gt;&lt;td&gt;Very high&lt;/td&gt;&lt;td&gt;High&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-start="1660" data-end="1687"&gt;&lt;STRONG data-start="1660" data-end="1687"&gt;How to read this table:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-start="1689" data-end="2058"&gt;
&lt;LI data-start="1689" data-end="1776"&gt;If your problem is &lt;STRONG data-start="1710" data-end="1751"&gt;model lifecycle and client decoupling&lt;/STRONG&gt;, start with Pattern 1.&lt;/LI&gt;
&lt;LI data-start="1777" data-end="1874"&gt;If your problem is &lt;STRONG data-start="1798" data-end="1830"&gt;reliability and segmentation&lt;/STRONG&gt;, Patterns 2 and 3 are the usual next step.&lt;/LI&gt;
&lt;LI data-start="1875" data-end="1965"&gt;If your problem is &lt;STRONG data-start="1896" data-end="1943"&gt;quota ceilings or organizational boundaries&lt;/STRONG&gt;, Pattern 4 appears.&lt;/LI&gt;
&lt;LI data-start="1966" data-end="2058"&gt;If your problem is &lt;STRONG data-start="1987" data-end="2026"&gt;regional resilience or global scale&lt;/STRONG&gt;, Pattern 5 becomes unavoidable.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="2763" data-end="2926"&gt;Below are the most common patterns that show up as startups move from “one app calling one deployment” to “multiple products, multiple teams, and production SLOs”.&lt;/P&gt;
&lt;H4 data-start="2928" data-end="2996"&gt;Pattern 1: Single Foundry resource with multi-deployment routing&lt;/H4&gt;
&lt;img /&gt;
&lt;P data-start="2998" data-end="3017"&gt;&lt;STRONG data-start="2998" data-end="3017"&gt;When you use it&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-start="3018" data-end="3231"&gt;
&lt;LI data-start="3018" data-end="3120"&gt;You run multiple model deployments under one Foundry resource and want to control routing centrally.&lt;/LI&gt;
&lt;LI data-start="3121" data-end="3231"&gt;You want safer rollouts (blue-green) without forcing client updates.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="3233" data-end="3251"&gt;&lt;STRONG data-start="3233" data-end="3251"&gt;What it solves&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-start="3252" data-end="3537"&gt;
&lt;LI data-start="3252" data-end="3308"&gt;Routing decisions move from clients to a single place.&lt;/LI&gt;
&lt;LI data-start="3309" data-end="3537"&gt;You can gradually shift traffic between deployments, but you still need safe deployment practices because changing “which model” can be a breaking change from the client’s perspective.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="3539" data-end="3565"&gt;&lt;STRONG data-start="3539" data-end="3565"&gt;Key operational detail&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-start="3566" data-end="3775"&gt;
&lt;LI data-start="3566" data-end="3775"&gt;Strongly consider &lt;STRONG data-start="3586" data-end="3632"&gt;credential termination and reestablishment&lt;/STRONG&gt;. Clients authenticate to the gateway. The gateway authenticates to the model backend via Azure RBAC.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4 data-start="3782" data-end="3852"&gt;Pattern 2: Multi-resource in the same region and same subscription&lt;/H4&gt;
&lt;img /&gt;
&lt;P data-start="3854" data-end="3873"&gt;&lt;STRONG data-start="3854" data-end="3873"&gt;When you use it&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-start="3874" data-end="4166"&gt;
&lt;LI data-start="3874" data-end="3963"&gt;You need &lt;STRONG data-start="3885" data-end="3910"&gt;security segmentation&lt;/STRONG&gt; boundaries (separate keys or Azure RBAC per client).&lt;/LI&gt;
&lt;LI data-start="3964" data-end="4006"&gt;You want an easier &lt;STRONG data-start="3985" data-end="3999"&gt;chargeback&lt;/STRONG&gt; model.&lt;/LI&gt;
&lt;LI data-start="4007" data-end="4166"&gt;You want failover for availability issues, operational mistakes, or pairing provisioned and standard for spillover.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="4168" data-end="4186"&gt;&lt;STRONG data-start="4168" data-end="4186"&gt;What it solves&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-start="4187" data-end="4413"&gt;
&lt;LI data-start="4187" data-end="4318"&gt;You can treat multiple backends as &lt;STRONG data-start="4224" data-end="4241"&gt;active-active&lt;/STRONG&gt; and load balance across instances.&lt;/LI&gt;
&lt;LI data-start="4319" data-end="4413"&gt;You centralize retry and circuit-breaker behavior.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="4415" data-end="4438"&gt;&lt;STRONG data-start="4415" data-end="4438"&gt;Critical constraint&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-start="4439" data-end="4651"&gt;
&lt;LI data-start="4439" data-end="4651"&gt;&lt;STRONG data-start="4441" data-end="4503"&gt;Standard quotas are subscription-level, not instance-level&lt;/STRONG&gt;. Load balancing across standard instances in the same subscription does not create additional throughput.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4 data-start="4658" data-end="4749"&gt;Pattern 3: Prioritized failover and planned spillover (PTU first, consumption fallback)&lt;/H4&gt;
&lt;img /&gt;
&lt;P data-start="4751" data-end="4882"&gt;This is the pattern you reach for when you want to maximize utilization of dedicated capacity and still survive bursts and outages.&lt;/P&gt;
&lt;P data-start="4884" data-end="5165"&gt;The AI Gateway workshop describes a “Prioritized PTU with Fallback Consumption” approach using APIM backend pools with &lt;STRONG data-start="5003" data-end="5040"&gt;priority and weight-based routing&lt;/STRONG&gt;, combined with &lt;STRONG data-start="5056" data-end="5081"&gt;circuit breaker rules&lt;/STRONG&gt; and retries for 429 and selected 503 cases.&lt;/P&gt;
&lt;P data-start="5167" data-end="5259"&gt;Concrete implementation details from the workshop that are worth copying into your playbook:&lt;/P&gt;
&lt;UL data-start="5261" data-end="5632"&gt;
&lt;LI data-start="5261" data-end="5316"&gt;Configure a &lt;STRONG data-start="5273" data-end="5289"&gt;backend pool&lt;/STRONG&gt; across multiple endpoints.&lt;/LI&gt;
&lt;LI data-start="5317" data-end="5451"&gt;Add a &lt;STRONG data-start="5325" data-end="5349"&gt;circuit breaker rule&lt;/STRONG&gt; that can trip on throttling (429) and accept Retry-After.&lt;/LI&gt;
&lt;LI data-start="5317" data-end="5451"&gt;Use APIM policy to authenticate with &lt;STRONG data-start="5491" data-end="5511"&gt;managed identity&lt;/STRONG&gt; and set the backend to the pool, then retry on 429 or specific 503 conditions.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="5634" data-end="5728"&gt;This moves “resiliency logic” out of every client and into one place you can test and iterate.&lt;/P&gt;
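&lt;P&gt;To show the routing decision in isolation, here is a small shell model of priority-based selection with circuit state. It is a sketch of the idea only, not APIM policy: the backend names, the "name:priority:state" encoding, and the pick_backend helper are all assumptions, and it assumes a lower priority number is tried first and that a breaker tripped by a 429 is represented as "open".&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Choose a backend: among backends whose circuit is closed,
# the one with the lowest priority number wins.
# Each argument is "name:priority:state", where state is closed or open.
pick_backend() {
  best_name=""
  best_prio=""
  for entry in "$@"; do
    name="${entry%%:*}"
    rest="${entry#*:}"
    prio="${rest%%:*}"
    state="${rest#*:}"
    # Skip backends whose circuit breaker is currently open.
    [ "$state" = "closed" ] || continue
    if [ -z "$best_prio" ] || [ "$prio" -lt "$best_prio" ]; then
      best_name="$name"
      best_prio="$prio"
    fi
  done
  echo "${best_name:-none}"
}

# Normal operation: the PTU deployment takes the traffic.
pick_backend "ptu:1:closed" "standard:2:closed"
# PTU returned 429 and its breaker is open: spill over to standard.
pick_backend "ptu:1:open" "standard:2:closed"&lt;/LI-CODE&gt;
&lt;P&gt;The first call selects ptu and the second selects standard. When every circuit is open the helper returns none, which is the case where the gateway has to surface the throttling response to the caller.&lt;/P&gt;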
&lt;H4 data-start="5735" data-end="5821"&gt;Pattern 4: Multi-subscription, same region (quota scaling and centralized service)&lt;/H4&gt;
&lt;img /&gt;
&lt;P data-start="5823" data-end="5842"&gt;&lt;STRONG data-start="5823" data-end="5842"&gt;When you use it&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-start="5843" data-end="6186"&gt;
&lt;LI data-start="5843" data-end="5980"&gt;You need more quota in &lt;STRONG data-start="5868" data-end="5880"&gt;standard&lt;/STRONG&gt; deployments but must constrain models to a single region.&lt;/LI&gt;
&lt;LI data-start="5843" data-end="5980"&gt;You are building a centralized “Microsoft Foundry as a service” model. Standard quota is subscription-bound, so capacity pooling often implies multiple subscriptions.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="6188" data-end="6252"&gt;&lt;STRONG data-start="6188" data-end="6252"&gt;Implementation tips from the Azure Architecture Center guide&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-start="6253" data-end="6631"&gt;
&lt;LI data-start="6253" data-end="6365"&gt;Prefer subscriptions backed by the same Microsoft Entra tenant for consistency in Azure RBAC and Azure Policy.&lt;/LI&gt;
&lt;LI data-start="6366" data-end="6422"&gt;Deploy the gateway in the same region as the backends.&lt;/LI&gt;
&lt;LI data-start="6423" data-end="6467"&gt;Consider a dedicated gateway subscription.&lt;/LI&gt;
&lt;LI data-start="6468" data-end="6631"&gt;Ensure private endpoints are reachable across subscriptions, including cross-subscription Private Link where supported.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4 data-start="6638" data-end="6665"&gt;Pattern 5: Multi-region&lt;/H4&gt;
&lt;img /&gt;
&lt;P data-start="6667" data-end="6686"&gt;&lt;STRONG data-start="6667" data-end="6686"&gt;When you use it&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-start="6687" data-end="6921"&gt;
&lt;LI data-start="6687" data-end="6772"&gt;You need a service availability failover strategy (for example cross-region pairs).&lt;/LI&gt;
&lt;LI data-start="6773" data-end="6827"&gt;You have data residency and compliance requirements.&lt;/LI&gt;
&lt;LI data-start="6828" data-end="6921"&gt;You face mixed model availability across regions.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The Azure Architecture Center guide calls out that for business-critical architectures that must survive a complete regional outage, a &lt;STRONG data-start="7058" data-end="7084"&gt;global unified gateway&lt;/STRONG&gt; helps eliminate failover logic from client code. It also notes the trade-offs of a single-region gateway deployment doing active-active load balancing across regions, including added latency and egress charges for cross-region calls.&lt;/P&gt;
&lt;H3 data-start="2225" data-end="2275"&gt;&lt;U&gt;Real-world scenarios this architecture supports&lt;/U&gt;&lt;/H3&gt;
&lt;P data-start="2277" data-end="2431"&gt;These are representative scenarios drawn from common production environments and directly supported by the gateway patterns and reference implementations.&lt;/P&gt;
&lt;H4 data-start="2433" data-end="2481"&gt;Scenario A: Containing a runaway application&lt;/H4&gt;
&lt;P data-start="2483" data-end="2656"&gt;A company has five internal applications sharing the same Foundry environment. One application ships a prompt regression that suddenly increases average request size eightfold.&lt;/P&gt;
&lt;P data-start="2658" data-end="2676"&gt;Without a gateway:&lt;/P&gt;
&lt;UL data-start="2677" data-end="2832"&gt;
&lt;LI data-start="2677" data-end="2713"&gt;Token consumption spikes globally.&lt;/LI&gt;
&lt;LI data-start="2714" data-end="2764"&gt;Other apps experience 429s and degraded latency.&lt;/LI&gt;
&lt;LI data-start="2765" data-end="2832"&gt;Root cause takes time to identify because telemetry is scattered.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="2834" data-end="2873"&gt;With an AI Gateway in front of Foundry:&lt;/P&gt;
&lt;UL data-start="2874" data-end="3102"&gt;
&lt;LI data-start="2874" data-end="2924"&gt;Token-based limits are enforced per application.&lt;/LI&gt;
&lt;LI data-start="2925" data-end="2970"&gt;The faulty app is throttled at the gateway.&lt;/LI&gt;
&lt;LI data-start="2971" data-end="3020"&gt;Other applications continue operating normally.&lt;/LI&gt;
&lt;LI data-start="3021" data-end="3102"&gt;The gateway telemetry immediately shows which consumer is exhausting the quota.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="3104" data-end="3112"&gt;Outcome:&lt;/P&gt;
&lt;UL data-start="3113" data-end="3215"&gt;
&lt;LI data-start="3113" data-end="3164"&gt;Incident blast radius is limited to one consumer.&lt;/LI&gt;
&lt;LI data-start="3165" data-end="3184"&gt;No global outage.&lt;/LI&gt;
&lt;LI data-start="3185" data-end="3215"&gt;Faster root cause isolation.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4 data-start="3222" data-end="3267"&gt;Scenario B: Zero-downtime model migration&lt;/H4&gt;
&lt;P data-start="3269" data-end="3348"&gt;A startup is migrating from one production deployment to a newer model version.&lt;/P&gt;
&lt;P data-start="3350" data-end="3427"&gt;They deploy the new model alongside the old one and configure the gateway to:&lt;/P&gt;
&lt;UL data-start="3428" data-end="3520"&gt;
&lt;LI data-start="3428" data-end="3479"&gt;Route 5 percent of traffic to the new deployment.&lt;/LI&gt;
&lt;LI data-start="3480" data-end="3520"&gt;Keep 95 percent on the old deployment.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="3522" data-end="3535"&gt;They observe:&lt;/P&gt;
&lt;UL data-start="3536" data-end="3576"&gt;
&lt;LI data-start="3536" data-end="3549"&gt;Error rate.&lt;/LI&gt;
&lt;LI data-start="3550" data-end="3560"&gt;Latency.&lt;/LI&gt;
&lt;LI data-start="3561" data-end="3576"&gt;Token growth.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="3578" data-end="3681"&gt;Over several days they progressively shift traffic to 100 percent without requiring any client changes.&lt;/P&gt;
&lt;P data-start="3683" data-end="3691"&gt;Outcome:&lt;/P&gt;
&lt;UL data-start="3692" data-end="3828"&gt;
&lt;LI data-start="3692" data-end="3718"&gt;No forced redeployments.&lt;/LI&gt;
&lt;LI data-start="3719" data-end="3752"&gt;No mass client reconfiguration.&lt;/LI&gt;
&lt;LI data-start="3753" data-end="3828"&gt;Rollback is a gateway configuration change, not an emergency code change.&lt;/LI&gt;
&lt;/UL&gt;
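&lt;P&gt;The progressive shift in Scenario B can be sketched as weighted random routing. This is a minimal illustration under stated assumptions, not the gateway's own routing mechanism: the deployment names model-v1 and model-v2 and both helper functions are hypothetical, and the weights simply mirror the 95/5 split described above.&lt;/P&gt;

```python
import random
from collections import Counter

# Hypothetical deployment names; weights mirror the 95/5 split in the scenario.
WEIGHTS = {"model-v1": 95, "model-v2": 5}


def pick_backend(weights, rng):
    """Pick one backend with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def shift_traffic(weights, old_name, new_name, new_share):
    """Return a new weight table sending new_share percent to the new deployment."""
    if not (100 >= new_share >= 0):
        raise ValueError("new_share must be between 0 and 100")
    updated = dict(weights)
    updated[new_name] = new_share
    updated[old_name] = 100 - new_share
    return updated


if __name__ == "__main__":
    rng = random.Random(42)
    counts = Counter(pick_backend(WEIGHTS, rng) for _ in range(10_000))
    print(counts)
    # Widening the canary, or rolling back, only changes the weight table.
    print(shift_traffic(WEIGHTS, "model-v1", "model-v2", 25))
    print(shift_traffic(WEIGHTS, "model-v1", "model-v2", 0))
```

&lt;P&gt;Because clients only ever see the gateway endpoint, both the ramp-up and the rollback in this sketch are changes to one table rather than to any client.&lt;/P&gt;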
&lt;H4 data-start="3835" data-end="3881"&gt;Scenario C: Cost-controlled burst handling&lt;/H4&gt;
&lt;P data-start="3883" data-end="3983"&gt;A product runs steady baseline traffic on provisioned capacity and experiences unpredictable spikes.&lt;/P&gt;
&lt;P data-start="3985" data-end="4007"&gt;Gateway configuration:&lt;/P&gt;
&lt;UL data-start="4008" data-end="4143"&gt;
&lt;LI data-start="4008" data-end="4032"&gt;Priority backend pool.&lt;/LI&gt;
&lt;LI data-start="4033" data-end="4069"&gt;Provisioned deployment as primary.&lt;/LI&gt;
&lt;LI data-start="4070" data-end="4105"&gt;Standard deployment as secondary.&lt;/LI&gt;
&lt;LI data-start="4106" data-end="4143"&gt;Circuit breaker honors Retry-After.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="4145" data-end="4162"&gt;Normal operation:&lt;/P&gt;
&lt;UL data-start="4163" data-end="4212"&gt;
&lt;LI data-start="4163" data-end="4212"&gt;Nearly all traffic hits provisioned throughput.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="4214" data-end="4228"&gt;During spikes:&lt;/P&gt;
&lt;UL data-start="4229" data-end="4322"&gt;
&lt;LI data-start="4229" data-end="4267"&gt;Overflow is routed to standard tier.&lt;/LI&gt;
&lt;LI data-start="4268" data-end="4322"&gt;The gateway absorbs throttling behavior and retries.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="4324" data-end="4332"&gt;Outcome:&lt;/P&gt;
&lt;UL data-start="4333" data-end="4470"&gt;
&lt;LI data-start="4333" data-end="4374"&gt;Provisioned capacity is fully utilized.&lt;/LI&gt;
&lt;LI data-start="4375" data-end="4418"&gt;Spikes are handled without hard failures.&lt;/LI&gt;
&lt;LI data-start="4419" data-end="4470"&gt;Clients are unaware that backend routing changed.&lt;/LI&gt;
&lt;/UL&gt;
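&lt;P&gt;The priority pool in Scenario C amounts to "use the best-ranked backend that is currently available". The sketch below shows only that selection rule. The Backend type, its field names, and the backend names are assumptions made for illustration, and marking a backend unavailable stands in for whatever throttling signal the gateway actually observes.&lt;/P&gt;

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str               # hypothetical deployment name
    priority: int           # lower number is tried first
    available: bool = True  # False while the backend is throttled


def select_backend(pool):
    """Return the available backend with the lowest priority number."""
    candidates = [b for b in pool if b.available]
    if not candidates:
        raise RuntimeError("no backend available")
    return min(candidates, key=lambda b: b.priority)


if __name__ == "__main__":
    provisioned = Backend("provisioned", priority=1)
    standard = Backend("standard", priority=2)
    pool = [provisioned, standard]

    print(select_backend(pool).name)  # normal operation
    provisioned.available = False     # provisioned capacity is exhausted
    print(select_backend(pool).name)  # overflow goes to the standard tier
```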
&lt;H4 data-start="4477" data-end="4519"&gt;Scenario D: Subscription quota pooling&lt;/H4&gt;
&lt;P data-start="4521" data-end="4599"&gt;An organization reaches standard tier quota ceilings in a single subscription.&lt;/P&gt;
&lt;P data-start="4601" data-end="4697"&gt;They deploy Foundry resources across multiple subscriptions and place a single gateway in front.&lt;/P&gt;
&lt;P data-start="4699" data-end="4716"&gt;Gateway behavior:&lt;/P&gt;
&lt;UL data-start="4717" data-end="4847"&gt;
&lt;LI data-start="4717" data-end="4761"&gt;Distributes requests across subscriptions.&lt;/LI&gt;
&lt;LI data-start="4762" data-end="4797"&gt;Applies unified token governance.&lt;/LI&gt;
&lt;LI data-start="4798" data-end="4847"&gt;Exposes one API endpoint to all internal teams.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="4849" data-end="4857"&gt;Outcome:&lt;/P&gt;
&lt;UL data-start="4858" data-end="4982"&gt;
&lt;LI data-start="4858" data-end="4893"&gt;Aggregate usable quota increases.&lt;/LI&gt;
&lt;LI data-start="4894" data-end="4936"&gt;Organizational boundaries are preserved.&lt;/LI&gt;
&lt;LI data-start="4937" data-end="4982"&gt;Clients remain unaware of backend topology.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 data-start="7365" data-end="7388"&gt;&lt;U&gt;Operational playbook&lt;/U&gt;&lt;/H3&gt;
&lt;P data-start="7390" data-end="7463"&gt;This is the part that separates “it works” from “it survives production”.&lt;/P&gt;
&lt;H4 data-start="7465" data-end="7495"&gt;1. Authentication strategy&lt;/H4&gt;
&lt;P data-start="7497" data-end="7520"&gt;&lt;STRONG data-start="7497" data-end="7520"&gt;Recommended default&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-start="7521" data-end="7708"&gt;
&lt;LI data-start="7521" data-end="7560"&gt;Terminate client auth at the gateway.&lt;/LI&gt;
&lt;LI data-start="7561" data-end="7708"&gt;Reestablish gateway-to-backend authorization via Azure RBAC rather than passing through client secrets.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The AI Gateway workshop provides a concrete example that uses the authentication-managed-identity policy to obtain a token and set the Authorization header for the backend call.&lt;/P&gt;
&lt;P data-start="7902" data-end="7915"&gt;&lt;STRONG data-start="7902" data-end="7915"&gt;Guardrail&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-start="7916" data-end="8070"&gt;
&lt;LI data-start="7916" data-end="8070"&gt;If you choose pass-through client credentials, ensure clients cannot bypass the gateway or model restrictions.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4 data-start="8077" data-end="8113"&gt;2. Token throttling and fairness&lt;/H4&gt;
&lt;P data-start="8115" data-end="8179"&gt;You want limits that match how LLMs consume capacity and budget.&lt;/P&gt;
&lt;UL data-start="8181" data-end="8483"&gt;
&lt;LI data-start="8181" data-end="8322"&gt;APIM GenAI capabilities emphasize &lt;STRONG data-start="8217" data-end="8244"&gt;controlled token limits&lt;/STRONG&gt; and monitoring for cost efficiency.&lt;/LI&gt;
&lt;LI data-start="8323" data-end="8483"&gt;Foundry AI Gateway governance scenarios explicitly include configuring token limits for models at the project level.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="8485" data-end="8581"&gt;Use token throttling as your primary fairness control, then layer request-rate limits if needed.&lt;/P&gt;
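&lt;P&gt;To make the idea of token-based fairness concrete, here is a minimal fixed-window sketch of a tokens-per-minute budget tracked per consumer. It is not the APIM policy itself: the TokenLimiter class, the consumer names, and the one-minute fixed window are assumptions chosen to keep the example short.&lt;/P&gt;

```python
class TokenLimiter:
    """Fixed-window tokens-per-minute budget, tracked per consumer."""

    def __init__(self, tokens_per_minute):
        self.tokens_per_minute = tokens_per_minute
        self.state = {}  # consumer -> (window index, tokens used in that window)

    def allow(self, consumer, tokens, now_seconds):
        """Return True and record usage if the request fits the consumer's budget."""
        window = int(now_seconds // 60)
        last_window, used = self.state.get(consumer, (window, 0))
        if last_window != window:
            used = 0  # a new minute started, so the budget resets
        if used + tokens > self.tokens_per_minute:
            self.state[consumer] = (window, used)
            return False  # a real gateway would answer 429 here
        self.state[consumer] = (window, used + tokens)
        return True


if __name__ == "__main__":
    limiter = TokenLimiter(tokens_per_minute=1000)
    print(limiter.allow("app-a", 800, now_seconds=0))   # fits the budget
    print(limiter.allow("app-a", 300, now_seconds=10))  # would exceed 1000
    print(limiter.allow("app-b", 300, now_seconds=10))  # separate budget
    print(limiter.allow("app-a", 300, now_seconds=61))  # next window
```

&lt;P&gt;Keying the budget by consumer is what contains a runaway application: one consumer exhausting its own window does not touch anyone else's.&lt;/P&gt;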
&lt;H4 data-start="8588" data-end="8613"&gt;3. Failover semantics&lt;/H4&gt;
&lt;P data-start="8615" data-end="8668"&gt;Two rules that prevent most “self-inflicted outages”:&lt;/P&gt;
&lt;UL data-start="8670" data-end="9016"&gt;
&lt;LI data-start="8670" data-end="8871"&gt;&lt;STRONG data-start="8672" data-end="8695"&gt;Honor Retry-After&lt;/STRONG&gt; from the backend when implementing failover and circuit breaker behavior. Do not continuously hit a throttled endpoint returning 429.&lt;/LI&gt;
&lt;LI data-start="8872" data-end="9016"&gt;Prefer gateway-side retry and circuit breaking so that retry logic is not duplicated in every client and there is a single place to test it.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="9018" data-end="9237"&gt;The workshop shows a pragmatic retry condition on 429 and selected 503, combined with backend pool routing and a circuit breaker that can trip on 429 while checking Retry-After.&lt;/P&gt;
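&lt;P&gt;The "honor Retry-After" rule can be sketched as a small circuit breaker that keeps a backend out of rotation until the advertised interval has passed. This is an illustration only, not the workshop's policy configuration: the RetryAfterBreaker class is hypothetical, the header is assumed to carry a number of seconds (the HTTP-date form is not handled), and the one-second fallback when the header is missing is an arbitrary choice.&lt;/P&gt;

```python
class RetryAfterBreaker:
    """Skips a backend until the Retry-After interval from a 429 has elapsed."""

    def __init__(self):
        self.open_until = 0.0  # timestamp before which the backend is skipped

    def record_response(self, status, headers, now):
        """Trip the breaker on 429, using Retry-After (in seconds) when present."""
        if status == 429:
            # Assumption: fall back to 1 second if the header is missing.
            retry_after = float(headers.get("Retry-After", 1))
            self.open_until = max(self.open_until, now + retry_after)

    def is_available(self, now):
        """True once the backend may be called again."""
        return now >= self.open_until


if __name__ == "__main__":
    breaker = RetryAfterBreaker()
    breaker.record_response(429, {"Retry-After": "10"}, now=100.0)
    print(breaker.is_available(now=105.0))  # still inside the interval
    print(breaker.is_available(now=110.0))  # interval has elapsed
```

&lt;P&gt;Passing the current time in explicitly keeps the sketch easy to test. A router would consult is_available before sending traffic and move on to the next backend in the pool while the breaker is open.&lt;/P&gt;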
&lt;H4 data-start="9244" data-end="9289"&gt;4. Observability and consumption tracking&lt;/H4&gt;
&lt;P data-start="9291" data-end="9486"&gt;A gateway is uniquely positioned to publish telemetry across all consumed models to a single store, which makes unified dashboarding and alerting easier.&lt;/P&gt;
&lt;P data-start="9488" data-end="9777"&gt;APIM’s GenAI positioning highlights token monitoring as part of “cost efficiency”. &lt;BR data-start="9610" data-end="9613" /&gt;The workshop navigation includes model monitoring and consumption tracking as first-class steps in the AI Gateway journey.&lt;/P&gt;
&lt;P data-start="9779" data-end="9941"&gt;Operationally, decide up front which dimensions you will break telemetry down by (project, tenant, application, environment) and enforce those identifiers at the gateway.&lt;/P&gt;
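&lt;P&gt;One way to picture enforcing identifiers at the gateway is to reject any usage record that lacks a required dimension before it is aggregated. The sketch below assumes the four dimensions named above. The record_usage function and the sample values are hypothetical, and an in-memory dictionary stands in for a real telemetry store.&lt;/P&gt;

```python
from collections import defaultdict

# Assumed dimension set, taken from the list in the text above.
REQUIRED_DIMENSIONS = ("project", "tenant", "application", "environment")


def record_usage(totals, dimensions, total_tokens):
    """Add token usage to the running total for this combination of dimensions."""
    missing = [d for d in REQUIRED_DIMENSIONS if not dimensions.get(d)]
    if missing:
        raise ValueError("missing telemetry dimensions: " + ", ".join(missing))
    key = tuple(dimensions[d] for d in REQUIRED_DIMENSIONS)
    totals[key] += total_tokens


if __name__ == "__main__":
    totals = defaultdict(int)
    dims = {
        "project": "search",
        "tenant": "contoso",
        "application": "chat-ui",
        "environment": "prod",
    }
    record_usage(totals, dims, 1200)
    record_usage(totals, dims, 300)
    for key, tokens in totals.items():
        print(key, tokens)
```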
&lt;H4 data-start="9948" data-end="9998"&gt;5. APIOps: Treat gateway configuration as code&lt;/H4&gt;
&lt;P data-start="10000" data-end="10093"&gt;Even if you configure the first version in the portal, production systems need repeatability:&lt;/P&gt;
&lt;UL data-start="10095" data-end="10576"&gt;
&lt;LI data-start="10095" data-end="10248"&gt;Use a code-driven workflow for policies and configuration so routing and governance changes are reviewed and promoted like any other production change.&lt;/LI&gt;
&lt;LI data-start="10249" data-end="10421"&gt;If you adopt a federated model, APIM Workspaces are positioned to help organizations manage APIs more productively and securely.&lt;/LI&gt;
&lt;LI data-start="10422" data-end="10576"&gt;Keep an eye on the APIM changelog and GenAI feature updates because gateway capabilities are evolving quickly.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 data-start="10583" data-end="10611"&gt;&lt;U&gt;When not to add a gateway&lt;/U&gt;&lt;/H3&gt;
&lt;P data-start="10613" data-end="10874"&gt;The Architecture Center guide is explicit: if controlling client configuration is as easy as controlling gateway routing, the added reliability, security, cost, maintenance, and performance impact of a gateway might not be worth it.&lt;/P&gt;
&lt;P data-start="10876" data-end="11169"&gt;Also, if you are using a single instance with multiple deployments primarily to simulate identity segmentation, you might be better served by multiple instances with distinct Azure RBAC boundaries instead of pushing that complexity into gateway logic.&lt;/P&gt;
&lt;H3 data-start="11176" data-end="11194"&gt;&lt;U&gt;Closing thought&lt;/U&gt;&lt;/H3&gt;
&lt;P data-start="11196" data-end="11276"&gt;A gateway is not a prerequisite for Foundry. It is an operational maturity step.&lt;/P&gt;
&lt;P data-start="11278" data-end="11593"&gt;When Foundry usage becomes multi-tenant, SLO-driven, and quota-sensitive, the gateway stops being “extra architecture” and becomes the place you express your platform intent. Auth boundaries. Token governance. Failover semantics. Telemetry. And a repeatable APIOps process to keep it all sane as the system evolves.&lt;/P&gt;
&lt;H3 data-start="11278" data-end="11593"&gt;&lt;U&gt;References&lt;/U&gt;&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/azure-openai-gateway-multi-backend" target="_blank" rel="noopener" data-lia-auto-title-active="0" data-lia-auto-title="Use a gateway in front of multiple Azure OpenAI deployments or instances - Azure Architecture Center"&gt;Use a gateway in front of multiple Azure OpenAI deployments or instances&lt;/A&gt;&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/ai-foundry/configuration/enable-ai-api-management-gateway-portal" target="_blank" rel="noopener" data-lia-auto-title-active="0" data-lia-auto-title="Configure AI Gateway in your Foundry resources - Microsoft Foundry"&gt;Configure AI Gateway in your Foundry resources &lt;/A&gt;&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/api-management/genai-gateway-capabilities" target="_blank" rel="noopener" data-lia-auto-title-active="0" data-lia-auto-title="AI gateway in Azure API Management"&gt;AI gateway in Azure API Management&lt;/A&gt;&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://azure.github.io/api-management-resources/#:~:text=,Enhanced%20Governance%20with%20runtime%20policies" target="_blank" rel="noopener" data-lia-auto-title-active="0" data-lia-auto-title="Azure API Management - apimlove"&gt;Azure API Management &lt;/A&gt;&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://azure-samples.github.io/AI-Gateway/docs/azure-openai/dynamic-failover" target="_blank" rel="noopener" data-lia-auto-title-active="0" data-lia-auto-title="Ensure resiliency and optimized resource consumption with load balancer &amp;amp; circuit breaker | AI Gateway workshop"&gt;Ensure resiliency and optimized resource consumption with load balancer &amp;amp; circuit breaker&lt;/A&gt;&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://azure-samples.github.io/AI-Gateway/docs/azure-openai/rate-limit" target="_blank" rel="noopener" data-lia-auto-title-active="0" data-lia-auto-title="Control cost and performance with token quotas and limits | AI Gateway workshop"&gt;Control cost and performance with token quotas and limits &lt;/A&gt;&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://azure-samples.github.io/AI-Gateway/docs/azure-openai/track-consumption" target="_blank" rel="noopener" data-lia-auto-title-active="0" data-lia-auto-title="Keep visibility into AI consumption with model monitoring | AI Gateway workshop"&gt;Keep visibility into AI consumption with model monitoring &lt;/A&gt;&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://github.com/Azure-Samples/AI-Gateway" target="_blank" rel="noopener" data-lia-auto-title-active="0" data-lia-auto-title="GitHub - Azure-Samples/AI-Gateway: Labs to explore AI Models, MCP servers, and Agents with the AI Gateway powered by Azure API Management and Microsoft Foundry 🚀"&gt;GitHub - Azure-Samples/AI-Gateway&lt;/A&gt;&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/access-controlling" target="_blank" rel="noopener" data-lia-auto-title-active="0" data-lia-auto-title="AI-Gateway/labs/access-controlling at main · Azure-Samples/AI-Gateway"&gt;AI-Gateway/labs/access-controlling &lt;/A&gt;&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/function-calling" target="_blank" rel="noopener" data-lia-auto-title-active="0" data-lia-auto-title="AI-Gateway/labs/function-calling at main · Azure-Samples/AI-Gateway"&gt;AI-Gateway/labs/function-calling &lt;/A&gt;&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/model-context-protocol" target="_blank" rel="noopener" data-lia-auto-title-active="0" data-lia-auto-title="AI-Gateway/labs/model-context-protocol at main · Azure-Samples/AI-Gateway"&gt;AI-Gateway/labs/model-context-protocol &lt;/A&gt;&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/openai-agents" target="_blank" rel="noopener" data-lia-auto-title-active="0" data-lia-auto-title="AI-Gateway/labs/openai-agents at main · Azure-Samples/AI-Gateway"&gt;AI-Gateway/labs/openai-agents &lt;/A&gt;&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/ai-agent-service" target="_blank" rel="noopener" data-lia-auto-title-active="0" data-lia-auto-title="AI-Gateway/labs/ai-agent-service at main · Azure-Samples/AI-Gateway"&gt;AI-Gateway/labs/ai-agent-service &lt;/A&gt;&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/semantic-caching" target="_blank" rel="noopener" data-lia-auto-title-active="0" data-lia-auto-title="AI-Gateway/labs/semantic-caching at main · Azure-Samples/AI-Gateway"&gt;AI-Gateway/labs/semantic-caching &lt;/A&gt;&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/finops-framework" target="_blank" rel="noopener" data-lia-auto-title-active="0" data-lia-auto-title="AI-Gateway/labs/finops-framework at main · Azure-Samples/AI-Gateway"&gt;AI-Gateway/labs/finops-framework&lt;/A&gt;&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/slm-self-hosting" target="_blank" rel="noopener" data-lia-auto-title-active="0" data-lia-auto-title="AI-Gateway/labs/slm-self-hosting at main · Azure-Samples/AI-Gateway"&gt;AI-Gateway/labs/slm-self-hosting &lt;/A&gt;&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/ai-foundry-deepseek" target="_blank" rel="noopener" data-lia-auto-title-active="0" data-lia-auto-title="AI-Gateway/labs/ai-foundry-deepseek at main · Azure-Samples/AI-Gateway"&gt;AI-Gateway/labs/ai-foundry-deepseek &lt;/A&gt;&amp;nbsp;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 29 Jan 2026 19:35:08 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/startups-at-microsoft/production-grade-api-gateway-patterns-for-microsoft-foundry/ba-p/4490494</guid>
      <dc:creator>rmmartins</dc:creator>
      <dc:date>2026-01-29T19:35:08Z</dc:date>
    </item>
    <item>
      <title>When and why startups add a gateway in front of Microsoft Foundry</title>
      <link>https://techcommunity.microsoft.com/t5/startups-at-microsoft/when-and-why-startups-add-a-gateway-in-front-of-microsoft/ba-p/4489490</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;Note: This post focuses on when and why startups begin adopting a gateway in front of Microsoft Foundry. In a follow-up article, we’ll go into a technical deep dive, covering design decisions, operational tradeoffs, latency considerations, observability, and patterns used in production-scale environments.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P data-start="2865" data-end="2938"&gt;Most teams don’t hit scaling challenges with Microsoft Foundry on day one.&lt;/P&gt;
&lt;P data-start="2940" data-end="3134"&gt;Early on, things are simple. One or two applications call Foundry directly. Traffic is predictable. Model experimentation moves fast. Everything works, and there’s no reason to add extra layers.&lt;/P&gt;
&lt;P data-start="3136" data-end="3156"&gt;Then adoption grows. More applications start calling the same models. Traffic becomes spiky. Teams want better visibility into usage. Questions about rate limits, authentication, and how to evolve models over time begin to surface.&lt;/P&gt;
&lt;P data-start="3370" data-end="3488"&gt;This is usually the moment when teams start asking: &lt;STRONG data-start="3424" data-end="3488"&gt;“Do we need some kind of control layer in front of Foundry?”&lt;/STRONG&gt;&lt;/P&gt;
&lt;H2 data-start="3490" data-end="3526"&gt;The signals that start to show up&lt;/H2&gt;
&lt;P data-start="3528" data-end="3607"&gt;Across many startups, the same patterns tend to emerge as Foundry usage scales:&lt;/P&gt;
&lt;UL data-start="3609" data-end="3876"&gt;
&lt;LI data-start="3609" data-end="3677"&gt;Multiple clients and services calling the same Foundry endpoints&lt;/LI&gt;
&lt;LI data-start="3678" data-end="3738"&gt;The need for consistent rate limiting and access control&lt;/LI&gt;
&lt;LI data-start="3739" data-end="3813"&gt;A desire to evolve models or deployments without touching every client&lt;/LI&gt;
&lt;LI data-start="3814" data-end="3876"&gt;Limited visibility into who is calling what, and how often&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="3878" data-end="3971"&gt;None of these are problems at small scale. But together, they create friction as usage grows.&lt;/P&gt;
&lt;H2 data-start="3973" data-end="4011"&gt;A pattern we often see working well&lt;/H2&gt;
&lt;P data-start="4013" data-end="4103"&gt;A common pattern at this stage is placing a &lt;STRONG data-start="4057" data-end="4102"&gt;gateway in front of Microsoft Foundry APIs&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Figure: Client applications call a single gateway endpoint, where policies such as authentication, rate limits, and routing are applied before requests are forwarded to Foundry model deployments.&lt;/EM&gt;&lt;/P&gt;
&lt;P data-start="4105" data-end="4324"&gt;Rather than having every application talk directly to Foundry, teams introduce a control layer that sits between clients and Foundry.&lt;/P&gt;
&lt;P data-start="4105" data-end="4324"&gt;On Azure, this is often implemented using &lt;STRONG data-start="4281" data-end="4323"&gt;API Management with GenAI capabilities&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P data-start="4326" data-end="4470"&gt;This gateway does not replace Foundry. Foundry remains the model and AI platform. The gateway simply becomes the entry point for client traffic.&lt;/P&gt;
&lt;H2 data-start="4472" data-end="4504"&gt;What this enables in practice&lt;/H2&gt;
&lt;P data-start="4506" data-end="4576"&gt;When teams introduce a gateway layer, a few things become much easier:&lt;/P&gt;
&lt;UL data-start="4578" data-end="4914"&gt;
&lt;LI data-start="4578" data-end="4669"&gt;&lt;STRONG data-start="4580" data-end="4612"&gt;A single, stable API surface&lt;/STRONG&gt; for applications, even as models or deployments evolve&lt;/LI&gt;
&lt;LI data-start="4670" data-end="4748"&gt;&lt;STRONG data-start="4672" data-end="4717"&gt;Centralized throttling and authentication&lt;/STRONG&gt;, instead of per-client logic&lt;/LI&gt;
&lt;LI data-start="4749" data-end="4828"&gt;&lt;STRONG data-start="4751" data-end="4775"&gt;Policy-based routing&lt;/STRONG&gt; across models or backends without changing clients&lt;/LI&gt;
&lt;LI data-start="4829" data-end="4914"&gt;&lt;STRONG data-start="4831" data-end="4871"&gt;Improved request-level observability&lt;/STRONG&gt; into usage patterns, latency, and errors&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="4916" data-end="5077"&gt;Importantly, this structure lets teams scale without slowing down experimentation. Model teams can continue to iterate, while platform concerns stay centralized.&lt;/P&gt;
&lt;H2 data-start="5079" data-end="5106"&gt;What this pattern is not&lt;/H2&gt;
&lt;P data-start="5108" data-end="5159"&gt;It’s worth calling out what this approach is &lt;EM data-start="5153" data-end="5158"&gt;not&lt;/EM&gt;:&lt;/P&gt;
&lt;UL data-start="5161" data-end="5289"&gt;
&lt;LI data-start="5161" data-end="5197"&gt;It’s &lt;STRONG data-start="5168" data-end="5195"&gt;not required on day one&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-start="5198" data-end="5242"&gt;It’s &lt;STRONG data-start="5205" data-end="5240"&gt;not mandatory for every startup&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-start="5243" data-end="5289"&gt;It’s &lt;STRONG data-start="5250" data-end="5287"&gt;not about adding complexity early&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="5291" data-end="5468"&gt;Many teams run successfully without a gateway for a long time. This pattern becomes useful when scale, team size, or operational needs make direct integrations harder to manage.&lt;/P&gt;
&lt;H2 data-start="5470" data-end="5505"&gt;When teams usually consider this&lt;/H2&gt;
&lt;P data-start="5507" data-end="5564"&gt;From experience, teams tend to explore this pattern when:&lt;/P&gt;
&lt;UL data-start="5566" data-end="5794"&gt;
&lt;LI data-start="5566" data-end="5620"&gt;Foundry usage spans multiple applications or teams&lt;/LI&gt;
&lt;LI data-start="5621" data-end="5675"&gt;Rate limits and quotas need consistent enforcement&lt;/LI&gt;
&lt;LI data-start="5676" data-end="5740"&gt;There’s a desire to future-proof model or deployment changes&lt;/LI&gt;
&lt;LI data-start="5741" data-end="5794"&gt;Observability and governance start to matter more&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="5796" data-end="5895"&gt;If those conversations are already happening, it’s often a good time to look at a gateway approach.&lt;/P&gt;
&lt;H2 data-start="5897" data-end="5923"&gt;How this looks on Azure&lt;/H2&gt;
&lt;P data-start="5925" data-end="5978"&gt;On Azure, this pattern is commonly implemented using:&lt;/P&gt;
&lt;UL data-start="5980" data-end="6147"&gt;
&lt;LI data-start="5980" data-end="6023"&gt;&lt;STRONG data-start="5982" data-end="6006"&gt;Azure API Management&lt;/STRONG&gt; as the gateway&lt;/LI&gt;
&lt;LI data-start="6024" data-end="6092"&gt;&lt;STRONG data-start="6026" data-end="6047"&gt;AI-aware policies&lt;/STRONG&gt; for rate limiting, routing, and governance&lt;/LI&gt;
&lt;LI data-start="6093" data-end="6147"&gt;&lt;STRONG data-start="6095" data-end="6115"&gt;Microsoft Foundry&lt;/STRONG&gt; as the backend model platform&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="6149" data-end="6252"&gt;The architecture stays flexible. Teams can start simple and add capabilities over time as needs evolve.&lt;/P&gt;
&lt;H2 data-start="6254" data-end="6273"&gt;Closing thoughts&lt;/H2&gt;
&lt;P data-start="6275" data-end="6332"&gt;This pattern is less about tooling and more about timing.&lt;/P&gt;
&lt;P data-start="6334" data-end="6543"&gt;Adding a gateway too early can slow teams down. Adding it too late can make change painful. The right moment is usually when Foundry usage starts to feel like a shared platform rather than a single experiment.&lt;/P&gt;
&lt;P data-start="6545" data-end="6637"&gt;For teams approaching that stage, a gateway can provide structure without taking away speed.&lt;/P&gt;
&lt;H2 data-start="6639" data-end="6652"&gt;References&lt;/H2&gt;
&lt;UL data-start="6654" data-end="7120"&gt;
&lt;LI data-start="6654" data-end="6830"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/ai-foundry/configuration/enable-ai-api-management-gateway-portal?view=foundry" target="_blank" rel="noopener"&gt;Enable API Management gateway for Microsoft Foundry&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-start="6654" data-end="6830"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/api-management/genai-gateway-capabilities" target="_blank" rel="noopener"&gt;GenAI gateway capabilities in API Management&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-start="6968" data-end="7120"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/azure-openai-gateway-multi-backend" target="_blank" rel="noopener"&gt;Gateway patterns for multi-backend AI setups&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Tue, 27 Jan 2026 03:24:31 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/startups-at-microsoft/when-and-why-startups-add-a-gateway-in-front-of-microsoft/ba-p/4489490</guid>
      <dc:creator>rmmartins</dc:creator>
      <dc:date>2026-01-27T03:24:31Z</dc:date>
    </item>
    <item>
      <title>Founders Hub billing issue - stuck between Microsoft and ISV with no resolution path</title>
      <link>https://techcommunity.microsoft.com/t5/startups-at-microsoft/founders-hub-billing-issue-stuck-between-microsoft-and-isv-with/m-p/4486084#M110</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm reaching out to the community because I've hit a wall with standard support channels and hoping someone from the Founders Hub team or other founders who've faced similar issues can help.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;QUICK SUMMARY&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I'm a Founders Hub participant who got charged ~€1,000 for using Claude (Anthropic) models through Azure AI Foundry. I assumed these were covered by my $25k Sponsorship credits since:&lt;/P&gt;&lt;P&gt;• I used ai.azure.com (not a separate marketplace)&lt;/P&gt;&lt;P&gt;• There was no clear warning about separate billing&lt;/P&gt;&lt;P&gt;• There's no way to monitor Marketplace spending in Founders Hub&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;THE PROBLEM:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;I'm stuck in a loop:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;• Microsoft says "contact the ISV for refund approval"&lt;/P&gt;&lt;P&gt;• Anthropic says "billing is handled by Microsoft, not us"&lt;/P&gt;&lt;P&gt;• The "Support" link Microsoft points to redirects back to Microsoft Support&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've sent detailed emails explaining this circular situation. Support keeps responding with the same copy-paste policy text, ignoring my request for escalation.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;MY ASK&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;1. Is there a dedicated support channel for Founders Hub billing issues?&lt;/P&gt;&lt;P&gt;2. Has anyone from the Founders Hub team dealt with similar Marketplace confusion before?&lt;/P&gt;&lt;P&gt;3. Any founders here who successfully resolved unexpected Marketplace charges?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm not trying to get something for free - I already paid the first invoice (€460). 
I just want help with the second invoice that surprised me a month later.&lt;/P&gt;&lt;P&gt;The Founders Hub program has been great otherwise, but this experience has been really frustrating. Any pointers would be hugely appreciated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;Bartek&lt;/P&gt;</description>
      <pubDate>Thu, 15 Jan 2026 19:39:05 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/startups-at-microsoft/founders-hub-billing-issue-stuck-between-microsoft-and-isv-with/m-p/4486084#M110</guid>
      <dc:creator>Playpals</dc:creator>
      <dc:date>2026-01-15T19:39:05Z</dc:date>
    </item>
    <item>
      <title>Azure has three permission systems, and you're probably confusing them</title>
      <link>https://techcommunity.microsoft.com/t5/startups-at-microsoft/azure-has-three-permission-systems-and-you-re-probably-confusing/ba-p/4471854</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;Series: Azure Governance for Digital Natives and Startups:&amp;nbsp;&lt;/STRONG&gt;This is&amp;nbsp;&lt;STRONG&gt;Part 1&lt;/STRONG&gt;&amp;nbsp;of a 3-part series on Azure governance for digital-native companies scaling on Azure.&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Part 1&lt;/STRONG&gt; (this post): The three-plane model: Identity, Resources, and Billing&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Part 2&lt;/STRONG&gt;: &lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/marketplace-governance-and-the-cross-plane-bridge/4510067" target="_blank" rel="noopener" data-lia-auto-title="Marketplace, Managed Identity, and where the planes collide&amp;nbsp;" data-lia-auto-title-active="0"&gt;Marketplace, Managed Identity, and where the planes collide&amp;nbsp;&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Part 3&lt;/STRONG&gt;: &lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/role-structures-anti-patterns-and-the-10-governance-principles/4510070" target="_blank" rel="noopener" data-lia-auto-title="Anti-patterns, role structures, and the 10 principles of Azure governance" data-lia-auto-title-active="0"&gt;Anti-patterns, role structures, and the 10 principles of Azure governance&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Azure is a powerful cloud platform, but its governance model is widely misunderstood, especially in fast-moving, engineering-led organizations.&lt;/P&gt;
&lt;P&gt;After working with dozens of digital-native customers (AI startups, SaaS platforms, companies scaling from zero to millions in Azure spend), I've seen the same confusion play out over and over. Engineers can't see MACC credits. Finance can't see workloads. Global Admins think they own everything. And Marketplace purchases happen without anyone in Finance knowing.&lt;/P&gt;
&lt;P&gt;The root cause is always the same:&amp;nbsp;&lt;STRONG&gt;Azure is governed by three completely separate permission systems&lt;/STRONG&gt;, and most teams treat it like one.&lt;/P&gt;
&lt;P&gt;If you're a customer moving fast on Azure, you've likely heard these questions:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;"Why can't my engineering Owner see MACC credits?"&lt;/LI&gt;
&lt;LI&gt;"Why can't a Billing Contributor deploy a VM?"&lt;/LI&gt;
&lt;LI&gt;"Why doesn't Global Admin let me access subscriptions?"&lt;/LI&gt;
&lt;LI&gt;"Why can a Contributor deploy AKS but not buy Snowflake?"&lt;/LI&gt;
&lt;LI&gt;"Why does Cost Management Reader show cost but not credit balance?"&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;These questions appear in nearly every customer I work with: AI companies consuming Azure OpenAI at scale, SaaS companies running global AKS footprints, and digital natives under Microsoft Azure Consumption Commitments (MACC).&lt;/P&gt;
&lt;P&gt;This guide breaks down the full model, plane by plane, with practical patterns for each, so these questions stop causing confusion.&lt;/P&gt;
&lt;H2&gt;Why digital natives struggle with this&lt;/H2&gt;
&lt;P&gt;Before diving into the technical model, it's worth understanding&amp;nbsp;&lt;EM&gt;why&lt;/EM&gt;&amp;nbsp;this causes so much friction in digital-native companies specifically. These problems hit startups and scaling companies harder than traditional enterprises for three reasons:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Speed over governance.&lt;/STRONG&gt;&amp;nbsp;Engineering-led companies prioritize shipping over process. Governance is added retroactively, often after something goes wrong.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Flat org structures.&lt;/STRONG&gt; Without clear Platform, Finance, and Security functions, the same people end up holding roles across multiple planes, creating exactly the kind of role sprawl the three-plane model is designed to prevent.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;MACC commitments.&lt;/STRONG&gt;&amp;nbsp;Digital natives under MACC have a financial relationship with Azure that most team members don't even know exists. When engineers can't see MACC burn and finance can't see resource usage, nobody has the full picture.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;The result is predictable:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Role&lt;/th&gt;&lt;th&gt;What They Expect&lt;/th&gt;&lt;th&gt;What They Actually Get&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Engineers&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;"I'm Owner, I should see everything, including billing"&lt;/td&gt;&lt;td&gt;RBAC gives full resource control but zero billing visibility&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Finance&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;"I need to see what's running so I can forecast"&lt;/td&gt;&lt;td&gt;Billing Reader shows credits and invoices but not workloads&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Security&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;"I'm Global Admin, I have total control"&lt;/td&gt;&lt;td&gt;Entra controls identity but not resources or billing&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Procurement&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;"I need to buy Marketplace software for the team"&lt;/td&gt;&lt;td&gt;Marketplace purchases require billing roles, not RBAC&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Leadership&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;"I want a single dashboard for cost, resources, and credits"&lt;/td&gt;&lt;td&gt;No single role spans all three planes; you need a combination&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;When these expectations go unaddressed: engineers get billing access "just to see costs" (creating financial risk), Marketplace purchases happen without finance oversight, and Global Admin is treated as the "master key" when it controls only one of three planes.&lt;/P&gt;
&lt;P&gt;The fix isn't more permissions. It's&amp;nbsp;&lt;STRONG&gt;the right permissions in the right plane for the right people&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2&gt;The three-plane model&lt;/H2&gt;
&lt;P&gt;Everything in Azure governance flows from this single truth:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Plane&lt;/th&gt;&lt;th&gt;Controls&lt;/th&gt;&lt;th&gt;Example Roles&lt;/th&gt;&lt;th&gt;See Billing?&lt;/th&gt;&lt;th&gt;Deploy Resources?&lt;/th&gt;&lt;th&gt;Manage Identity?&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Microsoft Entra&lt;/STRONG&gt;&amp;nbsp;(Identity)&lt;/td&gt;&lt;td&gt;Users, groups, MFA, PIM, Conditional Access&lt;/td&gt;&lt;td&gt;Global Admin, Groups Admin, PIM Admin&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Azure RBAC&lt;/STRONG&gt;&amp;nbsp;(Resources)&lt;/td&gt;&lt;td&gt;VMs, AKS, Storage, AOAI, networking, policies&lt;/td&gt;&lt;td&gt;Owner, Contributor, Reader&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Billing / Commerce&lt;/STRONG&gt;&amp;nbsp;(Financial)&lt;/td&gt;&lt;td&gt;Invoices, credits, MACC, payments, Marketplace purchases&lt;/td&gt;&lt;td&gt;Billing Owner, Billing Reader&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Three planes. Zero overlap. A role in one plane grants&amp;nbsp;&lt;STRONG&gt;zero&lt;/STRONG&gt;&amp;nbsp;access in the others.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Entra Global Admin can't access subscriptions.&lt;/LI&gt;
&lt;LI&gt;Subscription Owner can't see the MACC balance.&lt;/LI&gt;
&lt;LI&gt;Billing Account Owner can't deploy resources.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This separation is by design. Once your company internalizes it, governance becomes dramatically more predictable.&lt;/P&gt;
&lt;H2&gt;Plane 1: Microsoft Entra (Identity Plane)&lt;/H2&gt;
&lt;P&gt;&lt;EM&gt;Security, authentication, authorization, administrative boundaries.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Microsoft Entra (formerly Azure AD) is the authoritative identity provider for Azure. It governs identity, authentication, Conditional Access, PIM (Privileged Identity Management), group membership, and tenant-wide administrative policies. Entra is the security boundary for the entire tenant.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;💡 Common misunderstanding:&lt;/STRONG&gt;&amp;nbsp;&lt;EM&gt;"I'm Global Admin, why can't I access subscriptions?"&lt;/EM&gt;&lt;BR /&gt;&lt;BR /&gt;Because Entra roles do&amp;nbsp;&lt;STRONG&gt;not&lt;/STRONG&gt;&amp;nbsp;grant Azure RBAC permissions by default. This behavior is intentional and foundational. A compromised Global Admin does not automatically hold rights over every subscription: gaining them requires an explicit elevate-access step, which grants User Access Administrator at root scope. A compromised Subscription Owner cannot change directory security settings. Identity and infrastructure operate independently for resiliency.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
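&lt;P&gt;The "by default" qualifier matters. Azure documents an elevate-access feature that lets a Global Administrator explicitly grant themselves User Access Administrator at root scope, so it is worth checking periodically who holds that assignment. The commands below are a minimal sketch, assuming you are signed in with az login and allowed to read root-scope assignments; admin@contoso.com is a placeholder, not a real account.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# List who currently holds User Access Administrator at root scope
az role assignment list --role "User Access Administrator" --scope "/" --output table

# Elevate the signed-in Global Administrator (explicit opt-in; never granted by default)
az rest --method post --url "/providers/Microsoft.Authorization/elevateAccess?api-version=2016-07-01"

# Remove the elevated assignment again once the task is done (placeholder user)
az role assignment delete --assignee "admin@contoso.com" --role "User Access Administrator" --scope "/"&lt;/LI-CODE&gt;
&lt;P&gt;Treating elevation as a temporary, reviewed action keeps the identity and resource planes separate in practice, not just on paper.&lt;/P&gt;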
&lt;H3&gt;What Entra roles can do&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;Create and manage users&lt;/LI&gt;
&lt;LI&gt;Manage MFA &amp;amp; Conditional Access&lt;/LI&gt;
&lt;LI&gt;Approve PIM requests&lt;/LI&gt;
&lt;LI&gt;Manage security settings&lt;/LI&gt;
&lt;LI&gt;Create/assign groups (which can then hold RBAC roles)&lt;/LI&gt;
&lt;LI&gt;Manage enterprise applications, OIDC apps, etc.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;What Entra roles cannot do&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Action&lt;/th&gt;&lt;th&gt;Allowed?&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Deploy resources&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Access subscriptions&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;View MACC credits&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Make Marketplace purchases&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Modify billing profiles&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Change RBAC roles&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Access data or storage accounts&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3&gt;Most relevant Entra roles for startups&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Entra Role&lt;/th&gt;&lt;th&gt;Purpose&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Global Administrator&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Full directory control (identity, security)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Privileged Role Administrator&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Manages privileged role assignments&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Groups Administrator&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Creates and manages groups (often used for RBAC assignments)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Conditional Access Administrator&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Manages CA policies&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Authentication Administrator&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Controls authentication settings&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Security Administrator&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Manages security policies and alerts&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;Key insight:&lt;/STRONG&gt;&amp;nbsp;Entra governs&amp;nbsp;&lt;EM&gt;identity and security&lt;/EM&gt;, not cloud resources or billing. Because Entra manages groups, and groups are often used for RBAC assignments, Entra is the root of&amp;nbsp;&lt;EM&gt;who can be given access,&amp;nbsp;&lt;/EM&gt;but not&amp;nbsp;&lt;EM&gt;what access they have&lt;/EM&gt;. This is where many organizations misunderstand the boundary.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
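&lt;P&gt;To make the group-to-RBAC hand-off concrete, here is a hedged sketch: the group is created in Entra, and the access it carries is granted separately in Azure RBAC. The group name platform-readers and the subscription ID are placeholders, and the id query assumes a recent Azure CLI version that uses Microsoft Graph.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Identity plane: create a security group in Entra (placeholder name)
az ad group create --display-name "platform-readers" --mail-nickname "platform-readers"

# Look up the group's object ID
GROUP_ID=$(az ad group show --group "platform-readers" --query id --output tsv)

# Resource plane: give the group Reader on one subscription
az role assignment create \
  --assignee-object-id "$GROUP_ID" \
  --assignee-principal-type Group \
  --role "Reader" \
  --scope "/subscriptions/{subscription-id}"&lt;/LI-CODE&gt;
&lt;P&gt;The first two commands need an Entra role such as Groups Administrator; the last one needs Owner or User Access Administrator on the subscription. One person rarely needs both.&lt;/P&gt;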
&lt;H2&gt;Plane 2: Azure RBAC (Resource Plane)&lt;/H2&gt;
&lt;P&gt;&lt;EM&gt;Everything engineering touches: workloads, clusters, deployments, pipelines, resources.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Azure RBAC is the backbone of the Azure operational model. It controls all deployments (IaC, CLI, Portal, API), resource creation &amp;amp; modification, monitoring &amp;amp; diagnostics, Key Vault, Storage, Networking, AKS cluster operations, and Azure OpenAI deployments: in short, everything under Azure Resource Manager (ARM).&lt;/P&gt;
&lt;H3&gt;RBAC scopes&lt;/H3&gt;
&lt;P&gt;RBAC can be assigned at: Tenant root → Management group → Subscription → Resource group → Individual resource → Sub-resource (e.g., Key Vault secret).&lt;/P&gt;
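&lt;P&gt;As an illustration of how those scopes look in practice, the sketch below assigns the same role at two different levels and then lists what a user holds. All names in braces and the user name are placeholders.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Management group scope: inherited by every subscription underneath it
az role assignment create --assignee "user@contoso.com" --role "Reader" \
  --scope "/providers/Microsoft.Management/managementGroups/{mg-name}"

# Single-resource scope: applies to one storage account only
az role assignment create --assignee "user@contoso.com" --role "Reader" \
  --scope "/subscriptions/{sub-id}/resourceGroups/{rg-name}/providers/Microsoft.Storage/storageAccounts/{account-name}"

# List the user's assignments at any scope within the current subscription
az role assignment list --assignee "user@contoso.com" --all --output table&lt;/LI-CODE&gt;
&lt;P&gt;Assignments are inherited downward, so the broader the scope, the more carefully the role should be chosen.&lt;/P&gt;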
&lt;H3&gt;RBAC role behavior&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Role&lt;/th&gt;&lt;th&gt;Can Deploy?&lt;/th&gt;&lt;th&gt;Can View Usage Cost?&lt;/th&gt;&lt;th&gt;Can View Billing/MACC?&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Owner&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Contributor&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Reader&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;td&gt;✅ Yes (limited)&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Cost Management Reader&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;User Access Admin&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;The critical point:&lt;/STRONG&gt;&amp;nbsp;RBAC cannot see billing. RBAC cannot view MACC. RBAC cannot read invoices. RBAC cannot approve Marketplace purchases. Even&amp;nbsp;&lt;EM&gt;Owner&lt;/EM&gt;, the highest role in the resource plane, is&amp;nbsp;&lt;STRONG&gt;blind&lt;/STRONG&gt;&amp;nbsp;to billing.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H2&gt;Plane 3: Azure Billing/Commerce (Financial Plane)&lt;/H2&gt;
&lt;P&gt;&lt;EM&gt;Governed by the Microsoft Commerce Platform, not Azure Resource Manager.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;This plane governs the financial relationship between the customer and Microsoft: billing accounts, invoices, credits (MACC, Azure credits, grants), commitments, discounts, payment methods, invoice sections, Marketplace SaaS purchases, reservations &amp;amp; savings plans, and private offers. Commerce roles live in an entirely different system from RBAC.&lt;/P&gt;
&lt;H3&gt;Common billing roles&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Role&lt;/th&gt;&lt;th&gt;Can see credits?&lt;/th&gt;&lt;th&gt;Can deploy?&lt;/th&gt;&lt;th&gt;Notes&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Billing Account Owner&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;td&gt;Full financial authority&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Billing Contributor&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;td&gt;Can update payment methods&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Billing Reader&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;td&gt;Most finance teams use this&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Invoice Section Owner&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;td&gt;Scoped financial management&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;STRONG&gt;What billing roles can see:&lt;/STRONG&gt;&amp;nbsp;MACC balance, credits, invoices, payment history, reservations &amp;amp; savings plans (financial side), and Marketplace purchase capabilities.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;What billing roles cannot do:&lt;/STRONG&gt;&amp;nbsp;deploy anything, modify RBAC, access resources, see workloads, change policy, or access cost analysis at resource group level.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;Billing is where MACC lives.&lt;/STRONG&gt;&amp;nbsp;MACC (Microsoft Azure Consumption Commitment) visibility is tied to the Billing Account Owner, Billing Account Contributor, and Billing Reader roles. Even a subscription Owner cannot see MACC burn rate. This single point causes confusion in almost every startup onboarding to Azure.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H2&gt;Full comparison matrix&lt;/H2&gt;
&lt;P&gt;When you need to answer "who can see what?", use this table:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Data type&lt;/th&gt;&lt;th&gt;System&lt;/th&gt;&lt;th&gt;Who can see it&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Resource usage cost&lt;/td&gt;&lt;td&gt;ARM (RBAC)&lt;/td&gt;&lt;td&gt;Cost Mgmt Reader, Owner, Contributor&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Resource inventory&lt;/td&gt;&lt;td&gt;ARM (RBAC)&lt;/td&gt;&lt;td&gt;Owner, Contributor, Reader&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Budgets &amp;amp; cost alerts&lt;/td&gt;&lt;td&gt;ARM (RBAC)&lt;/td&gt;&lt;td&gt;Owner, Contributor, Cost Mgmt Reader&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Azure OpenAI cost analysis&lt;/td&gt;&lt;td&gt;ARM (RBAC)&lt;/td&gt;&lt;td&gt;RBAC roles&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;MACC credit balance&lt;/td&gt;&lt;td&gt;Commerce Platform&lt;/td&gt;&lt;td&gt;Billing roles only&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Invoices &amp;amp; payments&lt;/td&gt;&lt;td&gt;Commerce Platform&lt;/td&gt;&lt;td&gt;Billing roles only&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Marketplace private offers&lt;/td&gt;&lt;td&gt;Commerce Platform&lt;/td&gt;&lt;td&gt;Billing roles only&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Commercial discounts&lt;/td&gt;&lt;td&gt;Commerce Platform&lt;/td&gt;&lt;td&gt;Billing roles only&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;💡 If your engineering lead says "I can see costs" and your CFO says "I can see costs", they are looking at different data from different systems.&lt;/STRONG&gt;&amp;nbsp;Both are right. Neither has the full picture.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H2&gt;The #1 source of confusion: Cost Management Reader vs. Billing Reader&lt;/H2&gt;
&lt;P&gt;This is the single most frequent misunderstanding in Azure governance. These two roles sound similar, but they live in completely different systems.&lt;/P&gt;
&lt;H3&gt;Cost Management Reader (RBAC Plane)&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Can see:&lt;/STRONG&gt;&amp;nbsp;usage-based resource cost, cost by tags, cost by resource, cost forecast, budgets &amp;amp; alerts.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Cannot see:&lt;/STRONG&gt;&amp;nbsp;credits, invoices, payments, MACC, private offers, or contract terms.&lt;/P&gt;
&lt;H3&gt;Billing Reader (Commerce Plane)&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Can see:&lt;/STRONG&gt;&amp;nbsp;invoices, credits, payments, MACC balance, Marketplace transaction history.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Cannot see:&lt;/STRONG&gt;&amp;nbsp;resource-level cost breakdown, cost by tags, subscription usage trends, or resource inventory.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Data type&lt;/th&gt;&lt;th&gt;Where it lives&lt;/th&gt;&lt;th&gt;Who can see it&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Resource usage cost&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Azure Cost Management (ARM)&lt;/td&gt;&lt;td&gt;Cost Mgmt Reader, Owner, Contributor&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Budgets &amp;amp; cost alerts&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;ARM&lt;/td&gt;&lt;td&gt;Owner, Contributor, Cost Mgmt Reader&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;MACC credit balance&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Commerce Platform&lt;/td&gt;&lt;td&gt;Billing roles only&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Invoices&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Commerce Platform&lt;/td&gt;&lt;td&gt;Billing roles only&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Marketplace private offers&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Commerce Platform&lt;/td&gt;&lt;td&gt;Billing roles only&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Commercial discounts&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Commerce Platform&lt;/td&gt;&lt;td&gt;Billing roles only&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Cost visibility (usage-based cost) comes from RBAC. Billing visibility (credits, invoices, MACC) comes from Commerce. These are two completely different datasets. When you understand this distinction, half of the "why can't I see…?" questions answer themselves.&lt;/P&gt;
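&lt;P&gt;One way to see the split is to query each dataset directly. The sketch below is illustrative only: the first call uses the Cost Management Query REST API and works with an RBAC role such as Cost Management Reader, while the second lists invoices and works only with a billing role. The IDs and dates are placeholders, and the API version is an assumption you may need to adjust.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Resource plane: month-to-date usage cost for one subscription (RBAC role required)
az rest --method post \
  --url "https://management.azure.com/subscriptions/{subscription-id}/providers/Microsoft.CostManagement/query?api-version=2023-03-01" \
  --body '{"type":"Usage","timeframe":"MonthToDate","dataset":{"granularity":"None","aggregation":{"totalCost":{"name":"PreTaxCost","function":"Sum"}}}}'

# Commerce plane: invoices for the billing account (billing role required)
az billing invoice list --account-name "{billing-account-id}" \
  --period-start-date "2026-01-01" --period-end-date "2026-03-31" --output table&lt;/LI-CODE&gt;
&lt;P&gt;A user who holds only one of the two role types should expect one command to succeed and the other to fail with an authorization error.&lt;/P&gt;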
&lt;H2&gt;Quick start: where to set this up&lt;/H2&gt;
&lt;P&gt;Here's exactly where each plane is configured, in the Portal and via CLI.&lt;/P&gt;
&lt;H3&gt;Microsoft Entra (Identity Plane)&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Portal:&lt;/STRONG&gt;&amp;nbsp;&lt;A href="https://portal.azure.com" target="_blank" rel="noopener"&gt;Azure Portal&lt;/A&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Microsoft Entra ID&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Roles and administrators&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# List Entra directory role assignments az rest --method GET --url "https://graph.microsoft.com/v1.0/directoryRoles" # Add a user to a directory role az ad group member add --group "Groups Administrator" --member-id &amp;lt;user-object-id&amp;gt;&lt;/LI-CODE&gt;
&lt;H3&gt;Azure RBAC (Resource Plane)&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Portal:&lt;/STRONG&gt;&amp;nbsp;Subscription →&amp;nbsp;&lt;STRONG&gt;Access Control (IAM)&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Add role assignment&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Assign Contributor at subscription scope az role assignment create \ --assignee "user@contoso.com" \ --role "Contributor" \ --scope "/subscriptions/{subscription-id}" # Assign Cost Management Reader at resource group scope az role assignment create \ --assignee "user@contoso.com" \ --role "Cost Management Reader" \ --scope "/subscriptions/{sub-id}/resourceGroups/{rg-name}"&lt;/LI-CODE&gt;
&lt;H3&gt;Azure Billing/Commerce (Financial Plane)&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Portal:&lt;/STRONG&gt;&amp;nbsp;&lt;A href="https://portal.azure.com" target="_blank" rel="noopener"&gt;Azure Portal&lt;/A&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Cost Management + Billing&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Billing scopes&lt;/STRONG&gt;&amp;nbsp;→ select billing account →&amp;nbsp;&lt;STRONG&gt;Access control (IAM)&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# List billing accounts az billing account list --output table # Assign Billing Reader via REST API az rest --method PUT \ --url "https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billing-account-id}/billingRoleAssignments/{id}?api-version=2024-04-01" \ --body '{"properties":{"principalId":"{user-object-id}","roleDefinitionId":"/providers/Microsoft.Billing/billingAccounts/{billing-account-id}/billingRoleDefinitions/{billing-reader-role-id}"}}'&lt;/LI-CODE&gt;
&lt;H2&gt;References&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/role-based-access-control/overview" target="_blank" rel="noopener"&gt;Azure RBAC Overview&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/role-based-access-control/rbac-and-directory-admin-roles" target="_blank" rel="noopener"&gt;Entra Directory &amp;amp; Admin Roles&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/cost-management-billing/manage/understand-mca-roles" target="_blank" rel="noopener"&gt;Billing Roles (Microsoft Customer Agreement)&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/cost-management-billing/costs/assign-access-acm-data" target="_blank" rel="noopener"&gt;Assign Access to Cost Management Data&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;What's next →&lt;/STRONG&gt;&amp;nbsp;This post established the foundation: Azure's three permission planes are separate by design. But the real complexity begins where these planes&amp;nbsp;&lt;EM&gt;intersect&lt;/EM&gt;. &lt;BR /&gt;&lt;BR /&gt;In&amp;nbsp;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/marketplace-governance-and-the-cross-plane-bridge/4510067" target="_blank" rel="noopener" data-lia-auto-title="part 2" data-lia-auto-title-active="0"&gt;Part 2&lt;/A&gt;, we'll explore &lt;STRONG&gt;Marketplace governance&lt;/STRONG&gt;, where resource deployment meets financial authority, along with&amp;nbsp;&lt;STRONG&gt;Managed Identity&lt;/STRONG&gt;, the one construct that bridges two planes, and&amp;nbsp;&lt;STRONG&gt;ABAC&lt;/STRONG&gt; for advanced conditional governance.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;</description>
      <pubDate>Thu, 09 Apr 2026 21:10:41 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/startups-at-microsoft/azure-has-three-permission-systems-and-you-re-probably-confusing/ba-p/4471854</guid>
      <dc:creator>rmmartins</dc:creator>
      <dc:date>2026-04-09T21:10:41Z</dc:date>
    </item>
    <item>
      <title>Azure capacity planning: Using quotas, reservations, VMSS instance mix, and Compute Fleet</title>
      <link>https://techcommunity.microsoft.com/t5/startups-at-microsoft/azure-capacity-planning-using-quotas-reservations-vmss-instance/ba-p/4464893</link>
      <description>&lt;H4&gt;&lt;SPAN data-contrast="auto"&gt;Introduction&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Over the past few months,&amp;nbsp;I’ve&amp;nbsp;been helping several digital-native customers navigate&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;capacity constraints&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;while scaling AI and compute-intensive workloads on Azure.&lt;/SPAN&gt; &lt;SPAN data-contrast="auto"&gt;Many teams run into the same frustrating message:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;“SkuNotAvailable&amp;nbsp;– The requested size is currently not available in the location.”&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;This post summarizes the strategy&amp;nbsp;I’ve&amp;nbsp;been using to help customers design around these challenges combining&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Quota Groups&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;,&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Capacity Reservations (ODCR)&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;, VMSS Instance Mix, and &lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Compute Fleet&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt; &amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;These tools&amp;nbsp;don’t&amp;nbsp;create capacity where none exists, but together, when paired with&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;proactive alerts&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;, they form a practical playbook for scaling reliably through regional constraints.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H4&gt;&lt;SPAN data-contrast="auto"&gt;Quota vs. Capacity:&amp;nbsp;What’s&amp;nbsp;the&amp;nbsp;difference?&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H4&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-hidden" border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Concept&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;What It Is&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Who Controls It&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Can You Fix It Yourself?&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Quota&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;A&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;logical limit&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;on how many vCPUs or specific VM series you can deploy.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Microsoft (adjustable on request).&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;✅&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;Yes,&amp;nbsp;request an increase.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Capacity&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;physical availability&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;of hardware in the datacenter.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Azure datacenter (supply and&amp;nbsp;utilization).&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;❌&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;No,&amp;nbsp;if no servers exist, no deployment will succeed.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;U&gt;&lt;SPAN data-contrast="auto"&gt;Example:&lt;/SPAN&gt;&lt;/U&gt;&lt;SPAN data-contrast="auto"&gt; You have 300 vCPUs of quota for the D-series in East US 2. You try to deploy 30 D8as_v5 VMs (240 vCPUs, well within that quota) and get a failure. You open a support request and find:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="18" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Your quota is fine&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="18" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;But the region has no physical capacity for D8as_v5&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Even if Microsoft raised your quota to 1,000 vCPUs, the deployment would still fail because&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;quota ≠ capacity&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Quota issue:&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;You’ll see errors like&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;OperationNotAllowed&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;or&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;QuotaExceeded&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN style="color: rgb(30, 30, 30);" data-contrast="auto"&gt;Capacity issue:&lt;/SPAN&gt;&lt;SPAN style="color: rgb(30, 30, 30);" data-contrast="auto"&gt;&amp;nbsp;The message will be&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN style="color: rgb(30, 30, 30);" data-contrast="auto"&gt;SkuNotAvailable&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN style="color: rgb(30, 30, 30);" data-contrast="auto"&gt;&amp;nbsp;or&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN style="color: rgb(30, 30, 30);" data-contrast="auto"&gt;AllocationFailed&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN style="color: rgb(30, 30, 30);" data-contrast="auto"&gt;.&lt;/SPAN&gt;&lt;SPAN style="color: rgb(30, 30, 30);"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;If you see a quota error, open the&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Usage + quotas&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;blade and request an increase.&lt;/SPAN&gt;&amp;nbsp;&lt;SPAN data-contrast="auto"&gt;If it’s a capacity error, your best next step is to switch zones, SKUs, or regions, or to use VMSS Instance Mix or Compute Fleet.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559685&amp;quot;:720,&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
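&lt;P&gt;To make the triage above concrete, here is a minimal Python sketch that maps the four error codes named in this post to a next step. The function name and the returned strings are illustrative assumptions, not part of any Azure SDK, and real deployments can surface other codes (or reuse a code such as OperationNotAllowed for unrelated reasons), so treat the fallback branch as the common case for anything unlisted.&lt;/P&gt;

```python
# Error codes named in the post, grouped by the kind of limit they signal.
QUOTA_ERRORS = {"OperationNotAllowed", "QuotaExceeded"}
CAPACITY_ERRORS = {"SkuNotAvailable", "AllocationFailed"}


def next_step(error_code: str) -> str:
    """Return a suggested next step for a failed deployment's error code.

    Hypothetical helper for illustration; it only knows the four codes above.
    """
    if error_code in QUOTA_ERRORS:
        return "quota: request an increase in Usage + quotas"
    if error_code in CAPACITY_ERRORS:
        return "capacity: try another zone, SKU, or region"
    return "unknown: inspect the full error message"


if __name__ == "__main__":
    for code in ("QuotaExceeded", "AllocationFailed", "Conflict"):
        print(code, "->", next_step(code))
```

&lt;P&gt;A lookup like this is mostly useful in deployment pipelines, where a capacity error can trigger an automatic retry with a different size while a quota error should stop and page a human.&lt;/P&gt;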
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;“Quota is a number on paper. Capacity is&amp;nbsp;what’s&amp;nbsp;physically sitting in the racks.”&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H4&gt;&lt;SPAN data-contrast="auto"&gt;Strategy 1: Quota management and Quota Groups&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Azure applies vCPU quotas per subscription, by region and VM family (e.g., Dsv5, Esv5).&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Quota Groups&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;provide&amp;nbsp;a consolidated way to&amp;nbsp;monitor&amp;nbsp;and manage these logical limits across a group of subscriptions.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Learn more:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="9" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/quotas/quota-groups" target="_blank" rel="noopener"&gt;Azure Quota Groups – Overview&lt;/A&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="9" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/quotas/how-to-guide-monitoring-alerting" target="_blank" rel="noopener"&gt;Set up monitoring and alerts for quotas&lt;/A&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Quota limits are easy to overlook until automation or scale pipelines fail.&amp;nbsp;AI-heavy startups often discover too late that&amp;nbsp;they’ve&amp;nbsp;maxed out their quota family.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Best practices:&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI aria-setsize="-1" data-leveltext="%1." data-font="" data-listid="8" data-list-defn-props="{&amp;quot;335552541&amp;quot;:0,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769242&amp;quot;:[65533,0],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;%1.&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;Monitor with Quota Group alerts&lt;/STRONG&gt;&lt;U&gt;:&lt;/U&gt; &lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Use&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Quota Alerts (preview)&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;to automatically&amp;nbsp;notify you&amp;nbsp;when usage reaches thresholds (e.g., 80%).&amp;nbsp;Alerts integrate with Azure Monitor and Action Groups.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="%1." data-font="" data-listid="8" data-list-defn-props="{&amp;quot;335552541&amp;quot;:0,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769242&amp;quot;:[65533,0],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;%1.&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;Request increases proactively&lt;/STRONG&gt;: &lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Portal path:&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Subscriptions → Usage + quotas → Request increase&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;.&amp;nbsp;Most CPU SKUs are approved quickly; GPUs can take longer.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="%1." data-font="" data-listid="8" data-list-defn-props="{&amp;quot;335552541&amp;quot;:0,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769242&amp;quot;:[65533,0],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;%1.&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;Plan by family, not by SKU&lt;/STRONG&gt;: &lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;If you only check “D8as_v5 usage,” you may miss that the entire D-series family is at its quota limit.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/OL&gt;
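&lt;P&gt;The "plan by family" point is easy to see with a small calculation. The Python sketch below sums vCPU usage per family and flags any family at or above the 80% example threshold. All usage numbers, limits, and the SKU-to-family mapping are hypothetical inputs chosen for illustration; in practice they would come from the Usage + quotas view or an equivalent export.&lt;/P&gt;

```python
from collections import defaultdict

ALERT_THRESHOLD = 0.80  # the 80% example threshold mentioned in the post


def families_near_limit(usage_by_sku, family_of, limit_by_family,
                        threshold=ALERT_THRESHOLD):
    """Sum vCPU usage per family and return families at or above threshold.

    Returns a dict of family name to utilisation ratio (0.0 to 1.0).
    Assumes every limit is a positive number.
    """
    used = defaultdict(int)
    for sku, vcpus in usage_by_sku.items():
        used[family_of[sku]] += vcpus
    return {
        family: used[family] / limit
        for family, limit in limit_by_family.items()
        if used[family] / limit >= threshold
    }


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    usage = {"D8as_v5": 120, "D16as_v5": 128, "E8as_v5": 40}
    family = {"D8as_v5": "Dasv5", "D16as_v5": "Dasv5", "E8as_v5": "Easv5"}
    limits = {"Dasv5": 300, "Easv5": 200}
    print(families_near_limit(usage, family, limits))
```

&lt;P&gt;In this made-up data, D8as_v5 alone uses 120 of 300 vCPUs and looks healthy, but the family total is 248, which is already past the alert threshold.&lt;/P&gt;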
&lt;H4&gt;&lt;SPAN data-contrast="auto"&gt;Strategy 2: Capacity Reservations (ODCR)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;A&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Capacity Reservation&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;(formally&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;On-Demand Capacity Reservation&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;, ODCR) lets you&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;pre-book physical infrastructure&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;in a specific region, zone, and VM size.&amp;nbsp;You’re&amp;nbsp;reserving capacity, not&amp;nbsp;committing to&amp;nbsp;a term or discount.&amp;nbsp;Azure holds those servers for your subscription, ensuring your workloads can always start.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Learn more:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="20" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/virtual-machines/capacity-reservation-overview" target="_blank" rel="noopener"&gt;Capacity Reservations in Azure Virtual Machines&lt;/A&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="20" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/cost-management-billing/reservations/save-compute-costs-reservations" target="_blank" rel="noopener"&gt;Save on compute with Azure Reservations&lt;/A&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Capacity Reservation vs. Reserved Instance (RI)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-hidden" border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Aspect&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Capacity Reservation (ODCR)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Reserved Instance (RI)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Purpose&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Guarantees&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;capacity&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;(hardware availability).&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Locks in&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;price&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;(discounted rate).&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Scope&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Specific region, zone, and VM size.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Region and VM family.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Billing&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Pay-as-you-go,&amp;nbsp;no term commitment.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;1 or 3-year fixed term.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Capacity Guarantee&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;✅&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;Yes,&amp;nbsp;hardware is held for you.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;❌&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;No,&amp;nbsp;no guarantee.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Price Benefit&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;❌&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;None,&amp;nbsp;PAYG rate.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;✅&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;Up to ~70% discount.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Flexibility&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Modify or cancel anytime.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Bound to term.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;In short:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="15" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;ODCR = “Hold my spot in the datacenter.”&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="15" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;RI = “Give me a discount because I’ll keep using it.”&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN style="color: rgb(30, 30, 30);" data-contrast="auto"&gt;You can use both: &lt;/SPAN&gt;&lt;SPAN style="color: rgb(30, 30, 30);" data-contrast="auto"&gt;ODCR&lt;/SPAN&gt;&lt;SPAN style="color: rgb(30, 30, 30);" data-contrast="auto"&gt;&amp;nbsp;for capacity,&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="color: rgb(30, 30, 30);" data-contrast="auto"&gt;RI&lt;/SPAN&gt;&lt;SPAN style="color: rgb(30, 30, 30);" data-contrast="auto"&gt;&amp;nbsp;for savings.&lt;/SPAN&gt;&lt;SPAN style="color: rgb(30, 30, 30);" data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;U&gt;&lt;SPAN data-contrast="auto"&gt;Example:&lt;/SPAN&gt;&lt;/U&gt;&lt;SPAN data-contrast="auto"&gt; A startup consistently runs 20× D16as_v5 VMs nightly for training. They reserve that capacity (ODCR) and apply RIs for discounts, ensuring predictable performance&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;and&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;cost.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;U&gt;&lt;SPAN data-contrast="auto"&gt;Limitations:&lt;/SPAN&gt;&lt;/U&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="7" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;You can’t reserve SKUs already out of stock.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="7" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;ODCR doesn’t autoscale; it only holds your baseline.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="7" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Best for &lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;core workloads&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;, not ephemeral jobs.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;&lt;SPAN data-contrast="auto"&gt;Strategy 3: VMSS Instance Mix&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;Virtual Machine Scale Set (VMSS) Instance Mix is a feature of VMSS Flex that enables capacity-aware scaling across multiple VM sizes, and even across different purchase options (Standard and Spot).&amp;nbsp;When you define more than one acceptable VM size, Azure automatically chooses whichever has capacity available during scale-out.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Learn more:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="4" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/instance-mix-overview" target="_blank" rel="noopener"&gt;VMSS Instance Mix – Overview&lt;/A&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;U&gt;&lt;SPAN data-contrast="auto"&gt;Example:&lt;/SPAN&gt;&lt;/U&gt;&lt;SPAN data-contrast="auto"&gt; Here’s a simplified configuration snippet from an ARM or Bicep template using Instance Mix:&lt;/SPAN&gt;&lt;U&gt;&lt;SPAN data-contrast="auto"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/U&gt;&lt;/P&gt;
&lt;LI-CODE lang="json"&gt;"sku": {
  "name": "Mix"
},
"properties": {
  "orchestrationMode": "Flexible",
  "skuProfile": {
    "vmSizes": [
      { "name": "Standard_D8as_v5" },
      { "name": "Standard_E8as_v5" },
      { "name": "Standard_F8s_v2" }
    ],
    "allocationStrategy": "CapacityOptimized"
  }
}
&lt;/LI-CODE&gt;
&lt;P&gt;&lt;BR /&gt;VMSS Instance Mix helps you survive temporary SKU shortages by dynamically selecting the next available size, while Spot Priority Mix lets you blend Spot and Standard instances to reduce cost and improve resilience. This makes it ideal for large-scale app tiers, batch processing, and AI inference.&lt;/P&gt;
&lt;P&gt;&lt;U&gt;&lt;SPAN data-contrast="auto"&gt;Limitations:&lt;/SPAN&gt;&lt;/U&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="5" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Works across zones, not regions.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="5" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Doesn’t mix Spot + Standard in the same pool.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="5" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;Doesn’t reserve hardware capacity; it only improves allocation success rates.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;&lt;SPAN data-contrast="auto"&gt;Strategy&amp;nbsp;4:&amp;nbsp;Azure Compute Fleet&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Azure Compute Fleet&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;can deploy up to&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;10,000 VMs&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;across multiple SKUs, zones, and (in preview) regions.&amp;nbsp;You define acceptable SKUs, and Azure picks the ones that have capacity.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Learn more:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="12" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/azure-compute-fleet/overview" target="_blank" rel="noopener"&gt;Azure Compute Fleet – Overview&lt;/A&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Fleet automatically:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="11" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Tries alternate SKUs (D8as_v5 → E8as_v5).&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="11" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Expands to other zones or regions.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="11" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Combines &lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Standard&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;and&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Spot&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;instances.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;In short, it automates the “try this, then that” logic,&amp;nbsp;improving your odds of successful deployment.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;U&gt;&lt;SPAN data-contrast="auto"&gt;Example:&lt;/SPAN&gt;&lt;/U&gt;&lt;SPAN data-contrast="auto"&gt; A rendering studio needs 2,000 VMs nightly.&amp;nbsp;Fleet dynamically uses D8s_v5, D16s_v5, or E8s_v5 across East US 2 and West US 2, depending on live availability.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;U&gt;&lt;SPAN data-contrast="auto"&gt;Limitations:&lt;/SPAN&gt;&lt;/U&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Fleet&amp;nbsp;doesn’t&amp;nbsp;create capacity;&amp;nbsp;it just searches smarter.&amp;nbsp;If every zone and region is full, it still fails.&amp;nbsp;It is ideal for&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;AI training, batch jobs,&amp;nbsp;rendering, or HPC,&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;not for stateful services.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
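&lt;P&gt;The "try this, then that" behaviour can be pictured with a toy model. The Python sketch below is not Compute Fleet's actual algorithm and calls no Azure API; it is a hand-rolled simulation in which a hypothetical `available` table stands in for live capacity, which callers cannot query directly. It walks region and SKU combinations in preference order, takes what each pool can offer, and gives up only when every pool is exhausted.&lt;/P&gt;

```python
from itertools import product


def allocate(vm_count, skus, regions, available):
    """Fill vm_count from (region, sku) pools in preference order.

    `available` maps (region, sku) to free VM slots. It is a stand-in for
    live capacity and is purely hypothetical.
    Returns a placement dict, or None when the pools cannot cover the request.
    """
    placement = {}
    remaining = vm_count
    for region, sku in product(regions, skus):
        take = min(remaining, available.get((region, sku), 0))
        if take:
            placement[(region, sku)] = take
            remaining -= take
        if remaining == 0:
            return placement
    return None  # every candidate pool is exhausted, so the request fails


if __name__ == "__main__":
    # Made-up capacity figures for the rendering-studio scenario.
    free = {
        ("eastus2", "D8s_v5"): 500,
        ("eastus2", "E8s_v5"): 700,
        ("westus2", "D8s_v5"): 1000,
    }
    skus = ["D8s_v5", "D16s_v5", "E8s_v5"]
    regions = ["eastus2", "westus2"]
    print(allocate(2000, skus, regions, free))
    print(allocate(3000, skus, regions, free))
```

&lt;P&gt;With these made-up figures the 2,000-VM request is spread over three pools, while the 3,000-VM request returns None because only 2,200 slots exist in total, which mirrors the limitation above: searching smarter does not help once every pool is full.&lt;/P&gt;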
&lt;H4&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;When to&amp;nbsp;use&amp;nbsp;what&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H4&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-hidden" border="1" style="width: 67.4074%; height: 234px; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr style="height: 39px;"&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Scenario&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Best&amp;nbsp;tool&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;What&amp;nbsp;it solves&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 39px;"&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Logical limits before deployment&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Quota Groups + Alerts&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Prevent hitting soft limits.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 39px;"&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Guaranteed baseline&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Capacity Reservation (ODCR)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Reserve real hardware.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 39px;"&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Managed autoscaling&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;VMSS Instance Mix&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Scale out despite partial shortages.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 39px;"&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Large-scale/bursty workloads&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Azure Compute Fleet&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Try alternate SKUs and regions.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 39px;"&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;GPU/high-demand SKUs&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;ODCR + Fleet&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Reserve base, burst flexibly.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H4&gt;&lt;SPAN data-contrast="auto"&gt;Real talk: there’s no magic when a datacenter is full&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Let’s&amp;nbsp;be transparent:&amp;nbsp;if a region has no physical servers available, no tool can make capacity appear.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="16" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Quota Groups&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;remove logical blockers.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="16" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Capacity Reservations&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;secure what you need.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="16" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Compute Fleet&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;and&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;VMSS Instance Mix&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;increase the odds of success.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Together, they&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;maximize the probability of a successful deployment&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;, but none can override a physically full region.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;The Azure&amp;nbsp;capacity&amp;nbsp;strategy&amp;nbsp;flow&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;H4&gt;&lt;SPAN data-contrast="auto"&gt;Final&amp;nbsp;thoughts&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H4&gt;
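&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;For fast-scaling digital-native companies, the right question isn’t&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;“How do I guarantee capacity?”&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;It’s&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;“How do I design for capacity uncertainty?”&lt;/SPAN&gt; &lt;SPAN data-contrast="auto"&gt;Start by putting the basics on autopilot:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Configure&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Quota Group alerts&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;to prevent silent blockers.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;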
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="2" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Use&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Capacity Reservations (ODCR)&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;to secure your baseline compute.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="2" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Add elasticity through &lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;VMSS Instance Mix&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;and, when flexibility allows,&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Compute Fleet&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="2" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Monitor everything with &lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Azure Monitor alerts&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;— from quotas and reservations to scale-out failures and Fleet allocation health.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;💡&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Pro tip:&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;Combine&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Quota Group Alerts&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;,&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Reservation coverage monitoring&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;, and&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;VMSS/Fleet deployment telemetry&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;in&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Azure Monitor&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;to detect issues early.&lt;/SPAN&gt; &amp;nbsp;&lt;SPAN data-contrast="auto"&gt;The faster you know&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;what kind of failure&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;you’re hitting, the faster you can act.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
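&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;As a rough sketch of what telling failure types apart could look like, the query below groups failed operations from the Activity Log by the kind of blocker they suggest. It assumes the Activity Log is routed to a Log Analytics workspace and that the error code text shows up in the Properties payload; the three error codes are common examples rather than a complete list, and the labels are only illustrative.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="kusto"&gt;// Assumption: the Activity Log is exported to this workspace via diagnostic settings.
// Assumption: the error code appears as text inside the Properties payload.
AzureActivity
| where TimeGenerated between (ago(7d) .. now())
| where ActivityStatusValue == "Failure"
| extend Details = tostring(Properties)
// Map example error codes to the kind of blocker they usually indicate.
| extend FailureKind = case(
    Details contains "QuotaExceeded", "Quota (logical limit)",
    Details contains "SkuNotAvailable", "SKU not available for this subscription or location",
    Details contains "AllocationFailed", "Physical capacity",
    "Other")
| where FailureKind != "Other"
| summarize Failures = count() by FailureKind, bin(TimeGenerated, 1d)
&lt;/LI-CODE&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;A quota failure points to Quota Groups and a limit increase, while an allocation failure points to reservations or to more SKU and region flexibility.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;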
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Accept that&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;capacity is finite&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;, but also that&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;visibility is your greatest advantage&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;. Azure gives you multiple levers; success comes from knowing when and how to use each one together.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Over the past few months, I’ve supported multiple customers, from AI platforms to SaaS startups, who faced real capacity challenges in regions like East US 2 and West US 2. This post came directly from those experiences, with one goal: to help others move from reactive firefighting to&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;proactive, layered capacity planning&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;.&lt;/SPAN&gt; &lt;SPAN data-contrast="auto"&gt;If your workloads are scaling fast, I hope this guide helps you build not just a plan, but a mindset, for running reliably when the cloud gets crowded.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H4&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Further&amp;nbsp;reading&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H4&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="21" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/quotas/quota-groups" target="_blank" rel="noopener"&gt;Azure Quota Groups – Overview&lt;/A&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="21" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/quotas/how-to-guide-monitoring-alerting" target="_blank" rel="noopener"&gt;Monitoring and Alerting for Quotas&lt;/A&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="21" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/azure-quota-alerts-preview-still-overlooked-but-incredibly-useful/4447140" target="_blank" rel="noopener" data-lia-auto-title="Azure Quota Alerts" data-lia-auto-title-active="0"&gt;Azure Quota Alerts&lt;/A&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="21" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/virtual-machines/capacity-reservation-overview" target="_blank" rel="noopener"&gt;Capacity Reservations Overview&lt;/A&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="21" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/cost-management-billing/reservations/save-compute-costs-reservations" target="_blank" rel="noopener"&gt;Save on Compute with Azure Reservations&lt;/A&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="21" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/instance-mix-overview" target="_blank" rel="noopener"&gt;VMSS Instance Mix Overview&lt;/A&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="21" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/azure-compute-fleet/overview" target="_blank" rel="noopener"&gt;Azure Compute Fleet Overview&lt;/A&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Thu, 06 Nov 2025 16:47:25 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/startups-at-microsoft/azure-capacity-planning-using-quotas-reservations-vmss-instance/ba-p/4464893</guid>
      <dc:creator>rmmartins</dc:creator>
      <dc:date>2025-11-06T16:47:25Z</dc:date>
    </item>
    <item>
      <title>Azure Monitor 101: The missing guide to understanding monitoring on Azure</title>
      <link>https://techcommunity.microsoft.com/t5/startups-at-microsoft/azure-monitor-101-the-missing-guide-to-understanding-monitoring/ba-p/4462799</link>
      <description>&lt;H4 data-start="1044" data-end="1061"&gt;Introduction&lt;/H4&gt;
&lt;P data-start="1063" data-end="1363"&gt;Monitoring in the cloud is often misunderstood. Some think it’s about checking whether a virtual machine is up; others equate it with dashboards or alerts. In reality, &lt;STRONG data-start="1233" data-end="1292"&gt;monitoring is about visibility, correlation, and action&lt;/STRONG&gt;, and in Azure, that all converges in one platform: &lt;STRONG data-start="1343" data-end="1360"&gt;Azure Monitor&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P data-start="1365" data-end="1534"&gt;This article explains, in practical terms, how Azure Monitor works, the role of &lt;STRONG data-start="1445" data-end="1462"&gt;Log Analytics&lt;/STRONG&gt;, and how to build a foundation for observability across your workloads.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-start="1365" data-end="1534"&gt;If you’ve read our earlier posts on &lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/the-importance-of-setting-up-service-and-resource-health-monitoring-in-azure/4372478" target="_blank" rel="noopener" data-start="1577" data-end="1768" data-lia-auto-title="Service and Resource Health Monitoring" data-lia-auto-title-active="0"&gt;Service and Resource Health Monitoring&lt;/A&gt;, &lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/advanced-alerting-strategies-for-azure-monitoring/4268698" target="_blank" rel="noopener" data-start="1770" data-end="1924" data-lia-auto-title="Advanced Alerting Strategies" data-lia-auto-title-active="0"&gt;Advanced Alerting Strategies&lt;/A&gt;, &lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/azure-workbooks-advanced-customization-and-data-visualization-in-azure/4369588" target="_blank" rel="noopener" data-start="1926" data-end="2102" data-lia-auto-title="Azure Workbooks Customization" data-lia-auto-title-active="0"&gt;Azure Workbooks Customization&lt;/A&gt;, or &lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/azure-monitor--melt-a-comprehensive-approach-to-cloud-observability/4251166" target="_blank" rel="noopener" data-start="2107" data-end="2271" data-lia-auto-title="Azure Monitor &amp;amp; MELT" data-lia-auto-title-active="0"&gt;Azure Monitor &amp;amp; MELT&lt;/A&gt;, this post ties them all together.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H4 data-start="2314" data-end="2341"&gt;What Is Azure Monitor?&lt;/H4&gt;
&lt;P data-start="2343" data-end="2558"&gt;&lt;STRONG data-start="2343" data-end="2360"&gt;Azure Monitor&lt;/STRONG&gt; is Microsoft’s unified platform for collecting, analyzing, and acting on telemetry across applications, infrastructure, and networks, whether they run in Azure, hybrid, or multicloud environments.&lt;/P&gt;
&lt;P data-start="2560" data-end="2597"&gt;It helps you answer four questions:&lt;/P&gt;
&lt;OL data-start="2599" data-end="2723"&gt;
&lt;LI data-start="2599" data-end="2632"&gt;&lt;EM data-start="2602" data-end="2630"&gt;Is my environment healthy?&lt;/EM&gt;&lt;/LI&gt;
&lt;LI data-start="2633" data-end="2667"&gt;&lt;EM data-start="2636" data-end="2665"&gt;What’s happening right now?&lt;/EM&gt;&lt;/LI&gt;
&lt;LI data-start="2668" data-end="2693"&gt;&lt;EM data-start="2671" data-end="2691"&gt;Why did it happen?&lt;/EM&gt;&lt;/LI&gt;
&lt;LI data-start="2694" data-end="2723"&gt;&lt;EM data-start="2697" data-end="2721"&gt;What should I do next?&lt;/EM&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;The Building Blocks&lt;/H4&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-hidden" border="1" style="width: 100%; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Layer&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;th&gt;Examples&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;1. Data Sources&lt;/td&gt;&lt;td&gt;Where telemetry originates: VMs, AKS, databases, applications, networks.&lt;/td&gt;&lt;td&gt;Activity Logs, Performance Counters, Container Metrics, App Insights telemetry.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;2. Data Platform (Log Analytics)&lt;/td&gt;&lt;td&gt;Central workspace where logs are stored and queried using &lt;STRONG&gt;KQL&lt;/STRONG&gt;.&lt;/td&gt;&lt;td&gt;Diagnostic Settings → Workspace → Query → Alert/Workbook.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;3.&amp;nbsp;&amp;nbsp;Insights &amp;amp; Visualizations&lt;/td&gt;&lt;td&gt;Built-in experiences that interpret raw data.&lt;/td&gt;&lt;td&gt;Azure Monitor for VMs, Containers, Apps, Network.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;4.&amp;nbsp;&lt;STRONG&gt; &lt;/STRONG&gt;Action &amp;amp; Automation&lt;/td&gt;&lt;td&gt;Responding through alerts, workflows, or ITSM integrations.&lt;/td&gt;&lt;td&gt;Alerts + Action Groups → Teams, Logic Apps, PagerDuty.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H6&gt;&lt;STRONG&gt;Azure Monitor core layers&lt;/STRONG&gt;&lt;/H6&gt;
&lt;img /&gt;
&lt;H4&gt;Metrics vs. Logs&lt;/H4&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-hidden" border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Aspect&lt;/th&gt;&lt;th&gt;Metrics&lt;/th&gt;&lt;th&gt;Logs&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Format&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Numeric values sampled over time&lt;/td&gt;&lt;td&gt;Text-based records with context&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Best for&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Performance monitoring and thresholds&lt;/td&gt;&lt;td&gt;Troubleshooting and auditing&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Examples&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;CPU %, latency, requests/sec&lt;/td&gt;&lt;td&gt;Error messages, policy changes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Store&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Azure Monitor metrics DB&lt;/td&gt;&lt;td&gt;Log Analytics workspace&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Metrics are fast and lightweight; logs are richer and more flexible. Both live under Azure Monitor.&lt;/P&gt;
&lt;H4&gt;The role of Log Analytics Workspace&lt;/H4&gt;
&lt;P data-start="4124" data-end="4197"&gt;If Azure Monitor is the nervous system, &lt;STRONG data-start="4164" data-end="4194"&gt;Log Analytics is the brain&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P data-start="4199" data-end="4382"&gt;Resources send diagnostic and activity data via &lt;STRONG data-start="4247" data-end="4270"&gt;Diagnostic Settings&lt;/STRONG&gt;, agents, or connectors. Once in the workspace, you can query everything using &lt;STRONG data-start="4351" data-end="4381"&gt;Kusto Query Language (KQL)&lt;/STRONG&gt;.&lt;/P&gt;
&lt;LI-CODE lang="kusto"&gt;// Count delete operations per caller per day.
AzureActivity
| where OperationNameValue contains "Delete"
| summarize Count = count() by Caller, bin(TimeGenerated, 1d)
&lt;/LI-CODE&gt;
&lt;P data-start="4517" data-end="4532"&gt;You can then:&lt;/P&gt;
&lt;UL data-start="4533" data-end="4699"&gt;
&lt;LI data-start="4533" data-end="4582"&gt;Create &lt;STRONG data-start="4542" data-end="4552"&gt;alerts&lt;/STRONG&gt; that fire on query results.&lt;/LI&gt;
&lt;LI data-start="4583" data-end="4639"&gt;Build &lt;STRONG data-start="4591" data-end="4604"&gt;workbooks&lt;/STRONG&gt; for dashboards and storytelling.&lt;/LI&gt;
&lt;LI data-start="4640" data-end="4699"&gt;Export data to &lt;STRONG data-start="4657" data-end="4670"&gt;Event Hub&lt;/STRONG&gt;, &lt;STRONG data-start="4672" data-end="4683"&gt;Storage&lt;/STRONG&gt;, or &lt;STRONG data-start="4688" data-end="4696"&gt;SIEM&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H6 data-start="4701" data-end="4842"&gt;&lt;STRONG data-start="4701" data-end="4755"&gt;Log Analytics as the central data plane&lt;/STRONG&gt;&lt;/H6&gt;
&lt;img /&gt;
&lt;H6&gt;&lt;STRONG&gt;Data flow overview&lt;/STRONG&gt;&lt;/H6&gt;
&lt;img /&gt;
&lt;H4 data-start="5210" data-end="5229"&gt;The MELT Model&lt;/H4&gt;
&lt;P data-start="5231" data-end="5530"&gt;To understand observability holistically, adopt the &lt;STRONG data-start="5283" data-end="5291"&gt;MELT&lt;/STRONG&gt; framework:&amp;nbsp;&lt;STRONG data-start="5302" data-end="5339"&gt;Metrics, Events, Logs, and Traces,&amp;nbsp;&lt;/STRONG&gt;explained in detail in &lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/azure-monitor--melt-a-comprehensive-approach-to-cloud-observability/4251166" target="_blank" rel="noopener" data-start="5363" data-end="5527" data-lia-auto-title="Azure Monitor &amp;amp; MELT" data-lia-auto-title-active="0"&gt;Azure Monitor &amp;amp; MELT&lt;/A&gt;.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-hidden" border="1" style="width: 33.6111%; height: 199px; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr style="height: 35px;"&gt;&lt;th style="height: 35px;"&gt;Pillar&lt;/th&gt;&lt;th style="height: 35px;"&gt;Purpose&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr style="height: 35px;"&gt;&lt;td style="height: 35px;"&gt;&lt;STRONG&gt;Metrics&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 35px;"&gt;How your system performs&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 35px;"&gt;&lt;td style="height: 35px;"&gt;&lt;STRONG&gt;Events&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 35px;"&gt;What changed&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 35px;"&gt;&lt;td style="height: 35px;"&gt;&lt;STRONG&gt;Logs&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 35px;"&gt;Why it happened&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 59px;"&gt;&lt;td style="height: 59px;"&gt;&lt;STRONG&gt;Traces&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 59px;"&gt;How requests flow through components&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
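&lt;P&gt;As an illustration of the Traces pillar, the sketch below lists the dependency calls that belong to failed requests by joining on the shared operation ID. It assumes a workspace-based Application Insights resource is sending data to the workspace (the &lt;STRONG&gt;AppRequests&lt;/STRONG&gt; and &lt;STRONG&gt;AppDependencies&lt;/STRONG&gt; tables); the one-hour window is an arbitrary choice.&lt;/P&gt;
&lt;LI-CODE lang="kusto"&gt;// Assumption: workspace-based Application Insights populates these tables.
AppRequests
| where TimeGenerated between (ago(1h) .. now())
| where Success == false
| project OperationId, RequestName = Name, ResultCode
// Pull in every dependency call made while serving the same operation.
| join kind=inner (
    AppDependencies
    | where TimeGenerated between (ago(1h) .. now())
    | project OperationId, DependencyName = Name, Target, DependencySuccess = Success, DurationMs
  ) on OperationId
| order by DurationMs desc
&lt;/LI-CODE&gt;
&lt;P&gt;Reading the slowest or failed dependencies next to the request that triggered them is usually the quickest way to see where a request spent its time.&lt;/P&gt;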
&lt;H4 data-start="5740" data-end="5788"&gt;From data to action: alerts and automation&lt;/H4&gt;
&lt;P data-start="5790" data-end="5815"&gt;Azure Monitor includes:&lt;/P&gt;
&lt;UL data-start="5816" data-end="5956"&gt;
&lt;LI data-start="5816" data-end="5865"&gt;&lt;STRONG data-start="5818" data-end="5835"&gt;Metric alerts&lt;/STRONG&gt; (near real-time thresholds)&lt;/LI&gt;
&lt;LI data-start="5866" data-end="5910"&gt;&lt;STRONG data-start="5868" data-end="5882"&gt;Log alerts&lt;/STRONG&gt; (KQL queries on schedule)&lt;/LI&gt;
&lt;LI data-start="5911" data-end="5956"&gt;&lt;STRONG data-start="5913" data-end="5936"&gt;Activity Log alerts&lt;/STRONG&gt; (platform events)&lt;/LI&gt;
&lt;/UL&gt;
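&lt;P&gt;A log alert is a KQL query evaluated on a schedule. As a minimal sketch, the query below returns machines that reported a heartbeat at some point in the last day but have been silent for the past ten minutes. It assumes the machines run an agent that writes to the &lt;STRONG&gt;Heartbeat&lt;/STRONG&gt; table; the ten-minute threshold is illustrative.&lt;/P&gt;
&lt;LI-CODE lang="kusto"&gt;// Assumption: monitored machines send records to the Heartbeat table.
Heartbeat
| where TimeGenerated between (ago(1d) .. now())
| summarize LastHeartbeat = max(TimeGenerated) by Computer
// Keep only machines whose latest heartbeat falls outside the last 10 minutes.
| where LastHeartbeat !between (ago(10m) .. now())
&lt;/LI-CODE&gt;
&lt;P&gt;An alert rule built on this query could fire whenever it returns one or more rows.&lt;/P&gt;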
&lt;P data-start="5958" data-end="6037"&gt;Use &lt;STRONG data-start="5962" data-end="5979"&gt;Action Groups&lt;/STRONG&gt; to define responses: email, Teams, Logic App, or ticket.&lt;/P&gt;
&lt;P data-start="6039" data-end="6284"&gt;For advanced patterns like dynamic thresholds and suppression, see &lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/advanced-alerting-strategies-for-azure-monitoring/4268698" target="_blank" rel="noopener" data-start="6106" data-end="6281" data-lia-auto-title="Advanced Alerting Strategies for Azure Monitoring" data-lia-auto-title-active="0"&gt;Advanced Alerting Strategies for Azure Monitoring&lt;/A&gt;.&lt;/P&gt;
&lt;H6 data-start="6286" data-end="6333"&gt;&lt;STRONG data-start="6286" data-end="6333"&gt;Alerting and automation workflow&lt;/STRONG&gt;&lt;/H6&gt;
&lt;img /&gt;
&lt;H4 data-start="6340" data-end="6372"&gt;Visualization and Workbooks&lt;/H4&gt;
&lt;P data-start="6374" data-end="6488"&gt;Workbooks transform data into decisions. Combine KQL queries, parameters, and visuals, all within the Azure portal.&lt;/P&gt;
&lt;LI-CODE lang="kusto"&gt;// Average CPU utilization per computer in 5-minute bins
Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AvgCPU = avg(CounterValue) by bin(TimeGenerated, 5m), Computer&lt;/LI-CODE&gt;
&lt;P&gt;To go beyond the basics (multi-resource joins, conditional formatting, custom JSON parameters), see &lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/azure-workbooks-advanced-customization-and-data-visualization-in-azure/4369588" target="_blank" rel="noopener" data-start="6709" data-end="6927" data-lia-auto-title="Azure Workbooks: Advanced Customization and Data Visualization in Azure" data-lia-auto-title-active="0"&gt;Azure Workbooks: Advanced Customization and Data Visualization in Azure&lt;/A&gt;.&lt;/P&gt;
&lt;H6&gt;&lt;STRONG&gt;Example workbook visualization&lt;/STRONG&gt;&lt;/H6&gt;
&lt;img /&gt;
&lt;H4 data-start="6984" data-end="7027"&gt;Health Monitoring and Platform Signals&lt;/H4&gt;
&lt;P data-start="7029" data-end="7248"&gt;Azure provides &lt;STRONG data-start="7044" data-end="7062"&gt;Service Health&lt;/STRONG&gt; and &lt;STRONG data-start="7067" data-end="7086"&gt;Resource Health&lt;/STRONG&gt; to help differentiate between Azure-side issues and workload issues. They complement Azure Monitor by tracking platform events and maintenance notifications.&lt;/P&gt;
&lt;P data-start="7250" data-end="7521"&gt;Configuration guidance is available in &lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/the-importance-of-setting-up-service-and-resource-health-monitoring-in-azure/4372478" target="_blank" rel="noopener" data-start="7289" data-end="7518" data-lia-auto-title="The Importance of Setting Up Service and Resource Health Monitoring in Azure" data-lia-auto-title-active="0"&gt;The Importance of Setting Up Service and Resource Health Monitoring in Azure&lt;/A&gt;.&lt;/P&gt;
&lt;H6 data-start="7523" data-end="7584"&gt;&lt;STRONG data-start="7523" data-end="7584"&gt;Service Health and Resource Health integration&lt;/STRONG&gt;&lt;/H6&gt;
&lt;img /&gt;
&lt;H4 data-start="7591" data-end="7625"&gt;Best practices for workspaces&lt;/H4&gt;
&lt;OL data-start="7627" data-end="7978"&gt;
&lt;LI data-start="7627" data-end="7714"&gt;&lt;STRONG data-start="7630" data-end="7658"&gt;Centralize intelligently: &lt;/STRONG&gt;aggregate where cross-resource correlation matters.&lt;/LI&gt;
&lt;LI data-start="7715" data-end="7782"&gt;&lt;STRONG data-start="7718" data-end="7735"&gt;Control costs: &lt;/STRONG&gt;use Data Collection Rules to filter noise.&lt;/LI&gt;
&lt;LI data-start="7783" data-end="7839"&gt;&lt;STRONG data-start="7786" data-end="7806"&gt;Manage retention: &lt;/STRONG&gt;align with compliance needs.&lt;/LI&gt;
&lt;LI data-start="7840" data-end="7904"&gt;&lt;STRONG data-start="7843" data-end="7860"&gt;Secure access:&amp;nbsp;&lt;/STRONG&gt;apply RBAC and table-level permissions.&lt;/LI&gt;
&lt;LI data-start="7905" data-end="7978"&gt;&lt;STRONG data-start="7908" data-end="7931"&gt;Automate deployment: &lt;/STRONG&gt;define diagnostics via Bicep or Terraform.&lt;/LI&gt;
&lt;/OL&gt;
&lt;H5 data-start="7985" data-end="8011"&gt;Quick start checklist&lt;/H5&gt;
&lt;OL data-start="8013" data-end="8268"&gt;
&lt;LI data-start="8013" data-end="8055"&gt;Create a &lt;STRONG data-start="8025" data-end="8052"&gt;Log Analytics workspace&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI data-start="8056" data-end="8110"&gt;Enable &lt;STRONG data-start="8066" data-end="8089"&gt;Diagnostic Settings&lt;/STRONG&gt; for key resources.&lt;/LI&gt;
&lt;LI data-start="8111" data-end="8157"&gt;Run a basic &lt;STRONG data-start="8126" data-end="8139"&gt;KQL query&lt;/STRONG&gt; to verify data.&lt;/LI&gt;
&lt;LI data-start="8158" data-end="8213"&gt;Configure a &lt;STRONG data-start="8173" data-end="8189"&gt;metric alert&lt;/STRONG&gt; and &lt;STRONG data-start="8194" data-end="8210"&gt;action group&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI data-start="8214" data-end="8268"&gt;Build a simple &lt;STRONG data-start="8232" data-end="8244"&gt;workbook&lt;/STRONG&gt; to visualize results.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P data-start="8270" data-end="8346"&gt;You now have a full feedback loop: &lt;EM data-start="8305" data-end="8346"&gt;data → query → alert → visualize → act.&lt;/EM&gt;&lt;/P&gt;
&lt;H4 data-start="8353" data-end="8386"&gt;Next steps &amp;amp; further reading&lt;/H4&gt;
&lt;UL data-start="8388" data-end="9131"&gt;
&lt;LI data-start="8388" data-end="8592"&gt;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/the-importance-of-setting-up-service-and-resource-health-monitoring-in-azure/4372478" target="_blank" rel="noopener" data-start="8390" data-end="8590" data-lia-auto-title="Service and Resource Health Monitoring in Azure" data-lia-auto-title-active="0"&gt;Service and Resource Health Monitoring in Azure&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-start="8593" data-end="8772"&gt;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/advanced-alerting-strategies-for-azure-monitoring/4268698" target="_blank" rel="noopener" data-start="8595" data-end="8770" data-lia-auto-title="Advanced Alerting Strategies for Azure Monitoring" data-lia-auto-title-active="0"&gt;Advanced Alerting Strategies for Azure Monitoring&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-start="8773" data-end="8962"&gt;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/azure-workbooks-advanced-customization-and-data-visualization-in-azure/4369588" target="_blank" rel="noopener" data-start="8775" data-end="8960" data-lia-auto-title="Azure Workbooks Advanced Customization" data-lia-auto-title-active="0"&gt;Azure Workbooks Advanced Customization&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-start="8963" data-end="9131"&gt;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/azure-monitor--melt-a-comprehensive-approach-to-cloud-observability/4251166" target="_blank" rel="noopener" data-start="8965" data-end="9129" data-lia-auto-title="Azure Monitor &amp;amp; MELT" data-lia-auto-title-active="0"&gt;Azure Monitor &amp;amp; MELT&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="9133" data-end="9225"&gt;Together these form a complete learning path, from monitoring basics to full observability.&lt;/P&gt;
&lt;H4 data-start="9232" data-end="9247"&gt;Conclusion&lt;/H4&gt;
&lt;P data-start="9249" data-end="9469"&gt;Azure Monitor is more than a tool; it’s the &lt;STRONG data-start="9292" data-end="9318"&gt;observability backbone&lt;/STRONG&gt; of Azure. Once you understand its layers, the rest of the ecosystem (health alerts, workbooks, advanced rules, and MELT) falls naturally into place.&lt;/P&gt;
&lt;P data-start="9471" data-end="9611"&gt;Start simple. Connect a resource, explore your workspace, and let data guide your next question. That’s when monitoring becomes insight.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Oct 2025 14:24:09 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/startups-at-microsoft/azure-monitor-101-the-missing-guide-to-understanding-monitoring/ba-p/4462799</guid>
      <dc:creator>rmmartins</dc:creator>
      <dc:date>2025-10-20T14:24:09Z</dc:date>
    </item>
    <item>
      <title>Monitoring Azure OpenAI without switching from your existing observability platform</title>
      <link>https://techcommunity.microsoft.com/t5/startups-at-microsoft/monitoring-azure-openai-without-switching-from-your-existing/ba-p/4458898</link>
      <description>&lt;P&gt;Recently, one of my customers asked me a simple but powerful question:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;“We already use Datadog for observability, but the rate-limit metrics we see in the Azure Portal don’t match what we get in Datadog. Why does Azure show higher TPM numbers?”&lt;/EM&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;BR /&gt;That question led to a deeper conversation about how Azure measures rate limits for Azure OpenAI.&lt;/P&gt;
&lt;P&gt;&lt;SPAN style="font-style: var(--lia-blog-font-style); font-family: var(--lia-blog-font-family); font-size: var(--lia-bs-font-size-base); -webkit-tap-highlight-color: hsla(var(--lia-bs-black-h),var(--lia-bs-black-s),var(--lia-bs-black-l),0); -webkit-text-size-adjust: 100%;"&gt;They weren’t trying to move away from Datadog; in fact, they already have a mature observability stack built on it. They simply wanted to understand and monitor Azure OpenAI usage directly in the portal.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN style="font-style: var(--lia-blog-font-style); font-family: var(--lia-blog-font-family); font-size: var(--lia-bs-font-size-base); -webkit-tap-highlight-color: hsla(var(--lia-bs-black-h),var(--lia-bs-black-s),var(--lia-bs-black-l),0); -webkit-text-size-adjust: 100%;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN style="font-style: var(--lia-blog-font-style); font-family: var(--lia-blog-font-family); font-size: var(--lia-bs-font-size-base); -webkit-tap-highlight-color: hsla(var(--lia-bs-black-h),var(--lia-bs-black-s),var(--lia-bs-black-l),0); -webkit-text-size-adjust: 100%;"&gt;After reviewing &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/quota" target="_blank" rel="noopener"&gt;the documentation&lt;/A&gt; and confirming with the Azure OpenAI engineering team, the answer made sense:&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Azure’s Tokens-Per-Minute (TPM) metric is based on an estimated token count derived from the character length of the request, not the exact tokenized count used for billing.&lt;/LI&gt;
&lt;LI&gt;This estimate accounts for the worst-case request scenario (prompt + max_tokens + best_of), so Azure’s TPM can appear “inflated” compared to Datadog, which measures actual tokens consumed after completion.&lt;/LI&gt;
&lt;/UL&gt;
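&lt;P&gt;To make the gap concrete, here is a minimal Python sketch that puts a worst-case estimate next to actual usage. It is illustrative only: the four-characters-per-token ratio, the function name estimate_rate_limit_tokens, and the sample usage numbers are assumptions, not Azure’s published formula.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import math

# Assumption: a rough characters-per-token heuristic, not the service's real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_rate_limit_tokens(prompt, max_tokens, best_of=1):
    """Worst-case token estimate for one request (illustrative only)."""
    prompt_estimate = math.ceil(len(prompt) / CHARS_PER_TOKEN)
    # Every candidate completion is assumed to use the full max_tokens budget.
    return prompt_estimate + max_tokens * best_of


prompt = "Summarize the quarterly report in three bullet points."
estimated = estimate_rate_limit_tokens(prompt, max_tokens=800)

# Hypothetical billed usage reported after completion: prompt + completion tokens.
actual = 14 + 120

print("estimated for rate limiting:", estimated)
print("actually consumed:", actual)
&lt;/LI-CODE&gt;
&lt;P&gt;Because the estimate reserves the full max_tokens budget up front, a request that returns a short completion still counts heavily against the rate limit. Under this sketch, lowering max_tokens to what the workload really needs narrows the gap.&lt;/P&gt;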
&lt;P&gt;That conversation inspired this post because many customers find themselves in a similar spot: they already have powerful observability tools but still want quick, built-in visibility into Azure OpenAI usage and rate limits without adding new integrations or switching platforms.&lt;/P&gt;
&lt;H4&gt;The two monitoring paths&lt;/H4&gt;
&lt;P&gt;When it comes to monitoring Azure OpenAI, there are two main options:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;1. The full flow &lt;/STRONG&gt;(most powerful, requires Log Analytics): This unlocks correlation, deep queries, and exporting metrics/logs to external tools.&lt;/P&gt;
&lt;img&gt;
&lt;P&gt;Azure OpenAI Service → Azure Monitor → Log Analytics → KQL, Workbooks, Alerts → integrations like Datadog, Grafana.&lt;/P&gt;
&lt;/img&gt;
&lt;P&gt;&lt;STRONG&gt;2. The lightweight flow&amp;nbsp;&lt;/STRONG&gt;(fast, free, no Log Analytics): This is what we’ll explore: simple dashboards and quota-based alerts right in the Azure Portal.&lt;/P&gt;
&lt;img&gt;
&lt;P&gt;Azure OpenAI Service → Azure Monitor (Metrics) → Portal Workbooks + Alerts.&lt;/P&gt;
&lt;/img&gt;
&lt;H4&gt;Metrics available in Azure OpenAI&lt;/H4&gt;
&lt;P&gt;Azure OpenAI publishes several key metrics natively (no ingestion required). According to the &lt;A href="https://learn.microsoft.com/azure/ai-foundry/openai/monitor-openai-reference" target="_blank" rel="noopener"&gt;official documentation&lt;/A&gt;:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Processed Inference Tokens → tokens consumed (prompt + completion).&lt;/LI&gt;
&lt;LI&gt;Azure OpenAI Requests → total API calls.&lt;/LI&gt;
&lt;LI&gt;Request Errors → failed requests (429s, 5xx).&lt;/LI&gt;
&lt;LI&gt;Availability Rate → percentage of successful calls.&lt;/LI&gt;
&lt;LI&gt;Latency metrics → TTFT (time to first token), TTLB (time to last byte).&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;You can view these under:&amp;nbsp;&lt;STRONG&gt;AOAI Resource → Monitoring → Metrics.&lt;BR /&gt;&lt;BR /&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;img&gt;
&lt;P&gt;Azure OpenAI exposes native metrics like tokens, requests, errors, and latency directly in the Azure Portal&lt;/P&gt;
&lt;/img&gt;
&lt;H4&gt;Quotas: The other half of the picture&lt;/H4&gt;
&lt;P&gt;Metrics tell you usage. Quotas tell you capacity. Every deployment has fixed Tokens per Minute (TPM) and Requests per Minute (RPM) limits. You can find these under: &lt;STRONG&gt;Azure AI Foundry portal → Deployments → Select Deployment → Rate Limits.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Example:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;GPT-4.1-mini deployment →&lt;STRONG&gt; 250,000 TPM / 250 RPM&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;These are the values you’ll compare against metrics and use in alerts.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;img&gt;
&lt;P&gt;Each deployment has fixed TPM/RPM quotas. Here, GPT-4.1-mini is capped at 250,000 TPM and 250 RPM.&lt;/P&gt;
&lt;/img&gt;
&lt;P&gt;If you prefer a programmatic approach, you can run this command:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;az rest --method get \
  --url "https://management.azure.com/subscriptions/&amp;lt;subscriptionId&amp;gt;/resourceGroups/&amp;lt;resourceGroup&amp;gt;/providers/Microsoft.CognitiveServices/accounts/&amp;lt;accountName&amp;gt;/deployments/&amp;lt;deploymentName&amp;gt;?api-version=2023-05-01" \
  --query "{deployment:name, TPM:properties.rateLimits[?key=='token'].count | [0], RPM:properties.rateLimits[?key=='request'].count | [0]}"
&lt;/LI-CODE&gt;
&lt;P&gt;Sample output:&lt;/P&gt;
&lt;LI-CODE lang="json"&gt;{
  "RPM": 250,
  "TPM": 250000,
  "deployment": "gpt-4.1-mini"
}&lt;/LI-CODE&gt;
&lt;H4&gt;Building a lightweight workbook&lt;/H4&gt;
&lt;P&gt;Even without Log Analytics, you can build a simple workbook to track usage vs quota:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Go to &lt;STRONG&gt;Azure Monitor → Workbooks → + New&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;Add a metric visualization for Processed Inference Tokens (Sum).
&lt;UL&gt;
&lt;LI&gt;Resource Type: Azure AI Foundry&lt;/LI&gt;
&lt;LI&gt;Azure AI Foundry: Select your instance&lt;/LI&gt;
&lt;LI&gt;Click to add metric&lt;/LI&gt;
&lt;LI&gt;Metric: Processed Inference Tokens&lt;/LI&gt;
&lt;LI&gt;Aggregation: Sum&lt;/LI&gt;
&lt;LI&gt;Display name: Token Usage vs Quota&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Add another metric for Azure OpenAI Requests (Count).
&lt;UL&gt;
&lt;LI&gt;Metric: Azure OpenAI Requests&lt;/LI&gt;
&lt;LI&gt;Aggregation: Count&lt;/LI&gt;
&lt;LI&gt;Display name: Requests per Minute vs Quota&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Click &lt;STRONG&gt;Run Metrics&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;Save as AOAI Usage vs Capacity.&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;img&gt;
&lt;P&gt;Workbooks let you visualize token and request usage against your deployment’s fixed quotas&lt;/P&gt;
&lt;/img&gt;
&lt;H4&gt;Creating alerts (proactive notification)&lt;/H4&gt;
&lt;P&gt;From the portal you can also configure alerts directly on metrics:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Go to &lt;STRONG&gt;Azure Monitor → Alerts → + Create → Alert rule&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;Scope = your AOAI resource.&lt;/LI&gt;
&lt;LI&gt;Condition step:
&lt;UL&gt;
&lt;LI&gt;Signal name = Processed Inference Tokens.&lt;/LI&gt;
&lt;LI&gt;Threshold type: Static&lt;/LI&gt;
&lt;LI&gt;Value is: Greater than&lt;/LI&gt;
&lt;LI&gt;Unit: Count&lt;/LI&gt;
&lt;LI&gt;Threshold = 200,000 (warning) or 250,000 (critical).&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Actions step:
&lt;UL&gt;
&lt;LI&gt;Use &lt;STRONG&gt;Quick Actions &lt;/STRONG&gt;→ add your email (or Azure mobile push).&lt;/LI&gt;
&lt;LI&gt;Or create an &lt;STRONG&gt;Action Group &lt;/STRONG&gt;for Teams/webhook integration.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Details step:
&lt;UL&gt;
&lt;LI&gt;Name = AOAI-TPM-Warning / AOAI-TPM-Critical.&lt;/LI&gt;
&lt;LI&gt;Severity = 2 (Warning) or 0 (Critical).&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Review + Create.&lt;/LI&gt;
&lt;LI&gt;Repeat for &lt;STRONG&gt;Azure OpenAI Requests&lt;/STRONG&gt; with thresholds of 200 (warning) and 250 (critical).&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;STRONG&gt;Alert conditions:&lt;BR /&gt;&lt;BR /&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;img&gt;
&lt;P&gt;Configure alert conditions directly on metrics. Here, we trigger at 200,000 tokens per minute (80% of quota)&lt;/P&gt;
&lt;/img&gt;
&lt;P&gt;&lt;STRONG&gt;Quick Actions:&lt;BR /&gt;&lt;BR /&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;img&gt;
&lt;P&gt;Quick Actions let you add email or mobile notifications without creating a full Action Group.&lt;/P&gt;
&lt;/img&gt;
&lt;P&gt;&lt;STRONG&gt;Overview from the Alert:&lt;BR /&gt;&lt;BR /&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;img&gt;
&lt;P&gt;Give your alert a descriptive name and severity. Here, AOAI-TPM-Warning at Severity 2.&lt;/P&gt;
&lt;/img&gt;
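&lt;P&gt;The thresholds used in the steps above are simply 80% and 100% of the deployment quota. If you manage several deployments, a small helper like this sketch can derive them for you. The function name alert_thresholds and the 80% default are illustrative choices, not an Azure API.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;def alert_thresholds(quota_per_minute, warning_percent=80):
    """Return warning and critical alert thresholds for a per-minute quota."""
    # Integer math avoids floating-point rounding surprises.
    warning = quota_per_minute * warning_percent // 100
    return {"warning": warning, "critical": quota_per_minute}


# Quotas taken from the GPT-4.1-mini example in this post.
for name, quota in {"TPM": 250_000, "RPM": 250}.items():
    print(name, alert_thresholds(quota))
&lt;/LI-CODE&gt;
&lt;P&gt;For the example deployment this yields 200,000 and 250,000 for TPM, and 200 and 250 for RPM, matching the values entered in the alert rules.&lt;/P&gt;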
&lt;H4&gt;How this helps with 429 errors&lt;/H4&gt;
&lt;P&gt;One of the most common issues Azure OpenAI customers face is the dreaded &lt;STRONG&gt;“Too Many Requests” (429)&lt;/STRONG&gt; error.&lt;/P&gt;
&lt;P&gt;Why it happens:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Each deployment enforces hard TPM/RPM quotas.&lt;/LI&gt;
&lt;LI&gt;If you send more tokens or requests than allowed in a minute, the service rejects them with a 429.&lt;/LI&gt;
&lt;LI&gt;You may see headers like x-ms-retry-after-ms telling you how long to wait.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;How monitoring helps:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Metrics as early warning&lt;/STRONG&gt;: Watching token/request metrics shows when you’re approaching the cap.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Alerts before throttling&lt;/STRONG&gt;: Warning alerts at 80% (200k TPM / 200 RPM) give you time to react before 429s hit.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Critical alerts at 100%&lt;/STRONG&gt;: Confirm you’ve saturated the quota and need to adjust.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Important note:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Monitoring doesn’t prevent 429s; your app should still implement retry with backoff and consider batching or queuing requests.&lt;/LI&gt;
&lt;LI&gt;But with this setup, you’ll know before the error storm begins, and can respond faster.&lt;/LI&gt;
&lt;/UL&gt;
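&lt;P&gt;As a rough illustration of retry with backoff, here is a minimal Python sketch. It assumes a hypothetical RateLimitError that carries the 429 response headers with lowercase names, honors a server wait hint when one is present, and otherwise falls back to capped exponential backoff with jitter. The header names beyond the one mentioned above, the retry-after value being in seconds, and the defaults are all assumptions; check what your client library actually exposes.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import random
import time


class RateLimitError(Exception):
    """Hypothetical exception carrying the headers of a 429 response."""

    def __init__(self, headers=None):
        super().__init__("429 Too Many Requests")
        self.headers = headers or {}


def wait_seconds(headers, attempt, base=1.0, cap=30.0):
    """Prefer the server's hint; otherwise use capped exponential backoff with jitter."""
    # Assumption: header names are lowercase and the hint is a plain number.
    hint_ms = headers.get("x-ms-retry-after-ms") or headers.get("retry-after-ms")
    if hint_ms is not None:
        return float(hint_ms) / 1000.0
    hint_s = headers.get("retry-after")
    if hint_s is not None:
        return float(hint_s)
    # Full jitter: pick a random delay up to the capped exponential ceiling.
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() and retry on RateLimitError, re-raising after the last attempt."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimitError as err:
            if attempt == max_attempts - 1:
                raise
            sleep(wait_seconds(err.headers, attempt))
&lt;/LI-CODE&gt;
&lt;P&gt;Wrap your own request function, for example call_with_retry(lambda: client_call(payload)), where client_call is whatever function sends the request and raises on a 429. Passing sleep as a parameter keeps the helper easy to test without real waiting.&lt;/P&gt;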
&lt;H4&gt;Why this matters&lt;/H4&gt;
&lt;P&gt;For many companies, time-to-value is more important than building a new monitoring stack.&lt;/P&gt;
&lt;P&gt;This approach means:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;No Log Analytics ingestion.&lt;/LI&gt;
&lt;LI&gt;No need to replace Datadog or Splunk.&lt;/LI&gt;
&lt;LI&gt;Free visibility into &lt;STRONG&gt;usage vs quota&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;Proactive notifications on approaching limits.&lt;/LI&gt;
&lt;LI&gt;Fewer surprises with 429 errors.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;And if later you want deeper insights, you can still enable Log Analytics and export into your existing observability platform.&lt;/P&gt;
&lt;H4&gt;References:&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/monitor-openai" target="_blank"&gt;Monitor Azure OpenAI in Azure AI Foundry Models&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/azure-monitor/visualize/workbooks-overview" target="_blank"&gt;Azure Workbooks overview - Azure Monitor&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/azure-monitor/visualize/workbooks-templates" target="_blank"&gt;Azure Workbooks templates - Azure Monitor&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://techcommunity.microsoft.com/blog/fasttrackforazureblog/azure-openai-insights-monitoring-ai-with-confidence/4026850" target="_blank"&gt;Monitoring Azure OpenAI&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/ai-services/diagnostic-logging" target="_blank"&gt;Enable diagnostic logging - Azure AI services&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://techcommunity.microsoft.com/blog/startupsatmicrosoftblog/the-importance-of-setting-up-service-and-resource-health-monitoring-in-azure/4372478" target="_blank"&gt;The importance of setting up Service and Resource Health monitoring in Azure&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;Closing thoughts&lt;/H4&gt;
&lt;P&gt;This article was inspired by a customer request, but I believe many others will benefit from the same approach. In just a few minutes, you can build a dashboard, set alerts, and gain confidence in your Azure OpenAI usage, all without leaving the Azure Portal.&lt;BR /&gt;&lt;BR /&gt;I’d love to hear from you: how is your team monitoring Azure OpenAI today? Share in the comments; your feedback will help shape what we build next.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Oct 2025 20:16:23 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/startups-at-microsoft/monitoring-azure-openai-without-switching-from-your-existing/ba-p/4458898</guid>
      <dc:creator>rmmartins</dc:creator>
      <dc:date>2025-10-09T20:16:23Z</dc:date>
    </item>
    <item>
      <title>Azure routing preference: A hidden lever for performance vs. cost trade-offs</title>
      <link>https://techcommunity.microsoft.com/t5/startups-at-microsoft/azure-routing-preference-a-hidden-lever-for-performance-vs-cost/ba-p/4451425</link>
      <description>&lt;img /&gt;
&lt;P data-start="313" data-end="540"&gt;For Digital Native companies, every engineering decision is also a business decision. How you design your cloud architecture affects not just performance but also your burn rate, margins, and ultimately your ability to scale.&lt;/P&gt;
&lt;P data-start="542" data-end="729"&gt;One of the most overlooked levers in Azure networking is &lt;STRONG data-start="599" data-end="621"&gt;Routing Preference&lt;/STRONG&gt;, a simple setting that determines how your outbound internet traffic leaves Azure. The choice is binary:&lt;/P&gt;
&lt;UL data-start="731" data-end="920"&gt;
&lt;LI data-start="731" data-end="844"&gt;&lt;STRONG data-start="733" data-end="771"&gt;Microsoft Global Network (Premium): &lt;/STRONG&gt;High-quality, low-latency routing on Microsoft’s backbone (default).&lt;/LI&gt;
&lt;LI data-start="845" data-end="920"&gt;&lt;STRONG data-start="847" data-end="881"&gt;ISP Transit (Internet Routing): &lt;/STRONG&gt;Lower-cost routing via local ISPs.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="922" data-end="1064"&gt;Most startups never change the default, but understanding when to switch can save you serious money without sacrificing customer experience.&lt;/P&gt;
&lt;H4 data-start="1071" data-end="1110"&gt;Why it matters for digital natives&lt;/H4&gt;
&lt;P data-start="1112" data-end="1339"&gt;Bandwidth is one of those quiet COGS items (&lt;EM data-start="1060" data-end="1138"&gt;Cost of Goods Sold, the direct cost of delivering your product to customers&lt;/EM&gt;) that doesn’t make noise until the bill arrives. If your product depends on moving data, whether streaming, analytics, or SaaS APIs, outbound traffic is part of your unit economics.&lt;/P&gt;
&lt;P data-start="1341" data-end="1426"&gt;Routing Preference is your &lt;STRONG data-start="1368" data-end="1407"&gt;toggle between performance and cost&lt;/STRONG&gt;. Think of it as:&lt;/P&gt;
&lt;UL data-start="1427" data-end="1582"&gt;
&lt;LI data-start="1427" data-end="1498"&gt;&lt;STRONG data-start="1429" data-end="1465"&gt;Business class routing (Premium): &lt;/STRONG&gt;smoother ride, higher price.&lt;/LI&gt;
&lt;LI data-start="1499" data-end="1582"&gt;&lt;STRONG data-start="1501" data-end="1526"&gt;Economy routing (ISP): &lt;/STRONG&gt;cheaper seat, gets you there, but less predictable.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4 data-start="1589" data-end="1610"&gt;Pricing Snapshot&lt;/H4&gt;
&lt;P data-start="1612" data-end="1755"&gt;Outbound internet rates vary by region, and they differ sharply between routing options. For example (first 10 TB/month, beyond free 100 GB):&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 71.7593%; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;&amp;nbsp;Region&lt;/th&gt;&lt;th&gt;&amp;nbsp;Microsoft Global Network (Premium)&lt;/th&gt;&lt;th&gt;&amp;nbsp;ISP Transit (Internet Routing)&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;United States&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;$0.087 / GB&lt;/td&gt;&lt;td&gt;$0.04 / GB&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Australia&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;$0.12 / GB&lt;/td&gt;&lt;td&gt;$0.06 / GB&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://azure.microsoft.com/en-us/pricing/details/bandwidth/" target="_blank" rel="noopener" data-start="2073" data-end="2160"&gt;Azure Bandwidth Pricing&lt;/A&gt;&lt;/P&gt;
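&lt;P&gt;To see what these rates mean for a monthly bill, here is a small Python sketch using the figures quoted in the table above. The function name monthly_egress_cost is illustrative, and the calculation assumes all billable traffic stays within the first 10 TB tier; always confirm current rates on the pricing page.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;FREE_GB = 100  # free outbound allowance per month, as stated above

# First-10-TB rates per GB, copied from the table in this post.
RATES_PER_GB = {
    "United States": {"premium": 0.087, "isp": 0.04},
    "Australia": {"premium": 0.12, "isp": 0.06},
}


def monthly_egress_cost(region, routing, egress_gb):
    """Estimate monthly outbound cost in USD, assuming usage stays in the first tier."""
    billable_gb = max(0, egress_gb - FREE_GB)
    return round(billable_gb * RATES_PER_GB[region][routing], 2)


for routing in ("premium", "isp"):
    print(routing, monthly_egress_cost("United States", routing, 5100))
&lt;/LI-CODE&gt;
&lt;P&gt;With the U.S. rates in the table, 5,100 GB of monthly egress (5,000 GB billable) comes to about $435 on Premium and $200 on ISP Transit.&lt;/P&gt;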
&lt;H4 data-start="2169" data-end="2223"&gt;Which Azure resources support routing preference?&lt;/H4&gt;
&lt;P data-start="2225" data-end="2357"&gt;Routing Preference applies to any Azure resource backed by a&amp;nbsp;&lt;STRONG data-start="2330" data-end="2343"&gt;Public IP&lt;/STRONG&gt;, including:&lt;/P&gt;
&lt;UL data-start="2359" data-end="2588"&gt;
&lt;LI data-start="2359" data-end="2385"&gt;Virtual Machines (VMs)&lt;/LI&gt;
&lt;LI data-start="2386" data-end="2423"&gt;Virtual Machine Scale Sets (VMSS)&lt;/LI&gt;
&lt;LI data-start="2424" data-end="2458"&gt;Azure Kubernetes Service (AKS)&lt;/LI&gt;
&lt;LI data-start="2459" data-end="2504"&gt;Public Load Balancers (NIC-based backend)&lt;/LI&gt;
&lt;LI data-start="2505" data-end="2528"&gt;Application Gateway&lt;/LI&gt;
&lt;LI data-start="2529" data-end="2547"&gt;Azure Firewall&lt;/LI&gt;
&lt;LI data-start="2548" data-end="2588"&gt;Storage Accounts (Blob, Files, etc.)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="2590" data-end="2721"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/routing-preference-overview" target="_blank" rel="noopener" data-start="2593" data-end="2719"&gt;Routing preference overview&lt;/A&gt;&lt;/P&gt;
&lt;P data-start="2590" data-end="2721"&gt;&lt;STRONG&gt;How to configure it&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P data-start="2590" data-end="2721"&gt;&lt;U&gt;Public IP Example (CLI)&lt;/U&gt;&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;az network public-ip create \
  --name MyPublicIP \
  --resource-group MyResourceGroup \
  --location eastus \
  --ip-tags 'RoutingPreference=Internet' \
  --sku Standard \
  --allocation-method Static \
  --version IPv4&lt;/LI-CODE&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;
&lt;P data-start="3212" data-end="3222"&gt;Docs:&lt;/P&gt;
&lt;UL data-start="3223" data-end="3485"&gt;
&lt;LI data-start="3223" data-end="3356"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/routing-preference-portal" target="_blank" rel="noopener" data-start="3225" data-end="3354"&gt;Routing Preference for Public IP&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-start="3357" data-end="3485"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/storage/common/network-routing-preference" target="_blank" rel="noopener" data-start="3359" data-end="3483"&gt;Routing Preference for Storage Accounts&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4 data-start="3492" data-end="3522"&gt;A digital native playbook&lt;/H4&gt;
&lt;P data-start="3524" data-end="3566"&gt;Here’s a quick guide to help you decide:&lt;/P&gt;
&lt;table border="1" style="width: 89.8148%; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Scenario&lt;/th&gt;&lt;th&gt;Recommended Routing&lt;/th&gt;&lt;th&gt;Why&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Latency-sensitive SaaS APIs&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Premium (Global Network)&lt;/td&gt;&lt;td&gt;Predictable performance, better customer experience&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Dev/Test environments&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;ISP Transit&lt;/td&gt;&lt;td&gt;Optimize cost where performance isn’t critical&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Bulk log exports, backups&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;ISP Transit&lt;/td&gt;&lt;td&gt;Cut bandwidth costs significantly&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Production workloads with end-users&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Premium&lt;/td&gt;&lt;td&gt;Protect SLA and latency for customers&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H4 data-start="4045" data-end="4063"&gt;Key takeaways&lt;/H4&gt;
&lt;UL data-start="4065" data-end="4491"&gt;
&lt;LI data-start="4065" data-end="4146"&gt;By default, you’re paying for &lt;STRONG data-start="4097" data-end="4116"&gt;Premium routing&lt;/STRONG&gt; whether you need it or not.&lt;/LI&gt;
&lt;LI data-start="4147" data-end="4238"&gt;ISP Transit can &lt;STRONG data-start="4165" data-end="4193"&gt;cut costs nearly in half, &lt;/STRONG&gt;a huge win for cost-sensitive workloads.&lt;/LI&gt;
&lt;LI data-start="4239" data-end="4340"&gt;Routing Preference applies to &lt;STRONG data-start="4271" data-end="4337"&gt;VMs, AKS, Load Balancers, App Gateways, Firewalls, and Storage&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI data-start="4341" data-end="4491"&gt;The right choice depends on your &lt;STRONG data-start="4376" data-end="4410"&gt;growth stage and workload type&lt;/STRONG&gt;: optimize for performance where it matters, optimize for cost everywhere else.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4 data-start="4498" data-end="4511"&gt;Closing&lt;/H4&gt;
&lt;P data-start="4513" data-end="4793"&gt;For digital natives, scaling is a balance: you need to &lt;STRONG data-start="4568" data-end="4589"&gt;delight customers&lt;/STRONG&gt; while &lt;STRONG data-start="4596" data-end="4613"&gt;watching COGS&lt;/STRONG&gt;. Routing Preference is a small Azure feature that gives you a big lever on both. Next time you spin up a VM, AKS cluster, or Storage account, don’t just accept the defaults.&lt;/P&gt;
&lt;P data-start="4795" data-end="4910"&gt;Ask:&amp;nbsp;&lt;EM data-start="4803" data-end="4849"&gt;Do I want business class routing or economy? &lt;/EM&gt;That one decision could save you thousands as you scale.&lt;/P&gt;
&lt;P data-start="4795" data-end="4910"&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 05 Sep 2025 20:40:21 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/startups-at-microsoft/azure-routing-preference-a-hidden-lever-for-performance-vs-cost/ba-p/4451425</guid>
      <dc:creator>rmmartins</dc:creator>
      <dc:date>2025-09-05T20:40:21Z</dc:date>
    </item>
    <item>
      <title>Azure Quota Alerts (Preview): Still overlooked, but incredibly useful</title>
      <link>https://techcommunity.microsoft.com/t5/startups-at-microsoft/azure-quota-alerts-preview-still-overlooked-but-incredibly/ba-p/4447140</link>
      <description>&lt;P data-start="322" data-end="496"&gt;Quota limits are one of those hidden blockers that can catch digital native companies by surprise. You’re scaling fast, deploying more VMs or GPU nodes, and suddenly:&amp;nbsp;&lt;STRONG data-start="473" data-end="494"&gt;“Quota exceeded.”&lt;/STRONG&gt;&lt;/P&gt;
&lt;P data-start="498" data-end="733"&gt;Since late 2024, Azure has offered &lt;STRONG data-start="533" data-end="559"&gt;Quota Alerts (Preview), &lt;/STRONG&gt;a built-in way to monitor and get notified before you hit subscription limits. It’s not brand new, but many digital native companies still aren’t taking advantage of it.&lt;/P&gt;
&lt;H4 data-start="740" data-end="790"&gt;Why this matters for startups &amp;amp; digital natives&lt;/H4&gt;
&lt;UL data-start="791" data-end="1105"&gt;
&lt;LI data-start="791" data-end="877"&gt;&lt;STRONG data-start="793" data-end="820"&gt;Avoid outages at scale:&lt;/STRONG&gt; deployments won’t suddenly fail due to quota ceilings.&lt;/LI&gt;
&lt;LI data-start="878" data-end="949"&gt;&lt;STRONG data-start="880" data-end="906"&gt;Save engineering time:&lt;/STRONG&gt; no need for custom monitoring pipelines.&lt;/LI&gt;
&lt;LI data-start="950" data-end="1015"&gt;&lt;STRONG data-start="952" data-end="969"&gt;Simple setup:&lt;/STRONG&gt; alerts in minutes directly from the portal.&lt;/LI&gt;
&lt;LI data-start="1016" data-end="1105"&gt;&lt;STRONG data-start="1018" data-end="1046"&gt;Fits existing workflows:&lt;/STRONG&gt; integrates with Action Groups (email, Teams, PagerDuty).&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4 data-start="1112" data-end="1161"&gt;How to create a quota alert in Azure (Preview)&lt;/H4&gt;
&lt;P data-start="1163" data-end="1197"&gt;&lt;STRONG&gt;1. Open &lt;EM data-start="1175" data-end="1183"&gt;Quotas&lt;/EM&gt; in the Portal&lt;/STRONG&gt;&lt;/P&gt;
&lt;P data-start="1198" data-end="1258"&gt;Search &lt;STRONG data-start="1205" data-end="1215"&gt;Quotas&lt;/STRONG&gt; in the Azure Portal and go to the blade.&lt;/P&gt;
&lt;P data-start="1265" data-end="1301"&gt;&lt;STRONG&gt;2. Go to &lt;EM data-start="1278" data-end="1301"&gt;Alert rules (Preview)&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P data-start="1302" data-end="1376"&gt;Click &lt;STRONG data-start="1308" data-end="1333"&gt;Alert rules (Preview)&lt;/STRONG&gt;, then &lt;STRONG data-start="1340" data-end="1373"&gt;+ Create Alert Rule (Preview)&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P data-start="1383" data-end="1409"&gt;&lt;STRONG&gt;3. Configure the Alert&lt;BR /&gt;&lt;BR /&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P data-start="1410" data-end="1432"&gt;On the &lt;STRONG data-start="668" data-end="695"&gt;Create usage alert rule&lt;/STRONG&gt; page:&lt;/P&gt;
&lt;UL data-start="705" data-end="1229"&gt;
&lt;LI data-start="705" data-end="752"&gt;&lt;STRONG data-start="707" data-end="723"&gt;Subscription&lt;/STRONG&gt; → choose the subscription.&lt;/LI&gt;
&lt;LI data-start="753" data-end="798"&gt;&lt;STRONG data-start="755" data-end="767"&gt;Provider&lt;/STRONG&gt; → e.g., &lt;EM data-start="776" data-end="785"&gt;Compute&lt;/EM&gt; for vCPUs.&lt;/LI&gt;
&lt;LI data-start="799" data-end="860"&gt;&lt;STRONG data-start="801" data-end="820"&gt;Alert rule name&lt;/STRONG&gt; → e.g., &lt;EM data-start="829" data-end="857"&gt;Quota Alert – EastUS vCPUs&lt;/EM&gt;.&lt;/LI&gt;
&lt;LI data-start="861" data-end="901"&gt;&lt;STRONG data-start="863" data-end="876"&gt;Threshold&lt;/STRONG&gt; → usage % (e.g., 80%).&lt;/LI&gt;
&lt;LI data-start="902" data-end="946"&gt;&lt;STRONG data-start="904" data-end="916"&gt;Severity&lt;/STRONG&gt; → pick according to policy.&lt;/LI&gt;
&lt;LI data-start="947" data-end="998"&gt;&lt;STRONG data-start="949" data-end="976"&gt;Frequency of evaluation&lt;/STRONG&gt; → e.g., 15 minutes.&lt;/LI&gt;
&lt;LI data-start="999" data-end="1042"&gt;&lt;STRONG data-start="1001" data-end="1019"&gt;Resource group&lt;/STRONG&gt; → select/create one.&lt;/LI&gt;
&lt;LI data-start="1043" data-end="1091"&gt;&lt;STRONG data-start="1045" data-end="1065"&gt;Managed Identity&lt;/STRONG&gt; → click &lt;STRONG data-start="1074" data-end="1088"&gt;Create new&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI data-start="1092" data-end="1147"&gt;&lt;STRONG data-start="1094" data-end="1110"&gt;Notify me by&lt;/STRONG&gt; → email, Action Group, Teams, etc.&lt;/LI&gt;
&lt;LI data-start="1148" data-end="1229"&gt;&lt;STRONG data-start="1150" data-end="1164"&gt;Dimensions&lt;/STRONG&gt; → select &lt;STRONG data-start="1174" data-end="1186"&gt;Location&lt;/STRONG&gt; and &lt;STRONG data-start="1191" data-end="1200"&gt;Quota&lt;/STRONG&gt; (e.g., DSv5 Family vCPUs).&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="1796" data-end="1820"&gt;Save, and you’re done. You can find more detailed configuration options in the&amp;nbsp;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/quotas/how-to-guide-monitoring-alerting" target="_blank" rel="noopener" data-start="546" data-end="652"&gt;official Microsoft docs&lt;/A&gt;.&lt;/P&gt;
&lt;P data-start="1260" data-end="1311"&gt;&lt;STRONG&gt;4. Assign Permissions to the Managed Identity&lt;/STRONG&gt;&lt;/P&gt;
&lt;P data-start="1312" data-end="1439"&gt;When the new Managed Identity is created (e.g., quota-alert-managed-identity), you must give it access to read quota usage.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL data-start="1441" data-end="1800"&gt;
&lt;LI data-start="1441" data-end="1488"&gt;Go to &lt;STRONG data-start="1449" data-end="1471"&gt;Managed Identities&lt;/STRONG&gt; in the portal.&lt;/LI&gt;
&lt;LI data-start="1489" data-end="1541"&gt;Select the identity created for the quota alert.&lt;/LI&gt;
&lt;LI data-start="1542" data-end="1616"&gt;Open &lt;STRONG data-start="1549" data-end="1575"&gt;Azure role assignments&lt;/STRONG&gt; → &lt;STRONG data-start="1578" data-end="1613"&gt;+ Add role assignment (Preview)&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI data-start="1617" data-end="1790"&gt;Set:
&lt;UL data-start="1628" data-end="1790"&gt;
&lt;LI data-start="1628" data-end="1655"&gt;&lt;STRONG data-start="1630" data-end="1639"&gt;Scope&lt;/STRONG&gt;: Subscription&lt;/LI&gt;
&lt;LI data-start="1658" data-end="1723"&gt;&lt;STRONG data-start="1660" data-end="1676"&gt;Subscription&lt;/STRONG&gt;: the subscription where quotas are monitored&lt;/LI&gt;
&lt;LI data-start="1726" data-end="1790"&gt;&lt;STRONG data-start="1728" data-end="1736"&gt;Role&lt;/STRONG&gt;: &lt;STRONG data-start="1738" data-end="1748"&gt;Reader&lt;/STRONG&gt; (or any role that includes read access)&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI data-start="1791" data-end="1800"&gt;Save.&lt;/LI&gt;
&lt;/UL&gt;
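&lt;P&gt;If you prefer the CLI over the portal for this step, the same Reader assignment can be sketched as below. This is a rough sketch, not something from the official docs: subscription_scope and grant_quota_reader are made-up helper names, the resource group in the example is a placeholder, and it assumes you are already logged in with az login.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Build the subscription-level scope string used by role assignments.
subscription_scope() {
  echo "/subscriptions/$1"
}

# Hypothetical helper: grant Reader on the current subscription
# to a user-assigned managed identity.
grant_quota_reader() {
  local identity_name="$1" resource_group="$2"
  local subscription_id principal_id
  subscription_id=$(az account show --query id -o tsv)
  principal_id=$(az identity show --name "$identity_name" \
    --resource-group "$resource_group" --query principalId -o tsv)
  az role assignment create \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "Reader" \
    --scope "$(subscription_scope "$subscription_id")"
}

# Example (placeholder resource group name):
# grant_quota_reader quota-alert-managed-identity my-alerts-rg&lt;/LI-CODE&gt;
&lt;P&gt;Passing the principal ID with an explicit principal type avoids a directory lookup, which helps when the identity was created only seconds ago.&lt;/P&gt;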
&lt;P data-start="1827" data-end="1845"&gt;&lt;STRONG&gt;5. Track &amp;amp; Act&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-start="1846" data-end="1977"&gt;
&lt;LI data-start="1846" data-end="1914"&gt;Your rules are visible under &lt;STRONG data-start="1877" data-end="1911"&gt;Quotas → Alert rules (Preview)&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI data-start="1915" data-end="1977"&gt;Triggered alerts show up under &lt;STRONG data-start="1948" data-end="1974"&gt;Fired alerts (Preview)&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;/UL&gt;
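&lt;P&gt;Between alert evaluations, you can also read the same usage numbers ad hoc from the CLI. A minimal sketch, assuming a logged-in Azure CLI: usage_percent and show_vcpu_usage are hypothetical helper names, and the region and 80% threshold simply mirror the examples above.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Hypothetical helper: integer percentage of a quota that is consumed.
usage_percent() {
  local current="$1" limit="$2"
  if [ "$limit" -eq 0 ]; then
    echo 0
    return
  fi
  echo $(( current * 100 / limit ))
}

# Hypothetical helper: print compute quotas in a region that are
# at or above the given usage threshold (default 80).
show_vcpu_usage() {
  local location="$1" threshold="${2:-80}"
  az vm list-usage --location "$location" \
    --query "[].[name.value, currentValue, limit]" -o tsv |
  while IFS=$'\t' read -r name current limit; do
    pct=$(usage_percent "$current" "$limit")
    if [ "$pct" -ge "$threshold" ]; then
      echo "$name: $current/$limit ($pct%)"
    fi
  done
}

# Example: show_vcpu_usage eastus 80&lt;/LI-CODE&gt;
&lt;P&gt;This is only a spot check; the alert rule remains the thing that notifies you.&lt;/P&gt;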
&lt;H4 data-start="1984" data-end="2005"&gt;Old vs New Reality&lt;/H4&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Approach&lt;/th&gt;&lt;th&gt;Custom Scripts &amp;amp; Logs&lt;/th&gt;&lt;th&gt;Quota Alerts (Preview)&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Effort to set up&lt;/td&gt;&lt;td&gt;High&lt;/td&gt;&lt;td&gt;Very low&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Extra services needed&lt;/td&gt;&lt;td&gt;Log Analytics, Automation&lt;/td&gt;&lt;td&gt;None&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Visibility&lt;/td&gt;&lt;td&gt;Manual dashboards&lt;/td&gt;&lt;td&gt;Native alert rules&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Best fit&lt;/td&gt;&lt;td&gt;Ops-heavy teams&lt;/td&gt;&lt;td&gt;Startups, lean teams&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H4 data-start="2510" data-end="2523"&gt;Takeaway&lt;/H4&gt;
&lt;P data-start="2524" data-end="2752"&gt;Quota alerts may have been around since late 2024, but they remain one of the most &lt;STRONG data-start="2607" data-end="2630"&gt;underrated features&lt;/STRONG&gt; in Azure. For startups and digital native companies scaling quickly, they provide peace of mind with almost zero setup.&lt;/P&gt;
&lt;P data-start="2754" data-end="2946"&gt;Don’t wait until your next deployment fails, set up a&amp;nbsp;&lt;STRONG data-start="2812" data-end="2827"&gt;quota alert&lt;/STRONG&gt; today (start with Regional vCPUs in your main region). It only takes a couple of minutes and could save your launch.&lt;/P&gt;
&lt;P data-start="2953" data-end="3098"&gt;⚡️ Pro tip: You can reuse your &lt;STRONG data-start="2984" data-end="3001"&gt;Action Groups&lt;/STRONG&gt; for quota alerts, keeping all notifications consistent with your existing monitoring strategy.&lt;/P&gt;</description>
      <pubDate>Fri, 22 Aug 2025 16:24:56 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/startups-at-microsoft/azure-quota-alerts-preview-still-overlooked-but-incredibly/ba-p/4447140</guid>
      <dc:creator>rmmartins</dc:creator>
      <dc:date>2025-08-22T16:24:56Z</dc:date>
    </item>
    <item>
      <title>Azure Support Slack Bot on Azure Container Apps: Production-ready guide</title>
      <link>https://techcommunity.microsoft.com/t5/startups-at-microsoft/azure-support-slack-bot-on-azure-container-apps-production-ready/ba-p/4436423</link>
      <description>&lt;P data-start="416" data-end="592"&gt;Launch a secure, scalable Slack bot for Azure support tickets in minutes — no secrets in code, no manual admin steps, and fully aligned with modern cloud-native best practices.&lt;/P&gt;
&lt;P data-start="594" data-end="960"&gt;This guide walks you through deploying the GitHub sample &lt;A class="lia-external-url" href="https://github.com/Azure-Samples/azure-support-slack-bot" target="_blank" rel="noopener" data-start="651" data-end="732"&gt;azure-support-slack-bot&lt;/A&gt; on &lt;STRONG data-start="736" data-end="760"&gt;Azure Container Apps&lt;/STRONG&gt;, using &lt;STRONG data-start="768" data-end="790"&gt;managed identities&lt;/STRONG&gt;, &lt;STRONG data-start="792" data-end="805"&gt;Key Vault&lt;/STRONG&gt;, and &lt;STRONG data-start="831" data-end="861"&gt;scale-to-zero architecture&lt;/STRONG&gt; that just works, whether you're building from scratch or plugging into your existing DevOps flow.&lt;/P&gt;
&lt;H3 data-start="962" data-end="991"&gt;Here’s what you’ll build:&lt;/H3&gt;
&lt;UL data-start="993" data-end="1330"&gt;
&lt;LI data-start="993" data-end="1066"&gt;Zero-admin secrets management with Managed Identity + Key Vault&lt;/LI&gt;
&lt;LI data-start="1067" data-end="1112"&gt;RBAC-first access to Azure Support APIs&lt;/LI&gt;
&lt;LI data-start="1113" data-end="1177"&gt;A clean, local-first development workflow (with ngrok support)&lt;/LI&gt;
&lt;LI data-start="1178" data-end="1224"&gt;Slack integration using manifest-based setup&lt;/LI&gt;
&lt;LI data-start="1225" data-end="1274"&gt;Observability with App Insights + Log Analytics&lt;/LI&gt;
&lt;LI data-start="1275" data-end="1330"&gt;Scale from 0 to N replicas, with autoscaling baked in&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="1332" data-end="1443"&gt;And yes, all of this,&amp;nbsp;&lt;STRONG data-start="1355" data-end="1391"&gt;without ever hardcoding a secret&lt;/STRONG&gt; or exposing a public endpoint you didn’t intend to.&lt;/P&gt;
&lt;P data-start="1332" data-end="1443"&gt;If you’re running lean and building fast, this bot is a solid foundation. It’s not just a cool demo — it’s a production-ready blueprint for any digital native team that wants to integrate Slack with Azure support in a secure, automated, and developer-friendly way.&lt;/P&gt;
&lt;H3&gt;1. What you're building&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Features:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;/azure-support slash command&lt;/LI&gt;
&lt;LI&gt;Auto-scaling from 0→N replicas based on HTTP load&lt;/LI&gt;
&lt;LI&gt;Zero secrets in code or environment variables&lt;/LI&gt;
&lt;LI&gt;Comprehensive logging and Azure RBAC integration&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;2. Why Azure Container Apps (ACA)?&lt;/H3&gt;
&lt;P data-start="242" data-end="373"&gt;When you're building for speed, security, and scale without a huge ops team, &lt;STRONG data-start="322" data-end="352"&gt;Azure Container Apps (ACA)&lt;/STRONG&gt; hits the sweet spot.&lt;/P&gt;
&lt;P data-start="375" data-end="550"&gt;This Slack bot doesn't need a full-blown cluster. It needs &lt;STRONG data-start="434" data-end="456"&gt;event-driven scale&lt;/STRONG&gt;, &lt;STRONG data-start="458" data-end="481"&gt;zero-trust security&lt;/STRONG&gt;, and &lt;STRONG data-start="487" data-end="510"&gt;built-in automation&lt;/STRONG&gt;,&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;and that’s exactly what ACA delivers. Here’s why ACA is a better fit than the usual suspects:&lt;/P&gt;
&lt;P data-start="614" data-end="652"&gt;&lt;STRONG&gt;Azure Container Instances (ACI)&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-start="653" data-end="906"&gt;
&lt;LI data-start="653" data-end="766"&gt;Great for quick scripts or batch jobs , but&amp;nbsp;no built-in ingress, TLS, scaling rules, or managed identities.&lt;/LI&gt;
&lt;LI data-start="767" data-end="906"&gt;ACA gives you all that out of the box, with production features and native integration with Key Vault, App Insights, and autoscaling.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="908" data-end="937"&gt;&lt;STRONG&gt;Web App for Containers&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-start="938" data-end="1181"&gt;
&lt;LI data-start="938" data-end="1068"&gt;Web Apps are more suited for classic web hosting. You’ll hit limits with scaling flexibility, networking, and secret management.&lt;/LI&gt;
&lt;LI data-start="1069" data-end="1181"&gt;ACA gives you Kubernetes-grade scale and observability, without having to think about servers or patching.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="1183" data-end="1220"&gt;&lt;STRONG&gt;Azure Kubernetes Service (AKS)&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-start="1221" data-end="1611"&gt;
&lt;LI data-start="1221" data-end="1338"&gt;Powerful, but heavy. You manage clusters, patch nodes, deal with autoscaler configs, ingress controllers, and more.&lt;/LI&gt;
&lt;LI data-start="1339" data-end="1433"&gt;ACA does the heavy lifting for you, zero node management, zero cluster maintenance.&lt;/LI&gt;
&lt;LI data-start="1434" data-end="1611"&gt;And here’s the kicker: AKS charges for the VM nodes 24/7, even when idle.
&lt;UL data-start="1516" data-end="1611"&gt;
&lt;LI data-start="1516" data-end="1611"&gt;ACA? Pay-per-request. When there’s no traffic, it scales to zero, and&amp;nbsp;you don’t pay.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4 data-start="1618" data-end="1661"&gt;Cost Efficiency That Scales With You&lt;/H4&gt;
&lt;P data-start="1663" data-end="1792"&gt;For digital native teams, especially startups and growth-stage companies, ACA’s&amp;nbsp;&lt;STRONG data-start="1745" data-end="1773"&gt;serverless pricing model&lt;/STRONG&gt; is a game-changer:&lt;/P&gt;
&lt;UL data-start="1793" data-end="1957"&gt;
&lt;LI data-start="1793" data-end="1852"&gt;You scale from 0 to N replicas based on actual demand&lt;/LI&gt;
&lt;LI data-start="1853" data-end="1896"&gt;You only pay when your app is running&lt;/LI&gt;
&lt;LI data-start="1897" data-end="1957"&gt;No need to over-provision VMs or guess your future traffic&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="1959" data-end="2091"&gt;This means you can launch a support bot, an internal API, or a microservice without worrying about burning cash while it's idle.&lt;/P&gt;
&lt;H4 data-start="2098" data-end="2126"&gt;Built for These Teams&lt;/H4&gt;
&lt;P data-start="2128" data-end="2145"&gt;ACA is ideal for:&lt;/P&gt;
&lt;UL data-start="2146" data-end="2486"&gt;
&lt;LI data-start="2146" data-end="2228"&gt;&lt;STRONG data-start="2151" data-end="2181"&gt;Platform engineering teams&lt;/STRONG&gt; who want secure templates, not snowflake infra&lt;/LI&gt;
&lt;LI data-start="2229" data-end="2306"&gt;&lt;STRONG data-start="2234" data-end="2256"&gt;DevOps-light teams&lt;/STRONG&gt; who need autoscaling without managing YAML storms&lt;/LI&gt;
&lt;LI data-start="2307" data-end="2397"&gt;&lt;STRONG data-start="2312" data-end="2343"&gt;Growth-stage product squads&lt;/STRONG&gt; rolling out bots, APIs, or event-driven services fast&lt;/LI&gt;
&lt;LI data-start="2398" data-end="2486"&gt;&lt;STRONG data-start="2403" data-end="2415"&gt;Startups&lt;/STRONG&gt; who care about velocity, observability, and not hiring a full SRE team&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Comparison table:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-solid" border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;&lt;STRONG&gt;Platform&lt;/STRONG&gt;&lt;/th&gt;&lt;th&gt;&lt;STRONG&gt;Best Fit&lt;/STRONG&gt;&lt;/th&gt;&lt;th&gt;&lt;STRONG&gt;Where It Falls Short&lt;/STRONG&gt;&lt;/th&gt;&lt;th&gt;&lt;STRONG&gt;Why ACA Wins&lt;/STRONG&gt;&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;ACI&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Short-lived scripts &amp;amp; jobs&lt;/td&gt;&lt;td&gt;No ingress, limited identity, lacks autoscaling&lt;/td&gt;&lt;td&gt;ACA supports scale-to-zero, secure access, and managed identity out of the box&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Web App&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Traditional web hosting&lt;/td&gt;&lt;td&gt;Rigid scaling, fewer network/runtime controls&lt;/td&gt;&lt;td&gt;ACA offers greater flexibility and microservice patterns&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;AKS&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Complex, large-scale distributed apps&lt;/td&gt;&lt;td&gt;Operational overhead, always-on cost&lt;/td&gt;&lt;td&gt;ACA simplifies ops with managed scaling &amp;amp; lower cost&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;ACA&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Cloud-native APIs, internal tools, microservices&lt;/td&gt;&lt;td&gt;—&lt;/td&gt;&lt;td&gt;Built-in identity, ingress, scale-to-zero, lower total cost&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-start="3098" data-end="3157"&gt;ACA is your serverless container platform when you want:&lt;/P&gt;
&lt;UL data-start="3158" data-end="3366"&gt;
&lt;LI data-start="3158" data-end="3184"&gt;TLS and ingress baked in&lt;/LI&gt;
&lt;LI data-start="3185" data-end="3224"&gt;GitHub Actions support out of the box&lt;/LI&gt;
&lt;LI data-start="3225" data-end="3307"&gt;Built-in support for Key Vault, managed identities, and auto-scaling&lt;/LI&gt;
&lt;LI data-start="3308" data-end="3366"&gt;Production-grade infra, without managing a single VM&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="3368" data-end="3461"&gt;If you're moving fast and don’t want to build a platform just to run a bot, ACA is the move.&lt;/P&gt;
&lt;H3&gt;3. Prerequisites &amp;amp; verification&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Required:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Azure subscription with&amp;nbsp;&lt;STRONG&gt;Contributor&lt;/STRONG&gt;&amp;nbsp;access&lt;/LI&gt;
&lt;LI&gt;Azure CLI ≥ 2.49.0 with the containerapp extension&lt;/LI&gt;
&lt;LI&gt;Docker Desktop or equivalent&lt;/LI&gt;
&lt;LI&gt;Slack workspace with app creation permissions&lt;/LI&gt;
&lt;LI&gt;Python 3.8+ for local development&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Optional:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://ngrok.com/" target="_blank" rel="noopener"&gt;ngrok&lt;/A&gt; account for stable local testing URLs&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Setup verification:&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Verify Azure CLI and login
az --version
az login
az account set --subscription &amp;lt;your-subscription-id&amp;gt;
az extension add --name containerapp --upgrade

# Verify Docker and Python
docker --version
python --version

# Verify current user permissions
currentUserId=$(az ad signed-in-user show --query id -o tsv)
subscriptionId=$(az account show --query id -o tsv)
az role assignment list --assignee $currentUserId --scope "/subscriptions/$subscriptionId" --query "[].roleDefinitionName" -o table&lt;/LI-CODE&gt;
&lt;H3&gt;4. Azure permissions setup&lt;/H3&gt;
&lt;P&gt;The repository requires specific Azure RBAC roles that are often missed:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Get current subscription and user
subscriptionId=$(az account show --query id -o tsv)
currentUserId=$(az ad signed-in-user show --query id -o tsv)

# Support Request Contributor - Required to create/manage Azure support tickets
az role assignment create \
  --assignee $currentUserId \
  --role "Support Request Contributor" \
  --scope "/subscriptions/$subscriptionId"

# Reader - Required to list and view Azure resources in the bot
az role assignment create \
  --assignee $currentUserId \
  --role "Reader" \
  --scope "/subscriptions/$subscriptionId"&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;Why these roles are required:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Support Request Contributor: Allows creating and managing Azure support tickets&lt;/LI&gt;
&lt;LI&gt;Reader: Allows the bot to list subscriptions, services, and resources in dropdown menus&lt;/LI&gt;
&lt;/UL&gt;
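&lt;P&gt;To confirm both roles landed, you can compare the role names on the subscription against the two required ones. A minimal sketch: has_role and missing_roles are hypothetical helpers, and the check is by exact role name only, so a broader role such as Owner that already includes read access will still be reported as "missing" Reader.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Hypothetical helper: succeed if a role name appears as a full line
# in a newline-separated list of role names.
has_role() {
  printf '%s\n' "$2" | grep -Fxq -- "$1"
}

# Hypothetical helper: print each required role that is not in the list.
missing_roles() {
  local roles="$1" required
  for required in "Support Request Contributor" "Reader"; do
    if ! has_role "$required" "$roles"; then
      echo "$required"
    fi
  done
}

# Example, reusing the variables defined above:
# roles=$(az role assignment list --assignee $currentUserId \
#   --scope "/subscriptions/$subscriptionId" \
#   --query "[].roleDefinitionName" -o tsv)
# missing_roles "$roles"&lt;/LI-CODE&gt;
&lt;P&gt;Empty output means both role names were found.&lt;/P&gt;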
&lt;H3&gt;5. Local development setup&lt;/H3&gt;
&lt;H4&gt;5.1 Clone and initialize project&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;git clone https://github.com/Azure-Samples/azure-support-slack-bot.git
cd azure-support-slack-bot

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Linux/macOS

# Install dependencies
pip install -r requirements.txt

# Create local environment file
cp .env-example .env&lt;/LI-CODE&gt;
&lt;H4&gt;5.2 Create Slack app with manifest&lt;/H4&gt;
&lt;P&gt;Using the provided manifest is crucial for correct configuration:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Visit &lt;A class="lia-external-url" href="https://api.slack.com/apps" target="_blank" rel="noopener"&gt;Slack API Apps&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Click "&lt;STRONG&gt;Create New App&lt;/STRONG&gt;" → "&lt;STRONG&gt;From a manifest&lt;/STRONG&gt;"&lt;/LI&gt;
&lt;LI&gt;Choose &lt;STRONG&gt;YAML&lt;/STRONG&gt; and paste the contents from &lt;A class="lia-external-url" href="https://github.com/Azure-Samples/azure-support-slack-bot/blob/main/slack_app_manifest.yaml" target="_blank" rel="noopener"&gt;slack_app_manifest.yaml&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Click &lt;STRONG&gt;Next&lt;/STRONG&gt; →&amp;nbsp;&lt;STRONG&gt;Create&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Copy the&amp;nbsp;&lt;STRONG&gt;Signing Secret&lt;/STRONG&gt;&amp;nbsp;from Basic Information&lt;/LI&gt;
&lt;LI&gt;Important: Click "&lt;STRONG&gt;Install App&lt;/STRONG&gt;" → "&lt;STRONG&gt;Install to Workspace&lt;/STRONG&gt;" to generate the Bot User OAuth Token (xoxb-...)&lt;/LI&gt;
&lt;LI&gt;After installation, copy the&amp;nbsp;&lt;STRONG&gt;Bot User OAuth Token&lt;/STRONG&gt;&amp;nbsp;from the OAuth &amp;amp; Permissions page&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;5.3 Local testing with ngrok&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;# Edit .env with your tokens (local development only)
# SLACK_SIGNING_SECRET=your-signing-secret-here
# SLACK_BOT_TOKEN=xoxb-your-bot-token-here


# Terminal 1: Start the Flask app 
python app.py&lt;/LI-CODE&gt;
&lt;P&gt;Expected output:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;INFO:azure_support:Azure credentials configured successfully
INFO:azure_support:Preloading subscriptions completed
⚡️ Bolt app is running on port 5000!&lt;/LI-CODE&gt;&lt;LI-CODE lang="bash"&gt;# Terminal 2: Create ngrok tunnel 
ngrok http 5000&lt;/LI-CODE&gt;
&lt;P&gt;Copy the &lt;STRONG&gt;https&lt;/STRONG&gt; forwarding URL (e.g., &lt;A class="lia-external-url" href="https://abc123.ngrok-free.app" target="_blank" rel="noopener"&gt;https://abc123.ngrok-free.app&lt;/A&gt;).&lt;/P&gt;
&lt;H4&gt;5.4 Update Slack app manifest for local testing&lt;/H4&gt;
&lt;P&gt;This is the critical step that's often missed:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;In your Slack app settings, go to&amp;nbsp;&lt;STRONG&gt;App Manifest&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Replace &lt;STRONG&gt;ALL instances&lt;/STRONG&gt; of YOUR-DOMAIN-NAME with your ngrok domain&lt;/LI&gt;
&lt;LI&gt;Example replacement:&lt;/LI&gt;
&lt;/OL&gt;
&lt;LI-CODE lang="yaml"&gt;   # Before
   request_url: https://YOUR-DOMAIN-NAME/slack/events
   
   # After  
   request_url: https://abc123.ngrok-free.app/slack/events&lt;/LI-CODE&gt;
&lt;OL start="4"&gt;
&lt;LI&gt;Click&amp;nbsp;&lt;STRONG&gt;Save Changes&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Go to &lt;STRONG&gt;Install App&lt;/STRONG&gt;&amp;nbsp;and install it to your workspace&lt;/LI&gt;
&lt;LI&gt;Copy the &lt;STRONG&gt;Bot User OAuth Token&lt;/STRONG&gt; (xoxb-...)&lt;/LI&gt;
&lt;/OL&gt;
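&lt;P&gt;Because the ngrok URL can change between sessions, it may be easier to generate the updated manifest locally and paste it into Slack than to edit every occurrence by hand. A small sketch, assuming you run it from the repo root where slack_app_manifest.yaml lives; render_manifest is a made-up helper name, and the domain must not contain a slash.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Hypothetical helper: print a manifest with every YOUR-DOMAIN-NAME
# placeholder replaced by the given domain. The file itself is not changed.
render_manifest() {
  local manifest_file="$1" domain="$2"
  sed "s/YOUR-DOMAIN-NAME/${domain}/g" "$manifest_file"
}

# Example:
# render_manifest slack_app_manifest.yaml abc123.ngrok-free.app&lt;/LI-CODE&gt;
&lt;P&gt;The output goes to the terminal, so the manifest in the repo keeps its placeholder for the production domain later.&lt;/P&gt;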
&lt;H4&gt;5.5 Test local integration&lt;/H4&gt;
&lt;OL&gt;
&lt;LI&gt;Invite the bot to channels:&lt;/LI&gt;
&lt;/OL&gt;
&lt;LI-CODE lang="bash"&gt;/invite @azure-support&lt;/LI-CODE&gt;
&lt;OL start="2"&gt;
&lt;LI&gt;Test the slash command:&lt;/LI&gt;
&lt;/OL&gt;
&lt;LI-CODE lang="bash"&gt; /azure-support&lt;/LI-CODE&gt;
&lt;OL start="3"&gt;
&lt;LI&gt;The support request modal should open in Slack.&lt;/LI&gt;
&lt;/OL&gt;
&lt;OL start="4"&gt;
&lt;LI&gt;Monitor logs:&lt;/LI&gt;
&lt;/OL&gt;
&lt;LI-CODE lang="bash"&gt;   # Check your Python terminal for incoming requests

   INFO:azure_support:Opened modal for support request&lt;/LI-CODE&gt;
&lt;H3&gt;6. Azure infrastructure setup&lt;/H3&gt;
&lt;H4&gt;6.1 Define resource names&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;# Set consistent naming convention
RG="rg-slack-support-prod"
LOCATION="eastus"
ACR_NAME="acrsupport$RANDOM"  # Globally unique
ENV_NAME="aca-slack-env"
APP_NAME="slack-support-app"
KV_NAME="kv-slack-$RANDOM"
UAMI_NAME="id-slack-support"
LAW_NAME="law-slack-support"

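# Print the generated names, then confirm they are still available
echo "ACR Name: $ACR_NAME"
echo "Key Vault: $KV_NAME"
az acr check-name --name $ACR_NAME --query nameAvailable -o tsv
az keyvault check-name --name $KV_NAME --query nameAvailable -o tsv&lt;/LI-CODE&gt;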
&lt;H4&gt;6.2 Create resource group&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;az group create \
  --name $RG \
  --location $LOCATION \
  --tags environment=production project=slack-support&lt;/LI-CODE&gt;
&lt;H3&gt;7. Container registry with security&lt;/H3&gt;
&lt;H4&gt;7.1 Create Azure container registry&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;az acr create \
  --name $ACR_NAME \
  --resource-group $RG \
  --sku Standard \
  --admin-enabled false  # Security: No admin credentials&lt;/LI-CODE&gt;
&lt;H4&gt;7.2 Build and push image&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;# Login to ACR
az acr login --name $ACR_NAME

# Build and push 
IMAGE_NAME="$ACR_NAME.azurecr.io/azure-support-slack-bot:latest"

docker build -t $IMAGE_NAME .
docker push $IMAGE_NAME

# Verify image
az acr repository show \
  --name $ACR_NAME \
  --repository azure-support-slack-bot&lt;/LI-CODE&gt;
&lt;H3&gt;8. Managed Identity and RBAC Setup&lt;/H3&gt;
&lt;H4&gt;8.1 Create User-Assigned Managed Identity&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;az identity create \
  --name $UAMI_NAME \
  --resource-group $RG \
  --location $LOCATION

# Get identity details
UAMI_ID=$(az identity show --name $UAMI_NAME --resource-group $RG --query id -o tsv)
UAMI_PRINCIPAL_ID=$(az identity show --name $UAMI_NAME --resource-group $RG --query principalId -o tsv)
UAMI_CLIENT_ID=$(az identity show --name $UAMI_NAME --resource-group $RG --query clientId -o tsv)&lt;/LI-CODE&gt;
&lt;H4&gt;8.2 Grant ACR pull permissions&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;ACR_ID=$(az acr show --name $ACR_NAME --resource-group $RG --query id -o tsv)

az role assignment create \
  --assignee $UAMI_PRINCIPAL_ID \
  --role "AcrPull" \
  --scope $ACR_ID

# Wait for role propagation
echo "Waiting 60 seconds for role assignment propagation..."
sleep 60&lt;/LI-CODE&gt;
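&lt;P&gt;If a fixed 60-second wait feels arbitrary, a polling loop is one alternative. This is a rough sketch and not part of the sample repo: retry and acr_pull_visible are made-up helper names, and it assumes UAMI_PRINCIPAL_ID and ACR_ID are set as above. Seeing the assignment listed doesn't guarantee it has fully propagated, so the fixed sleep remains the conservative choice.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Hypothetical helper: re-run a command until it succeeds
# or the number of attempts is used up.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
    i=$(( i + 1 ))
  done
  return 1
}

# Hypothetical check: succeeds once the AcrPull assignment is listed.
acr_pull_visible() {
  [ -n "$(az role assignment list --assignee "$UAMI_PRINCIPAL_ID" \
    --scope "$ACR_ID" --role AcrPull --query "[].id" -o tsv)" ]
}

# Example: up to 12 checks, 5 seconds apart
# retry 12 5 acr_pull_visible&lt;/LI-CODE&gt;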
&lt;H4&gt;8.3 Grant Azure support API permissions&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;# Get subscription ID (in case it's not set from earlier)
subscriptionId=$(az account show --query id -o tsv)

# Support Request Contributor for the managed identity
az role assignment create \
  --assignee $UAMI_PRINCIPAL_ID \
  --role "Support Request Contributor" \
  --scope "/subscriptions/$subscriptionId"

# Reader role for listing Azure resources
az role assignment create \
  --assignee $UAMI_PRINCIPAL_ID \
  --role "Reader" \
  --scope "/subscriptions/$subscriptionId"

# Verify the role assignments were created successfully
echo "Azure Support API permissions granted to managed identity"
az role assignment list \
  --assignee $UAMI_PRINCIPAL_ID \
  --query "[].{Role:roleDefinitionName,Scope:scope}" -o table&lt;/LI-CODE&gt;
&lt;H3&gt;9. Azure Key Vault Setup&lt;/H3&gt;
&lt;H4&gt;9.1 Create Key Vault with RBAC&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;az keyvault create \
  --name $KV_NAME \
  --resource-group $RG \
  --location $LOCATION \
  --enable-rbac-authorization true \
  --retention-days 7 &lt;/LI-CODE&gt;
&lt;H4&gt;9.2 Grant Key Vault permissions&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;# Get current user and Key Vault scope
USER_PRINCIPAL_ID=$(az ad signed-in-user show --query id -o tsv)
KV_SCOPE=$(az keyvault show --name $KV_NAME --resource-group $RG --query id -o tsv)

# Grant admin access to current user
az role assignment create \
  --assignee $USER_PRINCIPAL_ID \
  --role "Key Vault Administrator" \
  --scope $KV_SCOPE

# Grant read access to managed identity
az role assignment create \
  --assignee $UAMI_PRINCIPAL_ID \
  --role "Key Vault Secrets User" \
  --scope $KV_SCOPE

# Wait for propagation
sleep 30&lt;/LI-CODE&gt;
&lt;H4&gt;9.3 Store secrets&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;# Store Slack secrets (replace with your actual values)
echo "Enter your Slack Bot Token (xoxb-...):"
read -s SLACK_BOT_TOKEN

echo "Enter your Slack Signing Secret:"
read -s SLACK_SIGNING_SECRET

az keyvault secret set \
  --vault-name $KV_NAME \
  --name "slack-bot-token" \
  --value "$SLACK_BOT_TOKEN"

az keyvault secret set \
  --vault-name $KV_NAME \
  --name "slack-signing-secret" \
  --value "$SLACK_SIGNING_SECRET"

# Verify secrets are stored
az keyvault secret list --vault-name $KV_NAME --query "[].name" -o table&lt;/LI-CODE&gt;
&lt;H3&gt;10. Container Apps environment&lt;/H3&gt;
&lt;H4&gt;10.1 Create Log Analytics Workspace&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;az monitor log-analytics workspace create \
  --workspace-name $LAW_NAME \
  --resource-group $RG \
  --location $LOCATION

# Get workspace details
LAW_CUSTOMER_ID=$(az monitor log-analytics workspace show \
  --workspace-name $LAW_NAME \
  --resource-group $RG \
  --query customerId -o tsv)

LAW_SHARED_KEY=$(az monitor log-analytics workspace get-shared-keys \
  --workspace-name $LAW_NAME \
  --resource-group $RG \
  --query primarySharedKey -o tsv)&lt;/LI-CODE&gt;
&lt;H4&gt;10.2 Create Container Apps environment&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;az containerapp env create \
  --name $ENV_NAME \
  --resource-group $RG \
  --location $LOCATION \
  --logs-workspace-id $LAW_CUSTOMER_ID \
  --logs-workspace-key $LAW_SHARED_KEY&lt;/LI-CODE&gt;
&lt;H3&gt;11. Deploy Container App with security&lt;/H3&gt;
&lt;H4&gt;11.1 Create Container App&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;az containerapp create \
  --name $APP_NAME \
  --resource-group $RG \
  --environment $ENV_NAME \
  --image $IMAGE_NAME \
  --target-port 5000 \
  --ingress external \
  --registry-server "$ACR_NAME.azurecr.io" \
  --registry-identity $UAMI_ID \
  --user-assigned $UAMI_ID \
  --min-replicas 1 \
  --max-replicas 10 \
  --cpu 0.5 \
  --memory 1Gi&lt;/LI-CODE&gt;
&lt;H4&gt;11.2 Configure Key Vault secret references&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;# Create secret references to Key Vault
az containerapp secret set \
  --name $APP_NAME \
  --resource-group $RG \
  --secrets \
  "slack-bot-token=keyvaultref:https://$KV_NAME.vault.azure.net/secrets/slack-bot-token,identityref:$UAMI_ID" \
  "slack-signing-secret=keyvaultref:https://$KV_NAME.vault.azure.net/secrets/slack-signing-secret,identityref:$UAMI_ID"&lt;/LI-CODE&gt;
&lt;H4&gt;11.3 Configure environment variables&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;az containerapp update \
  --name $APP_NAME \
  --resource-group $RG \
  --set-env-vars \
  "SLACK_BOT_TOKEN=secretref:slack-bot-token" \
  "SLACK_SIGNING_SECRET=secretref:slack-signing-secret" \
  "AZURE_CLIENT_ID=$UAMI_CLIENT_ID" \
  "PORT=5000"&lt;/LI-CODE&gt;
&lt;H4&gt;11.4 Configure scaling rules&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;az containerapp update \
  --name $APP_NAME \
  --resource-group $RG \
  --scale-rule-name "http-rule" \
  --scale-rule-type "http" \
  --scale-rule-http-concurrency 50 \
  --min-replicas 0 \
  --max-replicas 10&lt;/LI-CODE&gt;
&lt;H3&gt;12. Production configuration&lt;/H3&gt;
&lt;H4&gt;12.1 Get application URL&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;APP_FQDN=$(az containerapp show \
  --name $APP_NAME \
  --resource-group $RG \
  --query properties.configuration.ingress.fqdn -o tsv)

APP_URL="https://$APP_FQDN"
echo "Production URL: $APP_URL/slack/events"&lt;/LI-CODE&gt;
&lt;H4&gt;12.2 Update Slack app manifest for production&lt;/H4&gt;
&lt;P&gt;&lt;STRONG&gt;Critical: Replace ngrok URLs with production URLs:&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;In your Slack app settings, go to App Manifest&lt;/LI&gt;
&lt;LI&gt;Replace &lt;STRONG&gt;all &lt;/STRONG&gt;ngrok URLs with your Azure Container Apps URL:&lt;BR /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="yaml"&gt;settings:
     event_subscriptions:
       request_url: https://your-app-fqdn.region.azurecontainerapps.io/slack/events
     interactivity:
       is_enabled: true
       request_url: https://your-app-fqdn.region.azurecontainerapps.io/slack/events
       message_menu_options_url: https://your-app-fqdn.region.azurecontainerapps.io/slack/events&lt;/LI-CODE&gt;&lt;/LI&gt;
&lt;LI&gt;Click Save Changes&lt;/LI&gt;
&lt;LI&gt;Reinstall the app&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;STRONG&gt;Critical: URL Verification Step&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;After updating your Slack App Manifest with the production URL, Slack will attempt to verify the new endpoint. This verification process is&amp;nbsp;&lt;STRONG&gt;mandatory &lt;/STRONG&gt;and must succeed before your bot will work in production.&lt;/P&gt;
&lt;P&gt;What happens during verification:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Slack sends a POST request to your new URL (https://your-app-fqdn.region.azurecontainerapps.io/slack/events)&lt;/LI&gt;
&lt;LI&gt;The request contains a challenge parameter that your Flask app must echo back&lt;/LI&gt;
&lt;LI&gt;If verification fails, Slack will reject the manifest changes&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Common verification failures:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Container App not running: Ensure your Azure Container App is deployed and healthy&lt;/LI&gt;
&lt;LI&gt;Wrong URL format: Must end with /slack/events exactly&lt;/LI&gt;
&lt;LI&gt;HTTPS required: Slack only accepts HTTPS endpoints (Container Apps provides this automatically)&lt;/LI&gt;
&lt;LI&gt;Timeout issues: Container App must respond within Slack's timeout window&lt;/LI&gt;
&lt;/OL&gt;
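&lt;P&gt;To rehearse the verification yourself, you can send a signed challenge request with curl. The sketch below assumes Slack's v0 request-signing scheme (HMAC-SHA256 over v0:timestamp:body, keyed with the signing secret) and that openssl is installed; slack_signature is a hypothetical helper name, and whether your app echoes the challenge depends on how its handler is wired:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Build the X-Slack-Signature header value for a request body.
# Note: the secret is passed to openssl on the command line, so use this for manual testing only.
slack_signature() {
  local secret="$1" timestamp="$2" body="$3" digest
  digest=$(printf '%s' "v0:${timestamp}:${body}" | openssl dgst -sha256 -hmac "$secret" | awk '{print $NF}')
  printf 'v0=%s\n' "$digest"
}

# Usage (assumes $APP_URL from section 12.1 and $SLACK_SIGNING_SECRET from section 9.3):
# BODY='{"type":"url_verification","challenge":"test-challenge"}'
# TS=$(date +%s)
# SIG=$(slack_signature "$SLACK_SIGNING_SECRET" "$TS" "$BODY")
# curl -s -X POST "$APP_URL/slack/events" \
#   -H "Content-Type: application/json" \
#   -H "X-Slack-Request-Timestamp: $TS" \
#   -H "X-Slack-Signature: $SIG" \
#   -d "$BODY"&lt;/LI-CODE&gt;
&lt;P&gt;If the response contains test-challenge, the endpoint is reachable and signature verification is working. An authentication error usually points at the signing secret, and a timeout at the container not running.&lt;/P&gt;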
&lt;H4&gt;12.3 Bot installation and invitation&lt;/H4&gt;
&lt;P&gt;Required Post-Deployment Steps:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Update the Slack App Manifest with the production URL&lt;/LI&gt;
&lt;LI&gt;Reinstall the bot in your Slack workspace&lt;/LI&gt;
&lt;LI&gt;Invite the bot to channels: /invite @azure-support&lt;/LI&gt;
&lt;LI&gt;Test with the /azure-support command&lt;/LI&gt;
&lt;/OL&gt;
&lt;H3&gt;13. Testing and validation&lt;/H3&gt;
&lt;H4&gt;13.1 Health and connectivity checks&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;# Test basic connectivity (note: this will return 404 since the app has no root endpoint handler)
curl -f "$APP_URL/" || echo "Expected 404 - app only handles /slack/events endpoint"

# Check container app status
az containerapp show \
  --name $APP_NAME \
  --resource-group $RG \
  --query properties.provisioningState

# Check logs
az containerapp logs show \
  --name $APP_NAME \
  --resource-group $RG \
  --follow&lt;/LI-CODE&gt;
&lt;H4&gt;13.2 Functional testing&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;Test Slack integration:&lt;BR /&gt;&lt;BR /&gt;&lt;LI-CODE lang="bash"&gt;/azure-support&lt;/LI-CODE&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Complete workflow:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P&gt;Fill out the support ticket modal completely (details below)&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Submit the form&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Verify ticket appears in Azure Portal → Help + Support&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;13.3 Opening the support request&lt;/H4&gt;
&lt;P&gt;When you open the support request form, you’ll see a few fields that need your attention:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Subject: &lt;/STRONG&gt;Think of this as your headline. Keep it short and clear.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Problem Details: &lt;/STRONG&gt;Here’s your chance to explain what’s going wrong. Be specific! The more details, the better.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Azure Subscription, Service, Problem Type, and Resource: &lt;/STRONG&gt;Select the right options from the dropdown menus. This helps the support team route your ticket to the right experts.&lt;/LI&gt;
&lt;/UL&gt;
&lt;img /&gt;
&lt;P&gt;You’ll notice options for advanced diagnostic info. If you’re not sure, just say “Yes” (it’s recommended). Set the severity (for a minor issue, pick “Minimal impact”) and choose how you’d like to be contacted; email is usually easiest. Make sure your name and email are correct. If you want someone else to get updates, add their email too.&lt;/P&gt;
&lt;img /&gt;
&lt;img /&gt;
&lt;P&gt;Once you’ve filled everything out, click &lt;STRONG&gt;Submit&lt;/STRONG&gt;. You’ll see a confirmation message: your ticket is on its way!&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;If you chose a Slack channel, you’ll get a message like this:&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;You’ll also get a link to view your ticket in the Azure portal, along with an email containing all the details you provided.&lt;/P&gt;
&lt;H3&gt;14. Production observability&lt;/H3&gt;
&lt;H4&gt;14.1 Application Insights Integration&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;# Create Application Insights
APPINSIGHTS_NAME="ai-slack-support"

az monitor app-insights component create \
  --app $APPINSIGHTS_NAME \
  --location $LOCATION \
  --resource-group $RG \
  --workspace $LAW_NAME

# Get instrumentation key
APPINSIGHTS_KEY=$(az monitor app-insights component show \
  --app $APPINSIGHTS_NAME \
  --resource-group $RG \
  --query instrumentationKey -o tsv)

# Add to container app
az containerapp update \
  --name $APP_NAME \
  --resource-group $RG \
  --set-env-vars \
  "APPLICATIONINSIGHTS_INSTRUMENTATION_KEY=$APPINSIGHTS_KEY"&lt;/LI-CODE&gt;
&lt;H4&gt;14.2 Monitoring and alerts&lt;/H4&gt;
&lt;LI-CODE lang="bash"&gt;# Alert when the container app receives no requests in the window (a rough proxy for failures; expect noise while scaled to zero)
az monitor metrics alert create \
  --name "SlackBot-ContainerFailures" \
  --resource-group $RG \
  --scopes "/subscriptions/$subscriptionId/resourceGroups/$RG/providers/Microsoft.App/containerApps/$APP_NAME" \
  --condition "avg Requests &amp;lt; 1" \
  --description "Slack bot container app is not receiving requests" \
  --window-size 5m \
  --evaluation-frequency 1m

# Alert when the Key Vault records no API hits in the window (a rough proxy for the bot failing to read secrets)
az monitor metrics alert create \
  --name "SlackBot-KeyVaultAccess" \
  --resource-group $RG \
  --scopes "/subscriptions/$subscriptionId/resourceGroups/$RG/providers/Microsoft.KeyVault/vaults/$KV_NAME" \
  --condition "total ServiceApiHit &amp;lt; 1" \
  --description "Slack bot unable to access Key Vault secrets" \
  --target-resource-type "Microsoft.KeyVault/vaults" \
  --target-resource-region $LOCATION \
  --window-size 5m \
  --evaluation-frequency 1m&lt;/LI-CODE&gt;
&lt;H3&gt;15. Security Hardening Checklist&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Authentication &amp;amp; Authorization&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;User-Assigned Managed Identity for all Azure resources&lt;/LI&gt;
&lt;LI&gt;RBAC-based access (no admin credentials)&lt;/LI&gt;
&lt;LI&gt;Key Vault for all secrets with proper role assignments&lt;/LI&gt;
&lt;LI&gt;Azure Support API permissions (Support Request Contributor + Reader)&lt;/LI&gt;
&lt;LI&gt;Least-privilege permissions&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Network Security &amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;HTTPS-only ingress (Container Apps provides TLS termination)&lt;/LI&gt;
&lt;LI&gt;No public admin endpoints&lt;/LI&gt;
&lt;LI&gt;Container registry private access via managed identity&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Operational Security&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Comprehensive logging with Log Analytics&lt;/LI&gt;
&lt;LI&gt;Health monitoring and alerting&lt;/LI&gt;
&lt;LI&gt;Automated vulnerability scanning (ACR)&lt;/LI&gt;
&lt;LI&gt;Secret rotation capability via Key Vault&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Application Security&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;No plaintext secrets in code or configuration (environment variables resolve through Key Vault-backed secret references)&lt;/LI&gt;
&lt;LI&gt;Slack request signature verification&lt;/LI&gt;
&lt;LI&gt;Input validation and sanitization (built into Slack Bolt framework)&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;16. Complete deployment script&lt;/H3&gt;
&lt;P&gt;Before running the one-command deployment script, ensure you've completed sections 3 and 4 above, then verify:&lt;BR /&gt;&lt;BR /&gt;1. You're in the repository root directory&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;pwd  # Should end with: azure-support-slack-bot
ls   # Should show: Dockerfile, requirements.txt, app.py&lt;/LI-CODE&gt;
&lt;P&gt;2. Docker is ready for building&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;docker ps  # Should not show permission errors&lt;/LI-CODE&gt;
&lt;P&gt;3. You have your Slack tokens ready&lt;/P&gt;
&lt;P&gt;Now go from zero to a production Slack bot in one command. Save the following script as deploy-slack-bot.sh:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;#!/bin/bash
set -euo pipefail

# Check parameters
if [ $# -ne 3 ]; then
    echo "Usage: $0 &amp;lt;subscription-id&amp;gt; &amp;lt;slack-bot-token&amp;gt; &amp;lt;slack-signing-secret&amp;gt;"
    exit 1
fi

SUBSCRIPTION_ID="$1"
SLACK_BOT_TOKEN="$2"
SLACK_SIGNING_SECRET="$3"

# Configuration - modern naming conventions
RG="rg-slack-support-prod"
LOCATION="eastus"
ACR_NAME="acrsupport$RANDOM"
ENV_NAME="aca-slack-env"
APP_NAME="slack-support-app"
KV_NAME="kv-slack-$RANDOM"
UAMI_NAME="id-slack-support"
LAW_NAME="law-slack-support"

echo "🚀 Deploying secure Azure Support Slack Bot..."

# Set subscription context
az account set --subscription "$SUBSCRIPTION_ID"

# Create resource group
az group create --name $RG --location $LOCATION --tags environment=production project=slack-support

# Create ACR with security defaults
az acr create --name $ACR_NAME --resource-group $RG --sku Standard --admin-enabled false
az acr login --name $ACR_NAME

# Build and push image
IMAGE_NAME="$ACR_NAME.azurecr.io/azure-support-slack-bot:latest"
docker build -t $IMAGE_NAME .
docker push $IMAGE_NAME

# Create managed identity - zero-trust foundation
az identity create --name $UAMI_NAME --resource-group $RG --location $LOCATION
UAMI_ID=$(az identity show --name $UAMI_NAME --resource-group $RG --query id -o tsv)
UAMI_PRINCIPAL_ID=$(az identity show --name $UAMI_NAME --resource-group $RG --query principalId -o tsv)
UAMI_CLIENT_ID=$(az identity show --name $UAMI_NAME --resource-group $RG --query clientId -o tsv)

# Grant ACR pull permissions
ACR_ID=$(az acr show --name $ACR_NAME --resource-group $RG --query id -o tsv)
az role assignment create --assignee $UAMI_PRINCIPAL_ID --role "AcrPull" --scope $ACR_ID

# Grant Azure Support API permissions - least privilege
subscriptionId="$SUBSCRIPTION_ID"
az role assignment create --assignee $UAMI_PRINCIPAL_ID --role "Support Request Contributor" --scope "/subscriptions/$subscriptionId"
az role assignment create --assignee $UAMI_PRINCIPAL_ID --role "Reader" --scope "/subscriptions/$subscriptionId"

echo "Azure Support API permissions granted to managed identity"

# Create Key Vault with RBAC (no access policies)
az keyvault create --name $KV_NAME --resource-group $RG --location $LOCATION --enable-rbac-authorization true --retention-days 7
KV_SCOPE=$(az keyvault show --name $KV_NAME --resource-group $RG --query id -o tsv)

# Grant Key Vault permissions
USER_PRINCIPAL_ID=$(az ad signed-in-user show --query id -o tsv)
az role assignment create --assignee $USER_PRINCIPAL_ID --role "Key Vault Administrator" --scope $KV_SCOPE
az role assignment create --assignee $UAMI_PRINCIPAL_ID --role "Key Vault Secrets User" --scope $KV_SCOPE

# Wait for RBAC propagation
sleep 60

# Store secrets securely
az keyvault secret set --vault-name $KV_NAME --name "slack-bot-token" --value "$SLACK_BOT_TOKEN"
az keyvault secret set --vault-name $KV_NAME --name "slack-signing-secret" --value "$SLACK_SIGNING_SECRET"

# Create observability foundation
az monitor log-analytics workspace create --workspace-name $LAW_NAME --resource-group $RG --location $LOCATION
LAW_CUSTOMER_ID=$(az monitor log-analytics workspace show --workspace-name $LAW_NAME --resource-group $RG --query customerId -o tsv)
LAW_SHARED_KEY=$(az monitor log-analytics workspace get-shared-keys --workspace-name $LAW_NAME --resource-group $RG --query primarySharedKey -o tsv)

# Create Container Apps environment
az containerapp env create --name $ENV_NAME --resource-group $RG --location $LOCATION --logs-workspace-id $LAW_CUSTOMER_ID --logs-workspace-key $LAW_SHARED_KEY

# Deploy Container App with scale-to-zero
az containerapp create \
  --name $APP_NAME \
  --resource-group $RG \
  --environment $ENV_NAME \
  --image $IMAGE_NAME \
  --target-port 5000 \
  --ingress external \
  --registry-server "$ACR_NAME.azurecr.io" \
  --registry-identity $UAMI_ID \
  --user-assigned $UAMI_ID \
  --min-replicas 0 \
  --max-replicas 10 \
  --cpu 0.5 \
  --memory 1Gi

# Configure Key Vault secret references
az containerapp secret set --name $APP_NAME --resource-group $RG --secrets \
  "slack-bot-token=keyvaultref:https://$KV_NAME.vault.azure.net/secrets/slack-bot-token,identityref:$UAMI_ID" \
  "slack-signing-secret=keyvaultref:https://$KV_NAME.vault.azure.net/secrets/slack-signing-secret,identityref:$UAMI_ID"

# Configure environment variables
az containerapp update --name $APP_NAME --resource-group $RG --set-env-vars \
  "SLACK_BOT_TOKEN=secretref:slack-bot-token" \
  "SLACK_SIGNING_SECRET=secretref:slack-signing-secret" \
  "AZURE_CLIENT_ID=$UAMI_CLIENT_ID" \
  "PORT=5000"

# Configure HTTP-based autoscaling
az containerapp update --name $APP_NAME --resource-group $RG \
  --scale-rule-name "http-requests" \
  --scale-rule-type "http" \
  --scale-rule-http-concurrency 50 \
  --min-replicas 0 \
  --max-replicas 10

# Get deployment results
APP_FQDN=$(az containerapp show --name $APP_NAME --resource-group $RG --query properties.configuration.ingress.fqdn -o tsv)

echo ""
echo "🎉 Deployment Complete!"
echo ""
echo "Slack Webhook URL: https://$APP_FQDN/slack/events"
echo " Resource Group: $RG"
echo " Key Vault: $KV_NAME"
echo " ACR: $ACR_NAME"
echo ""
echo " Next Steps:"
echo "1. Update your Slack App Manifest with: https://$APP_FQDN/slack/events"
echo "2. Reinstall the Slack app in your workspace"
echo "3. Invite the bot to channels: /invite @azure-support"
echo "4. Test with: /azure-support"
echo ""
echo "Monitor: az containerapp logs show -n $APP_NAME -g $RG --follow"
echo "Debug: az containerapp show -n $APP_NAME -g $RG --query properties.provisioningState"
&lt;/LI-CODE&gt;
&lt;P&gt;Usage:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;chmod +x deploy-slack-bot.sh
./deploy-slack-bot.sh "your-subscription-id" "xoxb-your-bot-token" "your-signing-secret"&lt;/LI-CODE&gt;
&lt;H4&gt;Cost Expectations&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;Scale-to-zero architecture&amp;nbsp;= minimal compute costs&lt;/LI&gt;
&lt;LI&gt;Base charges: Key Vault (roughly $0.03 per 10,000 operations on the Standard tier), Log Analytics ($2.30/GB ingested)&lt;/LI&gt;
&lt;LI&gt;Container Apps: Only charges when processing requests (true serverless)&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;Deployment Notes&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;Script appends $RANDOM (0 to 32767) to the ACR and Key Vault names to reduce collisions; this does not guarantee global uniqueness, so rerun if a name is already taken&lt;/LI&gt;
&lt;LI&gt;Takes ~8-12 minutes due to RBAC propagation delays&lt;/LI&gt;
&lt;LI&gt;After deployment,&amp;nbsp;update your Slack App Manifest&amp;nbsp;with the production URL&lt;/LI&gt;
&lt;/UL&gt;
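&lt;P&gt;If you change the name prefixes in the script, it helps to check the generated names before any resource is created. The sketch below assumes the documented naming rules (Key Vault: 3-24 characters, letters, digits, and hyphens, starting with a letter, ending with a letter or digit, no consecutive hyphens; ACR: 5-50 alphanumeric characters). Both function names are hypothetical, and neither checks whether the name is already taken:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Succeeds if the argument satisfies the Key Vault naming rules listed above
valid_kv_name() {
  case "$1" in *--*) return 1 ;; esac
  printf '%s\n' "$1" | grep -Eq '^[a-zA-Z][a-zA-Z0-9-]{1,22}[a-zA-Z0-9]$'
}

# Succeeds if the argument satisfies the ACR naming rules listed above
valid_acr_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-zA-Z0-9]{5,50}$'
}

# Usage inside deploy-slack-bot.sh, right after the names are set:
# valid_kv_name "$KV_NAME" || exit 1
# valid_acr_name "$ACR_NAME" || exit 1&lt;/LI-CODE&gt;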
&lt;H4&gt;Post-Deployment Steps&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;Update Slack App Manifest&amp;nbsp;with your Azure Container Apps URL&lt;/LI&gt;
&lt;LI&gt;Reinstall the Slack app&amp;nbsp;(required for URL changes)&lt;/LI&gt;
&lt;LI&gt;Test with /azure-support&amp;nbsp;or the global shortcut&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;17. Cleanup&lt;/H3&gt;
&lt;P&gt;When you're ready to remove all resources:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Delete resource group (removes all resources); blocks until deletion finishes
az group delete --name $RG --yes

# Purge the soft-deleted Key Vault (only possible when purge protection is not enabled)
az keyvault purge --name $KV_NAME --location $LOCATION&lt;/LI-CODE&gt;
&lt;H3&gt;18. You’re live, what’s next?&lt;/H3&gt;
&lt;P data-start="315" data-end="487"&gt;You’ve just deployed a &lt;STRONG data-start="338" data-end="386"&gt;production-grade Slack bot for Azure Support&lt;/STRONG&gt; using a modern, secure-by-default architecture — no manual secrets, no patchy scripts, no guesswork.&lt;/P&gt;
&lt;P data-start="489" data-end="622"&gt;What you now have is more than a bot — it’s a &lt;STRONG data-start="535" data-end="612"&gt;template for how digital native teams should approach platform automation&lt;/STRONG&gt; on Azure:&lt;/P&gt;
&lt;UL data-start="624" data-end="1016"&gt;
&lt;LI data-start="624" data-end="690"&gt;&lt;STRONG data-start="629" data-end="654"&gt;Zero-trust foundation&lt;/STRONG&gt; with managed identities + Key Vault&lt;/LI&gt;
&lt;LI data-start="691" data-end="747"&gt;&lt;STRONG data-start="696" data-end="719"&gt;Dev-first workflows&lt;/STRONG&gt; for local testing and CI/CD&lt;/LI&gt;
&lt;LI data-start="748" data-end="807"&gt;&lt;STRONG data-start="753" data-end="783"&gt;Scale-to-zero architecture&lt;/STRONG&gt; on Azure Container Apps&lt;/LI&gt;
&lt;LI data-start="808" data-end="875"&gt;&lt;STRONG data-start="813" data-end="839"&gt;Built-in observability&lt;/STRONG&gt; with Log Analytics and App Insights&lt;/LI&gt;
&lt;LI data-start="876" data-end="966"&gt;&lt;STRONG data-start="882" data-end="908"&gt;RBAC-controlled access&lt;/STRONG&gt; to support APIs — no over-permissioned service principals&lt;/LI&gt;
&lt;LI data-start="967" data-end="1016"&gt;&lt;STRONG data-start="972" data-end="997"&gt;End-to-end automation&lt;/STRONG&gt; via GitHub Actions&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="1018" data-end="1169"&gt;This isn't just a bot — it's a pattern. A way to wire your internal tools to your platform securely, scalably, and with full auditability from day one.&lt;/P&gt;
&lt;P data-start="1018" data-end="1169"&gt;This guide was made for fast-moving teams who prefer CLI over click-ops and automation over tribal knowledge. If you're building platforms, bots, or tools to empower your engineering org, this is a foundation you can trust.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Sep 2025 17:28:18 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/startups-at-microsoft/azure-support-slack-bot-on-azure-container-apps-production-ready/ba-p/4436423</guid>
      <dc:creator>rmmartins</dc:creator>
      <dc:date>2025-09-04T17:28:18Z</dc:date>
    </item>
    <item>
      <title>A practical guide to Azure VM SKU eligibility and zonal support monitoring</title>
      <link>https://techcommunity.microsoft.com/t5/startups-at-microsoft/a-practical-guide-to-azure-vm-sku-eligibility-and-zonal-support/ba-p/4415773</link>
      <description>&lt;img /&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-start="793" data-end="840"&gt;&lt;STRONG data-start="793" data-end="840"&gt; Important clarification about “capacity”&lt;/STRONG&gt;&lt;/P&gt;
&lt;P data-start="844" data-end="929"&gt;This guide does not provide real-time, deployable capacity signals for Azure VM SKUs. The solution is based on the Azure ResourceSkus API, which exposes SKU metadata, regional availability, zonal support, and subscription-level restrictions. It can tell you whether a SKU is eligible for your subscription in a region and which zones are supported.&lt;/P&gt;
&lt;P data-start="1203" data-end="1397"&gt;It does not guarantee that capacity is available at deployment time. Azure capacity is dynamic, and allocation failures can still occur even when a SKU appears available and quota is sufficient. This solution is best used to proactively detect SKU restrictions, understand zonal exposure, and build guardrails and alternatives before deployments. For guaranteed capacity, Azure Capacity Reservations or pre-deployment validation are required.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P data-start="92" data-end="276"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P data-start="1664" data-end="1929"&gt;Look, Azure allocation failures can really derail your day. Most of the time, you only find out there’s a problem when a deployment fails: there’s no clear early signal and no easy way to validate whether a SKU is even usable in a given region or zone for your subscription.&lt;/P&gt;
&lt;P data-start="1936" data-end="2220"&gt;After seeing this happen repeatedly with customers, I built a simple monitor that helps you proactively validate &lt;STRONG data-start="2049" data-end="2101"&gt;SKU eligibility, restrictions, and zonal support&lt;/STRONG&gt;, so you can catch “this SKU won’t work here” scenarios early and design alternatives before you hit deployment time.&lt;/P&gt;
&lt;P data-start="2227" data-end="2304"&gt;Thought I’d share it here; hopefully it saves you some of the same headaches.&lt;/P&gt;
&lt;H4&gt;What this thing does&lt;/H4&gt;
&lt;P&gt;This solution isn't fancy, but it works. Here's what it'll do for you:&lt;/P&gt;
&lt;OL&gt;
&lt;LI data-start="281" data-end="385"&gt;Checks whether specific VM SKUs are &lt;STRONG data-start="319" data-end="343"&gt;eligible and exposed&lt;/STRONG&gt; in a given region for your subscription&lt;/LI&gt;
&lt;LI data-start="386" data-end="529"&gt;Shows exactly &lt;STRONG data-start="402" data-end="429"&gt;why a SKU can’t be used&lt;/STRONG&gt; when there’s a restriction (for example, not available for the subscription or in specific zones)&lt;/LI&gt;
&lt;LI data-start="530" data-end="610"&gt;Shows which &lt;STRONG data-start="544" data-end="580"&gt;availability zones are supported&lt;/STRONG&gt; for each SKU in that region&lt;/LI&gt;
&lt;LI data-start="611" data-end="687"&gt;Suggests &lt;STRONG data-start="622" data-end="641"&gt;similar VM SKUs&lt;/STRONG&gt; you could consider when a SKU is restricted&lt;/LI&gt;
&lt;LI data-start="688" data-end="798"&gt;Logs all results to &lt;STRONG data-start="710" data-end="733"&gt;Azure Log Analytics&lt;/STRONG&gt; so you can track SKU exposure and restriction trends over time&lt;/LI&gt;
&lt;LI data-start="799" data-end="862"&gt;Runs directly from your terminal, no complex setup required&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4 data-start="2817" data-end="2850"&gt;What this solution does not do&lt;/H4&gt;
&lt;P data-start="2854" data-end="2938"&gt;This solution does not provide a real-time view of free or remaining Azure capacity. There is currently no public API that exposes live, deploy-time capacity per SKU, per zone, per region. As a result, even if a SKU appears eligible and zonally supported, deployments may still fail due to transient or regional capacity constraints.&lt;/P&gt;
&lt;P data-start="3196" data-end="3250"&gt;If you need allocation certainty, you should consider:&lt;/P&gt;
&lt;UL data-start="3255" data-end="3413"&gt;
&lt;LI data-start="3255" data-end="3286"&gt;Azure Capacity Reservations&lt;/LI&gt;
&lt;LI data-start="3289" data-end="3349"&gt;Running validation deployments as a point-in-time signal&lt;/LI&gt;
&lt;LI data-start="3352" data-end="3413"&gt;Designing for flexibility across SKUs, zones, and regions&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;How it's put together&lt;/H4&gt;
&lt;P&gt;It's pretty simple really - just two main Python scripts:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;The Monitoring Script&lt;/STRONG&gt;: Checks VM SKU eligibility, restrictions, and zonal support using Azure’s ResourceSkus API&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Log Analytics Setup&lt;/STRONG&gt;: Stores your data for later analysis (optional, but super useful)&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Here's a quick diagram:&lt;/P&gt;
&lt;img /&gt;
&lt;H4&gt;Before you start&lt;/H4&gt;
&lt;P&gt;You'll need a few things:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;1. Azure CLI&lt;/STRONG&gt; installed and working on your machine&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# If you haven't logged in yet
az login&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;2. Azure permissions&lt;/STRONG&gt;&amp;nbsp;if you're doing the Log Analytics part:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Get your username first
USER_PRINCIPAL=$(az ad signed-in-user show --query userPrincipalName -o tsv)
echo "Looks like you're logged in as: $USER_PRINCIPAL"

# Create a resource group - you can change the name if you want
az group create --name vm-sku-monitor-rg --location eastus2

# Give yourself the right permissions
az role assignment create \
  --assignee "$USER_PRINCIPAL" \
  --role "Monitoring Metrics Publisher" \
  --scope "/subscriptions/$(az account show --query id -o tsv)/resourcegroups/vm-sku-monitor-rg"

# Double-check it worked
az role assignment list \
  --assignee "$USER_PRINCIPAL" \
  --role "Monitoring Metrics Publisher" \
  --scope "/subscriptions/$(az account show --query id -o tsv)/resourcegroups/vm-sku-monitor-rg"&lt;/LI-CODE&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Azure can be kinda slow with permissions sometimes. If you get weird 403 errors later, maybe grab a coffee and try again in 10-15 mins.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;3. Python environment setup&lt;/STRONG&gt;:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Set up a virtual environment - don't skip this step!
# I learned this the hard way when I borked my system Python...
python3 -m venv venv

# Activate it
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install what we need
pip install azure-identity azure-mgmt-compute azure-mgmt-subscription azure-monitor-ingestion rich&lt;/LI-CODE&gt;
&lt;H4&gt;Let's build this thing&lt;/H4&gt;
&lt;P&gt;&lt;STRONG&gt;1. The VM Capacity Checking Script&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The star of the show is the monitoring script itself. This script does all the heavy lifting: checking SKU eligibility and zonal support, showing you what's happening, and logging the data for later. I'll call it &lt;A class="lia-external-url" href="https://gist.github.com/ricmmartins/7f8fd1c3408464e5ea652301017c701c" target="_blank" rel="noopener"&gt;monitor_vm_sku_capacity.py&lt;/A&gt;.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;The script uses compute_client.resource_skus.list() to evaluate SKU metadata, regional exposure, supported zones, and restriction codes. This API does not surface live allocation capacity.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
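&lt;P&gt;For a quick look at the same metadata without Python, the Azure CLI exposes it through az vm list-skus. The sketch below wraps that call in a function (show_sku_restrictions is a hypothetical name). It assumes the CLI's JSON output carries locationInfo[].zones and restrictions[].reasonCode, mirroring the ResourceSkus API; like the script, it says nothing about live capacity:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# show_sku_restrictions REGION SKU
# Prints the supported zones and any restriction reason codes for one SKU.
show_sku_restrictions() {
  az vm list-skus \
    --location "$1" \
    --size "$2" \
    --resource-type virtualMachines \
    --query "[].{Name:name, Zones:locationInfo[0].zones, Restrictions:restrictions[].reasonCode}" \
    -o json
}

# Usage:
# show_sku_restrictions eastus2 Standard_D16ds_v5&lt;/LI-CODE&gt;
&lt;P&gt;An empty Restrictions list suggests the SKU is eligible for your subscription in that region. A reason code such as NotAvailableForSubscription means it is not, regardless of quota.&lt;/P&gt;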
&lt;P&gt;&lt;STRONG&gt;2. Log Analytics Setup Script&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Now for the script that sets up all the Log Analytics stuff. This part is optional, but really helpful if you want to track SKU eligibility and restriction trends over time: &lt;A class="lia-external-url" href="https://gist.github.com/ricmmartins/76b0e2e96f288a9b2635233570f5d4d7" target="_blank" rel="noopener"&gt;setup_log_analytics.py&lt;/A&gt;&lt;/P&gt;
&lt;H4&gt;Setting default region and VM SKU&lt;/H4&gt;
&lt;P&gt;You've got a few options to set your preferred region and VM SKU:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;1. Edit script defaults&lt;/STRONG&gt;: Open monitor_vm_sku_capacity.py and look for:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;parser.add_argument('--region', type=str, default='eastus2',  # Change this!
                    help='Azure region to check (default: eastus2)')
parser.add_argument('--sku', type=str, default='Standard_D16ds_v5',  # And this!
                    help='VM SKU to check (default: Standard_D16ds_v5)')&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;2. Specify on command line&lt;/STRONG&gt;:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;python monitor_vm_sku_capacity.py --region westus2 --sku Standard_D8ds_v5&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;3. Edit config file&lt;/STRONG&gt;: After running the setup script, it creates a config.json with these values:&lt;/P&gt;
&lt;LI-CODE lang="json"&gt;{
  "region": "eastus2",
  "target_sku": "Standard_D16ds_v5",
  "check_zones": true,
  ...
}&lt;/LI-CODE&gt;
&lt;H4&gt;Finding Available Regions and SKUs&lt;/H4&gt;
&lt;P&gt;If you're wondering which regions and SKUs to monitor, here's how to get that info:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Using Azure CLI&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# List all regions
az account list-locations --query "[].name" -o tsv

# List all VM SKUs in a region 
az vm list-skus --location eastus2 --resource-type virtualMachines --query "[].name" -o tsv  

# Get detailed info about a specific SKU
az vm list-skus --location eastus2 --size Standard_D16ds_v5 -o table&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;Using Azure Portal&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Just go to the VM creation page in the portal and click "See all sizes" - you'll get a nice visual list of all available options. I sometimes just take a screenshot of this for reference.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;Using this tool&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;So here's how you use this thing. I tried to make it as simple as possible:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;1. Set up Log Analytics first&lt;/STRONG&gt; (optional but recommended):&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;python setup_log_analytics.py&lt;/LI-CODE&gt;
&lt;P&gt;This builds all the Log Analytics stuff and spits out a config file you can use in the next step. The default options should work fine for most people, but you can customize if needed.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;2. Run the monitoring script&lt;/STRONG&gt;:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;python monitor_vm_sku_capacity.py --config config.json&lt;/LI-CODE&gt;
&lt;P&gt;If you don't want to mess with Log Analytics, you can just run it directly:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;python monitor_vm_sku_capacity.py --region eastus2 --sku Standard_D16ds_v5&lt;/LI-CODE&gt;
&lt;P&gt;The output will look something like this (way prettier if you have the rich package installed):&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;================================================================================
AZURE VM SKU CAPACITY MONITOR - 2024-05-20 14:32:45
================================================================================

Status:       AVAILABLE
SKU:          Standard_D16ds_v5
Region:       eastus2
Subscription: My Azure Subscription (12345678-1234-1234-1234-123456789012)

Available Zones:
  - 1
  - 2
  - 3

VM SKU Specifications:
  vCPUs: 16
  MemoryGB: 64
  MaxDataDiskCount: 32
  PremiumIO: True
  AcceleratedNetworkingEnabled: True&lt;/LI-CODE&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG data-start="3654" data-end="3667"&gt;AVAILABLE&lt;/STRONG&gt;&amp;nbsp;means no subscription-level restriction was detected and the SKU is exposed in this region. It does not guarantee deploy-time capacity.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Or if the VM is unavailable:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;================================================================================
AZURE VM SKU CAPACITY MONITOR - 2024-05-20 14:32:45
================================================================================

Status:       NOT AVAILABLE
SKU:          Standard_D16ds_v5
Region:       eastus2
Subscription: My Azure Subscription (12345678-1234-1234-1234-123456789012)
Details:      SKU Standard_D16ds_v5 is not available in region eastus2

Available Zones:
  None

Restrictions:
  Type:           Zone
  Reason:         NotAvailableForSubscription
  Affected Values: eastus2

VM SKU Specifications:
  vCPUs: 16
  MemoryGB: 64
  MaxDataDiskCount: 32
  PremiumIO: True
  AcceleratedNetworkingEnabled: True

Alternative SKUs:
  - Standard_D16as_v5 (vCPUs: 16, Memory: 64 GB, Family: standardDasv5Family, Similarity: 100%)
  - Standard_D16s_v5 (vCPUs: 16, Memory: 64 GB, Family: standardDsv5Family, Similarity: 100%)
  - Standard_D16s_v4 (vCPUs: 16, Memory: 64 GB, Family: standardDsv4Family, Similarity: 100%)
  - Standard_F16s_v2 (vCPUs: 16, Memory: 32 GB, Family: standardFSv2Family, Similarity: 80%)
  - Standard_E16s_v5 (vCPUs: 16, Memory: 128 GB, Family: standardEsv5Family, Similarity: 80%)&lt;/LI-CODE&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG data-start="3808" data-end="3823"&gt;NOT AVAILABLE&amp;nbsp;&lt;/STRONG&gt;means the SKU is restricted for this subscription in this region or zone based on the ResourceSkus restriction signals.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H4&gt;&lt;BR /&gt;Setting up scheduled checks&lt;/H4&gt;
&lt;P&gt;I don't like missing things, so I set mine up to run every hour using cron:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Open crontab editor
crontab -e

# Add this line to run it every hour
0 * * * * cd /path/to/scripts &amp;amp;&amp;amp; source venv/bin/activate &amp;amp;&amp;amp; python monitor_vm_sku_capacity.py --config config.json &amp;gt;&amp;gt; vm_sku_monitor.log 2&amp;gt;&amp;amp;1&lt;/LI-CODE&gt;
&lt;H4&gt;Checking your data in Log Analytics&lt;/H4&gt;
&lt;P&gt;If you set up Log Analytics, you can run all sorts of cool queries:&lt;/P&gt;
&lt;LI-CODE lang="kusto"&gt;// Basic query - see everything
VMSKUCapacity_CL
| order by TimeGenerated desc

// Find when capacity changed
VMSKUCapacity_CL
| where sku_name == "Standard_D16ds_v5" and region == "eastus2"
| project TimeGenerated, is_available
| order by TimeGenerated desc


// Simple dashboard
// arg_max returns the newest row per group, so TimeGenerated is the last check time
VMSKUCapacity_CL
| summarize arg_max(TimeGenerated, is_available) by sku_name, region
| extend Status = iff(is_available == true, "Available", "Not Available")
| project sku_name, region, Status, LastChecked = TimeGenerated&lt;/LI-CODE&gt;
&lt;P&gt;You can &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/azure-monitor/alerts/tutorial-log-alert" target="_blank" rel="noopener"&gt;set up alerts too&lt;/A&gt;. That way Azure tells YOU when capacity changes, instead of you finding out during a failed deployment!&lt;/P&gt;
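&lt;P&gt;If you export the results of the "Find when capacity changed" query, spotting the actual flips is a short loop. This is a rough sketch that assumes each row is a dictionary with the TimeGenerated and is_available columns used above. The function name and the sample rows are made up for illustration:&lt;/P&gt;

```python
def find_transitions(rows):
    """Return (timestamp, old_state, new_state) for every flip in is_available."""
    # ISO 8601 strings in the same timezone sort chronologically as plain text
    ordered = sorted(rows, key=lambda row: row["TimeGenerated"])
    changes = []
    for previous, current in zip(ordered, ordered[1:]):
        if previous["is_available"] != current["is_available"]:
            changes.append(
                (current["TimeGenerated"], previous["is_available"], current["is_available"])
            )
    return changes


if __name__ == "__main__":
    # Made-up sample rows, not real query output
    sample = [
        {"TimeGenerated": "2024-05-20T12:00:00Z", "is_available": True},
        {"TimeGenerated": "2024-05-20T13:00:00Z", "is_available": True},
        {"TimeGenerated": "2024-05-20T14:00:00Z", "is_available": False},
        {"TimeGenerated": "2024-05-20T15:00:00Z", "is_available": True},
    ]
    for when, old, new in find_transitions(sample):
        print(when, old, "to", new)
```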
&lt;H4&gt;Troubleshooting&lt;/H4&gt;
&lt;P&gt;Some common problems I've run into:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;"Could not automatically detect subscription ID"&lt;/STRONG&gt;:
&lt;UL&gt;
&lt;LI&gt;Make sure you're logged in with&amp;nbsp;az login&lt;/LI&gt;
&lt;LI&gt;Or just provide it explicitly with&amp;nbsp;--subscription-id&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Log Analytics permission errors&lt;/STRONG&gt;:
&lt;UL&gt;
&lt;LI&gt;Make sure you ran the permission commands from the prerequisites section&lt;/LI&gt;
&lt;LI&gt;Azure's permissions can be weirdly slow - wait 10-15 minutes and try again&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Python environment issues&lt;/STRONG&gt;:
&lt;UL&gt;
&lt;LI&gt;Always use a virtual environment! I learned this one the hard way when I messed up my system Python&lt;/LI&gt;
&lt;LI&gt;Make sure all the packages are installed with pip install azure-identity azure-mgmt-compute azure-mgmt-subscription azure-monitor-ingestion rich&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;Next Steps&lt;/H4&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/azure-monitor/visualize/tutorial-logs-dashboards" target="_blank" rel="noopener"&gt;Create a dashboard&lt;/A&gt; to visualize VM SKU availability over time&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/azure-monitor/alerts/tutorial-log-alert" target="_blank" rel="noopener"&gt;Set up alerts&lt;/A&gt; to notify you when specific SKUs become available&lt;/LI&gt;
&lt;LI&gt;Integrate with your CI/CD pipeline to automatically select available SKUs&lt;/LI&gt;
&lt;LI&gt;For a serverless, fully managed option, create an Azure Function version of the monitoring script&lt;/LI&gt;
&lt;/OL&gt;
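&lt;P&gt;For the CI/CD idea, the selection logic can be as small as "walk a preference list and take the first SKU that passes". A minimal sketch follows. The function name pick_first_eligible and the hard-coded results are placeholders, and in a real pipeline the check would wrap whatever eligibility signal you trust:&lt;/P&gt;

```python
def pick_first_eligible(preferred_skus, is_eligible):
    """Return the first SKU in preference order that passes the check, else None."""
    for sku in preferred_skus:
        if is_eligible(sku):
            return sku
    return None


if __name__ == "__main__":
    # Stand-in for a real check; these results are invented for the example
    fake_results = {
        "Standard_D16ds_v5": False,
        "Standard_D16as_v5": True,
        "Standard_D16s_v5": True,
    }
    choice = pick_first_eligible(list(fake_results), fake_results.get)
    print(choice or "no eligible SKU")
```

&lt;P&gt;Returning None instead of raising lets the pipeline decide whether a missing SKU should fail the build or fall back to another region.&lt;/P&gt;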
&lt;H4&gt;Advanced: Bulk-Deploy Feasibility Check&lt;/H4&gt;
&lt;P data-start="946" data-end="1035"&gt;Want to validate up front whether a SKU is eligible in a region and whether your subscription quota would allow N VMs?&lt;BR data-start="1021" data-end="1024" /&gt;We combine:&lt;/P&gt;
&lt;OL data-start="1037" data-end="1187"&gt;
&lt;LI data-start="1037" data-end="1106"&gt;&lt;STRONG data-start="1040" data-end="1058"&gt;SKU-level&lt;/STRONG&gt;: Resource SKUs API (is the SKU offered in the region and unrestricted for your subscription?)&lt;/LI&gt;
&lt;LI data-start="1107" data-end="1187"&gt;&lt;STRONG data-start="1110" data-end="1132"&gt;Subscription-level&lt;/STRONG&gt;: Usage API (enough free vCPU cores for &lt;EM data-start="1172" data-end="1175"&gt;N&lt;/EM&gt; instances?)&lt;/LI&gt;
&lt;/OL&gt;
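&lt;P&gt;The quota half of that check is plain arithmetic: free cores are the limit minus current usage, and the requirement is vCPUs per VM times the VM count. Here's a small worked sketch with made-up numbers. The helper name quota_headroom is hypothetical, and it only models a single regional counter:&lt;/P&gt;

```python
def quota_headroom(limit, current_value, vcpus_per_vm, count):
    """Compare the vCPUs needed for `count` VMs with the free quota."""
    if vcpus_per_vm == 0:
        raise ValueError("vcpus_per_vm must be non-zero")
    free = limit - current_value
    required = vcpus_per_vm * count
    # How many cores are missing; zero means the request fits
    shortfall = max(0, required - free)
    return {
        "free": free,
        "required": required,
        "shortfall": shortfall,
        "max_vms": max(0, free // vcpus_per_vm),
        "ok": shortfall == 0,
    }


if __name__ == "__main__":
    # 10 VMs with 2 vCPUs each against an unused limit of 100 cores
    print(quota_headroom(limit=100, current_value=0, vcpus_per_vm=2, count=10))
    # 10 VMs with 16 vCPUs each when 40 of 100 cores are already in use
    print(quota_headroom(limit=100, current_value=40, vcpus_per_vm=16, count=10))
```

&lt;P&gt;The second call needs 160 cores with only 60 free, so it reports a shortfall of 100 and room for just 3 VMs of that size.&lt;/P&gt;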
&lt;P&gt;&lt;STRONG&gt;Prerequisites already covered above:&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;az login
USER_PRINCIPAL=$(az ad signed-in-user show --query userPrincipalName -o tsv)

az group create --name vm-sku-monitor-rg --location eastus2

az role assignment create \
  --assignee "$USER_PRINCIPAL" \
  --role "Monitoring Metrics Publisher" \
  --scope "/subscriptions/$(az account show --query id -o tsv)/resourcegroups/vm-sku-monitor-rg"

python3 -m venv venv &amp;amp;&amp;amp; source venv/bin/activate

pip install azure-identity azure-mgmt-compute azure-mgmt-subscription rich&lt;/LI-CODE&gt;
&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;File: monitor_vm_sku_capacity_bulk.py&lt;BR /&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;#!/usr/bin/env python
"""
Azure VM SKU Capacity &amp;amp; Quota Monitor (with Zone support)

Checks:
  1) Whether your target SKU is available in a region or zone
  2) Whether your subscription has enough free vCPU quota to deploy N VMs
Optionally logs results into Azure Log Analytics.
"""

import argparse
import datetime
import json
import logging
import subprocess
from typing import List, Tuple, Dict, Any

from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient
from azure.mgmt.subscription import SubscriptionClient

# Rich for prettier tables
try:
    from rich.console import Console
    from rich.table import Table
    from rich import box
    RICH_AVAILABLE = True
except ImportError:
    RICH_AVAILABLE = False

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    handlers=[logging.StreamHandler()]
)
logger = logging.getLogger("vm_sku_capacity_monitor")


def parse_arguments():
    p = argparse.ArgumentParser(
        description="Azure VM SKU Capacity &amp;amp; Quota Monitor (with zone support)"
    )
    # Defaults are None so load_configuration() can tell which flags were actually passed
    p.add_argument("--region",        type=str,   default=None,
                   help="Azure region to check (default: eastus2)")
    p.add_argument("--sku",           type=str,   default=None,
                   help="VM SKU to check (default: Standard_D16ds_v5)")
    p.add_argument("--zone",          type=str,   default=None,
                   help="(Optional) Availability zone to check (e.g. '1')")
    p.add_argument("--count",         type=int,   default=None,
                   help="Number of VMs you plan to deploy (default: 1)")
    p.add_argument("--log-analytics", action="store_true",
                   help="Enable logging to Azure Log Analytics")
    p.add_argument("--endpoint",      type=str,
                   help="Data Collection Endpoint URI")
    p.add_argument("--rule-id",       type=str,
                   help="Data Collection Rule ID")
    p.add_argument("--stream-name",   type=str, default=None,
                   help="Log Analytics stream name (default: Custom-VMSKUCapacity_CL)")
    p.add_argument("--debug",         action="store_true",
                   help="Enable debug logging")
    p.add_argument("--config",        type=str,
                   help="Path to JSON config file")
    p.add_argument("--subscription-id", type=str,
                   help="Azure Subscription ID")
    return p.parse_args()


def load_configuration(args) -&amp;gt; Dict[str, Any]:
    # Precedence (lowest to highest): built-in defaults, config file, CLI flags
    cfg = {
        "region": "eastus2",
        "zone": None,
        "target_sku": "Standard_D16ds_v5",
        "desired_count": 1,
        "subscription_id": None,
        "log_analytics": {
            "enabled": False,
            "endpoint": None,
            "rule_id": None,
            "stream_name": "Custom-VMSKUCapacity_CL"
        }
    }
    if args.config:
        try:
            with open(args.config) as f:
                j = json.load(f)
            # merge known keys
            for k in ("region", "zone", "target_sku", "desired_count", "subscription_id"):
                if k in j:
                    cfg[k] = j[k]
            cfg["log_analytics"].update(j.get("log_analytics", {}))
            logger.info(f"Loaded configuration from {args.config}")
        except Exception as e:
            logger.error(f"Failed loading config {args.config}: {e}")
    # CLI flags override the file, but only when they were actually passed
    cli = {
        "region": args.region,
        "zone": args.zone,
        "target_sku": args.sku,
        "desired_count": args.count,
        "subscription_id": args.subscription_id,
    }
    cfg.update({k: v for k, v in cli.items() if v is not None})
    cli_la = {
        "endpoint": args.endpoint,
        "rule_id": args.rule_id,
        "stream_name": args.stream_name,
    }
    cfg["log_analytics"].update({k: v for k, v in cli_la.items() if v is not None})
    if args.log_analytics:
        cfg["log_analytics"]["enabled"] = True
    # Zones are compared as strings later, so normalise a numeric zone from JSON
    if cfg["zone"] is not None:
        cfg["zone"] = str(cfg["zone"])
    return cfg


def get_subscription_id(explicit: str) -&amp;gt; str:
    if explicit:
        return explicit
    # Try Azure CLI
    try:
        out = subprocess.run(
            "az account show --query id -o tsv",
            shell=True, check=True,
            stdout=subprocess.PIPE, text=True
        ).stdout.strip()
        if out:
            return out
    except (subprocess.CalledProcessError, OSError):
        pass
    # Fallback: Azure SDK
    cred = DefaultAzureCredential()
    subs = list(SubscriptionClient(cred).subscriptions.list())
    return subs[0].subscription_id if subs else None


def check_sku_availability(
    compute: ComputeManagementClient,
    region: str, sku: str, zone: str = None
) -&amp;gt; Tuple[bool, str, List[str], Dict[str, Any]]:
    """
    Returns:
      is_available (bool),
      reason (str or None),
      supported_zones (list of str),
      capabilities (dict of name→value)
    """
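    # Ask the service for this region only; an unfiltered list covers every region and is slow
    skus = list(compute.resource_skus.list(filter=f"location eq '{region}'"))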
    entry = next(
        (s for s in skus
         if s.name.lower() == sku.lower()
         and region.lower() in [loc.lower() for loc in s.locations]),
        None
    )
    if not entry:
        return False, "NotFound", [], {}

    # Find all zones where this SKU is sold in that region
    supported_zones = []
    for loc_info in entry.location_info or []:
        if loc_info.location.lower() == region.lower():
            supported_zones = loc_info.zones or []
            break

    # Determine restrictions (restrictions and restriction_info can be missing)
    restrictions = [r for r in (entry.restrictions or []) if r.restriction_info]
    if zone:
        # 1) If SKU doesn't support the requested zone
        if zone not in supported_zones:
            return False, "UnsupportedZone", supported_zones, {}
        # 2) Check zone-level restrictionInfo.zones
        restricted = [
            r for r in restrictions
            if zone in (r.restriction_info.zones or [])
        ]
    else:
        # Region-level check
        restricted = [
            r for r in restrictions
            if region.lower() in [l.lower() for l in (r.restriction_info.locations or [])]
        ]

    is_avail = len(restricted) == 0
    reason   = restricted[0].reason_code if restricted else None

    # Pull out SKU capabilities (vCPUs, MemoryGB, etc.)
    caps = {c.name: c.value for c in entry.capabilities or []}

    return is_avail, reason, supported_zones, caps


def check_quota(
    compute: ComputeManagementClient,
    region: str, vcpus_needed: int, count: int
) -&amp;gt; Tuple[int,int,bool]:
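    usage = list(compute.usage.list(location=region))
    # "cores" is the total regional vCPU counter; per-VM-family quotas are
    # separate usage entries and are not checked here
    core = next((u for u in usage if u.name.value.lower()=="cores"), None)
    free = (core.limit - core.current_value) if core else 0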
    required = vcpus_needed * count
    return free, required, free &amp;gt;= required


def display(rdata: Dict[str, Any]):
    if RICH_AVAILABLE:
        c = Console()
        c.print(f"\n[bold underline]SKU Capacity &amp;amp; Quota (Zone) Check "
                f"({datetime.datetime.now():%Y-%m-%d %H:%M:%S})[/]\n")

        # Availability table
        t1 = Table(box=box.SIMPLE)
        t1.add_column("SKU"); t1.add_column("Region"); t1.add_column("Zone")
        t1.add_column("Available"); t1.add_column("Reason")
        t1.add_row(
            rdata["target_sku"], rdata["region"],
            rdata["zone"] or "-",
            "✅" if rdata["is_available"] else "❌",
            rdata["reason"] or "-"
        )
        c.print(t1)

        # Supported zones
        t0 = Table(box=box.SIMPLE)
        t0.add_column("Supported Zones")
        t0.add_row(", ".join(rdata["supported_zones"]) or "None")
        c.print(t0)

        # Quota table
        t2 = Table(box=box.SIMPLE)
        t2.add_column("Desired VMs", justify="right")
        t2.add_column("vCPUs/VM",   justify="right")
        t2.add_column("Free Cores", justify="right")
        t2.add_column("Needs Cores",justify="right")
        t2.add_column("Quota OK?",  justify="center")
        t2.add_row(
            str(rdata["desired_count"]),
            str(rdata["vcpus"]),
            str(rdata["free_cores"]),
            str(rdata["required_cores"]),
            "✅" if rdata["quota_ok"] else "❌"
        )
        c.print(t2)

    else:
        print(f"\nSKU {rdata['target_sku']} in {rdata['region']} "
              f"zone {rdata['zone'] or '-'}: "
              f"Available={rdata['is_available']} (Reason={rdata['reason']})")
        print("Supported zones:", ", ".join(rdata["supported_zones"]) or "None")
        print(f"Quota: need {rdata['required_cores']} cores, "
              f"have {rdata['free_cores']} → OK={rdata['quota_ok']}")


def main():
    args = parse_arguments()
    if args.debug:
        logger.setLevel(logging.DEBUG)

    cfg = load_configuration(args)
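    cfg["subscription_id"] = get_subscription_id(cfg.get("subscription_id"))
    if not cfg["subscription_id"]:
        # Fail early with a clear message instead of a client construction error
        raise SystemExit("Could not automatically detect subscription ID; "
                         "run az login or pass --subscription-id")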
    logger.info(f"Checking SKU {cfg['target_sku']} x{cfg['desired_count']} "
                f"in {cfg['region']} zone {cfg['zone']}")

    cred = DefaultAzureCredential()
    compute = ComputeManagementClient(cred, cfg["subscription_id"])

    # 1) SKU + zone availability
    is_avail, reason, zones, caps = check_sku_availability(
        compute, cfg["region"], cfg["target_sku"], cfg["zone"]
    )
    vcpus = int(caps.get("vCPUs", 0))

    # 2) Subscription quota check
    free, required, ok = check_quota(
        compute, cfg["region"], vcpus, cfg["desired_count"]
    )

    result = {
        "target_sku":      cfg["target_sku"],
        "region":          cfg["region"],
        "zone":            cfg["zone"],
        "supported_zones": zones,
        "desired_count":   cfg["desired_count"],
        "is_available":    is_avail,
        "reason":          reason,
        "vcpus":           vcpus,
        "free_cores":      free,
        "required_cores":  required,
        "quota_ok":        ok
    }

    display(result)

    # (Optional) send to Log Analytics…
    # [omitted for brevity]


if __name__ == "__main__":
    main()
&lt;/LI-CODE&gt;
&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;Run the bulk-deploy checker (region-level check)&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;python monitor_vm_sku_capacity_bulk.py \
  --region centralus \
  --sku Standard_B2s_v2 \
  --count 10 &lt;/LI-CODE&gt;
&lt;P&gt;(Optionally add the parameters &lt;STRONG&gt;--log-analytics --endpoint &amp;lt;DCE-URI&amp;gt; --rule-id &amp;lt;DCR-ID&amp;gt;&lt;/STRONG&gt; to send the result to Log Analytics.)&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Example output&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;SKU Capacity &amp;amp; Quota (Zone) Check (2025-06-20 12:49:58)


  SKU               Region      Zone   Available   Reason
 ─────────────────────────────────────────────────────────
  Standard_B2s_v2   centralus   -      ✅          -


  Supported Zones
 ─────────────────
  1, 3, 2


  Desired VMs   vCPUs/VM   Free Cores   Needs Cores   Quota OK?
 ───────────────────────────────────────────────────────────────
           10          2          100            20      ✅&lt;/LI-CODE&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Availability in this output reflects SKU eligibility, not real-time capacity.&lt;STRONG&gt;&lt;BR /&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;Run the bulk-deploy checker (zone-level check)&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;python monitor_vm_sku_capacity_bulk.py \
  --region centralus \
  --zone 2 \
  --sku Standard_B2s_v2 \
  --count 10 &lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;Example output&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;SKU Capacity &amp;amp; Quota (Zone) Check (2025-06-20 12:42:22)


  SKU               Region      Zone   Available   Reason
 ─────────────────────────────────────────────────────────
  Standard_B2s_v2   centralus   2      ✅          -


  Supported Zones
 ─────────────────
  1, 3, 2


  Desired VMs   vCPUs/VM   Free Cores   Needs Cores   Quota OK?
 ───────────────────────────────────────────────────────────────
           10          2          100            20      ✅&lt;/LI-CODE&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Availability in this output reflects SKU eligibility and zonal exposure, not real-time capacity.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H4&gt;Final Thoughts&lt;/H4&gt;
&lt;P data-start="3103" data-end="3316"&gt;This solution has proven to be a valuable asset for Azure infrastructure planning. It helps teams proactively identify SKU restrictions, understand zonal exposure, and spot changes in SKU eligibility over time.&lt;/P&gt;
&lt;P data-start="3323" data-end="3533"&gt;Used correctly, it reduces surprise deployment failures by surfacing &lt;STRONG data-start="3392" data-end="3421"&gt;where SKUs cannot be used&lt;/STRONG&gt; early, enabling better design decisions around regions, zones, and alternatives before production deployments.&lt;/P&gt;
&lt;P&gt;&lt;SPAN style="color: rgb(30, 30, 30);"&gt;Happy monitoring!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 20 Jan 2026 21:43:33 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/startups-at-microsoft/a-practical-guide-to-azure-vm-sku-eligibility-and-zonal-support/ba-p/4415773</guid>
      <dc:creator>rmmartins</dc:creator>
      <dc:date>2026-01-20T21:43:33Z</dc:date>
    </item>
  </channel>
</rss>

