load balancer
44 TopicsA Guide to Azure Data Transfer Pricing
Understanding Azure networking charges is essential for businesses aiming to manage their budgets effectively. Given the complexity of Azure networking pricing, which involves various influencing factors, the goal here is to bring a clearer understanding of the associated data transfer costs by breaking down the pricing models into the following use cases: VM to VM VM to Private Endpoint VM to Internal Standard Load Balancer (ILB) VM to Internet Hybrid connectivity Please note this is a first version, with a second version to follow that will include additional scenarios. Disclaimer: Pricing may change over time, check the public Azure pricing calculator for up-to-date pricing information. Actual pricing may vary depending on agreements, purchase dates, and currency exchange rates. Sign in to the Azure pricing calculator to see pricing based on your current program/offer with Microsoft. 1. VM to VM 1.1. VM to VM, same VNet Data transfer within the same virtual network (VNet) is free of charge. This means that traffic between VMs within the same VNet will not incur any additional costs. Doc. Data transfer across Availability Zones (AZ) is free. Doc. 1.2. VM to VM, across VNet peering Azure VNet peering enables seamless connectivity between two virtual networks, allowing resources in different VNets to communicate with each other as if they were within the same network. When data is transferred between VNets, charges apply for both ingress and egress data. Doc: VM to VM, across VNet peering, same region VM to VM, across Global VNet peering Azure regions are grouped into 3 Zones (distinct from Avaialbility Zones within a specific Azure region). The pricing for Global VNet Peering is based on that geographic structure. Data transfer between VNets in different zones incurs outbound and inbound data transfer rates for the respective zones. When data is transferred from a VNet in Zone 1 to a VNet in Zone 2, outbound data transfer rates for Zone 1 and inbound data transfer rates for Zone 2 will be applicable. Doc. 1.3. VM to VM, through Network Virtual Appliance (NVA) Data transfer through an NVA involves charges for both ingress and egress data, depending on the volume of data processed. When an NVA is in the path, such as for spoke VNet to spoke VNet connectivity via an NVA (firewall...) in the hub VNet, it incurs VM to VM pricing twice. The table above reflects only data transfer charges and does not include NVA/Azure Firewall processing costs. 2. VM to Private Endpoint (PE) Private Endpoint pricing includes charges for the provisioned resource and data transfer costs based on traffic direction. For instance, writing to a Storage Account through a Private Endpoint incurs outbound data charges, while reading incurs inbound data charges. Doc: 2.1. VM to PE, same VNet Since data transfer within a VNet is free, charges are only applied for data processing through the Private Endpoint. Cross-region traffic will incur additional costs if the Storage Account and the Private Endpoint are located in different regions. 2.2. VM to PE, across VNet peering Accessing Private Endpoints from a peered network incurs only Private Link Premium charges, with no peering fees. Doc. VM to PE, across VNet peering, same region VM to PE, across VNet peering, PE region != SA region 2.3. VM to PE, through NVA When an NVA is in the path, such as for spoke VNet to spoke VNet connectivity via a firewall in the hub VNet, it incurs VM to VM charges between the VM and the NVA. However, as per the PE pricing model, there are no charges between the NVA and the PE. The table above reflects only data transfer charges and does not include NVA/Azure Firewall processing costs. 3. VM to Internal Load Balancer (ILB) Azure Standard Load Balancer pricing is based on the number of load balancing rules as well as the volume of data processed. Doc: 3.1. VM to ILB, same VNet Data transfer within the same virtual network (VNet) is free. However, the data processed by the ILB is charged based on its volume and on the number load balancing rules implemented. Only the inbound traffic is processed by the ILB (and charged), the return traffic goes direct from the backend to the source VM (free of charge). 3.2. VM to ILB, across VNet peering In addition to the Load Balancer costs, data transfer charges between VNets apply for both ingress and egress. 3.3. VM to ILB, through NVA When an NVA is in the path, such as for spoke VNet to spoke VNet connectivity via a firewall in the hub VNet, it incurs VM to VM charges between the VM and the NVA and VM to ILB charges between the NVA and the ILB/backend resource. The table above reflects only data transfer charges and does not include NVA/Azure Firewall processing costs. 4. VM to internet 4.1. Data transfer and inter-region pricing model Bandwidth refers to data moving in and out of Azure data centers, as well as data moving between Azure data centers; other transfers are explicitly covered by the Content Delivery Network, ExpressRoute pricing, or Peering. Doc: 4.2. Routing Preference in Azure and internet egress pricing model When creating a public IP in Azure, Azure Routing Preference allows you to choose how your traffic routes between Azure and the Internet. You can select either the Microsoft Global Network or the public internet for routing your traffic. Doc: See how this choice can impact the performance and reliability of network traffic: By selecting a Routing Preference set to Microsoft network, ingress traffic enters the Microsoft network closest to the user, and egress traffic exits the network closest to the user, minimizing travel on the public internet (“Cold Potato” routing). On the contrary, setting the Routing Preference to internet, ingress traffic enters the Microsoft network closest to the hosted service region. Transit ISP networks are used to route traffic, travel on the Microsoft Global Network is minimized (“Hot Potato” routing). Bandwidth pricing for internet egress, Doc: 4.3. VM to internet, direct Data transferred out of Azure to the internet incurs charges, while data transferred into Azure is free of charge. Doc. It is important to note that default outbound access for VMs in Azure will be retired on September 30 2025, migration to an explicit outbound internet connectivity method is recommended. Doc. 4.4. VM to internet, with a public IP Here a standard public IP is explicitly associated to a VM NIC, that incurs additional costs. Like in the previous scenario, data transferred out of Azure to the internet incurs charges, while data transferred into Azure is free of charge. Doc. 4.5. VM to internet, with NAT Gateway In addition to the previous costs, data transfer through a NAT Gateway involves charges for both the data processed and the NAT Gateway itself, Doc: 5. Hybrid connectivity Hybrid connectivity involves connecting on-premises networks to Azure VNets. The pricing model includes charges for data transfer between the on-premises network and Azure, as well as any additional costs for using Network Virtual Appliances (NVAs) or Azure Firewalls in the hub VNet. 5.1. H&S Hybrid connectivity without firewall inspection in the hub For an inbound flow, from the ExpressRoute Gateway to a spoke VNet, VNet peering charges are applied once on the spoke inbound. There are no charges on the hub outbound. For an outbound flow, from a spoke VNet to an ER branch, VNet peering charges are applied once, outbound of the spoke only. There are no charges on the hub inbound. Doc. The table above does not include ExpressRoute connectivity related costs. 5.2. H&S Hybrid connectivity with firewall inspection in the hub Since traffic transits and is inspected via a firewall in the hub VNet (Azure Firewall or 3P firewall NVA), the previous concepts do not apply. “Standard” inter-VNet VM-to-VM charges apply between the FW and the destination VM : inbound and outbound on both directions. Once outbound from the source VNet (Hub or Spoke), once inbound on the destination VNet (Spoke or Hub). The table above reflects only data transfer charges within Azure and does not include NVA/Azure Firewall processing costs nor the costs related to ExpressRoute connectivity. 5.3. H&S Hybrid connectivity via a 3rd party connectivity NVA (SDWAN or IPSec) Standard inter-VNet VM-to-VM charges apply between the NVA and the destination VM: inbound and outbound on both directions, both in the Hub VNet and in the Spoke VNet. 5.4. vWAN scenarios VNet peering is charged only from the point of view of the spoke – see examples and vWAN pricing components. Next steps with cost management To optimize cost management, Azure offers tools for monitoring and analyzing network charges. Azure Cost Management and Billing allows you to track and allocate costs across various services and resources, ensuring transparency and control over your expenses. By leveraging these tools, businesses can gain a deeper understanding of their network costs and make informed decisions to optimize their Azure spending.18KViews15likes2CommentsDistribute global traffic with ultra-low latency using Azure Load Balancer
Today, we are so excited to announce the general availability of Azure cross-region Load Balancer in all Azure public and national cloud regions. Since the preview, this product has been used by so many of you, our customers, whose valuable feedback has helped further improve the product. Our Global tier of Azure Load Balancer is ready for you to use in your production workloads. It is backed by the same 99.99% availability SLA.18KViews6likes10CommentsDeploying Third-Party Firewalls in Azure Landing Zones: Design, Configuration, and Best Practices
As enterprises adopt Microsoft Azure for large-scale workloads, securing network traffic becomes a critical part of the platform foundation. Azure’s Well-Architected Framework provides the blueprint for enterprise-scale Landing Zone design and deployments, and while Azure Firewall is a built-in PaaS option, some organizations prefer third-party firewall appliances for familiarity, feature depth, and vendor alignment. This blog explains the basic design patterns, key configurations, and best practices when deploying third-party firewalls (Palo Alto, Fortinet, Check Point, etc.) as part of an Azure Landing Zone. 1. Landing Zone Architecture and Firewall Role The Azure Landing Zone is Microsoft’s recommended enterprise-scale architecture for adopting cloud at scale. It provides a standardized, modular design that organizations can use to deploy and govern workloads consistently across subscriptions and regions. At its core, the Landing Zone follows a hub-and-spoke topology: Hub (Connectivity Subscription): Central place for shared services like DNS, private endpoints, VPN/ExpressRoute gateways, Azure Firewall (or third-party firewall appliances), Bastion, and monitoring agents. Provides consistent security controls and connectivity for all workloads. Firewalls are deployed here to act as the traffic inspection and enforcement point. Spokes (Workload Subscriptions): Application workloads (e.g., SAP, web apps, data platforms) are placed in spoke VNets. Additional specialized spokes may exist for Identity, Shared Services, Security, or Management. These are isolated for governance and compliance, but all connectivity back to other workloads or on-premises routes through the hub. Traffic Flows Through Firewalls North-South Traffic: Inbound connections from the Internet (e.g., customer access to applications). Outbound connections from Azure workloads to Internet services. Hybrid connectivity to on-premises datacenters or other clouds. Routed through the external firewall set for inspection and policy enforcement. East-West Traffic: Lateral traffic between spokes (e.g., Application VNet to Database VNet). Communication across environments like Dev → Test → Prod (if allowed). Routed through an internal firewall set to apply segmentation, zero-trust principles, and prevent lateral movement of threats. Why Firewalls Matter in the Landing Zone While Azure provides NSGs (Network Security Groups) and Route Tables for basic packet filtering and routing, these are not sufficient for advanced security scenarios. Firewalls add: Deep packet inspection (DPI) and application-level filtering. Intrusion Detection/Prevention (IDS/IPS) capabilities. Centralized policy management across multiple spokes. Segmentation of workloads to reduce blast radius of potential attacks. Consistent enforcement of enterprise security baselines across hybrid and multi-cloud. Organizations May Choose Depending on security needs, cost tolerance, and operational complexity, organizations typically adopt one of two models for third party firewalls: Two sets of firewalls One set dedicated for north-south traffic (external to Azure). Another set for east-west traffic (between VNets and spokes). Provides the highest security granularity, but comes with higher cost and management overhead. Single set of firewalls A consolidated deployment where the same firewall cluster handles both east-west and north-south traffic. Simpler and more cost-effective, but may introduce complexity in routing and policy segregation. This design choice is usually made during Landing Zone design, balancing security requirements, budget, and operational maturity. 2. Why Choose Third-Party Firewalls Over Azure Firewall? While Azure Firewall provides simplicity as a managed service, customers often choose third-party solutions due to: Advanced features – Deep packet inspection, IDS/IPS, SSL decryption, threat feeds. Vendor familiarity – Network teams trained on Palo Alto, Fortinet, or Check Point. Existing contracts – Enterprise license agreements and support channels. Hybrid alignment – Same vendor firewalls across on-premises and Azure. Azure Firewall is a fully managed PaaS service, ideal for customers who want a simple, cloud-native solution without worrying about underlying infrastructure. However, many enterprises continue to choose third-party firewall appliances (Palo Alto, Fortinet, Check Point, etc.) when implementing their Landing Zones. The decision usually depends on capabilities, familiarity, and enterprise strategy. Key Reasons to Choose Third-Party Firewalls Feature Depth and Advanced Security Third-party vendors offer advanced capabilities such as: Deep Packet Inspection (DPI) for application-aware filtering. Intrusion Detection and Prevention (IDS/IPS). SSL/TLS decryption and inspection. Advanced threat feeds, malware protection, sandboxing, and botnet detection. While Azure Firewall continues to evolve, these vendors have a longer track record in advanced threat protection. Operational Familiarity and Skills Network and security teams often have years of experience managing Palo Alto, Fortinet, or Check Point appliances on-premises. Adopting the same technology in Azure reduces the learning curve and ensures faster troubleshooting, smoother operations, and reuse of existing playbooks. Integration with Existing Security Ecosystem Many organizations already use vendor-specific management platforms (e.g., Panorama for Palo Alto, FortiManager for Fortinet, or SmartConsole for Check Point). Extending the same tools into Azure allows centralized management of policies across on-premises and cloud, ensuring consistent enforcement. Compliance and Regulatory Requirements Certain industries (finance, healthcare, government) require proven, certified firewall vendors for security compliance. Customers may already have third-party solutions validated by auditors and prefer extending those to Azure for consistency. Hybrid and Multi-Cloud Alignment Many enterprises run a hybrid model, with workloads split across on-premises, Azure, AWS, or GCP. Third-party firewalls provide a common security layer across environments, simplifying multi-cloud operations and governance. Customization and Flexibility Unlike Azure Firewall, which is a managed service with limited backend visibility, third-party firewalls give admins full control over operating systems, patching, advanced routing, and custom integrations. This flexibility can be essential when supporting complex or non-standard workloads. Licensing Leverage (BYOL) Enterprises with existing enterprise agreements or volume discounts can bring their own firewall licenses (BYOL) to Azure. This often reduces cost compared to pay-as-you-go Azure Firewall pricing. When Azure Firewall Might Still Be Enough Organizations with simple security needs (basic north-south inspection, FQDN filtering). Cloud-first teams that prefer managed services with minimal infrastructure overhead. Customers who want to avoid manual scaling and VM patching that comes with IaaS appliances. In practice, many large organizations use a hybrid approach: Azure Firewall for lightweight scenarios or specific environments, and third-party firewalls for enterprise workloads that require advanced inspection, vendor alignment, and compliance certifications. 3. Deployment Models in Azure Third-party firewalls in Azure are primarily IaaS-based appliances deployed as virtual machines (VMs). Leading vendors publish Azure Marketplace images and ARM/Bicep templates, enabling rapid, repeatable deployments across multiple environments. These firewalls allow organizations to enforce advanced network security policies, perform deep packet inspection, and integrate with Azure-native services such as Virtual Network (VNet) peering, Azure Monitor, and Azure Sentinel. Note: Some vendors now also release PaaS versions of their firewalls, offering managed firewall services with simplified operations. However, for the purposes of this blog, we will focus mainly on IaaS-based firewall deployments. Common Deployment Modes Active-Active Description: In this mode, multiple firewall VMs operate simultaneously, sharing the traffic load. An Azure Load Balancer distributes inbound and outbound traffic across all active firewall instances. Use Cases: Ideal for environments requiring high throughput, resilience, and near-zero downtime, such as enterprise data centers, multi-region deployments, or mission-critical applications. Considerations: Requires careful route and policy synchronization between firewall instances to ensure consistent traffic handling. Typically involves BGP or user-defined routes (UDRs) for optimal traffic steering. Scaling is easier: additional firewall VMs can be added behind the load balancer to handle traffic spikes. Active-Passive Description: One firewall VM handles all traffic (active), while the secondary VM (passive) stands by for failover. When the active node fails, Azure service principals manage IP reassignment and traffic rerouting. Use Cases: Suitable for environments where simpler management and lower operational complexity are preferred over continuous load balancing. Considerations: Failover may result in a brief downtime, typically measured in seconds to a few minutes. Synchronization between the active and passive nodes ensures firewall policies, sessions, and configurations are mirrored. Recommended for smaller deployments or those with predictable traffic patterns. Network Interfaces (NICs) Third-party firewall VMs often include multiple NICs, each dedicated to a specific type of traffic: Untrust/Public NIC: Connects to the Internet or external networks. Handles inbound/outbound public traffic and enforces perimeter security policies. Trust/Internal NIC: Connects to private VNets or subnets. Manages internal traffic between application tiers and enforces internal segmentation. Management NIC: Dedicated to firewall management traffic. Keeps administration separate from data plane traffic, improving security and reducing performance interference. HA NIC (Active-Passive setups): Facilitates synchronization between active and passive firewall nodes, ensuring session and configuration state is maintained across failovers. This design choice is usually made during Landing Zone design, balancing security requirements, budget, and operational maturity. : NICs of Palo Alto External Firewalls and FortiGate Internal Firewalls in two sets of firewall scenario 4. Key Configuration Considerations When deploying third-party firewalls in Azure, several design and configuration elements play a critical role in ensuring security, performance, and high availability. These considerations should be carefully aligned with organizational security policies, compliance requirements, and operational practices. Routing User-Defined Routes (UDRs): Define UDRs in spoke Virtual Networks to ensure all outbound traffic flows through the firewall, enforcing inspection and security policies before reaching the Internet or other Virtual Networks. Centralized routing helps standardize controls across multiple application Virtual Networks. Depending on the architecture traffic flow design, use appropriate Load Balancer IP as the Next Hop on UDRs of spoke Virtual Networks. Symmetric Routing: Ensure traffic follows symmetric paths (i.e., outbound and inbound flows pass through the same firewall instance). Avoid asymmetric routing, which can cause stateful firewalls to drop return traffic. Leverage BGP with Azure Route Server where supported, to simplify route propagation across hub-and-spoke topologies. : Azure UDR directing all traffic from a Spoke VNET to the Firewall IP Address Policies NAT Rules: Configure DNAT (Destination NAT) rules to publish applications securely to the Internet. Use SNAT (Source NAT) to mask private IPs when workloads access external resources. Security Rules: Define granular allow/deny rules for both north-south traffic (Internet to VNet) and east-west traffic (between Virtual Networks or subnets). Ensure least privilege by only allowing required ports, protocols, and destinations. Segmentation: Apply firewall policies to separate workloads, environments, and tenants (e.g., Production vs. Development). Enforce compliance by isolating workloads subject to regulatory standards (PCI-DSS, HIPAA, GDPR). Application-Aware Policies: Many vendors support Layer 7 inspection, enabling controls based on applications, users, and content (not just IP/port). Integrate with identity providers (Azure AD, LDAP, etc.) for user-based firewall rules. : Example Configuration of NAT Rules on a Palo Alto External Firewall Load Balancers Internal Load Balancer (ILB): Use ILBs for east-west traffic inspection between Virtual Networks or subnets. Ensures that traffic between applications always passes through the firewall, regardless of origin. External Load Balancer (ELB): Use ELBs for north-south traffic, handling Internet ingress and egress. Required in Active-Active firewall clusters to distribute traffic evenly across firewall nodes. Other Configurations: Configure health probes for firewall instances to ensure faulty nodes are automatically bypassed. Validate Floating IP configuration on Load Balancing Rules according to the respective vendor recommendations. Identity Integration Azure Service Principals: In Active-Passive deployments, configure service principals to enable automated IP reassignment during failover. This ensures continuous service availability without manual intervention. Role-Based Access Control (RBAC): Integrate firewall management with Azure RBAC to control who can deploy, manage, or modify firewall configurations. SIEM Integration: Stream logs to Azure Monitor, Sentinel, or third-party SIEMs for auditing, monitoring, and incident response. Licensing Pay-As-You-Go (PAYG): Licenses are bundled into the VM cost when deploying from the Azure Marketplace. Best for short-term projects, PoCs, or variable workloads. Bring Your Own License (BYOL): Enterprises can apply existing contracts and licenses with vendors to Azure deployments. Often more cost-effective for large-scale, long-term deployments. Hybrid Licensing Models: Some vendors support license mobility from on-premises to Azure, reducing duplication of costs. 5. Common Challenges Third-party firewalls in Azure provide strong security controls, but organizations often face practical challenges in day-to-day operations: Misconfiguration Incorrect UDRs, route tables, or NAT rules can cause dropped traffic or bypassed inspection. Asymmetric routing is a frequent issue in hub-and-spoke topologies, leading to session drops in stateful firewalls. Performance Bottlenecks Firewall throughput depends on the VM SKU (CPU, memory, NIC limits). Under-sizing causes latency and packet loss, while over-sizing adds unnecessary cost. Continuous monitoring and vendor sizing guides are essential. Failover Downtime Active-Passive models introduce brief service interruptions while IPs and routes are reassigned. Some sessions may be lost even with state sync, making Active-Active more attractive for mission-critical workloads. Backup & Recovery Azure Backup doesn’t support vendor firewall OS. Configurations must be exported and stored externally (e.g., storage accounts, repos, or vendor management tools). Without proper backups, recovery from failures or misconfigurations can be slow. Azure Platform Limits on Connections Azure imposes a per-VM cap of 250,000 active connections, regardless of what the firewall vendor appliance supports. This means even if an appliance is designed for millions of sessions, it will be constrained by Azure’s networking fabric. Hitting this cap can lead to unexplained traffic drops despite available CPU/memory. The workaround is to scale out horizontally (multiple firewall VMs behind a load balancer) and carefully monitor connection distribution. 6. Best Practices for Third-Party Firewall Deployments To maximize security, reliability, and performance of third-party firewalls in Azure, organizations should follow these best practices: Deploy in Availability Zones: Place firewall instances across different Availability Zones to ensure regional resilience and minimize downtime in case of zone-level failures. Prefer Active-Active for Critical Workloads: Where zero downtime is a requirement, use Active-Active clusters behind an Azure Load Balancer. Active-Passive can be simpler but introduces failover delays. Use Dedicated Subnets for Interfaces: Separate trust, untrust, HA, and management NICs into their own subnets. This enforces segmentation, simplifies route management, and reduces misconfiguration risk. Apply Least Privilege Policies: Always start with a deny-all baseline, then allow only necessary applications, ports, and protocols. Regularly review rules to avoid policy sprawl. Standardize Naming & Tagging: Adopt consistent naming conventions and resource tags for firewalls, subnets, route tables, and policies. This aids troubleshooting, automation, and compliance reporting. Validate End-to-End Traffic Flows: Test both north-south (Internet ↔ VNet) and east-west (VNet ↔ VNet/subnet) flows after deployment. Use tools like Azure Network Watcher and vendor traffic logs to confirm inspection. Plan for Scalability: Monitor throughput, CPU, memory, and session counts to anticipate when scale-out or higher VM SKUs are needed. Some vendors support autoscaling clusters for bursty workloads. Maintain Firmware & Threat Signatures: Regularly update the firewall’s software, patches, and threat intelligence feeds to ensure protection against emerging vulnerabilities and attacks. Automate updates where possible. Conclusion Third-party firewalls remain a core building block in many enterprise Azure Landing Zones. They provide the deep security controls and operational familiarity enterprises need, while Azure provides the scalable infrastructure to host them. By following the hub-and-spoke architecture, carefully planning deployment models, and enforcing best practices for routing, redundancy, monitoring, and backup, organizations can ensure a secure and reliable network foundation in Azure.3.3KViews5likes2CommentsAzure Front Door: Resiliency Series – Part 2: Faster recovery (RTO)
In Part 1 of this blog series, we outlined our four‑pillar strategy for resiliency in Azure Front Door: configuration resiliency, data plane resiliency, tenant isolation, and accelerated Recovery Time Objective (RTO). Together, these pillars help Azure Front Door remain continuously available and resilient at global scale. Part 1 focused on the first two pillars: configuration and data plane resiliency. Our goal is to make configuration propagation safer, so incompatible changes never escape pre‑production environments. We discussed how incompatible configurations are blocked early, and how data plane resiliency ensures the system continues serving traffic from a last‑known‑good (LKG) configuration even if a bad change manages to propagate. We also introduced ‘Food Taster’, a dedicated sacrificial process running in each edge server’s data plane, that pretests every configuration change in isolation, before it ever reaches the live data plane. In this post, we turn to the recovery pillar. We describe how we have made key enhancements to the Azure Front Door recovery path so the system can return to full operation in a predictable and bounded timeframe. For a global service like Azure Front Door, serving hundreds of thousands of tenants across 210+ edge sites worldwide, we set an explicit target: to be able to recover any edge site – or all edge sites – within approximately 10 minutes, even in worst‑case scenarios. In typical data plane crash scenarios, we expect recovery in under a second. Repair status The first blog post in this series mentioned the two Azure Front Door incidents from October 2025 – learn more by watching our Azure Incident Retrospective session recordings for the October 9 th incident and/or the October 29 th incident. Before diving into our platform investments for improving our Recovery Time Objectives (RTO), we wanted to provide a quick update on the overall repair items from these incidents. We are pleased to report that the work on configuration propagation and data plane resiliency is now complete and fully deployed across the platform (in the table below, “Completed” means broadly deployed in production). With this, we have reduced configuration propagation latency from ~45 minutes to ~20 minutes. We anticipate reducing this even further – to ~15 minutes by the end of April 2026, while ensuring that platform stability remains our top priority. Learning category Goal Repairs Status Safe customer configuration deployment Incompatible configuration never propagates beyond ‘EUAP or canary regions’ Control plane and data plane defect fixes Forced synchronous configuration processing Additional stages with extended bake time Early detection of crash state Completed Data plane resiliency Configuration processing cannot impact data plane availability Manage data-plane lifecycle to prevent outages caused by configuration-processing defects. Completed Isolated work-process in every data plane server to process and load the configuration. Completed 100% Azure Front Door resiliency posture for Microsoft internal services Microsoft operates an isolated, independent Active/Active fleet with automatic failover for critical Azure services Phase 1: Onboarded critical services batch impacted on Oct 29 th outage running on a day old configuration Completed Phase 2: Automation & hardening of operations, auto-failover and self-management of Azure Front Door onboarding for additional services March 2026 Recovery improvements Data plane crash recovery in under 10 minutes Data plane boot-up time optimized via local cache (~1 hour) Completed Accelerate recovery time < 10 minutes April 2026 Tenant isolation No configuration or traffic regression can impact other tenants Micro cellular Azure Front Door with ingress layered shards June 2026 Why recovery at edge scale is deceptively hard To understand why recovery took as long as it did, it helps to first understand how the Azure Front Door data plane processes configuration. Azure Front Door operates in 210+ edge sites with multiple servers per site. The data plane of each edge server hosts multiple processes. A master process orchestrates the lifecycle of multiple worker processes, that serve customer traffic. A separate configuration translator process runs alongside the data plane processes, and is responsible for converting customer configuration bundles from the control plane into optimized binary FlatBuffer files. This translation step, covering hundreds of thousands of tenants, represents hours of cumulative computation. A per edge server cache is kept locally at each server level – to enable a fast recovery of the data plane, if needed. Once the configuration translator process produces these FlatBuffer files, each worker processes them independently and memory-maps them for zero-copy access. Configuration updates flow through a two-phase commit: new FlatBuffers are first loaded into a staging area and validated, then atomically swapped into production maps. In-flight requests continue using the old configuration, until the last request referencing them completes. The data process recovery is designed to be resilient to different failure modes. A failure or crash at the worker process level has a typical recovery time of less than one second. Since each server has multiple such worker processes which serve customer traffic, this type of crash has no impact on the data plane. In the case of a master process crash, the system automatically tries to recover using the local cache. When the local cache is reused, the system is able to recover quickly – in approximately 60 minutes – since most of the configurations in the cache were already loaded into the data plane before the crash. However, in certain cases if the cache becomes unavailable or must be invalidated because of corruption, the recovery time increases significantly. During the October 29 th incident, a data plane crash triggered a complete recovery sequence that took approximately 4.5 hours. This was not because restarting a process is slow, it is because a defect in the recovery process invalidated the local cache, which meant that “restart” meant rebuilding everything from scratch. The configuration translator process then had to re-fetch and re-translate every one of the hundreds of thousands of customer configurations, before workers could memory-map them and begin serving traffic. This experience has crystallized three fundamental learnings related to our recovery path: Expensive rework: A subset of crashes discarded all previously translated FlatBuffer artifacts, forcing the configuration translator process to repeat hours of conversion work that had already been validated and stored. High restart costs: Every worker on every node had to wait for the configuration translator process to complete the full translation, before it could memory-map any configuration and begin serving requests. Unbounded recovery time: Recovery time grew linearly with total tenant footprint rather than with active traffic, creating a ‘scale penalty’ as more tenants onboarded to the system. Separately and together, the insight was clear: recovery must stop being proportional to the total configuration size. Persisting ‘validated configurations’ across restarts One of the key recovery improvements was strengthening how validated customer configurations are cached and reused across failures, rather than rebuilding configuration states from scratch during recovery. Azure Front Door already cached customer configurations on host‑mounted storage prior to the October incident. The platform enhancements post outage focused on making the local configuration cache resilient to crashes, partial failures, and bad tenant inputs. Our goal was to ensure that recovery behavior is dominated by serving traffic safely, not by reconstructing configuration state. This led us to two explicit design goals… Design goals No category of crash should invalidate the configuration cache: Configuration cache invalidation must never be the default response to failures. Whether the failure is a worker crash, master crash, data plane restart, or coordinated recovery action, previously validated customer configurations should remain usable—unless there is a proven reason to discard it. Bad tenant configuration must not poison the entire cache: A single faulty or incompatible tenant configuration should result in targeted eviction of that tenant’s configuration only—not wholesale cache invalidation across all tenants. Platform enhancements Previously, customer configurations persisted to host‑mounted storage, but certain failure paths treated the cache as unsafe and invalidated it entirely. In those cases, recovery implicitly meant reloading and reprocessing configuration for hundreds of thousands of tenants before traffic could resume, even though the vast majority of cached data was still valid. We changed the recovery model to avoid invalidating customer configurations, with strict scoping around when and how cached entries are discarded: Cached configurations are no longer invalidated based on crash type. Failures are assumed to be orthogonal to configuration correctness unless explicitly proven otherwise. Cache eviction is granular and tenant‑scoped. If a cached configuration fails validation or load checks, only that tenant’s configuration is discarded and reloaded. All other tenant configurations remain available. This ensures that recovery does not regress into a fleet‑wide rebuild due to localized or unrelated faults. Safety and correctness Durability is paired with strong correctness controls, to prevent unsafe configurations from being served: Per‑tenant validation on load: Each cached tenant configuration is validated during the ‘load and verification’ phase, before being promoted for traffic serving. Therefore, failures are contained to that tenant. Targeted re‑translation: When validation fails, only the affected tenant’s configuration is reloaded or reprocessed. Therefore, the cache for other tenants is left untouched. Operational escape hatch: Operators retain the ability to explicitly instruct a clean rebuild of the configuration cache (with proper authorization), preserving control without compromising the default fast‑recovery path. Resulting behavior With these changes, recovery behavior now aligns with real‑world traffic patterns - configuration defects impact tenants locally and predictably, rather than globally. The system now prefers isolated tenant impact, and continued service using last-known-good over aggressive invalidation, both of which are critical for predictable recovery at the scale of Azure Front Door. Making recovery scale with active traffic, not total tenants Reusing configuration cache solves the problem of rebuilding configuration in its entirety, but even with a warm cache, the original startup path had a second bottleneck: eagerly loading a large volume of tenant configurations into memory before serving any traffic. At our scale, memory-mapping, parsing hundreds of thousands of FlatBuffers, constructing internal lookup maps, adding Transport Layer Security (TLS) certificates and configuration blocks for each tenant, collectively added almost an hour to startup time. This was the case even when a majority of those tenants had no active traffic at that moment. We addressed this by fundamentally changing when configuration is loaded into workers. Rather than eagerly loading most of the tenants at startup across all edge locations, Azure Front Door now uses an Machine Learning (ML)-optimized lazy loading model. In the new architecture, instead of loading a large number of tenant configurations, we only load a small subset of tenants that are known to be historically active in a given site, we call this the “warm tenants” list. The warm tenants list per edge site is created through a sophisticated traffic analysis pipeline that leverages ML. However, loading the warm tenants is not good enough, because when a request arrives and we don’t have the configuration in memory, we need to know two things. Firstly, is this a request from a real Azure Front Door tenant – and, if it is, where can I find the configuration? To answer these questions, each worker maintains a hostmap that tracks the state of each tenant’s configuration. This hostmap is constructed during startup, as we process each tenant configuration – if the tenant is in the warm list, we will process and load their configuration fully; if not, then we will just add an entry into the hostmap where all their domain names are mapped to the configuration path location. When a request arrives for one of these tenants, the worker loads and validates that tenant’s configuration on demand, and immediately begins serving traffic. This allows a node to start serving its busiest tenants within a few minutes of startup, while additional tenants are loaded incrementally only when traffic actually arrives—allowing the system to progressively absorb cold tenants as demand increases. The effect on recovery is transformative. Instead of recovery time scaling with the total number of tenants configured on a server, it scales with the number of tenants actively receiving traffic. In practice, even at our busiest edge sites, the active tenant set is a small fraction of the total. Just as importantly, this modified form of lazy loading provides a natural failure isolation boundary. Most Edge sites won’t ever load a faulty configuration of an inactive tenant. When a request for an inactive tenant with an incompatible configuration arrives, impact is contained to a single worker. The configuration load architecture now prefers serving as many customers as quickly as possible, rather than waiting until everything is ready before serving anyone. The above changes are slated to complete in April 2026 and will bring our RTO from the current ~1 hour to under 10 minutes – for complete recovery from a worst case scenario. Continuous validation through Game Days A critical element of our recovery confidence comes from GameDay fault-injection testing. We don’t simply design recovery mechanisms and assume they work—we break the system deliberately and observe how it responds. Since late 2025, we have conducted recurring GameDay drills that simulate the exact failure scenarios we are defending against: Food Taster crash scenarios: Injecting deliberately faulty tenant configurations, to verify that they are caught and isolated with zero impact on live traffic. In our January 2026 GameDay, the Food Taster process crashed as expected, the system halted the update within approximately 5 seconds, and no customer traffic was affected. Master process crash scenarios: Triggering master process crashes across test environments to verify that workers continue serving traffic, that the Local Config Shield engages within 10 seconds, and that the coordinated recovery tool restores full operation within the expected timeframe. Multi-region failure drills: Simulating simultaneous failures across multiple regions to validate that global Config Shield mechanisms engage correctly, and that recovery procedures scale without requiring manual per-region intervention. Fallback test drills for critical Azure services running behind Azure Front Door: In our February 2026 GameDay, we simulated the complete unavailability of Azure Front Door, and successfully validated failover for critical Azure services with no impact to traffic. These drills have both surfaced corner cases and built operational confidence. They have transformed recovery from a theoretical plan into tested, repeatable muscle memory. As we noted in an internal communication to our team: “Game day testing is a deliberate shift from assuming resilience to actively proving it—turning reliability into an observed and repeatable outcome.” Closing Part 1 of this series emphasized preventing unsafe configurations from reaching the data plane, and data plane resiliency in case an incompatible configuration reaches production. This post has shown that prevention alone is not enough—when failures do occur, recovery must be fast, predictable, and bounded. By ensuring that the FlatBuffer cache is never invalidated, by loading only active tenants, and by building safe coordinated recovery tooling, we have transformed failure handling from a fleet-wide crisis into a controlled operation. These recovery investments work in concert with the prevention mechanisms described in Part 1. Together, they ensure that the path from incident detection to full service restoration is measured in minutes, with customer traffic protected at every step. In the next post of this series, we will cover the third pillar of our resiliency strategy: tenant isolation—how micro-cellular architecture and ingress-layered sharding can reduce the blast radius of any failure to a small subset, ensuring that one customer’s configuration or traffic anomaly never becomes everyone’s problem. We deeply value our customers’ trust in Azure Front Door. We are committed to transparently sharing our progress on these resiliency investments, and to exceed expectations for safety, reliability, and operational readiness.2.3KViews4likes0CommentsAzure Networking Portfolio Consolidation
Overview Over the past decade, Azure Networking has expanded rapidly, bringing incredible tools and capabilities to help customers build, connect, and secure their cloud infrastructure. But we've also heard strong feedback: with over 40 different products, it hasn't always been easy to navigate and find the right solution. The complexity often led to confusion, slower onboarding, and missed capabilities. That's why we're excited to introduce a more focused, streamlined, and intuitive experience across Azure.com, the Azure portal, and our documentation pivoting around four core networking scenarios: Network foundations: Network foundations provide the core connectivity for your resources, using Virtual Network, Private Link, and DNS to build the foundation for your Azure network. Try it with this link: Network foundations Hybrid connectivity: Hybrid connectivity securely connects on-premises, private, and public cloud environments, enabling seamless integration, global availability, and end-to-end visibility, presenting major opportunities as organizations advance their cloud transformation. Try it with this link: Hybrid connectivity Load balancing and content delivery: Load balancing and content delivery helps you choose the right option to ensure your applications are fast, reliable, and tailored to your business needs. Try it with this link: Load balancing and content delivery Network security: Securing your environment is just as essential as building and connecting it. The Network Security hub brings together Azure Firewall, DDoS Protection, and Web Application Firewall (WAF) to provide a centralized, unified approach to cloud protection. With unified controls, it helps you manage security more efficiently and strengthen your security posture. Try it with this link: Network security This new structure makes it easier to discover the right networking services and get started with just a few clicks so you can focus more on building, and less on searching. What you’ll notice: Clearer starting points: Azure Networking is now organized around four core scenarios and twelve essential services, reflecting the most common customer needs. Additional services are presented within the context of these scenarios, helping you stay focused and find the right solution without feeling overwhelmed. Simplified choices: We’ve merged overlapping or closely related services to reduce redundancy. That means fewer, more meaningful options that are easier to evaluate and act on. Sunsetting outdated services: To reduce clutter and improve clarity, we’re sunsetting underused offerings such as white-label CDN services and China CDN. These capabilities have been rolled into newer, more robust services, so you can focus on what’s current and supported. What this means for you Faster decision-making: With clearer guidance and fewer overlapping products, it's easier to discover what you need and move forward confidently. More productive sales conversations: With this simplified approach, you’ll get more focused recommendations and less confusion among sellers. Better product experience: This update makes the Azure Networking portfolio more cohesive and consistent, helping you get started quickly, stay aligned with best practices, and unlock more value from day one. The portfolio consolidation initiative is a strategic effort to simplify and enhance the Azure Networking portfolio, ensuring better alignment with customer needs and industry best practices. By focusing on top-line services, combining related products, and retiring outdated offerings, Azure Networking aims to provide a more cohesive and efficient product experience. Azure.com Before: Our original Solution page on Azure.com was disorganized and static, displaying a small portion of services in no discernable order. After: The revised solution page is now dynamic, allowing customers to click deeper into each networking and network security category, displaying the top line services, simplifying the customer experience. Azure Portal Before: With over 40 networking services available, we know it can feel overwhelming to figure out what’s right for you and where to get started. After: To make it easier, we've introduced four streamlined networking hubs each built around a specific scenario to help you quickly identify the services that match your needs. Each offers an overview to set the stage, key services to help you get started, guidance to support decision-making, and a streamlined left-hand navigation for easy access to all services and features. Documentation For documentation, we looked at our current assets as well as created new assets that aligned with the changes in the portal experience. Like Azure.com, we found the old experiences were disorganized and not well aligned. We updated our assets to focus on our top-line networking services, and to call out the pillars. Our belief is these changes will allow our customers to more easily find the relevant and important information they need for their Azure infrastructure. Azure Network Hub Before the updates, we had a hub page organized around different categories and not well laid out. In the updated hub page, we provided relevant links for top-line services within all of the Azure networking scenarios, as well as a section linking to each scenario's hub page. Scenario Hub pages We added scenario hub pages for each of the scenarios. This provides our customers with a central hub for information about the top-line services for each scenario and how to get started. Also, we included common scenarios and use cases for each scenario, along with references for deeper learning across the Azure Architecture Center, Well Architected Framework, and Cloud Adoption Framework libraries. Scenario Overview articles We created new overview articles for each scenario. These articles were designed to provide customers with an introduction to the services included in each scenario, guidance on choosing the right solutions, and an introduction to the new portal experience. Here's the Load balancing and content delivery overview: Documentation links Azure Networking hub page: Azure networking documentation | Microsoft Learn Scenario Hub pages: Azure load balancing and content delivery | Microsoft Learn Azure network foundation documentation | Microsoft Learn Azure hybrid connectivity documentation | Microsoft Learn Azure network security documentation | Microsoft Learn Scenario Overview pages What is load balancing and content delivery? | Microsoft Learn Azure Network Foundation Services Overview | Microsoft Learn What is hybrid connectivity? | Microsoft Learn What is Azure network security? | Microsoft Lea Improving user experience is a journey and in coming months we plan to do more on this. Watch out for more blogs over the next few months for further improvements.3.2KViews4likes0CommentsUnlock enterprise AI/ML with confidence: Azure Application Gateway as your scalable AI access layer
As enterprises accelerate their adoption of generative AI and machine learning to transform operations, enhance productivity, and deliver smarter customer experiences, Microsoft Azure has emerged as a leading platform for hosting and scaling intelligent applications. With offerings like Azure OpenAI, Azure Machine Learning, and Cognitive Services, organizations are building copilots, virtual agents, recommendation engines, and advanced analytics platforms that push the boundaries of what is possible. However, scaling these applications to serve global users introduces new complexities: latency, traffic bursts, backend rate limits, quota distribution, and regional failovers must all be managed effectively to ensure seamless user experiences and resilient architectures. Azure Application Gateway: The AI access layer Azure Application Gateway plays a foundational role in enabling AI/ML at scale by acting as a high-performance Layer 7 reverse proxy—built to intelligently route, protect, and optimize traffic between clients and AI services. Hundreds of enterprise customers are already using Azure Application Gateway to efficiently manage traffic across diverse Azure-hosted AI/ML models—ensuring uptime, performance, and security at global scale. The AI delivery challenge Inferencing against AI/ML backends is more than connecting to a service. It is about doing so: Reliably: across regions, regardless of load conditions Securely: protecting access from bad actors and abusive patterns Efficiently: minimizing latency and request cost Scalable: handling bursts and high concurrency without errors Observably: with real-time insights, diagnostics, and feedback loops for proactive tuning Key features of Azure Application Gateway for AI traffic Smart request distribution: Path-based and round-robin routing across OpenAI and ML endpoints. Built-in health probes: Automatically bypass unhealthy endpoints Security enforcement: With WAF, TLS offload, and mTLS to protect sensitive AI/ML workloads Unified endpoint: Expose a single endpoint for clients; manage complexity internally. Observability: Full diagnostics, logs, and metrics for traffic and routing visibility. Smart rewrite rules: Append path, or rewrite headers per policy. Horizontal scalability: Easily scale to handle surges in demand by distributing load across multiple regions, instances, or models. SSE and real-time streaming: Optimize connection handling and buffering to enable seamless AI response streaming. Azure Web Application Firewall (WAF) Protections for AI/ML Workloads When deploying AI/ML workloads, especially those exposed via APIs, model endpoints, or interactive web apps, security is as critical as performance. A modern WAF helps protect not just the application, but also the sensitive models, training data, and inference pipelines behind it. Core Protections: SQL injection – Prevents malicious database queries targeting training datasets, metadata stores, or experiment tracking systems. Cross-site scripting (XSS) – Blocks injected scripts that could compromise AI dashboards, model monitoring tools, or annotation platforms. Malformed payloads – Stops corrupted or adversarial crafted inputs designed to break parsing logic or exploit model pre/post-processing pipelines. Bot protections – Bot Protection Rule Set detects & blocks known malicious bot patterns (credential stuffing, password spraying). Block traffic based on request body size, HTTP headers, IP addresses, or geolocation to prevent oversized payloads or region-specific attacks on model APIs. Enforce header requirements to ensure only authorized clients can access model inference or fine-tuning endpoints. Rate limiting based on IP, headers, or user agent to prevent inference overloads, cost spikes, or denial of service against AI models. By integrating these WAF protections, AI/ML workloads can be shielded from both conventional web threats and emerging AI-specific attack vectors, ensuring models remain accurate, reliable, and secure. Architecture Real-world architectures with Azure Application Gateway Industries across sectors rely on Azure Application Gateway to securely expose AI and ML workloads: Healthcare → Protecting patient-facing copilots and clinical decision support tools with HIPAA-compliant routing, private inference endpoints, and strict access control. Finance → Safeguarding trading assistants, fraud-detection APIs, and customer chatbots with enterprise WAF rules, rate limiting, and region-specific compliance. Retail & eCommerce → Defending product recommendation engines, conversational shopping copilots, and personalization APIs from scraping and automated abuse. Manufacturing & industrial IoT → Securing AI-driven quality control, predictive maintenance APIs, and digital twin interfaces with private routing and bot protection. Education → Hosting learning copilots and tutoring assistants safely behind WAF, preventing misuse while scaling access for students and researchers. Public sector & government → Enforcing FIPS-compliant TLS, private routing, and zero-trust controls for citizen services and AI-powered case management. Telecommunications & media → Protecting inference endpoints powering real-time translation, content moderation, and media recommendations at scale. Energy & utilities → Safeguarding smart grid analytics, sustainability dashboards, and AI-powered forecasting models through secure gateway routing. Advanced integrations Position Azure Application Gateway as the secure, scalable network entry point to your AI infrastructure Private-only Azure Application Gateway: Host AI endpoints entirely within virtual networks for secure internal access SSE support: Configure HTTP settings for streaming completions via Server-Sent Events Azure Application Gateway+ Azure Functions: Build adaptive policies that reroute traffic based on usage, cost, or time of day Azure Application Gateway + API management to protect OpenAI workloads What’s next: Adaptive AI gateways Microsoft is evolving Azure Application Gateway into a more intelligent, AI aware platform with capabilities such as: Auto rerouting to healthy endpoints or more cost-efficient models. Dynamic token management directly within Azure Application Gateway to optimize AI inference usage. Integrated feedback loops with Azure Monitor and Log Analytics for real-time performance tuning. The goal is to transform Azure Application Gateway from a traditional traffic manager into an adaptive inference orchestrator one that predicts failures, optimizes operational costs, and safeguards AI workloads from misuse. Conclusion Azure Application Gateway is not just a load balancer—it’s becoming a critical enabler for enterprise-grade AI delivery. Today, it delivers smart routing, security enforcement, adaptive observability, and a compliance-ready architecture, enabling organizations to scale AI confidently while safeguarding performance and cost. Looking ahead, Microsoft’s vision includes future capabilities such as quota resiliency to intelligently manage and balance AI usage limits, auto-rerouting to healthy endpoints or more cost-efficient models, dynamic token management within Azure Application Gateway to optimize inference usage, and integrated feedback loops with Azure Monitor and Log Analytics for real-time performance tuning. Together, these advancements will transform Azure Application Gateway from a traditional traffic manager into an adaptive inference orchestrator capable of anticipating failures, optimizing costs, and protecting AI workloads from misuse. If you’re building with Azure OpenAI, Machine Learning, or Cognitive Services, let Azure Application Gateway be your intelligent command center—anticipating needs, adapting in real time, and orchestrating every interaction so your AI can deliver with precision, security, and limitless scale. For more information, please visit: What is Azure Application Gateway v2? | Microsoft Learn What Is Azure Web Application Firewall on Azure Application Gateway? | Microsoft Learn Azure Application Gateway URL-based content routing overview | Microsoft Learn Using Server-sent events with Application Gateway (Preview) | Microsoft Learn AI Architecture Design - Azure Architecture Center | Microsoft Learn860Views4likes0CommentsIntroducing Copilot in Azure for Networking: Your AI-Powered Azure Networking Assistant
As cloud networking grows in complexity, managing and operating these services efficiently can be tedious and time consuming. That’s where Copilot in Azure for Networking steps in, a generative AI tool that simplifies every aspect of network management, making it easier for network administrators to stay on top of their Azure infrastructure. With Copilot, network professionals can design, deploy, and troubleshoot Azure Networking services using a streamlined, AI-powered approach. A Comprehensive Networking Assistant for Azure We’ve designed Copilot to really feel like an intuitive assistant you can talk to just like a colleague. Copilot understands networking-related questions in simple terms and responds with actionable solutions, drawing from Microsoft’s expansive networking knowledge base and the specifics of your unique Azure environment. Think of Copilot as an all-encompassing AI-Powered Azure Networking Assistant. It acts as: Your Cloud Networking Specialist by quickly answering questions about Azure networking services, providing product guidance, and configuration suggestions. Your Cloud Network Architect by helping you select the right network services, architectures, and patterns to connect, secure, and scale your workloads in Azure. Your Cloud Network Engineer by helping you diagnose and troubleshoot network connectivity issues with step-by-step guidance. One of the most powerful features of Copilot in Azure is its ability to automatically diagnose common networking issues. Misconfigurations, connectivity failures, or degraded performance? Copilot can help with step-by-step guidance to resolve these issues quickly with minimal input and assistance from the user, simply ask questions like ”Why can’t my VM connect to the internet?”. As seen above, upon the user identifying the source and destination, Copilot can automatically discover the connectivity path and analyze the state and status of all the network elements in the path to pinpoint issues such as blocked ports, unhealthy network devices, or misconfigured Network Security Groups (NSGs). Technical Deep Dive: Contextualized Responses with Real-Time Insights When users ask a question on the Azure Portal, it gets sent to the Orchestrator. This step is crucial to generating a deep semantic understanding of the user’s question, reasoning over all Azure resources, and then determining that the question requires Network-specific capabilities to be answered. Copilot then collects contextual information based on what the user is looking at and what they have access to before dispatching the question to the relevant domain-specific plugins. Those plugins then use their service-specific capabilities to answer the user’s question. Copilot may even combine information from multiple plugins to provide responses to complex questions. In the case of questions relevant to Azure Networking services, Copilot uses real-time data from sources like diagnostic APIs, user logs, Azure metrics, Azure Resource Graph etc. all while maintaining complete privacy and security and only accessing what the user can access as defined in Azure Role based Access Control (RBAC) to help generate data-driven insights that help keep your network operating smoothly and securely. This information is then used by Copilot to help answer the user’s question via a variety of techniques including but not limited to Retrieval-Augmented Generation (RAG) and grounding. To learn more about how Copilot works, including our Responsible AI commitments, see Copilot in Azure Technical Deep Dive | Microsoft Community Hub. Summary: Key Benefits, Capabilities and Sample Prompts Copilot boosts efficiency by automating routine tasks and offering targeted answers, which saves network administrators time while troubleshooting, configuring and architecting their environments. Copilot also helps organizations reduce costs by minimizing manual work and catching errors while empowering customers to resolve networking issues on their own with AI-powered insights backed by Azure expertise. Copilot is equipped with powerful skills to assist users with network product information and selection, resource inventory and topology, and troubleshooting. For product information, Copilot can answer questions about Azure Networking products by leveraging published documentation, helping users with questions like “What type of Firewall is best suited for my environment?”. It offers tailored guidance for selecting and planning network architectures, including specific services like Azure Load Balancer and Azure Firewall. This guidance also extends to resilience-related questions like “What more can I do to ensure my app gateway is resilient?” involving services such as Azure Application Gateway and Azure Traffic Manager, among others. When it comes to inventory and topology, Copilot can help with questions like “What is the data path between my VM and the internet?” by mapping network resources, visualizing topologies, and tracking traffic paths, providing users with clear topology maps and connectivity graphs. For troubleshooting questions like “Why can’t I connect to my VM from on prem?”, Copilot analyzes both the control plane and data plane, offering diagnostics at the network and individual service levels. By using on-behalf-of RBAC, Copilot maintains secure, authorized access, ensuring users interact only with resources permitted by their access level. Looking Forward: Future Enhancements This is only the first step we are taking toward bringing interactive, generative-AI powered capabilities to Azure Networking services and as it evolves over time, future releases will introduce advanced capabilities. We also acknowledge that today Copilot in preview works better with certain Azure Networking services, and we will continue to onboard more services to the capabilities we are launching today. Some of the more advanced capabilities we are working on include predictive troubleshooting where Copilot will anticipate potential issues before they impact network performance. Network optimization capabilities that suggest ways to optimize your network for better performance, resilience and reliability alongside enhanced security capabilities providing insights into network security and compliance, helping organizations meet regulatory requirements starting with the integration of Security Copilot attack investigation capabilities for Azure Firewall. Conclusion Copilot in Azure for Networking is intended to enhance the overall Azure experience and help network administrators easily manage their Azure Networking services. By combining AI-driven insights with user-friendly interfaces, it empowers networking professionals and users to plan, deploy, and operate their Azure Network. These capabilities are now in preview, see Azure networking capabilities using Microsoft Copilot in Azure (preview) | Microsoft Learn to learn more and get started.4KViews3likes2Comments