Azure Front Door: Implementing lessons learned following October outages
Abhishek Tiwari, Vice President of Engineering, Azure Networking
Amit Srivastava, Principal PM Manager, Azure Networking
Varun Chawla, Partner Director of Engineering

Introduction

Azure Front Door is Microsoft's advanced edge delivery platform, encompassing Content Delivery Network (CDN), global security, and traffic distribution in a single unified offering. By using Microsoft's extensive global edge network, Azure Front Door ensures efficient content delivery and advanced security through 210+ global and local points of presence (PoPs) strategically positioned close to both end users and applications. As the central global entry point from the internet onto customer applications, we power mission-critical customer applications as well as many of Microsoft's internal services.

We have a highly distributed, resilient architecture, which protects against failures at the server, rack, site, and even regional level. This resiliency is achieved through our intelligent traffic management layer, which monitors failures and load balances traffic at the server, rack, or edge-site level within the primary ring, supplemented by a secondary fallback ring which accepts traffic in case of primary traffic overflow or broad regional failures. We also deploy a traffic shield as a terminal safety net to ensure that in the event of a managed or unmanaged edge site going offline, end user traffic continues to flow to the next available edge site.

Like any large-scale CDN, we deploy each customer configuration across a globally distributed edge fleet, densely shared with thousands of other tenants. While this architecture enables global scale, it carries the risk that certain incompatible configurations, if not contained, can propagate broadly and quickly, resulting in a large blast radius of impact. Here we describe how the two recent service incidents impacting Azure Front Door have reinforced the need to accelerate ongoing investments in hardening our resiliency and tenant isolation strategy to mitigate the likelihood and scale of impact from this class of risk.

October incidents: recap and key learnings

Azure Front Door experienced two service incidents, on October 9th and October 29th, both with customer-impacting service degradation.

On October 9th: A manual cleanup of stuck tenant metadata bypassed our configuration protection layer, allowing incompatible metadata to propagate beyond our canary edge sites. This metadata was created on October 7th, from a control-plane defect triggered by a customer configuration change. While the protection system initially blocked the propagation, the manual override operation bypassed our safeguards. This incompatible configuration reached the next stage and activated a latent data-plane defect in a subset of edge sites, causing availability impact primarily across Europe (~6%) and Africa (~16%). You can learn more about this issue in detail at https://aka.ms/AIR/QNBQ-5W8

On October 29th: A different sequence of configuration changes across two control-plane versions produced incompatible metadata. Because the failure mode in the data plane was asynchronous, the health-check validations embedded in our protection systems all passed during the rollout. The incompatible customer configuration metadata successfully propagated globally through a staged rollout and also updated the "last known good" (LKG) snapshot. Following this global rollout, the asynchronous process in the data plane exposed another defect which caused crashes.
This impacted connectivity and DNS resolution for all applications onboarded to our platform. Extended recovery time amplified the impact on customer applications and Microsoft services. You can learn more about this issue in detail at https://aka.ms/AIR/YKYN-BWZ

We took away a number of clear and actionable lessons from these incidents, which are applicable not just to our service, but to any multi-tenant, high-density, globally distributed system.

- Configuration resiliency – Valid configuration updates should propagate safely, consistently, and predictably across our global edge, while ensuring that incompatible or erroneous configurations never propagate beyond canary environments.
- Data plane resiliency – Additionally, configuration processing in the data plane must not cause availability impact to any customer.
- Tenant isolation – Traditional isolation techniques such as hardware partitioning and virtualization are impractical at edge sites. This requires innovative sharding techniques to ensure single tenant-level isolation – a must-have to reduce potential blast radius.
- Accelerated and automated recovery time objective (RTO) – The system should be able to automatically revert to the last known good configuration within an acceptable RTO. For a service like Azure Front Door, we deem ~10 minutes to be a practical RTO for our hundreds of thousands of customers at every edge site.

Post outage, given the severity of impact which allowed an incompatible configuration to propagate globally, we made the difficult decision to temporarily block configuration changes in order to expedite the rollout of additional safeguards. Between October 29th and November 5th, we prioritized and deployed immediate hardening steps before re-enabling configuration changes. We are confident that the system is stable, and we are continuing to invest in additional safeguards to further strengthen the platform's resiliency.

| Learning category | Goal | Repairs | Status |
|---|---|---|---|
| Safe customer configuration deployment | Incompatible configuration never propagates beyond Canary | Control plane and data plane defect fixes; forced synchronous configuration processing; additional stages with extended bake time; early detection of crash state | Completed |
| Data plane resiliency | Configuration processing cannot impact data plane availability | Manage data-plane lifecycle to prevent outages caused by configuration-processing defects | Completed |
| | | Isolated work-process in every data plane server to process and load the configuration | January 2026 |
| 100% Azure Front Door resiliency posture for Microsoft internal services | Microsoft operates an isolated, independent Active/Active fleet with automatic failover for critical Azure services | Phase 1: Onboarded the batch of critical services impacted in the Oct 29th outage, running on a day-old configuration | Completed |
| | | Phase 2: Automation & hardening of operations, auto-failover and self-management of Azure Front Door onboarding for additional services | March 2026 |
| Recovery improvements | Data plane crash recovery in under 10 minutes | Data plane boot-up time optimized via local cache (~1 hour) | Completed |
| | | Accelerate recovery time to < 10 minutes | March 2026 |
| Tenant isolation | No configuration or traffic regression can impact other tenants | Micro cellular Azure Front Door with ingress layered shards | June 2026 |

This blog is the first in a multi-part series on Azure Front Door resiliency. In this blog, we will focus on configuration resiliency—how we are making the configuration pipeline safer and more robust.
Subsequent blogs will cover tenant isolation and recovery improvements.

How our configuration propagation works

Azure Front Door configuration changes can be broadly classified into three distinct categories.

- Service code & data – These include all aspects of the Azure Front Door service, such as the management plane, control plane, data plane, and configuration propagation system. Azure Front Door follows a safe deployment practice (SDP) process to roll out newer versions of the management, control, or data plane over a period of approximately 2-3 weeks. This ensures that any regression in software does not have a global impact. However, latent bugs that escape pre-validation and SDP rollout can remain undetected until a specific combination of customer traffic patterns or configuration changes triggers the issue.
- Web Application Firewall (WAF) & L7 DDoS platform data – These datasets are used by Azure Front Door to deliver security and load-balancing capabilities. Examples include GeoIP data, malicious attack signatures, and IP reputation signatures. Updates to these datasets occur daily through multiple SDP stages with an extended bake time of over 12 hours to minimize the risk of global impact during rollout. This dataset is shared across all customers and the platform, and it is validated immediately since it does not depend on variations in customer traffic or configuration steps.
- Customer configuration data – Examples are any customer configuration change—whether a routing rule update, backend pool modification, WAF rule change, or security policy change. Due to the nature of these changes, it is expected across the edge delivery / CDN industry to propagate them globally in 5-10 minutes. Both outages stemmed from issues within this category.

All configuration changes, including customer configuration data, are processed through a multi-stage pipeline designed to ensure correctness before global rollout across Azure Front Door's 200+ edge locations. At a high level, Azure Front Door's configuration propagation system has two distinct components:

- Control plane – Accepts customer API/portal changes (create/update/delete for profiles, routes, WAF policies, origins, etc.) and translates them into internal configuration metadata which the data plane can understand.
- Data plane – Globally distributed edge servers that terminate client traffic, apply routing/WAF logic, and proxy to origins using the configuration produced by the control plane.

Between these two halves sits a multi-stage configuration rollout pipeline with a dedicated protection system (known as ConfigShield):

- Changes flow through multiple stages (pre-canary, canary, expanding waves to production) rather than going global at once.
- Each stage is health-gated: the data plane must remain within strict error and latency thresholds before proceeding. Each stage's health check also rechecks the previous stages' health for any regressions.
- A successfully completed rollout updates a last known good (LKG) snapshot used for automated rollback.

Historically, rollout targeted global completion in roughly 5–10 minutes, in line with industry standards.
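To make the staged, health-gated flow concrete, below is a minimal conceptual sketch of such a pipeline. This is not Azure Front Door's implementation: the stage names, thresholds, and the `probe`/`apply`/`rollback_to_lkg`/`update_lkg` callables are illustrative placeholders.

```python
from dataclasses import dataclass

STAGES = ["pre-canary", "canary", "wave-1", "wave-2", "global"]  # illustrative stage names

@dataclass
class HealthSample:
    error_rate: float     # fraction of failed requests observed at a stage
    p99_latency_ms: float

def healthy(sample: HealthSample) -> bool:
    # Illustrative thresholds; real gates use stricter, service-specific signals.
    return sample.error_rate < 0.001 and sample.p99_latency_ms < 200

def roll_out(config, probe, apply, rollback_to_lkg, update_lkg):
    """Push `config` stage by stage; any unhealthy gate stops the rollout and reverts."""
    completed = []
    for stage in STAGES:
        apply(stage, config)
        # Health-gate the current stage *and* re-check earlier stages for regressions.
        for s in completed + [stage]:
            if not healthy(probe(s)):
                rollback_to_lkg(completed + [stage])
                raise RuntimeError(f"rollout halted: {s} unhealthy while promoting to {stage}")
        completed.append(stage)
    update_lkg(config)  # only a fully healthy rollout becomes the new last known good
```

The key property is that the LKG snapshot is only advanced after every stage, including previously completed ones, has passed its health gate.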
Customer configuration processing in Azure Front Door data plane stack

Customer configuration changes in Azure Front Door traverse multiple layers—from the control plane through the deployment system—before being converted into FlatBuffers at each Azure Front Door node. These FlatBuffers are then loaded by the Azure Front Door data plane stack, which runs as Kubernetes pods on every node.

FlatBuffer composition: Each FlatBuffer references several sub-resources such as WAF and Rules Engine schematic files, SSL certificate objects, and URL signing secrets.

Data plane architecture:

- Master process: Accepts configuration changes (memory-mapped files with references) and manages the lifecycle of worker processes.
- Workers: L7 proxy processes that serve customer traffic using the applied configuration.

Processing flow for each configuration update:

- Load and apply in master: The transformed configuration is loaded and applied in the master process. Cleanup of unused references occurs synchronously except for certain categories → the October 9 outage occurred during this step due to a crash triggered by incompatible metadata.
- Apply to workers: Configuration is applied to all worker processes without memory overhead (FlatBuffers are memory-mapped).
- Serve traffic: Workers start consuming new FlatBuffers for new requests; in-flight requests continue using old buffers. Old buffers are queued for cleanup post-completion.
- Feedback to deployment service: Positive feedback signals readiness for rollout.
- Cleanup: FlatBuffers are freed asynchronously by the master process after all workers load updates → the October 29 outage occurred during this step due to a latent bug in reference counting logic.

The October incidents showed we needed to strengthen key aspects of configuration validation, propagation safeguards, and runtime behavior. During the Azure Front Door incident on October 9th, that protection system worked as intended but was later bypassed by our engineering team during a manual cleanup operation. During the Azure Front Door incident on October 29th, the incompatible customer configuration metadata progressed through the protection system, before the delayed asynchronous processing task resulted in the crash.

Configuration propagation safeguards

Based on learnings from the incidents, we are implementing a comprehensive set of configuration resiliency improvements. These changes aim to guarantee that any sequence of configuration changes cannot trigger instability in the data plane, and to ensure quicker recovery in the event of anomalies.

Strengthening configuration generation safety

This improvement pivots on a 'shift-left' strategy where we want to ensure that we catch regressions early, before they propagate to production. It also includes fixing the latent defects which were the proximate cause of the outage.

- Fixing outage-specific defects – We have fixed the control-plane defects that could generate incompatible tenant metadata under specific operation sequences. We have also remediated the associated data-plane defects.
- Stronger cross-version validation – We are expanding our test and validation suite to account for changes across multiple control plane build versions. This is expected to be fully completed by February 2026.
- Fuzz testing – Automated fuzzing and testing of the metadata generation contract between the control plane and the data plane. This allows us to generate an expanded set of invalid/unexpected configuration combinations which might not be achievable by traditional test cases alone. This is expected to be fully completed by February 2026.

Preventing incompatible configurations from being propagated

This segment of the resiliency strategy strives to ensure that a potentially dangerous configuration change never propagates beyond the canary stage.
- Protection system is "always-on" – Enhancements to operational procedures and tooling prevent bypass in all scenarios (including internal cleanup/maintenance), and any cleanup must flow through the same guarded stages and health checks as standard configuration changes. This is completed.
- Making rollout behavior more predictable and conservative – Configuration processing in the data plane is now fully synchronous. Every data plane issue due to incompatible metadata can be detected within 10 seconds at every stage. This is completed.
- Enhancement to the deployment pipeline – Additional stages during rollout and extended bake time between stages serve as an additional safeguard during configuration propagation. This is completed.
- Recovery tool improvements now make it easier to revert to any previous version of the LKG with a single click. This is completed.

These changes significantly improve system safety. Post-outage, we have increased the configuration propagation time to approximately 45 minutes. We are working towards reducing configuration propagation time closer to pre-incident levels once the additional safeguards covered in the Data plane resiliency section below are completed by mid-January 2026.

Data plane resiliency

Data plane recovery was the toughest part of the recovery efforts during the October incidents. We must ensure fast recovery as well as resilience to configuration-processing-related issues for the data plane. To address this, we implemented changes that decouple the data plane from incompatible configuration changes. With these enhancements, the data plane continues operating on the last known good configuration—even if the configuration pipeline safeguards fail to protect as intended.

Decoupling the data plane from configuration changes

Each server's data plane consists of a master process which accepts configuration changes and manages the lifecycle of multiple worker processes which serve customer traffic. One of the critical reasons for the prolonged outage in October was that, due to latent defects in the data plane, the master process crashed when presented with a bad configuration. The master is a critical command-and-control process, and when it crashes it takes down the entire data plane on that node. Recovery of the master process involves reloading hundreds of thousands of configurations from scratch and took approximately 4.5 hours.

We have since made changes to the system to ensure that even in the event of a master process crash for any reason—including incompatible configuration data being presented—the workers remain healthy and able to serve traffic. During such an event, the workers would not be able to accept new configuration changes but will continue to serve customer traffic using the last known good configuration. This work is completed.

Introducing Food Taster: strengthening config propagation resiliency

In our efforts to further strengthen Azure Front Door's configuration propagation system, we are introducing an additional configuration safeguard known internally as Food Taster, which protects the master and worker processes from any configuration-change-related incidents, thereby ensuring data plane resiliency. The principle is simple: every data-plane server will have a redundant and isolated process – the Food Taster – whose only job is to ingest and process new configuration metadata first and then pass validated configuration changes to the active data plane. This redundant worker does not accept any customer traffic. All configuration processing in this Food Taster is fully synchronous. That means we do all parsing, validation, and any expensive or risky work up front, and we do not move on until the Food Taster has either proven the configuration is safe or rejected it. Only when the Food Taster successfully loads the configuration and returns "Config OK" does the master process proceed to load the same config and then instruct the worker processes to do the same. If anything goes wrong in the Food Taster, the failure is contained to that isolated worker; the master and traffic-serving workers never see that invalid configuration.
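The pattern can be illustrated with a small sketch: validation runs in a disposable, isolated process, and only a configuration that survives it is handed to the live data plane. This is a simplified illustration, not Azure Front Door's code; `parse_and_validate` and the apply callback stand in for the real FlatBuffer parsing and master/worker load steps.

```python
import multiprocessing as mp

def parse_and_validate(blob: bytes) -> None:
    """Stand-in for the real parser/validator; raises on incompatible metadata."""
    if b"bad" in blob:
        raise ValueError("incompatible configuration metadata")

def _taste(blob: bytes, verdict) -> None:
    # Runs in an isolated process: all parsing/validation happens synchronously here,
    # so a crash or rejection is contained and never reaches traffic-serving workers.
    try:
        parse_and_validate(blob)
        verdict.put("config-ok")
    except Exception as exc:
        verdict.put(f"config-rejected: {exc}")

def propose_config(blob: bytes, apply_to_master_and_workers, timeout_s: float = 10.0) -> bool:
    verdict = mp.Queue()
    taster = mp.Process(target=_taste, args=(blob, verdict), daemon=True)
    taster.start()
    taster.join(timeout_s)
    if taster.is_alive() or taster.exitcode != 0 or verdict.empty():
        if taster.is_alive():
            taster.kill()  # hung taster: reject and keep serving on the last known good config
        return False
    if verdict.get() != "config-ok":
        return False
    apply_to_master_and_workers(blob)  # only a proven-safe config reaches the live data plane
    return True

if __name__ == "__main__":
    print(propose_config(b"good-config", lambda b: print("applied", b)))  # True
    print(propose_config(b"bad-config", lambda b: print("applied", b)))   # False, nothing applied
```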
We expect this safeguard to reach production globally in the January 2026 timeframe. Introduction of this component will also allow us to return closer to pre-incident configuration propagation times while ensuring data plane safety.

Closing

This is the first in a series of planned blogs on Azure Front Door resiliency enhancements. We are continuously improving platform safety and reliability and will transparently share updates through this series. Upcoming posts will cover advancements in tenant isolation and improvements to recovery time objectives (RTO).

We deeply value our customers' trust in Azure Front Door. The October incidents reinforced how critical configuration resiliency is, and we are committed to exceeding industry expectations for safety, reliability, and transparency. By hardening our configuration pipeline, strengthening safety gates, and reinforcing isolation boundaries, we're making Azure Front Door even more resilient so your applications can be too.
Azure Networking 2025: Powering cloud innovation and AI at global scale

In 2025, Azure's networking platform proved itself as the invisible engine driving the cloud's most transformative innovations. Consider the construction of Microsoft's new Fairwater AI datacenter in Wisconsin – a 315-acre campus housing hundreds of thousands of GPUs. To operate as one giant AI supercomputer, Fairwater required a single flat, ultra-fast network interconnecting every GPU. Azure's networking team delivered: the facility's network fabric links GPUs at 800 Gbps speeds in a non-blocking architecture, enabling 10× the performance of the world's fastest supercomputer. This feat showcases how fundamental networking is to cloud innovation. Whether it's uniting massive AI clusters or connecting millions of everyday users, Azure's globally distributed network is the foundation upon which new breakthroughs are built.

In 2025, the surge of AI workloads, data-driven applications, and hybrid cloud adoption put unprecedented demands on this foundation. We responded with bold network investments and innovations. Each new networking feature delivered in 2025, from smarter routing to faster gateways, was not just a technical upgrade but an innovation enabling customers to achieve more. This post recaps the year's major releases across Azure Networking services and highlights how AI both drives and benefits from these advancements.

Unprecedented connectivity for a hybrid and AI era

Hybrid connectivity at scale: Azure's network enhancements in 2025 focused on making global and hybrid connectivity faster, simpler, and ready for the next wave of AI-driven traffic. For enterprises extending on-premises infrastructure to Azure, Azure ExpressRoute private connectivity saw a major leap in capacity: Microsoft announced support for 400 Gbps ExpressRoute Direct ports (available in 2026) to meet the needs of AI supercomputing and massive data volumes. These high-speed ports – which can be aggregated into multi-terabit links – ensure that even the largest enterprises or HPC clusters can transfer data to Azure with dedicated, low-latency links. In parallel, Azure VPN Gateway performance reached new highs, with a generally available upgrade that delivers up to 20 Gbps aggregate throughput per gateway and 5 Gbps per individual tunnel. This is a 3× increase over previous limits, enabling branch offices and remote sites to connect to Azure even more seamlessly without bandwidth bottlenecks. Together, the ExpressRoute and VPN improvements give customers a spectrum of high-performance options for hybrid networking – from offices and datacenters to the cloud – supporting scenarios like large-scale data migrations, resilient multi-site architectures, and hybrid AI processing.

Simplified global networking: Azure Virtual WAN (vWAN) continued to mature as the one-stop solution for managing global connectivity. Virtual WAN introduced forced tunneling for Secure Virtual Hubs (now in preview), which allows organizations to route all Internet-bound traffic from branch offices or virtual networks back to a central hub for inspection. This capability simplifies the implementation of a "backhaul to hub" security model – for example, forcing branches to use a central firewall or security appliance – without complex user-defined routing.

Empowering multicloud and NVA integration: Azure recognizes that enterprise networks are diverse. Azure Route Server improvements enhanced interoperability with customer equipment and third-party network virtual appliances (NVAs).
Notably, Azure Route Server now supports up to 500 virtual network connections (spokes) per route server, a significant scale boost that enables larger hub-and-spoke topologies and simplified Border Gateway Protocol (BGP) route exchange even in very large environments. This helps customers using SD-WAN appliances or custom firewalls in Azure to seamlessly learn routes from hundreds of VNet spokes – maintaining central routing control without manual configuration. Additionally, Azure Route Server introduced a preview of hub routing preference, giving admins the ability to influence BGP route selection (for example, preferring ExpressRoute over a VPN path, or vice versa). This fine-grained control means hybrid networks can be tuned for optimal performance and cost.

Resilience and reliability by design

Azure's growth has been underpinned by making the network "resilient by default." We shipped tools to help validate and improve network resiliency. ExpressRoute Resiliency Insights was released for general availability – delivering an intelligent assessment of an enterprise's ExpressRoute setup. This feature evaluates how well your ExpressRoute circuits and gateways are architected for high availability (for example, using dual circuits in diverse locations, zone-redundant gateways, etc.) and assigns a resiliency index score as a percentage. It will highlight suboptimal configurations – such as routes advertised on only one circuit, or a gateway that isn't zone-redundant – and provide recommendations for improvement. Moreover, Resiliency Insights includes a failover simulation tool that can test circuit redundancy by mimicking failures, so you can verify that your connections will survive real-world incidents. By proactively monitoring and testing resilience, Azure is helping customers achieve "always-on" connectivity even in the face of fiber cuts, hardware faults, or other disruptions.

Security, governance, and trust in the network

As enterprises entrust more core business to Azure, the platform's networking services advanced on security and governance – helping customers achieve Zero Trust networks and high compliance with minimal complexity. Azure DNS now offers DNS Security Policies with Threat Intelligence feeds (GA). This capability allows organizations to protect their DNS queries from known malicious domains by leveraging continuously updated threat intel. For example, if a known phishing domain or C2 (command-and-control) hostname appears in DNS queries from your environment, Azure DNS can automatically block or redirect those requests. Because DNS is often the first line of detection for malware and phishing activities, this built-in filtering provides a powerful layer of defense that's fully managed by Azure. It's essentially a cloud-delivered DNS firewall using Microsoft's vast threat intelligence – enabling all Azure customers to benefit from enterprise-grade security without deploying additional appliances.

Network traffic governance was another focus. The introduction of forced tunneling in Azure Virtual WAN hubs (preview), shared above, is a prime example of where networking meets security compliance.

Optimizing cloud-native and edge networks

We previewed DNS intelligent traffic control features – such as filtering DNS queries to prevent data exfiltration and applying flexible recursion policies – which complement the DNS Security offering in safeguarding name resolution.
Meanwhile, for load balancing across regions, Azure Traffic Manager's behind-the-scenes upgrades (as noted earlier) improved reliability, and it's evolving to integrate with modern container-based apps and edge scenarios.

AI-powered networking: Both enabling and enabled by AI

We are infusing AI into networking to make management and troubleshooting more intelligent. Networking functionality in Azure Copilot accelerates tasks like never before: it outlines best practices instantly, and troubleshooting that once required combing through docs and logs can now be conversational. It effectively democratizes networking expertise, helping even smaller IT teams manage sophisticated networks by leveraging AI recommendations.

The future of cloud networking in an AI world

As we close out 2025, one message is clear: networking is strategic. The network is no longer a static utility – it is the adaptive circulatory system of the cloud, determining how far and fast customers can go. By delivering higher speeds, greater reliability, tighter security, and easier management, Azure Networking has empowered businesses to connect everything to anything, anywhere – securely and at scale. These advances unlock new scenarios: global supply chains running in real time over a trusted network, multi-player AR/VR and gaming experiences delivered without lag, and AI models trained across continents. Looking ahead, AI-powered networking will become the norm. The convergence of AI and network tech means we will see more self-optimizing networks that can heal, defend, and tune themselves with minimal human intervention.
Network Detection and Response (NDR) in Financial Services

New PCI DSS v4.0.1 requirements heighten the need for automated monitoring and analysis of security logs. Network Detection and Response solutions fulfill these mandates by providing 24/7 network traffic inspection and real-time alerting on suspicious activities. Azure's native tools (Azure vTAP for full packets, VNET Flow Logs for all flows) capture rich network data and integrate with advanced NDR analytics from partners. This combination detects intrusions (satisfying IDS requirements under Requirement 11), validates network segmentation (for scope reduction under Req. 1), and feeds alerts into Microsoft Sentinel for rapid response (fulfilling incident response obligations in Req. 12). The result is a cloud architecture that not only meets PCI DSS controls but actively strengthens security.
Introducing eBPF Host Routing: High performance AI networking with Azure CNI powered by Cilium

AI-driven applications demand low-latency workloads for optimal user experience. To meet this need, services are moving to containerized environments, with Kubernetes as the standard. Kubernetes networking relies on the Container Network Interface (CNI) for pod connectivity and routing. Traditional CNI implementations use iptables for packet processing, adding latency and reducing throughput. Azure CNI powered by Cilium natively integrates the Azure Kubernetes Service (AKS) data plane with Azure CNI networking modes for superior performance, hardware offload support, and enterprise-grade reliability. Azure CNI powered by Cilium delivers up to 30% higher throughput in both benchmark and real-world customer tests compared to a bring-your-own Cilium setup on AKS.

The next leap forward: Now, AKS data plane performance can be optimized even further with eBPF host routing, an open-source Cilium CNI capability that accelerates packet forwarding by executing routing logic directly in eBPF. As shown in the figure, this architecture eliminates reliance on iptables and connection tracking (conntrack) within the host network namespace. The result is significantly improved packet-processing efficiency, reduced CPU overhead, and optimized performance for modern workloads.

Figure: Comparison of host routing using the Linux kernel stack vs eBPF

Azure CNI powered by Cilium is battle-tested for mission-critical workloads, backed by Microsoft support, and enriched with Advanced Container Networking Services features for security, observability, and accelerated performance. eBPF host routing is now included as part of the Advanced Container Networking Services suite, delivering network performance acceleration. In this blog, we highlight the performance benefits of eBPF host routing, explain how to enable it in an AKS cluster, and provide a deep dive into its implementation on Azure. We start by examining AKS cluster performance before and after enabling eBPF host routing.

Performance comparison

Our comparative benchmarks measure the difference in Azure CNI Powered by Cilium performance with and without eBPF host routing enabled. To perform these measurements, we use AKS clusters on Kubernetes version 1.33, with host nodes of 16 cores, running Ubuntu 24.04. We are interested in throughput and latency numbers for pod-to-pod traffic in these clusters. For throughput measurements, we deploy netperf client and server pods, and measure TCP_STREAM throughput at varying message sizes in tests running 20 seconds each. The wide range of message sizes is meant to capture the variety of workloads running on AKS clusters, ranging from AI training and inference to messaging systems and media streaming. For latency, we run TCP_RR tests, measuring latency at various percentiles, as well as transaction rates.

The following figure compares pods on the same node; eBPF-based routing results in a dramatic improvement in throughput (~30%). This is because, on the same node, the throughput is not constrained by factors such as the VM NIC limits and is almost entirely determined by host routing performance. For pod-to-pod throughput across different nodes in the cluster, eBPF host routing results in better throughput, and the difference is more pronounced with smaller message sizes (3x more). This is because, with smaller messages, the per-message overhead incurred in the host network stack has a bigger impact on performance.

Next, we compare latency for pod-to-pod traffic. We limit this benchmark to intra-node traffic, because cross-node traffic latency is determined by factors other than the routing latency incurred in the hosts. eBPF host routing results in reduced latency compared to the non-accelerated configuration at all measured percentiles. We have also measured the transaction rate between client and server pods, with and without eBPF host routing. This benchmark is an alternative measurement of latency because a transaction is essentially a small TCP request/response pair. We observe that eBPF host routing improves transactions per second by around 27% as compared to legacy host routing.

| Azure CNI configuration | Transactions/second (same node) |
|---|---|
| eBPF host routing | 20396.9 |
| Traditional host routing | 16003.7 |
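As a rough illustration of the methodology, the following sketch drives the same kind of netperf TCP_STREAM runs from a client pod against a server pod. The pod names, the default namespace, and the assumption that `netserver` is already running in the server pod are all illustrative.

```python
import json
import subprocess

# Illustrative pod names; assumes netserver is already running in the server pod.
CLIENT_POD = "netperf-client"
SERVER_POD = "netperf-server"
MESSAGE_SIZES = [1024, 8192, 65536, 262144]   # bytes, spanning small RPCs to bulk transfer

def pod_ip(pod: str) -> str:
    out = subprocess.run(
        ["kubectl", "get", "pod", pod, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["status"]["podIP"]

def tcp_stream(server_ip: str, msg_size: int, seconds: int = 20) -> str:
    """Run a 20-second TCP_STREAM test from the client pod, as in the benchmarks above."""
    cmd = ["kubectl", "exec", CLIENT_POD, "--",
           "netperf", "-H", server_ip, "-t", "TCP_STREAM", "-l", str(seconds),
           "--", "-m", str(msg_size)]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

if __name__ == "__main__":
    ip = pod_ip(SERVER_POD)
    for size in MESSAGE_SIZES:
        print(f"--- TCP_STREAM, message size {size} bytes ---")
        print(tcp_stream(ip, size))
```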
Enabling eBPF routing through Advanced Container Networking Services

eBPF host routing is disabled by default in Advanced Container Networking Services because bypassing iptables in the host network namespace can ignore custom user rules and host-level security policies. This may lead to visible failures such as dropped traffic or broken network policies, as well as silent issues like unintended access or missed audit logs. To mitigate these risks, eBPF host routing is offered as an opt-in feature, enabled through Advanced Container Networking Services on Azure CNI powered by Cilium.

The Advanced Container Networking Services advantage: Built-in safeguards: Enabling eBPF host routing in ACNS enhances the open-source offering with strong built-in safeguards. Before activation, ACNS validates existing iptables rules in the host network namespace and blocks enablement if user-defined rules are detected. Once enabled, kernel-level protections prevent new iptables rules and generate Kubernetes events for visibility. These measures allow customers to benefit from eBPF's performance gains while maintaining security and reliability. Thanks to the additional safeguards, eBPF host routing in Advanced Container Networking Services is a safer and more robust option for customers who wish to obtain the best possible networking performance on their Kubernetes infrastructure.

How to enable eBPF Host Routing with ACNS

Visit the documentation on how to enable eBPF Host Routing for new and existing Azure CNI Powered by Cilium clusters. Verify the network profile with the new performance `accelerationMode` field set to `BpfVeth`:

```json
"networkProfile": {
  "advancedNetworking": {
    "enabled": true,
    "performance": {
      "accelerationMode": "BpfVeth"
    },
    …
```
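Once enabled, the field can also be checked from a terminal. Below is a minimal sketch using the Azure CLI through Python's `subprocess`; the resource group and cluster names are placeholders, and it assumes the returned JSON follows the shape shown above.

```python
import json
import subprocess

RESOURCE_GROUP = "my-rg"          # placeholder
CLUSTER_NAME = "my-aks-cluster"   # placeholder

out = subprocess.run(
    ["az", "aks", "show",
     "--resource-group", RESOURCE_GROUP,
     "--name", CLUSTER_NAME,
     "--query", "networkProfile.advancedNetworking",
     "--output", "json"],
    check=True, capture_output=True, text=True,
).stdout

advanced = json.loads(out) or {}
mode = (advanced.get("performance") or {}).get("accelerationMode")
print("eBPF host routing enabled" if mode == "BpfVeth" else f"accelerationMode = {mode!r}")
```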
For more information on Advanced Container Networking Services and ACNS Performance, please visit https://aka.ms/acnsperformance.

Resources

- For more info about Advanced Container Networking Services please visit (Container Network Security with Advanced Container Networking Services (ACNS) - Azure Kubernetes Service | Microsoft Learn).
- For more info about Azure CNI Powered by Cilium please visit (Configure Azure CNI Powered by Cilium in Azure Kubernetes Service (AKS) - Azure Kubernetes Service | Microsoft Learn).

Simplify container network metrics filtering in Azure Container Networking Services for AKS

We're excited to announce that container network metrics filtering in Azure Container Networking Services for Azure Kubernetes Service (AKS) is now in public preview! This capability transforms how you manage network observability in Kubernetes clusters by giving you control over what metrics matter most.

Why excessive metrics are a problem (and how we're fixing it)

In today's large-scale, microservices-driven environments, teams often face metrics bloat, collecting far more data than they need. The result?

- High storage & ingestion costs: Paying for data you'll never use.
- Cluttered dashboards: Hunting for critical latency spikes in a sea of irrelevant pod restarts.
- Operational overhead: Slower queries, higher maintenance, and fatigue.

Our new filtering capability solves this by letting you define precise filters at the pod level using standard Kubernetes custom resources. You collect only what matters, before it ever reaches your monitoring stack.

Key Benefits: Signal Over Noise

| Benefit | Your Gain |
|---|---|
| Fine-grained control | Filter by namespace or pod label. Target critical services and ignore noise. |
| Cost optimization | Reduce ingestion costs for Prometheus, Grafana, and other tools. |
| Improved observability | Cleaner dashboards and faster troubleshooting with relevant metrics only. |
| Dynamic & zero-downtime | Apply or update filters without restarting Cilium agents or Prometheus. |

How it works: Filtering at the source

Unlike traditional sampling or post-processing, filtering happens at the Cilium agent level—inside the kernel's data plane. You define filters using the ContainerNetworkMetric custom resource to include or exclude metrics such as:

- DNS lookups
- TCP connection metrics
- Flow metrics
- Drop (error) metrics

This reduces data volume before metrics leave the host, ensuring your observability tools receive only curated, high-value data.

Example: Filtering flow metrics to reduce noise

Here's a sample ContainerNetworkMetric CRD that filters only dropped flows from traffic/http pods and excludes flows from traffic/fortio pods:

```yaml
apiVersion: acn.azure.com/v1alpha1
kind: ContainerNetworkMetric
metadata:
  name: container-network-metric
spec:
  filters:
    - metric: flow
      includeFilters:
        # Include only DROPPED flows from traffic namespace
        verdict:
          - "dropped"
        from:
          namespacedPod:
            - "traffic/http"
      excludeFilters:
        # Exclude traffic/fortio flows to reduce noise
        from:
          namespacedPod:
            - "traffic/fortio"
```

Before filtering:

After applying filters:

Getting started today

Ready to simplify your network observability?

1. Enable Advanced Container Networking Services: Make sure Advanced Container Networking Services is enabled on your AKS cluster.
2. Define your filter: Apply the ContainerNetworkMetric CRD with your include/exclude rules.
3. Validate: Check your settings via ConfigMap and Cilium agent logs (see the sketch below).
4. See the impact: Watch ingestion costs drop and dashboards become clearer!
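As a rough sketch of the Validate step, the following applies the resource above and scans recent Cilium agent logs. The CRD file name, the resource name used with `kubectl get`, the `k8s-app=cilium` label selector, and the keywords to grep for are assumptions that may differ in your cluster.

```python
import subprocess

def sh(args: list) -> str:
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout

# Apply the ContainerNetworkMetric resource defined above (file name is illustrative).
print(sh(["kubectl", "apply", "-f", "container-network-metric.yaml"]))

# Confirm the resource was accepted (resource name for the CRD is an assumption).
print(sh(["kubectl", "get", "containernetworkmetric", "container-network-metric", "-o", "yaml"]))

# Scan recent Cilium agent logs for filter-related messages.
logs = sh(["kubectl", "-n", "kube-system", "logs", "-l", "k8s-app=cilium",
           "--tail", "200", "--prefix"])
for line in logs.splitlines():
    if "filter" in line.lower() or "metric" in line.lower():
        print(line)
```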
👉 Learn more in the Metrics Filtering Guide. Try the public preview today and take control of your container network metrics.

Announcing Azure DNS security policy with Threat Intelligence feed general availability

Azure DNS security policy with Threat Intelligence feed allows early detection and prevention of security incidents on customer Virtual Networks: known malicious domains sourced by Microsoft's Security Response Center (MSRC) can be blocked from name resolution. Azure DNS security policy with Threat Intelligence feed is being announced to all customers and will have regional availability in all public regions.
Azure Virtual Network Manager + Azure Virtual WAN

Azure continues to expand its networking capabilities, with Azure Virtual Network Manager and Azure Virtual WAN (vWAN) standing out as two of the most transformative services. When deployed together, they offer the best of both worlds: the operational simplicity of a managed hub architecture combined with the ability for spoke VNets to communicate directly, avoiding additional hub hops and minimizing latency.

Revisiting the classic hub-and-spoke pattern

| Element | Traditional hub-and-spoke role |
|---|---|
| Hub VNet | Centralized network that hosts shared services including firewalls (e.g., Azure Firewall, NVAs), VPN/ExpressRoute gateways, DNS servers, domain controllers, and central route tables for traffic management. Acts as the connectivity and security anchor for all spoke networks. |
| Spoke VNets | Host individual application workloads and peer directly to the hub VNet. Traffic flows through the hub for north-south connectivity (to/from on-premises or internet) and cross-spoke communication (east-west traffic between spokes). |
| Benefits | • Single enforcement point for security policies and network controls • No duplication of shared services across environments • Simplified routing logic and traffic flow management • Clear network segmentation and isolation between workloads • Cost optimization through centralized resources |

However, this architecture comes with a trade-off: every spoke-to-spoke packet must route through the hub, introducing additional network hops, increased latency, and potential throughput constraints.

How Virtual WAN modernizes that design

Virtual WAN replaces a do-it-yourself hub VNet with a fully managed hub service:

- Managed hubs – Azure owns and operates the hub infrastructure.
- Automatic route propagation – routes learned once are usable everywhere.
- Integrated add-ons – Firewalls, VPN, and ExpressRoute ports are first-class citizens.

By default, Virtual WAN enables any-to-any routing between spokes. Traffic transits the hub fabric automatically—no configuration required.

Why direct spoke mesh?

Certain patterns require single-hop connectivity:

- Micro-service meshes that sit in different spokes and exchange chatty RPC calls.
- Database replication / backups where throughput counts, and hub bandwidth is precious.
- Dev / Test / Prod spokes that need to sync artifacts quickly yet stay isolated from hub services.
- Segmentation mandates where a workload must bypass hub inspection for compliance yet still talk to a partner VNet.

Benefits:

- Lower latency – the hub detour disappears.
- Better bandwidth – no hub congestion or firewall throughput cap.
- Higher resilience – spoke pairs can keep talking even if the hub is under maintenance.

The peering explosion problem

With pure VNet peering, the math escalates fast: for n spokes you need n × (n-1)/2 links. Ten spokes? 45 peerings. Add four more? Now 91. Each extra peering forces you to:

- Touch multiple route tables.
- Update NSG rules to cover the new paths.
- Repeat every time you add or retire a spoke.
- Troubleshoot an ever-growing spider web.
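A quick check of the n × (n-1)/2 figures quoted above:

```python
def full_mesh_peerings(n: int) -> int:
    """Number of VNet peering links needed for a full mesh of n spokes."""
    return n * (n - 1) // 2

for spokes in (10, 14, 25, 50, 100):
    print(f"{spokes:>3} spokes -> {full_mesh_peerings(spokes):>5} peerings")
# 10 spokes -> 45 peerings, 14 -> 91, and 100 spokes already needs 4950 links.
```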
Where Azure Virtual Network Manager steps in

Azure Virtual Network Manager introduces Network Groups plus a Mesh connectivity policy:

| Azure Virtual Network Manager concept | What it gives you |
|---|---|
| Network group | A logical container that groups multiple VNets together, allowing you to apply configurations and policies to all members simultaneously. |
| Mesh connectivity | Automated peering between all VNets in the group, ensuring every member can communicate directly with every other member without manual configuration. |
| Declarative config | Intent-based approach where you define the desired network state, and Azure Virtual Network Manager handles the implementation and ongoing maintenance. |
| Dynamic updates | Automatic topology management—when VNets are added to or removed from a group, Azure Virtual Network Manager reconfigures all necessary connections without manual intervention. |

Operational complexity collapses from O(n²) to O(1)—you manage a group, not 100+ individual peerings.

A complementary model: Azure Virtual Network Manager mesh inside vWAN

Since Azure Virtual Network Manager works on any Azure VNet—including the VNets you already attach to a vWAN hub—you can apply mesh policies on top of your existing managed hub architecture:

- Spoke VNets join a vWAN hub for branch connectivity, centralized firewalling, or multi-region reach.
- The same spokes are added to an Azure Virtual Network Manager Network Group with a mesh policy.
- Azure Virtual Network Manager builds direct peering links between the spokes, while vWAN continues to advertise and learn routes.

Result:

- All VNets still benefit from vWAN's global routing and on-premises integration.
- Latency-critical east-west flows now travel the shortest path—one hop—as if the VNets were traditionally peered.

Rather than choosing one over the other, organizations can leverage both vWAN and Azure Virtual Network Manager together, as the combination enhances the strengths of each service.

Performance illustration

Spoke-to-Spoke Communication with Virtual WAN without Azure Virtual Network Manager mesh:

Spoke-to-Spoke Communication with Virtual WAN with Azure Virtual Network Manager mesh:

Observability & protection

- NSG flow logs – granular packet logs on every peered VNet.
- Azure Virtual Network Manager admin rules – org-wide guardrails that trump local NSGs.
- Azure Monitor + SIEM – route flow logs to Log Analytics, Sentinel, or third-party SIEM for threat detection.
- Layered design – hub firewalls inspect north-south traffic; NSGs plus admin rules secure east-west flows.

Putting it all together

Virtual WAN offers fully managed global connectivity, simplifying the integration of branch offices and on-premises infrastructure into your Azure environment. Azure Virtual Network Manager mesh establishes direct communication paths between spoke VNets, making it ideal for workloads requiring high throughput or minimal latency in east-west traffic patterns. When combined, these services provide architects with granular control over traffic routing. Each flow can be directed through hub services when needed or routed directly between spokes for optimal performance—all without re-architecting your network or creating additional management complexity. By pairing Azure Virtual Network Manager's group-based mesh with vWAN's managed hubs, you get the best of both worlds: worldwide reach, centralized security, and single-hop performance where it counts.
Delivering web applications over IPv6

The IPv4 address space pool has been exhausted for some time now, meaning there is no new public address space available for allocation from Internet Registries. The internet continues to run on IPv4 through technical measures such as Network Address Translation (NAT) and Carrier Grade NAT, and reallocation of address space through IPv4 address space trading. IPv6 will ultimately be the dominant network protocol on the internet, as the IPv4 life-support mechanisms used by network operators, hosting providers, and ISPs will eventually reach the limits of their scalability. Mobile networks are already changing to IPv6-only APNs; reachability of IPv4-only destinations from these mobile networks is through 6-4 NAT gateways, which sometimes causes problems.

Client uptake of IPv6 is progressing steadily. Google reports 49% of clients connecting to its services over IPv6 globally, with France leading at 80%.

IPv6 client access measured by Google:

Meanwhile, countries around the world are requiring IPv6 reachability for public web services. Examples include the United States, European countries such as the Netherlands and Norway, India, and Japan.

IPv6 adoption per country measured by Google:

Entities needing to comply with these mandates are looking at Azure's networking capabilities for solutions. Azure supports IPv6 for both private and public networking, and capabilities have developed and expanded over time. This article discusses strategies to build and deploy IPv6-enabled public, internet-facing applications that are reachable from IPv6(-only) clients.

Azure Networking IPv6 capabilities

Azure's private networking capabilities center on Virtual Networks (VNETs) and the components that are deployed within. Azure VNETs are IPv4/IPv6 dual stack capable: a VNET must always have IPv4 address space allocated, and can also have IPv6 address space. Virtual machines in a dual stack VNET will have both an IPv4 and an IPv6 address from the VNET range, and can be behind IPv6-capable External and Internal Load Balancers. VNETs can be connected through VNET peering, which effectively turns the peered VNETs into a single routing domain. It is now possible to peer only the IPv6 address spaces of VNETs, so that the IPv4 space assigned to VNETs can overlap and communication across the peering is over IPv6. The same is true for connectivity to on-premises over ExpressRoute: the Private Peering can be enabled for IPv6 only, so that VNETs in Azure do not have to have unique IPv4 address space assigned, which may be in short supply in an enterprise.

Not all internal networking components are IPv6 capable yet. The most notable exceptions are VPN Gateway, Azure Firewall, and Virtual WAN; IPv6 compatibility is on the roadmap for these services, but target availability dates have not been communicated.

But now let's focus on Azure's externally facing, public network services. Azure is ready to let customers publish their web applications over IPv6. IPv6-capable externally facing network services include:

- Azure Front Door
- Application Gateway
- External Load Balancer
- Public IP addresses and Public IP address prefixes
- Azure DNS
- Azure DDoS Protection
- Traffic Manager
- App Service (IPv6 support is in public preview)

IPv6 Application Delivery

IPv6 Application Delivery refers to the architectures and services that enable your web application to be accessible via IPv6. The goal is to provide an IPv6 address and connectivity for clients, while often continuing to run your application on IPv4 internally.
Key benefits of adopting IPv6 in Azure include:

✅ Expanded client reach: IPv4-only websites risk being unreachable to IPv6-only networks. By enabling IPv6, you expand your reach into growing mobile and IoT markets that use IPv6 by default. Governments and enterprises increasingly mandate IPv6 support for public-facing services.

✅ Address abundance & no NAT: IPv6 provides a virtually unlimited address pool, mitigating IPv4 exhaustion concerns. This abundance means each service can have its own public IPv6 address, often removing the need for complex NAT schemes. End-to-end addressing can simplify connectivity and troubleshooting.

✅ Dual-stack compatibility: Azure supports dual-stack deployments where services listen on both IPv4 and IPv6. This allows a single application instance or endpoint to serve both types of clients seamlessly. Dual-stack ensures you don't lose any existing IPv4 users while adding IPv6 capability.

✅ Performance and future services: Some networks and clients might experience better performance over IPv6. Also, being IPv6-ready prepares your architecture for future Azure features and services as IPv6 integration deepens across the platform.

The general steps to enable IPv6 connectivity for a web application in Azure are:

1. Plan and enable IPv6 addressing in Azure: Define an IPv6 address space in your Azure Virtual Network. Azure allows adding IPv6 address space to existing VNETs, making them dual-stack. A /56 segment for the VNET is recommended; /64 segments for subnets are required (Azure requires /64 subnets). If you have existing infrastructure, you might need to create new subnets or migrate resources, especially since older Application Gateway v1 instances cannot simply be "upgraded" to dual-stack.
2. Deploy or update frontend services with IPv6: Choose a suitable Azure service (Application Gateway, External / Global Load Balancer, etc.) and configure it with a public IPv6 address on the frontend. This usually means selecting a *Dual Stack* configuration so the service gets both an IPv4 and IPv6 public IP. For instance, when creating an Application Gateway v2, you would specify IP address type: DualStack (IPv4 & IPv6). Azure Front Door by default provides dual-stack capabilities with its global endpoints.
3. Configure backends and routing: Usually your backend servers or services will remain on IPv4. At the time of writing this in October 2025, Azure Application Gateway does not support IPv6 for backend pool addresses. This is fine because the frontend terminates the IPv6 network connection from the client, and the backend initiates an IPv4 connection to the backend pool or origin. Ensure that your load balancing rules, listener configurations, and health probes are all set up to route traffic to these backends. Both IPv4 and IPv6 frontend listeners can share the same backend pool. Azure Front Door does support IPv6 origins.
4. Update DNS records: Publish a DNS AAAA record for your application's host name, pointing to the new IPv6 address. This step is critical so that IPv6-only clients can discover the IPv6 address of your service. If your service also has an IPv4 address, you will have both A (IPv4) and AAAA (IPv6) records for the same host name. DNS will thus allow clients of either IP family to connect. (In multi-region scenarios using Traffic Manager or Front Door, DNS configuration might be handled through those services as discussed later.)
5. Test IPv6 connectivity: Once set up, test from an IPv6-enabled network or use online tools to ensure the site is reachable via IPv6 (see the sketch below). Azure's services like Application Gateway and Front Door will handle the dual-stack routing, but it's good to verify that content loads on an IPv6-only connection and that SSL certificates, etc., work over IPv6 as they do for IPv4.
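For step 5, here is a minimal stand-alone check that the A and AAAA records exist and that the IPv6 endpoint accepts connections; the host name is a placeholder.

```python
import socket

host = "www.example.com"  # replace with your application's host name

def addrs(family):
    """Return the unique addresses of the given family published for `host`."""
    try:
        return sorted({info[4][0] for info in
                       socket.getaddrinfo(host, 443, family, socket.SOCK_STREAM)})
    except socket.gaierror:
        return []

print("A    (IPv4):", addrs(socket.AF_INET) or "none")
print("AAAA (IPv6):", addrs(socket.AF_INET6) or "none")

# Optional reachability probe over IPv6 only (TCP handshake on port 443).
for ip in addrs(socket.AF_INET6):
    with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as s:
        s.settimeout(5)
        result = s.connect_ex((ip, 443))
        print(f"{ip}: {'reachable' if result == 0 else f'connect failed ({result})'}")
```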
Next, we explore specific Azure services and architectures for IPv6 web delivery in detail.

External Load Balancer - single region

Azure External Load Balancer (also known as Public Load Balancer) can be deployed in a single region to provide IPv6 access to applications running on virtual machines or VM scale sets. External Load Balancer acts as a Layer 4 entry point for IPv6 traffic, distributing connections across backend instances. This scenario is ideal when you have stateless applications or services that do not require Layer 7 features like SSL termination or path-based routing.

Key IPv6 features of External Load Balancer:

- Dual-stack frontend: Standard Load Balancer supports both IPv4 and IPv6 frontends simultaneously. When configured as dual-stack, the load balancer gets two public IP addresses – one IPv4 and one IPv6 – and can distribute traffic from both IP families to the same backend pool.
- Zone-redundant by default: Standard Load Balancer is zone-redundant by default, providing high availability across Azure Availability Zones within a region without additional configuration.
- IPv6 frontend availability: IPv6 support in Standard Load Balancer is available in all Azure regions. Basic Load Balancer does not support IPv6, so you must use the Standard SKU.
- IPv6 backend pool support: While the frontend accepts IPv6 traffic, the load balancer will not translate IPv6 to IPv4. Backend pool members (VMs) must have private IPv6 addresses. You will need to add private IPv6 addressing to your existing IPv4-only VM infrastructure. This is in contrast to Application Gateway, discussed below, which terminates inbound IPv6 network sessions and connects to the backend over IPv4.
- Protocol support: Supports TCP and UDP load balancing over IPv6, making it suitable for web applications and APIs, but also for non-web TCP- or UDP-based services accessed by IPv6-only clients.

To set up an IPv6-capable External Load Balancer in one region, follow this high-level process:

1. Enable IPv6 on the Virtual Network: Ensure the VNET where your backend VMs reside has an IPv6 address space. Add a dual-stack address space to the VNET (e.g., add an IPv6 space like 2001:db8:1234::/56 to complement your existing IPv4 space). Configure subnets that are dual-stack, containing both IPv4 and IPv6 prefixes (/64 for IPv6).
2. Create a Standard Load Balancer with an IPv6 frontend: In the Azure Portal, create a new Standard Load Balancer. During creation, configure the frontend IP with both IPv4 and IPv6 public IP addresses. Create or select existing Standard SKU public IP resources – one for IPv4 and one for IPv6.
3. Configure the backend pool: Add your virtual machines or VM scale set instances to the backend pool. Note that your backend instances will need to have private IPv6 addresses, in addition to IPv4 addresses, to receive inbound IPv6 traffic via the load balancer.
4. Set up load balancing rules: Create load balancing rules that map frontend ports to backend ports. For web applications, typically map port 80 (HTTP) and 443 (HTTPS) from both the IPv4 and IPv6 frontends to the corresponding backend ports. Configure health probes to ensure only healthy instances receive traffic.
5. Configure Network Security Groups: Ensure an NSG is present on the backend VMs' subnet, allowing inbound traffic from the internet to the port(s) of the web application. Inbound traffic is "secure by default", meaning that inbound connectivity from the internet is blocked unless there is an NSG present that explicitly allows it.
6. DNS configuration: Create DNS records for your application: an A record pointing to the IPv4 address and an AAAA record pointing to the IPv6 address of the load balancer frontend.
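The IPv6 frontend public IP from step 2 can also be created programmatically. Here is a minimal sketch using the `azure-identity` and `azure-mgmt-network` packages; the subscription, resource group, and IP names are placeholders, and exact model/field names can vary between SDK versions.

```python
# pip install azure-identity azure-mgmt-network
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient
from azure.mgmt.network.models import PublicIPAddress, PublicIPAddressSku

subscription_id = "<subscription-id>"   # placeholder
resource_group = "my-rg"                # placeholder
client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

# Standard SKU, static IPv6 public IP for the load balancer frontend.
poller = client.public_ip_addresses.begin_create_or_update(
    resource_group,
    "ipv6webapp-frontend-v6",
    PublicIPAddress(
        location="swedencentral",
        sku=PublicIPAddressSku(name="Standard"),
        public_ip_address_version="IPv6",
        public_ip_allocation_method="Static",
    ),
)
print("IPv6 frontend address:", poller.result().ip_address)
```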
Outcome: In this single-region scenario, IPv6-only clients will resolve your application's hostname to an IPv6 address and connect to the External Load Balancer over IPv6.

Example: Consider a web application running on a VM (or a VM scale set) behind an External Load Balancer in Sweden Central. The VM runs the Azure Region and Client IP Viewer containerized application exposed on port 80, which displays the region the VM is deployed in and the calling client's IP address. The load balancer's front-end IPv6 address has a DNS name of ipv6webapp-elb-swedencentral.swedencentral.cloudapp.azure.com. When called from a client with an IPv6 address, the application shows its region and the client's address.

Limitations & considerations:

- Standard SKU required: Basic Load Balancer does not support IPv6. You must use Standard Load Balancer.
- Layer 4 only: Unlike Application Gateway, External Load Balancer operates at Layer 4 (transport layer). It cannot perform SSL termination, cookie-based session affinity, or path-based routing. If you need these features, consider Application Gateway instead.
- Dual-stack IPv4/IPv6 backend required: Backend pool members must have private IPv6 addresses to receive inbound IPv6 traffic via the load balancer. The load balancer does not translate between the IPv6 frontend and an IPv4 backend.
- Outbound connectivity: If your backend VMs need outbound internet access over IPv6, you need to configure an IPv6 outbound rule.

Global Load Balancer - multi-region

Azure Global Load Balancer (also known as Cross-Region Load Balancer) provides a cloud-native global network load balancing solution for distributing traffic across multiple Azure regions. Unlike DNS-based solutions, Global Load Balancer uses anycast IP addressing to automatically route clients to the nearest healthy regional deployment through Microsoft's global network.

Key features of Global Load Balancer:

- Static anycast global IP: Global Load Balancer provides a single static public IP address (both IPv4 and IPv6 supported) that is advertised from all Microsoft WAN edge nodes globally. This anycast address ensures clients always connect to the nearest available Microsoft edge node without requiring DNS resolution.
- Geo-proximity routing: The geo-proximity load-balancing algorithm minimizes latency by directing traffic to the nearest region where the backend is deployed. Unlike DNS-based routing, there's no DNS lookup delay - clients connect directly to the anycast IP and are immediately routed to the best region.
- Layer 4 pass-through: Global Load Balancer operates as a Layer 4 pass-through network load balancer, preserving the original client IP address (including IPv6 addresses) for backend applications to use in their logic.
- Regional redundancy: If one region fails, traffic is automatically routed to the next closest healthy regional load balancer within seconds, providing instant global failover without DNS propagation delays.
Architecture Overview: Global Load Balancer sits in front of multiple regional Standard Load Balancers, each deployed in different Azure regions. Each regional load balancer serves a local deployment of your application with IPv6 frontends. The global load balancer provides a single anycast IP address that clients worldwide can use to access your application, with automatic routing to the nearest healthy region.

Multi-Region Deployment Steps:

Deploy Regional Load Balancers: Create Standard External Load Balancers in multiple Azure regions (e.g. Sweden Central, East US2). Configure each with dual-stack frontends (IPv4 and IPv6 public IPs) and connect them to regional VM deployments or VM scale sets running your application.

Configure Global Frontend IP Address: Create a Global tier public IPv6 address for the frontend, in one of the supported Global Load Balancer home regions. This becomes your application's global anycast address.

Create Global Load Balancer: Deploy the Global Load Balancer in the same home region. The home region is where the global load balancer resource is deployed - it doesn't affect traffic routing.

Add Regional Backends: Configure the backend pool of the Global Load Balancer to include your regional Standard Load Balancers. Each regional load balancer becomes an endpoint in the global backend pool. The global load balancer automatically monitors the health of each regional endpoint.

Set Up Load Balancing Rules: Create load balancing rules mapping frontend ports to backend ports. For web applications, typically map port 80 (HTTP) and 443 (HTTPS). The backend port on the global load balancer must match the frontend port of the regional load balancers.

Configure Health Probes: Global Load Balancer automatically monitors the health of regional load balancers every 5 seconds. If a regional load balancer's availability drops to 0, it is automatically removed from rotation, and traffic is redirected to other healthy regions.

DNS Configuration: Create DNS records pointing to the global load balancer's anycast IP addresses. Create both A (IPv4) and AAAA (IPv6) records for your application's hostname pointing to the global load balancer's static IPs.

Outcome: IPv6 clients connecting to your application's hostname will resolve to the global load balancer's anycast IPv6 address. When they connect to this address, the Microsoft global network infrastructure automatically routes their connection to the nearest participating Azure region. The regional load balancer then distributes the traffic across local backend instances. If that region becomes unavailable, subsequent connections are automatically routed to the next nearest healthy region.

Example: Our web application, which displays the region it is in, and the calling client's IP address, now runs on VMs behind External Load Balancers in Sweden Central and East US2. The External Load Balancers' front-ends are in the backend pool of a Global Load Balancer, which has a Global tier front-end IPv6 address. The front-end has an FQDN of `ipv6webapp-glb.eastus2.cloudapp.azure.com` (the region designation `eastus2` in the FQDN refers to the Global Load Balancer's "home region", into which the Global tier public IP must be deployed). When called from a client in Europe, Global Load Balancer directs the request to the instance deployed in Sweden Central. When called from a client in the US, Global Load Balancer directs the request to the instance deployed in US East 2.
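The two pieces that differ from a regional deployment are the Global tier frontend address and the DNS records that publish it. The hedged Azure PowerShell sketch below shows just those pieces; the home region, resource names, and the DNS zone (contoso.com) are hypothetical, and the cross-region load balancer resource itself, whose backend pool simply references the frontend IP configurations of the regional load balancers, is omitted.

```powershell
# Global tier public IPs for the cross-region load balancer frontend (home region is hypothetical)
$gpip6 = New-AzPublicIpAddress -Name "glb-pip-v6" -ResourceGroupName "rg-ipv6webapp-glb" -Location "eastus2" `
    -Sku Standard -Tier Global -AllocationMethod Static -IpAddressVersion IPv6
$gpip4 = New-AzPublicIpAddress -Name "glb-pip-v4" -ResourceGroupName "rg-ipv6webapp-glb" -Location "eastus2" `
    -Sku Standard -Tier Global -AllocationMethod Static -IpAddressVersion IPv4

# Publish the anycast addresses in DNS: an A record for IPv4 and an AAAA record for IPv6 (zone is hypothetical)
New-AzDnsRecordSet -ZoneName "contoso.com" -ResourceGroupName "rg-dns" -Name "app" -RecordType A -Ttl 300 `
    -DnsRecords (New-AzDnsRecordConfig -Ipv4Address $gpip4.IpAddress)
New-AzDnsRecordSet -ZoneName "contoso.com" -ResourceGroupName "rg-dns" -Name "app" -RecordType AAAA -Ttl 300 `
    -DnsRecords (New-AzDnsRecordConfig -Ipv6Address $gpip6.IpAddress)
```

Because the anycast addresses are static, these records only need to be created once; adding or removing regions behind the global frontend requires no DNS changes.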
Features:
- Client IP Preservation: The original IPv6 client address is preserved and available to backend applications, enabling IP-based logic and compliance requirements.
- Floating IP Support: Configure floating IP at the global level for advanced networking scenarios requiring direct server return or high availability clustering.
- Instant Scaling: Add or remove regional deployments behind the global endpoint without service interruption, enabling dynamic scaling for traffic events.
- Multiple Protocol Support: Supports both TCP and UDP traffic distribution across regions, suitable for various application types beyond web services.

Limitations & Considerations:
- Home Region Requirement: Global Load Balancer can only be deployed in specific home regions, though this doesn't affect traffic routing performance.
- Public Frontend Only: Global Load Balancer currently supports only public frontends - internal/private global load balancing is not available.
- Standard Load Balancer Backends: The backend pool can only contain Standard Load Balancers, not Basic Load Balancers or other resource types.
- Same IP Version Requirement: NAT64 translation isn't supported - frontend and backend must use the same IP version (IPv4 or IPv6).
- Port Consistency: The backend port on the global load balancer must match the frontend port of the regional load balancers for proper traffic flow.
- Health Probe Dependencies: Regional load balancers must have proper health probes configured for the global load balancer to accurately assess regional health.

Comparison with DNS-Based Solutions: Unlike Traffic Manager or other DNS-based global load balancing solutions, Global Load Balancer provides:
- Instant Failover: No DNS TTL delays - failover happens within seconds at the network level.
- True Anycast: A single IP address that works globally without client-side DNS resolution.
- Consistent Performance: Geo-proximity routing through Microsoft's backbone network ensures optimal paths.
- Simplified Management: No DNS record management or TTL considerations.

This architecture delivers global high availability and optimal performance for IPv6 applications through anycast routing, making it a good solution for latency-sensitive applications requiring worldwide accessibility with near-instant regional failover.

Application Gateway - single region

Azure Application Gateway can be deployed in a single region to provide IPv6 access to applications in that region. Application Gateway acts as the entry point for IPv6 traffic, terminating HTTP/S from IPv6 clients and forwarding to backend servers over IPv4. This scenario works well when your web application is served from one Azure region and you want to enable IPv6 connectivity for it.

Key IPv6 Features of Application Gateway (v2 SKU):
- Dual-Stack Frontend: Application Gateway v2 supports both [IPv4 and IPv6 frontends](https://learn.microsoft.com/en-us/azure/application-gateway/application-gateway-faq). When configured as dual-stack, the gateway gets two IP addresses – one IPv4 and one IPv6 – and can listen on both. (IPv6-only is not supported; IPv4 is always paired.) IPv6 support requires Application Gateway v2; v1 does not support IPv6.
- No IPv6 on Backends: The backend pool must use IPv4 addresses. IPv6 addresses for backend servers are currently not supported. This means your web servers can remain on IPv4 internal addresses, simplifying adoption because you only enable IPv6 on the frontend.
- WAF Support: The Application Gateway Web Application Firewall (WAF) will inspect IPv6 client traffic just as it does IPv4.

Single Region Deployment Steps: To set up an IPv6-capable Application Gateway in one region, consider the following high-level process:

Enable IPv6 on the Virtual Network: Ensure the region's VNET where the Application Gateway will reside has an IPv6 address space. Configure a subnet for the Application Gateway that is dual-stack (contains both an IPv4 subnet prefix and an IPv6 /64 prefix).

Deploy Application Gateway (v2) with Dual-Stack Frontend: Create a new Application Gateway using the Standard_v2 or WAF_v2 SKU.

Populate Backend Pool: Ensure your backend pool (the target application servers or service) contains IPv4 addresses, or DNS names resolving to IPv4 addresses, of your actual web servers. IPv6 addresses are not supported for backends.

Configure Listeners and Rules: Set up listeners on the Application Gateway for your site. When creating an HTTP(S) listener, you choose which frontend IP to use – you would create one listener for the IPv4 address and one for the IPv6 address. Both listeners can use the same domain name (hostname) and the same underlying routing rule to your backend pool.

Testing and DNS: After the gateway is deployed and configured, note the IPv6 address of the frontend (you can find it in the gateway's overview or in the associated Public IP resource). Update your application's DNS records: create an AAAA record pointing to this IPv6 address (and update the A record to point to the IPv4 address if it changed). With DNS in place, test the application by accessing it from an IPv6-enabled client or tool.

Outcome: In this single-region scenario, IPv6-only clients will resolve your website's hostname to an IPv6 address and connect to the Application Gateway over IPv6. The Application Gateway then handles the traffic and forwards it to your application over IPv4 internally. From the user perspective, the service now appears natively on IPv6. Importantly, this does not require any changes to the web servers, which can continue using IPv4. Application Gateway will include the source IPv6 address in an X-Forwarded-For header, so that the backend application has visibility of the originating client's address.

Example: Our web application, which displays the region it is deployed in and the calling client's IP address, now runs on a VM behind Application Gateway in Sweden Central. The front-end has an FQDN of `ipv6webapp-appgw-swedencentral.swedencentral.cloudapp.azure.com`. Application Gateway terminates the IPv6 connection from the client and proxies the traffic to the application over IPv4. The client's IPv6 address is passed in the X-Forwarded-For header, which is read and displayed by the application. Calling the application's API endpoint at `/api/region` shows additional detail, including the IPv4 address of the Application Gateway instance that initiates the connection to the backend, and the original client IPv6 address (with the source port number appended) preserved in the X-Forwarded-For header.
{
  "region": "SwedenCentral",
  "clientIp": "2001:1c04:3404:9500:fd9b:58f4:1fb2:db21:60769",
  "xForwardedFor": "2001:1c04:3404:9500:fd9b:58f4:1fb2:db21:60769",
  "remoteAddress": "::ffff:10.1.0.4",
  "isPrivateIP": false,
  "expressIp": "2001:1c04:3404:9500:fd9b:58f4:1fb2:db21:60769",
  "connectionInfo": {
    "remoteAddress": "::ffff:10.1.0.4",
    "remoteFamily": "IPv6",
    "localAddress": "::ffff:10.1.1.68",
    "localPort": 80
  },
  "allHeaders": {
    "x-forwarded-for": "2001:1c04:3404:9500:fd9b:58f4:1fb2:db21:60769"
  },
  "deploymentAdvice": "Public IP detected successfully"
}

Limitations & Considerations:
- Application Gateway v1 SKUs are not supported for IPv6. If you have an older deployment on v1, you'll need to migrate to v2.
- IPv6-only Application Gateway is not allowed. You must have IPv4 alongside IPv6 (the service must be dual-stack). This is usually fine, as dual-stack ensures all clients are covered.
- No IPv6 backend addresses: The backend pool must have IPv4 addresses.
- Management and Monitoring: Application Gateway logs traffic from IPv6 clients to Log Analytics (the client IP field will show IPv6 addresses).
- Security: Azure's infrastructure provides basic DDoS protection for IPv6 endpoints just as for IPv4. However, it is highly recommended to deploy Azure DDoS Protection Standard: this provides enhanced mitigation tailored to your specific deployment. Consider using the Web Application Firewall function for protection against application layer attacks.

Application Gateway - multi-region

Mission-critical web applications should be deployed in multiple Azure regions to achieve higher availability and lower latency for users worldwide. In a multi-region scenario, you need a mechanism to direct IPv6 client traffic to the "nearest" or healthiest region. Azure Application Gateway by itself is a regional service, so to use it in multiple regions, we use Azure Traffic Manager for global DNS load balancing, or use Azure Front Door (covered in the next section) as an alternative. This section focuses on the Traffic Manager + Application Gateway approach to multi-region IPv6 delivery.

Azure Traffic Manager is a DNS-based load balancer that can distribute traffic across endpoints in different regions. It works by responding to DNS queries with the appropriate endpoint FQDN or IP, based on the routing method (Performance, Priority, Geographic) configured. Traffic Manager is agnostic to the IP version: it returns CNAMEs, AAAA records for IPv6 endpoints, or A records for IPv4 endpoints. This makes it suitable for routing IPv6 traffic globally.

Architecture Overview: Each region has its own dual-stack Application Gateway. Traffic Manager is configured with an endpoint entry for each region's gateway. The application's FQDN is now a domain name hosted by Traffic Manager such as ipv6webapp.trafficmanager.net, or a CNAME that ultimately points to it. DNS resolution will go through Traffic Manager, which decides which regional gateway's FQDN to return. The client then connects directly to that Application Gateway's IPv6 address, as follows:
1. DNS query: The client asks for ipv6webapp.trafficmanager.net, which is hosted in a Traffic Manager profile.
2. Traffic Manager decision: Traffic Manager sees an incoming DNS request and chooses the best endpoint (say, Sweden Central) based on routing rules (e.g., geographic proximity or lowest latency).
3. Traffic Manager response: Traffic Manager returns the FQDN of the Sweden Central Application Gateway to the client.
4. DNS resolution: The client resolves the regional FQDN and receives a AAAA response containing the IPv6 address.
5. Client connects: The client's browser connects to the Sweden Central Application Gateway's IPv6 address directly. The HTTP/S session is established via IPv6 to that regional gateway, which then handles the request.
6. Failover: If that region becomes unavailable, Traffic Manager's health checks will detect it and subsequent DNS queries will be answered with the FQDN of the secondary region's gateway.

Deployment Steps for Multi-Region with Traffic Manager:

Set up Dual-Stack Application Gateways in Each Region: Similar to the single-region case, deploy an Azure Application Gateway v2 in each desired region (e.g., one in North America, one in Europe). Configure the web application in each region; these should be parallel deployments serving the same content.

Configure a Traffic Manager Profile: In Azure Traffic Manager, create a profile and choose a routing method (such as Performance for nearest-region routing, or Priority for primary/backup failover). Add endpoints for each region. Since our endpoints are Azure services with IPs, we can either use Azure endpoints (if the Application Gateways have Azure-provided DNS names) or External endpoints using the IP addresses. The simplest way is to use the Public IP resource of each Application Gateway as an Azure endpoint – ensure each App Gateway's public IP has a DNS label (so it has an FQDN). Traffic Manager will detect those and also be aware of their IPs. Alternatively, use the IPv6 address as an External endpoint directly. Traffic Manager allows IPv6 addresses and will return AAAA records for them.

DNS Setup: Traffic Manager profiles have an FQDN (like ipv6webapp.trafficmanager.net). You can either use that as your service's hostname, or you can configure your custom domain to CNAME to the Traffic Manager profile.

Health Probing: Traffic Manager continuously checks the health of endpoints. When endpoints are Azure App Gateways, it uses HTTP/S probes to a specified URI path on each gateway's address. Make sure each App Gateway has a listener on the probing endpoint (e.g., a health check page) and that health probes are enabled.

Testing Failover and Distribution: Test the setup by querying DNS from different geographical locations (to see if you get the nearest region's IP). Also simulate a region going down (stop the App Gateway or backend) and observe whether Traffic Manager directs traffic to the other region. Because DNS TTLs are involved, failover isn't instant but typically happens within a couple of minutes depending on TTL and probe interval.

Considerations in this Architecture:
- Latency vs Failover: Traffic Manager as a DNS load balancer directs users at connect time, but once a client has an answer (IP address), it keeps sending to that address until the DNS record TTL expires and it re-resolves. This is fine for most web apps. Ensure the TTL in the Traffic Manager profile is not too high (the default is 60 seconds).
- IPv6 DNS and Connectivity: Confirm that each region's IPv6 address is correctly configured and reachable globally. Azure's public IPv6 addresses are globally routable. Traffic Manager itself is a global service and fully supports IPv6 in its decision-making.
- Cost: Using multiple Application Gateways and Traffic Manager incurs costs for each component (App Gateway is billed per hour plus capacity units, Traffic Manager per million DNS queries). This is a trade-off for high availability.
- Alternative: Azure Front Door: Azure Front Door is an alternative to the Traffic Manager + Application Gateway combination. Front Door can automatically handle global routing and failover at layer 7 without DNS-based limitations, offering potentially faster failover. Azure Front Door is discussed in the next section.

In summary, a multi-region IPv6 web delivery with Application Gateways uses Traffic Manager for global DNS load balancing. Traffic Manager will seamlessly return IPv6 addresses for IPv6 clients, ensuring that no matter where an IPv6-only client is, they get pointed to the nearest available regional deployment of your app. This design achieves global resiliency (withstanding a regional outage) and low-latency access, leveraging IPv6 connectivity on each regional endpoint.

Example: The global FQDN of our application is now ipv6webapp.trafficmanager.net and clients will use this FQDN to access the application regardless of their geographical location. Traffic Manager will return the FQDN of one of the regional deployments, `ipv6webapp-appgw-swedencentral.swedencentral.cloudapp.azure.com` or `ipv6webappr2-appgw-eastus2.eastus2.cloudapp.azure.com`, depending on the routing method configured, the health state of the regional endpoints, and the client's location. The client then resolves the regional FQDN through its local DNS server and connects to the regional instance of the application.

DNS resolution from a client in Europe:

Resolve-DnsName ipv6webapp.trafficmanager.net

Name                          Type  TTL Section NameHost
----                          ----  --- ------- --------
ipv6webapp.trafficmanager.net CNAME 59  Answer  ipv6webapp-appgw-swedencentral.swedencentral.cloudapp.azure.com

Name       : ipv6webapp-appgw-swedencentral.swedencentral.cloudapp.azure.com
QueryType  : AAAA
TTL        : 10
Section    : Answer
IP6Address : 2603:1020:1001:25::168

And from a client in the US:

Resolve-DnsName ipv6webapp.trafficmanager.net

Name                          Type  TTL Section NameHost
----                          ----  --- ------- --------
ipv6webapp.trafficmanager.net CNAME 60  Answer  ipv6webappr2-appgw-eastus2.eastus2.cloudapp.azure.com

Name       : ipv6webappr2-appgw-eastus2.eastus2.cloudapp.azure.com
QueryType  : AAAA
TTL        : 10
Section    : Answer
IP6Address : 2603:1030:403:17::5b0

Azure Front Door

Azure Front Door is an application delivery network with built-in CDN, SSL offload, WAF, and routing capabilities. It provides a single, unified frontend distributed across Microsoft's edge network. Azure Front Door natively supports IPv6 connectivity. For applications that have users worldwide, Front Door offers advantages:
- Global Anycast Endpoint: Provides anycast IPv4 and IPv6 addresses, advertised out of all edge locations, with automatic A and AAAA DNS record support.
- IPv4 and IPv6 Origin Support: Azure Front Door supports both IPv4 and IPv6 origins (i.e. backends), both within Azure and externally (i.e. accessible over the internet).
- Simplified DNS: Custom domains can be mapped using CNAME records.
- Layer-7 Routing: Supports path-based routing and automatic backend health detection.
- Edge Security: Includes DDoS protection and optional WAF integration.

Front Door enables "cross-IP version" scenarios: a client can connect to the Front Door front-end over IPv6, and Front Door can then connect to an IPv4 origin. Conversely, an IPv4-only client can retrieve content from an IPv6 backend via Front Door. Front Door preserves the client's source IP address in the X-Forwarded-For header.

Note: Front Door provides managed IPv6 addresses that are not customer-owned resources.
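Because those anycast addresses are platform-managed, a custom domain is mapped to the Front Door endpoint hostname with a CNAME rather than with A/AAAA records. A hedged Azure DNS sketch, reusing the example endpoint hostname from this article and a hypothetical zone name:

```powershell
# Map a custom hostname to the Front Door endpoint with a CNAME (zone name is hypothetical)
New-AzDnsRecordSet -ZoneName "contoso.com" -ResourceGroupName "rg-dns" -Name "www" -RecordType CNAME -Ttl 3600 `
    -DnsRecords (New-AzDnsRecordConfig -Cname "ipv6webapp-d4f4euhnb8fge4ce.b01.azurefd.net")
```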
Custom domains should therefore use CNAME records pointing to the Front Door hostname rather than direct IP address references.

Private Link Integration

Azure Front Door Premium introduces Private Link integration, enabling secure, private connectivity between Front Door and backend resources, without exposing them to the public internet. When Private Link is enabled, Azure Front Door establishes a private endpoint within a Microsoft-managed virtual network. This endpoint acts as a secure bridge between Front Door's global edge network and your origin resources, such as Azure App Service, Azure Storage, Application Gateway, or workloads behind an internal load balancer.

Traffic from end users still enters through Front Door's globally distributed POPs, benefiting from features like SSL offload, caching, and WAF protection. However, instead of routing to your origin over public, internet-facing endpoints, Front Door uses the private Microsoft backbone to reach the private endpoint. This ensures that all traffic between Front Door and your origin remains isolated from external networks. The private endpoint connection requires approval from the origin resource owner, adding an extra layer of control. Once approved, the origin can restrict public access entirely, enforcing that all traffic flows through Private Link.

Private Link integration brings the following benefits:
- Enhanced Security: By removing public exposure of backend services, Private Link significantly reduces the risk of DDoS attacks, data exfiltration, and unauthorized access.
- Compliance and Governance: Many regulatory frameworks mandate private connectivity for sensitive workloads. Private Link helps meet these requirements without sacrificing global availability.
- Performance and Reliability: Traffic between Front Door and your origin travels over Microsoft's high-speed backbone network, delivering low latency and consistent performance compared to public internet paths.
- Defense in Depth: Combined with Web Application Firewall (WAF), TLS encryption, and DDoS protection, Private Link strengthens your security posture across multiple layers.
- Isolation and Control: Resource owners maintain control over connection approvals, ensuring that only authorized Front Door profiles can access the origin.
- Integration with Hybrid Architectures: For scenarios involving AKS clusters, custom APIs, or workloads behind internal load balancers, Private Link enables secure connectivity without requiring public IPs or complex VPN setups.

Private Link transforms Azure Front Door from a global entry point into a fully private delivery mechanism for your applications, aligning with modern security principles and enterprise compliance needs.

Example: Our application is now placed behind Azure Front Door. We are combining a public backend endpoint and Private Link integration, to show both in action in a single example. The Sweden Central origin endpoint is the public IPv6 endpoint of the regional External Load Balancer, and the origin in US East 2 is connected via Private Link integration. The global FQDN is `ipv6webapp-d4f4euhnb8fge4ce.b01.azurefd.net`, and clients will use this FQDN to access the application regardless of their geographical location. The FQDN resolves to Front Door's global anycast address, and the internet will route client requests to the nearest Microsoft edge from which this address is advertised. Front Door will then transparently route the request to the nearest origin deployment in Azure.
Although public endpoints are used in this example, that traffic will be routed over the Microsoft network.

From a client in Europe: Calling the application's API endpoint on `ipv6webapp-d4f4euhnb8fge4ce.b01.azurefd.net/api/region` shows some more detail.

{
  "region": "SwedenCentral",
  "clientIp": "2001:1c04:3404:9500:fd9b:58f4:1fb2:db21",
  "xForwardedFor": "2001:1c04:3404:9500:fd9b:58f4:1fb2:db21",
  "remoteAddress": "2a01:111:2053:d801:0:afd:ad4:1b28",
  "isPrivateIP": false,
  "expressIp": "2001:1c04:3404:9500:fd9b:58f4:1fb2:db21",
  "connectionInfo": {
    "remoteAddress": "2a01:111:2053:d801:0:afd:ad4:1b28",
    "remoteFamily": "IPv6",
    "localAddress": "2001:db8:1:1::4",
    "localPort": 80
  },
  "allHeaders": {
    "x-forwarded-for": "2001:1c04:3404:9500:fd9b:58f4:1fb2:db21",
    "x-azure-clientip": "2001:1c04:3404:9500:fd9b:58f4:1fb2:db21"
  },
  "deploymentAdvice": "Public IP detected successfully"
}

"remoteAddress": "2a01:111:2053:d801:0:afd:ad4:1b28" is the address from which Front Door sources its request to the origin.

From a client in the US: The detailed view shows that the IP address calling the backend instance is now a local VNet address. Private Link sources traffic from a local address taken from the VNet it is in. The original client IP address is again preserved in the X-Forwarded-For header.

{
  "region": "eastus2",
  "clientIp": "2603:1030:501:23::68:55658",
  "xForwardedFor": "2603:1030:501:23::68:55658",
  "remoteAddress": "::ffff:10.2.1.5",
  "isPrivateIP": false,
  "expressIp": "2603:1030:501:23::68:55658",
  "connectionInfo": {
    "remoteAddress": "::ffff:10.2.1.5",
    "remoteFamily": "IPv6",
    "localAddress": "::ffff:10.2.2.68",
    "localPort": 80
  },
  "allHeaders": {
    "x-forwarded-for": "2603:1030:501:23::68:55658"
  },
  "deploymentAdvice": "Public IP detected successfully"
}

Conclusion

IPv6 adoption for web applications is no longer optional. It is essential as public IPv4 address space is depleted, mobile networks increasingly use IPv6 only, and governments mandate IPv6 reachability for public services. Azure's comprehensive dual-stack networking capabilities provide a clear path forward, enabling organizations to leverage IPv6 externally without sacrificing IPv4 compatibility or requiring complete infrastructure overhauls.

Azure's externally facing services — including Application Gateway, External Load Balancer, Global Load Balancer, and Front Door — support IPv6 frontends, while Application Gateway and Front Door maintain IPv4 backend connectivity. This architecture allows applications to remain unchanged while instantly becoming accessible to IPv6-only clients. For single-region deployments, Application Gateway offers layer-7 features like SSL termination and WAF protection. External Load Balancer provides high-performance layer-4 distribution. Multi-region scenarios benefit from Traffic Manager's DNS-based routing combined with regional Application Gateways, or the superior performance and failover capabilities of Global Load Balancer's anycast addressing. Azure Front Door provides global IPv6 delivery with edge optimization, built-in security, and seamless failover across Microsoft's network. Private Link integration allows secure global IPv6 distribution while maintaining backend isolation.

The transition to IPv6 application delivery on Azure is straightforward: enable dual-stack addressing on virtual networks, configure IPv6 frontends on load balancing services, and update DNS records. With Application Gateway or Front Door, backend applications require no modifications.
These Azure services handle the IPv4-to-IPv6 translation seamlessly. This approach ensures both immediate IPv6 accessibility and long-term architectural flexibility as IPv6 adoption accelerates globally.

Deploying Third-Party Firewalls in Azure Landing Zones: Design, Configuration, and Best Practices
As enterprises adopt Microsoft Azure for large-scale workloads, securing network traffic becomes a critical part of the platform foundation. Microsoft's Cloud Adoption Framework provides the blueprint for enterprise-scale Landing Zone design and deployment, and while Azure Firewall is a built-in PaaS option, some organizations prefer third-party firewall appliances for familiarity, feature depth, and vendor alignment. This blog explains the basic design patterns, key configurations, and best practices when deploying third-party firewalls (Palo Alto, Fortinet, Check Point, etc.) as part of an Azure Landing Zone.

1. Landing Zone Architecture and Firewall Role

The Azure Landing Zone is Microsoft's recommended enterprise-scale architecture for adopting cloud at scale. It provides a standardized, modular design that organizations can use to deploy and govern workloads consistently across subscriptions and regions. At its core, the Landing Zone follows a hub-and-spoke topology:

Hub (Connectivity Subscription):
- A central place for shared services like DNS, private endpoints, VPN/ExpressRoute gateways, Azure Firewall (or third-party firewall appliances), Bastion, and monitoring agents.
- Provides consistent security controls and connectivity for all workloads.
- Firewalls are deployed here to act as the traffic inspection and enforcement point.

Spokes (Workload Subscriptions):
- Application workloads (e.g., SAP, web apps, data platforms) are placed in spoke VNets.
- Additional specialized spokes may exist for Identity, Shared Services, Security, or Management.
- These are isolated for governance and compliance, but all connectivity back to other workloads or on-premises routes through the hub.

Traffic Flows Through Firewalls

North-South Traffic:
- Inbound connections from the Internet (e.g., customer access to applications).
- Outbound connections from Azure workloads to Internet services.
- Hybrid connectivity to on-premises datacenters or other clouds.
- Routed through the external firewall set for inspection and policy enforcement.

East-West Traffic:
- Lateral traffic between spokes (e.g., Application VNet to Database VNet).
- Communication across environments like Dev → Test → Prod (if allowed).
- Routed through an internal firewall set to apply segmentation and zero-trust principles, and to prevent lateral movement of threats.

Why Firewalls Matter in the Landing Zone

While Azure provides NSGs (Network Security Groups) and Route Tables for basic packet filtering and routing, these are not sufficient for advanced security scenarios. Firewalls add:
- Deep packet inspection (DPI) and application-level filtering.
- Intrusion Detection/Prevention (IDS/IPS) capabilities.
- Centralized policy management across multiple spokes.
- Segmentation of workloads to reduce the blast radius of potential attacks.
- Consistent enforcement of enterprise security baselines across hybrid and multi-cloud.

Organizations May Choose

Depending on security needs, cost tolerance, and operational complexity, organizations typically adopt one of two models for third-party firewalls:

Two sets of firewalls:
- One set dedicated to north-south traffic (external to Azure).
- Another set for east-west traffic (between VNets and spokes).
- Provides the highest security granularity, but comes with higher cost and management overhead.

Single set of firewalls:
- A consolidated deployment where the same firewall cluster handles both east-west and north-south traffic.
- Simpler and more cost-effective, but may introduce complexity in routing and policy segregation.
This design choice is usually made during Landing Zone design, balancing security requirements, budget, and operational maturity. 2. Why Choose Third-Party Firewalls Over Azure Firewall? While Azure Firewall provides simplicity as a managed service, customers often choose third-party solutions due to: Advanced features – Deep packet inspection, IDS/IPS, SSL decryption, threat feeds. Vendor familiarity – Network teams trained on Palo Alto, Fortinet, or Check Point. Existing contracts – Enterprise license agreements and support channels. Hybrid alignment – Same vendor firewalls across on-premises and Azure. Azure Firewall is a fully managed PaaS service, ideal for customers who want a simple, cloud-native solution without worrying about underlying infrastructure. However, many enterprises continue to choose third-party firewall appliances (Palo Alto, Fortinet, Check Point, etc.) when implementing their Landing Zones. The decision usually depends on capabilities, familiarity, and enterprise strategy. Key Reasons to Choose Third-Party Firewalls Feature Depth and Advanced Security Third-party vendors offer advanced capabilities such as: Deep Packet Inspection (DPI) for application-aware filtering. Intrusion Detection and Prevention (IDS/IPS). SSL/TLS decryption and inspection. Advanced threat feeds, malware protection, sandboxing, and botnet detection. While Azure Firewall continues to evolve, these vendors have a longer track record in advanced threat protection. Operational Familiarity and Skills Network and security teams often have years of experience managing Palo Alto, Fortinet, or Check Point appliances on-premises. Adopting the same technology in Azure reduces the learning curve and ensures faster troubleshooting, smoother operations, and reuse of existing playbooks. Integration with Existing Security Ecosystem Many organizations already use vendor-specific management platforms (e.g., Panorama for Palo Alto, FortiManager for Fortinet, or SmartConsole for Check Point). Extending the same tools into Azure allows centralized management of policies across on-premises and cloud, ensuring consistent enforcement. Compliance and Regulatory Requirements Certain industries (finance, healthcare, government) require proven, certified firewall vendors for security compliance. Customers may already have third-party solutions validated by auditors and prefer extending those to Azure for consistency. Hybrid and Multi-Cloud Alignment Many enterprises run a hybrid model, with workloads split across on-premises, Azure, AWS, or GCP. Third-party firewalls provide a common security layer across environments, simplifying multi-cloud operations and governance. Customization and Flexibility Unlike Azure Firewall, which is a managed service with limited backend visibility, third-party firewalls give admins full control over operating systems, patching, advanced routing, and custom integrations. This flexibility can be essential when supporting complex or non-standard workloads. Licensing Leverage (BYOL) Enterprises with existing enterprise agreements or volume discounts can bring their own firewall licenses (BYOL) to Azure. This often reduces cost compared to pay-as-you-go Azure Firewall pricing. When Azure Firewall Might Still Be Enough Organizations with simple security needs (basic north-south inspection, FQDN filtering). Cloud-first teams that prefer managed services with minimal infrastructure overhead. Customers who want to avoid manual scaling and VM patching that comes with IaaS appliances. 
In practice, many large organizations use a hybrid approach: Azure Firewall for lightweight scenarios or specific environments, and third-party firewalls for enterprise workloads that require advanced inspection, vendor alignment, and compliance certifications.

3. Deployment Models in Azure

Third-party firewalls in Azure are primarily IaaS-based appliances deployed as virtual machines (VMs). Leading vendors publish Azure Marketplace images and ARM/Bicep templates, enabling rapid, repeatable deployments across multiple environments. These firewalls allow organizations to enforce advanced network security policies, perform deep packet inspection, and integrate with Azure-native services such as Virtual Network (VNet) peering, Azure Monitor, and Azure Sentinel.

Note: Some vendors now also release PaaS versions of their firewalls, offering managed firewall services with simplified operations. However, for the purposes of this blog, we will focus mainly on IaaS-based firewall deployments.

Common Deployment Modes

Active-Active
- Description: In this mode, multiple firewall VMs operate simultaneously, sharing the traffic load. An Azure Load Balancer distributes inbound and outbound traffic across all active firewall instances.
- Use Cases: Ideal for environments requiring high throughput, resilience, and near-zero downtime, such as enterprise data centers, multi-region deployments, or mission-critical applications.
- Considerations: Requires careful route and policy synchronization between firewall instances to ensure consistent traffic handling. Typically involves BGP or user-defined routes (UDRs) for optimal traffic steering. Scaling is easier: additional firewall VMs can be added behind the load balancer to handle traffic spikes.

Active-Passive
- Description: One firewall VM handles all traffic (active), while the secondary VM (passive) stands by for failover. When the active node fails, Azure service principals manage IP reassignment and traffic rerouting.
- Use Cases: Suitable for environments where simpler management and lower operational complexity are preferred over continuous load balancing.
- Considerations: Failover may result in a brief downtime, typically measured in seconds to a few minutes. Synchronization between the active and passive nodes ensures firewall policies, sessions, and configurations are mirrored. Recommended for smaller deployments or those with predictable traffic patterns.

Network Interfaces (NICs)

Third-party firewall VMs often include multiple NICs, each dedicated to a specific type of traffic:
- Untrust/Public NIC: Connects to the Internet or external networks. Handles inbound/outbound public traffic and enforces perimeter security policies.
- Trust/Internal NIC: Connects to private VNets or subnets. Manages internal traffic between application tiers and enforces internal segmentation.
- Management NIC: Dedicated to firewall management traffic. Keeps administration separate from data plane traffic, improving security and reducing performance interference.
- HA NIC (Active-Passive setups): Facilitates synchronization between active and passive firewall nodes, ensuring session and configuration state is maintained across failovers.

This design choice is usually made during Landing Zone design, balancing security requirements, budget, and operational maturity.

Figure: NICs of Palo Alto external firewalls and FortiGate internal firewalls in the two-sets-of-firewalls scenario
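As a concrete illustration of the Active-Active pattern above, an internal Standard Load Balancer in front of the firewalls' trust NICs is commonly configured with an HA-ports rule so that every TCP/UDP flow is hashed across the active appliances. The Azure PowerShell sketch below is a simplified, hedged example: the hub VNet, subnet, frontend address, and probe port are all hypothetical and would follow your vendor's guidance in practice.

```powershell
# Internal Standard Load Balancer frontend in the firewalls' trust subnet (names/addresses hypothetical)
$vnet   = Get-AzVirtualNetwork -Name "hub-vnet" -ResourceGroupName "rg-hub"
$subnet = Get-AzVirtualNetworkSubnetConfig -Name "fw-trust" -VirtualNetwork $vnet
$fe     = New-AzLoadBalancerFrontendIpConfig -Name "fw-internal-fe" -Subnet $subnet `
            -PrivateIpAddress "10.0.1.4"

# Backend pool for the firewall trust NICs and a TCP health probe the appliances answer on
$pool  = New-AzLoadBalancerBackendAddressPoolConfig -Name "fw-pool"
$probe = New-AzLoadBalancerProbeConfig -Name "fw-probe" -Protocol Tcp -Port 22 `
            -IntervalInSeconds 5 -ProbeCount 2

# HA-ports rule: protocol 'All' with frontend/backend port 0 forwards every flow through the firewalls
$rule = New-AzLoadBalancerRuleConfig -Name "ha-ports" -Protocol All -FrontendPort 0 -BackendPort 0 `
            -FrontendIpConfiguration $fe -BackendAddressPool $pool -Probe $probe

New-AzLoadBalancer -Name "fw-internal-lb" -ResourceGroupName "rg-hub" -Location "swedencentral" `
    -Sku Standard -FrontendIpConfiguration $fe -BackendAddressPool $pool -Probe $probe -LoadBalancingRule $rule
```

The internal frontend address (10.0.1.4 in this sketch) is then what the spoke route tables reference as their next hop, as discussed in the next section.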
4. Key Configuration Considerations

When deploying third-party firewalls in Azure, several design and configuration elements play a critical role in ensuring security, performance, and high availability. These considerations should be carefully aligned with organizational security policies, compliance requirements, and operational practices.

Routing
- User-Defined Routes (UDRs): Define UDRs in spoke Virtual Networks to ensure all outbound traffic flows through the firewall, enforcing inspection and security policies before reaching the Internet or other Virtual Networks. Centralized routing helps standardize controls across multiple application Virtual Networks. Depending on the traffic flow design of the architecture, use the appropriate load balancer IP as the next hop in the UDRs of spoke Virtual Networks.
- Symmetric Routing: Ensure traffic follows symmetric paths (i.e., outbound and inbound flows pass through the same firewall instance). Avoid asymmetric routing, which can cause stateful firewalls to drop return traffic. Leverage BGP with Azure Route Server, where supported, to simplify route propagation across hub-and-spoke topologies.

Figure: Azure UDR directing all traffic from a spoke VNet to the firewall IP address

Policies
- NAT Rules: Configure DNAT (Destination NAT) rules to publish applications securely to the Internet. Use SNAT (Source NAT) to mask private IPs when workloads access external resources.
- Security Rules: Define granular allow/deny rules for both north-south traffic (Internet to VNet) and east-west traffic (between Virtual Networks or subnets). Ensure least privilege by only allowing required ports, protocols, and destinations.
- Segmentation: Apply firewall policies to separate workloads, environments, and tenants (e.g., Production vs. Development). Enforce compliance by isolating workloads subject to regulatory standards (PCI-DSS, HIPAA, GDPR).
- Application-Aware Policies: Many vendors support Layer 7 inspection, enabling controls based on applications, users, and content (not just IP/port). Integrate with identity providers (Azure AD, LDAP, etc.) for user-based firewall rules.

Figure: Example configuration of NAT rules on a Palo Alto external firewall

Load Balancers
- Internal Load Balancer (ILB): Use ILBs for east-west traffic inspection between Virtual Networks or subnets. This ensures that traffic between applications always passes through the firewall, regardless of origin.
- External Load Balancer (ELB): Use ELBs for north-south traffic, handling Internet ingress and egress. Required in Active-Active firewall clusters to distribute traffic evenly across firewall nodes.
- Other Configurations: Configure health probes for firewall instances to ensure faulty nodes are automatically bypassed. Validate the Floating IP configuration on load balancing rules according to the respective vendor's recommendations.

Identity Integration
- Azure Service Principals: In Active-Passive deployments, configure service principals to enable automated IP reassignment during failover. This ensures continuous service availability without manual intervention.
- Role-Based Access Control (RBAC): Integrate firewall management with Azure RBAC to control who can deploy, manage, or modify firewall configurations.
- SIEM Integration: Stream logs to Azure Monitor, Sentinel, or third-party SIEMs for auditing, monitoring, and incident response.

Licensing
- Pay-As-You-Go (PAYG): Licenses are bundled into the VM cost when deploying from the Azure Marketplace. Best for short-term projects, PoCs, or variable workloads.
Bring Your Own License (BYOL): Enterprises can apply existing contracts and licenses with vendors to Azure deployments. Often more cost-effective for large-scale, long-term deployments. Hybrid Licensing Models: Some vendors support license mobility from on-premises to Azure, reducing duplication of costs. 5. Common Challenges Third-party firewalls in Azure provide strong security controls, but organizations often face practical challenges in day-to-day operations: Misconfiguration Incorrect UDRs, route tables, or NAT rules can cause dropped traffic or bypassed inspection. Asymmetric routing is a frequent issue in hub-and-spoke topologies, leading to session drops in stateful firewalls. Performance Bottlenecks Firewall throughput depends on the VM SKU (CPU, memory, NIC limits). Under-sizing causes latency and packet loss, while over-sizing adds unnecessary cost. Continuous monitoring and vendor sizing guides are essential. Failover Downtime Active-Passive models introduce brief service interruptions while IPs and routes are reassigned. Some sessions may be lost even with state sync, making Active-Active more attractive for mission-critical workloads. Backup & Recovery Azure Backup doesn’t support vendor firewall OS. Configurations must be exported and stored externally (e.g., storage accounts, repos, or vendor management tools). Without proper backups, recovery from failures or misconfigurations can be slow. Azure Platform Limits on Connections Azure imposes a per-VM cap of 250,000 active connections, regardless of what the firewall vendor appliance supports. This means even if an appliance is designed for millions of sessions, it will be constrained by Azure’s networking fabric. Hitting this cap can lead to unexplained traffic drops despite available CPU/memory. The workaround is to scale out horizontally (multiple firewall VMs behind a load balancer) and carefully monitor connection distribution. 6. Best Practices for Third-Party Firewall Deployments To maximize security, reliability, and performance of third-party firewalls in Azure, organizations should follow these best practices: Deploy in Availability Zones: Place firewall instances across different Availability Zones to ensure regional resilience and minimize downtime in case of zone-level failures. Prefer Active-Active for Critical Workloads: Where zero downtime is a requirement, use Active-Active clusters behind an Azure Load Balancer. Active-Passive can be simpler but introduces failover delays. Use Dedicated Subnets for Interfaces: Separate trust, untrust, HA, and management NICs into their own subnets. This enforces segmentation, simplifies route management, and reduces misconfiguration risk. Apply Least Privilege Policies: Always start with a deny-all baseline, then allow only necessary applications, ports, and protocols. Regularly review rules to avoid policy sprawl. Standardize Naming & Tagging: Adopt consistent naming conventions and resource tags for firewalls, subnets, route tables, and policies. This aids troubleshooting, automation, and compliance reporting. Validate End-to-End Traffic Flows: Test both north-south (Internet ↔ VNet) and east-west (VNet ↔ VNet/subnet) flows after deployment. Use tools like Azure Network Watcher and vendor traffic logs to confirm inspection. Plan for Scalability: Monitor throughput, CPU, memory, and session counts to anticipate when scale-out or higher VM SKUs are needed. Some vendors support autoscaling clusters for bursty workloads. 
Maintain Firmware & Threat Signatures: Regularly update the firewall's software, patches, and threat intelligence feeds to ensure protection against emerging vulnerabilities and attacks. Automate updates where possible.

Conclusion

Third-party firewalls remain a core building block in many enterprise Azure Landing Zones. They provide the deep security controls and operational familiarity enterprises need, while Azure provides the scalable infrastructure to host them. By following the hub-and-spoke architecture, carefully planning deployment models, and enforcing best practices for routing, redundancy, monitoring, and backup, organizations can ensure a secure and reliable network foundation in Azure.

Extending Layer-2 (VXLAN) networks over Layer-3 IP network
Introduction

Virtual Extensible LAN (VXLAN) is a network virtualization technology that encapsulates Layer-2 Ethernet frames inside Layer-3 UDP/IP packets. In essence, VXLAN creates a logical Layer-2 overlay network on top of an IP network, allowing Ethernet segments (VLANs) to be stretched across routed infrastructures. A key advantage is scale: VXLAN uses a 24-bit segment ID (VNI) instead of the 12-bit VLAN ID, supporting around 16 million isolated networks versus the 4,094 VLAN limit. This makes VXLAN ideal for large cloud data centers and multi-tenant environments that demand many distinct network segments.

VXLAN's Layer-2 overlays bring flexibility and mobility to modern architectures. Because VXLAN tunnels can span multiple Layer-3 domains, organizations can extend VLANs across different sites or subnets – for example, a tunnel can stretch a segment across two data centers over an IP WAN, as long as the underlying tunnel endpoint IPs are reachable. This enables seamless workload mobility and disaster recovery: virtual machines or applications can move between physical locations without changing IP addresses, since they remain in the same virtual L2 network. The overlay approach also decouples the logical network from the physical underlay, meaning you can run your familiar L2 segments over any IP routing infrastructure while leveraging features like equal-cost multi-path (ECMP) load balancing and avoiding large spanning-tree domains. In short, VXLAN combines the best of both worlds – the simplicity of Layer-2 adjacency with the scalability of Layer-3 routing – making it a foundational tool in cloud networking and software-defined data centers.

A Layer-2 VXLAN overlay on a Layer-3 IP network allows customers or edge networks to stretch Ethernet (VLAN) segments across geographically distributed sites using an IP backbone. This approach preserves VLAN tags end-to-end and enables flexible segmentation across locations without needing an extended or continuous Layer-2 network in the core. It also helps hide the complexities of the underlying IP network. However, it's crucial to account for MTU overhead (VXLAN adds ~50 bytes of header) so that the overlay's VLAN MTU is set smaller than the underlay IP MTU – otherwise fragmentation or packet loss can occur. Additionally, because VXLAN doesn't inherently signal link status, implementing Bidirectional Forwarding Detection (BFD) on the VXLAN interfaces provides rapid detection of neighbor failures, ensuring quick rerouting or recovery when a tunnel endpoint goes down.

VXLAN overlay use case and benefits

VXLAN is a standard protocol (IETF RFC 7348) that can encapsulate Layer-2 Ethernet frames into Layer-3 UDP/IP packets. By doing so, VXLAN creates an L2 overlay network on top of an L3 underlay. The VXLAN tunnel endpoints (VTEPs), which can be routers, switches, or hosts, wrap the original Ethernet frame (including its VLAN tag) with an IP/UDP header plus a VXLAN header, then send it through the IP network. The default UDP port for VXLAN is 4789. This mechanism offers several key benefits:

- Preserves VLAN Tags and L2 Segmentation: The entire Ethernet frame is carried across, so the original VLAN ID (802.1Q tag) is maintained end-to-end through the tunnel. Even if an extra tag is added at the ingress for local tunneling, the customer's inner VLAN tag remains intact across the overlay. This means a VLAN defined at one site will be recognized at the other site as the same VLAN, enabling seamless L2 adjacency.
In practice, VXLAN can transport multiple VLANs transparently by mapping each VLAN or service to a VXLAN Network Identifier (VNI). Flexible network segmentation at scale: VXLAN uses a 24-bit VNI (VXLAN Network ID), supporting about 16 million distinct segments, far exceeding the 4094 VLAN limit of traditional 802.1Q networks. This gives architects freedom to create many isolated L2 overlay networks (for multi-tenant scenarios, application tiers, etc.) over a shared IP infrastructure. Geographically distributed sites can share the same VLANs and broadcast domain via VXLAN, without the WAN routers needing any VLAN configurations. The IP/MPLS core only sees routed VXLAN packets, not individual VLANs, simplifying the underlay configuration. No need for end-to-end VLANs in underlay: Traditional solutions to extend L2 might rely on methods like MPLS/VPLS or long ethernet trunk lines, which often require configuring VLANs across the WAN and can’t scale well. In a VXLAN overlay, the intermediate L3 network remains unaware of customer VLANs, and you don’t need to trunk VLANs across the WAN. Each site’s VTEP encapsulates and decapsulates traffic, so the core routers/switches just forward IP/UDP packets. This isolation improves scalability and stability—core devices don’t carry massive MAC address tables or STP domains from all sites. It also means the underlay can use robust IP routing (OSPF, BGP, etc.) with ECMP, rather than extending spanning-tree across sites. In short, VXLAN lets you treat the WAN like an IP cloud while still maintaining Layer-2 connectivity between specific endpoints. Multi-path and resilience: Since the overlay runs on IP, it naturally leverages IP routing features. ECMP in the underlay, for example, can load-balance VXLAN traffic across multiple links, something not possible with a single bridged VLAN spanning the WAN. The encapsulated traffic’s UDP header even provides entropy (via source port hashing) to help load-sharing on multiple paths. Furthermore, if one underlay path fails, routing protocols can reroute VXLAN packets via alternate paths without disrupting the logical L2 network. This increases reliability and bandwidth usage compared to a Layer-2 only approach. Diagram: VXLAN Overlay Across a Layer-3 WAN – Below is a simplified illustration of two sites using a VXLAN overlay. “Site A” and “Site B” each have a local VLAN (e.g. VLAN 100) that they want to bridge across an IP WAN. The VTEPs at each site encapsulate the Layer-2 frames into VXLAN/UDP packets and send them over the IP network. Inside the tunnel, the original VLAN tag is preserved. In this example, a BFD session (red dashed line) runs between the VTEPs to monitor the tunnel’s health, as explained later. Figure 1: Two sites (A and B) extend “VLAN 100” across an IP WAN using a VXLAN tunnel. The inner VLAN tag is preserved over the L3 network. A BFD keepalive (every 900ms) runs between the VXLAN endpoints to detect failures. The practical effect of this design is that devices in Site A and Site B can be in the same VLAN and IP subnet, broadcast to each other, etc., even though they are connected by a routed network. For example, if Site A has a machine in VLAN 100 with IP 10.1.100.5/24 and Site B has another in VLAN 100 with IP 10.1.100.10/24, they can communicate as if on one LAN – ARP, switches, and VLAN tagging function normally across the tunnel. MTU and overhead considerations One critical consideration for deploying VXLAN overlays is handling the increased packet size due to encapsulation. 
A VXLAN packet includes additional headers on top of the original Ethernet frame: an outer IP header, UDP header, and VXLAN header (plus an outer Ethernet header on the WAN interface). This encapsulation adds approximately 50 bytes of overhead to each packet (for IPv4; about 70 bytes for IPv6). In practical terms, if your original Ethernet frame was the typical 1500-byte payload (1518 bytes with Ethernet header and CRC, or 1522 with a VLAN tag), the VXLAN-encapsulated version will be ~1550 bytes. **The underlying IP network must accommodate these larger frames**, or you'll get fragmentation or drops. Many network links by default only support 1500-byte MTUs, so without adjustments, a VXLAN packet carrying a full-sized VLAN frame would exceed that. Though modern networks run jumbo frames (~9000 bytes), if the encapsulated packet exceeds roughly 8950 bytes it can still cause problems such as control-plane failures (e.g., BGP session teardown) or fragmentation of data packets leading to out-of-order delivery.

Solution: Either raise the MTU on the underlay network or enforce a lower MTU on the overlay. Network architects generally prefer to increase the IP MTU of the core so the overlay can carry standard 1500-byte Ethernet frames unfragmented. For example, one vendor's guide recommends configuring at least a 1550-byte MTU on all network segments to account for VXLAN's ~50B overhead. In enterprise environments, it's common to use "baby jumbo" frames (e.g. 1600 bytes) or full jumbo (9000 bytes) in the datacenter/WAN to accommodate various tunneling overheads. If increasing the underlay MTU is not possible (say, over an ISP that only supports 1500), then the VLAN MTU on the overlay should be reduced – for instance, set the VLAN interface MTU to 1450 bytes, so that even with the 50B VXLAN overhead the outer packet stays within 1500 bytes. This prevents any IP fragmentation.

Why Fragmentation is Undesirable: VXLAN itself doesn't include any fragmentation mechanism; it relies on the underlay IP to fragment if needed. But IP fragmentation can harm performance, and some devices or drop policies might simply drop oversized VXLAN packets instead of fragmenting. In fact, certain implementations don't support VXLAN fragmentation or Path MTU discovery on tunnels. The safe approach is to ensure no encapsulated packet ever exceeds the physical MTU. That means planning your MTUs end-to-end: make the core links slightly larger than the largest expected overlay packet.

Diagram: VXLAN Encapsulation and MTU Layering – The figure below illustrates the components of a VXLAN-encapsulated frame and how they contribute to packet size. The original Ethernet frame (yellow) with a VLAN tag is wrapped with a new outer Ethernet, IP, UDP, and VXLAN header. The extra headers add ~50 bytes. If the inner (yellow) frame was, say, 1500 bytes of payload plus 18 bytes of Ethernet overhead, the outer packet becomes ~1568 bytes (including new headers and FCS). In practice the old FCS is replaced by a new one, so the net growth is ~50 bytes. The key takeaway: the IP transport must handle the total size.

Figure 2: Layered view of a VXLAN-encapsulated packet (not to scale). The original Ethernet frame with VLAN tag (yellow) is encapsulated by outer headers (blue/green/red/gray), resulting in ~50 bytes of overhead for IPv4. The outer packet must fit within the WAN MTU (e.g. 1518B if the inner frame is 1468B) to avoid fragmentation.

In summary, ensure the IP underlay's MTU is configured to accommodate the VXLAN overhead.
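A quick way to validate the headroom before turning up the overlay is a don't-fragment ping sweep toward the remote VTEP, as recommended below. The sketch here uses PowerShell wrapping the standard Windows ping utility with a hypothetical remote VTEP address; the largest size that still succeeds is your usable path MTU, from which the ~50-byte VXLAN overhead (for an IPv4 underlay) must be subtracted to get the safe overlay MTU.

```powershell
# DF-bit ping sweep toward the remote VTEP (address is hypothetical).
# ICMP payload = candidate MTU - 28 bytes (20-byte IPv4 header + 8-byte ICMP header).
# Safe overlay MTU = largest passing MTU - ~50 bytes of VXLAN/UDP/IP overhead.
$remoteVtep = "203.0.113.10"
foreach ($mtu in 1500, 1550, 1600, 9000) {
    $payload = $mtu - 28
    $result  = ping.exe $remoteVtep -f -l $payload -n 1   # -f sets DF, -l sets payload size
    $status  = if ($result -match "TTL=") { "OK" } else { "DROPPED / FRAGMENTATION NEEDED" }
    "Candidate MTU {0,5}: payload {1,5} bytes -> {2}" -f $mtu, $payload, $status
}
```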
If using standard 1500-byte MTUs on the WAN, set your overlay interfaces (VLAN SVIs or bridge MTUs) to around 1450 bytes. In many cases if possible, raising the WAN MTU to 1600 or using jumbo frames throughout is the best practice to provide ample headroom. Always test your end-to-end path with ping sweeps (e.g. using the DF-bit and varying sizes) to verify that the encapsulated packets aren’t being dropped due to MTU limits. Neighbor failure detection with BFD One challenge with overlays like VXLAN is that the logical link lacks immediate visibility into physical link status. If one end of the VXLAN tunnel goes down or the path fails, the other end’s VXLAN interface may remain “up” (since its own underlay interface is still up), potentially blackholing traffic until higher-level protocols notice. VXLAN itself doesn’t send continuous “link alive” messages to check the remote VTEP’s reachability. To address this, network engineers deploy BFD on VXLAN endpoints. BFD is a lightweight protocol specifically designed for rapid failure detection independent of media or routing protocol. It works by two endpoints periodically sending very fast, small hello packets to each other (often every 50ms or less). If a few consecutive hellos are missed, BFD declares the peer down – often within <1 second, versus several seconds (or tens of seconds) with conventional detection. Applying BFD to VXLAN: Many router and switch vendors support running BFD over a VXLAN tunnel or on the VTEP’s loopback adjacencies. When enabled, the two VTEPs will continuously ping each other at the configured interval. If the VXLAN tunnel fails (e.g. one site loses connectivity), BFD on the surviving side will quickly detect the loss of response. This can then trigger corrective actions: for instance, the BFD can generate logs for the logical interface or notify the routing protocol to withdraw routes via that tunnel. In designs with redundant tunnels or redundant VTEPs, BFD helps achieve sub-second failover – traffic can switch to a backup VXLAN tunnel almost immediately upon a primary failure. Even in a single-tunnel scenario, BFD gives an early alert to the network operator or applications that the link is down, rather than quietly dropping packets. Example: If Site A and Site B have two VXLAN tunnels (primary and backup) connecting them, running BFD on each tunnel interface means that if the primary’s path goes down, BFD at Site A and B will detect it within milliseconds and inform the routing control-plane. The network can then shift traffic to the backup tunnel right away. Without BFD, the network might have to wait for a timeout (e.g. OSPF dead interval or even ARP timeouts) to realize the primary tunnel is dead, causing a noticeable outage. BFD is protocol-agnostic – it can integrate with any routing protocols. For VXLAN, it’s purely a monitoring mechanism: lightweight and with minimal overhead on the tunnel. Its messages are small UDP packets (often on port 3784/3785) that can be sourced from the VTEP’s IP. The frequency is configurable based on how fast you need detection vs. how much overhead you can afford; common timers are 300ms with 3x multiplier (detect in ~1s) for moderate speeds, or even 50ms with 3x (150ms detection) for high-speed failover requirements. Bottom line: Implementing BFD dramatically improves the reliability of a VXLAN-based L2 extension. Since VXLAN tunnels don’t automatically signal if a neighbor is unreachable, BFD acts as the heartbeat. 
Many platforms even allow BFD to directly influence interface state (for example, the VXLAN interface can be tied to go down when BFD fails) so that any higher-level protocols (like VRRP, dynamic routing, etc.) immediately react to the loss. This prevents lengthy outages and ensures the overlay network remains robust even over a complex WAN.

Conclusion

Deploying a Layer-2 VXLAN overlay across a Layer-3 WAN unlocks powerful capabilities: you can keep using familiar VLAN-based segmentation across sites while taking advantage of an IP network's scalability and resilience. It's a vendor-neutral solution widely supported in modern networking gear. By preserving VLAN tags over the tunnel, VXLAN makes it possible to stretch subnets and broadcast domains to remote locations for workloads that require Layer-2 adjacency. With the huge VNI address space, segmentation can scale for large enterprises or cloud providers well beyond traditional VLAN limits.

However, to realize these benefits successfully, careful attention must be paid to MTU and link monitoring. Always accommodate the ~50-byte VXLAN overhead by configuring proper MTUs (or adjusting the overlay's MTU) – this prevents fragmentation and packet loss that can be very hard to troubleshoot after deployment. And since a VXLAN tunnel's health isn't apparent to switches/hosts by default, use tools like BFD to add fast failure detection, thereby avoiding black holes and improving convergence times. In doing so, you ensure that your stretched network is not only functional but also resilient and performant.

By following these guidelines – leveraging VXLAN for flexible L2 overlays, minding the MTU, and bolstering with BFD – network engineers can build a robust, wide-area Layer-2 extension that behaves nearly indistinguishably from a local LAN, yet rides on the efficiency and reliability of a Layer-3 IP backbone. Enjoy the best of both worlds: VLANs without borders, and an IP network without unnecessary constraints.

References: VXLAN technical overview and best practices from vendor documentation and industry sources have been used to ensure accuracy in the above explanations. This ensures the blog is grounded in real-world proven knowledge while remaining vendor-neutral and applicable to a broad audience of cloud and network professionals.