Using Application Gateway to secure access to the Azure OpenAI Service: Customer success story
Introduction

A large enterprise customer set out to build a generative AI application using Azure OpenAI. While the app would be hosted on-premises, the customer wanted to leverage the latest large language models (LLMs) available through Azure OpenAI. However, they faced a critical challenge: how to securely access Azure OpenAI from an on-premises environment without private network connectivity or a full Azure landing zone. This blog post walks through how the customer overcame these limitations using Application Gateway as a reverse proxy in front of Azure OpenAI, along with other Azure services, to meet their security and governance requirements.

Customer landscape and challenges

The customer's environment lacked:

- Private network connectivity (no Site-to-Site VPN or ExpressRoute), because they were using a new Azure Government environment and did not yet have a cloud operations team in place
- A common network topology such as Virtual WAN or a hub-spoke network design
- A full Enterprise Scale Landing Zone (ESLZ) of common infrastructure
- Security components such as private DNS zones, DNS resolvers, API Management, and firewalls

This meant they couldn't use private endpoints or other standard security controls typically available in mature Azure environments. Security was non-negotiable: public access to Azure OpenAI was unacceptable. The customer needed to:

- Restrict access to specific IP CIDR ranges from on-premises user machines and data centers
- Limit the ports used to communicate with Azure OpenAI
- Implement a reverse proxy with SSL termination and Web Application Firewall (WAF)
- Use a customer-provided SSL certificate to secure traffic

Proposed solution

To address these challenges, the customer designed a secure architecture using the following Azure components.

Key Azure services

- Application Gateway – Layer 7 reverse proxy, SSL termination, and Web Application Firewall (WAF)
- Public IP – Allows communication over the public internet between the customer's IP addresses and Azure IP addresses
- Virtual Network – Allows control of network traffic in Azure
- Network Security Group (NSG) – Layer 4 network controls such as port numbers and service tags, using five-tuple information (source, source port, destination, destination port, protocol)
- Azure OpenAI – Large language model (LLM) service

NSG configuration

- Inbound rules: Allow traffic only from specific IP CIDR ranges and HTTP(S) ports
- Outbound rules: Target AzureCloud.<region> with HTTP(S) ports (there is no service tag for Azure OpenAI yet)

Application Gateway setup

- SSL certificate: Issued by the customer's on-premises Certificate Authority
- HTTPS listener: Uses the on-premises certificate to terminate SSL
- Traffic flow: Decrypt incoming traffic, scan it with WAF, re-encrypt using a well-known Azure CA, and override the backend hostname
- Custom health probe: Configured to accept a 404 response from Azure OpenAI as healthy (since no dedicated health check endpoint exists)

Azure OpenAI configuration

- IP firewall restrictions: Only allow traffic from the Application Gateway subnet

Outcome

By combining Application Gateway, NSGs, and custom SSL configurations, the customer successfully secured their Azure OpenAI deployment, without needing a full ESLZ or private connectivity. This approach enabled them to move forward with their generative AI app while maintaining enterprise-grade security and governance.
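To illustrate the two lockdown points above, here is a minimal Azure CLI sketch. All resource names, subnet IDs, and CIDR ranges are illustrative placeholders rather than values from the customer's environment, and the Azure OpenAI account's default network action must also be set to deny public traffic (omitted here):

# NSG inbound rule: allow HTTPS only from the approved on-premises CIDR range
az network nsg rule create \
 --resource-group rg-appgw \
 --nsg-name nsg-appgw \
 --name AllowOnPremHttps \
 --priority 100 \
 --direction Inbound \
 --access Allow \
 --protocol Tcp \
 --source-address-prefixes 203.0.113.0/24 \
 --destination-port-ranges 443

# Azure OpenAI IP firewall: permit only the Application Gateway subnet
az cognitiveservices account network-rule add \
 --resource-group rg-openai \
 --name my-openai-account \
 --subnet /subscriptions/<subId>/resourceGroups/rg-appgw/providers/Microsoft.Network/virtualNetworks/vnet-appgw/subnets/appgw-subnet

The second command assumes a service endpoint for Microsoft.CognitiveServices is enabled on the Application Gateway subnet, which is a prerequisite for subnet-based network rules.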
Unlock visibility, flexibility, and cost efficiency with Application Gateway logging enhancements

Introduction

In today's cloud-native landscape, organizations are accelerating the deployment of web applications at unprecedented speed. But with rapid scale comes increased complexity, and a growing need for deep, actionable visibility into the underlying infrastructure. As businesses embrace modern architectures, the demand for scalable, secure, and observable web applications continues to rise. Azure Application Gateway is evolving to meet these needs, offering enhanced logging capabilities that empower teams to gain richer insights, optimize costs, and simplify operations. This article highlights three powerful enhancements that are transforming how teams use logging in Azure Application Gateway:

- Resource-specific tables
- Data collection rule (DCR) transformations
- Basic log plan

Resource-specific tables improve organization and query performance. DCR transformations give teams fine-grained control over the structure and content of their log data. And the Basic log plan makes comprehensive logging more accessible and cost-effective. Together, these capabilities deliver a smarter, more structured, and cost-aware approach to observability.

Resource-specific tables: Structured and efficient logging

Azure Monitor stores logs in a Log Analytics workspace powered by Azure Data Explorer. Previously, when you configured Log Analytics, all diagnostic data for Application Gateway was stored in a single, generic table called AzureDiagnostics. This approach often led to slower queries and increased complexity, especially when working with large datasets. With resource-specific logging, Application Gateway logs are now organized into dedicated tables, each optimized for a specific log type:

- AGWAccessLogs – Contains access log information
- AGWPerformanceLogs – Contains performance metrics and data
- AGWFirewallLogs – Contains Web Application Firewall (WAF) log data

This structured approach delivers several key benefits:

- Simplified queries – Reduces the need for complex filtering and data manipulation
- Improved schema discovery – Makes it easier to understand log structure and fields
- Enhanced performance – Speeds up both ingestion and query execution
- Granular access control – Allows you to grant Azure role-based access control (RBAC) permissions on specific tables

Example: AzureDiagnostics vs. resource-specific table approach

Traditional AzureDiagnostics query:

AzureDiagnostics
| where ResourceType == "APPLICATIONGATEWAYS" and Category == "ApplicationGatewayAccessLog"
| extend clientIp_s = todynamic(properties_s).clientIP
| where clientIp_s == "203.0.113.1"

New resource-specific table query:

AGWAccessLogs
| where ClientIP == "203.0.113.1"

The resource-specific approach is cleaner, faster, and easier to maintain, as it eliminates complex filtering and data manipulation.

Data collection rule (DCR) transformations: Take control of your log pipeline

DCR transformations offer a flexible way to shape log data before it reaches your Log Analytics workspace. Instead of ingesting raw logs and filtering them post-ingestion, you can now filter, enrich, and transform logs at the source, giving you greater control and efficiency.
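As a concrete illustration: a DCR transformation is simply a KQL statement over the reserved source table, placed in the transformKql property of the data collection rule. The minimal sketch below assumes you want to strip the ClientIP column shown in the queries above before it is stored (the PII-reduction use case discussed next); it is an example, not a prescribed configuration:

source
| where isnotempty(TimeGenerated)   // pass rows through; add where-filters here to cut ingestion volume
| project-away ClientIP             // drop a PII-like column before it reaches the workspace

Any column the transformation references must exist in the incoming stream; TimeGenerated is standard across Azure Monitor tables, while dropping ClientIP mirrors the GDPR/CCPA scenario described below.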
Why DCR transformations matter:

- Optimize costs: Reduce ingestion volume by excluding non-essential data
- Support compliance: Strip out personally identifiable information (PII) before logs are stored, helping meet GDPR and CCPA requirements
- Manage volume: Ideal for high-throughput environments where only actionable data is needed

Real-world use cases

Whether you're handling high-traffic e-commerce workloads, processing sensitive healthcare data, or managing development environments with cost constraints, DCR transformations help tailor your logging strategy to meet specific business and regulatory needs. For implementation guidance and best practices, refer to Transformations Azure Monitor - Azure Monitor.

Basic log plan: Cost-effective logging for low-priority data

Not all logs require real-time analysis. Some are used for occasional debugging or compliance audits. The Basic log plan in Log Analytics provides a cost-effective way to retain high-volume, low-priority diagnostic data, without paying for premium features you may not need.

When to use the Basic log plan

- Save on costs: Pay-as-you-go pricing with lower ingestion rates
- Debugging and forensics: Retain data for troubleshooting and incident analysis, without paying premium costs for features you don't use regularly

(A CLI sketch for switching a table to the Basic plan appears at the end of this article.)

Understanding the trade-offs

While the Basic plan offers significant savings, it comes with limitations:

- No real-time alerts: Not suitable for monitoring critical health metrics
- Query constraints: Limited KQL functionality and additional query costs

Choose the Basic plan when deep analytics and alerting aren't required, and focus premium resources on critical logs.

Building a smart logging strategy with Azure Application Gateway

To get the most out of Azure Application Gateway logging, combine the strengths of all three capabilities:

- Assess your needs: Identify which logs require real-time monitoring versus those used for compliance or debugging
- Design for efficiency: Use the Basic log plan for low-priority data, and reserve standard tiers for critical logs
- Transform at the source: Apply DCR transformations to reduce costs and meet compliance before ingestion
- Query with precision: Use resource-specific tables to simplify queries and improve performance

This integrated approach helps teams achieve deep visibility, maintain compliance, and manage costs.
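To make the plan switch concrete, here is a minimal Azure CLI sketch that moves one of the Application Gateway tables to the Basic plan. Resource names are placeholders, and it assumes the target table is eligible for the Basic plan in your workspace:

az monitor log-analytics workspace table update \
 --resource-group rg-logs \
 --workspace-name law-appgw \
 --name AGWAccessLogs \
 --plan Basic

Switching back is the same command with --plan Analytics.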
Microsoft Azure scales Hollow Core Fiber (HCF) production through outsourced manufacturing

Introduction

As cloud and AI workloads surge, the pressure on datacenter (DC), Metro, and Wide Area Network (WAN) networks has never been greater. Microsoft is tackling the physical limits of traditional networking head-on. From pioneering research in microLED technologies to deploying Hollow Core Fiber (HCF) at global scale, Microsoft is reimagining connectivity to power the next era of cloud networking. Azure's HCF journey has been one of relentless innovation, collaboration, and a vision to redefine the physical layer of the cloud. Microsoft's HCF, based on the proprietary Double Nested Antiresonant Nodeless Fiber (DNANF) design, delivers up to 47% faster data transmission and approximately 33% lower latency compared to conventional Single Mode Fiber (SMF), bringing significant advantages to the network that powers Azure. Today, Microsoft is announcing a major milestone: the industrial scale-up of HCF production, powered by new strategic manufacturing collaborations with Corning Incorporated (Corning) and Heraeus Covantics (Heraeus). These collaborations will enable Azure to increase the global fiber production of HCF to meet the demands of the growing network infrastructure, advancing the performance and reliability customers expect for cloud and AI workloads.

Real-world benefits for Azure customers

Since 2023, Microsoft has deployed HCF across multiple Azure regions, with production links meeting performance and reliability targets. As manufacturing scales, Azure plans to expand deployment of the full end-to-end HCF network solution to help increase capacity, resiliency, and speed for customers, with the potential to set new benchmarks for latency and efficiency in fiber infrastructure.

Why it matters

Microsoft's proprietary HCF design brings the following improvements for Azure customers:

- Increased data transmission speeds, with up to 33% lower latency.
- Enhanced signal performance that improves data transmission quality for customers.
- Improved optical efficiency, resulting in higher bandwidth rates compared to conventional fiber.

How Microsoft is making it possible

To operationalize HCF across Azure with production-grade performance, Microsoft is:

- Deploying a standardized HCF solution with end-to-end systems and components for operational efficiency, streamlined network management, and reliable connectivity across Azure's infrastructure.
- Ensuring interoperability with standard SMF environments, enabling seamless integration with existing optical infrastructure in the network for faster deployment and scalable growth.
- Creating a multinational production supply chain to scale next-generation fiber production, ensuring the volumes and speed to market needed for widespread HCF deployment across the Azure network.

Scaling up and out

With Corning and Heraeus as Microsoft's first HCF manufacturing collaborators, Azure plans to accelerate deployment to meet surging demand for high-performance connectivity. These collaborations underscore Microsoft's commitment to enhancing its global infrastructure and delivering a reliable customer experience. They also reinforce Azure's continued investment in deploying HCF, with a vision for this technology to potentially set the global benchmark for high-capacity fiber innovation.
"This milestone marks a new chapter in reimagining the cloud's physical layer. Our collaborations with Corning and Heraeus establish a resilient, global HCF supply chain so Azure can deliver a standardized, world-class customer experience with ultra-low latency and high reliability for modern AI and cloud workloads." – Jamie Gaudette, Partner Cloud Network Engineering Manager at Microsoft

To scale HCF production, Microsoft will utilize Corning's established U.S. facilities, while Heraeus will produce out of its sites in both Europe and the U.S.

"Corning is excited to expand our longtime collaboration with Microsoft, leveraging Corning's fiber and cable manufacturing facilities in North Carolina to accelerate the production of Microsoft's Hollow Core Fiber. This collaboration not only strengthens our existing relationship but also underscores our commitment to advancing U.S. leadership in AI innovation and infrastructure. By working closely with Microsoft, we are poised to deliver solutions that meet the demands of AI workloads, setting new benchmarks for speed and efficiency in fiber infrastructure." – Mike O'Day, Senior Vice President and General Manager, Corning Optical Communications

"We started our work on HCF a decade ago, teamed up with the Optoelectronics Research Centre (ORC) at the University of Southampton and then with Lumenisity prior to its acquisition. Now, we are excited to continue working with Microsoft on shaping the datacom industry. With leading solutions in glass, tube, preform, and fiber manufacturing, we are ready to scale this disruptive HCF technology to significant volumes. We'll leverage our proven track record of taking glass and fiber innovations from the lab to widespread adoption, just as we did in the telecom industry, where approximately 2 billion kilometers of fiber are made using Heraeus products." – Dr. Jan Vydra, Executive Vice President Fiber Optics, Heraeus Covantics

Azure engineers are working alongside Corning and Heraeus to operationalize Microsoft manufacturing process intellectual property (IP), deliver targeted training programs, and drive the yield, metrology, and reliability improvements required for scaled production. The collaborations are foundational to a growing standardized, global ecosystem that supports:

- Glass preform/tubing supply
- Fiber production at scale
- Cable and connectivity for deployment into carrier-grade environments

Building on a foundation of innovation: Microsoft's HCF program

In 2022, Microsoft acquired Lumenisity, a spin-out from the Optoelectronics Research Centre (ORC) at the University of Southampton, UK. That same year, Microsoft launched the world's first state-of-the-art HCF fabrication facility in the UK to expand production and drive innovation. This purpose-built site continues to support long-term HCF research, prototyping, and testing, ensuring that Azure remains at the forefront of HCF technology. Working with industry leaders, Microsoft has developed an end-to-end ecosystem of components, equipment, and HCF-specific hardware, successfully proven in production deployments and operations.

Pushing the boundaries: recent breakthrough research

Today, the University of Southampton announced a landmark achievement in optical communications: in collaboration with Azure Fiber researchers, they have demonstrated the lowest signal loss ever recorded for optical fibers (<0.1 dB/km) using research-grade DNANF HCF technology (see figure 4).
This breakthrough, detailed in a research paper published in Nature Photonics earlier this month, paves the way for a potential revolution in the field, enabling unprecedented data transmission capacities and longer unamplified spans.

Figure 4: Optical fiber attenuation records at around 1550 nm – 2002, Nagayama et al. [1]; 2025, Sato et al. [2]; 2025, research-grade DNANF HCF, Petrovich et al. [3]

This breakthrough highlights the potential for this technology to transform global internet infrastructure and DC connectivity. Expected benefits include:

- Faster: Approximately 47% faster, reducing latency and powering real-time AI inference, cloud gaming, and other interactive workloads.
- More capacity: A wider optical spectrum window enabling exponentially greater bandwidth.
- Future-ready: Lays the groundwork for quantum-safe links, quantum computing infrastructure, advanced sensing, and remote laser delivery.

Looking ahead: Unlocking the future of cloud networking

The future of cloud networking is being built today! With record-breaking [3] fiber innovations, a rapidly expanding collaborative ecosystem, and the industrialized scale to deliver next-generation performance, Azure continues to evolve to meet the demands for speed, reliability, and connectivity. As we accelerate the deployment of HCF across our global network, we're not just keeping pace with the demands of AI and cloud, we're redefining what's possible.

References:

[1] Nagayama, K., Kakui, M., Matsui, M., Saitoh, T. & Chigusa, Y. Ultra-low-loss (0.1484 dB/km) pure silica core fibre and extension of transmission distance. Electron. Lett. 38, 1168–1169 (2002).
[2] Sato, S., Kawaguchi, Y., Sakuma, H., Haruna, T. & Hasegawa, T. Record low loss optical fiber with 0.1397 dB/km. In Proc. Optical Fiber Communication Conference (OFC) 2024 Tu2E.1 (Optica Publishing Group, 2024).
[3] Petrovich, M., Numkam Fokoua, E., Chen, Y., Sakr, H., Isa Adamu, A., Hassan, R., Wu, D., Fatobene Ando, R., Papadimopoulos, A., Sandoghchi, S., Jasion, G., & Poletti, F. Broadband optical fibre with an attenuation lower than 0.1 decibel per kilometre. Nat. Photon. (2025). https://doi.org/10.1038/s41566-025-01747-5

Useful Links:

- The Deployment of Hollow Core Fiber (HCF) in Azure's Network
- How hollow core fiber is accelerating AI | Microsoft Azure Blog
- Learn more about Microsoft global infrastructure
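As an aside, the 47% and 33% headline figures are two views of the same physics. The following is an illustrative derivation (not taken from the post): light in solid silica fiber propagates at roughly c/n with group index n ≈ 1.47, while in hollow core fiber it travels through air at very nearly c:

v_SMF ≈ c / 1.47 ≈ 0.68 c        (silica group index n ≈ 1.47)
v_HCF ≈ c                        (propagation through air)
speed-up:  c / (c / 1.47) = 1.47   →  ~47% faster
latency:   1 / 1.47 ≈ 0.68         →  ~32-33% lower

So a route's one-way propagation delay over HCF is about two-thirds of the same route over conventional SMF, before any equipment latency is added.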
Combining firewall protection and SD-WAN connectivity in Azure virtual WAN

Virtual WAN (vWAN) introduces new security and connectivity features in Azure, including the ability to operate managed third-party firewalls and SD-WAN virtual appliances, integrated natively within a virtual WAN hub (vhub). This article will discuss updated network designs resulting from these integrations and examine how to combine firewall protection and SD-WAN connectivity when using vWAN. The objective is not to delve into the specifics of the security or SD-WAN connectivity solutions, but to provide an overview of the possibilities.

Firewall protection in vWAN

In a vWAN environment, the firewall solution is deployed either automatically inside the vhub (Routing Intent) or manually in a transit VNet (VM-series deployment).

Routing Intent (managed firewall)

Routing Intent refers to the concept of implementing a managed firewall solution within the vhub for internet protection or private traffic protection (VNet-to-VNet, Branch-to-VNet, Branch-to-Branch), or both. The firewall can be either an Azure Firewall or a third-party firewall, deployed within the vhub as Network Virtual Appliances or as a SaaS solution. A vhub containing a managed firewall is called a secured hub. For an updated list of Routing Intent supported third-party solutions, please refer to the following links: managed NVAs, SaaS solution.

Transit VNet (unmanaged firewall)

Another way to provide inspection in vWAN is to manually deploy the firewall solution in a spoke of the vhub and to cascade the actual spokes behind that transit firewall VNet (also known as the indirect spoke model, or tiered-VNet design). In this discussion, the primary reasons for choosing unmanaged deployments are: either the firewall solution lacks an integrated vWAN offer, or it has an integrated offer but falls short in horizontal scalability or specific features compared to the VM-based version. For a detailed analysis of the pros and cons of each design, please refer to this article.

SD-WAN connectivity in vWAN

Similar to the firewall deployment options, there are two main methods for extending an SD-WAN overlay into an Azure vWAN environment: a managed deployment within the vhub, or a standard VM-series deployment in a spoke of the vhub. More options here.

SD-WAN in vWAN deployment (managed)

In this scenario, a pair of virtual SD-WAN appliances are automatically deployed and integrated in the vhub using dynamic routing (BGP) with the vhub router. Deployment and management processes are streamlined, as these appliances are seamlessly provisioned in Azure and set up for a simple import into the partner portal (SD-WAN orchestrator). For an updated list of supported SD-WAN partners, please refer to this link. For more information on SD-WAN in vWAN deployments, please refer to this article.

VM-series deployment (unmanaged)

This solution requires manual deployment of the virtual SD-WAN appliances in a spoke of the vhub. The underlying VMs and the horizontal scaling are managed by the customer. Dynamic route exchange with the vWAN environment is achieved by leveraging BGP peering with the vhub. Alternatively, and depending on the complexity of your addressing plan, static routing may also be possible.

Firewall protection and SD-WAN in vWAN: THE CHALLENGE!

Currently, it is only possible to chain managed third-party SD-WAN connectivity with Azure Firewall in the same vhub, or to use dual-role SD-WAN connectivity and security appliances. Routing Intent provided by third-party firewalls combined with another managed SD-WAN solution inside the same vhub is not yet supported.
But how can firewall protection and SD-WAN connectivity be integrated together within vWAN?

Solution 1: Routing Intent with Azure Firewall and managed SD-WAN (same vhub)

Firewall solution: managed. SD-WAN solution: managed.

This design is only compatible with Routing Intent using Azure Firewall, as it is the sole firewall solution that can be combined with a managed SD-WAN in vWAN deployment in that same vhub (a CLI sketch of enabling Routing Intent appears after the conclusion). With the private traffic protection policy enabled in Routing Intent, all East-West flows (VNet-to-VNet, Branch-to-VNet, Branch-to-Branch) are inspected.

Solution 2: Routing Intent with a third-party firewall and managed SD-WAN (2 vhubs)

Firewall solution: managed. SD-WAN solution: managed.

To have both a managed third-party firewall solution in vWAN and a managed SD-WAN solution in vWAN in the same region, the only option is to have one vhub dedicated to the security solution deployment and another vhub dedicated to the SD-WAN solution deployment. In each region, spoke VNets are connected to the secured vhub, while SD-WAN branches are connected to the vhub containing the SD-WAN deployment. In this design, Routing Intent private traffic protection provides VNet-to-VNet and Branch-to-VNet inspection. However, Branch-to-Branch traffic will not be inspected.

Solution 3: Routing Intent and SD-WAN spoke VNet (same vhub)

Firewall solution: managed. SD-WAN solution: unmanaged.

This design is compatible with any Routing Intent supported firewall solution (Azure Firewall or third-party) and with any SD-WAN solution. With Routing Intent private traffic protection enabled, all East-West flows (VNet-to-VNet, Branch-to-VNet, Branch-to-Branch) are inspected.

Solution 4: Transit firewall VNet and managed SD-WAN (same vhub)

Firewall solution: unmanaged. SD-WAN solution: managed.

This design utilizes the indirect spoke model, enabling the deployment of managed SD-WAN in vWAN appliances. This design provides VNet-to-VNet and Branch-to-VNet inspection. But because the firewall solution is not hosted in the hub, Branch-to-Branch traffic will not be inspected.

Solution 5: Transit firewall VNet and SD-WAN spoke VNet (same vhub)

Firewall solution: unmanaged. SD-WAN solution: unmanaged.

This design integrates both the security and the SD-WAN connectivity as unmanaged solutions, placing the responsibility for deploying and managing the firewall and the SD-WAN hub on the customer. Just like in solution #4, only VNet-to-VNet and Branch-to-VNet traffic is inspected.

Conclusion

Although it is currently not possible to combine a managed third-party firewall solution with a managed SD-WAN deployment within the same vhub, numerous design options are still available to meet various needs, whether managed or unmanaged approaches are preferred.
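For readers who want to see what Solution 1 looks like in practice, below is a minimal sketch of enabling Routing Intent with Azure Firewall on an existing secured hub from the Azure CLI. It assumes the az network vhub routing-intent command group available in recent CLI versions; the policy syntax and all resource names/IDs are illustrative assumptions, so verify against the current vWAN documentation:

FW_ID=/subscriptions/<subId>/resourceGroups/rg-vwan/providers/Microsoft.Network/azureFirewalls/hub-firewall

az network vhub routing-intent create \
 --resource-group rg-vwan \
 --vhub hub-eastus \
 --name hubRoutingIntent \
 --routing-policies "[{name:PrivateTraffic,destinations:[PrivateTraffic],next-hop:$FW_ID},{name:InternetTraffic,destinations:[Internet],next-hop:$FW_ID}]"

With the PrivateTraffic policy in place, VNet-to-VNet, Branch-to-VNet, and Branch-to-Branch flows are steered through the hub firewall, matching the inspection behavior described for Solution 1.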
Azure virtual network terminal access point (TAP) public preview announcement

What is virtual network TAP?

Virtual network TAP allows customers to continuously stream virtual machine network traffic to a network packet collector or analytics tool. Many security and performance monitoring tools rely on packet-level insights that are difficult to access in cloud environments. Virtual network TAP bridges this gap by integrating with our industry partners to offer:

- Enhanced security and threat detection: Security teams can inspect full packet data in real time to detect and respond to potential threats.
- Performance monitoring and troubleshooting: Operations teams can analyze live traffic patterns to identify bottlenecks, troubleshoot latency issues, and optimize application performance.
- Regulatory compliance: Organizations subject to compliance frameworks such as the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR) can use virtual network TAP to capture network activity for auditing and forensic investigations.

Why use virtual network TAP?

Unlike traditional packet capture solutions that require deploying additional agents or network appliances, virtual network TAP leverages Azure's native infrastructure to enable seamless traffic mirroring without complex configurations and without impacting the performance of the virtual machine. A key advantage is that mirrored traffic does not count towards the virtual machine's network limits, ensuring complete visibility without compromising application performance. Additionally, virtual network TAP supports all Azure virtual machine SKUs.

Deploying virtual network TAP

The portal is a convenient way to get started with Azure virtual network TAP. However, if you have a lot of Azure resources and want to automate the setup, you may want to use PowerShell, the CLI, or the REST API (a CLI sketch appears at the end of this post). Add a TAP configuration on a network interface that is attached to a virtual machine deployed in your virtual network. The destination is a virtual network IP address in the same virtual network as the monitored network interface or in a peered virtual network. The collector solution for virtual network TAP can be deployed behind an Azure internal load balancer for high availability. You can use the same virtual network TAP resource to aggregate traffic from multiple network interfaces in the same or different subscriptions. If the monitored network interfaces are in different subscriptions, the subscriptions must be associated with the same Microsoft Entra tenant. Additionally, the monitored network interfaces and the destination endpoint for aggregating the TAP traffic can be in peered virtual networks in the same region.

Partnering with industry leaders to enhance network monitoring in Azure

To maximize the value of virtual network TAP, we are proud to collaborate with industry-leading security and network visibility partners.
Our partners provide deep packet inspection, analytics, threat detection, and monitoring solutions that seamlessly integrate with virtual network TAP:

Network packet brokers

- Gigamon – GigaVUE Cloud Suite for Azure
- Keysight – CloudLens

Security analytics, network/application performance management

- Darktrace – Darktrace /NETWORK
- Netscout – Omnis Cyber Intelligence NDR
- Corelight – Corelight Open NDR Platform
- LinkShadow – LinkShadow NDR
- Fortinet – FortiNDR Cloud, FortiGate VM
- cPacket – cPacket Cloud Suite
- TrendMicro – Trend Vision One™ Network Security
- Extrahop – RevealX
- Bitdefender – GravityZone Extended Detection and Response for Network
- eSentire – eSentire MDR
- Vectra – Vectra NDR
- AttackFence – AttackFence NDR
- Arista Networks – Arista NDR

See our partner blogs:

- Bitdefender + Microsoft Virtual Network TAP: Deepening Visibility, Strengthening Security
- Streamline Traffic Mirroring in the Cloud with Azure Virtual Network Terminal Access Point (TAP) and Keysight Visibility | Keysight Blogs
- eSentire | Unlocking New Possibilities for Network Monitoring and…
- LinkShadow Unified Identity, Data, and Network Platform Integrated with Microsoft Virtual Network TAP
- Extrahop and Microsoft Extend Coverage for Azure Workloads
- Resources | Announcing cPacket Partnership with Azure virtual network terminal access point (TAP)
- Gain Network Traffic Visibility with FortiGate and Azure virtual network TAP

Get started with virtual network TAP

To learn more and get started, visit our website. We look forward to seeing how you leverage virtual network TAP to enhance security, performance, and compliance in your cloud environment. Stay tuned for more updates as we continue to refine and expand on our feature set! If you have any questions, please reach out to us at azurevnettap@microsoft.com.
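Returning to the deployment steps above, here is a minimal CLI sketch. All resource names are placeholders, and virtual network TAP commands have historically shipped through the virtual-network-tap CLI extension, so flags may vary by CLI version:

# Create the TAP resource, pointing at the collector NIC's IP configuration
# (an internal load balancer frontend IP configuration ID also works as a destination)
az network vnet tap create \
 --resource-group rg-tap \
 --name my-vnet-tap \
 --destination /subscriptions/<subId>/resourceGroups/rg-tap/providers/Microsoft.Network/networkInterfaces/collector-nic/ipConfigurations/ipconfig1

# Mirror traffic from a monitored NIC into the TAP
az network nic vtap-config create \
 --resource-group rg-app \
 --nic-name monitored-nic \
 --vnet-tap my-vnet-tap \
 --name tap-config-1

Repeating the second command against additional NICs aggregates their mirrored traffic into the same TAP resource, as described above.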
The Deployment of Hollow Core Fiber (HCF) in Azure's Network

Co-authors: Jamie Gaudette, Frank Rey, Tony Pearson, Russell Ellis, Chris Badgley and Arsalan Saljoghei

In the evolving Cloud and AI landscape, Microsoft is deploying state-of-the-art Hollow Core Fiber (HCF) technology in Azure's network to optimize infrastructure and enhance performance for customers. By deploying cabled HCF technology together with HCF-supportable datacenter (DC) equipment, this solution creates ultra-low latency traffic routes with faster data transmission to meet the demands of Cloud and AI workloads. The successful adoption of HCF technology in Azure's network relies on developing a new ecosystem to take full advantage of the solution, including new cables, field splicing, installation and testing... and Microsoft has done exactly that. Azure has collaborated with industry leaders to deliver components and equipment, cable manufacturing and installation. These efforts, along with advancements in HCF technology, have paved the way for its deployment in-field. HCF is now operational and carrying live customer traffic in multiple Microsoft Azure regions, proving it is as reliable as conventional fiber, with no field failures or outages. This article will explore the installation activities, testing, and link performance of a recent HCF deployment, showcasing the benefits that Azure customers can leverage from HCF technology.

HCF connected Azure DCs are ready for service

The latest HCF cable deployment connects two Azure DCs in a major city, with two metro routes each over 20 km long. The hybrid cables both include 32 HCF and 48 single mode fiber (SMF) strands, with HCFs delivering high-capacity Dense Wavelength Division Multiplexing (DWDM) transmission comparable to SMF. The cables are installed over two diverse paths (the red and blue lines shown in image 1), each with different entry points into the DC. Route diversity at the physical layer enhances network resilience and reliability by allowing traffic to be rerouted through alternate paths, minimizing the risk of network outage should there be a disruption. It also allows for increased capacity by distributing network traffic more evenly, improving overall network performance and operational efficiency.

Image 1: Satellite image of two Azure DC sites (A & Z) within a metro region interconnected with new ultra-low latency HCF technology, using two diverse paths (blue & red)

Image 2 shows the optical routing that the deployed HCF cables take through both Inside Plant (ISP) and Outside Plant (OSP) for interconnecting terminal equipment within key sites in the region (comprised of DCs, Network Gateways and PoPs).

Image 2: Optical connectivity at the physical layer between DCA and DCZ

The HCF OSP cables have been developed for outdoor use in harsh environments without degrading the propagation properties of the fiber. The cable technology is smaller, faster, and easier to install (using a blown installation method). Alongside cables, various other technologies have been developed and integrated to provide a reliable end-to-end HCF network solution. This includes dedicated HCF-compatible equipment (shown in image 3), such as custom cable joint enclosures, fusion splicing technology, HCF patch tails for cable termination in the DC, and a HCF custom-designed Optical Time Domain Reflectometer (OTDR) to locate faults in the link. These solutions work with commercially available transponders and DWDM technologies to deliver multi-Tb/s capacities for Azure customers.
Looking more closely at a HCF cable installation, in image 4 the cable is installed by passing it through a blowing head (1) and inserting it into pre-installed conduit in segments underground along the route. As with traditional installations with conventional cable, the conduit, cable entry/exit, and cable joints are accessible through pre-installed access chambers, typically a few hundred meters apart. The blowing head uses high-pressure air from a compressor to push the cable into the conduit. A single drum length of cable can be re-fleeted above ground (2) at multiple access points and re-jetted (3) over several kilometers. After the cables are installed inside the conduit, they are jointed at pre-designated access chamber locations. These house the purpose-designed cable joint enclosures.

Image 4: Cable preparation and installation during in-field deployment

Image 5 shows a custom HCF cable joint enclosure in the field, tailored to protect HCFs for reliable data transmission. These enclosures organize the many HCF splices inside and are placed in underground chambers across the link.

Image 5: 1) HCF joint enclosure in a chamber in-field 2) Open enclosure showing fiber loop storage protected by colored tubes at the rear side of the joint 3) Open enclosure showing HCF spliced on multiple splice tray layers

Inside the DC, connectorized 'plug-and-play' HCF-specific patch tails have been developed and installed for use with existing DWDM solutions. The patch tails interface between the HCF transmission and SMF active and passive equipment, each containing two SMF-compatible connectors, coupled to the ISP HCF cable. In image 6, this has been terminated to a patch panel and mated with existing DWDM equipment inside the DC.

Image 6: HCF patch tail solution connected to DWDM equipment

Testing

To validate the end-to-end quality of the installed HCF links (post-deployment and during operation), field-deployable solutions have been developed and integrated to ensure all required transmission metrics are met and to identify and restore any faults before the link is ready for customer traffic. One such solution is Microsoft's custom-designed HCF-specific OTDR, which helps measure individual splice losses and verify attenuation in all cable sections. This is checked against rigorous Azure HCF specification requirements. The OTDR tool is invaluable for locating high splice losses or faults that need to be reworked before the link can be brought into service. The diagram below shows an OTDR trace detecting splice locations and splice loss levels (dB) across a single strand of installed HCF. The OTDR can also continuously monitor HCF links and quickly locate faults, such as cable cuts, for quick recovery and remediation. For this deployment, a mean splice loss of 0.16 dB was achieved, with some splices as low as 0.04 dB, comparable to conventional fiber. Low attenuation and splice loss help to maintain higher signal integrity, supporting longer transmission reach and higher traffic capacity. There are ongoing Azure HCF roadmap programs to continually improve this.

Performance

Before running customer traffic on the link, the fiber is tested to ensure reliable, error-free data transmission across the operating spectrum by counting lost or errored bits. Once confirmed, the link is moved into production, allowing customer traffic to flow on the route. These optical tests, tailored to HCF, are carried out by the installer to meet Azure's acceptance requirements.
Image 8 illustrates the flow of traffic across a HCF link, dictated by changing demand on capacity and routing protocols in the region, which fluctuate throughout the day. The HCF span supports varying levels of customer traffic from the point the link was made live, without incurring any outages or link flaps. A critical metric for measuring transmission performance over each HCF path is the instantaneous Pre-Forward Error Correction (FEC) Bit Error Rate (BER) level. Pre-FEC BER measures errors in a digital data stream at the receiver before any error correction is applied. This is crucial for transmission quality when the link carries data traffic; lower levels mean fewer errors and higher signal quality, essential for reliable data transmission. The following graph (image 9) shows the evolution of the Pre-FEC BER level on a HCF span once the link is live. A single strand of HCF is represented by a color, with all showing minimal fluctuation. This demonstrates very stable Pre-FEC BER levels, well below -3.4 in log10 terms (i.e., a BER under 10^-3.4), across all 400G optical transponders, operating over all channels during a 24-day period. This indicates the network can handle high data transmission efficiently with no post-FEC errors, leading to high customer traffic performance and reliability.

Image 9: Very stable Pre-FEC BER levels across the HCF span over 20 days

The graph below demonstrates the optical loss stability over one entire span, which is comprised of two HCF strands. It was monitored continuously over 20 days using the inbuilt line system and measured in both directions to assess the optical health of the HCF link. The new HCF cable paths are live and carrying customer traffic across multiple Azure regions. Having demonstrated the end-to-end deployment capabilities and network compatibility of the HCF solution, it is possible to take full advantage of the ultra-stable, high-performance, and reliable connectivity HCF delivers to Azure customers.

What's next?

Unlocking the full potential of HCF requires compatible, end-to-end solutions. This blog outlines the holistic and deployable HCF systems we have developed to better serve our customers. While we further integrate HCF into more Azure regions, our development roadmap continues: smaller cables with more fibers and enhanced systems components to further increase the capacity of our solutions, standardized and simplified deployment and operations, and extending the deployable distance of HCF long-haul transmission solutions. Creating a more stable, higher-capacity, faster network will allow Azure to better serve all its customers. Learn more about how hollow core fiber is accelerating AI.

Recently published HCF research papers:

- Ultra high resolution and long range OFDRs for characterizing and monitoring HCF DNANFs
- Unrepeated HCF transmission over spans up to 301.7km
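To put that Pre-FEC figure in perspective, here is an illustrative calculation (not from the post itself). A log10 BER of -3.4 corresponds to:

BER < 10^(-3.4) ≈ 4.0 × 10^-4

or roughly 4 errored bits per 10,000 received. At a 400 Gb/s line rate that is on the order of 4.0 × 10^-4 × 4 × 10^11 ≈ 1.6 × 10^8 raw errored bits per second, all of which the FEC layer corrects, which is why the span can report zero post-FEC errors despite a nonzero raw error rate.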
What's New in the World of eBPF from Azure Container Networking!

Azure Container Networking Interface (CNI) continues to evolve, now bolstered by the innovative capabilities of Cilium. Azure CNI Powered by Cilium (ACPC) leverages Cilium's extended Berkeley Packet Filter (eBPF) technologies to enable features such as network policy enforcement, deep observability, and improved service routing. Here's a deeper look into the latest features that make management of Azure Kubernetes Service (AKS) clusters more efficient, scalable, and secure.

Improved Performance: Cilium Endpoint Slices

One of the standout features in the recent updates is the introduction of CiliumEndpointSlice. This feature significantly enhances the performance and scalability of the Cilium dataplane in AKS clusters. Previously, Cilium used Custom Resource Definitions (CRDs) called CiliumEndpoints to manage pods. Each pod had a CiliumEndpoint associated with it, which contained information about the pod's status and properties. However, this approach placed significant stress on the control plane, especially in larger clusters. To alleviate this load, CiliumEndpointSlice batches CiliumEndpoints and their updates, reducing the number of updates propagated to the control plane. Our performance testing has shown remarkable improvements:

- Average API server responsiveness: Up to 50% decrease in latency, meaning faster processing of queries.
- Pod startup latencies: Up to 60% reduction, allowing for faster deployment and scaling.
- In-cluster network latency: Up to 80% decrease, translating to better application performance.

Note that this feature is generally available in AKS clusters by default with Cilium release 1.17 and above, and does not require additional configuration changes! Learn more about the improvements unlocked by CiliumEndpointSlices with Azure CNI by Cilium: High-Scale Kubernetes Networking with Azure CNI Powered by Cilium | Microsoft Community Hub.

Deployment Flexibility: Dual Stack for Cilium Network Policies

Kubernetes clusters operating on an IPv4/IPv6 dual-stack network enable workloads to natively access both IPv4 and IPv6 endpoints without incurring additional complexities or performance drawbacks. Previously, we had enabled the use of dual-stack networking on AKS clusters (starting with AKS 1.29) running Azure CNI Powered by Cilium in preview mode. Now, we are happy to announce that the feature is generally available! By enabling both IPv4 and IPv6 addressing, you can manage your production AKS clusters in mixed environments, accommodating various network configurations seamlessly. More importantly, dual-stack support in Azure CNI's Cilium network policies extends security benefits for AKS clusters in those complex environments. For instance, you can enable dual-stack AKS clusters using the eBPF dataplane as follows:

az aks create \
 --location <region> \
 --resource-group <resourceGroupName> \
 --name <clusterName> \
 --network-plugin azure \
 --network-plugin-mode overlay \
 --network-dataplane cilium \
 --ip-families ipv4,ipv6 \
 --generate-ssh-keys

Learn more about Azure CNI's network policies: Configure Azure CNI Powered by Cilium in Azure Kubernetes Service (AKS) - Azure Kubernetes Service | Microsoft Learn.

Ease of Use: Node Subnet Mode with Cilium

Azure CNI now supports Node Subnet IPAM mode with the Cilium dataplane. In Node Subnet mode, IP addresses are assigned to pods from the same subnet as the node itself, simplifying routing and policy management. This mode is particularly beneficial for smaller clusters where managing multiple subnets is cumbersome.
AKS clusters using this mode also gain the benefits of improved network observability, Cilium network policies, FQDN filtering, and more capabilities unlocked by Advanced Container Networking Services (ACNS). Most notably, with this feature we now support all IPAM configuration options with the eBPF dataplane on AKS clusters. You can create an AKS cluster with Node Subnet IPAM mode and the eBPF dataplane as follows:

az aks create \
 --name <clusterName> \
 --resource-group <resourceGroupName> \
 --location <location> \
 --network-plugin azure \
 --network-dataplane cilium \
 --generate-ssh-keys

Learn more about Node Subnet: Configure Azure CNI Powered by Cilium in Azure Kubernetes Service (AKS) - Azure Kubernetes Service | Microsoft Learn.

Defense-in-depth: Cilium Layer 7 Policies

Azure CNI by Cilium extends its comprehensive Layer 4 network policy capabilities to Layer 7, offering granular control over application traffic. This feature enables users to define security policies based on application-level protocols and metadata, adding a powerful layer of security and compliance management. Layer 7 policies are implemented using Envoy, an open-source service proxy, which is part of the ACNS Security Agent operating in conjunction with the Cilium agent. Envoy handles traffic between services and provides the necessary visibility and control at the application layer. Policies can be enforced based on HTTP and gRPC methods, paths, headers, and other application-specific attributes (a minimal policy sketch appears at the end of this article). Additionally, Cilium network policies support Kafka-based workflows, enhancing security and traffic management. This feature is currently in public preview, and you can learn more about the getting-started experience here: Introducing Layer 7 Network Policies with Advanced Container Networking Services for AKS Clusters! | Microsoft Community Hub.

Coming Soon: Transparent Encryption with WireGuard

By leveraging Cilium's WireGuard support, customers can achieve regulatory compliance by ensuring that all network traffic, whether HTTP-based or not, is encrypted. Users can enable inter-node transparent encryption in their Kubernetes environments using Cilium's open-source-based solution. When WireGuard is enabled, the Cilium agent on each cluster node establishes a secure WireGuard tunnel with all other known nodes in the cluster to encrypt traffic between Cilium endpoints. This feature will soon be in public preview and will be enabled as part of ACNS. Stay tuned for more details on this.

Conclusion

These new features in Azure CNI Powered by Cilium underscore our commitment to enhancing default network performance and security in your AKS environments, all while collaborating with the open-source community. From the impressive performance boost with CiliumEndpointSlice to the adaptability of dual-stack support and the advanced security of Layer 7 policies and WireGuard-based encryption, these innovations ensure your AKS clusters are not just ready for today but are primed for the future. Also, don't forget to dive into the fascinating world of eBPF-based observability in multi-cloud environments! Check out our latest post: Retina: Bridging Kubernetes Observability and eBPF Across the Clouds. Why wait? Try these out now! Stay tuned to the AKS public roadmap for more exciting developments! For additional information, visit the following resources:

- For more info about Azure CNI Powered by Cilium, visit Configure Azure CNI Powered by Cilium in AKS.
- For more info about ACNS, visit Advanced Container Networking Services (ACNS) for AKS | Microsoft Learn.
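To ground the Layer 7 policy discussion above, here is a minimal sketch of an HTTP-aware CiliumNetworkPolicy applied with kubectl. The labels, port, and path are illustrative placeholders; the cilium.io/v2 API and http rule structure follow upstream Cilium conventions rather than anything specific to this post:

kubectl apply -f - <<'EOF'
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-get-only
spec:
  endpointSelector:
    matchLabels:
      app: checkout        # the workload being protected (illustrative label)
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend      # only pods labeled app=frontend may call in
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"    # allow only GET requests...
          path: "/api/.*"  # ...to paths under /api
EOF

Any other method or path from the frontend, and all traffic from other endpoints, is denied at Layer 7 by the Envoy proxy described above.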
Announcing the General Availability of Azure Load Balancer Health Event Logs

Health event logs are now fully available in all public, Azure China, and Government regions under the Azure Monitor resource log category LoadBalancerHealthEvent, providing you with enhanced capabilities to monitor and troubleshoot your load balancer resources.

Health Event Types

As announced in our previous public preview blog, the following health events are now logged when detected by the Azure Load Balancer platform. These events are designed to address the most critical issues affecting your load balancer's health and availability:

- DataPathAvailabilityWarning – Detect when the Data Path Availability metric of the frontend IP is less than 90% due to platform issues
- DataPathAvailabilityCritical – Detect when the Data Path Availability metric of the frontend IP is less than 25% due to platform issues
- NoHealthyBackends – Detect when all backend instances in a pool are not responding to the configured health probes
- HighSnatPortUsage – Detect when a backend instance utilizes more than 75% of its allocated ports from a single frontend IP
- SnatPortExhaustion – Detect when a backend instance has exhausted all allocated ports and will fail further outbound connections until ports have been released or more ports are allocated

Benefits of Using Health Event Logs

Health event logs provide deeper insights into the health of your load balancer, eliminating the need to set thresholds for metric-based alerts or manage complex metric data for historical analysis. Here's how you can get started using these logs today:

- Create diagnostic settings: Archive or analyze these logs for long-term insights (a CLI sketch appears at the end of this post).
- Leverage Log Analytics: Use powerful querying capabilities to gain detailed insights.
- Configure alerts: Set up alerts to trigger actions based on the generated logs.

For more detailed instructions on how to enable and use health event logs, refer to our documentation here.

Contoso's Story

Context: Contoso uses a Standard public load balancer with outbound rules to connect their application to public APIs. They allocate 8k ports to each backend instance using an outbound rule, anticipating up to 8 backend instances in a pool.

Problem: Contoso is concerned about SNAT port exhaustion and wants to create alerts to warn them if backend instances are close to consuming all allocated SNAT ports.

Solution with metrics: Initially, they create an alert using the Used SNAT Ports metric, triggering when the value exceeds 6k ports (out of 8k). However, this requires constant adjustment as they scale their infrastructure and update port allocation on outbound rules.

Solution with health event logs: With the new health event logs, Contoso configures two alerts:

- HighSnatPortUsage: Sends an email and creates an incident whenever this event is generated, warning network engineers to allocate more SNAT ports.
- SnatPortExhaustion: Notifies the on-call engineer immediately to address critical impact to outbound connectivity due to lack of SNAT ports.

Now, Contoso no longer needs to adjust alert rules as they scale, ensuring seamless monitoring and response.

What's Next?

This general availability announcement marks a significant step in enhancing the health and monitoring capabilities of Azure Load Balancer. We are committed to expanding these capabilities with additional health event types, providing configuration guidance, best practices, and warnings for service-related limits. We welcome your feedback and look forward to hearing about your experiences with health event logs.
Get started today by exploring our public documentation. Stay tuned on Azure Updates for future announcements and enhancements!
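As a starting point for the diagnostic settings step mentioned above, here is a minimal Azure CLI sketch. The resource group, load balancer, and workspace names are placeholders; the log category LoadBalancerHealthEvent is the one named in this post:

LB_ID=$(az network lb show --resource-group rg-prod --name lb-contoso --query id --output tsv)

az monitor diagnostic-settings create \
 --name lb-health-events \
 --resource $LB_ID \
 --workspace /subscriptions/<subId>/resourceGroups/rg-logs/providers/Microsoft.OperationalInsights/workspaces/law-prod \
 --logs '[{"category":"LoadBalancerHealthEvent","enabled":true}]'

From there, the events land in your Log Analytics workspace, where you can query them and attach alert rules such as the HighSnatPortUsage and SnatPortExhaustion alerts described in the Contoso example.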
Troubleshoot health probe failures with Azure Load Balancer Health Status

In today's fast-paced cloud computing environment, maintaining the optimal performance and reliability of your applications is crucial. Azure Load Balancer's Health Status feature, now generally available to customers, significantly simplifies this task by providing detailed health information about your backend instances without the need to file a support ticket. This tool offers invaluable insights into the health state of each backend instance and the specific reasons behind their status, whether user-triggered or platform-triggered. By leveraging this feature, customers can proactively address issues, ensure minimal downtime, and enhance the overall user experience, all while reducing reliance on support services.

What is Health Status?

Health Status is an Azure Load Balancer feature that gives you detailed health information about the backend instances connected to your Azure Load Balancer's backend pool. Each status is linked to your load balancing rules and provides two key insights: the health state of each backend instance and the reasoning behind its state. The health state indicates whether your backend instance is healthy ("Up") or unhealthy ("Down"). The reasoning behind these states is explained through reason codes, which fall into two categories: user-triggered reason codes and platform-triggered reason codes. User-triggered reason codes are based on how you configured your load balancer setup and can be addressed by you. Platform-triggered reason codes are based on the Azure Load Balancer platform and cannot be addressed by you. For more information about the different reason codes, view our public documentation.

Why use Health Status?

In the past, customers were not provided with insights into why their backend instances were deemed healthy or unhealthy. To access this crucial information, customers often had to follow troubleshooting procedures such as taking packet captures, or go through the process of creating a support ticket and rely on support engineers to identify the cause of a failed health probe. This process was not only complex and time-consuming but also incurred additional costs and added significant management overhead. Now, with the Health Status feature, customers can easily access real-time health information for their backend instances. This empowers them to make swift and informed decisions, minimizing downtime, reducing support costs, and enhancing the overall user experience. By leveraging these insights, customers can proactively manage their environment and ensure optimal performance.

Retrieving Health Status

Health Status can be easily retrieved on a per-load-balancing-rule basis. To retrieve Health Status:

1. Sign in to the Azure portal and search for "Load balancers".
2. Select your load balancer and navigate to "Load balancing rules" under Settings.
3. View the health status of the rule by clicking the "View details" value of the corresponding rule. The Refresh button can be used to get the latest status.

Figure 1: Sample Health Status in the Azure portal

Contoso's Utilization of Health Status for Game Server Maintenance

Let's explore how one of our customers, Contoso, uses the Health Status feature for efficient decision-making and troubleshooting.

Who is Contoso and what is their issue

Contoso, a prominent name in the gaming industry, has been leveraging Azure Load Balancer to distribute traffic to their highly popular game server hosted on Azure Virtual Machine Scale Sets.
Their users love using Contoso's servers due to the reliability and performance achieved on them. Recently, Contoso encountered an issue where one of their game servers became unhealthy, leading to disruptions in the gaming experience for their users.

How Health Status resolved their issue

Thanks to the Azure Load Balancer Health Status feature, the Contoso team was able to quickly navigate to the load balancing rule page in the portal to view the health status of the unhealthy virtual machine instance. By doing so, they retrieved detailed insights into why their game server was marked unhealthy. This real-time information highlighted that "the backend instance was unhealthy due to Admin State set to Down". Armed with this crucial data, Contoso's network team swiftly addressed the configuration issue by toggling the Admin State value of the unhealthy server to "None", thereby restoring the server to a healthy state (a CLI sketch of this toggle follows the Get Started section below). A root cause analysis determined that a previous engineer had mistakenly toggled the wrong server to a "Down" Admin State value while applying fixes to another server.

Benefits of using Health Status

Instead of creating a support ticket and waiting for assistance, Contoso utilized the Health Status feature to diagnose and resolve the problem independently. This proactive approach not only minimized downtime but also reduced support costs and enhanced the overall user experience.

Conclusion

By incorporating the Health Status feature into their operational workflow, Contoso has been able to make efficient, data-driven decisions and troubleshoot issues promptly, ensuring their gaming services remain robust and reliable for their users.

Get Started

We are excited to bring Azure Load Balancer's Health Status feature to you. This feature provides valuable insights into the health of your backend instances, helping you ensure better troubleshooting for optimal performance and reliability of your applications. For more information and to get started, visit the following links:

- Overview of health status concepts
- How to retrieve health status

We hope you can take advantage of this feature, and we welcome your feedback. Please feel free to leave a comment below.
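As a companion to the Contoso story above, the sketch below shows how an Admin State toggle might look from the Azure CLI for an IP-based backend pool. The az network lb address-pool address command group exists, but the --admin-state parameter and all resource names here are assumptions for illustration; check the current Azure Load Balancer Admin State documentation for the authoritative syntax:

# Hypothetical sketch: reset a backend address's Admin State from Down back to None
az network lb address-pool address update \
 --resource-group rg-prod \
 --lb-name lb-contoso \
 --pool-name game-servers \
 --name server-03 \
 --admin-state None   # assumed flag; verify against current CLI docs

Once the Admin State returns to None, the configured health probe once again determines the instance's Up/Down state, as described in the Health Status overview above.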