## Azure Virtual Network Manager + Azure Virtual WAN
Azure continues to expand its networking capabilities, with Azure Virtual Network Manager (AVNM) and Azure Virtual WAN (vWAN) standing out as two of the most transformative services. When deployed together, they offer the best of both worlds: the operational simplicity of a managed hub architecture combined with the ability for spoke VNets to communicate directly, avoiding additional hub hops and minimizing latency.

### Revisiting the Classic Hub-and-Spoke Pattern

[Figure: traditional hub-and-spoke topology]

In the classic design, a central hub VNet hosts shared services and every spoke VNet peers only with the hub. However, this architecture comes with a trade-off: every spoke-to-spoke packet must route through the hub, introducing additional network hops, increased latency, and potential throughput constraints.

### How Virtual WAN Modernizes That Design

Virtual WAN replaces a do-it-yourself hub VNet with a fully managed hub service:

- Managed hubs – Azure owns and operates the hub infrastructure.
- Automatic route propagation – routes learned once are usable everywhere.
- Integrated add-ons – firewalls, VPN, and ExpressRoute ports are first-class citizens.

By default, Virtual WAN enables any-to-any routing between spokes. Traffic transits the hub fabric automatically—no configuration required.

### Why a Direct Spoke Mesh?

Certain patterns require single-hop connectivity:

- Micro-service meshes that sit in different spokes and exchange chatty RPC calls.
- Database replication / backups where throughput counts, and hub bandwidth is precious.
- Dev / Test / Prod spokes that need to sync artifacts quickly yet stay isolated from hub services.
- Segmentation mandates where a workload must bypass hub inspection for compliance yet still talk to a partner VNet.

Benefits:

- Lower latency – the hub detour disappears.
- Better bandwidth – no hub congestion or firewall throughput cap.
- Higher resilience – spoke pairs can keep talking even if the hub is under maintenance.

### The Peering Explosion Problem

With pure VNet peering, the math escalates fast: for n spokes you need n × (n−1)/2 links. Ten spokes? 45 peerings. Add four more? Now 91. Each extra peering forces you to:

- Touch multiple route tables.
- Update NSG rules to cover the new paths.
- Repeat every time you add or retire a spoke.
- Troubleshoot an ever-growing spider web.

### Where Azure Virtual Network Manager Steps In

AVNM introduces Network Groups plus a Mesh connectivity policy:

[Figure: AVNM concept and what it gives you]

Operational complexity collapses from O(n²) to O(1)—you manage a group, not 100+ individual peerings.

### A Complementary Model: AVNM Mesh Inside vWAN

Since AVNM works on any Azure VNet—including the VNets you already attach to a vWAN hub—you can apply mesh policies on top of your existing managed hub architecture:

1. Spoke VNets join a vWAN hub for branch connectivity, centralized firewalling, or multi-region reach.
2. The same spokes are added to an AVNM Network Group with a mesh policy.
3. AVNM builds direct peering links between the spokes, while vWAN continues to advertise and learn routes.

Result:

- All VNets still benefit from vWAN's global routing and on-premises integration.
- Latency-critical east-west flows now travel the shortest path—one hop—as if the VNets were traditionally peered.

Rather than choosing one over the other, organizations can leverage both vWAN and AVNM together, as the combination enhances the strengths of each service.
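To quantify the peering explosion described above, the short Python sketch below computes the number of VNet peering links a full mesh of n spokes requires; the function name is ours, purely for illustration.

```python
def full_mesh_peerings(spokes: int) -> int:
    """Peering links required for a full mesh of `spokes` VNets: n * (n - 1) / 2."""
    return spokes * (spokes - 1) // 2

for n in (10, 14, 25, 50):
    print(f"{n:>3} spokes -> {full_mesh_peerings(n):>4} peerings to create and maintain")

# Output:
#  10 spokes ->   45 peerings to create and maintain
#  14 spokes ->   91 peerings to create and maintain
#  25 spokes ->  300 peerings to create and maintain
#  50 spokes -> 1225 peerings to create and maintain
```

With an AVNM mesh policy, that whole matrix collapses into a single connectivity configuration applied to one network group, which is exactly the O(n²)-to-O(1) shift described above.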
### Performance Illustration

[Figure: spoke-to-spoke communication with Virtual WAN without AVNM mesh]

[Figure: spoke-to-spoke communication with Virtual WAN with AVNM mesh]

### Observability & Protection

- NSG Flow Logs – granular packet logs on every peered VNet.
- AVNM Admin Rules – org-wide guardrails that take precedence over local NSGs.
- Azure Monitor + SIEM – route flow logs to Log Analytics, Sentinel, or a third-party SIEM for threat detection.
- Layered design – hub firewalls inspect north-south traffic; NSGs plus admin rules secure east-west flows.

### Putting It All Together

Virtual WAN offers fully managed global connectivity, simplifying the integration of branch offices and on-premises infrastructure into your Azure environment. AVNM Mesh establishes direct communication paths between spoke VNets, making it ideal for workloads requiring high throughput or minimal latency in east-west traffic patterns.

When combined, these services provide architects with granular control over traffic routing. Each flow can be directed through hub services when needed or routed directly between spokes for optimal performance—all without re-architecting your network or creating additional management complexity. By pairing AVNM's group-based mesh with vWAN's managed hubs, you get the best of both worlds: worldwide reach, centralized security, and single-hop performance where it counts.

## Deploying Third-Party Firewalls in Azure Landing Zones: Design, Configuration, and Best Practices
As enterprises adopt Microsoft Azure for large-scale workloads, securing network traffic becomes a critical part of the platform foundation. Microsoft's Cloud Adoption Framework and Well-Architected Framework provide the blueprint for enterprise-scale Landing Zone design and deployments, and while Azure Firewall is a built-in PaaS option, some organizations prefer third-party firewall appliances for familiarity, feature depth, and vendor alignment. This blog explains the basic design patterns, key configurations, and best practices when deploying third-party firewalls (Palo Alto, Fortinet, Check Point, etc.) as part of an Azure Landing Zone.

### 1. Landing Zone Architecture and Firewall Role

The Azure Landing Zone is Microsoft's recommended enterprise-scale architecture for adopting cloud at scale. It provides a standardized, modular design that organizations can use to deploy and govern workloads consistently across subscriptions and regions. At its core, the Landing Zone follows a hub-and-spoke topology:

- Hub (Connectivity Subscription): Central place for shared services like DNS, private endpoints, VPN/ExpressRoute gateways, Azure Firewall (or third-party firewall appliances), Bastion, and monitoring agents. Provides consistent security controls and connectivity for all workloads. Firewalls are deployed here to act as the traffic inspection and enforcement point.
- Spokes (Workload Subscriptions): Application workloads (e.g., SAP, web apps, data platforms) are placed in spoke VNets. Additional specialized spokes may exist for Identity, Shared Services, Security, or Management. These are isolated for governance and compliance, but all connectivity back to other workloads or on-premises routes through the hub.

**Traffic Flows Through Firewalls**

- North-South Traffic: Inbound connections from the Internet (e.g., customer access to applications), outbound connections from Azure workloads to Internet services, and hybrid connectivity to on-premises datacenters or other clouds. Routed through the external firewall set for inspection and policy enforcement.
- East-West Traffic: Lateral traffic between spokes (e.g., Application VNet to Database VNet) and communication across environments like Dev → Test → Prod (if allowed). Routed through an internal firewall set to apply segmentation and zero-trust principles, and to prevent lateral movement of threats.

**Why Firewalls Matter in the Landing Zone**

While Azure provides NSGs (Network Security Groups) and Route Tables for basic packet filtering and routing, these are not sufficient for advanced security scenarios. Firewalls add:

- Deep packet inspection (DPI) and application-level filtering.
- Intrusion Detection/Prevention (IDS/IPS) capabilities.
- Centralized policy management across multiple spokes.
- Segmentation of workloads to reduce the blast radius of potential attacks.
- Consistent enforcement of enterprise security baselines across hybrid and multi-cloud.

**Choosing a Topology**

Depending on security needs, cost tolerance, and operational complexity, organizations typically adopt one of two models for third-party firewalls:

- Two sets of firewalls: One set dedicated to north-south traffic (external to Azure) and another set for east-west traffic (between VNets and spokes). Provides the highest security granularity, but comes with higher cost and management overhead.
- Single set of firewalls: A consolidated deployment where the same firewall cluster handles both east-west and north-south traffic. Simpler and more cost-effective, but may introduce complexity in routing and policy segregation.
This design choice is usually made during Landing Zone design, balancing security requirements, budget, and operational maturity.

### 2. Why Choose Third-Party Firewalls Over Azure Firewall?

While Azure Firewall provides simplicity as a managed service, customers often choose third-party solutions due to:

- Advanced features – deep packet inspection, IDS/IPS, SSL decryption, threat feeds.
- Vendor familiarity – network teams trained on Palo Alto, Fortinet, or Check Point.
- Existing contracts – enterprise license agreements and support channels.
- Hybrid alignment – same vendor firewalls across on-premises and Azure.

Azure Firewall is a fully managed PaaS service, ideal for customers who want a simple, cloud-native solution without worrying about underlying infrastructure. However, many enterprises continue to choose third-party firewall appliances (Palo Alto, Fortinet, Check Point, etc.) when implementing their Landing Zones. The decision usually depends on capabilities, familiarity, and enterprise strategy.

**Key Reasons to Choose Third-Party Firewalls**

- Feature depth and advanced security: Third-party vendors offer advanced capabilities such as Deep Packet Inspection (DPI) for application-aware filtering, Intrusion Detection and Prevention (IDS/IPS), SSL/TLS decryption and inspection, and advanced threat feeds, malware protection, sandboxing, and botnet detection. While Azure Firewall continues to evolve, these vendors have a longer track record in advanced threat protection.
- Operational familiarity and skills: Network and security teams often have years of experience managing Palo Alto, Fortinet, or Check Point appliances on-premises. Adopting the same technology in Azure reduces the learning curve and ensures faster troubleshooting, smoother operations, and reuse of existing playbooks.
- Integration with the existing security ecosystem: Many organizations already use vendor-specific management platforms (e.g., Panorama for Palo Alto, FortiManager for Fortinet, or SmartConsole for Check Point). Extending the same tools into Azure allows centralized management of policies across on-premises and cloud, ensuring consistent enforcement.
- Compliance and regulatory requirements: Certain industries (finance, healthcare, government) require proven, certified firewall vendors for security compliance. Customers may already have third-party solutions validated by auditors and prefer extending those to Azure for consistency.
- Hybrid and multi-cloud alignment: Many enterprises run a hybrid model, with workloads split across on-premises, Azure, AWS, or GCP. Third-party firewalls provide a common security layer across environments, simplifying multi-cloud operations and governance.
- Customization and flexibility: Unlike Azure Firewall, which is a managed service with limited backend visibility, third-party firewalls give admins full control over operating systems, patching, advanced routing, and custom integrations. This flexibility can be essential when supporting complex or non-standard workloads.
- Licensing leverage (BYOL): Enterprises with existing enterprise agreements or volume discounts can bring their own firewall licenses (BYOL) to Azure. This often reduces cost compared to pay-as-you-go Azure Firewall pricing.

**When Azure Firewall Might Still Be Enough**

- Organizations with simple security needs (basic north-south inspection, FQDN filtering).
- Cloud-first teams that prefer managed services with minimal infrastructure overhead.
- Customers who want to avoid the manual scaling and VM patching that comes with IaaS appliances.
In practice, many large organizations use a hybrid approach: Azure Firewall for lightweight scenarios or specific environments, and third-party firewalls for enterprise workloads that require advanced inspection, vendor alignment, and compliance certifications.

### 3. Deployment Models in Azure

Third-party firewalls in Azure are primarily IaaS-based appliances deployed as virtual machines (VMs). Leading vendors publish Azure Marketplace images and ARM/Bicep templates, enabling rapid, repeatable deployments across multiple environments. These firewalls allow organizations to enforce advanced network security policies, perform deep packet inspection, and integrate with Azure-native services such as Virtual Network (VNet) peering, Azure Monitor, and Azure Sentinel.

Note: Some vendors now also release PaaS versions of their firewalls, offering managed firewall services with simplified operations. However, for the purposes of this blog, we will focus mainly on IaaS-based firewall deployments.

**Common Deployment Modes**

Active-Active

- Description: In this mode, multiple firewall VMs operate simultaneously, sharing the traffic load. An Azure Load Balancer distributes inbound and outbound traffic across all active firewall instances.
- Use cases: Ideal for environments requiring high throughput, resilience, and near-zero downtime, such as enterprise data centers, multi-region deployments, or mission-critical applications.
- Considerations: Requires careful route and policy synchronization between firewall instances to ensure consistent traffic handling. Typically involves BGP or user-defined routes (UDRs) for optimal traffic steering. Scaling is easier: additional firewall VMs can be added behind the load balancer to handle traffic spikes.

Active-Passive

- Description: One firewall VM handles all traffic (active), while the secondary VM (passive) stands by for failover. When the active node fails, Azure service principals manage IP reassignment and traffic rerouting.
- Use cases: Suitable for environments where simpler management and lower operational complexity are preferred over continuous load balancing.
- Considerations: Failover may result in a brief downtime, typically measured in seconds to a few minutes. Synchronization between the active and passive nodes ensures firewall policies, sessions, and configurations are mirrored. Recommended for smaller deployments or those with predictable traffic patterns.

**Network Interfaces (NICs)**

Third-party firewall VMs often include multiple NICs, each dedicated to a specific type of traffic:

- Untrust/Public NIC: Connects to the Internet or external networks. Handles inbound/outbound public traffic and enforces perimeter security policies.
- Trust/Internal NIC: Connects to private VNets or subnets. Manages internal traffic between application tiers and enforces internal segmentation.
- Management NIC: Dedicated to firewall management traffic. Keeps administration separate from data plane traffic, improving security and reducing performance interference.
- HA NIC (Active-Passive setups): Facilitates synchronization between active and passive firewall nodes, ensuring session and configuration state is maintained across failovers.

[Figure: NICs of Palo Alto external firewalls and FortiGate internal firewalls in a two-sets-of-firewalls scenario]
### 4. Key Configuration Considerations

When deploying third-party firewalls in Azure, several design and configuration elements play a critical role in ensuring security, performance, and high availability. These considerations should be carefully aligned with organizational security policies, compliance requirements, and operational practices.

**Routing**

- User-Defined Routes (UDRs): Define UDRs in spoke Virtual Networks to ensure all outbound traffic flows through the firewall, enforcing inspection and security policies before reaching the Internet or other Virtual Networks. Centralized routing helps standardize controls across multiple application Virtual Networks. Depending on the architecture's traffic flow design, use the appropriate Load Balancer IP as the next hop on the UDRs of spoke Virtual Networks.
- Symmetric routing: Ensure traffic follows symmetric paths (i.e., outbound and inbound flows pass through the same firewall instance). Avoid asymmetric routing, which can cause stateful firewalls to drop return traffic. Leverage BGP with Azure Route Server, where supported, to simplify route propagation across hub-and-spoke topologies.

[Figure: Azure UDR directing all traffic from a spoke VNet to the firewall IP address]
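As a concrete illustration of the UDR guidance above, here is a minimal sketch using the Azure SDK for Python (azure-identity and azure-mgmt-network). The subscription, resource, and subnet names, the 0.0.0.0/0 prefix, and the firewall load balancer frontend IP 10.0.1.4 are placeholder assumptions, not values from a specific deployment.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

subscription_id = "<subscription-id>"   # placeholder
resource_group = "rg-spoke-app01"       # placeholder spoke resource group
location = "westeurope"                 # placeholder region

client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

# Route table that sends all outbound traffic to the firewall's internal
# load balancer frontend (next hop type: VirtualAppliance).
route_table = client.route_tables.begin_create_or_update(
    resource_group,
    "rt-spoke-app01",
    {
        "location": location,
        "routes": [
            {
                "name": "default-to-firewall",
                "address_prefix": "0.0.0.0/0",
                "next_hop_type": "VirtualAppliance",
                "next_hop_ip_address": "10.0.1.4",  # placeholder ILB frontend IP
            }
        ],
        # Optional: keep BGP-learned on-premises routes from bypassing the firewall.
        "disable_bgp_route_propagation": True,
    },
).result()

# Associate the route table with the spoke workload subnet.
# NOTE: a subnet PUT replaces the subnet definition, so real code should also
# copy across any existing settings (NSG, delegations, service endpoints).
existing = client.subnets.get(resource_group, "vnet-spoke-app01", "snet-workload")
client.subnets.begin_create_or_update(
    resource_group,
    "vnet-spoke-app01",
    "snet-workload",
    {
        "address_prefix": existing.address_prefix,
        "route_table": {"id": route_table.id},
    },
).result()
```

The same pattern is repeated per spoke; symmetric routing then depends on the load-balancing rules and session affinity of the firewall cluster behind that frontend IP.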
**Policies**

- NAT rules: Configure DNAT (Destination NAT) rules to publish applications securely to the Internet. Use SNAT (Source NAT) to mask private IPs when workloads access external resources.
- Security rules: Define granular allow/deny rules for both north-south traffic (Internet to VNet) and east-west traffic (between Virtual Networks or subnets). Ensure least privilege by only allowing required ports, protocols, and destinations.
- Segmentation: Apply firewall policies to separate workloads, environments, and tenants (e.g., Production vs. Development). Enforce compliance by isolating workloads subject to regulatory standards (PCI-DSS, HIPAA, GDPR).
- Application-aware policies: Many vendors support Layer 7 inspection, enabling controls based on applications, users, and content (not just IP/port). Integrate with identity providers (Azure AD, LDAP, etc.) for user-based firewall rules.

[Figure: example configuration of NAT rules on a Palo Alto external firewall]

**Load Balancers**

- Internal Load Balancer (ILB): Use ILBs for east-west traffic inspection between Virtual Networks or subnets. This ensures that traffic between applications always passes through the firewall, regardless of origin.
- External Load Balancer (ELB): Use ELBs for north-south traffic, handling Internet ingress and egress. Required in Active-Active firewall clusters to distribute traffic evenly across firewall nodes.
- Other configurations: Configure health probes for firewall instances to ensure faulty nodes are automatically bypassed. Validate the Floating IP configuration on load-balancing rules according to the respective vendor recommendations.

**Identity Integration**

- Azure service principals: In Active-Passive deployments, configure service principals to enable automated IP reassignment during failover. This ensures continuous service availability without manual intervention.
- Role-Based Access Control (RBAC): Integrate firewall management with Azure RBAC to control who can deploy, manage, or modify firewall configurations.
- SIEM integration: Stream logs to Azure Monitor, Sentinel, or third-party SIEMs for auditing, monitoring, and incident response.

**Licensing**

- Pay-As-You-Go (PAYG): Licenses are bundled into the VM cost when deploying from the Azure Marketplace. Best for short-term projects, PoCs, or variable workloads.
- Bring Your Own License (BYOL): Enterprises can apply existing contracts and licenses with vendors to Azure deployments. Often more cost-effective for large-scale, long-term deployments.
- Hybrid licensing models: Some vendors support license mobility from on-premises to Azure, reducing duplication of costs.

### 5. Common Challenges

Third-party firewalls in Azure provide strong security controls, but organizations often face practical challenges in day-to-day operations:

- Misconfiguration: Incorrect UDRs, route tables, or NAT rules can cause dropped traffic or bypassed inspection. Asymmetric routing is a frequent issue in hub-and-spoke topologies, leading to session drops in stateful firewalls.
- Performance bottlenecks: Firewall throughput depends on the VM SKU (CPU, memory, NIC limits). Under-sizing causes latency and packet loss, while over-sizing adds unnecessary cost. Continuous monitoring and vendor sizing guides are essential.
- Failover downtime: Active-Passive models introduce brief service interruptions while IPs and routes are reassigned. Some sessions may be lost even with state sync, making Active-Active more attractive for mission-critical workloads.
- Backup and recovery: Azure Backup doesn't support vendor firewall operating systems. Configurations must be exported and stored externally (e.g., storage accounts, repos, or vendor management tools). Without proper backups, recovery from failures or misconfigurations can be slow.
- Azure platform limits on connections: Azure imposes a per-VM cap of 250,000 active connections, regardless of what the firewall vendor appliance supports. This means even if an appliance is designed for millions of sessions, it will be constrained by Azure's networking fabric. Hitting this cap can lead to unexplained traffic drops despite available CPU/memory. The workaround is to scale out horizontally (multiple firewall VMs behind a load balancer) and carefully monitor connection distribution.

### 6. Best Practices for Third-Party Firewall Deployments

To maximize security, reliability, and performance of third-party firewalls in Azure, organizations should follow these best practices:

- Deploy in Availability Zones: Place firewall instances across different Availability Zones to ensure regional resilience and minimize downtime in case of zone-level failures.
- Prefer Active-Active for critical workloads: Where zero downtime is a requirement, use Active-Active clusters behind an Azure Load Balancer. Active-Passive can be simpler but introduces failover delays.
- Use dedicated subnets for interfaces: Separate trust, untrust, HA, and management NICs into their own subnets. This enforces segmentation, simplifies route management, and reduces misconfiguration risk.
- Apply least-privilege policies: Always start with a deny-all baseline, then allow only necessary applications, ports, and protocols. Regularly review rules to avoid policy sprawl.
- Standardize naming and tagging: Adopt consistent naming conventions and resource tags for firewalls, subnets, route tables, and policies. This aids troubleshooting, automation, and compliance reporting.
- Validate end-to-end traffic flows: Test both north-south (Internet ↔ VNet) and east-west (VNet ↔ VNet/subnet) flows after deployment. Use tools like Azure Network Watcher and vendor traffic logs to confirm inspection.
- Plan for scalability: Monitor throughput, CPU, memory, and session counts to anticipate when scale-out or higher VM SKUs are needed. Some vendors support autoscaling clusters for bursty workloads.
- Maintain firmware and threat signatures: Regularly update the firewall's software, patches, and threat intelligence feeds to ensure protection against emerging vulnerabilities and attacks. Automate updates where possible.

### Conclusion

Third-party firewalls remain a core building block in many enterprise Azure Landing Zones. They provide the deep security controls and operational familiarity enterprises need, while Azure provides the scalable infrastructure to host them. By following the hub-and-spoke architecture, carefully planning deployment models, and enforcing best practices for routing, redundancy, monitoring, and backup, organizations can ensure a secure and reliable network foundation in Azure.

## Empower Smarter AI Agent Investments
This curated series of modules is designed to equip technical and business decision-makers, including IT, developers, engineers, AI engineers, administrators, solution architects, business analysts, and technology managers, with the practical knowledge and guidance needed to make cost-conscious decisions at every stage of the AI agent journey. From identifying high-impact use cases and understanding cost drivers, to forecasting ROI, adopting best practices, designing scalable and effective architectures, and optimizing ongoing investments, this learning path provides actionable guidance for building, deploying, and managing AI agents on Azure with confidence. Whether you're just starting your AI journey or looking to scale enterprise adoption, these modules will help you align innovation with financial discipline, ensuring your AI agent initiatives deliver sustainable value and long-term success.

Discover the full learning path here: aka.ms/Cost-Efficient-AI-Agents

Explore the sections below for an overview of each module included in this learning path, highlighting the core concepts, practical strategies, and actionable insights designed to help you maximize the value of AI agent investments on Azure.

### Module 1: Identify and Prioritize High-Impact, Cost-Effective AI Agent Use Cases

The journey begins with a strategic approach to selecting AI agent use cases that maximize business impact and cost efficiency. This module introduces a structured framework for researching proven use cases, collaborating across teams, and defining KPIs to evaluate feasibility and ROI. You'll learn how to target "quick wins" while ensuring alignment with organizational goals and resource constraints. Explore this module.

### Module 2: Understand the Key Cost Drivers of AI Agents

Building on the foundation of use case selection, Module 2 dives into the core cost drivers of AI agent development and operations on Azure. It covers infrastructure, integration, data quality, team expertise, and ongoing operational expenses, offering actionable strategies to optimize spending at every stage. The module emphasizes right-sizing resources, efficient data preparation, and leveraging Microsoft tools to streamline development and ensure sustainable, scalable success. Explore this module.

### Module 3: Forecast the Return on Investment (ROI) of AI Agents

With a clear understanding of costs, the next step is to quantify value. Module 3 empowers both business and technical leaders with practical frameworks for forecasting and communicating ROI, even without a finance background. Through step-by-step guides and real-world examples, you'll learn to measure tangible and intangible outcomes, apply NPV calculations, and use sensitivity analysis to prioritize AI investments that align with broader organizational objectives. Explore this module.

### Module 4: Implement Best Practices to Empower AI Agent Efficiency and Ensure Long-Term Success

To drive efficiency and governance at scale, Module 4 introduces essential frameworks such as the AI Center of Excellence (CoE), FinOps, GenAI Ops, the Cloud Adoption Framework (CAF), and the Well-Architected Framework (WAF). These best practices help organizations accelerate adoption, optimize resources, and foster operational excellence, ensuring AI agents deliver measurable value, remain secure, and support sustainable enterprise growth.
Explore this module.

### Module 5: Maximize Cost Efficiency by Choosing the Right AI Agent Development Approach

Selecting the right development approach is critical for balancing speed, customization, and cost. In Module 5, you'll learn how to align business needs and technical skills with SaaS, PaaS, or IaaS options, empowering both business users and developers to efficiently build, deploy, and manage AI agents. The module also highlights how Microsoft Copilot Studio, Visual Studio, and Azure AI Foundry can help your organization achieve its goals. Explore this module.

### Module 6: Architect Scalable and Cost-Efficient AI Agent Solutions on Azure

As your AI initiatives grow, architectural choices become paramount. Module 6 explores how to leverage Azure Landing Zones and reference architectures for secure, well-governed, and cost-optimized deployments. It compares single-agent and multi-agent systems, highlights strategies for cost-aware model selection, and details best practices for governance, tagging, and pricing, ensuring your AI solutions remain flexible, resilient, and financially sustainable. Explore this module.

### Module 7: Manage and Optimize AI Agent Investments on Azure

The learning path concludes with a focus on operational excellence. Module 7 provides guidance on monitoring agent performance and spending using Azure AI Foundry Observability, Azure Monitor Application Insights, and Microsoft Cost Management. Learn how to track key metrics, set budgets, receive real-time alerts, and optimize resource allocation, empowering your organization to maximize ROI, stay within budget, and deliver ongoing business value. Explore this module.

Ready to accelerate your AI agent journey with financial confidence? Start exploring the new learning path and unlock proven strategies to maximize the cost efficiency of your AI agents on Azure, transforming innovation into measurable, sustainable business success. Get started today.

## Cloud and AI Cost Efficiency: A Strategic Imperative for Long-Term Business Growth
In this blog, we'll explore why cost efficiency is a top priority for organizations today and how Azure Essentials can help address this challenge, and provide an overview of Microsoft's solutions, tools, programs, and resources designed to help organizations maximize the value of their cloud and AI investments.

## Using Application Gateway to secure access to the Azure OpenAI Service: Customer success story
### Introduction

A large enterprise customer set out to build a generative AI application using Azure OpenAI. While the app would be hosted on-premises, the customer wanted to leverage the latest large language models (LLMs) available through Azure OpenAI. However, they faced a critical challenge: how to securely access Azure OpenAI from an on-prem environment without private network connectivity or a full Azure landing zone. This blog post walks through how the customer overcame these limitations using Application Gateway as a reverse proxy in front of their Azure OpenAI service, along with other Azure services, to meet their security and governance requirements.

### Customer landscape and challenges

The customer's environment lacked:

- Private network connectivity (no Site-to-Site VPN or ExpressRoute), because they were using a new Azure Government environment and did not yet have a cloud operations team set up
- A common network topology such as Virtual WAN or a hub-and-spoke network design
- A full Enterprise Scale Landing Zone (ESLZ) of common infrastructure
- Security components like private DNS zones, DNS resolvers, API Management, and firewalls

This meant they couldn't use private endpoints or other standard security controls typically available in mature Azure environments. Security was non-negotiable, and public access to Azure OpenAI was unacceptable. The customer needed to:

- Restrict access to specific IP CIDR ranges from on-prem user machines and data centers
- Limit the ports communicating with Azure OpenAI
- Implement a reverse proxy with SSL termination and Web Application Firewall (WAF)
- Use a customer-provided SSL certificate to secure traffic

### Proposed solution

To address these challenges, the customer designed a secure architecture using the following Azure components.

**Key Azure services**

- Application Gateway – Layer 7 reverse proxy, SSL termination, and Web Application Firewall (WAF)
- Public IP – allows communication over the public internet between the customer's IP addresses and Azure IP addresses
- Virtual Network – allows control of network traffic in Azure
- Network Security Group (NSG) – Layer 4 network controls such as port numbers and service tags, using five-tuple information (source, source port, destination, destination port, protocol)
- Azure OpenAI – large language model (LLM) service

**NSG configuration**

- Inbound rules: Allow traffic only from specific IP CIDR ranges and HTTP(S) ports
- Outbound rules: Target AzureCloud.<region> with HTTP(S) ports (no service tag for Azure OpenAI yet)
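To make those NSG rules concrete, here is a minimal sketch using the Azure SDK for Python (azure-identity and azure-mgmt-network); the CIDR ranges, names, priorities, and region tag are placeholder assumptions rather than the customer's actual values.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

subscription_id = "<subscription-id>"   # placeholder
resource_group = "rg-appgw"             # placeholder
nsg_name = "nsg-appgw-subnet"           # placeholder NSG on the Application Gateway subnet

client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

# Inbound: allow HTTPS only from the on-premises CIDR ranges (documentation ranges shown).
client.security_rules.begin_create_or_update(
    resource_group,
    nsg_name,
    "Allow-OnPrem-HTTPS-Inbound",
    {
        "protocol": "Tcp",
        "direction": "Inbound",
        "access": "Allow",
        "priority": 100,
        "source_address_prefixes": ["203.0.113.0/24", "198.51.100.0/24"],
        "source_port_range": "*",
        "destination_address_prefix": "*",
        "destination_port_ranges": ["443"],
        "description": "On-prem user machines and data centers only",
    },
).result()

# Outbound: allow HTTPS to the regional AzureCloud service tag, since no
# dedicated Azure OpenAI service tag was available to the customer at the time.
client.security_rules.begin_create_or_update(
    resource_group,
    nsg_name,
    "Allow-AzureCloud-HTTPS-Outbound",
    {
        "protocol": "Tcp",
        "direction": "Outbound",
        "access": "Allow",
        "priority": 110,
        "source_address_prefix": "*",
        "source_port_range": "*",
        "destination_address_prefix": "AzureCloud.EastUS",  # placeholder regional tag
        "destination_port_ranges": ["443"],
    },
).result()
```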
**Application Gateway setup**

- SSL certificate: Issued by the customer's on-prem Certificate Authority
- HTTPS listener: Uses the on-prem certificate to terminate SSL
- Traffic flow: Decrypt incoming traffic, scan with WAF, re-encrypt using a well-known Azure CA, and override the backend hostname
- Custom health probe: Configured to detect a 404 response from Azure OpenAI (since no health check endpoint exists)

**Azure OpenAI configuration**

- IP firewall restrictions: Only allow traffic from the Application Gateway subnet

### Outcome

By combining Application Gateway, NSGs, and custom SSL configurations, the customer successfully secured their Azure OpenAI deployment—without needing a full ESLZ or private connectivity. This approach enabled them to move forward with their generative AI app while maintaining enterprise-grade security and governance.

## Microsoft Azure scales Hollow Core Fiber (HCF) production through outsourced manufacturing

### Introduction

As cloud and AI workloads surge, the pressure on datacenter (DC), Metro, and Wide Area Network (WAN) networks has never been greater. Microsoft is tackling the physical limits of traditional networking head-on. From pioneering research in microLED technologies to deploying Hollow Core Fiber (HCF) at global scale, Microsoft is reimagining connectivity to power the next era of cloud networking. Azure's HCF journey has been one of relentless innovation, collaboration, and a vision to redefine the physical layer of the cloud. Microsoft's HCF, based on the proprietary Double Nested Antiresonant Nodeless Fiber (DNANF) design, delivers up to 47% faster data transmission and approximately 33% lower latency compared to conventional Single Mode Fiber (SMF), bringing significant advantages to the network that powers Azure.

Today, Microsoft is announcing a major milestone: the industrial scale-up of HCF production, powered by new strategic manufacturing collaborations with Corning Incorporated (Corning) and Heraeus Covantics (Heraeus). These collaborations will enable Azure to increase the global fiber production of HCF to meet the demands of the growing network infrastructure, advancing the performance and reliability customers expect for cloud and AI workloads.

### Real-world benefits for Azure customers

Since 2023, Microsoft has deployed HCF across multiple Azure regions, with production links meeting performance and reliability targets. As manufacturing scales, Azure plans to expand deployment of the full end-to-end HCF network solution to help increase capacity, resiliency, and speed for customers, with the potential to set new benchmarks for latency and efficiency in fiber infrastructure.

### Why it matters

Microsoft's proprietary HCF design brings the following improvements for Azure customers:

- Increased data transmission speeds with up to 33% lower latency.
- Enhanced signal performance that improves data transmission quality for customers.
- Improved optical efficiency resulting in higher bandwidth rates compared to conventional fiber.
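A quick back-of-the-envelope check of the headline numbers (up to 47% faster, roughly 33% lower latency), assuming light travels at roughly c/1.46 in conventional silica fiber and close to c in an air-filled hollow core; the 1.46 group index and the 0.997c figure are our assumptions, not numbers from the announcement:

```python
C = 299_792.458          # speed of light in vacuum, km/s

v_silica = C / 1.46      # approximate signal speed in standard single-mode fiber
v_hollow = 0.997 * C     # approximate signal speed in an air-filled hollow core

speed_gain = v_hollow / v_silica - 1     # how much faster the signal travels
latency_cut = 1 - v_silica / v_hollow    # how much one-way latency drops

print(f"Speed increase : {speed_gain:.0%}")    # ~46%
print(f"Latency saving : {latency_cut:.0%}")   # ~31%
print(f"1,000 km one way: {1000 / v_silica * 1000:.2f} ms (SMF) "
      f"vs {1000 / v_hollow * 1000:.2f} ms (HCF)")
```

The result lands close to the published figures above; the exact values depend on the specific fibers being compared.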
### How Microsoft is making it possible

To operationalize HCF across Azure with production-grade performance, Microsoft is:

- Deploying a standardized HCF solution with end-to-end systems and components for operational efficiency, streamlined network management, and reliable connectivity across Azure's infrastructure.
- Ensuring interoperability with standard SMF environments, enabling seamless integration with existing optical infrastructure in the network for faster deployment and scalable growth.
- Creating a multinational production supply chain to scale next-generation fiber production, ensuring the volumes and speed to market needed for widespread HCF deployment across the Azure network.

### Scaling up and out

With Corning and Heraeus as Microsoft's first HCF manufacturing collaborators, Azure plans to accelerate deployment to meet surging demand for high-performance connectivity. These collaborations underscore Microsoft's commitment to enhancing its global infrastructure and delivering a reliable customer experience. They also reinforce Azure's continued investment in deploying HCF, with a vision for this technology to potentially set the global benchmark for high-capacity fiber innovation.

"This milestone marks a new chapter in reimagining the cloud's physical layer. Our collaborations with Corning and Heraeus establish a resilient, global HCF supply chain so Azure can deliver a standardized, world-class customer experience with ultra-low latency and high reliability for modern AI and cloud workloads." - Jamie Gaudette, Partner Cloud Network Engineering Manager at Microsoft

To scale HCF production, Microsoft will utilize Corning's established U.S. facilities, while Heraeus will produce out of its sites in both Europe and the U.S.

"Corning is excited to expand our longtime collaboration with Microsoft, leveraging Corning's fiber and cable manufacturing facilities in North Carolina to accelerate the production of Microsoft's Hollow Core Fiber. This collaboration not only strengthens our existing relationship but also underscores our commitment to advancing U.S. leadership in AI innovation and infrastructure. By working closely with Microsoft, we are poised to deliver solutions that meet the demands of AI workloads, setting new benchmarks for speed and efficiency in fiber infrastructure." - Mike O'Day, Senior Vice President and General Manager, Corning Optical Communications

"We started our work on HCF a decade ago, teamed up with the Optoelectronics Research Centre (ORC) at the University of Southampton and then with Lumenisity prior to its acquisition. Now, we are excited to continue working with Microsoft on shaping the datacom industry. With leading solutions in glass, tube, preform, and fiber manufacturing, we are ready to scale this disruptive HCF technology to significant volumes. We'll leverage our proven track record of taking glass and fiber innovations from the lab to widespread adoption, just as we did in the telecom industry, where approximately 2 billion kilometers of fiber are made using Heraeus products." - Dr. Jan Vydra, Executive Vice President Fiber Optics, Heraeus Covantics

Azure engineers are working alongside Corning and Heraeus to operationalize Microsoft manufacturing process intellectual property (IP), deliver targeted training programs, and drive the yield, metrology, and reliability improvements required for scaled production. The collaborations are foundational to a growing standardized, global ecosystem that supports:

- Glass preform/tubing supply
- Fiber production at scale
- Cable and connectivity for deployment into carrier-grade environments

### Building on a foundation of innovation: Microsoft's HCF program

In 2022, Microsoft acquired Lumenisity, a spin-out from the Optoelectronics Research Centre (ORC) at the University of Southampton, UK. That same year, Microsoft launched the world's first state-of-the-art HCF fabrication facility in the UK to expand production and drive innovation. This purpose-built site continues to support long-term HCF research, prototyping, and testing, ensuring that Azure remains at the forefront of HCF technology. Working with industry leaders, Microsoft has developed an end-to-end ecosystem of the necessary components, equipment, and HCF-specific hardware, successfully proven in production deployments and operations.

### Pushing the boundaries: recent breakthrough research

Today, the University of Southampton announced a landmark achievement in optical communications: in collaboration with Azure Fiber researchers, they have demonstrated the lowest signal loss ever recorded for optical fibers (<0.1 dB/km) using research-grade DNANF HCF technology (see figure 4).
This breakthrough, detailed in a research paper published in Nature Photonics earlier this month, paves the way for a potential revolution in the field, enabling unprecedented data transmission capacities and longer unamplified spans.

[Figure 4: Fiber attenuation records at around 1550 nm – [1] 2002, Nagayama et al.; [2] 2025, Sato et al.; [3] 2025, research-grade DNANF HCF, Petrovich et al.]

This breakthrough highlights the potential for this technology to transform global internet infrastructure and DC connectivity. Expected benefits include:

- Faster: Approximately 47% faster, reducing latency and powering real-time AI inference, cloud gaming, and other interactive workloads.
- More capacity: A wider optical spectrum window enabling exponentially greater bandwidth.
- Future-ready: Lays the groundwork for quantum-safe links, quantum computing infrastructure, advanced sensing, and remote laser delivery.

### Looking ahead: Unlocking the future of cloud networking

The future of cloud networking is being built today! With record-breaking [3] fiber innovations, a rapidly expanding collaborative ecosystem, and the industrialized scale to deliver next-generation performance, Azure continues to evolve to meet the demands for speed, reliability, and connectivity. As we accelerate the deployment of HCF across our global network, we're not just keeping pace with the demands of AI and cloud, we're redefining what's possible.

References:

[1] Nagayama, K., Kakui, M., Matsui, M., Saitoh, T. & Chigusa, Y. Ultra-low-loss (0.1484 dB/km) pure silica core fibre and extension of transmission distance. Electron. Lett. 38, 1168–1169 (2002).

[2] Sato, S., Kawaguchi, Y., Sakuma, H., Haruna, T. & Hasegawa, T. Record low loss optical fiber with 0.1397 dB/km. In Proc. Optical Fiber Communication Conference (OFC) 2024 Tu2E.1 (Optica Publishing Group, 2024).

[3] Petrovich, M., Numkam Fokoua, E., Chen, Y., Sakr, H., Isa Adamu, A., Hassan, R., Wu, D., Fatobene Ando, R., Papadimopoulos, A., Sandoghchi, S., Jasion, G., & Poletti, F. Broadband optical fibre with an attenuation lower than 0.1 decibel per kilometre. Nat. Photon. (2025). https://doi.org/10.1038/s41566-025-01747-5

Useful links:

- The Deployment of Hollow Core Fiber (HCF) in Azure's Network
- How hollow core fiber is accelerating AI | Microsoft Azure Blog
- Learn more about Microsoft global infrastructure

## Azure Networking Portfolio Consolidation
### Overview

Over the past decade, Azure Networking has expanded rapidly, bringing incredible tools and capabilities to help customers build, connect, and secure their cloud infrastructure. But we've also heard strong feedback: with over 40 different products, it hasn't always been easy to navigate and find the right solution. The complexity often led to confusion, slower onboarding, and missed capabilities. That's why we're excited to introduce a more focused, streamlined, and intuitive experience across Azure.com, the Azure portal, and our documentation, pivoting around four core networking scenarios:

- Network foundations: Network foundations provide the core connectivity for your resources, using Virtual Network, Private Link, and DNS to build the foundation for your Azure network. Try it with this link: Network foundations
- Hybrid connectivity: Hybrid connectivity securely connects on-premises, private, and public cloud environments, enabling seamless integration, global availability, and end-to-end visibility, presenting major opportunities as organizations advance their cloud transformation. Try it with this link: Hybrid connectivity
- Load balancing and content delivery: Load balancing and content delivery helps you choose the right option to ensure your applications are fast, reliable, and tailored to your business needs. Try it with this link: Load balancing and content delivery
- Network security: Securing your environment is just as essential as building and connecting it. The Network Security hub brings together Azure Firewall, DDoS Protection, and Web Application Firewall (WAF) to provide a centralized, unified approach to cloud protection. With unified controls, it helps you manage security more efficiently and strengthen your security posture. Try it with this link: Network security

This new structure makes it easier to discover the right networking services and get started with just a few clicks, so you can focus more on building and less on searching.

What you'll notice:

- Clearer starting points: Azure Networking is now organized around four core scenarios and twelve essential services, reflecting the most common customer needs. Additional services are presented within the context of these scenarios, helping you stay focused and find the right solution without feeling overwhelmed.
- Simplified choices: We've merged overlapping or closely related services to reduce redundancy. That means fewer, more meaningful options that are easier to evaluate and act on.
- Sunsetting outdated services: To reduce clutter and improve clarity, we're sunsetting underused offerings such as white-label CDN services and China CDN. These capabilities have been rolled into newer, more robust services, so you can focus on what's current and supported.

What this means for you:

- Faster decision-making: With clearer guidance and fewer overlapping products, it's easier to discover what you need and move forward confidently.
- More productive sales conversations: With this simplified approach, you'll get more focused recommendations and less confusion among sellers.
- Better product experience: This update makes the Azure Networking portfolio more cohesive and consistent, helping you get started quickly, stay aligned with best practices, and unlock more value from day one.

The portfolio consolidation initiative is a strategic effort to simplify and enhance the Azure Networking portfolio, ensuring better alignment with customer needs and industry best practices.
By focusing on top-line services, combining related products, and retiring outdated offerings, Azure Networking aims to provide a more cohesive and efficient product experience.

### Azure.com

Before: Our original Solution page on Azure.com was disorganized and static, displaying a small portion of services in no discernible order.

After: The revised solution page is now dynamic, allowing customers to click deeper into each networking and network security category, displaying the top-line services and simplifying the customer experience.

### Azure Portal

Before: With over 40 networking services available, we know it can feel overwhelming to figure out what's right for you and where to get started.

After: To make it easier, we've introduced four streamlined networking hubs, each built around a specific scenario to help you quickly identify the services that match your needs. Each offers an overview to set the stage, key services to help you get started, guidance to support decision-making, and a streamlined left-hand navigation for easy access to all services and features.

### Documentation

For documentation, we looked at our current assets as well as created new assets that aligned with the changes in the portal experience. Like Azure.com, we found the old experiences were disorganized and not well aligned. We updated our assets to focus on our top-line networking services, and to call out the pillars. Our belief is these changes will allow our customers to more easily find the relevant and important information they need for their Azure infrastructure.

**Azure Network Hub**

Before the updates, we had a hub page organized around different categories and not well laid out. In the updated hub page, we provided relevant links for top-line services within all of the Azure networking scenarios, as well as a section linking to each scenario's hub page.

**Scenario Hub pages**

We added scenario hub pages for each of the scenarios. This provides our customers with a central hub for information about the top-line services for each scenario and how to get started. Also, we included common scenarios and use cases for each scenario, along with references for deeper learning across the Azure Architecture Center, Well-Architected Framework, and Cloud Adoption Framework libraries.

**Scenario Overview articles**

We created new overview articles for each scenario. These articles were designed to provide customers with an introduction to the services included in each scenario, guidance on choosing the right solutions, and an introduction to the new portal experience. Here's the Load balancing and content delivery overview:

**Documentation links**

- Azure Networking hub page: Azure networking documentation | Microsoft Learn
- Scenario Hub pages: Azure load balancing and content delivery | Microsoft Learn; Azure network foundation documentation | Microsoft Learn; Azure hybrid connectivity documentation | Microsoft Learn; Azure network security documentation | Microsoft Learn
- Scenario Overview pages: What is load balancing and content delivery? | Microsoft Learn; Azure Network Foundation Services Overview | Microsoft Learn; What is hybrid connectivity? | Microsoft Learn; What is Azure network security? | Microsoft Learn

Improving user experience is a journey, and in the coming months we plan to do more on this. Watch out for more blogs over the next few months for further improvements.

## GA: Enhanced Audit in Azure Security Baseline for Linux
We're thrilled to announce the General Availability (GA) of the Enhanced Azure Security Baseline for Linux—a major milestone in cloud-native security and compliance. This release brings powerful, audit-only capabilities to over 1.6 million Linux devices across all Azure regions, helping enterprise customers and IT administrators monitor and maintain secure configurations at scale.

### What Is the Azure Security Baseline for Linux?

The Azure Security Baseline for Linux is a set of pre-configured security recommendations delivered through Azure Policy and Azure Machine Configuration. It enables organizations to continuously audit Linux virtual machines and Arc-enabled servers against industry-standard benchmarks—without enforcing changes or triggering auto-remediation. This GA release focuses on enhanced audit capabilities, giving teams deep visibility into configuration drift and compliance gaps across their Linux estate. For our remediation experience, there is a limited public preview available here: What is the Azure security baseline for Linux? | Microsoft Learn

### Why Enhanced Audit Matters

In today's hybrid environments, maintaining compliance across diverse Linux distributions is a challenge. The enhanced audit mode provides:

- Granular insights into each configuration check
- An industry-aligned benchmark for a standardized security posture
- Detailed rule-level reporting with evidence and context
- Scalable deployment across Azure and Arc-enabled machines

Whether you're preparing for an audit, hardening your infrastructure, or simply tracking configuration drift, enhanced audit gives you the clarity and control you need—without enforcing changes.

### Key Features at GA

✅ Broad Linux Distribution Support

📘 Full distro list: Supported Client Types

🔍 Industry-Aligned Audit Checks

The baseline audits more than 200 security controls per machine, aligned to industry benchmarks such as CIS. These checks cover:

- OS hardening
- Network and firewall configuration
- SSH and remote access settings
- Logging and auditing
- Kernel parameters and system services

Each finding includes a description and the actual configuration state—making it easy to understand and act on.

🌐 Hybrid Cloud Coverage

The baseline works across:

- Azure virtual machines
- Arc-enabled servers (on-premises or other clouds)

This means you can apply a consistent compliance standard across your entire Linux estate—whether it's in Azure, on-prem, or multi-cloud.

🧠 Powered by Azure OSConfig

The audit engine is built on the open-source Azure OSConfig framework, which performs Linux-native checks with minimal performance impact. OSConfig is modular, transparent, and optimized for scale—giving you confidence in the accuracy of audit results.

📊 Enterprise-Scale Reporting

Audit results are surfaced in:

- Azure Policy compliance dashboard
- Azure Resource Graph Explorer
- Microsoft Defender for Cloud (Recommendations view)

You can query, export, and visualize compliance data across thousands of machines—making it easy to track progress and share insights with stakeholders.

💰 Cost

There's no premium SKU or license required to use the audit capabilities, with charges only applying to Azure Arc-managed workloads hosted on-premises or in other CSP environments—making it easy to adopt across your environment.

### How to Get Started

1. Review the Quickstart Guide: 📘 Quickstart: Audit Azure Security Baseline for Linux
2. Assign the Built-In Policy: Search for "Linux machines should meet requirements for the Azure compute security baseline" in Azure Policy and assign it to your desired scope (a scripted sketch follows this list).
3. Monitor Compliance: Use Azure Policy and Resource Graph to track audit results and identify non-compliant machines.
4. Plan Remediation: While this release does not include auto-remediation, the detailed audit findings make it easy to plan manual or scripted fixes.
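If you prefer to script step 2, the sketch below assigns the built-in definition at subscription scope with the Azure SDK for Python (azure-identity and azure-mgmt-resource). The definition ID is a placeholder, and the parameter name should be verified against the actual built-in definition before use.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource.policy import PolicyClient

subscription_id = "<subscription-id>"         # placeholder
scope = f"/subscriptions/{subscription_id}"   # could also be a management group or resource group

policy_client = PolicyClient(DefaultAzureCredential(), subscription_id)

# Placeholder GUID - look up the real ID of the built-in definition
# "Linux machines should meet requirements for the Azure compute security baseline",
# for example: az policy definition list --query "[?contains(displayName, 'compute security baseline')]"
definition_id = "/providers/Microsoft.Authorization/policyDefinitions/<builtin-definition-guid>"

assignment = policy_client.policy_assignments.create(
    scope,
    "linux-security-baseline-audit",
    {
        "display_name": "Audit Linux machines against the Azure compute security baseline",
        "policy_definition_id": definition_id,
        # Parameter names vary per definition; many guest configuration policies
        # expose an option to include Arc-enabled machines - verify before assigning.
        "parameters": {"IncludeArcMachines": {"value": "true"}},
    },
)
print(f"Created policy assignment: {assignment.name}")
```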
### Final Thoughts

This GA release marks a major step forward in securing Linux workloads at scale. With enhanced audit now available, enterprise teams can:

- Improve visibility into Linux security posture
- Align with industry benchmarks
- Streamline compliance reporting
- Reduce risk across cloud and hybrid environments

## Designing for Certainty: How Azure Capacity Reservations Safeguard Mission-Critical Workloads
### Why capacity reservations matter now

Cloud isn't running out of metal, but demand is compounding and often spikes. Resource strain shows up in specific regions, zones, and VM SKUs, especially for popular CPU families, memory-optimized sizes, and anything involving GPUs. Seasonal events (retail peaks), regulatory cutovers, emergency response, and bursty AI pipelines can trigger sudden surges. Even with healthy regional capacity, a single zone or a specific SKU can be tight. Capacity reservations acknowledge this reality and make it designable instead of probabilistic.

- Root reality: Capacity is finite at the SKU-in-zone granularity, and demand arrives in waves.
- Risk profile: The risk is not "no capacity in the cloud," but "no capacity for this exact size in this exact place at this exact moment."
- Strategic move: Reserve what matters, where it matters, before you need it.

### What capacity means in practice

Think of three dimensions: region, zone, and SKU. Your workload's SLO ties to all three.

- Region: The biggest pool of resources. It gives you flexibility but doesn't guarantee availability in a specific zone.
- Zone: This is where fault isolation happens and where you'll often feel the pinch first when demand spikes.
- SKU: The specific type of machine you're asking for. This is usually the tightest constraint, especially for popular sizes like Dv5, Ev5, or anything with GPUs.

Azure Capacity Reservations let you lock capacity for a specific VM size at the regional or zonal scope and then place VMs/scale sets into that reservation.

### Pay-as-you-go vs capacity reservations vs reserved instances

| Attribute | Pay-as-you-go | Capacity Reservations | Reserved Instances |
|---|---|---|---|
| Primary purpose | Flexibility, no commitment | Guarantee availability for a VM size | Reduce price for steady usage |
| What it guarantees | Nothing beyond current availability | Capacity in region/zone for N of a SKU | Discount on matching usage (1- or 3-year term) |
| Scope | Region/zone at runtime, best-effort | Bound to region or specific zone | Billing benefit across scope rules |
| Commitment | None | Active while you keep it (on-demand) | Term commitment (1 or 3 years) |

Key clarifications:

- Capacity reservations ≠ discount tool: They exist to secure availability. You pay while the reservation is active (even if idle) because Azure is holding that capacity for you.
- Reserved Instances ≠ capacity guarantee: They reduce the rate you pay when you run matching VMs, but they don't hold hardware for you.
- Together: Use Capacity Reservations to ensure the VMs can run; use Reserved Instances to lower the cost of the runtime those VMs consume.

### This is universal, not just Azure

Every major cloud faces the same physics: finite hardware, localized spikes, SKU-specific constraints, and growth in high-demand families (especially GPUs). AWS offers On-Demand Capacity Reservations; Google Cloud offers zonal reservations. The names differ; the pattern and the need are the same. If your architecture depends on "must run here, as this size, and right now," you either design for capacity or accept availability risk.
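To ground the comparison, here is a minimal sketch that creates a zonal capacity reservation group and reserves ten instances of a specific SKU with the Azure SDK for Python (azure-identity and azure-mgmt-compute). The names, region, zone, SKU, and quantity are placeholder assumptions, and property names may differ slightly between SDK versions.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

subscription_id = "<subscription-id>"   # placeholder
resource_group = "rg-capacity"          # placeholder
location = "eastus2"                    # placeholder region
zone = "2"                              # the zone that must have guaranteed capacity

compute = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

# 1. A capacity reservation group scoped to a single availability zone.
group = compute.capacity_reservation_groups.create_or_update(
    resource_group,
    "crg-peak-eastus2-z2",
    {"location": location, "zones": [zone]},
)

# 2. Reserve 10 instances of the exact SKU the workload depends on.
reservation = compute.capacity_reservations.begin_create_or_update(
    resource_group,
    "crg-peak-eastus2-z2",
    "cr-d4sv5-peak",
    {
        "location": location,
        "zones": [zone],
        "sku": {"name": "Standard_D4s_v5", "capacity": 10},
    },
).result()

print(f"Reserved {reservation.sku.capacity} x {reservation.sku.name} in zone {zone}")

# 3. When creating or updating a VM or scale set, reference the group so its
#    allocation draws from the reserved capacity, e.g. in the VM properties:
#    "capacity_reservation": {"capacity_reservation_group": {"id": group.id}}
```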
### When mission-critical means "reserve it"

If failure to acquire capacity breaks your SLO, treat capacity as a dependency to engineer, not a variable to assume.

- High-stakes cutovers and events: Examples: Black Friday, tax deadlines, trading close, clinical batch windows. Action: Pre-reserve the exact SKU in the exact zones for the surge window.
- HA across zones: Goal: Survive a zone failure by scaling in active zones. Action: Consider keeping extra capacity in each zone based on your failover plan, whether that's N+1 or matching peak load, depending on active/active vs. active/passive.
- Change windows that deallocate/recreate: Risk: If a VM is deallocated during maintenance, it might not get the same placement when restarted. Action: Associate VMs/VMSS with a capacity reservation group before deallocation.
- Fixed-SKU dependencies: Signal: Performance needs, licensing rules, or hardware accelerators that lock you into a specific VM family. Action: Reserve by SKU. If possible, define fallback SKUs and split reservations across them.
- Regulated or latency-sensitive workloads: Constraint: Must run in a specific zone or region due to compliance or latency. Action: Prefer zonal reservations to control both locality and availability.

### How reserved instances complement capacity reservations

Think of it as a two-layer strategy:

- Layer 1, availability: Capacity reservations ensure your compute can be placed when needed.
- Layer 2, economics: Reserved Instances (or Savings Plans) apply a pricing benefit to the steady-state hours you actually run.

Practical pairing:

- Steady base load: Cover with one- or three-year Reserved Instances for maximum savings.
- Critical surge headroom: Hold with Capacity Reservations; if the surge is predictable, you can still layer partial RI coverage aligned to expected utilization.
- Dynamic burst: Leave as pay-as-you-go or use short-lived reservations during known windows.

FinOps hygiene:

- Coverage ratios: Track RI coverage and capacity reservation utilization separately.
- Rightsizing: Align reservations to the SKU mix you truly run; shift or cancel idle capacity reservations quickly.
- Chargeback: Attribute the cost of "insurance" (capacity) to the workloads that require the SLO, separate from the cost of "fuel" (compute hours).

### Conclusion

In today's cloud landscape, resilience isn't just redundancy; it's about assured access to the exact resources your workload demands. Capacity Reservations remove uncertainty by guaranteeing placement, while Reserved Instances drive cost efficiency for predictable use. Together, they form a strategic duo that keeps mission-critical services running smoothly under any demand surge. Build with both in mind, and you turn capacity from a risk into a controlled asset.