Monitoring ExpressRoute: A Workbook Solution
Published Jan 26 2024 09:00 AM
Microsoft

Summary:

This workbook addresses a common challenge faced by organizations using Azure ExpressRoute to connect their on-premises networks to Azure cloud services. The challenge is to effectively visualize the health and availability of ExpressRoute components. To address this, we have developed a solution using Azure Monitor Workbooks. Our solution is an interactive workbook that provides comprehensive monitoring and troubleshooting for ExpressRoute, including the monitoring of break-glass metrics.
________________________________________________________________________________________________________________________________________________________

Introduction: Azure ExpressRoute Monitoring.

Azure ExpressRoute is a service that enables organizations to create private connections between their on-premises networks and Azure cloud services. ExpressRoute provides faster speeds, lower latencies, higher reliability, and more security than public internet connections. However, ExpressRoute also requires careful monitoring and optimization to ensure optimal performance and availability. Organizations need to track metrics such as ExpressRoute Circuit Status, BGP availability, total throughput, Primary and Secondary Connection traffic, packet drops, Gateway utilization and throughput, and Azure Firewall SNAT and throughput. They also need to troubleshoot issues such as circuit failures, gateway failures, routing errors, bandwidth bottlenecks, firewall misconfigurations, and more.

Problem: Disparate Monitoring of Azure ExpressRoute.

Tools such as Network Insights already highlight resource health and detailed metrics for resources such as ExpressRoute. However, these existing tools do not provide a comprehensive view of the health of ExpressRoute components, and they are not always user-friendly. For example, the Azure portal provides some basic metrics and alerts for ExpressRoute circuits and gateways, but they are not customizable or interactive. Azure Monitor provides more granular metrics and logs for ExpressRoute resources, but they are scattered across different sources and require complex queries to analyze. Azure Network Watcher provides some diagnostic tools for ExpressRoute connectivity issues, but they are not integrated with other monitoring data or visualizations.

 

Figure 1: Network insights

______________________________________________________________________________________________________

Solution:

The solution is a workbook that we created using Azure Monitor Workbooks. A workbook is a type of dashboard that allows users to combine text, metrics, logs, queries, parameters, charts, tables, and other visualizations in a single view. The workbook that we developed is specifically designed for ExpressRoute monitoring and troubleshooting. It allows users to dynamically choose their ExpressRoute circuit and gateway from a drop-down list and see relevant metrics and logs in an interactive dashboard. The dashboard includes sections for:

 

- ExpressRoute Circuit Status: This section shows the status of the selected circuit (Enabled or Disabled) and the status of its primary and secondary connections (Active or Inactive). It also shows the BGP availability of the circuit (the percentage of time that BGP sessions are established between the circuit and Azure).

Figure 2: ExpressRoute Circuit Status

 

- BGP Availability: This section shows the percentage of time that the BGP sessions between the customer edge router and the Microsoft edge router are established and stable. The BGP availability of an ExpressRoute circuit depends on factors such as the physical connectivity, the routing configuration, and the network performance of the service provider and the customer.


A value of 100% means BGP is being handled properly on both connections of the ExpressRoute circuit. If you are using only one of the redundant connections, or one of the peers is down during maintenance, 50% is expected.

Figure 3: BGP Availability for ExpressRoute Circuit
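
As a rough illustration of how the 100% and 50% readings described above could be interpreted in a script, here is a minimal Python sketch. The function name, the status strings, the tolerance, and the handling of 0% and in-between values are assumptions made for this example; they are not part of the workbook.

```python
import math


def describe_bgp_availability(percent: float, tolerance: float = 0.01) -> str:
    """Map a BGP availability percentage to a short status string.

    The 100% and 50% cases follow the article; the 0% and
    "unstable" cases are illustrative assumptions.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError(f"availability must be between 0 and 100, got {percent}")
    if math.isclose(percent, 100.0, abs_tol=tolerance):
        return "both connections up"
    if math.isclose(percent, 50.0, abs_tol=tolerance):
        # Expected when only one redundant connection is in use,
        # or when one peer is down for maintenance.
        return "one connection up"
    if math.isclose(percent, 0.0, abs_tol=tolerance):
        return "both connections down"
    # Anything else suggests sessions dropped for part of the interval.
    return "unstable"
```

A reading of 50% is therefore not necessarily an incident; whether it is depends on whether a single active connection is intended at that time.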

 

- Total Throughput: This section shows the total inbound and outbound throughput of the selected circuit in Mbps over a specified time range. The chart scales automatically from Mbps to Gbps as needed.

Figure 4: Total Throughput of ExpressRoute Circuit
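
The Mbps-to-Gbps scaling can be pictured with a small Python sketch. This is not how the workbook renders its charts; it only illustrates the unit switch, assuming the raw value is in bits per second and using decimal units (1 Gbps = 1,000 Mbps). The function name is hypothetical.

```python
def format_throughput(bits_per_second: float) -> str:
    """Format a bits-per-second value as Mbps, switching to Gbps at 1,000 Mbps."""
    if bits_per_second < 0:
        raise ValueError("throughput cannot be negative")
    mbps = bits_per_second / 1_000_000
    if mbps >= 1000:
        return f"{mbps / 1000:.2f} Gbps"
    return f"{mbps:.2f} Mbps"


print(format_throughput(250_000_000))    # 250.00 Mbps
print(format_throughput(2_500_000_000))  # 2.50 Gbps
```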

- Primary and Secondary Connection Traffic: This section shows the inbound and outbound traffic of the primary and secondary connections of the selected circuit in Mbps over a specified time range. If one connection is saturated while traffic is not being balanced onto the other connection, there could be a configuration issue that should be investigated.

Figure 5: Primary and Secondary Connection Traffic for the ExpressRoute Circuit
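
One way to make "not load balancing" concrete is to compare the two connections numerically. The sketch below is an assumption-laden example, not the workbook's logic: the function names and the 0.5 threshold are hypothetical, and a sensible threshold depends on your own traffic patterns.

```python
def traffic_imbalance(primary_mbps: float, secondary_mbps: float) -> float:
    """Return the share of total traffic by which the two connections differ.

    0.0 means perfectly balanced; 1.0 means all traffic is on one connection.
    """
    total = primary_mbps + secondary_mbps
    if total == 0:
        return 0.0  # No traffic at all, so nothing to balance.
    return abs(primary_mbps - secondary_mbps) / total


def is_imbalanced(primary_mbps: float, secondary_mbps: float,
                  threshold: float = 0.5) -> bool:
    """Flag a pair of connections whose imbalance exceeds the (assumed) threshold."""
    return traffic_imbalance(primary_mbps, secondary_mbps) > threshold
```

For example, 900 Mbps on the primary and 100 Mbps on the secondary gives an imbalance of 0.8, which this sketch would flag for investigation.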

- Packet Drops: This section shows the dropped bits/second for ingress and egress traffic through the circuit. This provides an easy way to monitor performance issues that may occur if you regularly reach or exceed your circuit bandwidth.

Figure 6: Packet Drops for ExpressRoute Circuit

- Gateway Utilization and Throughput: This section shows the utilization (the percentage of allocated bandwidth used) and the throughput (the actual bandwidth used, in Mbps) of the selected gateway over a specified time range. It also shows the average latency of the gateway in ms over the same time range.

Figure 7: Gateway Utilization and Throughput
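
The relationship between the two gateway figures follows directly from the definitions above: utilization is throughput expressed as a percentage of the allocated bandwidth. A minimal Python sketch, where the function name and the sample numbers are hypothetical and the allocated bandwidth must come from your own gateway configuration:

```python
def gateway_utilization_percent(throughput_mbps: float,
                                allocated_mbps: float) -> float:
    """Return throughput as a percentage of the allocated bandwidth."""
    if allocated_mbps <= 0:
        raise ValueError("allocated bandwidth must be positive")
    return 100.0 * throughput_mbps / allocated_mbps


# Hypothetical numbers: 500 Mbps observed against 1,000 Mbps allocated.
print(gateway_utilization_percent(500, 1000))  # 50.0
```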

 

- Route Advertisements: ExpressRoute supports up to 4,000 IPv4 prefixes and 100 IPv6 prefixes advertised to Microsoft through Azure private peering. This limit can be increased to 10,000 IPv4 prefixes if the ExpressRoute premium add-on is enabled. ExpressRoute accepts up to 200 prefixes per BGP session for Azure public and Microsoft peering.

NOTE: BGP limits should be closely monitored because several of these limits are considered “Break-Glass” limits. For example, if more than 1,000 address spaces are announced to on-premises via BGP, BGP advertisements from that gateway will stop because of an in-line resource limit. This number CANNOT be increased. To reduce the number of routes being advertised, consider using route summarization if possible.

Figure 8: Route Advertisements
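
Because these limits are hard stops, it can help to warn well before a count reaches them. The Python sketch below uses the limits quoted above; the constant names, function names, and the 80% warning threshold are assumptions chosen for illustration and are not part of the workbook.

```python
# Limits as quoted in this article.
PRIVATE_PEERING_IPV4_LIMIT = 4_000
PRIVATE_PEERING_IPV4_LIMIT_PREMIUM = 10_000
GATEWAY_TO_ONPREM_LIMIT = 1_000  # "Break-Glass" limit; cannot be increased.


def ipv4_limit(premium: bool) -> int:
    """Return the private peering IPv4 prefix limit for the circuit."""
    if premium:
        return PRIVATE_PEERING_IPV4_LIMIT_PREMIUM
    return PRIVATE_PEERING_IPV4_LIMIT


def route_headroom(advertised: int, limit: int,
                   warn_fraction: float = 0.8) -> str:
    """Classify a route count as 'ok', 'warning', or 'exceeded'.

    warn_fraction is an assumed early-warning threshold, not an Azure value.
    """
    if advertised > limit:
        return "exceeded"
    if advertised >= warn_fraction * limit:
        return "warning"
    return "ok"
```

With these assumptions, 850 address spaces announced to on-premises would already return "warning", leaving time to summarize routes before the 1,000 limit is crossed.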

 

 

 

- Azure Firewall SNAT and Throughput: This section shows the SNAT (Source Network Address Translation) sessions (the number of connections from on-premises sources to Azure destinations) and throughput (the inbound and outbound traffic) of the selected firewall in Mbps over a specified time range.

Figure 9: SNAT and Throughput Metrics

______________________________________________________________________________________________________

Conclusion:

This white paper has presented a problem faced by many organizations that use Azure ExpressRoute to connect their on-premises networks to Azure cloud services: how to monitor and optimize the performance and availability of their ExpressRoute circuits and gateways. It has also introduced a solution that we developed using Azure Monitor Workbooks: a workbook that provides a comprehensive and interactive dashboard for ExpressRoute monitoring and troubleshooting. Finally, it has explained how the workbook works, what benefits it offers, and how to use it.

 

Begin Monitoring ExpressRoute:

ExpressRoute Alerts:

ExpressRoute Circuits | Azure Monitor Baseline Alerts

ExpressRoute Gateways | Azure Monitor Baseline Alerts

ExpressRoute Ports | Azure Monitor Baseline Alerts

ExpressRoute Workbook Template:

AzureMonitorCommunity/Azure Services/Azure Monitor/Workbooks/Azure Network Monitoring.workbook at ma...

Last update: Jan 25 2024 12:16 PM