Azure API Management Diagnostics: Investigate your API performance, reliability and diagnose issues

Published Jun 03 2021 05:34 AM
Microsoft

Overview:

API Management (APIM) is a way to create consistent and modern API gateways for existing back-end services. API Management helps organizations publish APIs to external, partner, and internal developers to unlock the potential of their data and services. In an ideal scenario, the APIs you build and manage in API Management return successful responses (mostly 200 OK) along with the accurate data expected from the API. In practice, issues can range from 404 Not Found errors to 502 Bad Gateway errors. The new API Management Diagnostics is an intelligent and interactive experience that helps you troubleshoot APIs published in APIM with no prior configuration needed. API Management Diagnostics points out what’s wrong and guides you to the right information to quickly troubleshoot and resolve the issue.

 

API Management Diagnostics is currently not supported for the Consumption tier.

 

To access API Management Diagnostics, navigate to your API Management service instance in the Azure portal. In the left navigation, select Diagnose and solve problems.

 

Picture1.png

 

You can search for issues or problems in the search bar at the top of the page. The search also helps you find tools that may help you troubleshoot or resolve your issues.

Picture2.png

 

Troubleshooting categories:

To start the investigation, you can troubleshoot issues under different categories. Common issues related to API availability and performance, gateway performance, API policies, and service upgrades can be analyzed within the corresponding category. Each category also provides more specific diagnostic checks.

Picture3.png

Let’s have a look at all these individual categories and how to leverage them to troubleshoot:

 

Availability and performance

Use this category to check your API service’s health and discover performance-related issues. For example, you can check whether your service is down, whether platform health is degraded, whether the backend is returning 5xx responses, or review the SNAT port analysis.

Picture4.png

If you have a specific problem you want to investigate, click a topic in the left navigation as shown below:

Backend 5xx Responses: You may observe API requests failing because the backend service returned 5xx responses.

Picture5.png

If you observe Backend 5xx responses in the diagnostic category you can refer to the following blog for further troubleshooting:

 

Platform Health

Picture6.png

 

SNAT Port Analysis

This check analyzes the SNAT port allocations of your API Management service. SNAT port exhaustion is a hardware-specific failure. The following document highlights that the maximum number of concurrent requests from APIM to a backend is 1024 for the Developer tier and 2048 for the other tiers: https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/azure-subscription-service-...

When SNAT port resources are exhausted, outbound flows fail until existing flows release SNAT ports. API Management is blocked on its outgoing calls, and clients may receive HTTP 5xx errors. Read more about Outbound connections in Azure.

Picture7.png
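
To get a rough feel for how close a service is to the limits quoted above, the comparison can be sketched in a few lines of Python. This is only an illustration: the function name, the tier string, and the 0.8 warning threshold are assumptions made for this example and are not Azure values, and the concurrency figure is something you would supply from your own monitoring.

```python
# Limits as quoted in this post: max concurrent requests from APIM to a backend.
DEVELOPER_LIMIT = 1024
OTHER_TIER_LIMIT = 2048


def snat_headroom(tier: str, concurrent_backend_requests: int, warn_at: float = 0.8) -> dict:
    """Compare an observed concurrency figure with the quoted per-tier limit.

    `warn_at` is an arbitrary illustrative threshold, not an Azure setting.
    """
    limit = DEVELOPER_LIMIT if tier.strip().lower() == "developer" else OTHER_TIER_LIMIT
    used_fraction = concurrent_backend_requests / limit
    return {
        "limit": limit,
        "remaining": max(limit - concurrent_backend_requests, 0),
        "used_fraction": round(used_fraction, 3),
        "at_risk": used_fraction >= warn_at,
    }


if __name__ == "__main__":
    print(snat_headroom("Developer", 900))
    print(snat_headroom("Premium", 900))
```

The same observed load can be comfortable on one tier and close to exhaustion on the Developer tier, which is why the check is worth running per tier.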

API Policies:

This category detects errors and notifies you of policy issues. For example, these can be misconfigurations or issues related to policy expressions, JWT validation, CORS, or caching.

Picture8.png

If there are proxy errors in your policy, this category gives you details on the kind of error. One such example is shown below:


Proxy Errors Detected: The errors reported below are those where the proxy records a 500 response code as a result of a policy expression processing error. Such errors are generally caused by invalid configuration or unanticipated runtime data.

Picture9.png

You can obtain detailed per request logs from the proxy by enabling Diagnostic Logs for the service.

Alternatively, you can use API Inspector to perform a traced call and inspect possible errors.
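
As a rough sketch of what a traced call involves, the Python helpers below build the request headers and read the trace location from a response. They assume the header-based tracing mechanism (`Ocp-Apim-Trace` together with `Ocp-Apim-Subscription-Key`, with the trace URL returned in `Ocp-Apim-Trace-Location`); the function names are hypothetical, and you should check the current API Management documentation, since the tracing mechanism may differ for your service version.

```python
from typing import Optional


def build_trace_headers(subscription_key: str) -> dict:
    """Headers for a traced call (assumes header-based tracing is enabled for the key)."""
    return {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Ocp-Apim-Trace": "true",
    }


def trace_location(response_headers: dict) -> Optional[str]:
    """Return the trace URL from the response headers, if one is present."""
    for name, value in response_headers.items():
        # HTTP header names are case-insensitive, so compare in lowercase.
        if name.lower() == "ocp-apim-trace-location":
            return value
    return None
```

You would pass the headers to the HTTP client of your choice and then open the returned trace location to inspect how each policy section processed the request.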

 

Gateway performance

Use this category to monitor and troubleshoot gateway requests and responses, including any 4xx or 5xx errors on your gateway. Use the data to drill into the specific area of API gateway performance that you want to check.

 

Picture10.png

If there are 4xx or 5xx errors on your gateway, you can open the specific problem to get more details on the error. Let’s see how this looks when the gateway recorded 4xx response codes.

 

Picture11.png

Picture12.png

The errors above are those where the gateway recorded 4xx response codes, but the request was either not forwarded to the backend service, the request to the backend timed out, or the backend service returned a successful response. These discrepancies might be either due to policy configuration such as the throttling settings or they might point at an issue within policy code. The Sample Error Message column generally describes the type of error.
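
The three cases described above can be expressed as a small decision function. This is a minimal sketch, assuming you have already extracted the gateway response code, the backend response code (or `None` when the request never reached the backend), and a timeout flag from your own logs; the function and parameter names are hypothetical and do not correspond to a specific log schema.

```python
from typing import Optional


def classify_gateway_4xx(
    gateway_code: int,
    backend_code: Optional[int],
    backend_timed_out: bool = False,
) -> str:
    """Label a request according to the discrepancy cases described in the text."""
    if not 400 <= gateway_code <= 499:
        return "not a gateway 4xx"
    if backend_timed_out:
        return "backend request timed out"
    if backend_code is None:
        # Typical of policy-level rejections such as throttling.
        return "not forwarded to backend"
    if 200 <= backend_code <= 299:
        # The backend succeeded, so the 4xx came from gateway policy processing.
        return "backend succeeded, gateway returned 4xx"
    return "backend returned the error"
```

For example, a 429 with no backend code falls into "not forwarded to backend", which points at throttling settings rather than at the backend service.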

 

Service Upgrade

This category checks which service tier (SKU) you are currently using and reminds you to upgrade to avoid issues that may be related to that tier. It also helps you discover service upgrade events, latency, or deployment failures.

 

Picture13.png

Ask Genie:

You can use Genie to get answers to your questions about diagnosing and solving problems with the service.

 

Picture14.png

It quickly points you to reference documents and diagnostic checks for the concern you mention, as shown below:

Picture15.png

What’s Next:

  • Use API analytics to analyze the usage and performance of your APIs.
  • With Azure Monitor, you can visualize, query, route, archive, and take actions on the metrics or logs coming from your Azure API Management service.
  • Post your questions or feedback at UserVoice by adding "[Diag]" in the title.

 
