API Management (APIM) is a way to create consistent and modern API gateways for existing back-end services. It helps organizations publish APIs to external, partner, and internal developers to unlock the potential of their data and services. In an ideal scenario, the APIs you build and manage in API Management return successful responses (mostly 200 OK) along with the accurate data expected from the API. In practice, however, issues can range from 404 Not Found errors to 502 Bad Gateway errors.
The new API Management Diagnostics is an intelligent and interactive experience that helps you troubleshoot APIs published in APIM, with no prior configuration needed. API Management Diagnostics points out what’s wrong and guides you to the right information to quickly troubleshoot and resolve the issue.
API Management Diagnostics is currently not supported for the Consumption tier.
To access API Management Diagnostics, navigate to your API Management service instance in the Azure portal. In the left navigation, select Diagnose and solve problems.
You can search for your issue or problem in the search bar at the top of the page. The search also helps you find tools that may help you troubleshoot or resolve the issue.
To start the investigation, you can troubleshoot issues under different categories. Common issues related to API availability and performance, gateway performance, API policies, and service upgrades can be analyzed within the corresponding category. Each category also provides more specific diagnostic checks.
Let’s look at each of these categories and how to use them for troubleshooting:
Availability and performance
Use this category to check your API service’s health and discover performance-related issues. For example, you can check whether your service is down, whether platform health is degraded, whether the backend is returning 5xx responses, or whether SNAT ports are being exhausted.
If you have a specific problem you want to investigate, click the corresponding topic in the left navigation. For example:
Backend 5xx Responses: You may observe API requests failing because the backend service returned 5xx responses.
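As a rough illustration of this scenario, the sketch below counts, per API, the requests for which the backend itself returned a 5xx status. It assumes per-request log records are available as dictionaries with `ApiId`, `ResponseCode`, and `BackendResponseCode` keys; these field names, the function name, and the sample data are assumptions made for illustration, not output of the Diagnostics experience.

```python
from collections import Counter


def backend_5xx_by_api(records):
    """Count requests per API where the backend itself returned a 5xx status."""
    counts = Counter()
    for rec in records:
        backend_code = rec.get("BackendResponseCode")
        # A missing backend code means no backend response was recorded,
        # so the failure cannot be attributed to the backend here.
        if backend_code is not None and 500 <= backend_code <= 599:
            counts[rec.get("ApiId", "unknown")] += 1
    return counts


# Hypothetical sample records (assumed shape, not real log output).
sample = [
    {"ApiId": "orders", "ResponseCode": 500, "BackendResponseCode": 500},
    {"ApiId": "orders", "ResponseCode": 503, "BackendResponseCode": 503},
    {"ApiId": "billing", "ResponseCode": 200, "BackendResponseCode": 200},
    {"ApiId": "billing", "ResponseCode": 500, "BackendResponseCode": None},
]

print(dict(backend_5xx_by_api(sample)))
```

In this sample, only the two `orders` requests are counted; the failing `billing` request has no backend status, so it would need to be investigated on the gateway side instead.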
If you observe Backend 5xx responses in the diagnostic category you can refer to the following blog for further troubleshooting:
SNAT Port Analysis: When SNAT port resources are exhausted, outbound flows fail until existing flows release SNAT ports. In that situation, API Management’s outgoing calls are blocked and clients may receive HTTP 5xx errors. Read more about Outbound connections in Azure.
The API Policies category detects errors and notifies you of policy issues, for example misconfigurations or issues related to policy expressions, validate JWT, CORS, or caching.
If your policy produces proxy errors, this category gives you details on the kind of error. One such example is shown below:
Proxy Errors Detected: The errors reported below are those where the proxy records a 500 response code as a result of a policy expression processing error. Such errors are generally caused by invalid configuration or unanticipated runtime data.
You can obtain detailed per-request logs from the proxy by enabling Diagnostic Logs for the service.
Alternatively, you can use API Inspector to perform a traced call and inspect possible errors.
Use the Gateway Performance category to monitor and troubleshoot gateway requests and responses, including any 4xx or 5xx errors on your gateway. Use the data to dive deeper into the specific area of API gateway performance that you want to check.
If there are 4xx or 5xx errors on your gateway, you can open the specific problem to get more details on the error. For example, consider the case where the gateway recorded 4xx response codes.
The errors reported in this view are those where the gateway recorded 4xx response codes, but the request was either not forwarded to the backend service, the request to the backend timed out, or the backend service returned a successful response. These discrepancies might be due to policy configuration, such as throttling settings, or they might point to an issue within policy code. The Sample Error Message column generally describes the type of error.
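The three cases described above can be sketched as a small classifier over per-request log records. This is a minimal illustration, not how the Diagnostics experience is implemented: the record shape, the `ResponseCode` and `BackendResponseCode` keys, the `BackendTimedOut` flag, and the label strings are all assumptions introduced for the example.

```python
def classify_gateway_4xx(rec):
    """Label a gateway 4xx record by discrepancy type, or return None.

    Assumed (hypothetical) record keys:
      ResponseCode        - status the gateway returned to the client
      BackendResponseCode - status the backend returned, or absent/None
      BackendTimedOut     - True if the call to the backend timed out
    """
    code = rec["ResponseCode"]
    if not 400 <= code <= 499:
        return None  # not a gateway 4xx at all

    if rec.get("BackendTimedOut"):
        return "backend_timed_out"

    backend_code = rec.get("BackendResponseCode")
    if backend_code is None:
        # No backend status recorded, e.g. a throttling policy rejected the call.
        return "not_forwarded"
    if 200 <= backend_code <= 299:
        # Backend succeeded, so the 4xx was produced on the gateway side.
        return "backend_succeeded"

    return None  # backend itself returned an error; not a discrepancy


# Hypothetical sample records (assumed shape, not real log output).
samples = [
    {"ResponseCode": 429},
    {"ResponseCode": 401, "BackendResponseCode": 200},
    {"ResponseCode": 408, "BackendTimedOut": True},
    {"ResponseCode": 404, "BackendResponseCode": 404},
]

for rec in samples:
    print(rec["ResponseCode"], classify_gateway_4xx(rec))
```

Records labeled `not_forwarded` or `backend_succeeded` are the ones most likely to point at policy configuration or policy code, which is where the Sample Error Message column helps narrow things down.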
The Service Upgrade category checks which service tier (SKU) you are currently using and reminds you to upgrade to avoid issues that may be related to that tier. It also helps you discover service upgrade events, latency, or deployment failures.
You can use Genie to get answers to your questions about diagnosing and solving problems with the service.
It quickly points you to reference documentation and diagnostic checks for the concern you describe.
You can also use API analytics to analyze the usage and performance of your APIs.
With Azure Monitor, you can visualize, query, route, archive, and take actions on the metrics or logs coming from your Azure API Management service.
Post your questions or feedback on UserVoice, adding "[Diag]" to the title.