When you run an API Management instance in “internal” Virtual Network mode and call APIs hosted in the same APIM service (that is, you use the APIM gateway endpoint URL as the backend URL), you may experience 500 errors with “BackendConnectionFailure”.
The screenshots below demonstrate the steps to reproduce this issue.
Send a request through Postman with the inspector trace enabled, which makes troubleshooting easier. The request returns a 500 error after 21 seconds.
Now check the APIM inspector trace for more detailed information. The trace URL is available in the response header “Ocp-Apim-Trace-Location”.
As the inspector trace shows, the error happens at the backend level, when APIM tries to forward the request to itself. The error message is “Unable to connect to the remote server”.
Suppose you have checked all the network configurations: no NSG or forced tunneling is blocking traffic from the internal VNet to the APIM gateway endpoint, and when you log in to a VM inside the same VNet, the connection from that VM to the APIM gateway endpoint works fine.
This is confusing, because the request uses no other policy or backend-related configuration and should not fail, given that the first hop to the same domain succeeded.
Internal Load Balancer Limitation
The root cause of this issue is the load balancer limitation when accessing the internal Load Balancer frontend from the participating Load Balancer backend pool VM, as documented here:
When an APIM service is deployed in internal VNet mode, the load balancer for the gateway (proxy) endpoint is in the same subnet as the APIM backend instances.
If an internal Load Balancer is configured inside a VNet, and one of the participant backend VMs is trying to access the internal Load Balancer frontend, failures can occur when the flow is mapped to the originating VM (same VM). This scenario is not supported.
If you use the APIM Premium tier, you have at least two VMs in the subnet, so this issue may happen intermittently (traffic from instance 1 to instance 2 succeeds, while traffic from instance 1 back to instance 1 fails). If you use the Developer pricing tier, you have only one VM instance in the APIM backend pool, so this issue occurs consistently.
Historically, in internal VNet mode, APIM overrode environment-level DNS on the APIM VMs for the gateway hostnames (both default and custom), mapping them to the loopback interface (127.0.0.1) through hosts file entries on the VMs. Every time the gateway (or any other software on the VM) called one of these hostnames, it connected to itself through the loopback interface defined in the hosts file.
After an update in February 2020, a decision was made to stop doing host file DNS overrides. This change caused outgoing traffic from APIM VM to its own hostname to be routed to APIM Load Balancer instead of loopback interface. As a result, API calls that were sent to the same APIM service via forward-request or send-request policies started failing.
The best solution for this issue is to change the URL of the API in the policy to https://127.0.0.1 *and* add a “Host” header to the request that names the desired proxy host.
APIM proxy can send requests to backend (including itself) using forward-request or send-request policies. Below are the solutions for each kind of policy.
Change URL of forward-request policy
If the failing request is sent via the forward-request policy (that is, the backendUrl of the API has been set to the URL of the APIM proxy), change the scheme and hostname of backendUrl to https://127.0.0.1. Additionally, add a set-header policy in the <inbound> section to supply the desired Host header (which was previously part of the URL):
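A minimal sketch of such a policy is shown here. It assumes the API's backend (web service) URL has already been changed to https://127.0.0.1 in the API settings. The hostname contoso.azure-api.net is a hypothetical placeholder, so replace it with the default or custom gateway hostname of your own APIM service.

```xml
<policies>
    <inbound>
        <base />
        <!-- Hypothetical hostname: use your own APIM gateway hostname. -->
        <!-- The Host header tells the gateway which proxy host the loopback request targets. -->
        <set-header name="Host" exists-action="override">
            <value>contoso.azure-api.net</value>
        </set-header>
    </inbound>
    <backend>
        <!-- The inherited forward-request sends the call to the backend URL (https://127.0.0.1). -->
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
```

With exists-action="override", the Host value is replaced even if the incoming request already carries one, so the loopback call should always be routed to the intended proxy host.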