502 Bad Gateway error for Azure Application Gateway


I am facing an issue where Application Gateway (App GW) can't connect to a backend Ubuntu VM in an Azure VMSS. When I test locally on the VM, my Python application responds with a 200 status code:

curl https://0.0.0.0:8000/v2/get_api_version -k
{"code":200,"message":"AI API 2.0","version":2.1,"api date":"October 2021"}

When I access the URL from a browser, it times out with the message "This site can't be reached".

The App GW health probe responds with: "Cannot connect to backend server. Check whether any NSG/UDR/Firewall is blocking access to the server. Check if application is running on correct port".

Screenshot 2023-10-25 at 1.52.24 PM.png

I made sure that port 8000 is allowed in the NSG inbound security rules and the load balancer rule, and that App GW uses port 8000. The VM is pingable and can be accessed from other devices on the same subnet. No firewall is blocking the incoming traffic. I followed most of the recommendations I found, but nothing seems to work.
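
To separate a network block from an application problem, one option is to test a plain TCP connection to port 8000 from another machine in the same virtual network. The following is a minimal Python sketch; `can_connect` is a hypothetical helper, and the IP address in the comment is a placeholder for the VM's private IP, not a value from this thread.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call (10.0.0.4 is a placeholder for the backend VM's private IP):
#   can_connect("10.0.0.4", 8000)
```

If this returns False from a peer VM, the application may be listening only on the loopback address, or an OS-level firewall on the VM may be dropping the traffic. If it returns True from a peer VM while the App GW probe still fails, the NSG or route table attached to the App GW subnet is a more likely place to look.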

My understanding is that incoming traffic is being blocked, which causes the 502 Bad Gateway error. I would appreciate any suggestions or shared experiences.

Thank you

 

3 Replies

@VasuDundi 
It may not necessarily be an issue of incoming traffic being blocked.
Sometimes it's just an issue with the health probe not getting a response in time. If you're using the default health probe, I think it has a 20-second timeout, so if it doesn't get a response from the backend in time, the result can be a 502. To test for that, try increasing the timeout on the health probe and widening the HTTP response status code match so that pretty much anything returned is considered 'healthy'. Then you can see whether it's just a health probe issue. If it turns out to be one, leaving the match wide open will allow the site to still be used, but you'll have an inaccurate health probe, so you'll want to find a better path and configuration for the probe that works accurately for your use case.
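
To see how a timeout and a status code match interact, here is a rough Python sketch that imitates a single probe request from a machine in the same virtual network. It is not how Application Gateway implements its probes; `parse_ranges`, `status_matches`, and `probe` are hypothetical helpers, and certificate validation is switched off (like `curl -k`) so that only timing and status are tested.

```python
import ssl
import time
import urllib.error
import urllib.request


def parse_ranges(spec: str) -> list[tuple[int, int]]:
    """Parse a status-code spec such as "200-399,404" into inclusive ranges."""
    ranges = []
    for part in spec.split(","):
        low, _, high = part.strip().partition("-")
        ranges.append((int(low), int(high or low)))
    return ranges


def status_matches(status: int, spec: str) -> bool:
    """Return True if status falls inside any range in spec."""
    return any(low <= status <= high for low, high in parse_ranges(spec))


def probe(url: str, timeout: float, spec: str) -> tuple[bool, float]:
    """Send one GET request and report (healthy, elapsed_seconds)."""
    # Certificate checks are disabled on purpose; this tests timing and status only.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # 4xx/5xx responses still carry a status code
    except OSError:
        # No HTTP response at all: refused, unreachable, or timed out.
        return False, time.monotonic() - start
    return status_matches(status, spec), time.monotonic() - start
```

For example, `probe("https://10.0.0.4:8000/v2/get_api_version", 30.0, "200-399")` (with a placeholder IP) reports whether the backend answered within 30 seconds with a status in the given range. If the request fails with no status at all, widening the status match will not help, which points away from the probe settings and back toward connectivity.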

JeremyWallace_0-1698428637071.png
Also, if you're using end-to-end TLS, there could be an issue with the certificate name not matching the host name in the HTTP backend settings or the host name on the listener. Make sure the host name listed is a SAN on the certificate or, if it's a wildcard cert, that the host name is a valid name covered by the wildcard.
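
The name-matching rule can be sketched in Python as follows. This is a simplified illustration and not the exact validation Application Gateway performs: it handles exact names and a single leftmost `*` label only, and `hostname_matches` and `covered_by_certificate` are hypothetical helper names.

```python
def hostname_matches(hostname: str, pattern: str) -> bool:
    """Match a host name against one certificate DNS name.

    Supports exact matches and a wildcard in the leftmost label only.
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    pattern_labels = pattern.lower().rstrip(".").split(".")
    if len(host_labels) != len(pattern_labels):
        return False  # a wildcard stands in for exactly one label
    if pattern_labels[0] == "*":
        return host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


def covered_by_certificate(hostname: str, san_dns_names: list[str]) -> bool:
    """Return True if any SAN DNS entry covers the host name."""
    return any(hostname_matches(hostname, name) for name in san_dns_names)
```

Under this rule, a backend host name that is an IP address, or any name not listed in the SAN entries, would not be covered by the certificate, so the host name configured in the HTTP backend settings needs to be one of the names on the certificate.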

@JeremyWallace Thank you for the response. I still get an unhealthy response with the same message, whether I set the interval to 20 or 30 seconds. I do not think this is a health probe issue.

Screenshot 2023-10-27 at 4.05.45 PM.png

I found the following CN and SAN on the certificate used in the HTTP backend settings:
Subject: CN = ai-api-vmss-stage.eastus.cloudapp.azure.com
X509v3 Subject Alternative Name:
DNS:ai-api-vmss-stage.eastus.cloudapp.azure.com

Any other suggestions?

@VasuDundi 

Did you try this part of my previous suggestion to confirm that it isn't the probe?

In addition to increasing the timeout, set the HTTP response status code match to 200-600. It didn't look like it was set in your screenshot, so I just wanted to verify. That part matters more than the timeout.

 

JeremyWallace_0-1698475556836.png