AKS not responding


The entirety of my AKS cluster does not appear to be responding to any requests.

The (Linux-based) nodes in the cluster appear to have been restarted recently (a few hours ago) by Azure for maintenance.

The az aks browse command fails with "Unable to connect to the server: net/http: TLS handshake timeout".

I'm not sure when this happened as the cluster is a test one.

 

I have tried redeploying the node VMs (which seemed to go fine) and attempting to upgrade the Kubernetes version of the cluster (which failed with "Deployment failed. Correlation ID: xxxx. Operation failed with status: 200. Details: Resource state Failed"). I'm now stuck with a new set of VMs on 1.8.2 and one of the original VMs on 1.8.1 (based on VM tags).

I'd rather not have to re-create the cluster. Does anyone know of anything I can try, or can anyone suggest more diagnostic steps?

 

Thanks for any help.

1 Reply

@Colin Bradley 

 

I recommend checking the common troubleshooting guide linked below. If you can share the logs, that will help pinpoint the exact issue.

Hope this helps.

https://docs.microsoft.com/en-us/azure/aks/troubleshooting
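
As one extra diagnostic step alongside the guide, it may help to find out which stage of the connection to the API server is actually failing, since "TLS handshake timeout" means the TCP connection opened but the server never completed the handshake. Below is a minimal Python sketch of such a probe. The function name probe_api_server and the returned labels are made up for illustration, it uses only the Python standard library (nothing from az or kubectl), and the host you pass in is whatever API server address your kubeconfig points at.

```python
import socket
import ssl


def probe_api_server(host, port=443, timeout=5.0):
    """Report which stage of connecting to host:port fails, or 'ok'.

    Hypothetical helper for illustration; not part of any Azure tooling.
    """
    # Stage 1: can the name be resolved at all?
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failed"

    # Stage 2: does a plain TCP connection open within the timeout?
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp_failed"

    # Stage 3: does the server answer the TLS handshake?
    # Certificate checks are turned off on purpose: the only question here
    # is whether the server responds, not whether its certificate is valid.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with context.wrap_socket(sock, server_hostname=host):
            return "ok"
    except socket.timeout:
        # Same symptom as the error in the question: TCP is up, TLS never completes.
        return "tls_handshake_timeout"
    except (ssl.SSLError, OSError):
        return "tls_failed"
    finally:
        sock.close()


if __name__ == "__main__":
    # Placeholder host: replace with the server address from your kubeconfig.
    print(probe_api_server("your-cluster-api-server.example", 443))
```

If this reports tls_handshake_timeout from more than one network, the problem is more likely on the cluster's control plane side than on your machine; if it reports dns_failed or tcp_failed, local networking or firewall rules are worth checking first. That reading is a rule of thumb, not a definitive diagnosis.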