AATP Service not starting after an upgrade


Got updated to sensor version 2.114.8037.53586 today and all sensor services are failing to start.

(old version was 2.114.8034.28632 so not a huge leap)

 

Event log says "Service cannot be started. The service process could not connect to the service controller"

Logs have a NullReferenceException on DirectoryServicesResolver+<CreateSecurityPrincipalAsync>d__108
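
For anyone who wants to pull the matching entries out of a sensor log quickly, here is a minimal Python sketch. The function name `find_exceptions` is made up for illustration, and nothing here is specific to the sensor's log format: it simply does a substring match on each line.

```python
def find_exceptions(lines, needle="NullReferenceException"):
    """Return (line_number, text) pairs for every line that contains needle."""
    hits = []
    for number, text in enumerate(lines, start=1):
        if needle in text:
            # Strip the trailing newline so the result prints cleanly.
            hits.append((number, text.rstrip("\n")))
    return hits
```

To use it, open whichever log file your sensor writes to and pass the file object in, for example `find_exceptions(open(path, encoding="utf-8", errors="replace"))`, where `path` is your own log location.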

 

Not sure if the service controller is external or internal?

The original install had the proxy config on the command line. Could that have been lost during the upgrade? That said, there have been many upgrades since I originally installed it, and they have all gone without issue.

Any bright ideas?

3 Replies

@kengaul, this is a known issue; we are working on a fix right now.

Hi, yes, we had the same issue last night: when all the sensors were upgraded to 2.114.8037.53586 at 6:00pm GMT, they failed to start. It looks like 5 hours later, at 11:00pm GMT, there was a second update to 2.114.8044.7220 that fixed the issue, and we saw the sensors coming back online.

Additionally, on the 18th we again lost sensors for all our servers, also at roughly 6:00pm. The odd thing was that even the ATP portal was offline: I was getting an error 500 and the page wouldn't load.

I understand that works and maintenance happen, but I cannot find anything in the Health Centre, web documentation or Twitter notifications stating that there were any issues with ATP on either the 18th or the 19th. Is there anything we can reference?

@TlTUS, you can subscribe to get push emails on live site issues with AATP here:
https://health.atp.azure.com/

The issues you mention were posted there as well:
https://health.atp.azure.com/incidents/ztbt75s0kbtp
https://health.atp.azure.com/incidents/jps3jcf9nnsd

As for the portal Error 500, this is not something we are aware of, and we didn't get any other complaints about such a problem.
Can you tell me the exact UTC times when it happened and when it was resolved, so I can try to trace back what happened?
Also, you can privately share your workspace ID and I will try to check whether there was a specific issue with it.
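
When reporting times like these, it helps to convert local wall-clock time to UTC first. A minimal Python sketch, assuming you know your local offset from UTC at the time of the incident (the function name `to_utc` and the sample date are illustrative only, not taken from this thread):

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_time, utc_offset_hours):
    """Convert a local 'YYYY-MM-DD HH:MM' string to an ISO 8601 UTC string.

    utc_offset_hours is the local offset from UTC, e.g. 1 for UTC+1.
    """
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(local_time, "%Y-%m-%d %H:%M").replace(tzinfo=local_zone)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")


# Hypothetical example: 18:00 local time in a UTC+1 zone.
print(to_utc("2020-01-15 18:00", 1))
```

Passing the offset explicitly avoids guessing whether a time quoted as "GMT" was really recorded under daylight saving time.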

I do recommend that if you encounter such an issue and it's not documented on the health page I shared above, you open a support case immediately.
We do have the portal automatically monitored, but I didn't see any alerts in our systems about portal availability on that day.