Azure ATP sensors open hundreds of TCP connections


Hi,

Starting November 25th, most of our sensors that connect to the Azure ATP cloud service through a proxy went offline.

Our proxy/firewall setup only allows access to the Europe endpoints (the Europe-specific URLs).

Ever since then, the Azure ATP sensor service has been stuck in a restarting state and never manages to start.

That wouldn't normally be a problem, except that while trying to connect to the Azure cloud service it opens tens of connections to the proxy, all of which show a time-out status.

Later we discovered that, because of all these connections that never received a response from the proxy, the domain controller stopped responding (dcdiag, authentication and replication all fail).

Question: is this normal behaviour for the agent (consuming all the ports while trying to reach the workspace when the proxy is not responding)?
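For anyone who wants to check for the same symptom on their own domain controllers, here is a rough sketch in Python that counts TCP connections to the proxy per process and state from `netstat -ano` output. The proxy address `10.0.0.9:8080` and the helper names are placeholders I made up for illustration, not values from our environment, and the parsing assumes the usual five-column TCP rows that Windows `netstat -ano` prints.

```python
import subprocess
from collections import Counter


def count_proxy_connections(netstat_output, proxy_ip, proxy_port):
    """Count TCP connections to proxy_ip:proxy_port, keyed by (pid, state).

    Expects text in the format printed by Windows `netstat -ano`.
    """
    target = f"{proxy_ip}:{proxy_port}"
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows have 5 columns: proto, local, foreign, state, pid.
        # Headers and UDP rows have a different column count and are skipped.
        if len(fields) != 5 or fields[0] != "TCP":
            continue
        _proto, _local, foreign, state, pid = fields
        if foreign == target:
            counts[(int(pid), state)] += 1
    return counts


def run_netstat():
    """Return the raw output of `netstat -ano` (numeric addresses plus PIDs)."""
    result = subprocess.run(
        ["netstat", "-ano"], capture_output=True, text=True, check=True
    )
    return result.stdout


def report(counts):
    """Print the busiest (pid, state) pairs first."""
    for (pid, state), n in counts.most_common():
        print(f"pid={pid:<6} state={state:<12} connections={n}")
```

On a domain controller this would be called as `report(count_proxy_connections(run_netstat(), "10.0.0.9", 8080))`, with the placeholder address replaced by the real proxy IP and port. A large and growing count for a single PID would point at the process that is using up the ports.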

We were forced to kill the two processes and set the two services to disabled, and we raised a ticket with MS.

I was wondering if anyone has experienced similar issues, and/or if someone from Microsoft can tell me whether this might be related to the issues reported on November 25th and 26th on https://health.atp.azure.com.

Thank you in advance!

1 Reply

@mcliviu, Hi,

This is not expected, although if you have a unique proxy setup it might have induced a state we did not anticipate.
If it correlates with the outage period, the outage might have been some kind of trigger, which I am interested in researching, but so far I haven't seen any other similar reports.

Does starting the sensor again, now that the outage is over, still cause issues?

You mentioned you opened a ticket with support on this one.

Can you share the case # with me in a private message, or ask the support engineer to add me to the email thread?

Thanks,

Eli