SOLVED

Azure Advanced Threat Protection Sensor service failed to start on multi-forest DC


I have had this running on the primary domain with 4 sensors working fine for a week. I have now attempted to add a new domain from another forest, and the Sensor repeatedly fails to start, logging the following errors:

2019-01-02 22:08:06.6076 Error DirectoryServicesClient+<CreateLdapConnectionAsync>d__33 Microsoft.Tri.Infrastructure.ExtendedException: CreateLdapConnectionAsync failed [DomainControllerDnsName=xxxx.yyyy.prod]
   at async Task<LdapConnection> Microsoft.Tri.Sensor.DirectoryServicesClient.CreateLdapConnectionAsync(DomainControllerConnectionData domainControllerConnectionData, bool isGlobalCatalog, bool isTraversing)
   at async Task<bool> Microsoft.Tri.Sensor.DirectoryServicesClient.TryCreateLdapConnectionAsync(DomainControllerConnectionData domainControllerConnectionData, bool isGlobalCatalog, bool isTraversing)
2019-01-02 22:08:06.6126 Error DirectoryServicesClient Microsoft.Tri.Infrastructure.ExtendedException: Failed to communicate with configured domain controllers
   at new Microsoft.Tri.Sensor.DirectoryServicesClient(IConfigurationManager configurationManager, IDomainNetworkCredentialsManager domainNetworkCredentialsManager, IMetricManager metricManager, IWorkspaceApplicationSensorApiJsonProxy workspaceApplicationSensorApiJsonProxy)
   at object lambda_method(Closure, object[])
   at object Autofac.Core.Activators.Reflection.ConstructorParameterBinding.Instantiate()
   at void Microsoft.Tri.Infrastructure.ModuleManager.AddModules(Type[] moduleTypes)
   at ModuleManager Microsoft.Tri.Sensor.SensorService.CreateModuleManager()
   at async Task Microsoft.Tri.Infrastructure.Service.OnStartAsync()
   at void Microsoft.Tri.Infrastructure.TaskExtension.Await(Task task)
   at void Microsoft.Tri.Infrastructure.Service.OnStart(string[] args)

 

There is a two-way trust. The output from nltest /domain_trusts looks like this:

AAAAA aaaaa.corp (NT 5) (Direct Outbound) (Direct Inbound) ( Attr: quarantined )

 

I have tried many things to ensure the foreign security principal has LDAP access, and even went as far as adding it to the Administrators group as a test, but nothing changes the error.

Any help greatly appreciated.

James

6 Replies

Hi,

Sounds like it might be either a trust or a networking issue. Have you checked the guidance at https://docs.microsoft.com/en-us/azure-advanced-threat-protection/atp-multi-forest, including the ports that must be open?


The quarantined attribute indicates TRUST_ATTRIBUTE_QUARANTINED_DOMAIN

https://msdn.microsoft.com/en-us/library/cc223779.aspx

“If this bit is set, the trusted domain is quarantined and is subject to the rules of SID Filtering as described in [MS-PAC] section 4.1.2.2.”

Just for troubleshooting: can you remove the quarantine and see if things start to work? That will give a clue whether we are heading in the right direction, and we can continue from there.
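
As a rough aid for reading that attribute, the sketch below decodes a raw trustAttributes value into flag names. The enum and function names are hypothetical, the PascalCase members are shortened forms of the TRUST_ATTRIBUTE_* constants, and only a subset of the documented bits is listed, so treat it as an illustration rather than a complete decoder.

```powershell
# Hypothetical helper: a subset of the TRUST_ATTRIBUTE_* bits as a flags enum.
[Flags()] enum TrustAttribute {
    NonTransitive     = 0x01
    UplevelOnly       = 0x02
    QuarantinedDomain = 0x04   # TRUST_ATTRIBUTE_QUARANTINED_DOMAIN (SID filtering)
    ForestTransitive  = 0x08   # set on forest trusts
    CrossOrganization = 0x10
    WithinForest      = 0x20
    TreatAsExternal   = 0x40
    UsesRc4Encryption = 0x80
}

function ConvertTo-TrustAttributeName {
    param([Parameter(Mandatory)] [int] $Value)
    # Casting to the flags enum and splitting its string form gives one name per set bit.
    # A value containing bits not listed above comes back as the plain number instead.
    ([TrustAttribute]$Value).ToString() -split ',\s*'
}

ConvertTo-TrustAttributeName 4
```

With the value 4 this returns QuarantinedDomain, the bit that corresponds to the "quarantined" attribute in the nltest output from the original question.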


This is not a standalone sensor, so network and ports are not relevant. A trust issue is my suspicion, which is why I included the trust configuration details. Eli Ofek just replied with a more relevant response.


It's a production environment, so I will need to really understand the change and its potential risks before testing, but I will investigate this. It would be good to have more details on exactly what trust configuration is required and what rights the trusted account needs. Reading your reference (4.1.2.2), the QuarantinedExternal trust boundary should only filter SIDs that are not part of the trusted domain. Since the ATP Directory Services user is a member of the trusted domain, I can't really understand how this would be the issue.

I have also seen your post about the preview version expanding support for different trust configurations, so maybe the developer working on that could shed more light on the subject.

Best Response confirmed by James_W (Occasional Contributor)
Solution

I tried removing the quarantine bit, with no change. I then tried many things that would add no value to this post, so I will leave them out and skip to the revelation. I started playing with PowerShell to try to reproduce the error and, oddly, the same .NET libraries that were failing in the Sensor worked fine:

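# Load the LDAP client library (Add-Type replaces the deprecated LoadWithPartialName;
# NetworkCredential lives in System.dll, so System.Net needs no separate load)
Add-Type -AssemblyName System.DirectoryServices.Protocols

# Prompt for the account, then bind using the library's default AuthType (Negotiate)
$cred = Get-Credential
$connLmk = New-Object System.DirectoryServices.Protocols.LdapConnection "DomainName"
$connLmk.Bind($cred.GetNetworkCredential())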

 

Conversations with a colleague then led us to suspect that the Sensor requires Kerberos (which makes some sense in a security product). The default behavior of the library is to negotiate the authentication protocol. Adding

$connLmk.AuthType = 'Kerberos'

 

reproduced the error we were seeing in the Sensor logs. So the question became why Kerberos was failing over a trust that was up, valid, and working fine with NTLM. After a lot of reading I took a shortcut and changed the trust type to a forest trust; in my case the trusted domain was a single-domain forest, so there was no real difference.

Once I made that change, both my repro and the Sensor started working.
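
For anyone wanting to run the same before-and-after comparison, here is a minimal sketch that binds once per authentication type and reports which ones succeed. Test-LdapBind is a hypothetical helper name, "DomainName" is a placeholder as in the snippet above, and the sketch assumes the check is run from a machine in the trusting forest; it is not how the Sensor itself performs the bind.

```powershell
# Sketch only: Test-LdapBind is a hypothetical helper, not part of Azure ATP.
Add-Type -AssemblyName System.DirectoryServices.Protocols

function Test-LdapBind {
    param(
        [Parameter(Mandatory)] [string] $Server,            # domain or DC DNS name
        [Parameter(Mandatory)] [pscredential] $Credential,  # account used for the bind
        [string[]] $AuthType = @('Negotiate', 'Kerberos')   # AuthType enum names to try
    )
    foreach ($type in $AuthType) {
        $conn = New-Object System.DirectoryServices.Protocols.LdapConnection $Server
        $conn.AuthType = $type   # the string is converted to the AuthType enum
        try {
            $conn.Bind($Credential.GetNetworkCredential())
            [pscustomobject]@{ AuthType = $type; Success = $true; Error = $null }
        }
        catch {
            # Bind throws on failure; keep the message so the two attempts can be compared
            [pscustomobject]@{ AuthType = $type; Success = $false; Error = $_.Exception.Message }
        }
        finally {
            $conn.Dispose()
        }
    }
}

# Example call (prompts for the account; "DomainName" is a placeholder):
# Test-LdapBind -Server "DomainName" -Credential (Get-Credential)
```

If Negotiate succeeds while Kerberos fails, that matches the situation described above, where the trust worked with NTLM but not with Kerberos.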

 

 


Yes, we did lock down the LDAP connection to force Kerberos only.

We were aware this would make the Sensor more sensitive to configuration issues but, as you mentioned, for a security product we felt it was better to handle some of those issues than to fall back to NTLM automatically and not be fully secured.

I was already aware that the problem was Kerberos failing, but this can happen for many reasons due to odd configurations.