spotty LookupQuery results from DNS Connector (preview) [Monitoring Agent]


Last week, I onboarded my DNS servers into the Sentinel DNS Connector (preview). Initially, things were working great: I saw DNS operational activity like Registrations, Config changes and (the ones I was most interested in) LookupQueries.

After a few hours, though, one of my DNS servers stopped reporting queries. It continued to send the registrations and config changes, but I saw no more queries for a few hours - then it resumed.

I'm confident that the server was still handling queries during that time. We have a load balancing solution in front of two nodes. The other node reported queries, the LB Config has not changed, and the LB sees both nodes as available.

If that had been the only blip, no problem. But since then, I rarely get query data from all three of my DNS servers at the same time; most of the time only one server, or none, is reporting. They do seem to self-heal eventually. I can also wake up the monitoring agent by opening the DNS solution's configuration panel, tweaking the 'Talkative Client Threshold', and saving, but that doesn't last very long.
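
To put numbers on "rarely", a rough sketch like the one below could list the silent periods per server. It assumes the LookupQuery records have been exported as (computer, ISO timestamp) pairs; the function name `find_reporting_gaps`, the 30-minute threshold, and the input shape are all hypothetical choices for illustration, not anything provided by the connector.

```python
from datetime import datetime, timedelta


def find_reporting_gaps(rows, max_gap=timedelta(minutes=30)):
    """Return {computer: [(last_seen, next_seen), ...]} for silent periods.

    rows is an iterable of (computer, iso_timestamp) pairs. This shape is an
    assumption about how the query records were exported.
    """
    by_server = {}
    for computer, timestamp in rows:
        by_server.setdefault(computer, []).append(datetime.fromisoformat(timestamp))

    gaps = {}
    for computer, times in by_server.items():
        times.sort()
        # Compare each record with the next one from the same server.
        for earlier, later in zip(times, times[1:]):
            if later - earlier > max_gap:
                gaps.setdefault(computer, []).append((earlier, later))
    return gaps
```

A server that never reported at all will not appear in the result, so the output is only meaningful next to the list of servers that are expected to report.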

Looking at the 'Operations Manager' event logs on one of these servers, I see events 4512 and 1103 that have contents like:
4512: Converting data batch to XML failed with error "0x80131500" (0x80131500) in rule "Microsoft.SystemCenter.CollectDnsEtwEvents" running for instance "" with id:"{<guid>}" in management group "AOI-<guid>".

1103: Summary: 1 rule(s)/monitor(s) failed and got unloaded, 0 of them reached the failure limit that prevents automatic reload. Management group "AOI-<guid>". This is summary only event, please see other events with descriptions of unloaded rule(s)/monitor(s).

After a few of those, it looks like the monitor job stops and doesn't restart.
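
To see how often each rule fails before the job stops, the 4512 messages could be tallied with something like the sketch below. It assumes the event descriptions have been exported as plain strings in the format quoted above; the names `parse_4512` and `tally_failures` are hypothetical helpers, and the pattern is based only on the one message shown here, so other variants of the event may not match.

```python
import re
from collections import Counter

# Pattern derived from the single 4512 message quoted above (an assumption).
EVENT_4512 = re.compile(
    r'failed with error "(?P<error>0x[0-9A-Fa-f]+)".*?'
    r'in rule "(?P<rule>[^"]+)".*?'
    r'in management group "(?P<group>[^"]+)"'
)


def parse_4512(message):
    """Extract error code, rule name, and management group, or None if no match."""
    match = EVENT_4512.search(message)
    return match.groupdict() if match else None


def tally_failures(messages):
    """Count failures per (rule, error) pair, skipping messages that do not parse."""
    counts = Counter()
    for message in messages:
        fields = parse_4512(message)
        if fields is not None:
            counts[(fields["rule"], fields["error"])] += 1
    return counts
```

Running `tally_failures` over the exported descriptions from each server would show whether the same rule and error code account for every unload.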

I'm not sure where to find details on the rule ID or the record that is failing to parse, but it looks like something is failing to convert and after a few runs the whole system gives up trying.

Where do I go from here?
