SOLVED

Azure ATP sensor update and communication error

Hello,

I have noticed some errors on our ATP Health Center.

The sensors installed on two DCs randomly stop communicating.

After some time the health alert is automatically closed.

 

Alongside these errors, I noticed the following entries in the sensor logs:

 

Microsoft.Tri.Sensor.Updater.log

2020-01-31 02:41:55.5375 Warn  ResourceManager RestrictCpuAsync process doesn't exist [Process=Microsoft.Tri.Sensor]

 

Microsoft.Tri.Sensor.log

2020-01-31 02:41:34.4173 Error FrameReader`1 CaptureFrames exception, exiting

Microsoft.Tri.Sensor.FrameReaderException: Failed reading frame [resultCode=-1 message=read error: PacketReceivePacket failed]

   at bool Microsoft.Tri.Sensor.FrameReader<TCaptureDevice>.TryReadFrame(out DateTime time, out BufferSlice bufferSlice)

   at bool Microsoft.Tri.Sensor.NetworkListener.ParseFrame(FrameReader frameReader)

   at void Microsoft.Tri.Sensor.NetworkListener.CaptureFrames(LiveFrameReader[] liveFrameReaders)

 

Event ID 7031 is written to the System event log:

The Azure Advanced Threat Protection Sensor service terminated unexpectedly. It has done this 1 time(s). The following corrective action will be taken in 5000 milliseconds: Restart the service.
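
To see how often and when the crash occurs, the sensor log can be scanned for the error shown above. The following is a minimal Python sketch, not an official tool: it assumes every log entry starts with a `YYYY-MM-DD HH:MM:SS.ffff` timestamp followed by a level, as in the excerpts in this post, and the function name `find_capture_crashes` is hypothetical.

```python
import re
from datetime import datetime

# Header line of a sensor log entry, in the format seen in this thread, e.g.
# "2020-01-31 02:41:34.4173 Error FrameReader`1 CaptureFrames exception, exiting"
ENTRY_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{4})\s+"
    r"(?P<level>\w+)\s+(?P<message>.*)$"
)


def find_capture_crashes(lines):
    """Return the timestamps of every 'CaptureFrames exception' error entry."""
    crashes = []
    for line in lines:
        match = ENTRY_RE.match(line.strip())
        if not match:
            continue  # stack-trace line, exception text, or blank line
        if match["level"] == "Error" and "CaptureFrames exception" in match["message"]:
            crashes.append(datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S.%f"))
    return crashes


if __name__ == "__main__":
    sample = [
        "2020-01-31 02:41:34.4173 Error FrameReader`1 CaptureFrames exception, exiting",
        "Microsoft.Tri.Sensor.FrameReaderException: Failed reading frame "
        "[resultCode=-1 message=read error: PacketReceivePacket failed]",
    ]
    for crash in find_capture_crashes(sample):
        print(crash.isoformat())
```

In practice the lines would come from reading Microsoft.Tri.Sensor.log; the location of that file depends on the installation, so it is left out here.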

 

The sensor version is currently (Jan 31st) 2.106.7618 and is marked as up to date, but version 2.107 has been out since Jan 26th.

 

Does anyone have any suggestions?

Thanks.

Mike

 

16 Replies

@Michele D'Angelantonio ,

Did this sensor ever work, or did it stop working at some point?

Is the message "CaptureFrames exception, exiting" logged on every crash?

Did you recently install any product that uses WinPcap or Npcap on the same machine?

@EliOfek thanks for your attention.

The sensors worked on all DCs for some months. The first error appeared in January (the sensors were installed in June).

Nothing has changed on the DCs. The installed software is:

[screenshot of the installed software list: clipboard_image_0.png]

For OfficeScan we do not use the firewall, so I don't think anything is using Npcap or WinPcap.

The message "CaptureFrames exception, exiting" is logged on every crash (my log only holds the last two), but not at the exact time.

thanks again

Mike

 

@Michele D'Angelantonio was there any recent change to the network stack?

NICs removed or added? Drivers changed?

Is this a VM or a physical machine?

hi @EliOfek 

I checked every log again and found event ID 7031, with the sensor restart, as far back as June.

The exact error I can find in the log is:

2020-01-30 02:17:01.9748 Error FrameReader`1 CaptureFrames exception, exiting
Microsoft.Tri.Sensor.FrameReaderException: Failed reading frame [resultCode=-1 message=read error: PacketReceivePacket failed]
   at bool Microsoft.Tri.Sensor.FrameReader<TCaptureDevice>.TryReadFrame(out DateTime time, out BufferSlice bufferSlice)
   at bool Microsoft.Tri.Sensor.NetworkListener.ParseFrame(FrameReader frameReader)
   at void Microsoft.Tri.Sensor.NetworkListener.CaptureFrames(LiveFrameReader[] liveFrameReaders)

Two DCs are VMs on a Hyper-V cluster; the third is on Azure. No network changes.

The issue seems to happen completely at random.

My first impression was that the issue could be connected to the sensor updates, but I have no evidence of that.
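
One way to look for such evidence is to check which updater log entries fall close to each crash time. The sketch below is only an illustration under assumptions: it assumes both logs use the 24-character `YYYY-MM-DD HH:MM:SS.ffff` timestamp prefix seen in this thread, and the helper names `parse_ts` and `entries_near` as well as the 5-minute default window are hypothetical choices.

```python
from datetime import datetime, timedelta


def parse_ts(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.ffff' timestamp, or return None."""
    try:
        return datetime.strptime(line[:24], "%Y-%m-%d %H:%M:%S.%f")
    except ValueError:
        return None  # stack-trace line or any line without a timestamp


def entries_near(crash_time, updater_lines, window=timedelta(minutes=5)):
    """Return the updater log lines logged within `window` of crash_time."""
    nearby = []
    for line in updater_lines:
        ts = parse_ts(line)
        if ts is not None and abs(ts - crash_time) <= window:
            nearby.append(line)
    return nearby


if __name__ == "__main__":
    # Crash time taken from the Microsoft.Tri.Sensor.log excerpt in this thread.
    crash = datetime(2020, 1, 31, 2, 41, 34, 417300)
    updater_log = [
        "2020-01-31 02:41:55.5375 Warn  ResourceManager RestrictCpuAsync "
        "process doesn't exist [Process=Microsoft.Tri.Sensor]",
    ]
    for line in entries_near(crash, updater_log):
        print(line)
```

An updater entry near a crash does not by itself show that an update caused it; the warning quoted in this thread only says the sensor process was missing when the updater looked for it.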

 

I've seen that the new sensor version has been available since Jan 26th, but no DCs have been updated (the ATP portal marks all three DCs as up to date, with version 2.106.7618).

 

 

best response confirmed by Michele D'Angelantonio (Copper Contributor)
Solution

@Michele D'Angelantonio 2.107 will be deployed in the coming days.

Just so I understand it better: it's not always crashing, but from time to time it crashes, and when it does, it's with this error?

How many failures in total since it was first installed?

 

I would advise opening a support ticket for this.

Since we haven't seen it before, we will need to collect more data to analyze.

 

Another thing you can try: if you currently use the default WinPcap driver, you can replace it with Npcap and see if that resolves the issue.

See this for instructions:

https://docs.microsoft.com/en-us/azure-advanced-threat-protection/troubleshooting-atp-known-issues#a...

You can follow it even if you are not using NIC teaming.

Just make sure to use this version:
https://nmap.org/npcap/dist/npcap-0.9984.exe

We currently have compatibility issues with the newer version, which we haven't fixed yet.

Thanks @EliOfek for your support, we will probably open a ticket in the next few days.

@Michele
Has the issue been fixed? Please confirm.

@Vishal_Sharma_4224 sorry for the delay.

The issue is not really fixed.

I found the same random behaviour in most of our implementations.

However, you can close the thread. I suppose it could be a random network problem.

thanks again

mike 

@Michele D'Angelantonio 

 

Thank you.

 

However, you may try the steps below:

 

1. Stop and disable both the ATP sensor and updater services on the DC.

2. Update Npcap to version 0.9988 (the latest one) without uninstalling the existing version.

3. Reboot the server.

4. Start both sensor services and change the startup type to automatic.

 

Please update if this resolves your issue.

I have planned this update for next week. I will confirm whether the issue is solved.

@Michele D'Angelantonio 

 

I'd suggest installing Npcap version 0.9984 first and then carrying out those steps,

as Npcap 0.9984 is the version we currently support.

You can download version 0.9984 from here:

https://nmap.org/npcap/dist/npcap-0.9984.exe 

 

Please keep me posted on the progress.

Just for confirmation: I have to install 0.9984 *instead of* 0.9988, right?

@Michele D'Angelantonio 

 

Yes, please install version 0.9984 first.

 

Then try the steps below in order:

  1. Stop and disable both the ATP sensor and updater services on the DC.
  2. Update Npcap to version 0.9988 without uninstalling the existing version (0.9984); the installer will handle that on its own.
  3. Reboot the server.
  4. Start both sensor services and change the startup type to automatic.
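
Steps 1 and 4 above can be sketched as a small Python script that builds the `sc.exe` commands. This is an illustration, not an official procedure: the service names `AATPSensor` and `AATPSensorUpdater` are an assumption to verify with `sc.exe query` on the DC, the function names are hypothetical, and steps 2 and 3 (the Npcap installer and the reboot) stay manual. By default it only prints the commands.

```python
import subprocess

# Assumed service names; verify them with `sc.exe query` on the DC before use.
SERVICES = ["AATPSensor", "AATPSensorUpdater"]


def stop_and_disable_commands(services=SERVICES):
    """Step 1: stop each service and set its startup type to disabled."""
    commands = []
    for name in services:
        commands.append(["sc.exe", "stop", name])
        commands.append(["sc.exe", "config", name, "start=", "disabled"])
    return commands


def enable_and_start_commands(services=SERVICES):
    """Step 4: set the startup type back to automatic and start each service."""
    commands = []
    for name in services:
        commands.append(["sc.exe", "config", name, "start=", "auto"])
        commands.append(["sc.exe", "start", name])
    return commands


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # Needs an elevated prompt on Windows. sc.exe returns a non-zero
            # code when, for example, the service is already stopped, so the
            # exit code is reported instead of raising.
            result = subprocess.run(command)
            print("exit code:", result.returncode)


if __name__ == "__main__":
    run(stop_and_disable_commands())   # before updating Npcap and rebooting
    run(enable_and_start_commands())   # after the reboot
```

Calling `run(..., dry_run=False)` executes the commands; keeping the dry run as the default makes it easy to review the exact command lines first.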

 

Hi Vishal,

Same issue as Michele: the ATP sensors were installed last year and were working fine. I don't know since when, but they are now down and impossible to start (7031). I updated Npcap as described, but in my case it didn't help.
If you have any other ideas, I'll take them.
Thanks,
Cyril

@CyrilDBV Are you still facing the same issue? Sorry for the delay here. 

No. I think I figured out what the situation was. When I first downloaded the sensor, I then regenerated the access code because it was not taking the original code (which was my fault, as I did not copy the full string). I should then have downloaded a new sensor package, which I did not do.

 

I therefore regenerated the access code and then downloaded the sensor. This worked. I tested this theory on another server, and it seems that you cannot regenerate the access code without then downloading a new sensor package. Anyway, I am good. Everything is up and running, thank you. @Vishal_Sharma_4224
