Device Health question


I am based in Sydney and this is at 2AM so needless to say I will be hitting the ZZZ at this time. I have only just made a couple of machines report into OMS to see what info it can provide. I may get the answers after getting some data in my testing but I have a couple of questions anyway.

1 - when is Device Health likely to move out of public preview?
2 - Will device health be able to provide details from BSOD dump files so techs can see what caused the blue screens without having to get physical access to the device?
3 - Is the logging of these events done in near real-time or is it uploaded on a schedule?
4 - Will device health just be on crash analytics or will it start to move into performance type monitoring and alerting at some point as well?

Thanks

7 Replies

Hi Brett. I'm going to break up the answers by post.

"1 - when is Device Health likely to move out of public preview?"

In our current Public Preview state the solution is available to all OMS workspaces. We are integrating feedback and testing additional functionality. At some point soonish-ish (not sharing any dates today) we will declare "General Availability" for the v1 round of features, and then start previewing new features.

As a cloud service our expectation is to iterate continually, so expect ongoing Preview states for new features as they arrive.

"2 - Will device health be able to provide details from BSOD dump files so techs can see what caused the blue screens without having to get physical access to the device?"
Today we provide a summary of each BSOD including stop code, failure ID, etc. to accelerate diagnostic and support efforts. I love what you are proposing as a next step. I would love to give enterprises access to their own dump files as well, for follow-up analysis and to further accelerate support experiences (e.g., instead of doing a bunch of work to generate *another* dump to work with Microsoft or 3rd party support, you could use the original dump from the original incident). This is tricky to implement as we have to get the privacy and access exactly right. Dump files are closely guarded.
Please tell us more about what you would do with the dump files if you had access. This will help us to justify investments in that space.
"3 - Is the logging of these events done in near real-time or is it uploaded on a schedule?"
Currently we use a daily cadence for pumping info into OMS. This works great for slow-moving information like inventory and health-of-the-herd trends. There are definitely cases where we'd love to have a "hot path" for low-latency data in the future. Crashes are a good example: if you have a bunch of crashes that started this morning, you'd like to know about it before tomorrow. This is an area of active investigation. Please tell us more about scenarios where lower latency would be valuable to you.
"4 - Will device health just be on crash analytics or will it start to move into performance type monitoring and alerting at some point as well?"
Our first feature was kernel mode crashes. Next we added Windows Information Protection insights. Now we are working on several additional scenarios. Performance/responsiveness is "very warm" but not "red hot" in the backlog. We love performance scenarios as device responsiveness matters a great deal. We have deferred a bit on implementing perf scenarios as we get some other scenarios out the door. Feel free to tell us what performance scenarios you'd like to see. The most requested so far have been: boot, logon, resume, shutdown, logoff, and shell responsiveness (e.g., how long does it take for the start menu to render). What else would you like to see?
WRT alerts: you can use native OMS functionality to implement alerts on Windows Analytics data today, but we'd like to lower the effort involved in this for the future.
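To make the alerting idea concrete, here is a minimal sketch of the kind of rule you might express: flag any device whose crash count in the latest daily upload meets a threshold. This is plain Python over made-up data, not OMS alerting itself; the function name, the device IDs, and the threshold of 3 are all assumptions for illustration.

```python
def devices_to_alert(daily_crash_counts, threshold=3):
    """Return device IDs whose crash count meets or exceeds the threshold.

    daily_crash_counts: hypothetical mapping of device ID -> number of
    crashes reported in the most recent daily upload.
    """
    return sorted(
        device
        for device, count in daily_crash_counts.items()
        if count >= threshold
    )


# Hypothetical sample data, not real Device Health output.
latest_upload = {"pc-01": 5, "pc-02": 1, "pc-03": 3}
flagged = devices_to_alert(latest_upload)
print(flagged)
```

With data arriving on a daily cadence, a rule like this would only be worth evaluating once per upload.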

Thanks Matthew. I can definitely understand the privacy requirements for something like this. Also, I probably should have asked my original question about the data provided for "AbnormalShutdownCount" instead of the BSOD kernel crashes. I just imported a couple of devices into my workspace and can see that one has some unexpected shutdowns, but I cannot seem to find any info on what caused them. Am I missing a configuration to pull in that data, or is this just supposed to provide a count that could indicate an issue on the device?

We recently added the abnormal shutdown count in addition to the bluescreen count. Organizations could potentially use this to quantify trends, such as certain device configurations encountering high rates of non-blue-screen abnormal shutdowns (for example, a black screen followed by a hard power-down). Currently we don't include any diagnostic details for these, only the count, and then only in log search mode (not in any of the default graphs). If we go further down that path... what information about the abnormal shutdowns would be useful to you?
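As a rough illustration of that kind of trend analysis, the sketch below averages abnormal shutdown counts per device for each device model, working on rows you might export from log search. Only "AbnormalShutdownCount" is a name from this thread; the "DeviceId" and "Model" fields, the function name, and the sample rows are assumptions, and the real schema may differ.

```python
from collections import defaultdict


def abnormal_shutdowns_per_device(rows):
    """Return {model: average abnormal shutdowns per device}.

    rows: hypothetical exported records, each with "DeviceId", "Model",
    and "AbnormalShutdownCount" keys.
    """
    totals = defaultdict(int)    # model -> summed shutdown count
    devices = defaultdict(set)   # model -> distinct device IDs seen
    for row in rows:
        model = row["Model"]
        totals[model] += row["AbnormalShutdownCount"]
        devices[model].add(row["DeviceId"])
    return {model: totals[model] / len(devices[model]) for model in totals}


# Hypothetical sample rows, not real Device Health data.
sample_rows = [
    {"DeviceId": "pc-01", "Model": "Model A", "AbnormalShutdownCount": 4},
    {"DeviceId": "pc-02", "Model": "Model A", "AbnormalShutdownCount": 2},
    {"DeviceId": "pc-03", "Model": "Model B", "AbnormalShutdownCount": 0},
]
rates = abnormal_shutdowns_per_device(sample_rows)
print(rates)
```

A model whose average sits well above the rest of the fleet would be a candidate for closer investigation, even without per-incident diagnostics.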

Thanks for that reply Matthew. I guess it's difficult to provide any further details for an abnormal shutdown other than the count, since there are no real diagnostic details available on what causes them. I think the count is probably the best that we can expect. And now that I have devices actually showing BSOD crash details in my console, I can see the differentiation more clearly, as previously I had no devices with kernel crashes. Great for the environment but not great for understanding the product entirely.

Thanks