I get asked a lot about the best way to use Operations Manager Reporting to answer business specific questions such as "How do I gain efficiencies with my IT Operators?" or "How can I continually improve my monitoring system?" . Although these may be process related, the data we keep within the data warehouse can be used in conjunction with our reporting to supplement these business improvements.
Throughout this blog series I will explore various data collections and attempt to show how these questions can be best answered.
Alert data is a good place to start since it is what most operators pertain to when using the monitoring system. Typically they will have an operations console and will act upon alerts as and when they happen. Mining this data therefore allows you to see what objects cause the most amount of work, what alerts take the longest time to fix and what rules may be configured incorrectly.
Lets start with what we provide out of the box.
Allows you for a specified time period to show all alerts for a managed entity or group of managed entities. This is particularly useful to answer questions like, For the past week which of my Windows 2003 Servers has the most Priority 1 alerts.
Click for Link to Alerts Report
Available as a drill through from the Alerts report this will show you all instances of a specific alert useful for determining what alerts have high repeat counts. Also includes additional data like Owner and Ticket ID.
Click for Link to Alert Details Report
Most Common Alerts
This report is extremely useful when you are trying to fine tune your environment to ensure your operators are spending their time working on priority issues. Filtered by Management Pack you can show the most common alerts by percentage of total, the time taken to resolve these alerts and the average time taken per alert.
Click for link to Most Common Alerts Report 1
So using features such as published reports and linked reports you can create views into the Alert data that help you not only see what is happening but also drive improvements over time.
The reports above provide a certain amount of functionality but there may be cases where you need to drill in further. Questions such as "on average for last week how long were alerts open for?","How many alerts per category were closed last week?" and "What's the average length of time per alert category an alert takes to close?"
This then requires you querying the additional data available in the warehouse either to create your own reports, pivot in excel or publish to a dashboard for regular review.
A simple query that answers these additional questions is shown below.
COUNT(vAlertDetail.AlertGuid) AS TotalAlerts,
AVG(Alert.vAlert.RepeatCount) AS AverageRepeatCount,
AVG(Alert.vAlertResolutionState.TimeFromRaisedSeconds) AS AverageOpenTimeSeconds
FROM Alert.vAlertDetail INNER JOIN
Alert.vAlert ON Alert.vAlertDetail.AlertGuid = Alert.vAlert.AlertGuid INNER JOIN
vManagedEntity ON Alert.vAlert.ManagedEntityRowId = vManagedEntity.ManagedEntityRowId INNER JOIN
Alert.vAlertResolutionState ON Alert.vAlert.AlertGuid = Alert.vAlertResolutionState.AlertGuid INNER JOIN
vResolutionState ON Alert.vAlertResolutionState.ResolutionState = vResolutionState.ResolutionStateId
Alert.vAlert.RaisedDateTime BETWEEN getutcdate()-7 AND getutcdate()