Azure Admin Monitoring Dashboard


Hello,

 

I have just finished setting up and sharing a dashboard, built from several workbooks I created, so that my team members can quickly see the status of specific resources within Azure. It covers specific services that are not running, heartbeats missed within the last 5 minutes, free disk space, failed backups and missing critical updates.

 

The dashboard itself is a bit clunky, not ideal to look at, and by no means user-friendly, but it does get the job done. I am curious whether anyone has experience setting up something similar and has anything they could share. I assume I am not the first to have done something like this.

 

I am faced with a few challenges in this effort. I would prefer to do this natively within Azure and not use any third-party tools. My experience with KQL is limited; however, I have no problem learning more. I am also working with several different subscriptions across Azure Lighthouse, which means I can only query across 100 Log Analytics (LA) workspaces at one time.

 

I look forward to any recommendations or knowledge sharing the community might be able to provide!

 

Thank you!

4 Replies

@cdranschak 

 

Are you planning on using Azure Monitor Workbooks or Azure Dashboards? They are two different things.

 

The Workbooks team have samples here: microsoft/AzureMonitorCommunity: An open repo for Azure Monitor queries, workbooks, alerts and more ... Maybe also look at some examples from Billy York, https://www.cloudsma.com/2020/10/ultimate-azure-inventory-dashboard/, or some of my examples, like this one for multiple Workspaces (and via Lighthouse): KQLpublic/KQL/Workbooks/Azure Sentinel Central at master · CliveW-MSFT/KQLpublic (github.com)

100+ Workspaces is a lot. You will need to find a way to split these, not only to get below 100 but also to reduce latency. Maybe filter by Azure Region, using the Workspace location; this will also help query and display performance.
You would normally use an ARG (Azure Resource Graph) query within a Workbook parameter to filter on Subscription and Workspaces. Let's call our new parameter "myRegion" and populate it with this ARG query:

resources
| where type =~ 'microsoft.operationalinsights/workspaces' 
| distinct location

and then your Workspace parameter would be:

resources
| where type =~ 'microsoft.operationalinsights/workspaces' 
| where location == '{myRegion}'


Hope that helps
 

@CliveWatson

 

Thanks for the response. We are logging everything to Log Analytics; I have created workbooks from queries of the logs and am pinning those to a dashboard. I hope that answers your question as to what I am using.

 

As for querying across all of the workspaces, I decided to break them apart alphabetically, as there is a limit of 100. For example, I have one disk space query for workspaces A-L and another for M-Z.
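A minimal sketch of that alphabetical split as a Workbook parameter query in Azure Resource Graph. It assumes the split is on the first letter of the workspace name; a second parameter would use `'^[M-Zm-z]'`, and names that start with a digit would fall into neither bucket.

```kusto
// Hypothetical "Workspaces A-L" parameter: Log Analytics workspaces
// whose name starts with a letter from A to L (either case).
resources
| where type =~ 'microsoft.operationalinsights/workspaces'
| where name matches regex '^[A-La-l]'
| project id, name
```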

 

I know this is all possible, but my intent is to make the displayed results more concise, easier to look at and more usable from a user's perspective. The dashboard is for administrators to make support decisions, so there should not be so much spread across the page.

 

For example, let's go back to the disk space query. Rather than a pinned box titled "Disks less than 10% available" that lists every disk, it would show just a number. I could then click the number on the dashboard and it would take me to the list. Or for updates: rather than the entire list of servers with their respective lists of updates, I would have a row for servers missing critical updates and a row for servers missing security updates, each with a number I can click to get to the list.
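As a rough sketch of the "just a number" idea for disk space: the query below returns a single count instead of a list. It assumes disk counters are collected into the `Perf` table with the Windows counter name `% Free Space`; if your agents write to a different table or counter, the filters would need to change.

```kusto
// Count logical disks whose most recent "% Free Space" sample is below 10.
Perf
| where TimeGenerated > ago(30m)
| where ObjectName == "LogicalDisk" and CounterName == "% Free Space"
| where InstanceName != "_Total"
// Keep only the latest sample per disk.
| summarize arg_max(TimeGenerated, CounterValue) by Computer, InstanceName
| where CounterValue < 10
| summarize DisksBelow10Pct = count()
```

Dropping the last line gives the detail list that the number would link to.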

 

Ideally that's where I would like to get to as what I have now is kind of information overload and clunky. Thanks again for your response!

@cdranschak 

 

1. Thanks, that explains the Dashboard.

2. Are all the workspaces in the same Azure Region? If not, when you split them alphabetically you could be trying to get data from many local or remote clusters, which may affect query time.

3. For the disk example, does this demo help? Are you trying to click on an item to aid filtering later on? (Happy to share this file with you.) byLocationDemo.gif
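On point 2, one quick way to see how the workspaces are spread is to count them per region in Azure Resource Graph. This is only a sketch; it shows whether a split by region would keep each group under the 100-workspace limit.

```kusto
// Number of Log Analytics workspaces per Azure region, largest first.
resources
| where type =~ 'microsoft.operationalinsights/workspaces'
| summarize Workspaces = count() by location
| order by Workspaces desc
```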

@CliveWatson 

 

No. Unfortunately they are spread across a few regions, although we do not typically see any issues with query times. 

 

I would say even that is a bit too busy for what I am talking about. Suppose you were an admin looking at a dashboard and wanted a view of the specific things you need to know about right then and there, all in one place. Let's consider one of those things to be server up and server down. If you have hundreds of servers, you wouldn't want to see all of them on a dashboard, but rather just a number. From the dashboard you could click that number and it would take you to, say, a workbook that displays the list of servers that are up or down.
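A minimal sketch of the up/down numbers, assuming every server reports to the `Heartbeat` table and that "down" means no heartbeat in the last 5 minutes (the threshold mentioned at the top of the thread). The one-day lookback is an arbitrary choice.

```kusto
// One row per state ("Up" / "Down") with the number of servers in it.
Heartbeat
| where TimeGenerated > ago(1d)
| summarize LastHeartbeat = max(TimeGenerated) by Computer
| extend State = iff(LastHeartbeat < ago(5m), "Down", "Up")
| summarize Servers = count() by State
```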

 

That is more or less what I am looking to do, but with about 5 or 6 different items to measure.
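The updates item could follow the same pattern. The sketch below assumes update assessment data is in the `Update` table (as written by the Update Management solution) and uses the Windows classification names; other setups may store this data elsewhere.

```kusto
// Servers missing critical or security updates, one row per classification.
Update
| where TimeGenerated > ago(1d)
// Latest known state of each update on each server.
| summarize arg_max(TimeGenerated, UpdateState, Classification, Optional) by Computer, UpdateID
| where UpdateState =~ "Needed" and Optional == false
| where Classification in ("Critical Updates", "Security Updates")
| summarize Servers = dcount(Computer) by Classification
```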