Planning the Monitoring of my hybrid environment.

Pierre_Roman
Aug 11, 2022

Hello folks,

 

By now you may have read that I’ve rebuilt my demo environment to resemble a typical hybrid environment.  I did it gradually, without having to rip and replace everything in my on-prem environment.

 

I started by establishing a site-to-site VPN, then set up a solution to remote into all the servers in my environment, configured a resilient way of resolving the names of all servers in my hybrid deployment, and lastly configured an Azure Arc Private Link Scope so that all my on-prem machines could connect to Azure over the VPN and not the open internet.

 

Now, as I look at all the operational tasks I need to implement (monitoring/insights, patch management, change management, etc.), I need the common underpinning that supports all of them: an Azure Log Analytics workspace.

 

Azure Log Analytics Workspace Design Strategy

As the best practices suggest, a Log Analytics workspace (LAW) design should start with a single workspace to limit the complexity of managing and querying data across multiple workspaces. You can create additional LAWs when the need arises, but you should use the fewest LAWs necessary to meet your requirements.

 

Before you mention it: yes, I used to think that distributing LAWs in a hub-and-spoke model was the goal.  I WAS WRONG.  I admit it... With the developments of the last couple of years, and after conversations with the product group, I now see that the fewer, the better.

 

So, for our design, we will evaluate multiple criteria using a decision tree published by the Microsoft Sentinel product group. The tree targets Sentinel, but since I will be using my LAW for both SOC and non-SOC data, it’s a useful basis for my design.  I also reviewed the design criteria listed here.

 

I stepped through the decision tree to guide my design. Based on a careful evaluation of the 8 steps listed and a review of all the accompanying notes, I determined that a single LAW is appropriate for my environment. This may not work for you, as your requirements and compliance obligations may dictate otherwise. Still, the fewer LAWs the better: they are easier to manage and generally cheaper to operate.
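
To make the "start with one, split only when forced" idea concrete, here is a minimal Python sketch. The criteria it uses (tenants, data-residency regions, a SOC/non-SOC split) are illustrative assumptions of mine, not the 8 steps of the published decision tree, and the workspace names it produces are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Requirements:
    """Hypothetical inputs; these are NOT the published decision-tree steps."""
    tenants: set = field(default_factory=lambda: {"default"})
    # Regions where regulation forces data to stay in-region (assumption).
    residency_regions: set = field(default_factory=set)
    # True if SOC and non-SOC data must live in separate workspaces.
    separate_soc_data: bool = False


def plan_workspaces(req: Requirements) -> list:
    """Start from a single workspace and split only when a requirement forces it."""
    plan = []
    for tenant in sorted(req.tenants):
        # One workspace per residency boundary, or a single shared one.
        regions = sorted(req.residency_regions) or ["any"]
        for region in regions:
            if req.separate_soc_data:
                plan.append(f"{tenant}-{region}-soc")
                plan.append(f"{tenant}-{region}-ops")
            else:
                plan.append(f"{tenant}-{region}-shared")
    return plan


# With no forcing requirements the plan is a single workspace.
print(plan_workspaces(Requirements()))
```

With the default requirements the plan contains exactly one workspace, which mirrors the outcome I reached for my own environment.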

 

I created my LAW using the steps listed in the documentation.
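
If you would rather script the workspace than click through the portal, the sketch below builds an ARM template resource for a Log Analytics workspace as a plain Python dictionary. The workspace name and region are hypothetical, and the API version is an assumption you should check against the current documentation before deploying.

```python
import json


def workspace_resource(name: str, location: str, retention_days: int = 30) -> dict:
    """Build an ARM template resource describing a Log Analytics workspace."""
    return {
        "type": "Microsoft.OperationalInsights/workspaces",
        "apiVersion": "2022-10-01",  # assumption: verify the current version
        "name": name,
        "location": location,
        "properties": {
            "sku": {"name": "PerGB2018"},  # pay-as-you-go pricing tier
            "retentionInDays": retention_days,
        },
    }


# Hypothetical name and region, for illustration only.
resource = workspace_resource("law-hybrid-demo", "canadacentral")
print(json.dumps(resource, indent=2))
```

The printed JSON can be dropped into the "resources" array of an ARM template; this only illustrates the shape of the resource and is not the exact set of steps from the documentation.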

 

Monitoring the Performance and Health of My Servers

I will monitor my machines using VM insights. It’s part of Azure Monitor, and it monitors the performance and health (including running processes and dependencies on other resources) of all my servers:

  • Azure virtual machines
  • Azure virtual machine scale sets
  • Hybrid virtual machines connected with Azure Arc
  • Hybrid physical machines connected with Azure Arc
  • On-premises virtual machines
  • Virtual machines hosted in another cloud environment

 

Please review this doc to see if your OS is supported.

 

When I enable VM insights for my machines, two agents will be installed: the Azure Monitor agent and the Dependency agent.

Note:  I recorded a short video introducing Azure Monitor, which includes a quick demo of how to set up the agent from the Azure portal: ITOps Talk: Azure Monitor Agent.  Check it out.
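
As a quick reference, the sketch below maps an operating system to the VM extensions I would expect to find on a machine once VM insights is enabled. It assumes the two agents are the Azure Monitor agent and the Dependency agent, and that the Dependency agent is only needed when process and dependency collection is turned on; the function name is my own.

```python
def vm_insights_extensions(os_type: str, collect_dependencies: bool = True) -> list:
    """Return the VM extension type names expected for a given OS."""
    os_key = os_type.strip().lower()
    if os_key not in ("windows", "linux"):
        raise ValueError(f"unsupported OS type: {os_type!r}")
    suffix = "Windows" if os_key == "windows" else "Linux"
    # The Azure Monitor agent is always needed to collect data.
    extensions = [f"AzureMonitor{suffix}Agent"]
    if collect_dependencies:
        # The Dependency agent feeds the process and dependency (map) data.
        extensions.append(f"DependencyAgent{suffix}")
    return extensions


print(vm_insights_extensions("Windows"))
print(vm_insights_extensions("linux", collect_dependencies=False))
```

Comparing this list with the extensions actually shown on a machine is a simple way to confirm that onboarding completed.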

There are multiple ways to deploy VM insights across my environment.

For my demo environment, I used the portal. I navigated to Azure Monitor in the Azure portal and, in the Virtual Machines blade, clicked Configure Insights.

 

 

 

I can now see all my machines and whether Insights has been enabled. 

 

 

 

Clicking the Enable button for a machine I want to monitor opens the Monitoring Configuration page, where I can pick the agent and data collection rule to use.

 

 

 

In my case it was the first time I enabled a machine with VM insights, so I had no data collection rule. I clicked Create New, configured the rule with a meaningful name (the process adds MSVMI as a prefix to the name you enter), checked Processes and dependencies, then selected the LAW we created earlier as my target.
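
For readers who want to see what such a rule roughly contains, here is a sketch that builds a VM insights style data collection rule as a Python dictionary. The stream names, counter specifier, and extension name follow the shape I recall from the Azure Monitor documentation; treat them, the hyphen after the MSVMI prefix, and the placeholder workspace resource ID as assumptions to verify, not as an export of the rule the portal created for me.

```python
def vm_insights_dcr(name: str, location: str, workspace_id: str,
                    processes_and_dependencies: bool = True) -> dict:
    """Build a data collection rule body that targets one workspace."""
    dest = "VMInsightsPerf-Logs-Dest"  # arbitrary destination label
    data_sources = {
        "performanceCounters": [{
            "name": "VMInsightsPerfCounters",
            "streams": ["Microsoft-InsightsMetrics"],
            "samplingFrequencyInSeconds": 60,
            "counterSpecifiers": ["\\VmInsights\\DetailedMetrics"],
        }]
    }
    data_flows = [{"streams": ["Microsoft-InsightsMetrics"], "destinations": [dest]}]
    if processes_and_dependencies:
        # Map data is collected through the Dependency agent extension.
        data_sources["extensions"] = [{
            "name": "DependencyAgentDataSource",
            "extensionName": "DependencyAgent",
            "streams": ["Microsoft-ServiceMap"],
            "extensionSettings": {},
        }]
        data_flows.append({"streams": ["Microsoft-ServiceMap"], "destinations": [dest]})
    return {
        "name": f"MSVMI-{name}",  # assumption: prefix is joined with a hyphen
        "location": location,
        "properties": {
            "dataSources": data_sources,
            "destinations": {
                "logAnalytics": [{"name": dest, "workspaceResourceId": workspace_id}]
            },
            "dataFlows": data_flows,
        },
    }


# Placeholder resource ID; substitute the ID of your own workspace.
rule = vm_insights_dcr("hybrid-demo", "canadacentral", "<workspace-resource-id>")
print(rule["name"], [flow["streams"][0] for flow in rule["properties"]["dataFlows"]])
```

The point to notice is that the rule, not the agent, decides what is collected and where it is sent, which is why one rule can be reused for every machine I enable afterwards.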

 

 

 

Now, in my case I have a limited number of machines, so it’s no big deal to enable them one by one.  However, if I had many machines to enable, or if I wanted to ensure that any NEW machines in my environment were enabled automatically, I would go to the Other onboarding options and enable a policy to ensure all VMs and VM scale sets in my subscriptions and resource groups are configured for monitoring. You need the Owner or User Access Administrator role on the selected subscriptions.
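
Conceptually, the policy approach boils down to two checks: do I hold a role that lets me assign the policy, and which machines in scope still need to be enabled? The sketch below models only that logic in Python; it does not call Azure Policy, and the inventory and function names are hypothetical.

```python
# Roles that allow the policy assignment, as stated in the text above.
REQUIRED_ROLES = {"Owner", "User Access Administrator"}


def can_assign_policy(roles: set) -> bool:
    """True if at least one of the caller's roles allows the assignment."""
    return bool(REQUIRED_ROLES & set(roles))


def machines_to_remediate(inventory: dict) -> list:
    """Names of in-scope machines where monitoring is not yet enabled."""
    return sorted(name for name, enabled in inventory.items() if not enabled)


# Hypothetical inventory: machine name -> is VM insights enabled?
inventory = {"dc01": True, "web01": False, "arc-sql01": False}
if can_assign_policy({"Contributor", "Owner"}):
    print(machines_to_remediate(inventory))
```

In the real service the policy evaluates new and existing machines for me; the value of the sketch is just to show why the role requirement comes first.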

 

 

 

That’s it for today.  Now I wait for the data to be collected.
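
Once data starts to arrive, a first sanity check could be a log query against the workspace. The Python helper below only builds the query text; it assumes the performance data lands in the InsightsMetrics table with the Processor namespace and the UtilizationPercentage metric name, which is worth confirming in your own workspace.

```python
def cpu_utilization_query(lookback_hours: int = 1, bin_minutes: int = 5) -> str:
    """Build a Kusto query that averages CPU utilization per computer."""
    if lookback_hours <= 0 or bin_minutes <= 0:
        raise ValueError("lookback_hours and bin_minutes must be positive")
    return "\n".join([
        "InsightsMetrics",
        f"| where TimeGenerated > ago({lookback_hours}h)",
        '| where Namespace == "Processor" and Name == "UtilizationPercentage"',
        f"| summarize AvgCpu = avg(Val) by Computer, bin(TimeGenerated, {bin_minutes}m)",
    ])


print(cpu_utilization_query())
```

Pasting the printed query into the Logs blade of the workspace should return one series per machine if collection is working.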

 

To recap: with the help of the best practice documentation, I figured out which Log Analytics workspace architecture I needed, enabled monitoring by deploying the 2 required agents, and configured the data collection rule that dictates where those agents send the logs and metrics.

 

Next time, I’ll carry on with patch management.

 

For now, please let me know in the comments below if there are scenarios for hybrid management that you have questions about.

 

Cheers!

Pierre

 

Published Aug 11, 2022
  • WendellGillen
    Copper Contributor

    And yes, we can incorporate custom dashboarding into the Azure Portal and distribute it to teams that demand a solitary pane of glass for instant access to environmental health.

     

  • joes5
    Copper Contributor
    Thanks for sharing good information. But you also need to know I have a problem with my. Can you solve it?
  • alexhensen1195
    Copper Contributor

    Use of DCPROMO is still the proper way to remove a DC server in an Active Directory infrastructure. Certain situations, such as server crash or failure of the DCPROMO option, require manual removal of the DC from the system by cleaning up the server's metadata. The following detailed steps will help you accomplish this:

  • JonPaulBoyd
    Copper Contributor

    Thanks for sharing! I'd be interested in how you choose specific metrics for each of the resource types you want to alert on when caring about categories such as availability, latency, errors etc, and whether you have any preference over metrics/log queries/heartbeats etc when setting up your alerts.  Any tips other than reading the Google SRE books (original + workbook)  🙂

     

  • Hey James, Thanks for the kind words.  And yes, custom dashboarding is something we can integrate into the Azure Portal and share with teams that require that single pane of glass for quick visibility into the health of the environment.  I'll put that on my list of scenarios to explore and I'll post that in the near future.

    Thanks for the suggestion.

     

  • Great blogpost Pierre_Roman Thank you for Sharing with the Community.

     

    Question for Pierre and Community :

    What do you use for Hybrid Cloud (Azure + On-premises) Single Pane of Glass Dashboarding in Azure Monitor, to monitor all your Servers in One View, for example like this:

     When something is wrong with your server, it will be at the top like :

    Grey = Server is Shutdown

    Orange = Warning

    Red = Service down

    Is there an Azure Monitor dashboard like this for Central Monitoring?

     

    Best Regards,

    James