Azure VM monitor


Hi Guys,

 

We have around 500 Azure VMs in our subscription. What is the best way to monitor them for the alerts below?

  1. CPU utilization
  2. Logical Disk Utilization
  3. Memory Utilization
  4. VM Up/down alerts.

Can we use a Log Analytics workspace to monitor them (with queries), or is there any way we can use Azure Monitor with metrics? (I can't find guest metrics when creating new rules, even though I enabled them in Diagnostics settings and can see them in the Metrics tab.)

1 Reply

@roopesh_shetty 

 

For 1, 2 and 3, metric alerts are on the fastest pipeline, so you will see them quickly, provided you are able to set guest metric alerts in your case. In all three cases, though, being able to view and query the data in Log Analytics also has benefits. E.g. alerts only look at the past 24 hours, so you will miss patterns that occur over a longer period; a log query can go back and look at whatever data you have retained.

 

Go to Log Analytics and Run Query

 

Sample output:

 

| CounterName | TimeGenerated | avg_CounterValue |
| --- | --- | --- |
| % Processor Time | 2019-06-04T05:00:00Z | 60.91787148115488 |
| % Used Memory | 2019-06-04T05:00:00Z | 31.629809780248415 |
| % Free Space | 2019-06-04T05:00:00Z | 82.89545989470517 |
| % Free Space | 2019-06-04T02:00:00Z | 82.91151748380445 |
| % Processor Time | 2019-06-04T02:00:00Z | 59.627356628337175 |
| % Used Memory | 2019-06-04T02:00:00Z | 31.471773493138787 |
| % Processor Time | 2019-06-04T03:00:00Z | 59.52993177313613 |
| % Free Space | 2019-06-04T03:00:00Z | 82.8896254469986 |
| % Used Memory | 2019-06-04T03:00:00Z | 31.54788573413215 |
| % Processor Time | 2019-06-04T04:00:00Z | 59.35017016026048 |
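
Output of this shape could come from a query that averages the Perf counters per hour. The sketch below is a guess at what such a query might look like, not the query behind the link above. The counter names come from the sample output; the 7-day window, the thresholds, and the `breaches` helper are assumptions made up for illustration. The Python part only shows how rows like these could be checked against thresholds once exported.

```python
from typing import NamedTuple

# Hypothetical KQL, written to match the columns in the sample output.
PERF_QUERY = """
Perf
| where TimeGenerated > ago(7d)
| where CounterName in ("% Processor Time", "% Used Memory", "% Free Space")
| summarize avg(CounterValue) by CounterName, bin(TimeGenerated, 1h)
"""


class Row(NamedTuple):
    counter_name: str
    time_generated: str
    avg_counter_value: float


# Illustrative thresholds (assumptions): counter -> (limit, direction).
THRESHOLDS = {
    "% Processor Time": (90.0, "above"),
    "% Used Memory": (90.0, "above"),
    "% Free Space": (10.0, "below"),
}


def breaches(rows):
    """Return the rows whose hourly average crosses its threshold."""
    hits = []
    for row in rows:
        rule = THRESHOLDS.get(row.counter_name)
        if rule is None:
            continue  # counter we have no rule for
        limit, direction = rule
        value = row.avg_counter_value
        too_high = direction == "above" and value > limit
        too_low = direction == "below" and value < limit
        if too_high or too_low:
            hits.append(row)
    return hits


sample = [
    Row("% Processor Time", "2019-06-04T05:00:00Z", 60.91787148115488),
    Row("% Used Memory", "2019-06-04T05:00:00Z", 31.629809780248415),
    Row("% Free Space", "2019-06-04T05:00:00Z", 82.89545989470517),
]
print(breaches(sample))  # [] (none of the sample rows cross these thresholds)
```

Note that free space is checked in the opposite direction from CPU and memory: a low value is the problem there, not a high one.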

 

Neither method does #4 well. There are some data points in the logs, such as Heartbeat, but it's not a reliable up/down indicator on its own; e.g. the Heartbeat can fail while the server is still up. You can check certain Event IDs, but if you get a crash and the server never comes back up, you might not see what the last Event ID was.

 

Queries that can show Heartbeat are:

Go to Log Analytics and Run Query

Go to Log Analytics and Run Query
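
A common pattern for this kind of check is to take each computer's most recent Heartbeat record and flag the ones that have gone quiet. The sketch below is an assumption along those lines, not the content of the linked queries. The 5-minute window, the host names, and the `stale_computers` helper are hypothetical. Keep in mind that a missing heartbeat only means the agent stopped reporting; the VM itself may still be up.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical KQL: computers with no heartbeat in the last 5 minutes.
HEARTBEAT_QUERY = """
Heartbeat
| summarize LastHeartbeat = max(TimeGenerated) by Computer
| where LastHeartbeat < ago(5m)
"""


def stale_computers(last_heartbeats, now, max_age=timedelta(minutes=5)):
    """Return, sorted, the computers whose last heartbeat is older than max_age."""
    return sorted(
        name for name, seen in last_heartbeats.items() if now - seen > max_age
    )


# Made-up host names and timestamps, for illustration only.
now = datetime(2019, 6, 4, 5, 0, tzinfo=timezone.utc)
last_seen = {
    "vm-web-01": now - timedelta(minutes=1),
    "vm-db-01": now - timedelta(minutes=12),
}
print(stale_computers(last_seen, now))  # ['vm-db-01']
```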

More examples are shown in the Logs portal when you open a new Query tab.