VMSS based CMGs and the Cloud heavy ConfigMgr – Part 2

This post is a successor to our previous blog post describing the changes to Cloud Management Gateway (CMG) infrastructure in the internal Microsoft environment to align with Zero Trust Networking (ZTN) standards. This post covers the management and migration of Cloud Service Classic CMGs to Virtual Machine Scale Sets (VMSS), aka CMGv2.

 

Case for VMSS

While the migration to CMG-first communications (described in the previous post) was underway, we were simultaneously piloting early releases of VMSS based CMGs. Within our team and the broader organization, there was a push to migrate away from Cloud Service Classic for multiple reasons. In general, Cloud Service Classic is based on the legacy Azure Service Manager model, and moving to a resource fully built on the Azure Resource Manager (ARM) model brings parity with other Azure resources. For instance, with native support for Autoscale in VMSS, we could make the case to our colleagues on the dev team for future integration with ConfigMgr. The default Cloud Service Classic CMG implementation was also based on the older 2012 OS Family model. One of our team's core goals is to "Stay Current," so it was a no-brainer to migrate to the latest VMSS based CMGs when the opportunity arose.

 

Migration to VMSS CMGs

As we were already piloting VMSS CMGs, we did not perform any conversions from Classic to VMSS, though we did convert to larger SKUs as they became available. Some in-house monitoring tools are required by policy to be running on VM instances, so we generally spun up new VMSS CMGs, confirmed policy compliance, and switched existing Proxy Connection Points to the new CMGs in a gradual manner. Note: environmental conditions (devices being offline, the speed at which connectors are moved, the inability of some devices to enumerate AD, etc.) may cause some devices to miss these changes and continue to access old CMG URLs. We leveraged a Proactive Remediation script in Intune to detect drift and set the appropriate CMG values for the CMGFQDNs key under HKLM\SOFTWARE\Microsoft\CCM (a sketch of that detection logic follows the queries below). We used the high-level SQL queries below for tracking progress:

 

--Clients by CMG
SELECT SUBSTRING(bgb.AccessMP, 1, CHARINDEX('/', bgb.AccessMP, 1)-1) as 'CMG',  
COUNT(bgb.ResourceID) as 'Count', 
SUM(bgb.OnlineStatus) as 'Online'
FROM BGB_ResStatus bgb WITH (NOLOCK)
WHERE bgb.AccessMP LIKE '%ccm_proxy%'
GROUP BY SUBSTRING(bgb.AccessMP, 1, CHARINDEX('/', bgb.AccessMP, 1)-1)
ORDER BY 'Count' DESC

--Per MP Client Count
SELECT b.[DBID] AS SiteName
       ,b.ServerName AS MPName
       ,COUNT(CASE
                      WHEN PATINDEX('%CCM_Proxy%', a.AccessMP) > 0
                             THEN a.ResourceID
                      END) AS CMGClientCount
       ,SUM(CASE
                      WHEN PATINDEX('%CCM_Proxy%', a.AccessMP) > 0
                             THEN a.OnlineStatus
                      END) AS CMGOnlineClients
       ,COUNT(CASE
                      WHEN PATINDEX('%CCM_Proxy%', a.AccessMP) = 0
                             THEN a.ResourceID
                      END) AS IntranetClientCount
       ,SUM(CASE
                      WHEN PATINDEX('%CCM_Proxy%', a.AccessMP) = 0
                             THEN a.OnlineStatus
                      END) AS IntranetOnlineClients
FROM dbo.Bgb_ResStatus a WITH (NOLOCK)
INNER JOIN dbo.BGB_Server b WITH (NOLOCK)
       ON a.ServerID = b.ServerID
GROUP BY b.[DBID] ,b.ServerName
ORDER BY b.[DBID] ,b.ServerName
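
As promised above, here is a minimal sketch of the Proactive Remediation detection logic. The expected value below is a placeholder (substitute the URL(s) of your current CMGs), and a paired remediation script would write the corrected value back:

#Proactive Remediation detection - minimal sketch; $expected is a placeholder for your current CMG URL(s)
$expected = 'https://MYVMSSCMG.CONTOSO.COM'
$current  = (Get-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\CCM' -Name 'CMGFQDNs' -ErrorAction SilentlyContinue).CMGFQDNs

if ($current -eq $expected) {
    exit 0    #compliant - no remediation needed
}
else {
    Write-Output "CMGFQDNs drift detected: $current"
    exit 1    #non-compliant - triggers the paired remediation script
}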

 

 

Monitoring

When only a portion of our client traffic was using CMGs (prior to moving to CMG-first), we did not employ extensive monitoring beyond some general tracking of CMG client counts in operational reports and health monitoring via component status messages for CloudMgr. With the shift to CMG-first for all clients, our team added traffic trends and current connections to our Power BI based reporting:

[Screenshot: Power BI report of CMG traffic trends]

[Screenshot: Power BI report of current CMG connections]

Additionally, with all clients using CMGs 24x7, we needed to validate whether there were any availability issues or excess queuing on the CMG channels. For example, CCMSetup is not a happy camper when timeouts greater than 30 seconds occur during client installs. To provide high-level insights into availability and response times, we resorted to a URL monitor in Azure Monitor. We ran the synthetic URL below against all CMGs and were able to validate high-level response times from different Az regions and, in the case of an exception, drill into the number of failures and the IIS return code (500/503, etc.) as well.

https://CMGFQDN/CCM_Proxy_ServerAuth/AADAuthInfo

[Screenshot: Azure Monitor URL test availability and response times by region]

[Screenshot: Azure Monitor failure drill-down with HTTP return codes]
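
Outside of Azure Monitor, the same endpoint also lends itself to a quick ad-hoc check from PowerShell. A minimal sketch (the CMG FQDNs below are placeholders) that reports response time per CMG and surfaces the error on failure:

#ad-hoc probe of the CMG auth endpoint - FQDNs below are placeholders
$cmgs = 'MYVMSSCMG1.CONTOSO.COM', 'MYVMSSCMG2.CONTOSO.COM'

foreach ($cmg in $cmgs) {
    $url = "https://$cmg/CCM_Proxy_ServerAuth/AADAuthInfo"
    try {
        $elapsed = Measure-Command { Invoke-WebRequest -Uri $url -UseBasicParsing -TimeoutSec 30 | Out-Null }
        Write-Output ("{0}: OK in {1} ms" -f $cmg, [int]$elapsed.TotalMilliseconds)
    }
    catch {
        Write-Output ("{0}: FAILED - {1}" -f $cmg, $_.Exception.Message)
    }
}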

Azure Monitor also includes a Downtime and Outage workbook which can be used to analyze potential outages, downtime and failures based on location.

We later relied on our in-house monitoring tools running on the VM instances to natively monitor and alert on conditions such as high CPU and IIS errors. We have also filed change requests advocating for additional in-console health monitoring for customers that may pursue CMG-first implementations.


 

Troubleshooting

In troubleshooting scenarios, at times nothing beats the great logging already available in the CMG logs on the Service Connection Point role in your site. For example, for a quick check on a particular VM instance, you can pull up one of the CMG-cmgname-vminstance0001-CMGService.log files and filter for "Summary" to get some quick health insights:

[Screenshot: CMGService.log entries filtered for "Summary"]
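
From PowerShell on the site server, the same filter is a one-liner (the log path below is an assumption; adjust it to your Configuration Manager install):

#filter CMG service logs for "Summary" entries - log path is an assumption
Select-String -Path 'C:\Program Files\Microsoft Configuration Manager\Logs\CMG-*-CMGService.log' -Pattern 'Summary' |
    Select-Object -Last 20 -ExpandProperty Line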

To quickly examine a Proxy Connector, you can also inspect the SMS_Cloud_ProxyConnector.log files and filter for "ReportTrafficData - state message to send"; among all the other per-endpoint details, the very first XML tag will have the MaxConcurrentRequests:

<ProxyTrafficStateDetails ServerName="abc.def.com" StartTime="12/01/2021 03:46:14" EndTime="12/01/2021 03:51:15" MaxConcurrentRequests="5073">
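
To pull just that value out across entries, a quick regex over the log works too (again, the log path is an assumption):

#extract MaxConcurrentRequests from connector traffic entries - log path is an assumption
Select-String -Path 'C:\Program Files\Microsoft Configuration Manager\Logs\SMS_Cloud_ProxyConnector.log' -Pattern 'MaxConcurrentRequests="(\d+)"' |
    ForEach-Object { $_.Matches[0].Groups[1].Value }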

After upgrades or certificate update operations, or if individual VM instances have needed a reboot, we've used the Az REST API or PowerShell commands to inspect the overall config and instances of the VMSS. In PowerShell, this is done via the Get-AzVmss and Get-AzVmssVM cmdlets:

 

#overall VMSS config
Get-AzVmss -ResourceGroupName MYVMSSCMGRG -VMScaleSetName MYVMSSCMG

#per VM provisioning state
Get-AzVmssVM -ResourceGroupName MYVMSSCMGRG -VMScaleSetName MYVMSSCMG -InstanceView

#specific VM instance config (with/without the -InstanceView parameter)
Get-AzVmssVM -ResourceGroupName MYVMSSCMGRG -VMScaleSetName MYVMSSCMG -InstanceId 1
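
And for the occasional instance that does need a reboot, Restart-AzVmss can target it by instance ID (same placeholder resource names as above):

#restart a specific VM instance
Restart-AzVmss -ResourceGroupName MYVMSSCMGRG -VMScaleSetName MYVMSSCMG -InstanceId 1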

 

 

Conclusion

We hope the posts in this series are useful in your own CMG implementations and demonstrate ways of pursuing CMG-first communication with the possibility of simplified infrastructure.
