SCOM Failover Questions

I have very little SCOM experience and I am trying to understand how failover works across the SCOM servers we have. We had an outage that took down a number of servers and no alerts were sent, so I want to understand how it's supposed to work so I know what to fix.

 

Current setup

  • Two AD domains with a trust (in the process of migrating domains)
  • Two SCOM management servers in domain A
  • One SCOM gateway server in domain B that communicates with the management servers in domain A
  • One SCOM SQL database server in domain A
  • Servers in domain A are set to the two management servers (primary and failover)
  • Servers in domain B are set to the gateway server in domain B

Can someone help me understand the flow of alerts?

  • If management server 1 goes down, will management server 2 send email alerts?
  • If we lose the gateway server, will alerts from domain B still be sent?
  • If the database server goes down, will alerts be sent?

Hopefully this makes sense.  Thank you

1 Reply

Hi @brentmattson,


The failover for the SCOM management servers happens automatically, so if management server 1 goes down, agents will automatically start communicating with management server 2, and management server 2 will start sending notifications.

 

However, gateway servers work differently: you need to configure a failover gateway server for each agent, otherwise automatic failover will not happen. With a single gateway, as in your domain B, agents have nothing to fail over to if it goes down.

 

How to set the primary and failover management / gateway server:

# Gateway server reporting to "MS1" - failover to "MS2"
$PrimaryMS = Get-SCOMManagementServer | Where-Object {$_.Name -match "MS1"}
$FailoverMS = Get-SCOMManagementServer | Where-Object {$_.Name -match "MS2"}
$GatewayMS = Get-SCOMManagementServer | Where-Object {$_.IsGateway -eq $true}
Set-SCOMParentManagementServer -GatewayServer $GatewayMS -PrimaryServer $PrimaryMS
Set-SCOMParentManagementServer -GatewayServer $GatewayMS -FailoverServer $FailoverMS

# Agents reporting to "Gateway 1" - failover to "Gateway 2"
$PrimaryMS = Get-SCOMManagementServer | Where-Object {$_.Name -eq "GW1"}
$FailoverMS = Get-SCOMManagementServer | Where-Object {$_.Name -eq "GW2"}
$Agent = Get-SCOMAgent | Where-Object {$_.PrimaryManagementServerName -eq "GW1"}
Set-SCOMParentManagementServer -Agent $Agent -PrimaryServer $PrimaryMS
Set-SCOMParentManagementServer -Agent $Agent -FailoverServer $FailoverMS
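
Before changing anything, it may help to see what each agent and gateway is currently configured to fail over to. The following is a rough sketch, not a definitive script: it assumes it is run on a management server (or a machine with the Operations Manager console installed) so the OperationsManager module is available, and the column names in the output are just illustrative.

```powershell
# Sketch only: lists the primary and failover servers currently assigned.
# Assumes the OperationsManager module is installed and a connection to
# the management group can be established from this machine.
Import-Module OperationsManager

# Agents: primary server name plus any configured failover servers
Get-SCOMAgent | ForEach-Object {
    [pscustomobject]@{
        Agent    = $_.Name
        Primary  = $_.PrimaryManagementServerName
        Failover = ($_.GetFailoverManagementServers() | ForEach-Object { $_.Name }) -join ', '
    }
} | Format-Table -AutoSize

# Gateways: the management servers each gateway reports and fails over to
Get-SCOMManagementServer | Where-Object { $_.IsGateway -eq $true } | ForEach-Object {
    [pscustomobject]@{
        Gateway  = $_.Name
        Primary  = $_.GetPrimaryManagementServer().Name
        Failover = ($_.GetFailoverManagementServers() | ForEach-Object { $_.Name }) -join ', '
    }
} | Format-Table -AutoSize
```

An empty Failover column means that agent or gateway has no failover server assigned and will not fail over on its own.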

 

For the SCOM database and data warehouse, you should have a stretch cluster or use SQL Always On to provide high availability.

If you have only one server hosting the SCOM database and data warehouse, and it goes down, you will not receive any alerts until it is back online.

 


Best regards,
Leon