Forum Discussion

brentmattson
Brass Contributor
Apr 24, 2019

SCOM Failover Questions

I have very little SCOM experience, and I am trying to understand how failover works across the few SCOM servers we have. We recently had an outage that took down a number of servers, and no alerts were sent. I want to understand how failover is supposed to work so I know what to fix.

 

Current setup

  • Two AD domains with a trust (in the process of migrating domains)
  • Two SCOM management servers in domain A
  • One SCOM gateway server in domain B that communicates with the management servers in domain A
  • One SCOM SQL database server in domain A
  • Servers in domain A are set to the two management servers (primary and failover)
  • Servers in domain B are set to the gateway server in domain B

Can someone help me understand the flow of alerts?

  • If management server 1 goes down, will management server 2 send email alerts?
  • If we lose the gateway server, will alerts from domain B still be sent?
  • If the database server goes down, will alerts still be sent?

Hopefully this makes sense. Thank you.

  • Leon Laude
    Iron Contributor

    Hi brentmattson,


    The failover for the SCOM management servers happens automatically, so if management server 1 goes down, agents will automatically start communicating with management server 2, and management server 2 will start sending notifications.

     

    Gateway servers work differently, however: you need to configure a failover gateway server for each agent, otherwise automatic failover will not happen.

     

    How to set the primary and failover management / gateway server:

    # Gateway server: set its primary and failover management servers
    $PrimaryMS = Get-SCOMManagementServer | Where-Object {$_.Name -match "MS1"}
    $FailoverMS = Get-SCOMManagementServer | Where-Object {$_.Name -match "MS2"}
    $GatewayMS = Get-SCOMManagementServer | Where-Object {$_.IsGateway -eq $true}
    Set-SCOMParentManagementServer -GatewayServer $GatewayMS -PrimaryServer $PrimaryMS
    Set-SCOMParentManagementServer -GatewayServer $GatewayMS -FailoverServer $FailoverMS

    # Agents reporting to "GW1": fail over to "GW2"
    $PrimaryGW = Get-SCOMManagementServer | Where-Object {$_.Name -eq "GW1"}
    $FailoverGW = Get-SCOMManagementServer | Where-Object {$_.Name -eq "GW2"}
    $Agent = Get-SCOMAgent | Where-Object {$_.PrimaryManagementServerName -eq "GW1"}
    Set-SCOMParentManagementServer -Agent $Agent -PrimaryServer $PrimaryGW
    Set-SCOMParentManagementServer -Agent $Agent -FailoverServer $FailoverGW
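
    Before changing anything, it can help to see what each agent is currently set to. The following is a minimal, read-only sketch, assuming it is run from the Operations Manager Shell (or a session where the OperationsManager module is available) with a connection to the management group. The column names (Agent, Primary, Failover) are only illustrative, and the output will depend on your environment.

    ```powershell
    # Assumption: the OperationsManager module is installed on this machine.
    Import-Module OperationsManager

    # Read-only report: one row per agent with its primary and failover servers.
    Get-SCOMAgent | ForEach-Object {
        $agent = $_
        # GetFailoverManagementServers() returns the failover servers configured for this agent.
        $failover = $agent.GetFailoverManagementServers() | ForEach-Object { $_.DisplayName }
        [PSCustomObject]@{
            Agent    = $agent.DisplayName
            Primary  = $agent.PrimaryManagementServerName
            Failover = if ($failover) { $failover -join ', ' } else { '(none)' }
        }
    } | Sort-Object Primary, Agent | Format-Table -AutoSize
    ```

    Agents that show "(none)" in the Failover column have no explicit failover server assigned. For agents that report to a gateway, that would typically mean they have nowhere to go if the gateway is lost.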

     

    For the SCOM database and data warehouse, you should have a stretch cluster or use SQL Always On to provide high availability.

    If you only have one SCOM database & data warehouse server, and it goes down, you will not receive any alerts.

     


    Best regards,
    Leon
