SOLVED

Direct Routing failover behaviour and lack of recovery from MS Teams side


Hi,

 

We have Direct Routing working fine towards two separate SBCs (each with its own wildcard certificate), and calls from MS Teams were routing to both of them randomly for the first test customer tenant. All good.

 

sbc1.domain.nz

sbc2.domain.nz

 

We then wanted to understand the failover performance and recovery handling if we were to lose a regional SBC and so we performed the following test.

 

- We made calls from MS Teams to the PSTN, and calls routed towards both SBC1 and SBC2 randomly, as both SBCs are in the customer tenant voice route defined below.

New-CsOnlineVoiceRoute -Identity "unrestricted.voiceroute" -NumberPattern ".*" -OnlinePstnGatewayList cust-tenant1.sbc1.domain.nz, cust-tenant1.sbc2.domain.nz -Priority 1 -OnlinePstnUsages "NZ.PU"

 

- We then shut down the SBC public interface for sbc1.domain.nz, and calls were routed from MS Teams to the sbc2.domain.nz interface only, as expected.

- We then shut down the SBC public interface for sbc2.domain.nz, and calls continued to try to route towards sbc2.domain.nz, even though OPTIONS from MS Teams towards this interface had been failing for a while.

- We then re-activated the SBC public interface for sbc1.domain.nz. OPTIONS sent from sbc1 to MS Teams were acknowledged OK. However, MS Teams never started sending OPTIONS to sbc1.

- We waited for 30 minutes and there was no change.

- We then re-activated the SBC public interface for sbc2.domain.nz. OPTIONS were quickly established in both directions, and calls from MS Teams towards sbc2 started working again quite quickly.

- It has now been 4 days and I am still unable to get MS Teams to send OPTIONS messages to sbc1. The MS Teams admin portal shows the "SIP Options Status" for sbc1 as "Warning".

 

"There is a problem with the SIP OPTIONS.
The Session Border Controller exists in our database (your administrator created it using the command New-CSOnlinePSTNGateway). But we have difficulties determining SIP Options status. Please check in 15 minutes."
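
For anyone reproducing this test, the configuration described above can also be inspected from PowerShell rather than the admin portal. The following is a minimal sketch, assuming the MicrosoftTeams module is installed, the session is already authenticated, and the gateway was registered under the FQDN sbc1.domain.nz (the exact FQDN used for the gateway object is an assumption here). It only shows the stored configuration; it does not report the live SIP OPTIONS state that the portal displays.

```powershell
# Assumes: Install-Module MicrosoftTeams has been run and the session is signed in.
Connect-MicrosoftTeams

# Show which gateways the voice route references.
Get-CsOnlineVoiceRoute -Identity "unrestricted.voiceroute" |
    Select-Object Identity, Priority, NumberPattern, OnlinePstnUsages, OnlinePstnGatewayList

# Show the stored settings for the gateway that is not recovering.
# "sbc1.domain.nz" is an assumed identity; substitute the FQDN used when the gateway was created.
Get-CsOnlinePSTNGateway -Identity "sbc1.domain.nz" |
    Select-Object Identity, Fqdn, Enabled, SendSipOptions, SipSignalingPort, FailoverTimeSeconds
```

If Enabled or SendSipOptions comes back as False, that would be worth correcting before looking further.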
 
I have tried the following to recover this situation.
1. Deleted sbc1 from the MS Teams admin portal, left it deleted for 24 hours, and recreated it. No difference.
2. Changed the SIP port used for sbc1. No difference.
 
My observations are as follows :-
1. MS Teams appears to try to recover SIP OPTIONS only for the "last" SBC that was working.
2. Other SBCs that failed before the last working SBC are not recovered from the MS Teams side.
3. Don't perform this kind of controlled failure when everything is working fine .. you will regret it :)
 
If anyone has any advice on how we can get sbc1 working normally again (i.e. get MS Teams to send OPTIONS towards it), your help would be appreciated.
 
Thanks
David
3 Replies

Hi @David_NOW_NZ
I'm also operating two geographically dispersed SBCs and can turn off one of the two for maintenance without any issues. After the maintenance is done, it gets back to work.

I'd try to disable and re-enable some basic sbc settings.

Did you also try to disable and enable it again in the Teams Admin Center after you re-created sbc1?


 

Did you also check the SIP options settings in the sbc1 settings in the Teams Admin Center? Maybe try to disable and re-enable it?
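
The same disable/re-enable steps can be sketched in PowerShell. This assumes an authenticated MicrosoftTeams session and that the gateway identity is sbc1.domain.nz (an assumption; use the FQDN the gateway was created with). How long to wait between the two steps is not something I can state, so the pause is left to the operator.

```powershell
# Assumed gateway identity; replace with the FQDN registered in the tenant.
$sbc = "sbc1.domain.nz"

# Toggle the gateway itself off and back on.
Set-CsOnlinePSTNGateway -Identity $sbc -Enabled $false
# ... wait for a while before re-enabling ...
Set-CsOnlinePSTNGateway -Identity $sbc -Enabled $true

# Toggle the "Send SIP options" setting off and back on.
Set-CsOnlinePSTNGateway -Identity $sbc -SendSipOptions $false
Set-CsOnlinePSTNGateway -Identity $sbc -SendSipOptions $true

# Confirm the stored values afterwards.
Get-CsOnlinePSTNGateway -Identity $sbc | Select-Object Identity, Enabled, SendSipOptions
```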


I'd also re-check the firewall rules, if there is a firewall in front of the public interface of sbc1, just to be sure that all the (new) Teams Phone System signaling IPs can reach the public interface IP of sbc1. See: Plan Direct Routing - Microsoft Teams | Microsoft Docs



best response confirmed by ThereseSolimeno (Microsoft)
Solution

Hi @Erik365Online 

 

Thanks for your suggestions.

 

It seems to have sprung back to life today after I deleted the SBC and recreated it (yesterday evening) using the PowerShell commands below. I'm not sure how long it took to come back to life after this change, but I will look back through the SBC verbose logs to try to determine that. It didn't come back straight away.

 

Remove-CsOnlinePSTNGateway

New-CsOnlinePSTNGateway
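
The exact parameters used were not recorded above, so the following is only a sketch of what the delete and recreate might look like. The FQDN sbc1.domain.nz as the gateway identity and the signaling port 5061 are assumptions, not the values actually used. Removal may also be refused while a voice route still references the gateway, in which case the route would need to be updated first.

```powershell
# Assumed values; replace with the FQDN and SIP signaling port of the real SBC.
$sbcFqdn = "sbc1.domain.nz"
$sipPort = 5061   # placeholder, not taken from the original post

# Delete the existing gateway object.
Remove-CsOnlinePSTNGateway -Identity $sbcFqdn

# Recreate it with SIP OPTIONS enabled.
New-CsOnlinePSTNGateway -Fqdn $sbcFqdn -SipSignalingPort $sipPort -Enabled $true -SendSipOptions $true

# Confirm the new object exists with the expected settings.
Get-CsOnlinePSTNGateway -Identity $sbcFqdn |
    Select-Object Identity, Fqdn, Enabled, SendSipOptions, SipSignalingPort
```

As noted above, recovery was not immediate after the recreate, so allow some time before judging the result.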

 

Prior to this, I had tried disabling the SBC in the admin portal for 12 hours and then re-enabling it, but this did not help. So if you (or anyone) get into this situation, deleting and recreating the SBC using PowerShell appears to be the way to fix it.

 

Cheers

David

 

@David_NOW_NZ 

You are welcome.

Good to hear that you could fix it by re-creating it via PowerShell.
