Skype for Business Blog

How Communicator Uses SDP and ICE To Establish a Media Channel

May 20, 2019

First published on TECHNET on Apr 22, 2009

This article describes the steps Office Communicator takes to establish a call between an OC client on a typical home network, connected to the Internet through a NAT router, and another OC client on the company's internal network. The user initiating the call is Alice, and the data and logs were collected from Alice's computer.

Author: Bernd Ott

Publication date: April 22, 2009

Product version: Office Communications Server 2007 R2

The main problem when establishing a media connection (audio or video) between Alice and Bob is finding a way media can travel through the intermediate network, without being blocked. This is where SDP, ICE, STUN and TURN come into the picture.

SDP

Office Communicator uses SDP (Session Description Protocol) to provide initialization parameters for the media stream in an audio or audio/video session. It is a proposed standard published by the IETF in several RFCs (for example, RFC 4566) and is entirely ASCII-based, which makes it easy to read.
Although SDP helps initialize media flow between two entities, every client describes only its own view of the connection. If you ever wonder which side of the media stream the advertised IP addresses in the SDP blob belong to, remember SDP as the "Self Description Protocol".

ICE

The Interactive Connectivity Establishment (ICE) Extensions protocol is used to establish media flow between two endpoints. In typical deployments, NATs or firewalls might exist between the two endpoints that are intended to communicate. NATs and firewalls are deployed to provide private address space and to "secure" the private networks to which the endpoints belong. This type of deployment blocks incoming traffic. If an endpoint advertises its local interface address, the remote endpoint might not be able to reach it. Advertising the address exposed by the NAT or firewall is not as straightforward, because the endpoint would first need to determine the externally routable address that the NAT created for its local interface address (the NAT-mapped address). Moreover, NATs and firewalls exhibit different behavior in the way they create NAT-mapped addresses. Section 5 of [IETFDRAFT-STUN-02] provides an overview of NAT types.
ICE provides a mechanism to assist media in traversing NATs without requiring the endpoints to be aware of their network topologies. ICE assists by identifying one or more transport addresses that the two endpoints can potentially use to communicate, and then determines which transport address is best for both endpoints to use for their media session.

Provisioning Process During OC Sign-in

Before going into the details of call establishment, I want to explain what happens during Office Communicator sign-in with regard to provisioning OC with A/V Edge server names and credentials. Here is a brief overview of what happens on the SIP channel during a Communicator sign-in:

After successfully registering, the OC client asks for in-band provisioning information. This is done in a SIP SUBSCRIBE transaction, asking for Content-Type "application/vnd-microsoft-roaming-provisioning-v2+xml" and requesting "ServerConfiguration".
SIP SUBSCRIBE for Content-Type:application/vnd-microsoft-roaming-provisioning-v2+xml
<provisioningGroupList>
<provisioningGroup name="ServerConfiguration"/>
...
</provisioningGroupList>

The OCS Frontend server returns the requested server configuration in a big XML blob. The information that interests us is enclosed in the "mrasUri" tag:
SIP 200 OK
<provisionGroupList>
<provisionGroup name="ServerConfiguration">
<mrasUri>sip:avauthentication.contoso.com@contoso.com;gruu;opaque=srvr:MRAS:2jRa2f1gbU</mrasUri>
...
</provisionGroup>
...
</provisionGroupList>
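
As a rough illustration of how a tool could pull the mrasUri value out of such a response body, here is a minimal Python sketch. The function name find_mras_uri and the trimmed SAMPLE document are assumptions made for this example; the real response contains many more elements and may carry XML namespaces, which is why the sketch compares local tag names only.

```python
import xml.etree.ElementTree as ET

# Trimmed-down, hypothetical copy of the provisioning response shown above.
SAMPLE = """<provisionGroupList>
  <provisionGroup name="ServerConfiguration">
    <mrasUri>sip:avauthentication.contoso.com@contoso.com;gruu;opaque=srvr:MRAS:2jRa2f1gbU</mrasUri>
  </provisionGroup>
</provisionGroupList>"""


def find_mras_uri(xml_text):
    """Return the text of the first mrasUri element, or None if absent."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Strip a "{namespace}" prefix, if any, before comparing names.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "mrasUri" and element.text:
            return element.text.strip()
    return None


print(find_mras_uri(SAMPLE))
```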

The GRUU in the mrasUri field tells us where we can obtain our credentials for the A/V Edge server service. Requesting those credentials is the next step in the provisioning process. Notice that Alice requests credentials that are valid for 480 minutes and indicates that she is located on the external network:
SIP SERVICE avauthentication.contoso.com@contoso.com;gruu;opaque=srvr:MRAS:2jRa2f1gbU
<request requestID="128584360" version="2.0" to="sip:avauthentication.contoso.com@contoso.com;gruu;opaque=srvr:MRAS:2jRa2f1gbU" from="sip:alice@contoso.com">
<credentialsRequest credentialsRequestID="128584360">
<identity>sip:alice@contoso.com</identity>
<location>internet</location>
<duration>480</duration>
</credentialsRequest>
</request>
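
To make the structure of this request easier to see, the following Python sketch rebuilds the same body with the standard library. It mirrors only the elements and attributes visible in the trace above; the function name build_credentials_request and its parameters are made up for this example, and a real client may add namespaces or further fields that are not shown here.

```python
import xml.etree.ElementTree as ET


def build_credentials_request(request_id, mras_uri, user_uri,
                              location="internet", duration_minutes=480):
    """Build a credentials request body with the fields seen in the trace."""
    request = ET.Element("request", {
        "requestID": str(request_id),
        "version": "2.0",
        "to": mras_uri,
        "from": user_uri,
    })
    credentials = ET.SubElement(
        request, "credentialsRequest",
        {"credentialsRequestID": str(request_id)})
    ET.SubElement(credentials, "identity").text = user_uri
    ET.SubElement(credentials, "location").text = location
    ET.SubElement(credentials, "duration").text = str(duration_minutes)
    return ET.tostring(request, encoding="unicode")


print(build_credentials_request(
    128584360,
    "sip:avauthentication.contoso.com@contoso.com;gruu;opaque=srvr:MRAS:2jRa2f1gbU",
    "sip:alice@contoso.com"))
```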

In return, the client gets all necessary information to connect and authenticate against the A/V Edge server for later usage:
SIP 200 OK
<response xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" requestID="128584360" version="2.0" serverVersion="2.0" to="sip:avauthentication.contoso.com@contoso.com;gruu;opaque=srvr:MRAS:2jRa2f1gbU" from="sip:alice@contoso.com" reasonPhrase="OK" xmlns="http://schemas.microsoft.com/2006/09/sip/mrasp">
<credentialsResponse credentialsRequestID="128584360">
<credentials>
<username>AgAAJAFTru4ByZVFx9H5de8Za9IwTrB=</username>
<password>I+hdiU3UffKdZVxy85tHmkTrx1g=</password>
<duration>480</duration>
</credentials>
<mediaRelayList>
<mediaRelay>
<location>internet</location>
<hostName>avext.contoso.com</hostName>
<udpPort>3478</udpPort>
<tcpPort>443</tcpPort>
</mediaRelay>
</mediaRelayList>
</credentialsResponse>
</response>
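
If you want to read these values programmatically, a sketch along the following lines could work. It assumes the default namespace is exactly http://schemas.microsoft.com/2006/09/sip/mrasp (written without surrounding spaces) and uses a shortened copy of the response; the function name parse_credentials_response and the returned dictionary layout are choices made for this example only.

```python
import xml.etree.ElementTree as ET

# Assumption: this is the default namespace of the response element.
NS = {"m": "http://schemas.microsoft.com/2006/09/sip/mrasp"}

# Shortened, hypothetical copy of the response shown above.
SAMPLE = """<response xmlns="http://schemas.microsoft.com/2006/09/sip/mrasp"
    requestID="128584360" version="2.0" reasonPhrase="OK">
  <credentialsResponse credentialsRequestID="128584360">
    <credentials>
      <username>AgAAJAFTru4ByZVFx9H5de8Za9IwTrB=</username>
      <password>I+hdiU3UffKdZVxy85tHmkTrx1g=</password>
      <duration>480</duration>
    </credentials>
    <mediaRelayList>
      <mediaRelay>
        <location>internet</location>
        <hostName>avext.contoso.com</hostName>
        <udpPort>3478</udpPort>
        <tcpPort>443</tcpPort>
      </mediaRelay>
    </mediaRelayList>
  </credentialsResponse>
</response>"""


def parse_credentials_response(xml_text):
    """Extract the credentials and the media relay list from the response."""
    root = ET.fromstring(xml_text)
    creds = root.find("m:credentialsResponse/m:credentials", NS)
    relays = []
    for relay in root.findall(
            "m:credentialsResponse/m:mediaRelayList/m:mediaRelay", NS):
        relays.append({
            "location": relay.findtext("m:location", namespaces=NS),
            "host": relay.findtext("m:hostName", namespaces=NS),
            "udp_port": int(relay.findtext("m:udpPort", namespaces=NS)),
            "tcp_port": int(relay.findtext("m:tcpPort", namespaces=NS)),
        })
    return {
        "username": creds.findtext("m:username", namespaces=NS),
        "password": creds.findtext("m:password", namespaces=NS),
        "duration_minutes": int(creds.findtext("m:duration", namespaces=NS)),
        "relays": relays,
    }


print(parse_credentials_response(SAMPLE)["relays"])
```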

Starting a PC2PC Call by obtaining the Candidate List

When Alice initiates the Communicator call to Bob, OC needs to determine, before sending out any SIP INVITE, which candidates Alice can send to Bob. This is where ICE, STUN and TURN come in. If you want to see more details of what is happening, you will have to use a network sniffing tool of your choice. Two very popular tools are Network Monitor 3 and Wireshark.

The candidate list includes the local list of IP address and port combinations (host candidates), a list of IP address and port combinations allocated by a NAT device (server reflexive candidates) and a list of TURN server IP address and port combinations (relayed candidates).

Here is a typical sequence of packets you can see while obtaining the candidate list. Please keep in mind that connection testing takes place for several different TCP and UDP port numbers. The tests for TCP and UDP candidates run in parallel, although the following pictures imply that they run serially.

The TURN Allocate Response messages from the A/V Edge server include all information Alice needs to determine whether she is sitting behind a NAT device and what IP/Protocol/Port combination to use for all candidates provided in the subsequent SIP/SDP offer.

Converting the XORMappedAddress Field

Here is an example of how the information in the TURN Allocate Response gets parsed:

The "MappedAddress" field contains the IP address and port combination of the A/V Edge server interface that Bob can send media to.

The XORMappedAddress field contains Alice's IP address and port combination from the A/V Edge server's point of view. This field gives Alice the information she needs to detect a NAT device and to learn what her internal IP address and port get mapped to on the external side.

In its current version, the Network Monitor parser does not convert the XORMappedAddress field, so you might want to convert its content manually.

To convert the IP address, you have to XOR it with the 32 most significant bits of the TransactionID field:

Converting the XORMappedAddress IP to a hex view: 0x75895AF6 (117.137.90.246)

XOR with the 32 most significant bits of TransactionID: 0x2112A442

Result: 0x549BFEB4

Convert to a human readable format: 84.155.254.180 (NAT-mapped IP address)

Alice's local (private) IP address of 192.168.0.103 gets mapped to 84.155.254.180 through intermediate NAT devices. For the process of obtaining the list of candidates, it does not matter how many NAT devices are between Alice and the A/V Edge server. Only the NAT device closest to the A/V Edge server is relevant for that process.

To convert the port number, you have to XOR it with the 16 most significant bits of the TransactionID field:

Converting the XORMappedAddress port to a hex view: 0xE263

XOR with the 16 most significant bits of TransactionID: 0x2112

Result: 0xC371

Convert to decimal format: 50033 (NAT-mapped port number)

You will see those IP addresses and ports later in the SIP/SDP Offer packet, sent to Bob.
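
The two manual conversions above can be reproduced with a few lines of Python. This is only a sketch: the function name decode_xor_mapped_address is made up for this example, and the transaction ID used below contains the four leading bytes from the trace followed by zero padding, because the remaining bytes are not shown in this article.

```python
import ipaddress


def decode_xor_mapped_address(xor_ip, xor_port, transaction_id):
    """Undo the XOR applied to the address and port.

    xor_ip is the 32-bit value and xor_port the 16-bit value taken from the
    XORMappedAddress field; transaction_id is the TransactionID as bytes.
    """
    ip_mask = int.from_bytes(transaction_id[:4], "big")    # 32 most significant bits
    port_mask = int.from_bytes(transaction_id[:2], "big")  # 16 most significant bits
    address = ipaddress.IPv4Address(xor_ip ^ ip_mask)
    return str(address), xor_port ^ port_mask


# Only the first four bytes are known from the trace; the rest is padding.
transaction_id = bytes.fromhex("2112A442") + bytes(12)
print(decode_xor_mapped_address(0x75895AF6, 0xE263, transaction_id))
# prints ('84.155.254.180', 50033)
```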

Negotiating the Candidate with Bob

Generally speaking, there is a lot of SIP and candidate testing traffic before the media channel is established. Here is a high-level overview of what happens when Alice and Bob are both using Office Communicator 2007 R2 clients. With MPOP or legacy clients, the sequence differs.

As a first step, Alice sends out a "SIP INVITE" that includes her list of candidates. With Office Communicator 2007 R2, Bob returns his list of candidates in the "SIP 183 SESSION PROGRESS". After Alice receives the SDP candidate list from Bob, she starts connection testing and builds a matrix of possible media channels to Bob (for more details, please check the "ICE Candidate Testing" section). The same process happens on Bob's side. Depending on the priority of the possible candidates, Alice sends a single SDP candidate in a second "SIP INVITE" and, as she is the controlling agent, asks Bob to use certain candidates from his list for this media session. Bob then double-checks the candidates Alice proposed and accepts them in his answer packet ("SIP 200 OK").
Once both parties have agreed on their IP, protocol and port combinations, they create the media channels and media flows between them. Depending on the intermediate network layout, this might be a direct connection (always preferred) or a relayed connection with the A/V Edge server as the data relay.

Where to find SDP information in a SIP Message Flow

The "SIP INVITE" contains an SDP block, also called the SDP Offer, which provides the list of all candidates Alice identified in the previous ICE tests.
Depending on which OC client version Bob is using, the SDP Answer can be found in different places:

- SIP 18x provisional response: only for OC 2007 R2, which supports Early Media

- SIP 200 OK: valid for all OC client versions

MPOP differences

If Bob is signed in to more than one OC client, you will see several "SIP 183 SESSION PROGRESS" replies. Those replies differ in the "SIP To" header: for every OC client, you can identify a different "epid" and "tag" field. In addition, every MPOP client sends its own list of candidates, and candidate testing is done for all of them. As soon as the client receives a media packet on one of the candidate protocol/port combinations, the remaining endpoints are dropped.

Differences with legacy clients

With legacy OC clients, there is no "SIP 183 SESSION PROGRESS" and "SIP PRACK" transaction. Bob returns his candidate list in the "SIP 200 OK", and candidate testing starts only after that. This is why the media channel is established later with OC 2007 than with OC 2007 R2, and why an initial greeting from Bob or Alice might get cut off.

OC 2007 R2 Additions

The candidate lists exchanged when a call is established between a remote party (on the Internet) and an internal party (on the corporate network) changed between Office Communicator 2007 and Office Communicator 2007 R2. We changed the SDP section because we had to solve issues with Multiple Points of Presence (MPOP) that were not covered by the previous version of ICE. In addition, we enhanced support for Early Media and added the new Application Sharing modality. For more details on what changed for media traversal, please check Alan Shen's post: http://www.unifysquare.com/blog/post/OCS-2007-R2-Whate28099s-new-for-Media-Traversal.aspx

Starting with OCS 2007 R2, you will see two nearly identical blocks of SDP information in SIP INVITE requests from the new Office Communicator 2007 R2. The ICE negotiation changed in ways that cannot be used with older versions of OCS. Therefore, Office Communicator has to offer two SDP versions during the initial session setup.

The content for legacy clients using ICEv6 (see IETFDRAFT-ICENAT-06) starts with a section, containing the "Content-Disposition" information of "ms-proxy-2007fallback":

Content-Type: application/sdp
Content-Transfer-Encoding: 7bit
Content-Disposition: session; handling=optional; ms-proxy-2007fallback

The content for the clients using the new ICEv19 (see IETFDRAFT-ICENAT-19) version starts with the following lines and does NOT include the ms-proxy-2007fallback attribute:

Content-Type: application/sdp
Content-Transfer-Encoding: 7bit
Content-Disposition: session; handling=optional

The "ms-proxy-2007fallback" parameter in the "Content-Disposition" header field is used as a hint to the Proxy Server to retry the SIP INVITE with only a single body when a "415 Unsupported Media Type" response is received, indicating the remote User Agent does not accept multipart SDP messages.
You will only see the multipart SDP information in the first SIP INVITE. All subsequent SIP messages containing SDP information will only use the SDP format suitable for the clients involved.
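
When reading a trace, the two SDP bodies can be told apart by inspecting the Content-Disposition header of each part. The following Python sketch shows one way to do that; the helper names parse_content_disposition and is_2007_fallback are invented for this example, and the parsing is deliberately simple (it does not handle quoted parameter values).

```python
def parse_content_disposition(value):
    """Split a Content-Disposition value into its type and parameters."""
    parts = [part.strip() for part in value.split(";")]
    disposition = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if not part:
            continue
        name, separator, param_value = part.partition("=")
        # Flag-style parameters such as ms-proxy-2007fallback have no value.
        params[name.strip().lower()] = param_value.strip() if separator else None
    return disposition, params


def is_2007_fallback(value):
    """True if the body part is marked as the ICEv6 fallback SDP."""
    _, params = parse_content_disposition(value)
    return "ms-proxy-2007fallback" in params


print(is_2007_fallback("session; handling=optional; ms-proxy-2007fallback"))
print(is_2007_fallback("session; handling=optional"))
```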

SDP Details

Here is an example for an OC 2007 R2 client running in a private network (behind a NAT device) and using IP 192.168.100.112 on its NIC.
The next lines are from the ICEv6 candidate list:

[---------]:[---------1----------] 2 [-----3------] [4] [-5-] [-------6-----] [-7-]
a=candidate:uuK9Gym3F0zReasv+FyKCM 1 UwaQkvk5hiWgVg UDP 0.850 192.168.100.112 50005
a=candidate:uuK9Gym3F0zReasv+FyKCM 2 UwaQkvk5hiWgVg UDP 0.850 192.168.100.112 50031
a=candidate:9/vLJjcL+aemcsR1AxpVM0 1 6Puckob7qP8GFA TCP 0.190 213.199.141.181 56909
a=candidate:9/vLJjcL+aemcsR1AxpVM0 2 6Puckob7qP8GFA TCP 0.190 213.199.141.181 56909
a=candidate:iHdvoVfXm2i2IPGmfO0xa4 1 UbNAQVuHoEoHVA UDP 0.490 213.199.141.181 52003
a=candidate:iHdvoVfXm2i2IPGmfO0xa4 2 UbNAQVuHoEoHVA UDP 0.490 213.199.141.181 50126
a=candidate:3ECLhrmJtmDK/j3FY4O5Tw 1 c6zoXvSFIqRfcw TCP 0.250 171.231.102.218 50025
a=candidate:3ECLhrmJtmDK/j3FY4O5Tw 2 c6zoXvSFIqRfcw TCP 0.250 171.231.102.218 50025
a=candidate:FKVarEmvn9yEvjD5xahFa0 1 qNTE/3CmryPpGA UDP 0.550 171.231.102.218 50015
a=candidate:FKVarEmvn9yEvjD5xahFa0 2 qNTE/3CmryPpGA UDP 0.550 171.231.102.218 50028

1.    This is the hash of a user name
2.    This is an indicator for (S)RTP or (S)RTCP
3.    This is a hash of the user password
4.    The protocol used (UDP or TCP) on this IP and port
5.    This is a weight, indicating which of the candidates is preferred over the others. Higher numbers are preferred over lower numbers, in case a connection can be established to this IP, port and protocol combination. Generally speaking, UDP gets preferred over TCP and local candidates are preferred over NATed candidates (STUN), which are preferred over relayed IP addresses (TURN).
6.    The IP address the second party can connect to
7.    The port number the second party can connect to. If you take a closer look at the port numbers used, you will see that the UDP ports for RTP and RTCP always differ, whereas RTP and RTCP over TCP are multiplexed on the same port number.
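
To illustrate the seven fields, here is a small Python sketch that splits an ICEv6 candidate line into named values and picks the candidate with the highest weight. The class name IceV6Candidate, its field names and the function parse_icev6_candidate are labels chosen for this example, based on the field descriptions above.

```python
from typing import NamedTuple


class IceV6Candidate(NamedTuple):
    username_hash: str   # field 1
    component: int       # field 2: 1 = (S)RTP, 2 = (S)RTCP
    password_hash: str   # field 3
    transport: str       # field 4: UDP or TCP
    weight: float        # field 5: higher is preferred
    address: str         # field 6
    port: int            # field 7


def parse_icev6_candidate(line):
    """Parse one ICEv6 'a=candidate:' line as shown in the list above."""
    prefix = "a=candidate:"
    if not line.startswith(prefix):
        raise ValueError("not a candidate line: " + line)
    fields = line[len(prefix):].split()
    if len(fields) != 7:
        raise ValueError("expected 7 fields, got %d" % len(fields))
    return IceV6Candidate(fields[0], int(fields[1]), fields[2], fields[3],
                          float(fields[4]), fields[5], int(fields[6]))


lines = [
    "a=candidate:uuK9Gym3F0zReasv+FyKCM 1 UwaQkvk5hiWgVg UDP 0.850 192.168.100.112 50005",
    "a=candidate:9/vLJjcL+aemcsR1AxpVM0 1 6Puckob7qP8GFA TCP 0.190 213.199.141.181 56909",
    "a=candidate:FKVarEmvn9yEvjD5xahFa0 1 qNTE/3CmryPpGA UDP 0.550 171.231.102.218 50015",
]
candidates = [parse_icev6_candidate(line) for line in lines]
best = max(candidates, key=lambda candidate: candidate.weight)
print(best.transport, best.address, best.port)
```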

The following lines are from the ICEv19 candidate list:

[---------]:1 2 [---3--] [----4---] [------5------] [-6-] [---7---] [---------------8---------------]
a=candidate:1 1 UDP      2130706431 192.168.100.112 50036 typ host
a=candidate:1 2 UDP      2130705918 192.168.100.112 50032 typ host
a=candidate:2 1 TCP-PASS    6556159 213.199.141.181 52899 typ relay raddr 213.199.141.181 rport 52899
a=candidate:2 2 TCP-PASS    6556158 213.199.141.181 52899 typ relay raddr 213.199.141.181 rport 52899
a=candidate:3 1 UDP        16648703 213.199.141.181 57309 typ relay raddr 213.199.141.181 rport 57309
a=candidate:3 2 UDP        16648702 213.199.141.181 54054 typ relay raddr 213.199.141.181 rport 54054
a=candidate:4 1 TCP-ACT     7076863 213.199.141.181 52899 typ relay raddr 213.199.141.181 rport 52899
a=candidate:4 2 TCP-ACT     7076350 213.199.141.181 52899 typ relay raddr 213.199.141.181 rport 52899
a=candidate:5 1 TCP-ACT  1684797951 171.231.102.218 50032 typ srflx raddr 192.168.100.112 rport 50032
a=candidate:5 2 TCP-ACT  1684797438 171.231.102.218 50032 typ srflx raddr 192.168.100.112 rport 50032
a=candidate:6 1 UDP      1694234623 171.231.102.218 50033 typ srflx raddr 192.168.100.112 rport 50033
a=candidate:6 2 UDP      1694234110 171.231.102.218 50039 typ srflx raddr 192.168.100.112 rport 50039

1.    The first column after "a=candidate" is called foundation. According to draft-ietf-mmusic-ice-19, the foundation is used to optimize ICE performance in the Frozen algorithm.
2.    This is the component ID, an indicator for (S)RTP or (S)RTCP
3.    This column is describing the protocol used. When the user agents perform address allocations to gather TCP-based candidates, two types of candidates can be obtained. These are active candidates (TCP-ACT) or passive candidates (TCP-PASS). An active candidate is one for which the agent will attempt to open an outbound connection, but will not receive incoming connection requests. A passive candidate is one for which the agent will receive incoming connection attempts, but not attempt a connection.
4.    This is the weight used to prioritize single candidates. Higher numbers are preferred over lower numbers, in case a connection can be established to this IP, port and protocol combination. Each candidate for a media stream must have a unique priority (positive integer up to 2^31-1).
5.    The IP address the second party can connect to.
6.    The port number the second party can connect to. The port for TCP RTP and RTCP gets multiplexed, whereas UDP ports for RTP and RTCP always differ.
7.    This is type information, describing the type of "advertised" address.
host    This is a local address
relay   This is the IP address from a relay (TURN) server
srflx    Server reflexive address is the NATed IP address
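8.    The related IP address and port combination (raddr and rport). In the example above, the server reflexive candidates list the local address and port they were mapped from, and the relay candidates repeat the relay address and port.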
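
The following Python sketch parses an ICEv19 candidate line into the eight fields described above and splits the priority into its parts. The split assumes the priority layout recommended by the ICE specification (type preference in the upper bits, then a 16-bit local preference, then 256 minus the component ID). Whether Office Communicator follows that layout in every case is an assumption, although the sample values above decode consistently. The function names are invented for this example.

```python
def parse_icev19_candidate(line):
    """Parse one ICEv19 'a=candidate:' line into a dictionary."""
    prefix = "a=candidate:"
    if not line.startswith(prefix):
        raise ValueError("not a candidate line: " + line)
    fields = line[len(prefix):].split()
    # The trailing tokens come in name/value pairs: typ, raddr, rport.
    extras = dict(zip(fields[6::2], fields[7::2]))
    return {
        "foundation": fields[0],
        "component": int(fields[1]),
        "transport": fields[2],
        "priority": int(fields[3]),
        "address": fields[4],
        "port": int(fields[5]),
        "type": extras.get("typ"),
        "raddr": extras.get("raddr"),
        "rport": int(extras["rport"]) if "rport" in extras else None,
    }


def split_priority(priority):
    """Split a priority, assuming the layout from the ICE specification."""
    return {
        "type_preference": priority >> 24,
        "local_preference": (priority >> 8) & 0xFFFF,
        "component": 256 - (priority & 0xFF),
    }


candidate = parse_icev19_candidate(
    "a=candidate:6 1 UDP      1694234623 171.231.102.218 50033 "
    "typ srflx raddr 192.168.100.112 rport 50033")
print(candidate["type"], split_priority(candidate["priority"]))
```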

ICE Candidate Testing

After Alice receives Bob's list of candidates, she starts building candidate pairs. The candidate pairs are ordered based on the priorities of their candidates on both sides. This ensures that both peers use the same list of candidate pairs in the same order.
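
As an illustration of why both peers end up with the same order, the sketch below uses the pair priority formula from the ICE specification, where G is the priority of the controlling agent's candidate and D the priority of the controlled agent's candidate: 2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D, else 0). That Office Communicator uses exactly this formula is an assumption here. Alice's priorities are taken from the ICEv19 list above, Bob's are hypothetical, and the function names are made up for this example.

```python
from itertools import product


def pair_priority(controlling, controlled):
    """Pair priority as defined in the ICE specification."""
    g, d = controlling, controlled
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def ordered_pairs(local, remote, local_is_controlling):
    """Return (pair priority, local, remote) tuples, highest priority first."""
    pairs = []
    for local_prio, remote_prio in product(local, remote):
        if local_is_controlling:
            g, d = local_prio, remote_prio
        else:
            g, d = remote_prio, local_prio
        pairs.append((pair_priority(g, d), local_prio, remote_prio))
    return sorted(pairs, reverse=True)


alice = [2130706431, 1694234623, 16648703]  # host, srflx, relay (from above)
bob = [2130706431, 16648703]                # hypothetical values

alice_view = [p for p, _, _ in ordered_pairs(alice, bob, True)]
bob_view = [p for p, _, _ in ordered_pairs(bob, alice, False)]
print(alice_view == bob_view)
```

Because G and D are tied to the agent roles rather than to "local" and "remote", both sides compute identical values for the same pair.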

In addition, the foundation from the SDP Offer and Answer is used to group pairs with similar network conditions. Both candidates in a pair must have the same protocol type; mixing TCP and UDP candidates is not allowed. Candidate pairs with the same foundation are ordered by their priority, and all but the pair with the highest priority are set to the frozen state. This mainly reduces the number of connectivity tests. If, for instance, the connectivity test for the UDP/(S)RTP host candidate fails, the UDP/(S)RTCP host candidate will most likely fail too, so that test is omitted. If the connectivity test for a candidate pair succeeds, its state is set to "Succeeded". All other candidate pairs with the same foundation are then unfrozen and start their "STUN Binding Requests" for connectivity checking.
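
The freezing rule described above can be sketched in Python as follows. This is a simplified model of the description in this article, not of the full algorithm in the ICE draft. The state name "Waiting" for an unfrozen pair that has not been tested yet, the pair foundations such as "1:1" and the function names are assumptions made for this example.

```python
def initial_pair_states(pairs):
    """Per foundation, leave the highest-priority pair unfrozen."""
    best_by_foundation = {}
    for index, pair in enumerate(pairs):
        current = best_by_foundation.get(pair["foundation"])
        if current is None or pair["priority"] > pairs[current]["priority"]:
            best_by_foundation[pair["foundation"]] = index
    return [
        "Waiting" if best_by_foundation[pair["foundation"]] == index else "Frozen"
        for index, pair in enumerate(pairs)
    ]


def unfreeze_after_success(pairs, states, succeeded_index):
    """Mark one pair as succeeded and unfreeze pairs sharing its foundation."""
    states = list(states)
    states[succeeded_index] = "Succeeded"
    foundation = pairs[succeeded_index]["foundation"]
    for index, pair in enumerate(pairs):
        if pair["foundation"] == foundation and states[index] == "Frozen":
            states[index] = "Waiting"
    return states


# Hypothetical pairs: two share a foundation, one stands alone.
pairs = [
    {"foundation": "1:1", "priority": 300},
    {"foundation": "1:1", "priority": 200},
    {"foundation": "3:2", "priority": 100},
]
states = initial_pair_states(pairs)
print(states)
print(unfreeze_after_success(pairs, states, 0))
```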

The connectivity test for each unfrozen candidate pair is done through a "STUN Binding Request", sent from the local candidate's endpoint to the matching remote candidate. These checks are called ordinary checks. As soon as the peer receives a "STUN Binding Request", it responds with the corresponding "STUN Binding Response" and initiates its own "STUN Binding Request" on the same IP/protocol/port combination. This is called a triggered check.

Alice serves as the controlling agent, because she initiated the call. This means that Alice is responsible for selecting the final candidates for media flow. Bob, as the called user, serves as the controlled agent. Bob is responsible for validating the candidates from the final offer. If his list of candidate pairs does not contain the final candidates, the call must fail. If there is a matching candidate pair, Bob sends a final answer to enable media flow.

Here is an example of what the candidate and remote-candidates attributes might look like in a final offer:

a=candidate:3 1 UDP 16648703 213.199.141.181 57309 typ relay raddr 213.199.141.181 rport 57309
a=candidate:3 2 UDP 16648702 213.199.141.181 54054 typ relay raddr 213.199.141.181 rport 54054
a=remote-candidates:1 213.199.141.81 51721 2 213.199.141.81 58975
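
The remote-candidates attribute is a flat list of component ID, IP address and port triples. A short Python sketch for reading it could look like this; the function name parse_remote_candidates is made up for this example.

```python
def parse_remote_candidates(line):
    """Return a list of (component ID, address, port) tuples."""
    prefix = "a=remote-candidates:"
    if not line.startswith(prefix):
        raise ValueError("not a remote-candidates line: " + line)
    tokens = line[len(prefix):].split()
    if len(tokens) % 3 != 0:
        raise ValueError("expected component/address/port triples")
    return [
        (int(tokens[i]), tokens[i + 1], int(tokens[i + 2]))
        for i in range(0, len(tokens), 3)
    ]


print(parse_remote_candidates(
    "a=remote-candidates:1 213.199.141.81 51721 2 213.199.141.81 58975"))
```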

I hope this explains some of the details of establishing a media session between clients using SDP and ICE.

This insight into Office Communications Server 2007 R2 was created as part of Bernd Ott’s participation in the Microsoft Certified Master program.

The Microsoft Certified Master Program: The Microsoft Certified Master: Microsoft Office Communications Server 2007 program provides the most in-depth and comprehensive training available today for Office Communications Server 2007. This three-week training program is delivered by recognized experts from Microsoft and Microsoft partner organizations.