Session Initiation Protocol (SIP) is one of the most common protocols used in VoIP technology. It is a signaling protocol for voice and video applications. In this post we look at the elements involved in a SIP call flow ladder and what the various fields mean.
A typical SIP call flow passes through several stages. INVITE, 100 Trying, 180 Ringing, 200 OK, and ACK set the call up, after which voice/media connects and RTP begins to flow. BYE ends the call.
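As a rough illustration, the ladder for a basic call between two endpoints can be printed with a few lines of Python. The endpoint names ("caller", "callee") and the exact message list are assumptions made for this sketch, including the ACK that confirms the 200 OK and the 200 OK that answers the BYE. A real trace would also carry headers and SDP bodies.

```python
# Illustrative sketch only: endpoint names and the message list are assumptions.
# Media (RTP) would flow between the ACK and the BYE.
CALL_FLOW = [
    ("caller", "callee", "INVITE"),
    ("callee", "caller", "100 Trying"),
    ("callee", "caller", "180 Ringing"),
    ("callee", "caller", "200 OK"),
    ("caller", "callee", "ACK"),
    ("caller", "callee", "BYE"),
    ("callee", "caller", "200 OK"),
]


def render_ladder(flow, left="caller", width=24):
    """Return one text line per message, with an arrow showing its direction."""
    lines = []
    for sender, _receiver, message in flow:
        label = f" {message} ".center(width, "-")
        if sender == left:
            lines.append(f"|{label}>|")   # left endpoint to right endpoint
        else:
            lines.append(f"|<{label}|")   # right endpoint to left endpoint
    return lines


if __name__ == "__main__":
    print("\n".join(render_ladder(CALL_FLOW)))
```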
Codec in SIP
SIP is considered an umbrella protocol. This means that it leverages other standards-based protocols to provide a wide range of features. One such protocol is Session Description Protocol (SDP). SDP is a format for describing streaming media initialization parameters in an ASCII string. SDP is similar in function to H.245 in the H.323 protocol.
SIP leverages SDP to negotiate the type of media (audio or video), the transport (RTP over UDP, and the ports to use), and the format of the media (audio and video codecs).
SIP uses the Offer/Answer model for establishing SIP sessions. An offer is contained in the SDP fields that are sent in the body of a SIP message. The offer defines the media characteristics that are supported by the device (media streams, codecs, directional attributes, IP address, and ports to use).
The device receiving the offer sends an answer in the SDP fields of its SIP response. The answer lists the matching media streams and codecs, indicates whether each stream is accepted, and gives the IP address and port on which the device wants to receive the media streams.
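The codec-matching step of the Offer/Answer exchange can be sketched as follows. This is a minimal illustration, not a full implementation: the function names and the port number are hypothetical, and only the SDP m= line is produced. The payload type numbers are the static assignments for PCMU (G.711 mu-law), PCMA (G.711 A-law), and G.729. When no codec matches, the sketch rejects the stream by answering with port 0, as the offer/answer model allows.

```python
# Static RTP payload type numbers for three common audio codecs.
PAYLOAD_TYPES = {"PCMU": 0, "PCMA": 8, "G729": 18}


def build_media_line(port, codecs):
    """Format an SDP audio m= line listing codecs in preference order."""
    formats = " ".join(str(PAYLOAD_TYPES[codec]) for codec in codecs)
    return f"m=audio {port} RTP/AVP {formats}"


def answer_offer(offered, supported, local_port):
    """Build the answer's m= line from the codecs both sides support.

    The offerer's preference order is kept. If nothing matches, the
    stream is rejected by answering with port 0.
    """
    common = [codec for codec in offered if codec in supported]
    if not common:
        return build_media_line(0, offered)
    return build_media_line(local_port, common)


if __name__ == "__main__":
    # Hypothetical offer and local capabilities.
    print(answer_offer(["PCMU", "PCMA", "G729"], {"G729", "PCMU"}, 20000))
```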
Session Description Protocol - Exchange Methods
Early Offer is the default method of SDP exchange in Cisco IOS gateways. In this method, the calling device sends its SDP information as part of the INVITE message. The called device returns its SDP answer in the 200 OK response to the calling party's INVITE.
Not all carriers use the Early Offer method, so it is always recommended that you verify what the carrier expects from the customer equipment.
In a Delayed Offer, the session initiator does not send its capabilities in the initial INVITE but waits for the called device to send its capabilities first (for example, the list of codecs that are supported by the called device, thus allowing the calling device to choose the codec to be used for the session).
This SDP information is sent in the 200 OK response to the original INVITE message. The calling device sends its answer in the ACK that acknowledges the 200 OK. The Delayed Offer method is recommended for SIP trunks because it enables the Internet Telephony Service Provider (ITSP) to provide its capabilities first.
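When reading a trace, one quick way to tell the two methods apart is to check whether the initial INVITE carries an SDP body. The sketch below assumes the message is available as a string with CRLF line endings. The function name and the two abbreviated sample messages are hypothetical, and the samples omit most mandatory SIP headers for brevity.

```python
def classify_offer(invite_message: str) -> str:
    """Return 'early' if the INVITE carries an SDP body, else 'delayed'."""
    # Headers and body are separated by an empty line (CRLF CRLF).
    headers, _, body = invite_message.partition("\r\n\r\n")
    has_sdp = (
        "application/sdp" in headers.lower()
        and body.strip().startswith("v=")
    )
    return "early" if has_sdp else "delayed"


# Abbreviated, hypothetical INVITE messages for illustration only.
EARLY_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "m=audio 16384 RTP/AVP 0\r\n"
)
DELAYED_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

if __name__ == "__main__":
    print(classify_offer(EARLY_INVITE))
    print(classify_offer(DELAYED_INVITE))
```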
Early Media negotiates the media streams before the call has been accepted by the called device. This is used when trying to access a voice-driven Interactive Voice Response (IVR) system, when DTMF tones are required, or when an early audio signal is desired.
In Early Media with Delayed Offer, the SDP information is sent from the called device using either a 183 Session Progress message or a 180 Ringing message. The 183 option is more common. With either message, the calling device replies to the 180 or 183 with a PRACK (Provisional Response Acknowledgement) message that carries its SDP information.
Real-Time Transport Protocol (RTP): RTP is a Layer 4 protocol that is encapsulated inside UDP segments. RTP is the protocol that carries the actual digitized voice samples in a call.
RTP Control Protocol (RTCP): RTCP is a companion protocol to RTP. Both RTP and RTCP operate at Layer 4 and are encapsulated in UDP. UDP ports 16384 to 32767 are typically used by RTP and RTCP (for example, on Cisco voice gateways). Within that range, RTP uses the even port numbers and RTCP uses the odd port numbers. While RTP is responsible for carrying the voice stream, RTCP carries information about the RTP stream, such as latency, jitter, and packets sent and received.
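The even/odd pairing described above can be expressed as a small helper. This is a sketch under the assumption that RTCP uses the port immediately above its RTP port. The function and constant names are hypothetical, and the range is the one quoted in the text.

```python
# Port range quoted in the text above.
RTP_PORT_MIN = 16384
RTP_PORT_MAX = 32767


def rtcp_port_for(rtp_port: int) -> int:
    """Return the RTCP port paired with an even RTP port in the range."""
    if not RTP_PORT_MIN <= rtp_port <= RTP_PORT_MAX:
        raise ValueError(f"port {rtp_port} is outside {RTP_PORT_MIN}-{RTP_PORT_MAX}")
    if rtp_port % 2 != 0:
        raise ValueError(f"RTP port {rtp_port} must be even")
    # RTCP takes the next (odd) port above the RTP port.
    return rtp_port + 1


if __name__ == "__main__":
    print(rtcp_port_for(16384))
```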
Typical bandwidth consumption
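One common way to estimate per-call bandwidth is to add the IPv4, UDP, and RTP header sizes (20 + 8 + 12 bytes) to the voice payload and multiply by the packet rate. The sketch below assumes a 20 ms packetization interval and ignores Layer 2 overhead, RTCP, and silence suppression, so the results are estimates rather than measurements. The function name is hypothetical.

```python
# IPv4 (20) + UDP (8) + RTP (12) header bytes added to every voice packet.
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def call_bandwidth_kbps(codec_bitrate_kbps, packet_interval_ms=20):
    """Estimate one-way Layer 3 bandwidth for a single call, in kbps."""
    # Bytes of encoded voice carried in each packet.
    payload_bytes = codec_bitrate_kbps * 1000 * packet_interval_ms / 1000 / 8
    packets_per_second = 1000 / packet_interval_ms
    total_bits_per_second = (
        (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8 * packets_per_second
    )
    return total_bits_per_second / 1000


if __name__ == "__main__":
    print("G.711 (64 kbps codec):", call_bandwidth_kbps(64), "kbps")
    print("G.729 (8 kbps codec):", call_bandwidth_kbps(8), "kbps")
```

Under these assumptions a G.711 call comes to about 80 kbps in each direction and a G.729 call to about 24 kbps, which shows how much of a low-bitrate call is header overhead.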
Based on RFC 2327, the fields in an SDP session description have the following definitions. Optional fields are marked with an asterisk (*).
Session description
    v= (protocol version)
    o= (owner/creator and session identifier)
    s= (session name)
    i=* (session information)
    u=* (URI of description)
    e=* (email address)
    p=* (phone number)
    c=* (connection information - not required if included in all media)
    b=* (bandwidth information)
    One or more time descriptions (see below)
    z=* (time zone adjustments)
    k=* (encryption key)
    a=* (zero or more session attribute lines)
Time description
    t= (time the session is active)
    r=* (zero or more repeat times)
Media description
    m= (media name and transport address)
    i=* (media title)
    c=* (connection information - optional if included at session-level)
    b=* (bandwidth information)
    k=* (encryption key)
    a=* (zero or more media attribute lines)
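To see how these fields fit together, the sketch below splits an SDP body into its session-level lines and one list of lines per media description. It is a simplified reader, not a validating parser: it does not check field order or required fields. The function name and the sample description (including its documentation-range IP address and user name) are made up for illustration.

```python
def parse_sdp(text):
    """Split an SDP description into session-level and per-media fields.

    Returns a dict with 'session' (list of (field, value) tuples) and
    'media' (one such list per m= section).
    """
    session, media = [], []
    current = session
    for raw in text.splitlines():
        line = raw.strip()
        if len(line) < 2 or line[1] != "=":
            continue  # skip blank or malformed lines
        field, value = line[0], line[2:]
        if field == "m":
            # Each m= line starts a new media description.
            current = []
            media.append(current)
        current.append((field, value))
    return {"session": session, "media": media}


# Hypothetical SDP body for illustration.
SAMPLE_SDP = """v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 16384 RTP/AVP 0 18
a=rtpmap:0 PCMU/8000
a=rtpmap:18 G729/8000
"""

if __name__ == "__main__":
    parsed = parse_sdp(SAMPLE_SDP)
    for field, value in parsed["session"]:
        print("session", field, value)
    for index, section in enumerate(parsed["media"]):
        for field, value in section:
            print(f"media[{index}]", field, value)
```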