<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">

<section id="sip_intro" xmlns:xi="http://www.w3.org/2001/XInclude">
<sectioninfo>
<authorgroup>
<author>
<firstname>Jan</firstname>
<surname>Janak</surname>
<email>jan@iptel.org</email>
</author>
</authorgroup>
<copyright>
<year>2003</year>
<holder>FhG FOKUS</holder>
</copyright>
<abstract>
<para>
A brief overview of SIP describing all important aspects of the Session Initiation
Protocol.
</para>
</abstract>
</sectioninfo>

<title>SIP Introduction</title>
<section id="purpose">
<title>Purpose of SIP</title>
<simpara>
SIP stands for Session Initiation Protocol. It is an application-layer control
protocol that was designed and developed within the IETF, with easy implementation,
good scalability, and flexibility in mind.
</simpara>
<simpara>
The specification is available in the form of several <abbrev>RFCs</abbrev>; the most
important one is RFC 3261, which contains the core protocol specification. The
protocol is used for creating, modifying, and terminating sessions with one or more
participants. By a session we understand a set of senders and receivers that
communicate, together with the state kept in those senders and receivers during the
communication. Examples of sessions include Internet telephone calls,
distribution of multimedia, multimedia conferences, and distributed computer games.
</simpara>
<simpara>
SIP is not the only protocol that the communicating devices will need, and it is not
meant to be a general-purpose protocol. The purpose of SIP is just to make the
communication possible; the communication itself must be achieved by other means
(and possibly another protocol). The two protocols most often used along with
SIP are RTP and SDP. RTP carries the real-time multimedia
data (including audio, video, and text); it makes it possible to encode
the data, split it into packets, and transport those packets over the
Internet. SDP is used to describe and encode the
capabilities of session participants. Such a description is then used to negotiate
the characteristics of the session so that all the devices can participate (this
includes, for example, negotiation of the codecs used to encode media so that all
participants will be able to decode it, negotiation of the transport protocol used, and
so on).
</simpara>
<simpara>
SIP has been designed in conformance with the Internet model. It is an end-to-end
oriented signaling protocol, which means that all the logic is stored in end
devices (except routing of SIP messages). State is also stored in end devices
only, so there is no single point of failure and networks designed this way scale
well. The price that we have to pay for this distribution and scalability is
higher message overhead, caused by the messages being sent end-to-end.
</simpara>
<simpara>
It is worth mentioning that the end-to-end concept of SIP is a significant
divergence from the regular PSTN (Public Switched Telephone Network), where all the
state and logic is stored in the network and end devices (telephones) are very
primitive. The aim of SIP is to provide the same functionality that the traditional
PSTN has, but the end-to-end design makes SIP networks much more powerful and
open to the implementation of new services that can hardly be implemented in the
traditional PSTN.
</simpara>
<simpara>
SIP is modeled on HTTP, which in turn inherited the format of its message
headers from RFC 822. HTTP is probably the most successful and widely used
protocol on the Internet, and SIP tries to combine the best of both. In fact, HTTP
can be classified as a signaling protocol too, because user agents use the protocol
to tell an HTTP server which documents they are interested in. SIP is used to
carry the description of session parameters; the description is encoded into a
document using SDP. The RFC 822 header encoding that both protocols (HTTP and SIP)
share has proven to be robust and flexible over the years.
</simpara>
</section>
<section id="sip_uri">
<title>SIP URI</title>
<simpara>
SIP entities are identified using a SIP URI (Uniform Resource Identifier). A
SIP URI has the form sip:username@domain, for instance,
sip:joe@company.com. As we can see, a SIP URI consists of a username part and
a domain name part delimited by the @ (at) character. SIP URIs are similar to
e-mail addresses; it is, for instance, possible to use the same address for e-mail
and SIP communication, and such URIs are easy to remember.
</simpara>
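<simpara>
The structure of such a URI can be illustrated with a short Python sketch. It is
only an illustration and not part of any SIP library: the names SipUri and
parse_sip_uri are hypothetical, and the sketch assumes the simple
sip:username@domain form with an optional port, ignoring URI parameters, passwords,
and IPv6 literals that a real parser would have to handle.
</simpara>
<programlisting>
<![CDATA[
```python
from typing import NamedTuple, Optional


class SipUri(NamedTuple):
    user: str
    host: str
    port: Optional[int]


def parse_sip_uri(uri: str) -> SipUri:
    """Split a simple sip:user@host[:port] URI into its parts."""
    scheme, sep, rest = uri.partition(":")
    if sep != ":" or scheme.lower() != "sip":
        raise ValueError(f"not a SIP URI: {uri!r}")
    # The username and the domain are delimited by the @ character.
    user, at, hostport = rest.partition("@")
    if at != "@" or not user or not hostport:
        raise ValueError(f"expected sip:username@domain, got {uri!r}")
    host, colon, port = hostport.partition(":")
    return SipUri(user, host, int(port) if colon else None)
```
]]>
</programlisting>
<simpara>
With these assumptions, parse_sip_uri("sip:joe@company.com") yields the user joe
and the host company.com with no explicit port.
</simpara>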
</section>
<section id="sip_network_elements">
<title>SIP Network Elements</title>
<simpara>
Although in the simplest configuration it is possible to use just two user agents
that send SIP messages directly to each other, a typical SIP network will
contain more than one type of SIP element. The basic SIP elements are user agents,
proxies, registrars, and redirect servers. We will briefly describe them in this
section.
</simpara>
<simpara>
Note that the elements, as presented in this section, are often only logical
entities. It is often advantageous to co-locate them, for instance, to
increase the speed of processing, but that depends on the particular implementation
and configuration.
</simpara>
<section id="user_agents">
<title>User Agents</title>
<simpara>
Internet end points that use SIP to find each other and to negotiate session
characteristics are called <emphasis>user agents</emphasis>. User agents
usually, but not necessarily, reside on a user's computer in the form of an
application--this is currently the most widely used approach, but user agents
can also be cellular phones, PSTN gateways, <acronym>PDAs</acronym>, automated
<acronym>IVR</acronym> systems, and so on.
</simpara>
<simpara>
User agents are often referred to as <emphasis>User Agent Server</emphasis>
(UAS) and <emphasis>User Agent Client</emphasis> (UAC). UAS and UAC are
logical entities only; each user agent contains both a UAC and a UAS. The UAC is the
part of the user agent that sends requests and receives responses. The UAS is the
part of the user agent that receives requests and sends responses.
</simpara>
<simpara>
Because a user agent contains both a UAC and a UAS, we often say that a user
agent behaves like a UAC or a UAS. For instance, the caller's user agent behaves
like a UAC when it sends an INVITE request and receives responses to the
request. The callee's user agent behaves like a UAS when it receives the INVITE
and sends responses.
</simpara>
<simpara>
But this situation changes when the callee decides to send a BYE and terminate
the session. In this case the callee's user agent (sending the BYE) behaves like
a UAC and the caller's user agent behaves like a UAS.
</simpara>
<figure id="uac_and_uas">
<title>UAC and UAS</title>
<mediaobject>
<imageobject>
<imagedata fileref="figures/ua.png" format="PNG"/>
</imageobject>
<textobject>
<phrase>Picture showing UAC and UAS</phrase>
</textobject>
</mediaobject>
</figure>
<simpara>
<xref linkend="uac_and_uas"/> shows three user agents and one stateful forking
proxy. Each user agent contains a UAC and a UAS. The part of the proxy that
receives the INVITE from the caller in fact acts as a UAS. When forwarding the
request statefully, the proxy creates two UACs, each of them responsible for
one branch.
</simpara>
<simpara>
In our example callee B picked up, and later, when it wants to tear down the call,
it sends a BYE. At that time the user agent that was previously a UAS becomes a
UAC and vice versa.
</simpara>
</section>
<section id="proxy_servers">
<title>Proxy Servers</title>
<simpara>
In addition, SIP allows the creation of an infrastructure of network hosts
called <emphasis>proxy servers</emphasis>. User agents can send messages to a
proxy server. Proxy servers are very important entities in the SIP
infrastructure. They perform routing of session invitations according to the
invitee's current location, authentication, accounting, and many other important
functions.
</simpara>
<simpara>
The most important task of a proxy server is to route session invitations
"closer" to the callee. The session invitation will usually traverse a
set of proxies until it finds one which knows the actual location of the
callee. Such a proxy will forward the session invitation directly to the callee,
and the callee will then accept or decline the session invitation.
</simpara>
<simpara>
There are two basic types of SIP proxy servers--stateless and stateful.
</simpara>

<section id="stateless_servers">
<title>Stateless Servers</title>
<simpara>
Stateless servers are simple message forwarders. They forward messages
independently of each other. Although messages are usually arranged into
transactions (see <xref linkend="sip_transactions"/>), stateless proxies
do not take care of transactions.
</simpara>
<simpara>
Stateless proxies are simple and faster than stateful proxy servers. They
can be used as simple load balancers, message translators, and routers. One
of the drawbacks of stateless proxies is that they are unable to absorb
retransmissions of messages or to perform more advanced routing, for instance,
forking or recursive traversal.
</simpara>
</section>
<section id="stateful_servers">
<title>Stateful Servers</title>
<simpara>
Stateful proxies are more complex. Upon reception of a request, stateful
proxies create a state and keep the state until the transaction
finishes. Some transactions, especially those created by INVITE, can last
quite long (until the callee picks up or declines the call). Because stateful
proxies must maintain the state for the duration of the transactions, their
performance is limited.
</simpara>
<simpara>
The ability to associate SIP messages into transactions gives stateful
proxies some interesting features. Stateful proxies can perform forking,
which means that upon reception of a message two or more messages will be sent
out.
</simpara>
<simpara>
Stateful proxies can absorb retransmissions because they know, from the
transaction state, whether they have already received the same message (stateless
proxies cannot do the check because they keep no state).
</simpara>
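<simpara>
The difference can be sketched in a few lines of Python. This is a deliberately
simplified illustration, not how any particular proxy is implemented: the class
StatefulProxySketch, the function stateless_on_request, and the opaque
transaction_key argument are all hypothetical, and the sketch drops the state as
soon as a final response is seen.
</simpara>
<programlisting>
<![CDATA[
```python
class StatefulProxySketch:
    """Remembers which requests it has seen so it can absorb retransmissions."""

    def __init__(self):
        # transaction key -> state kept for the duration of the transaction
        self.transactions = {}

    def on_request(self, transaction_key, message):
        if transaction_key in self.transactions:
            # Same transaction seen before: a retransmission, do not forward.
            return "absorb"
        self.transactions[transaction_key] = {"request": message}
        return "forward"

    def on_final_response(self, transaction_key):
        # The transaction has finished, so its state can be released.
        self.transactions.pop(transaction_key, None)


def stateless_on_request(transaction_key, message):
    """A stateless proxy keeps nothing, so every copy is forwarded."""
    return "forward"
```
]]>
</programlisting>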
<simpara>
Stateful proxies can perform more complicated methods of finding a user. It
is, for instance, possible to try to reach a user's office phone and, when he
doesn't pick up, to redirect the call to his cell phone. Stateless
proxies can't do this because they have no way of knowing how the
transaction targeted to the office phone finished.
</simpara>
<simpara>
Most SIP proxies today are stateful because their configuration is usually
very complex. They often perform accounting, forking, or some sort of NAT
traversal aid, and all those features require a stateful proxy.
</simpara>
</section>
<section id="proxy_server_usage">
<title>Proxy Server Usage</title>
<simpara>
A typical configuration is that each centrally administered entity (a
company, for instance) has its own SIP proxy server which is used by all
user agents in the entity. Let's suppose that there are two companies, A and
B, and each of them has its own proxy server. <xref linkend="companies"/>
shows how a session invitation from employee Joe in company A will reach
employee Bob in company B.
</simpara>
<figure id="companies">
<title>Session Invitation</title>
<mediaobject>
<imageobject>
<imagedata fileref="figures/companies.png" format="PNG"/>
</imageobject>
<textobject>
<phrase>Picture showing a session invitation message flow</phrase>
</textobject>
</mediaobject>
</figure>
<simpara>
User Joe uses the address sip:bob@b.com to call Bob. Joe's user agent doesn't
know how to route the invitation itself, but it is configured to send all
outbound traffic to the company SIP proxy server proxy.a.com. The proxy
server figures out that user sip:bob@b.com is in a different company, so it
will look up B's SIP proxy server and send the invitation there. B's proxy
server can be either pre-configured at proxy.a.com or the proxy will use
<acronym>DNS SRV</acronym> records to find B's proxy server. The invitation
reaches proxy.b.com. The proxy knows that Bob is currently sitting in his
office and is reachable through the phone on his desk, which has IP address
1.2.3.4, so the proxy will send the invitation there.
</simpara>
</section>
</section>
<section id="sip_intro.registrar">
<title>Registrar</title>
<simpara>
We mentioned that the SIP proxy at proxy.b.com knows Bob's current location,
but we haven't yet mentioned how a proxy can learn the current location of a
user. Bob's user agent (SIP phone) must register with a
<emphasis>registrar</emphasis>. The registrar is a special SIP entity that
receives registrations from users, extracts information about their current
location (IP address, port, and username in this case), and stores the
information in a location database. The purpose of the location database is to map
sip:bob@b.com to something like sip:bob@1.2.3.4:5060. The location database is
then used by B's proxy server. When the proxy receives an invitation for
sip:bob@b.com, it searches the location database, finds
sip:bob@1.2.3.4:5060, and sends the invitation there. A registrar is very
often a logical entity only. Because of their tight coupling with proxies,
registrars are usually co-located with proxy servers.
</simpara>
<simpara>
<xref linkend="registrar_fig"/> shows a typical SIP registration. A REGISTER
message containing the Address of Record sip:jan@iptel.org and the contact address
sip:jan@1.2.3.4:5060, where 1.2.3.4 is the IP address of the phone, is sent to the
registrar. The registrar extracts this information and stores it in the
location database. If everything went well, the registrar sends a 200 OK
response to the phone and the process of registration is finished.
</simpara>
<figure id="registrar_fig">
<title>Registrar Overview</title>
<mediaobject>
<imageobject>
<imagedata fileref="figures/registrar.png" format="PNG"/>
</imageobject>
<textobject>
<phrase>Picture showing a typical registrar</phrase>
</textobject>
</mediaobject>
</figure>
<simpara>
Each registration has a limited lifespan. The Expires header field or the expires
parameter of the Contact header field determines how long the registration is
valid. The user agent must refresh the registration within the lifespan,
otherwise it will expire and the user will become unavailable.
</simpara>
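<simpara>
The mapping kept by a registrar and the effect of the registration lifespan can be
sketched in Python as follows. This is a toy model under stated assumptions, not
the storage used by any real registrar: the class LocationDatabase and its methods
are hypothetical, everything is held in memory, and the current time is passed in
explicitly so that expiry is easy to follow.
</simpara>
<programlisting>
<![CDATA[
```python
import time


class LocationDatabase:
    """Maps an address of record to the contacts currently registered for it."""

    def __init__(self):
        # address of record -> {contact URI: absolute expiry time in seconds}
        self._bindings = {}

    def register(self, aor, contact, expires, now=None):
        """Store or refresh a binding that stays valid for `expires` seconds."""
        now = time.time() if now is None else now
        self._bindings.setdefault(aor, {})[contact] = now + expires

    def lookup(self, aor, now=None):
        """Return the contacts whose registration has not expired yet."""
        now = time.time() if now is None else now
        contacts = self._bindings.get(aor, {})
        return [uri for uri, expiry in contacts.items() if expiry > now]
```
]]>
</programlisting>
<simpara>
In this sketch a binding registered with a lifespan of 120 seconds is returned by
a lookup after 60 seconds, but no longer after 121 seconds unless it was refreshed
in the meantime.
</simpara>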
</section>
<section id="redirect_server">
<title>Redirect Server</title>
<simpara>
The entity that receives a request and sends back a reply containing a list of the
current locations of a particular user is called a <emphasis>redirect server</emphasis>. A
redirect server receives requests and looks up the intended recipient of the request in
the location database created by a registrar. It then creates a list of current
locations of the user and sends it to the request originator in a response of the 3xx
class.
</simpara>
<simpara>
The originator of the request then extracts the list of destinations and sends
another request directly to them. <xref linkend="redirect"/> shows a typical
redirection.
</simpara>
<figure id="redirect">
<title>SIP Redirection</title>
<mediaobject>
<imageobject>
<imagedata fileref="figures/redirect.png" format="PNG"/>
</imageobject>
<textobject>
<phrase>Picture showing a redirection</phrase>
</textobject>
</mediaobject>
</figure>
</section>
</section>
<section id="sip_messages">
<title>SIP Messages</title>
<simpara>
Communication using SIP (often called signaling) consists of a series of
<emphasis>messages</emphasis>. Messages can be transported independently by the
network. Usually each message is transported in a separate UDP datagram. Each
message consists of a "first line", a message header, and a message body. The
first line identifies the type of the message. There are two types of
messages--<emphasis>requests</emphasis> and <emphasis>responses</emphasis>.
Requests are usually used to initiate some action or to inform the recipient of the
request of something. Responses (also called replies) are used to confirm that a
request was received and processed, and they contain the status of the processing.
</simpara>
<simpara>
A typical SIP request looks like this:
</simpara>
<programlisting>
<![CDATA[
INVITE sip:7170@iptel.org SIP/2.0
Via: SIP/2.0/UDP 195.37.77.100:5040;rport
Max-Forwards: 10
From: "jiri" <sip:jiri@iptel.org>;tag=76ff7a07-c091-4192-84a0-d56e91fe104f
To: <sip:jiri@bat.iptel.org>
Call-ID: d10815e0-bf17-4afa-8412-d9130a793d96@213.20.128.35
CSeq: 2 INVITE
Contact: <sip:213.20.128.35:9315>
User-Agent: Windows RTC/1.0
Proxy-Authorization: Digest username="jiri", realm="iptel.org",
  algorithm="MD5", uri="sip:jiri@bat.iptel.org",
  nonce="3cef753900000001771328f5ae1b8b7f0d742da1feb5753c",
  response="53fe98db10e1074b03b3e06438bda70f"
Content-Type: application/sdp
Content-Length: 451

v=0
o=jku2 0 0 IN IP4 213.20.128.35
s=session
c=IN IP4 213.20.128.35
b=CT:1000
t=0 0
m=audio 54742 RTP/AVP 97 111 112 6 0 8 4 5 3 101
a=rtpmap:97 red/8000
a=rtpmap:111 SIREN/16000
a=fmtp:111 bitrate=16000
a=rtpmap:112 G7221/16000
a=fmtp:112 bitrate=24000
a=rtpmap:6 DVI4/16000
a=rtpmap:0 PCMU/8000
a=rtpmap:4 G723/8000
a=rtpmap: 3 GSM/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
]]>
</programlisting>
<simpara>
The first line tells us that this is an INVITE message, which is used to establish a
session. The URI on the first line--sip:7170@iptel.org--is called the
<emphasis>Request URI</emphasis> and contains the URI of the next hop of the message.
In this case it will be the host iptel.org.
</simpara>
<simpara>
A SIP request can contain one or more Via header fields, which are used to record
the path of the request. They are later used to route SIP responses back exactly the
same way. The INVITE message contains just one Via header field, which was created by the
user agent that sent the request. From the Via field we can tell that the user agent
is running on host 195.37.77.100 and port 5040.
</simpara>
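<simpara>
As an illustration of how the host and port can be read from a Via value, consider
the following Python sketch. The helper parse_via is hypothetical; it assumes an
IPv4 address or a host name in the sent-by part (no IPv6 literals) and falls back
to 5060, the default SIP port, when the Via carries no explicit port.
</simpara>
<programlisting>
<![CDATA[
```python
def parse_via(value):
    """Split 'SIP/2.0/UDP host[:port];param=value' into its parts."""
    sent_protocol, _, rest = value.strip().partition(" ")
    transport = sent_protocol.split("/")[-1]
    # The sent-by part comes first, followed by ';'-separated parameters.
    sent_by, *raw_params = [part.strip() for part in rest.split(";")]
    host, colon, port = sent_by.partition(":")
    params = {}
    for item in raw_params:
        name, equals, param_value = item.partition("=")
        # Flag parameters such as 'rport' have no value.
        params[name] = param_value if equals else None
    return {
        "transport": transport,
        "host": host,
        "port": int(port) if colon else 5060,
        "params": params,
    }
```
]]>
</programlisting>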
<simpara>
The From and To header fields identify the initiator (caller) and the recipient
(callee) of the invitation (just like in SMTP, where they identify the sender and
recipient of a message). The From header field contains a tag parameter which serves
as a dialog identifier and will be described in <xref linkend="sip_dialogs"/>.
</simpara>
<simpara>
The Call-ID header field is a dialog identifier and its purpose is to identify messages
belonging to the same call. Such messages have the same Call-ID identifier. CSeq is
used to maintain the order of requests. Because requests can be sent over an unreliable
transport that can re-order messages, a sequence number must be present in the
messages so that the recipient can identify retransmissions and out-of-order requests.
</simpara>
<simpara>
The Contact header field contains the IP address and port on which the sender is awaiting
further requests sent by the callee. Other header fields are not important here and will
not be described.
</simpara>
<simpara>
The message header is delimited from the message body by an empty line. The message body
of the INVITE request contains a description of the media types accepted by the sender,
encoded in SDP.
</simpara>
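<simpara>
The layout just described (first line, header fields, an empty line, then the body)
can be sketched in Python. The function parse_sip_message is a hypothetical helper
written for this document; it assumes a single complete message held in a string,
treats lines that start with whitespace as continuations of the previous header
field, and does not validate anything.
</simpara>
<programlisting>
<![CDATA[
```python
def parse_sip_message(raw):
    """Split one SIP message into (kind, first_line, headers, body)."""
    text = raw.replace("\r\n", "\n")
    # The first empty line separates the message header from the body.
    head, _, body = text.partition("\n\n")
    lines = head.split("\n")
    first_line = lines[0]
    headers = []
    for line in lines[1:]:
        if line[:1] in (" ", "\t") and headers:
            # A line starting with whitespace continues the previous field.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        else:
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
    # Responses start with the protocol version, requests with the method.
    kind = "response" if first_line.startswith("SIP/") else "request"
    return kind, first_line, headers, body
```
]]>
</programlisting>
<simpara>
Header fields are kept as an ordered list of pairs rather than a dictionary because
a field such as Via may appear more than once and its order matters.
</simpara>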
<section id="sip_requests">
<title>SIP Requests</title>
<simpara>
We have described what an INVITE request looks like and said that the request is
used to invite a callee to a session. Other important requests are:
</simpara>
<itemizedlist>
<listitem>
<simpara>
<emphasis>ACK</emphasis>--This message acknowledges receipt of a final
response to INVITE. Establishing a session uses a 3-way
handshake due to the asymmetric nature of the invitation. It may take a
while before the callee accepts or declines the call, so the callee's
user agent periodically retransmits a positive final response until it
receives an ACK (which indicates that the caller is still there and
ready to communicate).
</simpara>
</listitem>
<listitem>
<simpara>
<emphasis>BYE</emphasis>--BYE messages are used to tear down multimedia
sessions. A party wishing to tear down a session sends a BYE to the
other party.
</simpara>
</listitem>
<listitem>
<simpara>
<emphasis>CANCEL</emphasis>--CANCEL is used to cancel a session that is not
yet fully established. It is used when the callee hasn't replied with a
final response yet but the caller wants to abort the call (typically
when the callee doesn't respond for some time).
</simpara>
</listitem>
<listitem>
<simpara>
<emphasis>REGISTER</emphasis>--The purpose of a REGISTER request is to let
the registrar know the user's current location. Information about the current
IP address and port on which a user can be reached is carried in
REGISTER messages. The registrar extracts this information and puts it into
a location database. The database can later be used by SIP proxy
servers to route calls to the user. Registrations are time-limited and
need to be periodically refreshed.
</simpara>
</listitem>
</itemizedlist>
<simpara>
The listed requests usually have no message body because it is not needed in
most situations (but they can have one). In addition, many other request types
have been defined, but their description is beyond the scope of this document.
</simpara>
</section>
<section id="sip_responses">
<title>SIP Responses</title>
<simpara>
When a user agent or proxy server receives a request, it sends a reply. Every
request must be answered with a reply, except ACK requests, which trigger no replies.
</simpara>
<simpara>
A typical reply looks like this:
</simpara>
<programlisting>
<![CDATA[
SIP/2.0 200 OK
Via: SIP/2.0/UDP 192.168.1.30:5060;received=66.87.48.68
From: sip:sip2@iptel.org
To: sip:sip2@iptel.org;tag=794fe65c16edfdf45da4fc39a5d2867c.b713
Call-ID: 2443936363@192.168.1.30
CSeq: 63629 REGISTER
Contact: <sip:sip2@66.87.48.68:5060;transport=udp>;q=0.00;expires=120
Server: Sip EXpress router (0.8.11pre21xrc (i386/linux))
Content-Length: 0
Warning: 392 195.37.77.101:5060 "Noisy feedback tells:
  pid=5110 req_src_ip=66.87.48.68 req_src_port=5060 in_uri=sip:iptel.org
  out_uri=sip:iptel.org via_cnt==1"
]]>
</programlisting>
<simpara>
As we can see, responses are very similar to requests, except for the first
line. The first line of a response contains the protocol version (SIP/2.0), the reply
code, and the reason phrase.
</simpara>
<simpara>
The <emphasis>reply code</emphasis> is an integer from 100 to 699 and
indicates the type of the response. There are 6 classes of responses:
</simpara>
<itemizedlist>
<listitem>
<simpara>
<emphasis>1xx</emphasis> are <emphasis>provisional</emphasis>
responses. A provisional response tells its
recipient that the associated request was received but that the result of the
processing is not known yet. Provisional responses are sent only when
the processing doesn't finish immediately. The sender must stop
retransmitting the request upon reception of a provisional response.
</simpara>
<simpara>
Typically, proxy servers send responses with code 100 when they start
processing an INVITE, and user agents send responses with code 180
(Ringing), which means that the callee's phone is ringing.
</simpara>
</listitem>
<listitem>
<simpara>
<emphasis>2xx</emphasis> responses are <emphasis>positive
final</emphasis> responses. A final response is the ultimate response
that the originator of the request will ever receive. Therefore final
responses express the result of the processing of the associated
request. Final responses also terminate transactions. Responses with
codes from 200 to 299 are positive responses, which means that the request
was processed successfully and accepted. For instance, a 200 OK response
is sent when a user accepts an invitation to a session (INVITE request).
</simpara>
<simpara>
A UAC may receive several 200 responses to a single INVITE
request. This is because a forking proxy (see <xref linkend="stateful_servers"/>)
can fork the request so that it reaches several UASes, and each of them may accept
the invitation. In this case each response is distinguished by the tag
parameter in the To header field. Each response represents a distinct dialog
with an unambiguous dialog identifier.
</simpara>
</listitem>
<listitem>
<simpara>
<emphasis>3xx</emphasis> responses are used to redirect a caller. A
redirection response gives information about the user's new location or
an alternative service that the caller might use to satisfy the
call. Redirection responses are usually sent by proxy servers. When a
proxy receives a request and doesn't want to or can't process it for any
reason, it will send a redirection response to the caller and put
another location into the response which the caller might want to
try. It can be the location of another proxy or the current location of
the callee (from the location database created by a registrar). The
caller is then supposed to re-send the request to the new location. 3xx
responses are final.
</simpara>
</listitem>
<listitem>
<simpara>
<emphasis>4xx</emphasis> are <emphasis>negative final</emphasis>
responses. A 4xx response means that the problem is on the sender's
side. The request couldn't be processed because it contains bad syntax
or cannot be fulfilled at that server.
</simpara>
</listitem>
<listitem>
<simpara>
<emphasis>5xx</emphasis> means that the problem is on the server's side. The
request is apparently valid but the server failed to fulfill it. Clients
should usually retry the request later.
</simpara>
</listitem>
<listitem>
<simpara>
<emphasis>6xx</emphasis> reply codes mean that the request cannot be
fulfilled at any server. This response is usually sent by a server that
has definitive information about a particular user. User agents usually
send a 603 Decline response when the user doesn't want to participate in
the session.
</simpara>
</listitem>
</itemizedlist>
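<simpara>
The classes listed above can be summarized in a short Python sketch. The names
RESPONSE_CLASSES, parse_status_line, and classify_reply are hypothetical and the
class descriptions merely paraphrase this section; the only rule the code relies on
is that the first digit of the reply code selects the class and that every class
except 1xx is final.
</simpara>
<programlisting>
<![CDATA[
```python
RESPONSE_CLASSES = {
    1: "provisional",
    2: "positive final",
    3: "redirection",
    4: "negative final, problem on the sender's side",
    5: "negative final, problem on the server's side",
    6: "negative final, cannot be fulfilled at any server",
}


def parse_status_line(line):
    """Split 'SIP/2.0 200 OK' into (version, code, reason phrase)."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


def classify_reply(code):
    """Return (class description, is_final) for a reply code."""
    if not 100 <= code <= 699:
        raise ValueError("reply code out of range: %d" % code)
    response_class = code // 100
    # Only 1xx responses are provisional; all other classes are final.
    return RESPONSE_CLASSES[response_class], response_class != 1
```
]]>
</programlisting>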
<simpara>
In addition to the reply code, the first line also contains a <emphasis>reason
phrase</emphasis>. The code number is intended to be processed by
machines. It is not very human-friendly, but it is very easy for machines to parse
and understand. The reason phrase usually contains a human-readable
message describing the result of the processing. A user agent should render
the reason phrase to the user.
</simpara>
<simpara>
The request to which a particular response belongs is identified using the CSeq
header field. In addition to the sequence number, this header field also contains
the method of the corresponding request. In our example it was a REGISTER request.
</simpara>
</section>
</section>
<section id="sip_transactions">
<title>SIP Transactions</title>
<simpara>
Although we said that SIP messages are sent independently over the network, they
are usually arranged into <emphasis>transactions</emphasis> by user agents and
certain types of proxy servers. Therefore SIP is said to be a
<emphasis>transactional protocol</emphasis>.
</simpara>
<simpara>
A transaction is a sequence of SIP messages exchanged between SIP network
elements. A transaction consists of one request and all responses to that
request. That includes zero or more provisional responses and one or more final
responses (remember that an INVITE might be answered by more than one final response
when a proxy server forks the request).
</simpara>
<simpara>
If a transaction was initiated by an INVITE request, then the same transaction also
includes the ACK, but only if the final response was not a 2xx response. If the final
response was a 2xx response, then the ACK is not considered part of the transaction.
</simpara>
<simpara>
As we can see, this is quite asymmetric behavior--the ACK is part of transactions with a
negative final response but is not part of transactions with positive final
responses. The reason for this separation is the importance of delivery of all 200
OK messages. Not only do they establish a session, but a 200 OK can also be
generated by multiple entities when a proxy server forks the request, and all of them
must be delivered to the calling user agent. Therefore user agents take
responsibility in this case and retransmit 200 OK responses until they receive an
ACK. Also note that only responses to INVITE are retransmitted!
</simpara>
<simpara>
SIP entities that have a notion of transactions are called
<emphasis>stateful</emphasis>. Such entities usually create a state associated with
a transaction that is kept in memory for the duration of the transaction. When a
request or response arrives, a stateful entity tries to associate the request (or
response) with an existing transaction. To be able to do so, it must extract a unique
transaction identifier from the message and compare it to the identifiers of all
existing transactions. If such a transaction exists, then its state gets updated
from the message.
</simpara>
<simpara>
|
|
In the previous SIP RFC2543 the transaction identifier was calculated as hash of
|
|
all important message header fields (that included To, From, Request-URI and
|
|
CSeq). This proved to be very slow and complex, during interoperability tests such
|
|
transaction identifiers used to be a common source of problems.
|
|
</simpara>
|
|
<simpara>
|
|
In RFC3261 the way transaction identifiers are calculated was completely
changed. Instead of complicated hashing of important header fields, a SIP message now
includes the identifier directly: the branch parameter of the topmost Via header field contains
the transaction identifier. This is a significant simplification, but there are still old
implementations that do not support the new transaction identifiers, so
even new implementations have to support the old way to remain backwards compatible.
|
|
</simpara>
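<simpara>
The difference between the two styles can be sketched in a few lines of Python. This is
only an illustration, not code from any SIP stack: the function names are hypothetical,
and the check relies on the RFC3261 rule that a compliant branch value always starts with
the magic cookie z9hG4bK. A message without such a branch would have to be matched the
RFC2543 way, which the sketch leaves out.
</simpara>
<programlisting><![CDATA[
```python
import re

# Magic cookie that RFC 3261 requires at the start of every branch value.
RFC3261_BRANCH_COOKIE = "z9hG4bK"


def top_via_branch(message):
    """Return the branch parameter of the topmost Via header field, or None."""
    for line in message.split("\r\n"):
        if line == "":
            break  # blank line: end of the header section
        name, _, value = line.partition(":")
        if name.strip().lower() in ("via", "v"):
            # Several Via values may share one line; the first one is the topmost.
            top_value = value.split(",")[0]
            match = re.search(r";\s*branch\s*=\s*([^;\s]+)", top_value)
            return match.group(1) if match else None
    return None


def uses_rfc3261_transaction_id(message):
    """True when the top Via branch can serve directly as the transaction identifier."""
    branch = top_via_branch(message)
    return branch is not None and branch.startswith(RFC3261_BRANCH_COOKIE)


# Illustrative messages (hosts and branch value are made up).
new_style = (
    "INVITE sip:pete@b.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP a.com:5060;branch=z9hG4bK74bf9\r\n"
    "CSeq: 1 INVITE\r\n"
    "\r\n"
)
old_style = (
    "INVITE sip:pete@b.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP a.com:5060\r\n"
    "CSeq: 1 INVITE\r\n"
    "\r\n"
)
```
]]></programlisting>
<simpara>
For the first message the branch value can be used as a lookup key right away; for the
second one a stateful entity has to fall back to comparing header fields.
</simpara>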
|
|
<simpara>
|
|
<xref linkend="transactions"/> shows which messages belong to which transactions
during a conversation between two user agents.
|
|
</simpara>
|
|
<figure id="transactions">
|
|
<title>SIP Transactions</title>
|
|
<mediaobject>
|
|
<imageobject>
|
|
<imagedata fileref="figures/transaction.png" format="PNG"/>
|
|
</imageobject>
|
|
<textobject>
|
|
<phrase>Message flow showing messages belonging to the same transaction.</phrase>
|
|
</textobject>
|
|
</mediaobject>
|
|
</figure>
|
|
</section>
|
|
<section id="sip_dialogs">
|
|
<title>SIP Dialogs</title>
|
|
<simpara>
|
|
We have shown what transactions are: one transaction includes the INVITE and its
responses, and another transaction includes the BYE and its responses when a session is
being torn down. These two transactions are clearly
related: both of them belong to the same <emphasis>dialog</emphasis>. A dialog
represents a peer-to-peer SIP relationship between two user agents. A dialog
persists for some time and is a very important concept for user agents. Dialogs
facilitate proper sequencing and routing of messages between SIP endpoints.
|
|
</simpara>
|
|
<simpara>
|
|
Dialogs are identified by the Call-ID, the From tag, and the To
tag. Messages in which these three identifiers match belong to the
same dialog. We have shown that the CSeq header field is used to order
messages; more precisely, it is used to order messages within a dialog. The
number must increase monotonically for each request sent within
a dialog, otherwise the peer will handle the request as out of order
or as a retransmission. In fact, the CSeq number identifies a
transaction within a dialog, because a request and its
associated responses form a transaction. This means that only
one transaction in each direction can be active within a
dialog. One could also say that a <emphasis>dialog is a sequence of
transactions</emphasis>. <xref linkend="dialog"/> extends <xref
linkend="transactions"/> to show which messages belong to the
same dialog.
|
|
</simpara>
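<simpara>
The ordering rule can be illustrated with a small Python sketch. The class and method
names are hypothetical, and the sketch is deliberately simplified: it tracks one
direction of a dialog only and ignores ACK and CANCEL, which reuse the CSeq number of the
INVITE they belong to.
</simpara>
<programlisting><![CDATA[
```python
class DialogSequence:
    """Tracks the highest CSeq number received from the peer within one dialog."""

    def __init__(self):
        self.remote_cseq = None  # no request received from the peer yet

    def accept(self, cseq):
        """Return True and remember cseq if it is higher than any number seen so far.

        A number that is not higher is treated as an out-of-order request or a
        retransmission and is not accepted as a new transaction.
        """
        if self.remote_cseq is not None and cseq <= self.remote_cseq:
            return False
        self.remote_cseq = cseq
        return True


seq = DialogSequence()
results = [seq.accept(n) for n in (1, 2, 2, 1, 10)]
```
]]></programlisting>
<simpara>
Gaps in the numbering are accepted here (the jump from 2 to 10); only numbers that do not
increase are turned away.
</simpara>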
|
|
<figure id="dialog">
|
|
<title>SIP Dialog</title>
|
|
<mediaobject>
|
|
<imageobject>
|
|
<imagedata fileref="figures/dialog.png" format="PNG"/>
|
|
</imageobject>
|
|
<textobject>
|
|
<phrase>Message flow showing transactions belonging to the same dialog.</phrase>
|
|
</textobject>
|
|
</mediaobject>
|
|
</figure>
|
|
<simpara>
|
|
Some messages establish a dialog and some do not. This makes it possible to express
the relationship between messages explicitly, and also to send messages outside a dialog
when they are not related to other
messages. The latter is easier to implement because user agents do not have
to keep dialog state.
|
|
</simpara>
|
|
<simpara>
|
|
For instance, an INVITE message establishes a dialog, because it will later be followed
by a BYE request which tears down the session established by the INVITE. This BYE
is sent within the dialog established by the INVITE.
|
|
</simpara>
|
|
<simpara>
|
|
But if a user agent sends a MESSAGE request, such a request doesn't establish any
|
|
dialog. Any subsequent messages (even MESSAGE) will be sent independently of the
|
|
previous one.
|
|
</simpara>
|
|
<section id="dialogs_facilitate_routing">
|
|
<title>Dialogs Facilitate Routing</title>
|
|
<simpara>
|
|
We have said that dialogs are also used to route messages between user
agents. Let us describe this in a little more detail.
|
|
</simpara>
|
|
<simpara>
|
|
Suppose that user sip:bob@a.com wants to talk to user sip:pete@b.com. He
knows the SIP address of the callee (sip:pete@b.com), but this address does not say
anything about the current location of the user, i.e. the caller does not know
which host to send the request to. Therefore the INVITE request will be sent to a
proxy server.
|
|
</simpara>
|
|
<simpara>
|
|
The request will be sent from proxy to proxy until it reaches one that knows
the current location of the callee. This process is called routing. Once the request
reaches the callee, the callee's user agent creates a response that is
sent back to the caller. The callee's user agent also puts a Contact header field
into the response, which contains the current location of the user. The
original request also contained a Contact header field, which means that both user
agents know the current location of the peer.
|
|
</simpara>
|
|
<simpara>
|
|
Because the user agents know each other's location, it is not necessary to send
further requests to any proxy; they can be sent directly from user agent to user
agent. That is exactly how dialogs facilitate routing.
|
|
</simpara>
|
|
<simpara>
|
|
Further messages within a dialog are sent directly from user agent to user
|
|
agent. This is a significant performance improvement because proxies do not see
|
|
all the messages within a dialog; they are used to route just the first request
|
|
that establishes the dialog. The direct messages are also delivered with much
|
|
lower latency because a typical proxy usually implements complex routing
|
|
logic. <xref linkend="trapezoid"/> contains an example of a message
|
|
within a dialog (BYE) that bypasses the proxies.
|
|
</simpara>
|
|
<figure id="trapezoid">
|
|
<title>SIP Trapezoid</title>
|
|
<mediaobject>
|
|
<imageobject>
|
|
<imagedata fileref="figures/trapezoid.png" format="PNG"/>
|
|
</imageobject>
|
|
<textobject>
|
|
<phrase>Message flow showing SIP trapezoid.</phrase>
|
|
</textobject>
|
|
</mediaobject>
|
|
</figure>
|
|
</section>
|
|
<section id="dialogs_identifiers">
|
|
<title>Dialog Identifiers</title>
|
|
<simpara>
|
|
We have already shown that dialog identifiers consist of three parts: Call-ID,
From tag, and To tag. It is less clear, however, why dialog identifiers are
created exactly this way and who contributes which part.
|
|
</simpara>
|
|
<simpara>
|
|
Call-ID is the so-called <emphasis>call identifier</emphasis>. It must be a unique
|
|
string that identifies a call. A call consists of one or more dialogs. Multiple
|
|
user agents may respond to a request when a proxy along the path forks the
|
|
request. Each user agent that sends a 2xx establishes a separate dialog with the
|
|
caller. All such dialogs are part of the same call and have the same Call-ID.
|
|
</simpara>
|
|
<simpara>
|
|
The From tag is generated by the caller and uniquely identifies the dialog in the
|
|
caller's user agent.
|
|
</simpara>
|
|
<simpara>
|
|
The To tag is generated by the callee and, just like the From tag, uniquely identifies
|
|
the dialog in the callee's user agent.
|
|
</simpara>
|
|
<simpara>
|
|
This hierarchical dialog identifier is necessary because a single call
|
|
invitation can create several dialogs and the caller must be able to distinguish
|
|
them.
|
|
</simpara>
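<simpara>
As a rough illustration, the three-part identifier can be modelled as a tuple. The
Python sketch below uses hypothetical names and made-up header values; it shows a forked
INVITE answered by two user agents, which yields two dialogs that share one Call-ID.
</simpara>
<programlisting><![CDATA[
```python
from collections import namedtuple

# Dialog identifier from the point of view of one user agent: the local tag is
# the tag that this user agent generated itself.
DialogId = namedtuple("DialogId", ["call_id", "local_tag", "remote_tag"])


def caller_dialog_id(call_id, from_tag, to_tag):
    """Identifier kept by the caller, who generated the From tag."""
    return DialogId(call_id, from_tag, to_tag)


def callee_dialog_id(call_id, from_tag, to_tag):
    """Identifier kept by the callee, who generated the To tag."""
    return DialogId(call_id, to_tag, from_tag)


# Two 200 OK responses to one forked INVITE (all values are illustrative).
responses = [
    {"call_id": "a84b4c76e66710", "from_tag": "1928301774", "to_tag": "a6c85cf"},
    {"call_id": "a84b4c76e66710", "from_tag": "1928301774", "to_tag": "314159"},
]

dialogs = {
    caller_dialog_id(r["call_id"], r["from_tag"], r["to_tag"]) for r in responses
}
call_ids = {d.call_id for d in dialogs}
```
]]></programlisting>
<simpara>
The caller ends up with two distinct dialog identifiers but a single Call-ID, and the To
tag is the part that lets it tell the two dialogs apart.
</simpara>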
|
|
</section>
|
|
</section>
|
|
<section id="typical_sip_scenarios">
|
|
<title>Typical SIP Scenarios</title>
|
|
<simpara>
|
|
This section gives a brief overview of typical SIP scenarios that usually make up the
|
|
SIP traffic.
|
|
</simpara>
|
|
<section id="registration">
|
|
<title>Registration</title>
|
|
<simpara>
|
|
Users must register themselves with a registrar to be reachable by other
|
|
users. A registration comprises a REGISTER message followed by a 200 OK sent by the
registrar if the registration was successful. Registrations are usually
authenticated, so a 401 or 407 reply can appear if the user did not provide valid
|
|
credentials. <xref linkend="register_fig"/> shows an example of registration.
|
|
</simpara>
|
|
<figure id="register_fig">
|
|
<title>REGISTER Message Flow</title>
|
|
<mediaobject>
|
|
<imageobject>
|
|
<imagedata fileref="figures/register.png" format="PNG"/>
|
|
</imageobject>
|
|
<textobject>
|
|
<phrase>Message flow of a registration.</phrase>
|
|
</textobject>
|
|
</mediaobject>
|
|
</figure>
|
|
</section>
|
|
<section id="session_invitation">
|
|
<title>Session Invitation</title>
|
|
<simpara>
|
|
A session invitation consists of one INVITE request which is usually sent to a
|
|
proxy. The proxy immediately sends a 100 Trying reply to stop retransmissions
|
|
and forwards the request further.
|
|
</simpara>
|
|
<simpara>
|
|
All provisional responses generated by the callee are sent back to the caller. See the
180 Ringing response in the call flow. The response is generated when the callee's
phone starts ringing.
|
|
</simpara>
|
|
<figure id="invite1">
|
|
<title>INVITE Message Flow</title>
|
|
<mediaobject>
|
|
<imageobject>
|
|
<imagedata fileref="figures/invite1.png" format="PNG"/>
|
|
</imageobject>
|
|
<textobject>
|
|
<phrase>Picture showing a session invitation.</phrase>
|
|
</textobject>
|
|
</mediaobject>
|
|
</figure>
|
|
<simpara>
|
|
A 200 OK is generated once the callee picks up the phone and it is retransmitted
|
|
by the callee's user agent until it receives an ACK from the caller. The session
|
|
is established at this point.
|
|
</simpara>
|
|
</section>
|
|
<section id="session_termination">
|
|
<title>Session Termination</title>
|
|
<simpara>
|
|
Session termination is accomplished by sending a BYE request within the dialog
established by the INVITE. BYE messages are sent directly from one user agent to
the other unless a proxy on the path of the INVITE request indicated that it
wishes to stay on the path by using record routing (see <xref
linkend="record_routing"/>).
|
|
</simpara>
|
|
<simpara>
|
|
The party wishing to tear down a session sends a BYE request to the other party
involved in the session. The other party sends a 200 OK response to confirm the
BYE and the session is terminated. See the left message flow in
<xref linkend="bye"/>.
|
|
</simpara>
|
|
</section>
|
|
<section id="record_routing">
|
|
<title>Record Routing</title>
|
|
<simpara>
|
|
All requests sent within a dialog are by default sent directly from one user agent
|
|
to the other. Only requests outside a dialog traverse SIP proxies. This approach
|
|
makes a SIP network more scalable because only a small number of SIP messages hit
|
|
the proxies.
|
|
</simpara>
|
|
<simpara>
|
|
There are certain situations in which a SIP proxy needs to stay on the path of all
|
|
further messages. For instance, proxies controlling a NAT box or proxies doing
|
|
accounting need to stay on the path of BYE requests.
|
|
</simpara>
|
|
<simpara>
|
|
The mechanism by which a proxy can inform user agents that it wishes to stay on the path
of all further messages is called <emphasis>record routing</emphasis>. Such a proxy
inserts a Record-Route header field containing its own address into SIP messages.
Messages sent within a dialog will then traverse all SIP proxies that
put a Record-Route header field into the message.
|
|
</simpara>
|
|
<simpara>
|
|
The recipient of the request receives a set of Record-Route header fields in the
|
|
message. It must mirror all the Record-Route header fields into responses because
|
|
the originator of the request also needs to know the set of proxies.
|
|
</simpara>
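<simpara>
How the two user agents turn the recorded proxies into a route for later requests can be
sketched as follows. The function names and proxy addresses are hypothetical. The
ordering rule is taken from RFC3261 rather than from this text: the callee keeps the
Record-Route entries in the order received, while the caller takes them from the response
in reverse order.
</simpara>
<programlisting><![CDATA[
```python
def callee_route_set(record_route):
    """Route set of the callee: Record-Route entries of the request, in order."""
    return list(record_route)


def caller_route_set(record_route):
    """Route set of the caller: Record-Route entries of the response, reversed."""
    return list(reversed(record_route))


# The INVITE passed proxy p1 first and proxy p2 second. Each proxy put its own
# entry on top, so the callee sees p2 before p1 and mirrors the same list into
# the response.
record_route = ["sip:p2.b.com;lr", "sip:p1.a.com;lr"]

callee_routes = callee_route_set(record_route)
caller_routes = caller_route_set(record_route)
```
]]></programlisting>
<simpara>
Each side ends up with the proxy closest to itself at the top of its list, so a later
request from either user agent retraces the recorded path in the right direction.
</simpara>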
|
|
<figure id="bye">
|
|
<title>BYE Message Flow (With and without Record Routing)</title>
|
|
<mediaobject>
|
|
<imageobject>
|
|
<imagedata fileref="figures/bye.png" format="PNG"/>
|
|
</imageobject>
|
|
<textobject>
|
|
<phrase>Picture showing BYE message flow with and without record routing.</phrase>
|
|
</textobject>
|
|
</mediaobject>
|
|
</figure>
|
|
<simpara>
|
|
The left message flow of <xref linkend="bye"/> shows how a BYE (a request
within the dialog established by the INVITE) is sent directly to the other user agent
when there is no Record-Route header field in the message. The right message flow
shows how the situation changes when the proxy puts a Record-Route header field
into the message.
|
|
</simpara>
|
|
<section id="strict_vs_loose">
|
|
<title>Strict versus Loose Routing</title>
|
|
<simpara>
|
|
The way record routing works has evolved. Record routing according to
RFC2543 rewrote the Request-URI. That means the Request-URI always
contained the URI of the next hop (which can be either the next proxy server that
inserted a Record-Route header field or the destination user agent). Because of
that, it was necessary to save the original Request-URI as the last Route
header field. This approach is called <emphasis>strict routing</emphasis>.
|
|
</simpara>
|
|
<simpara>
|
|
<emphasis>Loose routing</emphasis>, as specified in RFC3261, works in a
slightly different way. The Request-URI is no longer overwritten; it always
contains the URI of the destination user agent. If there are any Route header
fields in a message, then the message is sent to the URI from the topmost
Route header field. This is a significant change: the Request-URI does not
necessarily contain the URI to which the request will be sent. In fact, loose
routing is very similar to IP source routing.
|
|
</simpara>
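<simpara>
The next-hop decision under loose routing is small enough to sketch directly. The
function name and the addresses below are hypothetical, and the sketch assumes that every
proxy in the Route set is a loose router; the extra handling needed when the topmost
entry belongs to a strict router is left out.
</simpara>
<programlisting><![CDATA[
```python
def next_hop(request_uri, route_set):
    """Pick the URI a request is sent to under loose routing.

    With a non-empty Route set the request goes to the topmost Route entry and
    the Request-URI stays untouched; otherwise it goes to the Request-URI.
    """
    if route_set:
        return route_set[0]
    return request_uri


# Illustrative values: the Request-URI names the destination user agent.
request_uri = "sip:pete@192.0.2.4"

via_proxies = next_hop(request_uri, ["sip:p1.a.com;lr", "sip:p2.b.com;lr"])
direct = next_hop(request_uri, [])
```
]]></programlisting>
<simpara>
In the first case the request is sent to the first proxy even though the Request-URI
still names the destination user agent, which is the difference from strict routing
described above.
</simpara>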
|
|
<simpara>
|
|
Because the transition from strict routing to loose routing would break backwards
compatibility and older user agents would not work, it is necessary to make
loose routing backwards compatible. The backwards compatibility
unfortunately adds a lot of overhead and is often a source of major problems.
|
|
</simpara>
|
|
</section>
|
|
</section>
|
|
<section id="sub_not">
|
|
<title>Event Subscription And Notification</title>
|
|
<simpara>
|
|
The SIP specification has been extended to support a general mechanism allowing
|
|
subscription to asynchronous events. Such events can include SIP proxy statistics
|
|
changes, presence information, session changes and so on.
|
|
</simpara>
|
|
<simpara>
|
|
The mechanism is used mainly to convey information on presence (willingness to
|
|
communicate) of users. <xref linkend="event"/> shows the basic message
|
|
flow.
|
|
</simpara>
|
|
<figure id="event">
|
|
<title>Event Subscription And Notification</title>
|
|
<mediaobject>
|
|
<imageobject>
|
|
<imagedata fileref="figures/event.png" format="PNG"/>
|
|
</imageobject>
|
|
<textobject>
|
|
<phrase>Picture showing subscription and notification.</phrase>
|
|
</textobject>
|
|
</mediaobject>
|
|
</figure>
|
|
<simpara>
|
|
A user agent interested in event notification sends a SUBSCRIBE message to a
|
|
SIP server. The SUBSCRIBE message establishes a dialog, and the server immediately
replies with a 200 OK response. At this point the dialog is
|
|
established. The server sends a NOTIFY request to the user every time the event
|
|
to which the user subscribed changes. NOTIFY messages are sent within the dialog
|
|
established by the SUBSCRIBE.
|
|
</simpara>
|
|
<simpara>
|
|
Note that the first NOTIFY message in <xref linkend="event"/> is sent
|
|
regardless of any event that triggers notifications.
|
|
</simpara>
|
|
<simpara>
|
|
Subscriptions, as well as registrations, have a limited lifespan and therefore must be
|
|
periodically refreshed.
|
|
</simpara>
|
|
</section>
|
|
<section id="im">
|
|
<title>Instant Messages</title>
|
|
<simpara>
|
|
Instant messages are sent using the MESSAGE request. MESSAGE requests do not establish a
|
|
dialog and therefore they will always traverse the same set of proxies. This is the
|
|
simplest form of sending instant messages. The text of the instant message is
|
|
transported in the body of the SIP request.
|
|
</simpara>
|
|
<figure id="message">
|
|
<title>Instant Messages</title>
|
|
<mediaobject>
|
|
<imageobject>
|
|
<imagedata fileref="figures/message.png" format="PNG"/>
|
|
</imageobject>
|
|
<textobject>
|
|
<phrase>Picture showing a MESSAGE.</phrase>
|
|
</textobject>
|
|
</mediaobject>
|
|
</figure>
|
|
</section>
|
|
</section>
|
|
</section>
|