<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<section id="sip_intro" xmlns:xi="http://www.w3.org/2001/XInclude">
<sectioninfo>
<authorgroup>
<author>
<firstname>Jan</firstname>
<surname>Janak</surname>
<email>jan@iptel.org</email>
</author>
</authorgroup>
<copyright>
<year>2003</year>
<holder>FhG FOKUS</holder>
</copyright>
<abstract>
<para>
A brief overview of SIP describing all important aspects of the Session Initiation
Protocol.
</para>
</abstract>
</sectioninfo>
<title>SIP Introduction</title>
<section id="purpose">
<title>Purpose of SIP</title>
<simpara>
SIP stands for Session Initiation Protocol. It is an application-layer control
protocol which has been developed and designed within the IETF. The protocol has
been designed with easy implementation, good scalability, and flexibility in mind.
</simpara>
<simpara>
The specification is available in the form of several <abbrev>RFCs</abbrev>; the most
important one is RFC3261, which contains the core protocol specification. The
protocol is used for creating, modifying, and terminating sessions with one or more
participants. By a session we understand a set of senders and receivers that
communicate, together with the state kept in those senders and receivers during the
communication. Examples of sessions include Internet telephone calls,
distribution of multimedia, multimedia conferences, and distributed computer games.
</simpara>
<simpara>
SIP is not the only protocol that the communicating devices will need. It is not
meant to be a general-purpose protocol. The purpose of SIP is just to make the
communication possible; the communication itself must be achieved by other means
(and possibly another protocol). The two protocols most often used along with
SIP are RTP and SDP. RTP is used to carry the real-time multimedia
data (including audio, video, and text); it makes it possible to encode
and split the data into packets and transport those packets over the
Internet. SDP is used to describe and encode the
capabilities of session participants. Such a description is then used to negotiate
the characteristics of the session so that all the devices can participate (this
includes, for example, negotiation of the codecs used to encode media so that all
participants will be able to decode it, negotiation of the transport protocol used, and
so on).
</simpara>
<simpara>
SIP has been designed in conformance with the Internet model. It is an end-to-end
oriented signaling protocol, which means that all the logic is stored in end
devices (except routing of SIP messages). State is also stored in end devices
only, so there is no single point of failure and networks designed this way scale
well. The price that we have to pay for the distributed design and scalability is
higher message overhead, caused by the messages being sent end-to-end.
</simpara>
<simpara>
It is worth mentioning that the end-to-end concept of SIP is a significant
divergence from the regular PSTN (Public Switched Telephone Network), where all the
state and logic is stored in the network and end devices (telephones) are very
primitive. The aim of SIP is to provide the same functionality that traditional
PSTNs have, but the end-to-end design makes SIP networks much more powerful and
open to the implementation of new services that can hardly be implemented in
traditional PSTNs.
</simpara>
<simpara>
SIP is based on the HTTP protocol, which is probably the most successful and widely
used protocol on the Internet. In fact, HTTP can be classified as a signaling
protocol too, because user agents use it to tell an HTTP server which documents
they are interested in. SIP is used to
carry the description of session parameters; the description is encoded into a
document using SDP. Both protocols (HTTP and SIP) have inherited the encoding of
message headers from RFC822, so SIP tries to combine the best of both HTTP and the
RFC822 message format. The encoding has proven to be robust and flexible
over the years.
</simpara>
</section>
<section id="sip_uri">
<title>SIP URI</title>
<simpara>
SIP entities are identified using SIP URIs (Uniform Resource Identifiers). A
SIP URI has the form sip:username@domain, for instance,
sip:joe@company.com. As we can see, a SIP URI consists of a username part and a
domain name part delimited by the @ (at) character. SIP URIs are similar to
e-mail addresses; it is, for instance, possible to use the same address for e-mail
and SIP communication, which makes such URIs easy to remember.
</simpara>
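<simpara>
The structure of such a URI can be illustrated with a few lines of Python. This is
only a minimal sketch: the function name split_sip_uri is hypothetical, and the
sketch handles just the simple sip:username@domain form shown above, not the
additional components (such as a port or parameters) that real SIP URIs may carry.
</simpara>
<programlisting>
<![CDATA[
def split_sip_uri(uri):
    """Split a simple sip:username@domain URI into (username, domain)."""
    scheme, colon, rest = uri.partition(":")
    if colon != ":" or scheme.lower() != "sip":
        raise ValueError("not a sip: URI: %r" % uri)
    username, at, domain = rest.partition("@")
    if at != "@" or not username or not domain:
        raise ValueError("expected sip:username@domain, got %r" % uri)
    return username, domain


print(split_sip_uri("sip:joe@company.com"))
]]>
</programlisting>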
</section>
<section id="sip_network_elements">
<title>SIP Network Elements</title>
<simpara>
Although in the simplest configuration it is possible to use just two user agents
that send SIP messages directly to each other, a typical SIP network will
contain more than one type of SIP element. The basic SIP elements are user agents,
proxies, registrars, and redirect servers. We will briefly describe them in this
section.
</simpara>
<simpara>
Note that the elements, as presented in this section, are often only logical
entities. It is often beneficial to co-locate them, for instance, to
increase the speed of processing, but that depends on the particular implementation
and configuration.
</simpara>
<section id="user_agents">
<title>User Agents</title>
<simpara>
Internet end points that use SIP to find each other and to negotiate session
characteristics are called <emphasis>user agents</emphasis>. User agents
usually, but not necessarily, reside on a user's computer in the form of an
application--this is currently the most widely used approach, but user agents
can also be cellular phones, PSTN gateways, <acronym>PDAs</acronym>, automated
<acronym>IVR</acronym> systems and so on.
</simpara>
<simpara>
User agents are often referred to as <emphasis>User Agent Server</emphasis>
(UAS) and <emphasis>User Agent Client</emphasis> (UAC). UAS and UAC are
logical entities only; each user agent contains both a UAC and a UAS. The UAC is the
part of the user agent that sends requests and receives responses. The UAS is the
part of the user agent that receives requests and sends responses.
</simpara>
<simpara>
Because a user agent contains both a UAC and a UAS, we often say that a user
agent behaves like a UAC or a UAS. For instance, the caller's user agent behaves
like a UAC when it sends an INVITE request and receives responses to the
request. The callee's user agent behaves like a UAS when it receives the INVITE
and sends responses.
</simpara>
<simpara>
But this situation changes when the callee decides to send a BYE and terminate
the session. In this case the callee's user agent (sending the BYE) behaves like
a UAC and the caller's user agent behaves like a UAS.
</simpara>
<figure id="uac_and_uas">
<title>UAC and UAS</title>
<mediaobject>
<imageobject>
<imagedata fileref="figures/ua.png" format="PNG"/>
</imageobject>
<textobject>
<phrase>Picture showing UAC and UAS</phrase>
</textobject>
</mediaobject>
</figure>
<simpara>
<xref linkend="uac_and_uas"/> shows three user agents and one stateful forking
proxy. Each user agent contains a UAC and a UAS. The part of the proxy that
receives the INVITE from the caller in fact acts as a UAS. When forwarding the
request statefully, the proxy creates two UACs, each of which is responsible for
one branch.
</simpara>
<simpara>
In our example callee B picked up, and later, when he wants to tear down the call,
his user agent sends a BYE. At this time the user agent that was previously a UAS
becomes a UAC and vice versa.
</simpara>
</section>
<section id="proxy_servers">
<title>Proxy Servers</title>
<simpara>
In addition to user agents, SIP allows the creation of an infrastructure of network hosts
called <emphasis>proxy servers</emphasis>. User agents can send messages to a
proxy server. Proxy servers are very important entities in the SIP
infrastructure. They perform routing of session invitations according to the
invitee's current location, authentication, accounting and many other important
functions.
</simpara>
<simpara>
The most important task of a proxy server is to route session invitations
"closer" to the callee. The session invitation will usually traverse a
set of proxies until it finds one which knows the actual location of the
callee. Such a proxy will forward the session invitation directly to the callee
and the callee will then accept or decline the session invitation.
</simpara>
<simpara>
There are two basic types of SIP proxy servers--stateless and stateful.
</simpara>
<section id="stateless_servers">
<title>Stateless Servers</title>
<simpara>
Stateless servers are simple message forwarders. They forward messages
independently of each other. Although messages are usually arranged into
transactions (see <xref linkend="sip_transactions"/>), stateless proxies
do not take care of transactions.
</simpara>
<simpara>
Stateless proxies are simpler and faster than stateful proxy servers. They
can be used as simple load balancers, message translators and routers. One
of the drawbacks of stateless proxies is that they are unable to absorb
retransmissions of messages or to perform more advanced routing, for instance,
forking or recursive traversal.
</simpara>
</section>
<section id="stateful_servers">
<title>Stateful Servers</title>
<simpara>
Stateful proxies are more complex. Upon reception of a request, stateful
proxies create state and keep it until the transaction
finishes. Some transactions, especially those created by INVITE, can last
quite long (until the callee picks up or declines the call). Because stateful
proxies must maintain the state for the duration of the transactions, their
performance is limited.
</simpara>
<simpara>
The ability to associate SIP messages with transactions gives stateful
proxies some interesting features. Stateful proxies can perform forking,
which means that upon reception of a message two or more messages will be sent
out.
</simpara>
<simpara>
Stateful proxies can absorb retransmissions because they know, from the
transaction state, if they have already received the same message (stateless
proxies cannot do the check because they keep no state).
</simpara>
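<simpara>
The idea of absorbing retransmissions can be sketched as follows. This is an
illustration only: the class name RetransmissionFilter and the keys "txn-1" and
"txn-2" are hypothetical, and a real proxy would derive the key from the message
as described in <xref linkend="sip_transactions"/> and would also discard the
state when the transaction finishes.
</simpara>
<programlisting>
<![CDATA[
class RetransmissionFilter:
    """Remembers which transactions a stateful proxy has already seen."""

    def __init__(self):
        self.seen = set()

    def handle(self, transaction_key):
        if transaction_key in self.seen:
            # State for this transaction already exists: a retransmission.
            return "absorb"
        # First copy of the request: create state and forward it.
        self.seen.add(transaction_key)
        return "forward"


proxy = RetransmissionFilter()
print(proxy.handle("txn-1"))  # first copy of a request
print(proxy.handle("txn-1"))  # retransmission of the same request
print(proxy.handle("txn-2"))  # request of a different transaction
]]>
</programlisting>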
<simpara>
Stateful proxies can perform more complicated methods of finding a user. It
is, for instance, possible to try to reach the user's office phone first and, if he
doesn't pick up, to redirect the call to his cell phone. Stateless
proxies can't do this because they have no way of knowing how the
transaction targeted to the office phone finished.
</simpara>
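<simpara>
Trying one destination after another can be sketched in a few lines. The sketch
below is a simulation under stated assumptions: the function serial_fork, the
addresses and the table of outcomes are all hypothetical, and the send_invite
callback stands in for a complete INVITE transaction that returns the code of the
final response.
</simpara>
<programlisting>
<![CDATA[
def serial_fork(targets, send_invite):
    """Try each target in order; stop at the first positive final response."""
    last_code = None
    for target in targets:
        last_code = send_invite(target)
        if 200 <= last_code <= 299:
            return target, last_code
    # No target accepted; report the last final response (None if no targets).
    return None, last_code


# Simulated outcomes: the office phone times out, the cell phone answers.
outcomes = {
    "sip:bob@office.example": 408,
    "sip:bob@cell.example": 200,
}
print(serial_fork(["sip:bob@office.example", "sip:bob@cell.example"],
                  outcomes.get))
]]>
</programlisting>
<simpara>
The proxy can only move on to the second target because it learns the final
response of the first transaction, which is exactly the state a stateless proxy
does not keep.
</simpara>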
<simpara>
Most SIP proxies today are stateful because their configuration is usually
very complex. They often perform accounting, forking, or some sort of NAT
traversal aid, and all of those features require a stateful proxy.
</simpara>
</section>
<section id="proxy_server_usage">
<title>Proxy Server Usage</title>
<simpara>
A typical configuration is that each centrally administered entity (a
company, for instance) has its own SIP proxy server which is used by all
user agents in the entity. Let's suppose that there are two companies, A and
B, and each of them has its own proxy server. <xref linkend="companies"/>
shows how a session invitation from employee Joe in company A will reach
employee Bob in company B.
</simpara>
<figure id="companies">
<title>Session Invitation</title>
<mediaobject>
<imageobject>
<imagedata fileref="figures/companies.png" format="PNG"/>
</imageobject>
<textobject>
<phrase>Picture showing a session invitation message flow</phrase>
</textobject>
</mediaobject>
</figure>
<simpara>
User Joe uses the address sip:bob@b.com to call Bob. Joe's user agent doesn't
know how to route the invitation itself, but it is configured to send all
outbound traffic to the company SIP proxy server proxy.a.com. The proxy
server figures out that user sip:bob@b.com is in a different company, so it
will look up B's SIP proxy server and send the invitation there. B's proxy
server can either be pre-configured at proxy.a.com, or the proxy will use
<acronym>DNS SRV</acronym> records to find B's proxy server. The invitation
reaches proxy.b.com. The proxy knows that Bob is currently sitting in his
office and is reachable through the phone on his desk, which has IP address
1.2.3.4, so the proxy will send the invitation there.
</simpara>
</section>
</section>
<section id="sip_intro.registrar">
<title>Registrar</title>
<simpara>
We mentioned that the SIP proxy at proxy.b.com knows Bob's current location
but have not yet mentioned how a proxy can learn the current location of a
user. Bob's user agent (SIP phone) must register with a
<emphasis>registrar</emphasis>. The registrar is a special SIP entity that
receives registrations from users, extracts information about their current
location (IP address, port and username in this case) and stores the
information into a location database. The purpose of the location database is to map
sip:bob@b.com to something like sip:bob@1.2.3.4:5060. The location database is
then used by B's proxy server. When the proxy receives an invitation for
sip:bob@b.com it will search the location database. It finds
sip:bob@1.2.3.4:5060 and will send the invitation there. A registrar is very
often a logical entity only. Because of their tight coupling with proxies,
registrars are usually co-located with proxy servers.
</simpara>
<simpara>
<xref linkend="registrar_fig"/> shows a typical SIP registration. A REGISTER
message containing the Address of Record sip:jan@iptel.org and the contact address
sip:jan@1.2.3.4:5060, where 1.2.3.4 is the IP address of the phone, is sent to the
registrar. The registrar extracts this information and stores it into the
location database. If everything went well, the registrar sends a 200 OK
response to the phone and the process of registration is finished.
</simpara>
<figure id="registrar_fig">
<title>Registrar Overview</title>
<mediaobject>
<imageobject>
<imagedata fileref="figures/registrar.png" format="PNG"/>
</imageobject>
<textobject>
<phrase>Picture showing a typical registrar</phrase>
</textobject>
</mediaobject>
</figure>
<simpara>
Each registration has a limited lifespan. The Expires header field or the expires
parameter of the Contact header field determines how long the registration is
valid. The user agent must refresh the registration within the lifespan;
otherwise it will expire and the user will become unavailable.
</simpara>
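<simpara>
The role of the location database, including the limited lifespan of each
registration, can be sketched as follows. This is a minimal in-memory illustration,
not how any particular registrar stores its data: the class LocationDatabase and its
methods are hypothetical names, and the clock is passed in only so that the expiry
can be demonstrated without waiting.
</simpara>
<programlisting>
<![CDATA[
import time


class LocationDatabase:
    """Maps an Address of Record to its currently valid contact addresses."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._bindings = {}  # AoR -> {contact: absolute expiry time}

    def register(self, aor, contact, expires):
        # Registering again for the same contact refreshes the binding.
        self._bindings.setdefault(aor, {})[contact] = self._clock() + expires

    def lookup(self, aor):
        now = self._clock()
        contacts = self._bindings.get(aor, {})
        return [c for c, expiry in contacts.items() if expiry > now]


fake_now = [1000.0]  # simulated clock so the example runs instantly
db = LocationDatabase(clock=lambda: fake_now[0])
db.register("sip:bob@b.com", "sip:bob@1.2.3.4:5060", expires=120)
print(db.lookup("sip:bob@b.com"))  # the binding is still valid

fake_now[0] += 121                 # the lifespan passes without a refresh
print(db.lookup("sip:bob@b.com"))  # the binding has expired
]]>
</programlisting>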
</section>
<section id="redirect_server">
<title>Redirect Server</title>
<simpara>
The entity that receives a request and sends back a reply containing a list of the
current locations of a particular user is called a <emphasis>redirect server</emphasis>. A
redirect server receives requests and looks up the intended recipient of the request in
the location database created by a registrar. It then creates a list of current
locations of the user and sends it to the request originator in a response of the 3xx
class.
</simpara>
<simpara>
The originator of the request then extracts the list of destinations and sends
another request directly to them. <xref linkend="redirect"/> shows a typical
redirection.
</simpara>
<figure id="redirect">
<title>SIP Redirection</title>
<mediaobject>
<imageobject>
<imagedata fileref="figures/redirect.png" format="PNG"/>
</imageobject>
<textobject>
<phrase>Picture showing a redirection</phrase>
</textobject>
</mediaobject>
</figure>
</section>
</section>
<section id="sip_messages">
<title>SIP Messages</title>
<simpara>
Communication using SIP (often called signaling) consists of a series of
<emphasis>messages</emphasis>. Messages can be transported independently by the
network. Usually each message is transported in a separate UDP datagram. Each
message consists of a "first line", a message header, and a message body. The
first line identifies the type of the message. There are two types of
messages--<emphasis>requests</emphasis> and <emphasis>responses</emphasis>.
Requests are usually used to initiate some action or to inform the recipient
of something. Responses (also called replies) are used to confirm that a request
was received and processed and contain the status of the processing.
</simpara>
<simpara>
A typical SIP request looks like this:
</simpara>
<programlisting>
<![CDATA[
INVITE sip:7170@iptel.org SIP/2.0
Via: SIP/2.0/UDP 195.37.77.100:5040;rport
Max-Forwards: 10
From: "jiri" <sip:jiri@iptel.org>;tag=76ff7a07-c091-4192-84a0-d56e91fe104f
To: <sip:jiri@bat.iptel.org>
Call-ID: d10815e0-bf17-4afa-8412-d9130a793d96@213.20.128.35
CSeq: 2 INVITE
Contact: <sip:213.20.128.35:9315>
User-Agent: Windows RTC/1.0
Proxy-Authorization: Digest username="jiri", realm="iptel.org",
 algorithm="MD5", uri="sip:jiri@bat.iptel.org",
 nonce="3cef753900000001771328f5ae1b8b7f0d742da1feb5753c",
 response="53fe98db10e1074b03b3e06438bda70f"
Content-Type: application/sdp
Content-Length: 451

v=0
o=jku2 0 0 IN IP4 213.20.128.35
s=session
c=IN IP4 213.20.128.35
b=CT:1000
t=0 0
m=audio 54742 RTP/AVP 97 111 112 6 0 8 4 5 3 101
a=rtpmap:97 red/8000
a=rtpmap:111 SIREN/16000
a=fmtp:111 bitrate=16000
a=rtpmap:112 G7221/16000
a=fmtp:112 bitrate=24000
a=rtpmap:6 DVI4/16000
a=rtpmap:0 PCMU/8000
a=rtpmap:4 G723/8000
a=rtpmap:3 GSM/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
]]>
</programlisting>
<simpara>
The first line tells us that this is an INVITE message, which is used to establish a
session. The URI on the first line--sip:7170@iptel.org--is called the <emphasis>Request
URI</emphasis> and contains the URI of the next hop of the message. In this case it
will be host iptel.org.
</simpara>
<simpara>
A SIP request can contain one or more Via header fields, which are used to record
the path of the request. They are later used to route SIP responses back along exactly
the same path. The INVITE message contains just one Via header field, which was created
by the user agent that sent the request. From the Via field we can tell that the user
agent is running on host 195.37.77.100 and port 5040.
</simpara>
<simpara>
The From and To header fields identify the initiator (caller) and the recipient (callee)
of the invitation (just like in SMTP, where they identify the sender and the recipient of a
message). The From header field contains a tag parameter which forms part of the dialog
identifier and will be described in <xref linkend="sip_dialogs"/>.
</simpara>
<simpara>
The Call-ID header field is part of the dialog identifier and its purpose is to identify
messages belonging to the same call. Such messages have the same Call-ID identifier. CSeq is
used to maintain the order of requests. Because requests can be sent over an unreliable
transport that can re-order messages, a sequence number must be present in the
messages so that the recipient can identify retransmissions and out-of-order requests.
</simpara>
<simpara>
The Contact header field contains the IP address and port on which the sender is awaiting
further requests sent by the callee. The other header fields are not important here and
will not be described.
</simpara>
<simpara>
The message header is delimited from the message body by an empty line. The message body of
the INVITE request contains a description of the media types accepted by the sender, encoded in
SDP.
</simpara>
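<simpara>
The three parts of a message can be separated with a short Python sketch. It is a
simplified illustration, not a complete parser: the function name parse_sip_message
is hypothetical, the sample message is a shortened version of the INVITE above, and
the sketch assumes that lines end with CRLF and that header lines starting with
whitespace continue the previous header field.
</simpara>
<programlisting>
<![CDATA[
def parse_sip_message(raw):
    """Split a SIP message into (first_line, headers, body)."""
    # The first empty line separates the message header from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    first_line = lines[0]
    headers = []
    for line in lines[1:]:
        if line[:1] in (" ", "\t") and headers:
            # Continuation line: append to the previous header field value.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        else:
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
    return first_line, headers, body


raw = (
    "INVITE sip:7170@iptel.org SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 195.37.77.100:5040;rport\r\n"
    "CSeq: 2 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
)
first_line, headers, body = parse_sip_message(raw)
# Responses start with the protocol version, requests with the method.
kind = "response" if first_line.startswith("SIP/") else "request"
print(kind, first_line)
print(headers)
]]>
</programlisting>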
<section id="sip_requests">
<title>SIP Requests</title>
<simpara>
We have described what an INVITE request looks like and said that the request is
used to invite a callee to a session. Other important requests are:
</simpara>
<itemizedlist>
<listitem>
<simpara>
<emphasis>ACK</emphasis>--This message acknowledges receipt of a final
response to INVITE. Establishing a session uses a 3-way
handshake due to the asymmetric nature of the invitation. It may take a
while before the callee accepts or declines the call, so the callee's
user agent periodically retransmits a positive final response until it
receives an ACK (which indicates that the caller is still there and
ready to communicate).
</simpara>
</listitem>
<listitem>
<simpara>
<emphasis>BYE</emphasis>--Bye messages are used to tear down multimedia
sessions. A party wishing to tear down a session sends a BYE to the
other party.
</simpara>
</listitem>
<listitem>
<simpara>
<emphasis>CANCEL</emphasis>--Cancel is used to cancel a session that is not yet
fully established. It is used when the callee hasn't replied with a
final response yet but the caller wants to abort the call (typically
when the callee doesn't respond for some time).
</simpara>
</listitem>
<listitem>
<simpara>
<emphasis>REGISTER</emphasis>--The purpose of a REGISTER request is to let the
registrar know the user's current location. Information about the current
IP address and port on which a user can be reached is carried in
REGISTER messages. The registrar extracts this information and puts it into
a location database. The database can later be used by SIP proxy
servers to route calls to the user. Registrations are time-limited and
need to be periodically refreshed.
</simpara>
</listitem>
</itemizedlist>
<simpara>
The listed requests usually have no message body because it is not needed in
most situations (but they can have one). In addition, many other request types
have been defined, but their description is outside the scope of this document.
</simpara>
</section>
<section id="sip_responses">
<title>SIP Responses</title>
<simpara>
When a user agent or proxy server receives a request, it sends a reply. Every
request must be answered, except ACK requests, which trigger no replies.
</simpara>
<simpara>
A typical reply looks like this:
</simpara>
<programlisting>
<![CDATA[
SIP/2.0 200 OK
Via: SIP/2.0/UDP 192.168.1.30:5060;received=66.87.48.68
From: sip:sip2@iptel.org
To: sip:sip2@iptel.org;tag=794fe65c16edfdf45da4fc39a5d2867c.b713
Call-ID: 2443936363@192.168.1.30
CSeq: 63629 REGISTER
Contact: <sip:sip2@66.87.48.68:5060;transport=udp>;q=0.00;expires=120
Server: Sip EXpress router (0.8.11pre21xrc (i386/linux))
Content-Length: 0
Warning: 392 195.37.77.101:5060 "Noisy feedback tells:
pid=5110 req_src_ip=66.87.48.68 req_src_port=5060 in_uri=sip:iptel.org
out_uri=sip:iptel.org via_cnt==1"
]]>
</programlisting>
<simpara>
As we can see, responses are very similar to the requests, except for the first
line. The first line of response contains protocol version (SIP/2.0), reply
code, and reason phrase.
</simpara>
<simpara>
The <emphasis>reply code</emphasis> is an integer from 100 to 699 and
indicates the type of the response. There are 6 classes of responses:
</simpara>
<itemizedlist>
<listitem>
<simpara>
<emphasis>1xx</emphasis> are <emphasis>provisional</emphasis>
responses. A provisional response is a response that tells its
recipient that the associated request was received but the result of the
processing is not known yet. Provisional responses are sent only when
the processing doesn't finish immediately. The sender must stop
retransmitting the request upon reception of a provisional response.
</simpara>
<simpara>
Typically proxy servers send responses with code 100 when they start
processing an INVITE and user agents send responses with code 180
(Ringing) which means that the callee's phone is ringing.
</simpara>
</listitem>
<listitem>
<simpara>
<emphasis>2xx</emphasis> responses are <emphasis>positive
final</emphasis> responses. A final response is the ultimate response
that the originator of the request will ever receive. Final
responses therefore express the result of the processing of the associated
request. Final responses also terminate transactions. Responses with
codes from 200 to 299 are positive responses, which means that the request
was processed successfully and accepted. For instance, a 200 OK response
is sent when a user accepts an invitation to a session (INVITE request).
</simpara>
<simpara>
A UAC may receive several 200 responses to a single INVITE
request. This is because a forking proxy (see <xref linkend="stateful_servers"/>)
can fork the request so that it reaches several UASs, and each of them may accept the
invitation. In this case each response is distinguished by the tag
parameter in the To header field. Each response represents a distinct dialog
with an unambiguous dialog identifier.
</simpara>
</listitem>
<listitem>
<simpara>
<emphasis>3xx</emphasis> responses are used to redirect a caller. A
redirection response gives information about the user's new location or
an alternative service that the caller might use to satisfy the
call. Redirection responses are usually sent by proxy servers. When a
proxy receives a request and doesn't want or can't process it for any
reason, it will send a redirection response to the caller and put
another location into the response which the caller might want to
try. It can be the location of another proxy or the current location of
the callee (from the location database created by a registrar). The
caller is then supposed to re-send the request to the new location. 3xx
responses are final.
</simpara>
</listitem>
<listitem>
<simpara>
<emphasis>4xx</emphasis> are <emphasis>negative final</emphasis>
responses. A 4xx response means that the problem is on the sender's
side. The request couldn't be processed because it contains bad syntax
or cannot be fulfilled at that server.
</simpara>
</listitem>
<listitem>
<simpara>
<emphasis>5xx</emphasis> means that the problem is on the server's side. The
request is apparently valid but the server failed to fulfill it. Clients
should usually retry the request later.
</simpara>
</listitem>
<listitem>
<simpara>
<emphasis>6xx</emphasis> reply code means that the request cannot be
fulfilled at any server. This response is usually sent by a server that
has definitive information about a particular user. User agents usually
send a 603 Decline response when the user doesn't want to participate in
the session.
</simpara>
</listitem>
</itemizedlist>
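<simpara>
The classes listed above can be derived from the first digit of the reply code, as
the following sketch shows. The function name classify_status_line and the short
class labels are our own shorthand for the descriptions above; the sketch assumes a
well-formed first line that contains a reason phrase.
</simpara>
<programlisting>
<![CDATA[
RESPONSE_CLASSES = {
    1: "provisional",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
    6: "global failure",
}


def classify_status_line(status_line):
    """Return (code, class label, reason phrase) for a response first line."""
    version, code_text, reason = status_line.split(" ", 2)
    code = int(code_text)
    if not 100 <= code <= 699:
        raise ValueError("reply code out of range: %d" % code)
    return code, RESPONSE_CLASSES[code // 100], reason


def is_final(code):
    # Everything except the 1xx class is a final response.
    return code >= 200


for line in ("SIP/2.0 180 Ringing", "SIP/2.0 200 OK", "SIP/2.0 603 Decline"):
    code, label, reason = classify_status_line(line)
    print(code, label, reason, "final" if is_final(code) else "not final")
]]>
</programlisting>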
<simpara>
In addition to the reply code, the first line also contains a <emphasis>reason
phrase</emphasis>. The code number is intended to be processed by
machines. It is not very human-friendly, but it is very easy for machines to parse
and understand. The reason phrase usually contains a human-readable
message describing the result of the processing. A user agent should render
the reason phrase to the user.
</simpara>
<simpara>
The request to which a particular response belongs is identified using the CSeq
header field. In addition to the sequence number, this header field also contains
the method of the corresponding request. In our example it was a REGISTER request.
</simpara>
</section>
</section>
<section id="sip_transactions">
<title>SIP Transactions</title>
<simpara>
Although we said that SIP messages are sent independently over the network, they
are usually arranged into <emphasis>transactions</emphasis> by user agents and
certain types of proxy servers. Therefore SIP is said to be a
<emphasis>transactional protocol</emphasis>.
</simpara>
<simpara>
A transaction is a sequence of SIP messages exchanged between SIP network
elements. A transaction consists of one request and all responses to that
request. That includes zero or more provisional responses and one or more final
responses (remember that an INVITE might be answered by more than one final response
when a proxy server forks the request).
</simpara>
<simpara>
If a transaction was initiated by an INVITE request then the same transaction also
includes ACK, but only if the final response was not a 2xx response. If the final
response was a 2xx response then the ACK is not considered part of the transaction.
</simpara>
<simpara>
As we can see, this is quite asymmetric behavior--ACK is part of transactions with a
negative final response but is not part of transactions with positive final
responses. The reason for this separation is the importance of delivering all 200
OK messages. Not only do they establish a session, but 200 OK responses can also be
generated by multiple entities when a proxy server forks the request, and all of them
must be delivered to the calling user agent. Therefore user agents take
responsibility in this case and retransmit 200 OK responses until they receive an
ACK. Also note that only responses to INVITE are retransmitted!
</simpara>
<simpara>
SIP entities that have a notion of transactions are called
<emphasis>stateful</emphasis>. Such entities usually create state associated with
a transaction, which is kept in memory for the duration of the transaction. When a
request or response arrives, a stateful entity tries to associate the request (or
response) with an existing transaction. To be able to do so, it must extract a unique
transaction identifier from the message and compare it to the identifiers of all
existing transactions. If such a transaction exists, then its state gets updated
from the message.
</simpara>
<simpara>
In the previous SIP specification, RFC2543, the transaction identifier was calculated
as a hash of all important message header fields (that included To, From, Request-URI and
CSeq). This proved to be very slow and complex; during interoperability tests such
transaction identifiers used to be a common source of problems.
</simpara>
<simpara>
In the new RFC3261 the way of calculating transaction identifiers was completely
changed. Instead of complicated hashing of important header fields, a SIP message now
includes the identifier directly: the branch parameter of the Via header field contains
the transaction identifier. This is a significant simplification, but there still exist old
implementations that don't support the new way of calculating the transaction identifier, so
even new implementations have to support the old way to remain backwards compatible.
</simpara>
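<simpara>
Extracting the identifier from a Via header field can be sketched as follows. The
function name branch_of, the address 192.0.2.1 and the branch value are hypothetical.
The prefix z9hG4bK is the marker that RFC3261 prescribes for branch values generated
according to the new rules, so its absence suggests an old implementation. A complete
matcher would compare more than the branch parameter; this sketch shows only the
extraction step.
</simpara>
<programlisting>
<![CDATA[
RFC3261_BRANCH_PREFIX = "z9hG4bK"


def branch_of(via_value):
    """Return the branch parameter of a Via header field value, or None."""
    for param in via_value.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "branch":
            return value.strip()
    return None


def uses_new_identifier(via_value):
    branch = branch_of(via_value)
    return branch is not None and branch.startswith(RFC3261_BRANCH_PREFIX)


new_style = "SIP/2.0/UDP 192.0.2.1:5060;branch=z9hG4bK776asdhds"
old_style = "SIP/2.0/UDP 195.37.77.100:5040;rport"  # Via of the INVITE above

print(branch_of(new_style), uses_new_identifier(new_style))
print(branch_of(old_style), uses_new_identifier(old_style))
]]>
</programlisting>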
<simpara>
<xref linkend="transactions"/> shows what messages belong to what transactions
during a conversation of two user agents.
</simpara>
<figure id="transactions">
<title>SIP Transactions</title>
<mediaobject>
<imageobject>
<imagedata fileref="figures/transaction.png" format="PNG"/>
</imageobject>
<textobject>
<phrase>Message flow showing messages belonging to the same transaction.</phrase>
</textobject>
</mediaobject>
</figure>
</section>
<section id="sip_dialogs">
<title>SIP Dialogs</title>
<simpara>
We have shown what transactions are: one transaction includes an INVITE and its
responses, and another transaction includes a BYE and its responses when a session is
being torn down. But we feel that those two transactions should be somehow
related--both of them belong to the same <emphasis>dialog</emphasis>. A dialog
represents a peer-to-peer SIP relationship between two user agents. A dialog
persists for some time and is a very important concept for user agents. Dialogs
facilitate proper sequencing and routing of messages between SIP endpoints.
</simpara>
<simpara>
Dialogs are identified using the Call-ID, the From tag, and the To
tag. Messages in which these three identifiers are the same belong to the
same dialog. We have shown that the CSeq header field is used to order
messages; in fact it is used to order messages within a dialog. The
number must be monotonically increased for each new request sent within
a dialog, otherwise the peer will handle it as an out-of-order request
or a retransmission. In fact, the CSeq number identifies a
transaction within a dialog, because we have said that a request and its
associated responses are called a transaction. This means that only
one transaction in each direction can be active within a
dialog. One could also say that a <emphasis>dialog is a sequence of
transactions</emphasis>. <xref linkend="dialog"/> extends <xref
linkend="transactions"/> to show which messages belong to the
same dialog.
</simpara>
<figure id="dialog">
<title>SIP Dialog</title>
<mediaobject>
<imageobject>
<imagedata fileref="figures/dialog.png" format="PNG"/>
</imageobject>
<textobject>
<phrase>Message flow showing transactions belonging to the same dialog.</phrase>
</textobject>
</mediaobject>
</figure>
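<simpara>
The dialog identifier can be sketched in Python as follows. The function names
tag_of and dialog_key, the callee's tag and the Call-ID are hypothetical. Because the
From and To header fields are swapped in requests sent by the callee, the sketch
stores the two tags as an unordered pair so that messages in both directions map
to the same key; that is a design choice of this sketch, not something prescribed by
the text above.
</simpara>
<programlisting>
<![CDATA[
def tag_of(header_value):
    """Return the tag parameter of a From or To header field value, or None."""
    # Header parameters follow the closing '>' of the URI when one is present.
    _, _, params = header_value.rpartition(">")
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "tag":
            return value.strip()
    return None


def dialog_key(call_id, from_value, to_value):
    """Build a direction-independent key from Call-ID, From tag and To tag."""
    return (call_id, frozenset((tag_of(from_value), tag_of(to_value))))


call_id = "example-call-id@192.0.2.1"
caller = '"jiri" <sip:jiri@iptel.org>;tag=76ff7a07'
callee = "<sip:jiri@bat.iptel.org>;tag=callee-tag-1"

# 200 OK to the INVITE: From is the caller, To is the callee.
ok_key = dialog_key(call_id, caller, callee)
# BYE sent later by the callee: From and To are swapped.
bye_key = dialog_key(call_id, callee, caller)
print(ok_key == bye_key)
]]>
</programlisting>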
<simpara>
Some messages establish a dialog and some do not. This makes it possible to explicitly
express the relationship between messages, and also to send messages that are not related
to other messages outside a dialog. The latter is easier to implement because user agents
don't have to keep the dialog state.
</simpara>
<simpara>
For instance, an INVITE message establishes a dialog because it will later be
followed by a BYE request, which will tear down the session established by the
INVITE. This BYE is sent within the dialog established by the INVITE.
</simpara>
<simpara>
But if a user agent sends a MESSAGE request, such a request doesn't establish any
dialog. Any subsequent messages (even MESSAGE requests) will be sent independently
of the previous one.
</simpara>
<section id="dialogs_facilitate_routing">
<title>Dialogs Facilitate Routing</title>
<simpara>
We have said that dialogs are also used to route messages between user
agents. This section describes how.
</simpara>
<simpara>
Let's suppose that user sip:bob@a.com wants to talk to user sip:pete@b.com. He
knows the SIP address of the callee (sip:pete@b.com), but this address doesn't say
anything about the current location of the user--i.e., the caller doesn't know
which host to send the request to. Therefore the INVITE request will be sent to a
proxy server.
</simpara>
<simpara>
The request will be sent from proxy to proxy until it reaches one that knows the
current location of the callee. This process is called routing. Once the request
reaches the callee, the callee's user agent will create a response that will be
sent back to the caller. The callee's user agent will also put a Contact header
field into the response, which will contain the current location of the user. The
original request also contained a Contact header field, which means that both user
agents know the current location of the peer.
</simpara>
<simpara>
Because the user agents know each other's location, it is not necessary to send
further requests to any proxy--they can be sent directly from user agent to user
agent. That's exactly how dialogs facilitate routing.
</simpara>
<simpara>
Further messages within a dialog are sent directly from user agent to user
agent. This is a significant performance improvement because proxies do not see
all the messages within a dialog; they are used to route just the first request
that establishes the dialog. The direct messages are also delivered with much
lower latency because a typical proxy usually implements complex routing
logic. <xref linkend="trapezoid"/> contains an example of a message
within a dialog (BYE) that bypasses the proxies.
</simpara>
<figure id="trapezoid">
<title>SIP Trapezoid</title>
<mediaobject>
<imageobject>
<imagedata fileref="figures/trapezoid.png" format="PNG"/>
</imageobject>
<textobject>
<phrase>Message flow showing SIP trapezoid.</phrase>
</textobject>
</mediaobject>
</figure>
</section>
<section id="dialogs_identifiers">
<title>Dialog Identifiers</title>
<simpara>
We have already shown that dialog identifiers consist of three parts, Call-ID,
From tag, and To tag, but it is not yet clear why dialog identifiers are
created exactly this way and who contributes which part.
</simpara>
<simpara>
Call-ID is the so-called <emphasis>call identifier</emphasis>. It must be a unique
string that identifies a call. A call consists of one or more dialogs. Multiple
user agents may respond to a request when a proxy along the path forks the
request. Each user agent that sends a 2xx establishes a separate dialog with the
caller. All such dialogs are part of the same call and have the same Call-ID.
</simpara>
<simpara>
The From tag is generated by the caller and uniquely identifies the dialog in the
caller's user agent.
</simpara>
<simpara>
The To tag is generated by the callee and, just like the From tag, uniquely
identifies the dialog in the callee's user agent.
</simpara>
<simpara>
This hierarchical dialog identifier is necessary because a single call
invitation can create several dialogs and the caller must be able to distinguish
them.
</simpara>
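<simpara>
The relationship between a call and its dialogs can be illustrated with a short
Python sketch. The function group_dialogs_by_call and the tuple layout of its input
are hypothetical and chosen only for this example; following the text above, only
2xx responses are counted as establishing a dialog.
</simpara>
<programlisting><![CDATA[
```python
from collections import defaultdict


def group_dialogs_by_call(responses):
    """Group dialog-establishing responses by Call-ID.

    Each response is a hypothetical (call_id, from_tag, to_tag, status)
    tuple. The result maps every Call-ID to the sorted list of
    (from_tag, to_tag) pairs, one pair per dialog of that call.
    """
    calls = defaultdict(set)
    for call_id, from_tag, to_tag, status in responses:
        if 200 <= status < 300:
            calls[call_id].add((from_tag, to_tag))
    return {call_id: sorted(dialogs) for call_id, dialogs in calls.items()}
```
]]></programlisting>
<simpara>
If a forked INVITE is answered with a 200 OK by two user agents, both responses carry
the same Call-ID and From tag but different To tags, so the function reports two
dialogs for a single call.
</simpara>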
</section>
</section>
<section id="typical_sip_scenarios">
<title>Typical SIP Scenarios</title>
<simpara>
This section gives a brief overview of typical SIP scenarios that usually make up the
SIP traffic.
</simpara>
<section id="registration">
<title>Registration</title>
<simpara>
Users must register themselves with a registrar to be reachable by other
users. A registration comprises a REGISTER message followed by a 200 OK sent by
the registrar if the registration was successful. Registrations are usually
authenticated, so a 401 Unauthorized reply (or a 407 Proxy Authentication Required
reply from a proxy on the path) can appear if the user didn't provide valid
credentials. <xref linkend="register_fig"/> shows an example of registration.
</simpara>
<figure id="register_fig">
<title>REGISTER Message Flow</title>
<mediaobject>
<imageobject>
<imagedata fileref="figures/register.png" format="PNG"/>
</imageobject>
<textobject>
<phrase>Message flow of a registration.</phrase>
</textobject>
</mediaobject>
</figure>
</section>
<section id="session_invitation">
<title>Session Invitation</title>
<simpara>
A session invitation consists of one INVITE request which is usually sent to a
proxy. The proxy immediately sends a 100 Trying reply to stop retransmissions
and forwards the request further.
</simpara>
<simpara>
All provisional responses generated by the callee are sent back to the caller. See
the 180 Ringing response in the call flow. The response is generated when the
callee's phone starts ringing.
</simpara>
<figure id="invite1">
<title>INVITE Message Flow</title>
<mediaobject>
<imageobject>
<imagedata fileref="figures/invite1.png" format="PNG"/>
</imageobject>
<textobject>
<phrase>Picture showing a session invitation.</phrase>
</textobject>
</mediaobject>
</figure>
<simpara>
A 200 OK is generated once the callee picks up the phone and it is retransmitted
by the callee's user agent until it receives an ACK from the caller. The session
is established at this point.
</simpara>
</section>
<section id="session_termination">
<title>Session Termination</title>
<simpara>
Session termination is accomplished by sending a BYE request within the dialog
established by the INVITE. BYE messages are sent directly from one user agent to
the other unless a proxy on the path of the INVITE request indicated that it
wishes to stay on the path by using record routing (see <xref
linkend="record_routing"/>).
</simpara>
<simpara>
The party wishing to tear down a session sends a BYE request to the other party
involved in the session. The other party sends a 200 OK response to confirm the
BYE and the session is terminated. See <xref linkend="bye"/>, left message
flow.
</simpara>
</section>
<section id="record_routing">
<title>Record Routing</title>
<simpara>
All requests sent within a dialog are by default sent directly from one user agent
to the other. Only requests outside a dialog traverse SIP proxies. This approach
makes a SIP network more scalable because only a small number of SIP messages hit
the proxies.
</simpara>
<simpara>
There are certain situations in which a SIP proxy needs to stay on the path of all
further messages. For instance, proxies controlling a NAT box or proxies doing
accounting need to stay on the path of BYE requests.
</simpara>
<simpara>
The mechanism by which a proxy can inform user agents that it wishes to stay on the
path of all further messages is called <emphasis>record routing</emphasis>. Such a
proxy inserts a Record-Route header field containing the address of the proxy into
SIP messages. Messages sent within a dialog will then traverse all SIP proxies that
put a Record-Route header field into the message.
</simpara>
<simpara>
The recipient of the request receives a set of Record-Route header fields in the
message. It must mirror all the Record-Route header fields into responses because
the originator of the request also needs to know the set of proxies.
</simpara>
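<simpara>
How the two user agents turn the Record-Route header fields into the list of proxies
to visit can be sketched as follows. The function names are hypothetical and header
values are simplified to plain strings. The ordering rule is the one defined in
RFC3261: the recipient of the request keeps the values in the order received, while
the originator keeps the mirrored values in reverse order.
</simpara>
<programlisting><![CDATA[
```python
def route_set_for_callee(record_route):
    """Route set kept by the callee (recipient of the request).

    The Record-Route values are taken in the order in which they appear
    in the received request, top to bottom. Each proxy adds its value
    on top, so the first entry is the proxy closest to the callee.
    """
    return list(record_route)


def route_set_for_caller(record_route):
    """Route set kept by the caller (originator of the request).

    The caller sees the same Record-Route values mirrored in the
    response and stores them in reverse order, so that its own requests
    visit the proxy closest to the caller first.
    """
    return list(reversed(record_route))
```
]]></programlisting>
<simpara>
For a request that passed through a proxy at a.com and then a proxy at b.com, the
callee receives the b.com entry on top. The callee therefore sends its in-dialog
requests through b.com first, and the caller sends its requests through a.com first.
</simpara>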
<figure id="bye">
<title>BYE Message Flow (With and without Record Routing)</title>
<mediaobject>
<imageobject>
<imagedata fileref="figures/bye.png" format="PNG"/>
</imageobject>
<textobject>
<phrase>Picture showing BYE message flow with and without record routing.</phrase>
</textobject>
</mediaobject>
</figure>
<simpara>
The left message flow of <xref linkend="bye"/> shows how a BYE (a request
within the dialog established by the INVITE) is sent directly to the other user
agent when there is no Record-Route header field in the message. The right message
flow shows how the situation changes when the proxy puts a Record-Route header
field into the message.
</simpara>
<section id="strict_vs_loose">
<title>Strict versus Loose Routing</title>
<simpara>
The way record routing works has evolved. Record routing according to
RFC2543 rewrote the Request-URI. That means the Request-URI always
contained the URI of the next hop (which can be either the next proxy server
that inserted a Record-Route header field or the destination user agent).
Because of that it was necessary to save the original Request-URI as the last
Route header field. This approach is called <emphasis>strict routing</emphasis>.
</simpara>
<simpara>
<emphasis>Loose routing</emphasis>, as specified in RFC3261, works in a
slightly different way. The Request-URI is no longer overwritten; it always
contains the URI of the destination user agent. If there are any Route header
fields in a message, then the message is sent to the URI from the topmost
Route header field. This is a significant change--the Request-URI doesn't
necessarily contain the URI to which the request will be sent. In fact, loose
routing is very similar to IP source routing.
</simpara>
<simpara>
Because a transition from strict routing to loose routing would break backwards
compatibility and older user agents wouldn't work, it is necessary to make
loose routing backwards compatible. The backwards compatibility
unfortunately adds a lot of overhead and is often a source of major problems.
</simpara>
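<simpara>
The difference between the two routing styles can be sketched in Python. The names
is_loose_router and build_in_dialog_request are hypothetical, and URIs are plain
strings without angle brackets. A proxy is treated as a loose router when its URI
carries the lr parameter, as RFC3261 specifies. The strict branch shows how a user
agent can stay compatible with an older proxy. It is a simplified sketch that, among
other things, does not strip URI parameters that are not allowed in a Request-URI.
</simpara>
<programlisting><![CDATA[
```python
def is_loose_router(uri):
    """Return True if the URI carries the "lr" parameter."""
    params = uri.split(";")[1:]
    return any(p.split("=")[0].lower() == "lr" for p in params)


def build_in_dialog_request(remote_target, route_set):
    """Return (request_uri, route_headers, next_hop) for a new request.

    remote_target is the Contact URI of the peer; route_set is the
    ordered list of proxy URIs learned through record routing.
    """
    if not route_set:
        # No proxy asked to stay on the path: send directly to the peer.
        return remote_target, [], remote_target

    first = route_set[0]
    if is_loose_router(first):
        # Loose routing: the Request-URI keeps the destination and the
        # request is sent to the topmost Route entry.
        return remote_target, list(route_set), first

    # Strict routing: the Request-URI is overwritten with the next hop
    # and the destination is saved as the last Route entry.
    return first, list(route_set[1:]) + [remote_target], first
```
]]></programlisting>
<simpara>
In the loose case the Request-URI and the next hop differ, which is exactly the
change described above. In the strict case they are always the same.
</simpara>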
</section>
</section>
<section id="sub_not">
<title>Event Subscription And Notification</title>
<simpara>
The SIP specification has been extended to support a general mechanism allowing
subscription to asynchronous events. Such events can include changes in SIP proxy
statistics, presence information, session changes, and so on.
</simpara>
<simpara>
The mechanism is used mainly to convey information on presence (willingness to
communicate) of users. <xref linkend="event"/> shows the basic message
flow.
</simpara>
<figure id="event">
<title>Event Subscription And Notification</title>
<mediaobject>
<imageobject>
<imagedata fileref="figures/event.png" format="PNG"/>
</imageobject>
<textobject>
<phrase>Picture showing subscription and notification.</phrase>
</textobject>
</mediaobject>
</figure>
<simpara>
A user agent interested in event notification sends a SUBSCRIBE message to a
SIP server. The SUBSCRIBE message establishes a dialog and is immediately
answered by the server with a 200 OK response. At this point the dialog is
established. The server sends a NOTIFY request to the user every time the event
to which the user subscribed changes. NOTIFY messages are sent within the dialog
established by the SUBSCRIBE.
</simpara>
<simpara>
Note that the first NOTIFY message in <xref linkend="event"/> is sent
regardless of any event that triggers notifications.
</simpara>
<simpara>
Subscriptions--as well as registrations--have a limited lifespan and therefore must
be periodically refreshed.
</simpara>
</section>
<section id="im">
<title>Instant Messages</title>
<simpara>
Instant messages are sent using the MESSAGE request. MESSAGE requests do not
establish a dialog and therefore they will always traverse the same set of proxies.
This is the simplest form of sending instant messages. The text of the instant
message is transported in the body of the SIP request.
</simpara>
<figure id="message">
<title>Instant Messages</title>
<mediaobject>
<imageobject>
<imagedata fileref="figures/message.png" format="PNG"/>
</imageobject>
<textobject>
<phrase>Picture showing a MESSAGE.</phrase>
</textobject>
</mediaobject>
</figure>
</section>
</section>
</section>