mirror of https://github.com/sipwise/kamailio.git
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
355 lines
12 KiB
355 lines
12 KiB
<?xml version="1.0" encoding='ISO-8859-1'?>
|
|
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
|
|
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
|
|
|
|
<!-- Include general documentation entities -->
|
|
<!ENTITY % docentities SYSTEM "../../../docbook/entities.xml">
|
|
%docentities;
|
|
|
|
]>
|
|
|
|
<!-- Module User's Guide -->
|
|
|
|
<chapter>
|
|
|
|
<title>&adminguide;</title>
|
|
|
|
<section>
|
|
<title>Overview</title>
|
|
<para>
|
|
Db_cassandra is one of the &kamailio; database modules. It does
|
|
not export any functions executable from the configuration scripts,
|
|
but it exports a subset of functions using the database API, and thus,
|
|
other modules can use it as a database driver, instead of, for
|
|
example, the Mysql module.
|
|
</para>
|
|
<para>
|
|
The storage backend is a <emphasis>Cassandra</emphasis> cluster and
|
|
this module provides an SQL interface to be used by other modules for
|
|
storing and retrieving data. Because Cassandra is a NoSQL distributed
|
|
system, there are limitations on the operations that can be performed.
|
|
The limitations concern the indexes on which queries are performed, as
|
|
it is only possible to have simple conditions (equality comparison only)
|
|
and only two indexing levels. These issues will be explained in an example
|
|
below.
|
|
</para>
|
|
<para>
|
|
Cassandra DB is especially suited for storing large amounts of data or data
|
|
that requires distribution, redundancy or replication. One usage example is
|
|
a distributed location system in a platform that has a cluster of &siprouter;
|
|
servers, with several proxies and registration servers accessing the same location
|
|
database. This was actually the main use case we had in mind when implementing
|
|
this module. Please NOTE that it has only been tested with the
|
|
<emphasis>usrloc</emphasis>, <emphasis>auth_db</emphasis> and <emphasis>domain</emphasis> modules.
|
|
</para>
|
|
<para>
|
|
You can find a configuration file example for this usage in the module - kamailio_cassa.cfg.
|
|
</para>
|
|
<para>
|
|
Because the module has to do the translation from SQL to Cassandra NoSQL
|
|
queries, the schemas for the tables must be known by the module.
|
|
You will find the schemas for location, subscriber and version tables in
|
|
utils/kamctl/dbcassandra directory. You have to provide the path to the
|
|
directory containing the table definitions by setting the module parameter
|
|
schema_path.
|
|
</para>
|
|
<para>
|
|
There is no need to configure a table metadata in Cassandra cluster.
|
|
You only need to define a keyspace with the name of the database and for each table
|
|
a column family inside that keyspace with the name of the table. The comparator
|
|
and validators should be either UTF8Type or ASCIIType.
|
|
Example:
|
|
</para>
|
|
<programlisting format="linespecific">
|
|
...
|
|
create keyspace kamailio;
|
|
use kamailio;
|
|
create column family 'location' with comparator='UTF8Type' and
|
|
default_validation_class='UTF8Type' and key_validation_class='UTF8Type';
|
|
...
|
|
</programlisting>
|
|
|
|
<para>
|
|
Special attention was given to performance in Cassandra. Therefore, the
|
|
implementation uses only the native row indexing in Cassandra and no secondary
|
|
indexes, because they are costly. Instead, we simulate a secondary index by
|
|
using the column names and putting information in them, which is very efficient.
|
|
Also, for deleting expired records, we let Cassandra take care of this with
|
|
its own mechanism (by setting the TTL for columns).
|
|
</para>
|
|
|
|
<para>
|
|
The module supports raw queries. However these queries must follow the
|
|
CQL (Cassandra Query Language) syntax. The queries can be issued
|
|
in the script by means of the AVPOPS module. Keep in mind that when passing back
|
|
the results from the database only the first row is used to set the AVP variables.
|
|
(default AVPOPS behaviour)
|
|
|
|
The script lines below can be used as an example for issuing the query towards
|
|
an cassandra instance.(This example will work once the column family `location`
|
|
is configured correctly in the cassandra keyspace)
|
|
</para>
|
|
<programlisting format="linespecific">
|
|
...
|
|
$var(dballowed)="select * from location where key = 'userx' limit 1;";
|
|
avp_db_query("$var(dballowed)");
|
|
xlog("L_INFO","Got result here: [$avp(i:1)] [$avp(i:2)] [$avp(i:3)].\n");
|
|
...
|
|
</programlisting>
|
|
|
|
</section>
|
|
|
|
<section>
|
|
<title>Dependencies</title>
|
|
<section>
|
|
<title>&siprouter; Modules</title>
|
|
<para>
|
|
The following modules must be loaded before this module:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<emphasis>No dependencies on other &siprouter; modules</emphasis>.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</section>
|
|
<section>
|
|
<title>External Libraries or Applications</title>
|
|
<para>
|
|
The following libraries or applications must be installed before running
|
|
&siprouter; with this module loaded:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<emphasis>Thrift library</emphasis>
|
|
(tested with version 0.6.1 and version 0.7.0).
|
|
You can download it from http://archive.apache.org/dist/thrift .
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
<para> The implementation was tested with Cassandra version 1.0.1
|
|
and version 1.1.6. I used the sourced from DataStax Community
|
|
Edition (http://www.datastax.com/download/community).
|
|
</para>
|
|
</section>
|
|
</section>
|
|
|
|
<section>
|
|
<title>Parameters</title>
|
|
<section>
|
|
<title><varname>schema_path</varname> (string)</title>
|
|
<para>
|
|
The directory where the files with the table schemas are located.
|
|
This directory has to contain the subdirectories corresponding to the
|
|
database name (name of the directory = name of the database).
|
|
These directories, in turn, contain the files with the table schemas.
|
|
See the schemas in utils/kamctl/dbcassandra directory.
|
|
</para>
|
|
<example>
|
|
<title>Set <varname>schema_path</varname> parameter</title>
|
|
<programlisting format="linespecific">
|
|
...
|
|
modparam("db_cassandra", "schema_path",
|
|
"/usr/local/kamailio/etc/kamctl/dbcassandra")
|
|
...
|
|
</programlisting>
|
|
</example>
|
|
</section>
|
|
</section>
|
|
|
|
<section>
|
|
<title>Functions</title>
|
|
<para>
|
|
NONE
|
|
</para>
|
|
</section>
|
|
|
|
<section>
|
|
<title>Installation</title>
|
|
<para>
|
|
Because the db_cassandra module dependes on an external library, it is not
|
|
compiled and installed by default. You can use one of these options:
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
- edit the "Makefile" and remove "db_cassandra" from "excluded_modules"
|
|
list. Then follow the standard procedure to install &siprouter;:
|
|
"make all; make install".
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
- from command line, run: 'make all include_modules="db_cassandra";
|
|
make install include_modules="db_cassandra"'.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</section>
|
|
|
|
|
|
<section>
|
|
<title>Table schema</title>
|
|
<para>
|
|
The module must know the table schema for the tables that will be used.
|
|
You must configure the path to the schema directory by setting the
|
|
<emphasis>schema_path</emphasis> parameter.
|
|
</para>
|
|
<para>
|
|
A table schema document has the following structure:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
First row: <emphasis>the name and type of the columns</emphasis> in the form name(type)
|
|
separated by spaces. The possible types are: string, int, double and timestamp.
|
|
</para>
|
|
<para>
|
|
The<emphasis>timestamp</emphasis> type has a special meaning. Only one column of this type can
|
|
be defined for a table, and it should contain the expiry time for that record.
|
|
If defined this value will be used to compute the <emphasis>ttl</emphasis> for the columns
|
|
and Cassandra will automatically delete the columns when they expire. Because we want the
|
|
ttl to have meaning for the entire record, we must ensure that when the ttl is updated, it
|
|
is updated for all columns for that record. In other words, to update the expiration time
|
|
of a record, an insert operation must be performed from the point of view of the db_cassandra
|
|
module ("insert" in Cassandra means "replace if exists or insert new record otherwise"). So, if
|
|
you define a table with a timestamp column, the update operations on that table that also
|
|
update the timestamp must update all columns. So, these update operations must in fact be insert
|
|
operations.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Second row: <emphasis>the columns that form the row key</emphasis> separated by space.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Third row: <emphasis>the columns that form the secondary key</emphasis> separated by space.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
<para>
|
|
Below you can see the schema for the <emphasis>location</emphasis>
|
|
table (when use_domain not set):
|
|
</para>
|
|
<para>
|
|
</para>
|
|
|
|
<programlisting format="linespecific">
|
|
...
|
|
callid(string) cflags(int) contact(string) cseq(int) <emphasis>expires(timestamp)</emphasis> flags(int) last_modified(int) methods(int) path(string) q(double) received(string) socket(string) user_agent(string) username(string) ruid(string) instance(string) reg_id(int)
|
|
<emphasis>username</emphasis>
|
|
<emphasis>contact</emphasis>
|
|
...
|
|
</programlisting>
|
|
|
|
<para>
|
|
Observe first that the <emphasis>row key is the username</emphasis> and
|
|
the <emphasis>secondary index is the contact</emphasis>.
|
|
We have also defined a timestamp column - <emphasis>expires</emphasis>.
|
|
</para>
|
|
|
|
<para>
|
|
If you need to use the domain part of the AOR also (you have set use_domain
|
|
parameter for usrloc in the script), you should include the domain column in
|
|
the list of columns and in the primary key. The schema will then look like this:
|
|
</para>
|
|
|
|
<programlisting format="linespecific">
|
|
...
|
|
callid(string) cflags(int) contact(string) cseq(int) domain(string) <emphasis>expires(timestamp)</emphasis> flags(int) last_modified(int) methods(int) path(string) q(double) received(string) socket(string) user_agent(string) username(string) ruid(string) instance(string) reg_id(int)
|
|
<emphasis>username domain</emphasis>
|
|
<emphasis>contact</emphasis>
|
|
...
|
|
</programlisting>
|
|
|
|
<para>
|
|
Notice that a key (primary or secondary) can be composed from more columns,
|
|
in which case you have to specify them separated by space.
|
|
</para>
|
|
<para>
|
|
To understand why the schema looks like this, we must first see which
|
|
queries are performed on the location table.
|
|
(The 'callid' condition was ignored as it doesn't really have a well defined role in the SIP RFC).
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
When Invite received, lookup location: select where <emphasis>username='..'</emphasis>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
When Register received, update registration:
|
|
update where <emphasis>username='..'</emphasis> and <emphasis>contact='..'</emphasis>.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
<para>
|
|
So, the relation between these keys is the following:
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
The unique key for a table is actually the combination of row key + secondary key.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
A row defined by a row key will contain more records with different secondary keys.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
<para>
|
|
The timestamp column that leaves the Cassandra cluster to deal with deleting the expired
|
|
records. For this to work right we needed to modify a bit the behavior of usrloc module and
|
|
replace update sql query performed at re-registration with an insert sql query (so that all
|
|
columns are updated and the new timestamp is set for all columns).
|
|
This behavior is enabled by setting a parameter in the usrloc module
|
|
<emphasis>db_update_as_insert</emphasis>:
|
|
</para>
|
|
<para>
|
|
</para>
|
|
|
|
<programlisting format="linespecific">
|
|
...
|
|
modparam("usrloc", "db_update_as_insert", 1)
|
|
...
|
|
</programlisting>
|
|
|
|
<para>
|
|
Also you should disable in usrloc module the timer routine that checks for expired records.
|
|
You can do this by setting the timer interval to 0.
|
|
<emphasis>timer_interval</emphasis>:
|
|
</para>
|
|
<para>
|
|
</para>
|
|
|
|
<programlisting format="linespecific">
|
|
...
|
|
modparam("usrloc", "timer_interval", 0)
|
|
...
|
|
</programlisting>
|
|
|
|
|
|
<para>
|
|
The alternative would have been to define an index on the expire column and
|
|
run a external job to periodically delete the expired records. However,
|
|
obviously, this would be more costly.
|
|
</para>
|
|
|
|
</section>
|
|
|
|
<section>
|
|
<title>Limitations</title>
|
|
<para>
|
|
The module can be used only when the queries use only one index, which is also the
|
|
unique key, or have two indexes that form the unique key like in the usrloc usage.
|
|
</para>
|
|
</section>
|
|
|
|
</chapter>
|