mirror of https://github.com/sipwise/kamailio.git
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
297 lines
10 KiB
297 lines
10 KiB
<?xml version="1.0" encoding='ISO-8859-1'?>
|
|
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
|
|
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
|
|
|
|
<!-- Include general documentation entities -->
|
|
<!ENTITY % docentities SYSTEM "../../../docbook/entities.xml">
|
|
%docentities;
|
|
|
|
]>
|
|
|
|
<!-- Module User's Guide -->
|
|
|
|
<chapter>
|
|
|
|
<title>&adminguide;</title>
|
|
|
|
<section>
|
|
<title>Overview</title>
|
|
<para>
|
|
Db_cassandra is one of the &siprouter; database modules. It does
|
|
not export any functions executable from the configuration scripts,
|
|
but it exports a subset of functions from the database API, and thus,
|
|
other modules can use it as a database driver, instead of, for
|
|
example, the Mysql module.
|
|
</para>
|
|
<para>
|
|
The storage backend is a <emphasis>Cassandra</emphasis> cluster and
|
|
this module provides an SQL interface to be used by other modules for
|
|
storing and retrieving data. Because Cassandra is a NoSQL distributed
|
|
system, there are limitations on the operations that can be performed.
|
|
The limitations concern the indexes on which queries are performed, as
|
|
it is only possible to have simple conditions (equality comparison only)
|
|
and only two indexing levels. These issues will be explained in an example
|
|
below.
|
|
</para>
|
|
<para>
|
|
Cassandra DB is especially suited for storing large data or data that requires
|
|
distribution, redundancy or replication. One usage example is
|
|
a distributed location system in a platform that has a cluster of &siprouter;
|
|
servers, with more proxies and registration servers accessing the same location
|
|
database. This was actually the main use case we had in mind when implementing
|
|
this module. Please NOTE that it has only been tested with the
|
|
<emphasis>usrloc</emphasis>, <emphasis>auth_db</emphasis> and <emphasis>domain</emphasis> modules.
|
|
</para>
|
|
<para>
|
|
You can find a configuration file example for this usage in the module - kamailio_cassa.cfg.
|
|
</para>
|
|
<para>
|
|
Because the module has to do the translation from SQL to Cassandra NoSQL
|
|
queries, the schemas for the tables must be known by the module.
|
|
You will find the schemas for location, subscriber and version tables in
|
|
utils/kamctl/dbcassandra directory. You have to provide the path to the
|
|
directory containing the table definitions by setting the module parameter
|
|
schema_path.
|
|
</para>
|
|
<para>
|
|
There is no need to configure a table metadata in Cassandra cluster.
|
|
You only need to define a keyspace with the name of the database and for each table
|
|
a column family inside that keyspace with the name of the table. The comparator
|
|
and validators should be either UTF8Type or ASCIIType.
|
|
Example:
|
|
</para>
|
|
<programlisting format="linespecific">
|
|
...
|
|
create keyspace openser;
|
|
use openser;
|
|
create column family 'location' with comparator='UTF8Type' and
|
|
default_validation_class='UTF8Type' and key_validation_class='UTF8Type';
|
|
...
|
|
</programlisting>
|
|
|
|
<para>
|
|
Special attention was given to performance in Cassandra. Therefore, the
|
|
implementation uses only the native row indexing in Cassandra and no secondary
|
|
indexes, because they are costly. Instead, we simulate a secondary index by
|
|
using the column names and putting information in them, which is very efficient.
|
|
Also, for deleting expired records, we let Cassandra take care of this with
|
|
its own mechanism (by setting the TTL for columns).
|
|
</para>
|
|
</section>
|
|
|
|
<section>
|
|
<title>Dependencies</title>
|
|
<section>
|
|
<title>&siprouter; Modules</title>
|
|
<para>
|
|
The following modules must be loaded before this module:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<emphasis>No dependencies on other &siprouter; modules</emphasis>.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</section>
|
|
<section>
|
|
<title>External Libraries or Applications</title>
|
|
<para>
|
|
The following libraries or applications must be installed before running
|
|
&siprouter; with this module loaded:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<emphasis>Thrift library</emphasis> version 0.6.1 .
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
<para> The implementation was tested with Cassandra version 1.0.1 .</para>
|
|
</section>
|
|
</section>
|
|
|
|
<section>
|
|
<title>Parameters</title>
|
|
<section>
|
|
<title><varname>schema_path</varname> (string)</title>
|
|
<para>
|
|
The directory where the files with the table schemas are located.
|
|
This directory has to contain the subdirectories corresponding to the
|
|
databases' names (name of the directory = name of the database).
|
|
These directories, in turn, contain the files with the table schemas.
|
|
See the schemas in utils/kamctl/dbcassandra directory.
|
|
</para>
|
|
<example>
|
|
<title>Set <varname>schema_path</varname> parameter</title>
|
|
<programlisting format="linespecific">
|
|
...
|
|
modparam("db_cassandra", "schema_path",
|
|
"/usr/local/sip-router/etc/kamctl/dbcassandra")
|
|
...
|
|
</programlisting>
|
|
</example>
|
|
</section>
|
|
</section>
|
|
|
|
<section>
|
|
<title>Functions</title>
|
|
<para>
|
|
NONE
|
|
</para>
|
|
</section>
|
|
|
|
<section>
|
|
<title>Installation</title>
|
|
<para>
|
|
Because it dependes on an external library, the db_cassandra module is not
|
|
compiled and installed by default. You can use one of these options:
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
- edit the "Makefile" and remove "db_cassandra" from "excluded_modules"
|
|
list. Then follow the standard procedure to install &siprouter;:
|
|
"make all; make install".
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
- from command line, run: 'make all include_modules="db_cassandra";
|
|
make install include_modules="db_cassandra"'.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</section>
|
|
|
|
|
|
<section>
|
|
<title>Table schema</title>
|
|
<para>
|
|
The module must know the table schema for the tables that will be used.
|
|
You must configure the path to the schema directory by setting the
|
|
<emphasis>schema_path</emphasis> parameter.
|
|
</para>
|
|
<para>
|
|
A table schema document has the following structure:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
First row: <emphasis>the name and type of the columns</emphasis> in the form name(type)
|
|
separated by spaces. The possible types are: string, int, double and timestamp.
|
|
</para>
|
|
<para>
|
|
The<emphasis>timestamp</emphasis> type has a special meaning. Only one column of this type can
|
|
be defined for a table, and it should contain the expiry time for that record.
|
|
If defined this value will be used to compute the <emphasis>ttl</emphasis> for the columns
|
|
and Cassandra will automatically delete the columns when they expire. Because we want the
|
|
ttl to have meaning for the entire record, we must ensure that when the ttl is updated, it
|
|
is updated for all columns for that record. In other words, to update the expiration time
|
|
of a record, an insert operation must be performed from the point of view of the db_cassandra
|
|
module ("insert" in Cassandra means "replace if exists or insert new record otherwise"). So, if
|
|
you define a table with a timestamp column, the update operations on that table that also
|
|
update the timestamp must update all columns. So, these update operations must in fact be insert
|
|
operations.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Second row: <emphasis>the columns that form the row key</emphasis> separated by space.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Third row: <emphasis>the columns that form the secondary key</emphasis> separated by space.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
<para>
|
|
Bellow you can see the schema for the <emphasis>location</emphasis> table:
|
|
</para>
|
|
<para>
|
|
</para>
|
|
|
|
<programlisting format="linespecific">
|
|
...
|
|
callid(string) cflags(int) contact(string) cseq(int) <emphasis>expires(timestamp)</emphasis> flags(int) last_modified(int) methods(int) path(string) q(double) received(string) socket(string) user_agent(string) username(string) ruid(string) instance(string) reg_id(int)
|
|
<emphasis>username</emphasis>
|
|
<emphasis>contact</emphasis>
|
|
...
|
|
</programlisting>
|
|
|
|
<para>
|
|
Observe first that the <emphasis>row key is the username</emphasis> and the <emphasis>secondary index is the contact</emphasis>.
|
|
We have also defined a timestamp column - <emphasis>expires</emphasis>.
|
|
In this example, both the row key and the secondary index are defined by only one column,
|
|
but they can be formed out of more columns. You can list them separated by space.
|
|
</para>
|
|
|
|
<para>
|
|
To understand why the schema looks like this, we must first see which
|
|
queries are performed on the location table.
|
|
(The 'callid' condition was ignored as it doesn't really have a well defined role in the SIP RFC).
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
When Invite received, lookup location: select where <emphasis>username='..'</emphasis>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
When Register received, update registration:
|
|
update where <emphasis>username='..'</emphasis> and <emphasis>contact='..'</emphasis>.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
<para>
|
|
So, the relation between these keys is the following:
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
The unique key for a table is actually the combination of row key + secondary key.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
A row defined by a row key will contain more records with different secondary keys.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
<para>
|
|
The timestamp column that leaves the Cassandra cluster to deal with deleting expired
|
|
record can be used only with a modification to the usrloc module that replaces the
|
|
update performed at re-registration with an insert operation (so that all columns
|
|
are updated).
|
|
This behavior can be enabled by setting a parameter in the usrloc module
|
|
<emphasis>db_update_as_insert</emphasis>:
|
|
</para>
|
|
<para>
|
|
</para>
|
|
|
|
<programlisting format="linespecific">
|
|
...
|
|
modparam("usrloc", "db_update_as_insert", 1)
|
|
...
|
|
</programlisting>
|
|
|
|
<para>
|
|
The alternative would have been to define an index on the expire column and
|
|
run a external job to periodically delete the expired records. However,
|
|
obviously, this would be more costly.
|
|
</para>
|
|
|
|
</section>
|
|
|
|
<section>
|
|
<title>Limitations</title>
|
|
<para>
|
|
The module can be used only when the queries use only one index, which is also the
|
|
unique key, or have two indexes that form the unique key like in the usrloc usage.
|
|
</para>
|
|
</section>
|
|
|
|
</chapter>
|