The Server Startup

The main function in file main.c is the first function called upon server startup. Its purpose is to initialize the server and enter the main loop. The server initialization is described in the following sections, and the individual initialization steps are described in the order in which they appear in the main function.

Installation Of New Signal Handlers

The first step in the initialization process is the installation of new signal handlers. We need our own signal handlers to be able to perform a graceful shutdown, print server statistics and so on. There is only one signal handler function, sig_usr in file main.c. It handles the following signals: SIGINT, SIGPIPE, SIGUSR1, SIGCHLD, SIGTERM, SIGHUP and SIGUSR2.

Processing Command Line Parameters

SER uses the getopt function to parse command line parameters. The function is described in detail in its manual page, getopt(3).

Parser Initialization

SER contains a fast 32-bit parser. The parser uses pre-calculated hash tables that need to be filled in upon startup. This initialization is done here by two functions: init_hfname_parser initializes the hash table of the header field name parser, and init_digest_parser initializes the hash table of the digest authentication parser. The parser's internals will be described later.
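To illustrate why such a table is filled in at startup, here is a sketch that assumes "32-bit" means the parser compares four characters at a time as one 32-bit word. The table is filled once with every upper/lower-case variant of a few four-character blocks, so a lookup is a single integer comparison with no case conversion. All names (init_block_table, lookup_block, the block list and ids) are hypothetical and do not reproduce init_hfname_parser.

```c
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 1024u   /* power of two, far more slots than entries */

struct entry { uint32_t key; int id; };   /* key == 0 marks an empty slot */
static struct entry table[TABLE_SIZE];

/* Read four bytes as one 32-bit word; memcpy avoids unaligned access. */
static uint32_t load32(const char *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Multiplicative hash; the top 10 bits select one of 1024 slots. */
static unsigned slot_of(uint32_t key)
{
    return (unsigned)((uint32_t)(key * 2654435761u) >> 22);
}

/* Insert with linear probing; an existing equal key is overwritten. */
static void put(uint32_t key, int id)
{
    unsigned i = slot_of(key);
    while (table[i].key != 0 && table[i].key != key)
        i = (i + 1) & (TABLE_SIZE - 1);
    table[i].key = key;
    table[i].id = id;
}

/* Insert all 16 upper/lower-case variants of a 4-character block. */
static void put_all_cases(const char *block, int id)
{
    for (unsigned mask = 0; mask < 16; mask++) {
        char v[4];
        for (int i = 0; i < 4; i++) {
            char c = block[i];
            if (((mask >> i) & 1u) && c >= 'a' && c <= 'z')
                c = (char)(c - 'a' + 'A');
            v[i] = c;
        }
        put(load32(v), id);
    }
}

/* Hypothetical counterpart of init_hfname_parser(): run once at startup. */
static void init_block_table(void)
{
    memset(table, 0, sizeof table);
    put_all_cases("via:", 1);
    put_all_cases("from", 2);
    put_all_cases("cseq", 3);
}

/* Id of the block starting at p (at least 4 readable bytes), or 0. */
static int lookup_block(const char *p)
{
    uint32_t key = load32(p);
    unsigned i = slot_of(key);
    while (table[i].key != 0) {
        if (table[i].key == key)
            return table[i].id;
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return 0;
}
```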

Malloc Initialization

To make SER even faster we decided to re-implement the memory allocation routines. The new malloc fits our needs better and speeds up the server considerably. The memory management subsystem needs to be initialized upon server startup. The initialization mainly creates internal data structures and allocates the memory region to be partitioned. The memory allocation code must be initialized BEFORE any of its functions is called!
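The "initialize before use" rule shows up even in the simplest region allocator. The sketch below is a hypothetical bump allocator (pool_init, pool_alloc), far simpler than SER's allocator: it grabs one region at initialization, hands out aligned pieces of it and never frees. It only shows why allocation has to fail until the region exists.

```c
#include <stddef.h>
#include <stdlib.h>

#define POOL_ALIGN 16   /* assumed to satisfy the platform's strictest alignment */

static char *pool_start, *pool_next, *pool_end;

/* Allocate the region that all later allocations are carved from. */
static int pool_init(size_t size)
{
    pool_start = malloc(size);
    if (pool_start == NULL)
        return -1;
    pool_next = pool_start;
    pool_end = pool_start + size;
    return 0;
}

/* Hand out the next aligned piece of the region; NULL if the pool was
 * not initialized yet or is exhausted. There is no free in this sketch. */
static void *pool_alloc(size_t size)
{
    size_t left;
    void *p;

    if (pool_start == NULL)
        return NULL;
    left = (size_t)(pool_end - pool_next);
    if (size > left)
        return NULL;
    size = (size + POOL_ALIGN - 1) & ~(size_t)(POOL_ALIGN - 1);
    if (size > left)
        return NULL;
    p = pool_next;
    pool_next += size;
    return p;
}
```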

Timer Initialization

Various subsystems of the server must be called periodically, regardless of incoming requests. That is what the timer is for. Function init_timer initializes the timer subsystem. The function is called from main.c and can be found in timer.c. The timer subsystem will be described later. The timer subsystem must be initialized before the config file is parsed!
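A periodic timer of this kind can be sketched as a list of handlers, each with an interval and the time at which it next expires; the timer process calls run_timers once per tick. The names add_timer_handler and run_timers, and the use of whole seconds as ticks, are assumptions for illustration rather than SER's timer API.

```c
#include <stdlib.h>

typedef void (*timer_fn)(unsigned now, void *param);

struct timer_handler {
    timer_fn fn;
    void *param;
    unsigned interval;   /* seconds between calls */
    unsigned expires;    /* time of the next call */
    struct timer_handler *next;
};

static struct timer_handler *timer_list;   /* handlers registered so far */

/* Register fn to be called every interval seconds, starting from now. */
static int add_timer_handler(timer_fn fn, void *param, unsigned interval, unsigned now)
{
    struct timer_handler *t = malloc(sizeof *t);
    if (t == NULL)
        return -1;
    t->fn = fn;
    t->param = param;
    t->interval = interval;
    t->expires = now + interval;
    t->next = timer_list;
    timer_list = t;
    return 0;
}

/* Called by the timer process once per tick: run every handler that is due. */
static void run_timers(unsigned now)
{
    for (struct timer_handler *t = timer_list; t != NULL; t = t->next) {
        if (now >= t->expires) {
            t->fn(now, t->param);
            t->expires = now + t->interval;
        }
    }
}

/* Example handler: counts how often it was called. */
static void count_ticks(unsigned now, void *param)
{
    (void)now;
    ++*(int *)param;
}
```

A timer process would typically wrap run_timers in a loop that sleeps for one tick between calls.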

FIFO Initialization

SER has built-in support for FIFO control. This means that the running server can accept commands over a FIFO special file (a named pipe). Function register_core_fifo initializes the FIFO subsystem and registers basic commands that are processed by the core itself. The function can be found in file fifo_server.c. The FIFO server will be described in another chapter.
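Registering commands amounts to filling a name-to-handler table that the FIFO server consults for every request. In this hypothetical sketch, register_cmd, dispatch_line and cmd_echo are illustrative names, and the request format (a command name, a space, then its arguments on one line) is a simplification, not SER's FIFO protocol.

```c
#include <stddef.h>
#include <string.h>

typedef int (*fifo_cmd_fn)(const char *args);

#define MAX_FIFO_CMDS 32
struct fifo_cmd { const char *name; fifo_cmd_fn fn; };
static struct fifo_cmd fifo_cmds[MAX_FIFO_CMDS];
static int fifo_cmd_count;

/* Add a command; fails if the name is taken or the table is full. */
static int register_cmd(const char *name, fifo_cmd_fn fn)
{
    for (int i = 0; i < fifo_cmd_count; i++)
        if (strcmp(fifo_cmds[i].name, name) == 0)
            return -1;
    if (fifo_cmd_count == MAX_FIFO_CMDS)
        return -1;
    fifo_cmds[fifo_cmd_count].name = name;
    fifo_cmds[fifo_cmd_count].fn = fn;
    fifo_cmd_count++;
    return 0;
}

/* Run the handler named by the first word of line; -1 if there is none. */
static int dispatch_line(const char *line)
{
    size_t n = strcspn(line, " ");
    const char *args = line[n] == ' ' ? line + n + 1 : line + n;
    for (int i = 0; i < fifo_cmd_count; i++)
        if (strlen(fifo_cmds[i].name) == n && strncmp(fifo_cmds[i].name, line, n) == 0)
            return fifo_cmds[i].fn(args);
    return -1;
}

/* Example core command: reports the length of its argument string. */
static int cmd_echo(const char *args)
{
    return (int)strlen(args);
}
```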

Built-in Module Initialization

Modules can be either loaded dynamically at runtime or compiled in statically. When a module is loaded at runtime, it is registered with the core immediately. (Module registration is the process in which the core finds out which functions and parameters the module offers.) When a module is compiled in statically, the registration must be performed during server startup. Function register_builtin_modules does the job.
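Registering compiled-in modules can be pictured as walking a static table of module interface descriptions and adding each one to the core's module list. The struct module_exports here is a heavily simplified, hypothetical stand-in (it holds only a name), example_a and example_b are made-up modules, and register_builtin is only a counterpart of register_builtin_modules, not SER's code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified module interface description. */
struct module_exports {
    const char *name;
};

struct sr_module {
    const struct module_exports *exports;
    struct sr_module *next;
};

#define MAX_MODULES 16
static struct sr_module module_slots[MAX_MODULES];
static size_t module_count;
static struct sr_module *modules;     /* list of registered modules */

/* Look up a registered module by name. */
static const struct module_exports *find_module(const char *name)
{
    for (struct sr_module *m = modules; m != NULL; m = m->next)
        if (strcmp(m->exports->name, name) == 0)
            return m->exports;
    return NULL;
}

/* Register one module with the core; duplicates are rejected. */
static int register_module(const struct module_exports *e)
{
    struct sr_module *m;
    if (find_module(e->name) != NULL || module_count == MAX_MODULES)
        return -1;
    m = &module_slots[module_count++];
    m->exports = e;
    m->next = modules;
    modules = m;
    return 0;
}

/* Modules compiled into the binary (made-up names). */
static const struct module_exports example_a = { "example_a" };
static const struct module_exports example_b = { "example_b" };
static const struct module_exports *const builtin[] = { &example_a, &example_b, NULL };

/* Register every compiled-in module; no dlopen is involved. */
static int register_builtin(void)
{
    for (size_t i = 0; builtin[i] != NULL; i++)
        if (register_module(builtin[i]) != 0)
            return -1;
    return 0;
}
```

A module loaded at runtime would go through the same register_module step once its exports description has been found in the shared object.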

Server Configuration

The server is configured through a configuration file. The configuration file is a C-shell-like script which defines how incoming requests should be processed. The file is not interpreted directly, because that would be very slow. Instead, it is translated into an internal binary representation. This process is called compilation. The following sections describe only how the internal binary representation is constructed from the config file; how the binary representation is used when a request arrives will be described later. The compilation can be divided into several steps:

Lexical Analysis

Lexical analysis is the process of converting the input (the configuration file in this case) into a stream of tokens. A token is a set of characters that 'belong' together. A program that turns the input into a stream of tokens is called a scanner. For example, when the scanner encounters a number in the config file, it produces the token NUMBER. There is no need to implement the scanner from scratch; it can be generated automatically by a utility called flex. Flex reads an input file and generates a scanner according to it. The rules in the input file describe the tokens, each one using a regular expression. For more details, see the flex manual page or info documentation. The flex input file for the SER config file is cfg.lex. The file is processed by flex when the server is being compiled, and the result is written to file lex.yy.c, which contains the scanner implemented in C.
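To show what the generated scanner does, here is a small hand-written scanner for a few token kinds of the kind a SER config file uses. It is a hypothetical illustration, not the code flex generates from cfg.lex: the token names are made up, and comments, escapes inside strings and IP addresses are not handled.

```c
#include <ctype.h>
#include <stddef.h>

enum token { T_EOF, T_NUMBER, T_ID, T_STRING, T_LBRACE, T_RBRACE,
             T_LPAREN, T_RPAREN, T_SEMICOLON, T_EQUAL, T_COMMA, T_ERROR };

struct scanner {
    const char *p;      /* current position in the input */
    const char *text;   /* start of the last token's text */
    size_t len;         /* length of the last token's text */
};

/* Return the next token and record its text in s->text / s->len. */
static enum token next_token(struct scanner *s)
{
    const char *start;

    while (isspace((unsigned char)*s->p))
        s->p++;
    start = s->p;
    s->text = start;
    s->len = 0;
    if (*start == '\0')
        return T_EOF;
    if (isdigit((unsigned char)*start)) {                     /* [0-9]+ */
        while (isdigit((unsigned char)*s->p))
            s->p++;
        s->len = (size_t)(s->p - start);
        return T_NUMBER;
    }
    if (isalpha((unsigned char)*start) || *start == '_') {    /* [a-zA-Z_][a-zA-Z0-9_]* */
        while (isalnum((unsigned char)*s->p) || *s->p == '_')
            s->p++;
        s->len = (size_t)(s->p - start);
        return T_ID;
    }
    if (*start == '"') {                                      /* "[^"]*" */
        s->p++;
        while (*s->p != '\0' && *s->p != '"')
            s->p++;
        if (*s->p != '"')
            return T_ERROR;                                   /* unterminated string */
        s->text = start + 1;
        s->len = (size_t)(s->p - s->text);
        s->p++;                                               /* skip closing quote */
        return T_STRING;
    }
    s->p++;
    s->len = 1;
    switch (*start) {
    case '{': return T_LBRACE;
    case '}': return T_RBRACE;
    case '(': return T_LPAREN;
    case ')': return T_RPAREN;
    case ';': return T_SEMICOLON;
    case '=': return T_EQUAL;
    case ',': return T_COMMA;
    default:  return T_ERROR;
    }
}
```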

Syntactical Analysis

The second stage of configuration file processing is called syntactical analysis. Its purpose is to check that the configuration file is well formed and contains no syntactical errors, and to perform various actions at various stages of the analysis. A program performing syntactical analysis is called a parser. The structure of the configuration file is described by a grammar, a set of rules describing the valid 'order' or 'combination' of tokens. If the file does not conform to its grammar, it is syntactically invalid and cannot be processed further; in that case an error is issued and the server aborts.

The parser is generated by a utility called yacc. The input of the utility is a file containing the grammar of the configuration file; in addition to the grammar, you can describe what actions the parser should perform at various stages of parsing. For example, you can instruct the parser to create a structure describing an IP address, and to convert the address to its binary representation, every time it finds an IP address in the configuration file. For more information see the yacc documentation. yacc creates the parser when the server is being compiled from the sources. The input file for yacc is cfg.y. The file contains the grammar of the config file along with the actions that create its binary representation. Yacc writes its result into file cfg.tab.c. The file contains function yyparse, which parses the whole configuration file and constructs the binary representation. The name cfg.tab.c follows the convention of bison, the GNU implementation of yacc; for more information about the input file syntax see the bison documentation.

Config File Structure

The configuration file consists of three sections, each of which will be described separately.

Route Statement - describes how incoming requests will be processed. When a request is received, the commands in one or more "route" sections are executed step by step. The config file must always contain one main "route" statement and may contain several additional "route" statements. Request processing always starts at the beginning of the main "route" statement. Additional "route" statements can be called from the main one or from other additional "route" statements (this is similar to calling a function).

Assign Statement - there are many configuration variables across the server, and this statement makes it possible to change their values. Generally it is a list of assignments, each on a separate line.

Module Statement - additional functionality of the server is available through separate modules. Each module is a shared object that can be loaded at runtime. Modules can export functions that can be called from the configuration file, and variables that can be configured from the config file. The module statement makes it possible to load modules and configure them. There are two commands in the statement: loadmodule and modparam. The first loads a module, the second configures the module's internal variables.

In the following sections we describe in detail how the three sections are processed upon server startup.

Route Statement

The following grammar snippet describes how the route statement is constructed:

    route_stm = "route" "{" actions "}"  { $$ = push($3, &rlist[DEFAULT_RT]); }

    actions = actions action  { $$ = append_action($1, $2); }
            | action          { $$ = $1; }

    action = cmd SEMICOLON    { $$ = $1; }
           | SEMICOLON        { $$ = 0; }

    cmd = "forward" "(" host ")"  { $$ = mk_action(FORWARD_T, STRING_ST, NUMBER_ST, $3, 0); }
        | ...

A config file can contain one or more "route" statements. The "route" statement without a number is executed first and is called the main route statement. There can be additional route statements identified by numbers; these can be called from the main route statement or from other additional route statements.

Each route statement consists of a set of actions. The actions are executed step by step, in the same order in which they appear in the config file, and are delimited by semicolons. Each action consists of exactly one command (cmd in the grammar). There are many types of commands. We don't list all of them here because the list would be too long and all the commands are processed in the same way. Therefore we show only one example (forward); interested readers can look in file cfg.y for the full list of available commands.

Each rule in the grammar contains a section enclosed in curly braces. The section is a C code snippet that is executed every time the parser recognizes that rule in the config file. For example, when the parser finds the forward command, the mk_action function (as specified in the grammar snippet above) is called. The function creates a new structure representing the command, with its type field set to FORWARD_T. A pointer to the structure is returned as the value of the rule. The pointer propagates through the action rule to the actions rule, which creates a linked list of all commands. The linked list is then inserted into the rlist table (function push in rule route_stm).

Each element of the table represents one "route" statement of the config file, and each route statement is represented by a linked list of all the actions in it. Pointers to all the lists are stored in the rlist array. Additional route statements are identified by numbers, and the number also serves as the index into the array. When the core is about to execute the route statement with number n, it looks at position n in the array. If the element at position n is not null, it contains a linked list of commands, and the commands are executed step by step.

The Reply-Route statement is compiled in the same way. The main differences are:
- The Reply-Route statement is executed when a SIP reply arrives (not a SIP request).
- Only a subset of commands is allowed in the reply-route statement (see file cfg.y for details).
- The reply-route statement has its own array of linked lists.
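The data structure built by these grammar actions can be sketched in C. The functions below reuse the names mk_action, append_action, push, rlist and DEFAULT_RT from the grammar, but with simplified, hypothetical signatures (mk_action here takes only a type and a host string). run_route stands in for the core's route execution and only counts forward commands instead of sending anything; this is not SER's code.

```c
#include <stdlib.h>

enum action_type { FORWARD_T };   /* only the command shown in the grammar */

struct action {
    enum action_type type;
    const char *host;             /* parameter of forward() */
    struct action *next;
};

#define DEFAULT_RT 0              /* index of the main route statement */
#define RT_NO 10                  /* hypothetical number of route slots */
static struct action *rlist[RT_NO];

/* Create one command node (the real mk_action takes more parameters). */
static struct action *mk_action(enum action_type type, const char *host)
{
    struct action *a = calloc(1, sizeof *a);
    if (a != NULL) {
        a->type = type;
        a->host = host;
    }
    return a;
}

/* Append action a (possibly NULL for an empty ";") to list head; return the head. */
static struct action *append_action(struct action *head, struct action *a)
{
    struct action *t;
    if (head == NULL)
        return a;
    for (t = head; t->next != NULL; t = t->next)
        ;
    t->next = a;
    return head;
}

/* Attach a list of actions to a route slot, after any actions already there. */
static int push(struct action *list, struct action **slot)
{
    *slot = append_action(*slot, list);
    return 0;
}

static int forwarded;             /* counts executed forward() commands */

/* Execute route n step by step; number of actions run, or -1 if it is empty. */
static int run_route(int n)
{
    int count = 0;
    if (n < 0 || n >= RT_NO || rlist[n] == NULL)
        return -1;
    for (struct action *a = rlist[n]; a != NULL; a = a->next, count++) {
        switch (a->type) {
        case FORWARD_T:
            forwarded++;          /* real code would send the request to a->host */
            break;
        }
    }
    return count;
}
```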

Assign Statement

The server contains many configuration variables. There is a section of the config file in which the variables can be assigned new values; this section is called the Assign Statement. The following grammar snippet describes how the section is constructed (only one example is shown):

    assign_stm = "children" '=' NUMBER  { children_no = $3; }
               | "children" '=' error   { yyerror("number expected"); }
               | ...

The number in the config file is assigned to the children_no variable. The second alternative matches if the value is not a number or is in an invalid format; it issues an error and the server aborts.
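The effect of these rules can be mimicked with a table that maps variable names to C variables. This is a hypothetical sketch: assign_int, int_vars, the debug entry and the default values are illustrative. Unlike the yacc grammar, where the scanner has already decided whether the value is a NUMBER token, it validates the value text itself with strtol.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int children_no = 8;   /* default chosen for illustration */
static int debug_level = 0;   /* hypothetical second variable */

struct int_var { const char *name; int *value; };
static const struct int_var int_vars[] = {
    { "children", &children_no },
    { "debug",    &debug_level },
};

/* Assign the decimal number in text to variable name; -1 on any error. */
static int assign_int(const char *name, const char *text)
{
    for (size_t i = 0; i < sizeof int_vars / sizeof int_vars[0]; i++) {
        char *end;
        long v;

        if (strcmp(int_vars[i].name, name) != 0)
            continue;
        errno = 0;
        v = strtol(text, &end, 10);
        if (end == text || *end != '\0' || errno == ERANGE || v < INT_MIN || v > INT_MAX) {
            fprintf(stderr, "%s: number expected\n", name);
            return -1;
        }
        *int_vars[i].value = (int)v;
        return 0;
    }
    fprintf(stderr, "unknown variable %s\n", name);
    return -1;
}
```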

Module Statement

The module statement allows module loading and configuration. There are two commands:
- loadmodule - loads the specified module in the form of a shared object. The shared object is loaded using dlopen.
- modparam - configures a module. The command accepts 3 parameters: module name, variable name and variable value.

The following grammar snippet describes the module statement:

    module_stm = "loadmodule" STRING {
                     DBG("loading module %s\n", $2);
                     if (load_module($2) != 0) {
                         yyerror("failed to load module");
                     }
                 }
               | "loadmodule" error { yyerror("string expected"); }
               | "modparam" "(" STRING "," STRING "," STRING ")" {
                     if (set_mod_param($3, $5, PARAM_STR|PARAM_STRING, $7) != 0) {
                         yyerror("Can't set module parameter");
                     }
                 }
               | "modparam" "(" STRING "," STRING "," NUMBER ")" {
                     if (set_mod_param($3, $5, PARAM_INT, (void*)$7) != 0) {
                         yyerror("Can't set module parameter");
                     }
                 }
               | MODPARAM error { yyerror("Invalid arguments"); }

When the parser finds the loadmodule command, it executes the statement in curly braces, which calls the load_module function. The function loads the specified file using dlopen. If dlopen succeeds, the server looks for the exports structure describing the module's interface and registers the module. For more details see the module section. If the parser finds the modparam command, it tries to configure the specified variable in the specified module by calling function set_mod_param. The module must be loaded using loadmodule before modparam can be used for it!
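What set_mod_param has to do can be sketched as two lookups (the module among the loaded modules, then the parameter among those the module exports), followed by a type check and an assignment. All names below (struct param_export, load_example_module, example_timeout, example_db_url and the PARAM_* values) are hypothetical, and dlopen is replaced by adding a statically defined module to the list. The sketch therefore also shows why modparam fails for a module that has not been loaded yet.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PARAM_STR 1      /* hypothetical type flags */
#define PARAM_INT 2

struct param_export { const char *name; int type; void *var; };
struct module { const char *name; const struct param_export *params; };

/* Variables of a hypothetical module "example". */
static int example_timeout = 10;
static char *example_db_url = NULL;
static const struct param_export example_params[] = {
    { "timeout", PARAM_INT, &example_timeout },
    { "db_url",  PARAM_STR, &example_db_url },
    { NULL, 0, NULL }
};
static const struct module example_module = { "example", example_params };

#define MAX_LOADED 8
static const struct module *loaded[MAX_LOADED];
static size_t loaded_count;

/* Stand-in for load_module(): the real one would dlopen() a shared object. */
static int load_example_module(void)
{
    if (loaded_count == MAX_LOADED)
        return -1;
    loaded[loaded_count++] = &example_module;
    return 0;
}

/* Set parameter name of module mod; -1 if not loaded, unknown or wrong type. */
static int set_mod_param(const char *mod, const char *name, int type, void *val)
{
    for (size_t i = 0; i < loaded_count; i++) {
        if (strcmp(loaded[i]->name, mod) != 0)
            continue;
        for (const struct param_export *p = loaded[i]->params; p->name != NULL; p++) {
            if (strcmp(p->name, name) != 0 || p->type != type)
                continue;
            if (type == PARAM_INT)
                *(int *)p->var = (int)(intptr_t)val;
            else
                *(char **)p->var = val;
            return 0;
        }
        return -1;   /* module found, parameter unknown or of another type */
    }
    return -1;       /* module not loaded */
}
```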

Interface Configuration

The server tries to obtain a list of all configured interfaces of the host it is running on. If that fails, the server tries to convert the hostname to an IP address and uses only the interface with that IP address. Function add_interfaces adds all configured interfaces to the array. The server then tries to convert all interface names to IP addresses and removes duplicates.
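On systems that provide getifaddrs (Linux and the BSDs; it is not part of POSIX), the list of interface addresses can be obtained roughly as sketched below. This illustrates the idea and is not add_interfaces itself: collect_ipv4_addresses and MAX_ADDRS are hypothetical, only IPv4 addresses are kept (as strings, without duplicates), and the hostname fallback is left out.

```c
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

#define MAX_ADDRS 32

/* Store the IPv4 addresses of all configured interfaces in out, skipping
 * duplicates. Returns the number stored, or -1 if getifaddrs fails. */
static int collect_ipv4_addresses(char out[][INET_ADDRSTRLEN], int max)
{
    struct ifaddrs *list, *ifa;
    int n = 0;

    if (getifaddrs(&list) == -1)
        return -1;
    for (ifa = list; ifa != NULL && n < max; ifa = ifa->ifa_next) {
        char buf[INET_ADDRSTRLEN];
        const struct sockaddr_in *sin;
        int dup = 0;

        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;
        sin = (const struct sockaddr_in *)ifa->ifa_addr;
        if (inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof buf) == NULL)
            continue;
        for (int i = 0; i < n; i++)
            if (strcmp(out[i], buf) == 0) {
                dup = 1;
                break;
            }
        if (!dup)
            strcpy(out[n++], buf);
    }
    freeifaddrs(list);
    return n;
}
```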

Turning into a Daemon

When configured to do so, SER becomes a daemon during startup. A process is called a daemon when it has no controlling terminal. See function daemonize in file main.c for more details. The function does the following:
1. chroot is performed if necessary. This ensures that the server has access only to a particular directory and its subdirectories.
2. The server's working directory is changed if a new working directory was specified (usually it is /).
3. If command line parameter -g was used, the server's group ID is changed to that value.
4. If command line parameter -u was used, the server's user ID is changed to that value.
5. The process forks and the parent exits. This ensures that the remaining process is not a process group leader.
6. setsid is called, so that the process becomes a session leader and loses its controlling terminal.
7. The process forks again, so that it is no longer a session (or process group) leader and cannot acquire a controlling terminal again.
8. A pid file is created.
9. All open file descriptors are closed.
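The sequence of steps can be sketched with the corresponding POSIX calls. This is a hypothetical outline, not SER's daemonize: the struct daemon_cfg fields are assumptions, user and group are passed as numeric IDs rather than names, and error handling is reduced to returning -1. The group ID is changed before the user ID because a process that has already given up root privileges can no longer change its group.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical daemon settings; NULL or (uid_t)-1 / (gid_t)-1 mean "not requested". */
struct daemon_cfg {
    const char *chroot_dir;
    const char *work_dir;
    gid_t gid;
    uid_t uid;
    const char *pid_file;
};

/* Write pid followed by a newline to path; 0 on success, -1 on error. */
static int write_pid_file(const char *path, pid_t pid)
{
    FILE *f = fopen(path, "w");
    int ok;
    if (f == NULL)
        return -1;
    ok = fprintf(f, "%ld\n", (long)pid) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

int daemonize(const struct daemon_cfg *c)
{
    pid_t pid;
    long max;

    if (c->chroot_dir != NULL && chroot(c->chroot_dir) == -1) return -1;
    if (c->work_dir != NULL && chdir(c->work_dir) == -1) return -1;
    if (c->gid != (gid_t)-1 && setgid(c->gid) == -1) return -1;   /* group first */
    if (c->uid != (uid_t)-1 && setuid(c->uid) == -1) return -1;

    pid = fork();
    if (pid < 0) return -1;
    if (pid > 0) exit(0);            /* parent exits; child is not a group leader */
    if (setsid() == -1) return -1;   /* new session, no controlling terminal */
    pid = fork();
    if (pid < 0) return -1;
    if (pid > 0) exit(0);            /* grandchild is not a session leader */

    if (c->pid_file != NULL && write_pid_file(c->pid_file, getpid()) != 0)
        return -1;

    /* Close every descriptor; many daemons then reopen 0-2 on /dev/null. */
    max = sysconf(_SC_OPEN_MAX);
    if (max < 0)
        max = 1024;
    for (long fd = 0; fd < max; fd++)
        close((int)fd);
    return 0;
}
```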

Module Initialization

The whole config file has been parsed and all modules have been loaded, so they can be initialized now. A module can tell the core that it needs to be initialized by exporting a mod_init function. The mod_init functions of all loaded modules are called at this point.
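Calling the initialization functions amounts to a loop over the loaded modules. In this hypothetical sketch (init_all_modules, the simplified struct module_exports and the example modules are illustrative), a module without mod_init is skipped and startup stops at the first module whose mod_init reports a failure; whether SER behaves exactly this way is an assumption.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified module description. */
struct module_exports { const char *name; int (*mod_init)(void); };

/* Call mod_init of every module in order; -1 at the first failure. */
static int init_all_modules(const struct module_exports *const *mods, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (mods[i]->mod_init == NULL)
            continue;                     /* module needs no initialization */
        if (mods[i]->mod_init() != 0) {
            fprintf(stderr, "mod_init of %s failed\n", mods[i]->name);
            return -1;
        }
    }
    return 0;
}

/* Example modules: one succeeds, one has no mod_init, one fails. */
static int init_calls;
static int ok_init(void) { init_calls++; return 0; }
static int failing_init(void) { init_calls++; return -1; }

static const struct module_exports mod_a = { "mod_a", ok_init };
static const struct module_exports mod_b = { "mod_b", NULL };
static const struct module_exports mod_c = { "mod_c", failing_init };
static const struct module_exports mod_d = { "mod_d", ok_init };
```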

Routing List Fixing

After the whole routing list has been parsed, there may still be places that can be processed further to speed up the server. For example, several commands accept a regular expression as one of their parameters. The regular expression can be compiled in advance, and matching a compiled expression is much faster. Another example is a string passed as a parameter to a function. If you call append_hf("Server: SIP Express Router\r\n") from the routing script, the function appends a new header field after the last one. In this case, the function needs to know the length of the string parameter. It could call strlen every time it is invoked, but that is not a very good idea, because strlen would then run for every processed message, which is unnecessary. Instead, the length of the string parameter can be pre-calculated upon server startup, saved and reused later. Processing of the request is faster because append_hf does not need to call strlen every time; it can just reuse the saved value. The same approach works for string-to-int conversions, hostname lookups, expression evaluation and so on. This process is called Routing List Fixing and is done as one of the last steps of server startup. Every loaded module can export one or more functions, and each such function can have an associated fixup function, which performs the fixing described in this section. The fixups of all loaded modules are called here, which makes it possible for module functions to fix their parameters too, if necessary.
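Both examples from this section, a precomputed string length and a precompiled regular expression, can be sketched with standard C and the POSIX regex API. struct param, fixup_str, fixup_regex and param_matches are hypothetical names, and the REG_EXTENDED | REG_ICASE | REG_NOSUB flags are an arbitrary choice; this is not the fixup interface SER modules actually implement.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

enum param_kind { P_RAW, P_STR, P_REGEX };

struct param {
    enum param_kind kind;
    const char *raw;   /* text as written in the config file */
    size_t len;        /* filled in by fixup_str */
    regex_t *re;       /* filled in by fixup_regex */
};

/* Fixup for a string parameter: compute its length once, at startup. */
static int fixup_str(struct param *p)
{
    p->len = strlen(p->raw);
    p->kind = P_STR;
    return 0;
}

/* Fixup for a regular expression parameter: compile it once, at startup. */
static int fixup_regex(struct param *p)
{
    regex_t *re = malloc(sizeof *re);
    if (re == NULL)
        return -1;
    if (regcomp(re, p->raw, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0) {
        free(re);
        return -1;
    }
    p->re = re;
    p->kind = P_REGEX;
    return 0;
}

/* Per-message work after fixup: match without recompiling anything. */
static int param_matches(const struct param *p, const char *subject)
{
    return p->kind == P_REGEX && regexec(p->re, subject, 0, NULL, 0) == 0;
}
```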

Statistics Initialization

If statistics support is compiled in, the core can produce some statistics about itself and the traffic it processes. The statistics subsystem is initialized here; see function init_stats.

Socket Initialization

UDP socket initialization depends on the dont_fork variable. If the variable is set (only one process will be processing incoming requests) and there are multiple listen interfaces, only the first one is used. This mode is intended mainly for debugging. If the variable is not set, sockets for all configured interfaces are created and initialized. See function udp_init in file udp_server.c for more details.
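The socket handling can be sketched as two small pieces: deciding how many listen sockets to open, and creating one UDP socket bound to a given address. sockets_to_open and udp_socket_bind are hypothetical IPv4-only helpers, not the contents of udp_init.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* One listen socket in dont_fork mode, one per interface otherwise. */
static int sockets_to_open(int dont_fork, int n_interfaces)
{
    return (dont_fork && n_interfaces > 1) ? 1 : n_interfaces;
}

/* Create a UDP socket bound to ip:port; returns the descriptor or -1. */
static int udp_socket_bind(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd == -1)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1
        || bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```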

Forking

The rest of the initialization process depends on the value of the dont_fork variable, a global variable defined in main.c. We describe both variants separately.

dont_fork variable is set (not zero)

If the dont_fork variable is set, the server operates in a special mode in which only one process handles incoming requests. This is very slow and is intended mainly for debugging. The main process processes all incoming requests itself. The server still needs additional children:
- One child is for the timer subsystem; it processes timers independently of the main process.
- If enabled, the FIFO server spawns another child, which processes all commands coming through the FIFO interface.
- If SNMP support was enabled, another child is created.

The following initialization is performed in dont_fork mode (see function main_loop in file main.c):
1. Another child is forked for the timer subsystem.
2. The FIFO server is initialized if enabled; this forks another child. For more information about the FIFO server, see section The FIFO server.
3. init_child(0) is called. The function performs per-child initialization of all loaded modules. A module can be initialized through its mod_init function, but that function is called BEFORE the server forks, so its effects are common to all children. Anything that needs to be initialized in every child separately (for example, if each child needs to open its own file descriptor) cannot be done in mod_init. To make such initialization possible, a module can export another initialization function called child_init, which is called in all children AFTER the server forks. Since we are in "dont fork" mode and there will be no children processing requests (remember that the main process processes all requests), the child_init functions would otherwise never be called. That would be bad, because child_init might perform initialization without which the module would not work properly. To make sure that module initialization is complete, init_child is called here for the main process even though we are not going to fork.
That's it. Everything has been initialized properly and as the last step we will call udp_rcv_loop which is the main loop function. The function will be described later.
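The role of init_child can be sketched as a loop that calls every module's child_init with the rank of the calling process; in dont_fork mode the main process simply calls it itself with rank 0. The struct layout, the explicit module array parameter and the example module are hypothetical simplifications, not SER's declarations.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified module description with both init hooks. */
struct module_exports {
    const char *name;
    int (*mod_init)(void);         /* called once, before forking */
    int (*child_init)(int rank);   /* called in every request-processing process */
};

/* Call child_init of every module for the process with the given rank. */
static int init_child(const struct module_exports *const *mods, size_t n, int rank)
{
    for (size_t i = 0; i < n; i++) {
        if (mods[i]->child_init == NULL)
            continue;
        if (mods[i]->child_init(rank) != 0) {
            fprintf(stderr, "child_init of %s failed (rank %d)\n", mods[i]->name, rank);
            return -1;
        }
    }
    return 0;
}

/* Example module that remembers its rank, standing in for per-process work
 * such as opening its own database connection. */
static int initialized_rank = -1;
static int example_child_init(int rank)
{
    initialized_rank = rank;
    return 0;
}
static const struct module_exports example = { "example", NULL, example_child_init };
```

In dont_fork mode the main process would call init_child once with rank 0 and then enter the receive loop.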

dont_fork is not set (zero)

If dont_fork is not set, the server forks children, and the children process incoming requests. How many children are created depends on the configuration (the children variable). Eventually the main process only sleeps and handles signals. Before that, it initializes the FIFO server. The FIFO server needs another child to handle communication over the FIFO, so another child is created. The FIFO server will be described in more detail later. Then the main process performs another fork for the timer attendant. That child takes care of the timer lists and executes the specified functions when a timer fires. The main process is now completely initialized; it sleeps in the pause function until a signal arrives and calls handle_sigs when that happens.

The following initialization is performed by each child separately:
1. Each child executes the init_child function, which sequentially calls the child_init functions of all loaded modules. Because the function is called in each child separately, it can initialize per-child data. For example, if a module needs to communicate with a database, it must open a database connection. If the connection were opened in mod_init, all the children would share the same connection and locking would be necessary to avoid conflicts. If, on the other hand, the connection is opened in child_init, each child has its own connection and concurrency conflicts are handled by the database server.
2. Last but not least, each child executes the udp_rcv_loop function, which contains the main loop logic.
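Forking the request-processing children can be sketched as below. fork_workers and demo_worker are hypothetical, and numbering the children 1..children is an assumption made for the example. In the server, the body of each child would be init_child followed by udp_rcv_loop, which never returns, while the parent would go on to sleep in pause(). Each child leaves with _exit, so it never returns into the parent's code path or flushes stdio buffers it inherited.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Body run by every worker; returns the worker's exit status. */
typedef int (*worker_fn)(int rank);

/* Fork `children` workers with ranks 1..children. The parent stores their
 * pids and returns 0; each child runs body(rank) and exits with its result. */
static int fork_workers(int children, worker_fn body, pid_t *pids)
{
    for (int i = 0; i < children; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(body(i + 1));   /* child: never returns to the caller */
        pids[i] = pid;            /* parent: remember the child */
    }
    return 0;
}

/* Demo worker: exits with its rank so the parent can check it. */
static int demo_worker(int rank)
{
    return rank;
}
```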