mirror of https://github.com/sipwise/heartbeat.git
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
|
<html>
|
|
<head>
|
|
<meta http-equiv="Content-Type"
|
|
content="text/html; charset=iso-8859-1">
|
|
<meta name="Description"
|
|
content="This document is a short description of how to get started with Linux-HA (heartbeat), especially from the software perspective.">
|
|
<meta name="Author" content="Rudy Pawul - rpawul@iso-ne.com">
|
|
<meta name="GENERATOR"
|
|
content="Mozilla/4.75 [en] (Windows NT 5.0; U) [Netscape]">
|
|
<title>Getting Started with Linux-HA (heartbeat)</title>
|
|
</head>
|
|
<body>
|
|
<h1> Getting Started with Linux-HA (heartbeat)</h1>
|
|
<h2> Intro</h2>
|
|
Let me preface this document by saying most of this is _not_ original
work. My purpose in writing this document is to
contribute in some way and possibly help those who REALLY get things
done. The "work" I am contributing is mostly compiling bits and
pieces from other HA documents (such as Volker Wiegand's Hardware
Installation Guide) into a document that can help novices get started on
HA without pestering Alan (like I did!) and to cut down on repeat
questions on the mailing list. <br>
|
|
|
|
<h2> Getting Started</h2>
|
|
The first thing you'll need is two computers. You need not have
|
|
identical hardware in both machines (or amount of memory, etc.), but if
|
|
you did, it would make your life that much easier when a component
|
|
fails.
|
|
<p>Now you have to decide on some of your implementation. Your
|
|
"cluster" is established via a "heartbeat" between the two computers
|
|
(nodes) generated by the software package of the same name.
|
|
However, this heartbeat needs one or more media paths (serial via a null
|
|
modem cable, ethernet via a crossover cable, etc.) between the nodes. </p>
|
|
<p>At this point, you're actually ready to begin hardware-wise.
Of course, since you're looking into HA, you'll most likely want to
avoid having a single point of failure. In this case, that would
be your null modem cable/serial port or your network interface
card (NIC)/crossover cable. So, you need to decide whether you wish
to add a second serial/null modem connection or a second
NIC/crossover connection to each node. See
Appendix A for instructions on how to build a Cat-5 crossover
cable. My heartbeat path setup uses one serial port and one extra
NIC because I only had one null modem cable, had an extra NIC on hand,
and thought it was good to have two medium types for the heartbeats. </p>
|
|
<p>Once your hardware is in order, you must install your OS and
|
|
configure your networking (I used Red Hat). Assuming you have 2
|
|
NICs, one should be configured for your "normal" network and the other
|
|
as a private network between your clustered nodes (via the crossover
|
|
cable). For an example, we will assume that our cluster will have
|
|
the following addresses: </p>
|
|
<p>Node 1 (linuxha1): 192.168.85.1 (normal 192x net) <br>
|
|
|
|
10.0.0.1 (private 10x net for heartbeat) <br>
|
|
Node 2 (linuxha2): 192.168.85.2 (192x) <br>
|
|
|
|
10.0.0.2 (10x) <br>
|
|
<i><font color="#ff0000">Note: None of these addresses should be
|
|
your "cluster address" - the address handled by heartbeat and failed
|
|
over between nodes!</font></i><br>
|
|
</p>
|
|
<p>Most *nix distributions make this easy during installation. However, if
you are having any problems, refer to either the Ethernet HOWTO or the
documentation for your distribution. To check
your configuration, type: </p>
|
|
<p> <b><tt>ifconfig</tt></b> </p>
|
|
<p>This will show your network interfaces and their
|
|
configuration. You can obtain your network routing information
|
|
from "netstat -nr". </p>
|
|
<p>If it looks good, make sure you can ping between both nodes on all
|
|
interfaces. </p>
|
|
<p>Next, if you're using one, you'll need to test your serial
connection. On one node, which will be the receiver, type: <br>
<b><tt>cat &lt;/dev/ttyS0</tt></b> </p>
<p>On the other node, type: <br>
<b><tt>echo hello &gt;/dev/ttyS0</tt></b> </p>
|
|
<p>You should see the text on the receiver node. If it works,
|
|
change their roles and try again. If it doesn't, it may be as
|
|
simple as having the wrong device file. Volker's HA Hardware Guide
|
|
and the Serial HOWTO are two good resources for troubleshooting your
|
|
serial connection. </p>
|
|
<h2> Installing Heartbeat</h2>
|
|
You can now install the heartbeat package. If you're reading
|
|
this, you already have it, but in any case it's available at:
|
|
<p> <a
|
|
href="http://linux-ha.org/download">http://linux-ha.org/download</a> </p>
|
|
<p>There are binary RPMs at the website, or you can build heartbeat
from source. Grab the tarball (or install the source RPM) and
untar it into your favorite source directory. From the
top of the source tree, type "<b>./ConfigureMe configure</b>",
followed by "<b>make</b>" and "<b>make install</b>". If you
have problems installing the RPMs found at the website and want a
way to make your own, there may be help in the <a
href="./faqntips.html">FAQ</a>. </p>
|
|
<h2> Configuring Heartbeat</h2>
|
|
<b><font size="+1">Configuring ha.cf</font></b> <br>
|
|
There are three files you will need to configure before starting up
heartbeat. The first is <i>ha.cf</i>. This will be placed in
the /etc/ha.d directory that is created after installation. It
tells heartbeat what types of media paths to use and how to configure
them. The ha.cf in the source directory contains all the
various options you can use. I'll go through it line by line...
|
|
<dl>
|
|
<dt> <b><tt><font size="+1">serial /dev/ttyS0</font></tt></b></dt>
|
|
<dd> Use a serial heartbeat. If you don't use a serial heartbeat, you
must use another medium, such as a bcast (ethernet) heartbeat.
Replace /dev/ttyS0 with the appropriate device file for your
serial port.</dd>
|
|
<dt> <b><tt><font size="+1">watchdog /dev/watchdog</font></tt></b></dt>
|
|
<dd> Optional. The watchdog function provides a way to have a
system that is still minimally functioning, but not providing a
heartbeat, reboot itself after a minute of being sick. This could
help to avoid a scenario where the machine recovers its heartbeat after
being pronounced dead. If that happened and a disk mount had failed
over, you could have two nodes mounting a disk simultaneously. If you
wish to use this feature, then in addition to this line, you will need
to load the "softdog" kernel module and create the actual device
file. To do this, first type "<b><tt>insmod softdog</tt></b>" to load the
module. Then, type "<b><tt>grep misc /proc/devices</tt></b>" and note the number it
reports (should be 10). Next, type "<b><tt>cat /proc/misc | grep
watchdog</tt></b>" and note that number (should be 130). Now you
can create the device file with that information by typing "<b><tt>mknod
/dev/watchdog c 10 130</tt></b>".</dd>
|
|
<dt> <b><tt><font size="+1">bcast eth1</font></tt></b></dt>
|
|
<dd> Specifies to use a broadcast heartbeat over the eth1 interface
|
|
(replace with eth0, eth2, or whatever you use).</dd>
|
|
<dt> <b><tt><font size="+1">keepalive 2</font></tt></b></dt>
|
|
<dd> Sets the time between heartbeats to 2 seconds.</dd>
|
|
<dt> <b><tt><font size="+1">warntime 10</font></tt></b></dt>
|
|
<dd>Time in seconds before issuing a "late heartbeat" warning in the
|
|
logs.</dd>
|
|
<dt> <b><tt><font size="+1">deadtime 30</font></tt></b></dt>
|
|
<dd> Node is pronounced dead after 30 seconds.</dd>
|
|
<dt> <b><tt><font size="+1">initdead 120</font></tt></b></dt>
|
|
<dd>With some configurations, the network takes some time to start
|
|
working after a reboot. This is a separate "deadtime" to handle
|
|
that case. It should be at least twice the normal deadtime.</dd>
|
|
<dt><b><tt><font size="+1">hopfudge 1</font></tt></b></dt>
|
|
<dd> <i>Optional</i>. For ring topologies, number of hops
|
|
allowed in addition to the number of nodes in the cluster.</dd>
|
|
<dt> <b><tt><font size="+1">baud 19200</font></tt></b></dt>
|
|
<dd> Speed at which to run the serial line (bps).</dd>
|
|
<dt> <b><tt><font size="+1">udpport 694</font></tt></b></dt>
|
|
<dd> Use port number 694 for bcast or ucast communication.
|
|
This is the default, and the official IANA registered port number.</dd>
|
|
<dt> <b><tt><font size="+1">auto_failback on</font></tt></b></dt>
|
|
<dd> <i>Required.</i> For those familiar with Tru64 Unix,
heartbeat acts as if in "favored member" mode. The master listed
in the haresources file holds all
the resources until a failover, at which time the slave takes
over. When <i>auto_failback</i> is set to <b>on</b>,
the master takes everything back from the slave
once it comes back online.
When set to <b>off</b>, this option prevents the master node from
re-acquiring cluster resources after a failover.
This option is similar to the obsolete <i>nice_failback</i> option.
If you want to upgrade from a cluster which had <i>nice_failback</i>
set <b>off</b> to this or later versions, special considerations apply
if you want to avoid requiring a flash cut. Please see the
<a href="http://linux-ha.org/download/faqnstuff.html">FAQ</a> for details
on how to deal with this situation.
</dd>
|
|
<dt> <b><tt><font size="+1">node linuxha1.linux-ha.org</font></tt></b></dt>
|
|
<dd> <i>Mandatory</i>. Hostname of machine in cluster as
|
|
described by `uname -n`.</dd>
|
|
<dt> <b><tt><font size="+1">node linuxha2.linux-ha.org</font></tt></b></dt>
|
|
<dd> <i>Mandatory</i>. Hostname of machine in cluster as
|
|
described by `uname -n`.<br>
|
|
</dd>
|
|
<dt> <b><tt><font size="+1">respawn userid cmd</font></tt></b></dt>
|
|
<dd> <i>Optional</i>: Lists a command to be spawned and
|
|
monitored. Eg: To spawn ccm daemons the following line has
|
|
to be added:</dd>
|
|
<dd> <b> respawn hacluster
|
|
/usr/lib/heartbeat/ccm</b><br>
|
|
Informs heartbeat to spawn the command with the credentials of that of
|
|
userid (hacluster, in this example) and monitors the health of the
|
|
process, respawning it if dead. For ipfail, the line would be:<br>
|
|
<span
|
|
style="font-weight: bold;">respawn hacluster /usr/lib/heartbeat/ipfail</span><span
|
|
style="font-weight: bold;"><br>
|
|
NOTE</span>: If the process dies with exit code 100, the process
|
|
is not respawned.</dd>
|
|
<dd> <br>
|
|
</dd>
|
|
<dt> <b><tt><font size="+1">ping
|
|
ping1.linux-ha.org ping2.linux-ha.org ....</font></tt></b></dt>
|
|
<dd> <i>Optional</i>: Specify ping nodes. These nodes are not
|
|
considered as cluster nodes. They are used to check network
|
|
connectivity for modules like ipfail.</dd>
|
|
<br>
|
|
<dd><br>
|
|
</dd>
|
|
<dt> <b><tt><font size="+1">ping_group
|
|
name ping1.linux-ha.org ping2.linux-ha.org ....</font></tt></b></dt>
|
|
<dd> <i>Optional</i>: Specify a group of ping nodes. These are
similar to ping nodes, but if any node in a group is available
then the group is considered available. The group name can
be any string and is used to uniquely identify the group.
Each group must appear on a separate line.
Like ping nodes, the group is not considered to be a cluster node.
A group is treated the same as a ping node and is used to check network
connectivity for modules like ipfail.</dd>
|
|
<br>
|
|
<dd><br>
|
|
</dd>
|
|
</dl>
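<p>Putting the directives above together, a complete ha.cf for the two-node
example might look like the sketch below. It is assembled only from the
options just described and leaves out hopfudge, respawn, ping and ping_group.
Placing baud and udpport ahead of the media lines they apply to is a
precaution assumed here, not something stated above. </p>
<pre>
# /etc/ha.d/ha.cf - sketch assembled from the directives described above

# Timing, in seconds: a heartbeat every 2 s, a warning after 10 s of
# silence, dead after 30 s (15 missed heartbeats), and 120 s of grace
# after boot (at least twice deadtime).
keepalive 2
warntime 10
deadtime 30
initdead 120

# Media paths: one serial link and one broadcast link on the private NIC.
baud 19200
serial /dev/ttyS0
udpport 694
bcast eth1

# Optional: needs the softdog module and the /dev/watchdog device file.
watchdog /dev/watchdog

auto_failback on

# Each name must match the output of "uname -n" on that machine.
node linuxha1.linux-ha.org
node linuxha2.linux-ha.org
</pre>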
|
|
<b><font size="+1">Configuring haresources</font></b> <br>
|
|
Once you've got your ha.cf set up, you need to configure <i>haresources</i>.
|
|
This file specifies the services for the cluster and who the default
|
|
owner is. <br>
|
|
<br>
|
|
<big><b><i><font color="#ff0000">Note: This file must be the same
|
|
on both nodes!</font></i></b></big>
|
|
<p>For our example, we'll assume the high availability services are
Apache and Samba. The IP for the cluster is mandatory, and <b>don't
configure the cluster IP outside of the haresources file!</b>
The haresources file will need one line: </p>
|
|
<pre> <b><tt>linuxha1.linux-ha.org 192.168.85.3 httpd smb</tt></b></pre>
|
|
<tt>So, this line dictates that on startup, have linuxha1 serve the IP
|
|
192.168.85.3 and start apache and samba as well.</tt> <br>
|
|
<tt>On shutdown, heartbeat will first stop smb, then apache, then give
|
|
up the IP. This assumes that the command "uname -n" spits out
|
|
"linuxha1.linux-ha.org" - yours may well produce "linuxha1" and if it
|
|
does, use that instead!</tt>
|
|
<p><tt><i>Note</i>: httpd and smb are the name of startup scripts
|
|
for Apache and Samba, respectively. Heartbeat will look for
|
|
startup scripts of the same name in the following paths:</tt> <br>
|
|
<tt> /etc/ha.d/resource.d</tt> <br>
|
|
<tt> /etc/rc.d/init.d</tt> </p>
|
|
<p><tt>These scripts must start services via "scriptname start" and
|
|
stop them via "scriptname stop".</tt> <br>
|
|
<tt>So you can use any services as long as they conform to the above
|
|
standard.</tt> </p>
|
|
<p>Should you need to pass arguments to a custom script, the format
|
|
would be: </p>
|
|
<pre> <b>scriptname::argument</b></pre>
|
|
So, if we added a service "maid" which needed the argument "vacuum",
our haresources line would change to the following:
|
|
<pre><b> linuxha1 192.168.85.3 httpd smb maid::vacuum</b></pre>
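<p>As a rough illustration of the start/stop convention, here is a minimal
sketch of what a resource script for the made-up "maid" service could look
like. The script body and its messages are hypothetical. The argument order
(the haresources argument first, the action last) is inferred from calls such
as "IPaddr 192.168.85.3 start" in the sample log later in this document. </p>
<pre>
#!/bin/sh
# Hypothetical resource script, intended to be saved as
# /etc/ha.d/resource.d/maid. Heartbeat is assumed to call it with the
# haresources argument first and the action last: "maid vacuum start".
maid() {
    task="$1"
    action="$2"
    case "$action" in
        start) echo "maid: starting $task" ;;
        stop)  echo "maid: stopping $task" ;;
        *)     echo "usage: maid task {start|stop}"; return 1 ;;
    esac
}

# Run only when arguments are supplied, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    maid "$@"
fi
</pre>
<p>Saved under that name and made executable, it would be run as
"maid vacuum start" when the resources are started and "maid vacuum stop"
when they are stopped. </p>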
|
|
<p><br>
|
|
<font size="+1">This brings us </font>to some added flexibility with
|
|
the service IP address. We are actually using a shorthand notation
|
|
above. The actual line could have read (we've canned the maid): </p>
|
|
<pre><b> linuxha1 IPaddr::192.168.85.3 httpd smb</b></pre>
|
|
Where <b><i>IPaddr</i></b> is the name of our service script, taking
|
|
the argument 192.168.85.3. Sure enough, if you look in the
|
|
directory /etc/ha.d/resource.d, you will find a script called
|
|
IPaddr. This script will also allow you to manipulate the netmask,
|
|
broadcast address and base interface of this IP service. To specify a subnet with
|
|
32 addresses, you could define the service as (leaving off the IPaddr
|
|
because we can!):
|
|
<pre><b> linuxha1 192.168.85.3/27 httpd smb</b></pre>
|
|
This sets the IP service address to 192.168.85.3 and the netmask to
255.255.255.224. The broadcast address defaults to 192.168.85.31
(which is the highest address on the subnet). The last parameter
you can set is the broadcast address. To override the
default and set it to 192.168.85.16, your entry would read:
|
|
<pre><b> linuxha1 192.168.85.3/27/192.168.85.16 httpd smb</b></pre>
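<p>For reference, the /27 figures quoted above can be checked by hand. This
is only the standard prefix-length arithmetic, worked for the example
address: </p>
<pre>
prefix length        : /27
host bits            : 32 - 27 = 5
addresses per subnet : 2^5 = 32
netmask, last octet  : 256 - 32 = 224   (255.255.255.224)
subnet holding .3    : 192.168.85.0 through 192.168.85.31
default broadcast    : 192.168.85.31    (highest address in the subnet)
</pre>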
|
|
You may be wondering whether any of the above is necessary for
|
|
you. It depends. If you've properly established a net route
|
|
(independent of heartbeat) for the service's IP address, with the
|
|
correct netmask and broadcast address, then no, it's not necessary for
|
|
you. However, this case won't fit everybody and that's why the
|
|
option's there! In addition, you may have more than one possible
|
|
interface that could be used for the service IP. Read on to see
|
|
how heartbeat treats this...
|
|
<p>Once you straighten out your haresources file, copy ha.cf and
|
|
haresources to /etc/ha.d and you're ready to start! <br>
|
|
</p>
|
|
|
|
<b><font size="+1">Configuring ipfail</font></b><br>
|
|
The ipfail plugin attempts to provide detection of network failures, and
|
|
then intelligently react, directing the cluster to failover resources as
|
|
necessary. In order to accomplish this goal, it uses ping nodes or ping
|
|
groups which work as "dumb" third parties in the cluster. Provided both HA
|
|
nodes can communicate with each other, ipfail can reliably detect when one
|
|
of their network links has become unusable, and compensate.<br>
|
|
<br>
|
|
To configure ipfail, the following steps must be performed.
|
|
<ol>
|
|
<li><b>Select good ping node candidates.</b><br>
|
|
It is essential that good strategic ping nodes be selected. The better your
|
|
choices, the stronger your HA cluster becomes. Choosing solid network devices
|
|
like switches and routers is a good idea. Do not choose either of the
|
|
members of the HA cluster. Nor should you select someone's workstation. It
|
|
is also important to select ping nodes that reflect the connectivity of your
|
|
HA nodes. If you wish to monitor the connectivity of two interfaces, it is
|
|
wise to select a ping node for each interface, that is reachable exclusively
|
|
from said interface. Consult
|
|
<a href="ipfail-diagram.pdf">ipfail-diagram.pdf</a> for a graphical
|
|
representation of this idea.
|
|
<br><br></li>
|
|
<li><b>Set auto_failback to <i>on</i> or <i>off</i>.</b><br>
|
|
ipfail will only operate if auto_failback has been set to something
other than <i>legacy</i>.
In ha.cf, set the auto_failback option to "on" or "off" like so:
|
|
<blockquote>
|
|
<tt>auto_failback on</tt>
|
|
</blockquote>
|
|
or
|
|
<blockquote>
|
|
<tt>auto_failback off</tt>
|
|
</blockquote>
|
|
</li>
|
|
<li><b>Configure your ha.cf to start ipfail.</b><br>
|
|
Add a line like the following to ha.cf (assuming your compile PREFIX is /usr):
|
|
<blockquote>
|
|
respawn hacluster /usr/lib/heartbeat/ipfail
|
|
</blockquote>
|
|
</li>
|
|
<li><b>Add the ping nodes to ha.cf.</b><br>
|
|
The ping nodes can be added to the cluster by using a line like the following:
|
|
<blockquote>
|
|
ping pnode1 pnode2 pnodeN
|
|
</blockquote>
|
|
Simply replace pnode1, pnode2, ... pnodeN with the IP addresses of your ping
|
|
nodes.
|
|
</li>
|
|
</ol>
|
|
Ensure that the above configuration directives are added to the ha.cf on
|
|
both members of the cluster, and that they are identical.<br>
|
|
|
|
<blockquote>
|
|
<b>NOTE:</b> You will want to check on the availability of the ping nodes
|
|
prior to using them. If you cannot ping them from both of the HA nodes,
|
|
they are useless.
|
|
</blockquote>
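<p>Taken together, the ipfail steps above amount to a few extra lines in
ha.cf on both nodes. The fragment below is a sketch: 192.168.85.254 is a
hypothetical router on the example's 192x network, so substitute a device
that both of your nodes can actually ping. </p>
<pre>
# Additions to /etc/ha.d/ha.cf for ipfail (identical on both nodes)
auto_failback on
respawn hacluster /usr/lib/heartbeat/ipfail
# Hypothetical ping node; replace with your own switch or router address.
ping 192.168.85.254
</pre>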
|
|
|
|
<h2> Selecting an Interface</h2>
|
|
One important aspect of configuring the haresources file for a machine
|
|
which has multiple ethernet interfaces is to know how heartbeat selects
|
|
which interface will wind up supporting the service addresses that are
|
|
configured in haresources. After all, no interface was specified
|
|
in the haresources file.
|
|
<p>Heartbeat decides which interface will be used by looking at the
|
|
routing table. It tries to select the lowest cost route to the IP
|
|
address to be taken over. In the case of a tie, it chooses the
|
|
first route found. For most configurations this means the default
|
|
route will be least preferred. </p>
|
|
<p>If you don't specify a netmask for the IP address in the haresources
file, the netmask associated with the selected route will be used.
Similarly, if an interface is not specified, then the virtual IP address
will be added to the interface associated with the selected route.
If the broadcast address is omitted, then the highest address in
the subnet is used.<br>
|
|
</p>
|
|
<p><b><font size="+2">Configuring Authkeys</font></b> </p>
|
|
<p>The third file to configure determines your authentication
|
|
keys. There are three types of authentication methods
|
|
available: crc, md5, and sha1. "Well, which should I use?",
|
|
you ask. Since this document is called "Getting <i>Started</i>",
|
|
we'll keep it simple...... </p>
|
|
<p>If your heartbeat runs over a secure network, such as the crossover
|
|
cable in our example, you'll want to use crc. This is the cheapest
|
|
method from a resources perspective. If the network is insecure,
|
|
but you're either not very paranoid or concerned about minimizing CPU
|
|
resources, use md5. Finally, if you want the best authentication
|
|
without regard for CPU resources, use sha1. It's the hardest to
|
|
crack. </p>
|
|
<p>The format of the file is as follows: <br>
auth &lt;number&gt; <br>
&lt;number&gt; &lt;authmethod&gt; [&lt;authkey&gt;] </p>
|
|
<p>So, for sha1, a sample /etc/ha.d/authkeys could be: <br>
|
|
auth 1 <br>
|
|
1 sha1 key-for-sha1-any-text-you-want </p>
|
|
<p>For md5, you could use the same as the above, but replace "sha1"
|
|
with "md5". </p>
|
|
<p>Finally, for crc, a sample might be: <br>
|
|
auth 2 <br>
|
|
2 crc </p>
|
|
<p> Whatever index you put after the keyword <b>auth</b> must be found
below in the keys listed in the file. If you put "auth 4", then there
must be a "4 signaturetype" line in the list below. </p>
|
|
<p>Make sure the file's permissions are safe, such as 600. And "any text
you want" is not <i>quite</i> right. There's a limit to the number
of characters you can use. <br>
|
|
That's it! </p>
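<p>A minimal sketch of creating the sha1 sample with safe permissions
follows. The file name authkeys.sample and the idea of building it in the
current directory first are assumptions made for illustration, and the key
text is just the placeholder from the sample, so replace it with your own. </p>
<pre>
# Build the sample in the current directory (file name is an assumption).
umask 077
printf 'auth 1\n1 sha1 %s\n' 'key-for-sha1-any-text-you-want' > authkeys.sample
# Make the mode explicit rather than relying on umask alone.
chmod 600 authkeys.sample
cat authkeys.sample
</pre>
<p>As root, you could then copy it into place on both nodes, for example with
"cp -p authkeys.sample /etc/ha.d/authkeys", and confirm with "ls -l" that the
mode is still 600. </p>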
|
|
<h2> Starting and testing heartbeat</h2>
|
|
On Red Hat, or other distributions which use /etc/init.d startup
files, simply type <b>/etc/init.d/heartbeat start</b> on both nodes. I
would recommend starting on the system master (in our example linuxha1)
first.
|
|
<p>If you want heartbeat to run on startup, what you need to do depends on
your distribution. You may need to place links to the startup
|
|
script in the appropriate init level directories, but the RPM versions
|
|
will do this for you. I have heartbeat start at its default
|
|
sequential priority (75, which means it starts after services 74 and
|
|
lower and before services with priority 76-99), end at its default
|
|
sequential priority (05), and only care about the 0(halt), 6(reboot),
|
|
3(text-only), 5(X) run levels. </p>
|
|
<p>So, if I had to do it by hand, I'd need to type in the following (as
|
|
root, of course): </p>
|
|
<p><b> cd /etc/rc.d/rc0.d ; ln -s ../init.d/heartbeat
|
|
K05heartbeat</b> <br>
|
|
<b> cd /etc/rc.d/rc3.d ; ln -s ../init.d/heartbeat
|
|
S75heartbeat</b> <br>
|
|
<b> cd /etc/rc.d/rc5.d ; ln -s ../init.d/heartbeat
|
|
S75heartbeat</b> <br>
|
|
<b> cd /etc/rc.d/rc6.d ; ln -s ../init.d/heartbeat
|
|
K05heartbeat</b> </p>
|
|
<p>The last time I ran slackware, there was no /etc/rc.d/init.d
|
|
directory (may have changed by now) and to do the same thing, I would
|
|
have placed in /etc/rc.d/rc.local: <br>
|
|
<b>/etc/ha.d/heartbeat start</b> <br>
|
|
***This assumes you copy the file ha.rc to /etc/ha.d/heartbeat.
|
|
If you can't find /etc/rc.d/init.d with your distribution and you're
|
|
unsure of how processes start, you can use the rc.local method.
|
|
But you're on your own for shutdown, I just don't remember... </p>
|
|
<p><i>Note: </i>If you use the watchdog function, you'll need to
|
|
load its module at bootup as well. You can put the following
|
|
command at the bottom of the /etc/rc.d/rc.sysinit file: <br>
|
|
<b>/sbin/insmod softdog</b> <br>
|
|
For the rc.local method, just put the same line right above where you
|
|
start heartbeat. <br>
|
|
</p>
|
|
<p>Once you've started heartbeat, take a peek at your log file (default
|
|
is /var/log/ha-log) before testing it. If all is peachy, the
|
|
service owner's log (linuxha1 in our example) should look something like
|
|
this: <br>
|
|
heartbeat: 2003/02/10_13:52:22 info: Neither logfile nor logfacility
|
|
found.<br>
|
|
heartbeat: 2003/02/10_13:52:22 info: Logging defaulting to
|
|
/var/log/ha-log<br>
|
|
heartbeat: 2003/02/10_13:52:22 info: **************************<br>
|
|
heartbeat: 2003/02/10_13:52:22 info: Configuration validated. Starting
|
|
heartbeat 0.4.9f<br>
|
|
heartbeat: 2003/02/10_13:52:22 info: nice_failback is in effect.<br>
|
|
heartbeat: 2003/02/10_13:52:22 info: heartbeat: version 0.4.9f<br>
|
|
heartbeat: 2003/02/10_13:52:22 info: Heartbeat generation: 17<br>
|
|
heartbeat: 2003/02/10_13:52:22 info: Starting serial heartbeat on tty
|
|
/dev/ttyS0 (19200 baud)<br>
|
|
heartbeat: 2003/02/10_13:52:22 info: UDP Broadcast heartbeat started on
|
|
port 694 (694) interface eth1<br>
|
|
heartbeat: 2003/02/10_13:52:23 info: pid 28140 locked in memory.<br>
|
|
heartbeat: 2003/02/10_13:52:23 info: pid 28137 locked in memory.<br>
|
|
heartbeat: 2003/02/10_13:52:23 info: pid 28139 locked in memory.<br>
|
|
heartbeat: 2003/02/10_13:52:23 notice: Using watchdog device:
|
|
/dev/watchdog<br>
|
|
heartbeat: 2003/02/10_13:52:23 info: pid 28141 locked in memory.<br>
|
|
heartbeat: 2003/02/10_13:52:23 info: Local status now set to: 'up'<br>
|
|
heartbeat: 2003/02/10_13:52:23 info: pid 28138 locked in memory.<br>
|
|
heartbeat: 2003/02/10_13:52:23 info: pid 28134 locked in memory.<br>
|
|
heartbeat: 2003/02/10_13:52:25 info: Link linuxha1.linux-ha.org:eth1 up.<br>
|
|
heartbeat: 2003/02/10_13:53:23 WARN: node linuxha2.linux-ha.org: is dead<br>
|
|
heartbeat: 2003/02/10_13:53:23 info: Dead node linuxha2.linux-ha.org
|
|
held no resources.<br>
|
|
heartbeat: 2003/02/10_13:53:23 info: Resources being acquired from
|
|
linuxha2.linux-ha.org.<br>
|
|
heartbeat: 2003/02/10_13:53:23 info: Local status now set to: 'active'<br>
|
|
heartbeat: 2003/02/10_13:53:23 info: Running /etc/ha.d/rc.d/status
|
|
status<br>
|
|
heartbeat: 2003/02/10_13:53:23 info: /usr/lib/heartbeat/mach_down:
|
|
nice_failback: acquiring foreign resources<br>
|
|
heartbeat: 2003/02/10_13:53:23 info: mach_down takeover complete.<br>
|
|
heartbeat: 2003/02/10_13:53:23 info: mach_down takeover complete for
|
|
node linuxha2.linux-ha.org.<br>
|
|
heartbeat: 2003/02/10_13:53:23 info: Acquiring resource group:
|
|
linuxha1.linux-ha.org 192.168.85.3 datadisk::drbd0 datadisk::drbd1 mirror<br>
|
|
heartbeat: 2003/02/10_13:53:23 info: Running
|
|
/etc/ha.d/resource.d/IPaddr 192.168.85.3 start<br>
|
|
heartbeat: 2003/02/10_13:53:23 info: /sbin/ifconfig eth0:0 192.168.85.3
|
|
netmask 255.255.255.0 broadcast 192.168.85.255<br>
|
|
heartbeat: 2003/02/10_13:53:23 info: Sending Gratuitous Arp for
|
|
192.168.85.3 on eth0:0 [eth0]<br>
|
|
heartbeat: 2003/02/10_13:53:23 /usr/lib/heartbeat/send_arp eth0
|
|
192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff<br>
|
|
heartbeat: 2003/02/10_13:53:24 info: Running
|
|
/etc/ha.d/resource.d/datadisk drbd0 start<br>
|
|
heartbeat: 2003/02/10_13:53:24 info: Running
|
|
/etc/ha.d/resource.d/datadisk drbd1 start<br>
|
|
heartbeat: 2003/02/10_13:53:25 info: Running
|
|
/etc/ha.d/resource.d/mirror start<br>
|
|
heartbeat: 2003/02/10_13:53:25 /usr/lib/heartbeat/send_arp eth0
|
|
192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff<br>
|
|
heartbeat: 2003/02/10_13:53:26 info: Resource acquisition completed.<br>
|
|
heartbeat: 2003/02/10_13:53:28 /usr/lib/heartbeat/send_arp eth0
|
|
192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff<br>
|
|
heartbeat: 2003/02/10_13:53:30 /usr/lib/heartbeat/send_arp eth0
|
|
192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff<br>
|
|
heartbeat: 2003/02/10_13:53:32 /usr/lib/heartbeat/send_arp eth0
|
|
192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff<br>
|
|
heartbeat: 2003/02/10_13:53:33 info: Local Resource acquisition
|
|
completed. (none)<br>
|
|
heartbeat: 2003/02/10_13:53:33 info: local resource transition
|
|
completed.<br>
|
|
heartbeat: 2003/02/10_13:56:30 info: Link linuxha2.linux-ha.org:eth1 up.<br>
|
|
heartbeat: 2003/02/10_13:56:30 info: Status update for node
|
|
linuxha2.linux-ha.org: status up<br>
|
|
heartbeat: 2003/02/10_13:56:30 info: Running /etc/ha.d/rc.d/status
|
|
status<br>
|
|
heartbeat: 2003/02/10_13:56:30 info: Status update for node
|
|
linuxha2.linux-ha.org: status active<br>
|
|
heartbeat: 2003/02/10_13:56:30 info: remote resource transition
|
|
completed.<br>
|
|
heartbeat: 2003/02/10_13:56:30 info: Running /etc/ha.d/rc.d/status
|
|
status<br>
|
|
heartbeat: 2003/02/10_13:56:31 info: Link
|
|
linuxha2.linux-ha.org:/dev/ttyS0 up.<br>
|
|
<b>NOTE:</b> Your log may differ depending on when you started
heartbeat on linuxha2! I started heartbeat on linuxha2
at 13:56:30...</p>
|
|
<p> </p>
|
|
<hr width="54%">
|
|
<p><b>OK, </b>now try to ping your cluster's IP (192.168.85.3 in the
|
|
example). If this works, ssh to it and verify you're on linuxha1. <br>
|
|
Next, make sure your services are tied to the .3 address. Bring
|
|
up netscape and type in 192.168.85.3 for the URL. For Samba, try
|
|
to map the drive "\\192.168.85.3\test" assuming you set up a share
|
|
called "test". See Samba docs to get that going. As an
|
|
aside, however, you'll want to use the "netbios name" parameter to have
|
|
your Samba share listed under the cluster name and not the hostname of
|
|
your cluster member! </p>
|
|
<p><b><font color="#ff0000">NOTE</font>: </b>If you can't bring up the
|
|
service IP address and you get ha-log entries similar to this: </p>
|
|
<blockquote>
<blockquote>
<blockquote>
<blockquote><i>SIOCSIFADDR: No such device</i> <br>
<i>SIOCSIFFLAGS: No such device</i> <br>
<i>SIOCSIFNETMASK: No such device</i> <br>
<i>SIOCSIFBRDADDR: No such device</i> <br>
<i>SIOCSIFFLAGS: No such device</i> <br>
<i>SIOCADDRT: No such device</i></blockquote>
</blockquote>
</blockquote>
It <i>may</i> mean that you need to enable IP aliasing in your kernel
build. Check /usr/src/linux/.config for "CONFIG_IP_ALIAS=y". If you
don't have it, you'll have the line "CONFIG_IP_ALIAS is not set".
Rebuild your kernel with IP aliasing enabled.</blockquote>
|
|
If this all works, you've got availability. Now let's see if we
|
|
have High Availability :-)
|
|
<p>Take down linuxha1. Kill power, kill heartbeat, whatever you
|
|
have the stomach for, but <b>don't just yank</b> both the serial and
|
|
eth1 heartbeat cables. If you do that, you'll have services
|
|
running on both nodes and when you re-connect the heartbeat, a bit of
|
|
chaos.... <br>
|
|
Now ping the cluster IP. Approximately 5-10 seconds later it should
start responding again. Log in again and verify you're on
linuxha2. If it happens but takes more like 30 seconds, something
is wrong. </p>
|
|
<p>If you get this far, it's probably working, but you should probably
|
|
check all your heartbeats, too. <br>
|
|
First, check your serial heartbeat. Unplug the crossover cable
|
|
from your eth1 NIC that you're using for your bcast heartbeat. Wait
|
|
about 10 seconds. <br>
|
|
Now, look at /var/log/ha-log on linuxha2 and make sure there's no line
|
|
like this: <br>
|
|
<b>1999/08/16_12:40:58 node linuxha1.linux-ha.org:
|
|
is dead</b> <br>
|
|
If you get that, your serial heartbeat isn't working and your second
|
|
node is taking over. To avoid any problems, shut down heartbeat on
|
|
the first node, then test your null modem cable. Run the above
|
|
serial tests again. </p>
|
|
<p>If your log is clean, great. Re-connect the crossover
|
|
cable. Once that's done, disconnect the serial cable, wait 10
|
|
seconds and check the linuxha2 log again. <br>
|
|
If it's clean, congrats! If not, you can check /var/log/ha-log
|
|
and /var/log/ha-debug for more clues. <br>
|
|
</p>
|
|
<p><b><font size="+1">Appendix A - Ethernet Crossover Cable Construction</font></b> </p>
|
|
<p>Your cable diagram should be as follows: </p>
|
|
<p>
|
|
|
|
<table border="1" cols="2" width="30%">
|
|
<tbody>
|
|
<tr align="center">
|
|
<td>Connector A</td>
|
|
<td>Connector B</td>
|
|
</tr>
|
|
<tr>
|
|
<td align="center">Pin #</td>
|
|
<td align="center">Pin #</td>
|
|
</tr>
|
|
<tr align="center">
|
|
<td>1</td>
|
|
<td>3</td>
|
|
</tr>
|
|
<tr align="center">
|
|
<td>2</td>
|
|
<td>6</td>
|
|
</tr>
|
|
<tr align="center">
|
|
<td>3</td>
|
|
<td>1</td>
|
|
</tr>
|
|
<tr align="center">
|
|
<td>6</td>
|
|
<td>2</td>
|
|
</tr>
|
|
<tr align="center">
|
|
<td>4</td>
|
|
<td>7</td>
|
|
</tr>
|
|
<tr align="center">
|
|
<td>5</td>
|
|
<td>8</td>
|
|
</tr>
|
|
<tr align="center">
|
|
<td>7</td>
|
|
<td>4</td>
|
|
</tr>
|
|
<tr align="center">
|
|
<td>8</td>
|
|
<td>5</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
</p>
|
|
<p>Rev 1.2.0 <br>
|
|
(c) 2003 Rudy Pawul <br>
|
|
<a href="mailto:rpawul@iso-ne.com">rpawul@iso-ne.com</a> </p>
|
|
</body>
|
|
</html>
|