We got the error "reset: standard error: Invalid argument" in the header of
the CE install CD, so replaced reset with clear, which has worked well for a
long time during the nightly builds.
Also removed all the additional calls of the logo() function.
(I have no idea what causes the broken layout in MT#4697. I spent a lot
of time trying to catch it, but it only shows up from time to time,
so for now I minimised the number of function calls while printing the logo.)
I hope to catch this one day and fix it in a proper way.
Debian's BTS has all the details in #750212 which I just reported.
lvm2 version 2.02.106-1 includes a check for present signatures,
and without "--yes" option fai-setup-storage is hanging when
executing lvcreate on a device which includes such a (swap) signature.
We don't want to use the sources.list generation which
grml-debootstrap does, as we want to control 100% of
what we use/deploy, so provide sources.list as needed.
I think we finally tracked down the problem to its roots (oh boy… ٩(͡๏̯͡๏)۶)):
The problem exists only for PRO systems, where we ship the
heartbeat-2 package, which still uses and provides the /usr/lib64
directory (that's what you get for running antique software
*cough*). As soon as this directory exists the
VBoxLinuxAdditions.run script of virtualbox-guest-additions-iso
detects and uses this directory as library path. It installs the
mount.vboxsf symlink using this library path.
The symlink /sbin/mount.vboxsf points to the non-existing
/usr/lib64/VBoxGuestAdditions/mount.vboxsf file instead of
pointing to
/usr/lib/x86_64-linux-gnu/VBoxGuestAdditions/mount.vboxsf.
It works for CE systems, because we don't ship the heartbeat-2
package there. Without heartbeat-2 /usr/lib64 doesn't exist and
VBoxLinuxAdditions.run does the right™ thing by detecting
and using /usr/lib/x86_64-linux-gnu instead.
If that's finally working as expected™ I'll be able to sleep much
better again… ☻
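A minimal sketch of how such a dangling symlink could be detected and
repaired after the fact. fix_dangling_symlink is a hypothetical helper
(it is not part of VBoxLinuxAdditions.run or of our deployment script);
the paths in the usage note are the ones mentioned above.

```shell
# Hypothetical helper: if $1 is a symlink whose target does not exist,
# repoint it at the first existing candidate path given in $2..$n.
fix_dangling_symlink() {
  link="$1"; shift
  # -L is true for a symlink; -e follows the link and fails if the target is missing
  if [ -L "$link" ] && [ ! -e "$link" ]; then
    for candidate in "$@"; do
      if [ -e "$candidate" ]; then
        ln -sf "$candidate" "$link"
        return 0
      fi
    done
    return 1   # dangling, and none of the candidates exists
  fi
  return 0     # nothing to repair
}
```

Inside the installed system this could be called as
fix_dangling_symlink /sbin/mount.vboxsf /usr/lib/x86_64-linux-gnu/VBoxGuestAdditions/mount.vboxsf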
We're still facing:
| root@sp1:~# mount -t vboxsf -o uid=`id -u sipwise`,gid=`getent group sipwise | cut -d: -f3` /vagrant /vagrant
| mount: Protocol error
on some VMs (noticed on e.g. 3.1 PRO system with 3.13-0.bpo.1-amd64).
The /sbin/mount.vboxsf file is missing for some reason we're not
aware of yet.
Let's get rid of unneeded AllowUnauthenticated usages with apt.
It would be even better if we could ship our own key with the
official Grml-Sipwise ISO (which would technically be no problem,
but we need a way for PXE boot for builder.mgm anyway).
Always writing to /etc/apt/sources.list is stupid
because it makes tracking changes harder, so let's
try to move the Debian specifics into debian.list and
the Sipwise specifics into sipwise.list instead.
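A rough sketch of that split, assuming the files live in a
sources.list.d directory below the target system. write_apt_sources is
a hypothetical helper and the Debian mirror URLs are only illustrative,
the real entries depend on the mirror setup in use.

```shell
# Hypothetical helper: write Debian and Sipwise entries into separate files.
# $1: target directory (e.g. /mnt/etc/apt/sources.list.d)
# $2: Debian suite (e.g. wheezy), $3: ngcp release (e.g. 3.1)
write_apt_sources() {
  target_dir="$1"
  suite="$2"
  release="$3"
  mkdir -p "$target_dir"
  # Debian entries (illustrative mirror URLs)
  cat > "$target_dir/debian.list" <<EOF
deb http://ftp.debian.org/debian/ $suite main contrib non-free
deb http://security.debian.org/ $suite/updates main contrib non-free
EOF
  # Sipwise entries
  cat > "$target_dir/sipwise.list" <<EOF
deb http://deb.sipwise.com/spce/$release/ $suite main
EOF
}
```

Keeping the two files apart means a change to the Sipwise repository
never touches the Debian entries and vice versa.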
Even if our own key is already installed the package list might
not be up2date yet (e.g. for wheezy-backports):
| The following NEW packages will be installed:
| linux-compiler-gcc-4.6-x86 linux-headers-3.13-0.bpo.1-amd64
| linux-headers-3.13-0.bpo.1-common linux-kbuild-3.13
| The following packages will be upgraded:
| linux-headers-amd64
| 1 upgraded, 4 newly installed, 0 to remove and 1 not upgraded.
| Need to get 5187 kB of archives.
| After this operation, 32.9 MB of additional disk space will be used.
| WARNING: The following packages cannot be authenticated!
| linux-compiler-gcc-4.6-x86 linux-headers-3.13-0.bpo.1-common
| linux-kbuild-3.13 linux-headers-3.13-0.bpo.1-amd64 linux-headers-amd64
This is an urgent bugfix to address the failing
libssl1.0.0 1.0.1e-2+deb7u6 upgrade, which prompts
via debconf and thereby causes our builds to fail.
On plain Debian installations we don't have the ngcp-keyring
package present, so we run into:
| WARNING: The following packages cannot be authenticated!
Using debian.sipwise.com which is a CNAME record pointing to
deb.sipwise.com prevents us from getting higher apt-pinning
for official Debian packages which are mirrored on our own
server.
Related commit history: 8832daa7ea1b3eb0fd8400101b6
The "Install sip:provider CE" boot entry in the Grml-Sipwise ISO,
which doesn't provide a ngcpvers kernel cmdline option, doesn't
work. This change is supposed to address that.
Otherwise files like
ngcp-installer-ce_0.13.0~20140117133338.395+wheezy_all.deb
as being the result of an UNRELEASED entry in the debian/changelog
are ignored.
dpkg-scanpackages is supposed to recognize the
0.10.2+0~1368529812.299+wheezy~1.gbp1691a0 being newer
than 0.10.2 anyway.
This is a re-design of the ngcp-installer version selection.
We want to avoid having to put every new mrX.Y release/build into
deployment.sh just to point a specific release installation to a
specific installer version.
Also this turned out to be a pitfall when releasing a new
ngcp-installer package and forgetting to update deployment.sh
accordingly.
So instead let's try a different approach:
We provide only *one* specific Debian package version of each
package inside each ngcp release repository already. This means
we can just check what's inside a specific ngcp-installer
directory, like:
* http://deb.sipwise.com/spce/2.8/pool/main/n/ngcp-installer/
* http://deb.sipwise.com/sppro/2.8/pool/main/n/ngcp-installer/
* http://deb.sipwise.com/spce/mr3.2/pool/main/n/ngcp-installer/
* http://deb.sipwise.com/sppro/mr3.2/pool/main/n/ngcp-installer/
* ...
and then assume that's the installer version we want to use for
installing the according ngcp release.
This is 100% UNTESTED so far!
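The lookup described above could be sketched roughly like this,
assuming the web server returns a plain HTML directory index for the
pool directory. installer_deb_from_index is a hypothetical helper name,
not what deployment.sh actually uses.

```shell
# Hypothetical helper: read an HTML directory index on stdin and print
# the ngcp-installer .deb file name found in it.
installer_deb_from_index() {
  # File names show up in both the href and the link text, so dedupe.
  # tail is only a guard in case more than one file is listed.
  grep -o 'ngcp-installer[^"<> ]*\.deb' | sort -u | tail -n 1
}
```

Usage would be along the lines of
curl -s http://deb.sipwise.com/spce/mr3.2/pool/main/n/ngcp-installer/ | installer_deb_from_index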
The mediaproxy-ng kernel module can be installed successfully on
3.0 systems for kernel 3.2.0-4-rt-amd64 and therefore is reported
as "ngcp-mediaproxy-ng. kernel package already installed,
skipping". This is wrong for our needs, so let's ignore this
"-rt-amd64" in the dkms status output.
/home/ is an absolute symlink in PRO setups, therefore can't be
resolved when accessed from outside the installed system/chroot,
resulting in error message:
| + mkdir -p /mnt/home/sipwise/.ssh/
| mkdir: cannot create directory `/mnt/home': File exists
We don't have any other architectures besides the amd64 one on
our own repositories and nowadays with MultiArch people might have
e.g. i386 enabled (via 'dpkg --add-architecture i386'), resulting
in errors like:
| W: Failed to fetch http://deb.sipwise.com/spce/3.0/dists/wheezy/main/binary-i386/Packages 404 Not Found
| W: Failed to fetch http://deb.sipwise.com/wheezy-backports/dists/wheezy-backports/main/binary-i386/Packages 404 Not Found
| E: Some index files failed to download. They have been ignored, or old ones used instead.
Avoid retrieval of i386 Packages files by limiting the deb entry
to the amd64 architecture.
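For illustration, a small sketch that builds such an
architecture-restricted entry using the [arch=...] option of
sources.list. amd64_deb_line is a hypothetical helper name; the URL in
the usage note is taken from the error message above.

```shell
# Hypothetical helper: print a deb entry limited to the amd64 architecture.
# $1: repository base URL, $2: suite, remaining arguments: components
amd64_deb_line() {
  url="$1"
  suite="$2"
  shift 2
  printf 'deb [arch=amd64] %s %s %s\n' "$url" "$suite" "$*"
}
```

Calling amd64_deb_line http://deb.sipwise.com/spce/3.0/ wheezy main
yields an entry for which apt no longer requests binary-i386/Packages.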
Now having MT#4463 resolved let's also get rid of "Applying
Vagrant performance optimisations for VM" steps from
daily-build-vagrant.
Implemented as separate boot option (independent from "vagrant"
boot option) to apply it only for the VM 2XX builds (to e.g. not
execute it on plain Debian systems).
gcc is already present in ngcp systems and therefore the
recommended package libc6-dev won't be pulled in anymore. The
uname fake code depends on stdio.h etc. though.
VirtualBox Guest Additions installation fails inside the chroot
when the kernel version of the live system differs from the installed one.
Sadly there doesn't seem to be a way to reliably control the kernel
version/header location for VBoxLinuxAdditions.run (e.g. via
KERN_DIR=... as advertised) because there are several calls
to `uname -r` in use. :(
I sometimes get a failing network setup in Vagrant's ce-trunk VM
where the default route is missing:
| # ip r
| 10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
| 192.168.88.0/24 dev eth1 proto kernel scope link src 192.168.88.234
|
| # cat /etc/network/interfaces
| # This file describes the network interfaces available on your system
| # and how to activate them. For more information, see interfaces(5).
| # The loopback network interface
| auto lo
| iface lo inet loopback
|
| # The primary network interface
| allow-hotplug eth0
| iface eth0 inet dhcp
| #VAGRANT-BEGIN
| # The contents below are automatically generated by Vagrant. Do not modify.
| auto eth1
| iface eth1 inet dhcp
| post-up route del default dev $IFACE
| #VAGRANT-END
Of course the default route comes back on a manual refresh of eth0:
| # ifdown eth0
| # ifup eth0
| # ip r
| default via 10.0.2.2 dev eth0
| 10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
| 192.168.88.0/24 dev eth1 proto kernel scope link src 192.168.88.234
The culprit seems to be the allow-hotplug, where the device is handled
differently in the startup process:
http://screeni.s3.amazonaws.com/screenshot.2013-09-17T11:43:37.png
The ordering is important because of Vagrant's
"post-up route del default dev $IFACE". While there's a workaround
available within Vagrant:
73c8299ecd
I don't think that's the proper way. Since we're using "auto"
everywhere else in our setups anyway let's make this change,
hopefully also fixing the Vagrant network issue.
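A minimal sketch of the switch from allow-hotplug to auto for a single
interface. hotplug_to_auto is a hypothetical helper name, the actual
change in the deployment script may look different.

```shell
# Hypothetical helper: replace "allow-hotplug IFACE" with "auto IFACE"
# in an interfaces(5) file. $1: file to edit, $2: interface name
hotplug_to_auto() {
  file="$1"
  iface="$2"
  new="${file}.new.$$"
  # Write to a temporary file first, then move it into place
  sed "s/^allow-hotplug[[:space:]]\{1,\}${iface}\$/auto ${iface}/" "$file" > "$new" &&
    mv "$new" "$file"
}
```

Example: hotplug_to_auto /mnt/etc/network/interfaces eth0 (the /mnt
prefix is an assumption about where the installed system is mounted).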
ngcp-services-pro depends on linux-headers-3.2.0-4-all,
which depends on linux-headers-3.2.0-4-all-amd64,
which depends on linux-headers-3.2.0-4-amd64.
linux-headers-3.2.0-4-amd64 is installed already in version
3.2.46-1+deb7u1 on our installation, being the current version
from security.debian.org. Because this package depends on all the
other packages being in version 3.2.46-1+deb7u1 as well this
fails due to our apt-pinning. Our apt-pinning considers
security.debian.org to be at its default pinning level 500 whereas
deb.sipwise.com is pinned to 990. This results in package
dependencies that can't be resolved by using packages from
security.debian.org.
We have to think further regarding the wanted behaviour before we
actually make that MIRROR switch. :(
On the Jenkins host this symlink was present:
| cd /srv/repository
| ln -s . debian
To avoid unnecessary duplicates during repository search
I removed the symlink, let's see whether this change is
enough or if apt-get then still needs it.
dkms in Debian/wheezy fails when directly invoked via chroot(8):
| /usr/sbin/dkms: line 1868: /dev/fd/62: No such file or directory
| /usr/sbin/dkms: line 1799: /dev/fd/62: No such file or directory
and returns with exit code 0. By invoking grml-chroot we get the
according mount binds for devfs etc.
Current daily VMs are known to fail because of switching from
raw image to vmdk on proxmox. :( Abort installation immediately
if actions during setup-storage, parted and mkswap fail.
Otherwise grabbing a screenshot after deployment finished
will include too much noise and not the useful data...
From: Michael Prokop <mprokop@sipwise.com>
Make it obvious which platform releases use which underlying Debian release,
once we have wheezy support the default will be changed and we don't want
to have wrong assignments for existing releases then.
From: Michael Prokop <mprokop@sipwise.com>
The '5%-10G' sadly doesn't work as expected and 5% of the
*disk* size is a bad default, so let's see whether XX% of
RAM size makes a better choice. We might change the actual
percentage value but we need some testing anyway....
From: Michael Prokop <mprokop@sipwise.com>
When using the "ngcplvm" boot option then deployment automatically
sets up LVM (volume group "ngcp" with volumes "root" and "swap"),
otherwise it works just as it used to do so far (AKA no LVM).
Needs performance checks (Kirill volunteered to do that), once
we're happy with the results we can make it the new default
deployment method as planned.
Testing: https://bugtracker.sipwise.com/view.php?id=2465
From: Michael Prokop <mprokop@sipwise.com>
This works on a freshly deployed PRO system:
| ngcpcfg init-mgmt $MGMT_IP
Now let's try to include this step also during deployment.
From: Michael Prokop <mprokop@sipwise.com>
Boot option ngcpcpip2 might differ from default, so we can't hardcode
$DEFAULT_IP2 when deploying sp1. Instead try to autoconfigure it
when we're deploying sp2, maybe we don't need it while deploying
sp1. If this doesn't work then we'd have to provision the interface
using $IP2 in the meanwhile.
While at it try to pull changes from shared storage on sp1 when
deploying sp2 so both nodes are fully up2date after fresh
installation.
From: Michael Prokop <mprokop@sipwise.com>
This should give us a full-fledged network.yml as eth0 of
the 2nd system is the only missing NIC during deployment.
Testing!
From: Michael Prokop <mprokop@sipwise.com>
this *might* fail due to ssh key setup, but we have to avoid
merge conflicts and therefore need to get ngcpcfg pull working...
From: Michael Prokop <mprokop@sipwise.com>
With the way it is the second node gets the SSH keys from the first
node's glusterfs share. Let's try to set up SSH before actually
running ngcp-installer, then in ngcp-installer skip the SSH key
setup if keys already exist (will follow in upcoming svn commit
for ngcp-installer).
From: Michael Prokop <mprokop@sipwise.com>
As supported by ngcpcfg-api as of svn r11744.
If this works as expected then only:
ssh-keyscan $MANAGEMENT_IP >> ~/.ssh/known_hosts
should be needed for automatic SSH login between nodes in carrier
environment. A working version of "ngcpcfg init_mgmt
$MANAGEMENT_IP" should be quite close then.
Let's see what's the opinion of Jenkins + our autodeploy jobs...
From: Michael Prokop <mprokop@sipwise.com>
Let's see whether this is enough to close issue #2367.
Open question: do we want to have locales-all also on systems
we did not deploy on our own (AKA CE systems installed using
ngcp-installer)? Iff so then we'd have to add it as dependency
somewhere.
Testing: https://bugtracker.sipwise.com/view.php?id=2367
From: Michael Prokop <mprokop@sipwise.com>
This reverts commit 2c859c9940caa2599b6eb71f7a1d8832064a6133.
This doesn't fix the issue (as you might guess I know the
real source of the problem now, ha!)
From: Michael Prokop <mprokop@sipwise.com>
network.yml looks good on sp1 now. Something fishy is still
causing broken config files though, a manual 'ngcpcfg build'
fixes that, why the 'ngcpcfg build' in the deployment script
doesn't solve that yet needs to be investigated...
Sadly bootstrapping sp2 is still quite tricky. We might even
have to set initial network.yml on PRO systems to what we ship
with CE systems so we don't run into merge conflicts. Hmpf.
From: Michael Prokop <mprokop@sipwise.com>
Seems to work, at least for sp1 hosts in PRO installs. Let's see
whether PRO deployments work again with new templates,
network.yml and ngcp-network being in place now.
Happy PRO deploying dear Jenkins while I'm switching office->home. :)
From: Michael Prokop <mprokop@sipwise.com>
On Dell PowerEdge R310 servers we have the PERC H700, on the next
generation model R320 we have PERC H710. Don't be too picky
about the whitelist and just check for "PERC" in the model
information and "DELL" as vendor.
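The relaxed check could look roughly like this. is_dell_perc is a
hypothetical helper name, and reading vendor and model from
/sys/block/<disk>/device/ is an assumption about where the deployment
script gets its information from.

```shell
# Hypothetical helper: succeed if vendor contains "DELL" and model contains "PERC".
# $1: vendor string, $2: model string
is_dell_perc() {
  vendor="$1"
  model="$2"
  case "$vendor" in
    *DELL*) ;;          # vendor matches, go on with the model check
    *) return 1 ;;
  esac
  case "$model" in
    *PERC*) return 0 ;; # covers PERC H700, PERC H710 and friends
    *) return 1 ;;
  esac
}
```

Example: is_dell_perc "$(cat /sys/block/sda/device/vendor)" "$(cat /sys/block/sda/device/model)"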
From: Michael Prokop <mprokop@sipwise.com>
acpi-support-base in squeeze depends on console-tools (or kbd)[1],
therefore we can't remove console-tools if we want to get
acpi-support-base installed by default
[1] http://packages.debian.org/squeeze/acpi-support-base
From: Michael Prokop <mprokop@sipwise.com>
VMs on Proxmox 2.1-14 (pve-qemu-kvm 1.1-8, 2.6.32-14-pve) have an
annoying problem with our installations, where the deployed
system freezes as soon as some action takes place on the network
stack. For example invoking "lsof -i -n" on the rebooted system
causes such a freeze.
The only solution to work around this issue seems to be stopping
and then restarting the VM. As we need to handle this
automatically we need some kind of API to retrieve system state.
By writing the current deployment state into some file and
providing it through http://$DEPLOYMENT_SYSTEM:4242/status this
should get us there.
From: Michael Prokop <mprokop@sipwise.com>
The udev rules need the real MAC addresses and we can't mess
with them before the rules are in place
Thanks Richard for helping with debugging
From: Michael Prokop <mprokop@sipwise.com>
logit() isn't available inside the subshell in the chroot so don't
use it there, we already have it at the according place...
From: Michael Prokop <mprokop@sipwise.com>
When doing automated pro deployments we also have to stop monit
to be able to unmount the disk in a clean way...
From: Michael Prokop <mprokop@sipwise.com>
Fix what has been broken in r10823: the "Dump completed on"
is present iff "--skip-comments" option is not used.
From: Michael Prokop <mprokop@sipwise.com>
We write data into the DB in revision scripts, like e.g.:
| $SVN/dev/ngcp/db-schema/trunk$ cat db_scripts/diff/9675.up
| INSERT INTO provisioning.voip_preferences (attribute, type, dom_pref, usr_pref, peer_pref, data_type, max_occur, description) VALUES('mobile_push_enable', 1, 1, 1, 0, 'boolean', 1, 'Send inbound call to Mobile Push server when called subscriber is not registered. This can not be used together with CFNA as call will be then simply forwarded.');
As suggested by Daniel let's also put data into the DB dumps.
From: Michael Prokop <mprokop@sipwise.com>
Might become useful for low-mem VMs some day, implemented while
figuring out installing an etch system for Andi :)
From: Michael Prokop <mprokop@sipwise.com>
Starting with recent trunk versions we no longer ship
kamailio init script but just our kamailio-lb + kamailio-proxy.
From: Michael Prokop <mprokop@sipwise.com>
Inside the pool there might be versions which have been released inside a
maintenance branch but which don't cover recent changes in trunk.
This caused current trunk installations to fail with:
| mv: cannot stat `/etc/ngcp-config/config.yml': No such file or directory
So get rid of every file without "svn" in the filename, so e.g.:
ngcp-installer-ce_0.7.2+0~1339173026.svn9034.165_all.deb (trunk version)
is preferred over:
ngcp-installer-ce_0.7.3_all.deb (release into 2.5 repository)
When we're installing trunk we don't care about released versions,
so get rid of them.
Tricky.
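As a sketch, assuming the downloaded packages sit in a local directory:
drop_released_debs is a hypothetical helper name, not the code that is
actually used in the deployment script.

```shell
# Hypothetical helper: remove every .deb in directory $1 whose file name
# does not contain "svn", so only trunk builds remain.
drop_released_debs() {
  dir="$1"
  for deb in "$dir"/*.deb; do
    [ -e "$deb" ] || continue    # directory without any .deb files
    case "$(basename "$deb")" in
      *svn*) ;;                  # trunk build, keep it
      *) rm -f "$deb" ;;         # released version, get rid of it
    esac
  done
}
```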
From: Michael Prokop <mprokop@sipwise.com>
This avoids the svn commits with the only change being:
| --- Dump completed on 2012-06-06 6:09:22
| +-- Dump completed on 2012-06-06 16:35:16
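One possible way to keep that timestamp out of the dumps is a simple
filter, sketched here under the assumption that the dump is piped
through it before being written. strip_dump_timestamp is a hypothetical
helper name and not necessarily how this commit does it.

```shell
# Hypothetical helper: drop the "-- Dump completed on <timestamp>" comment
# line from a dump read on stdin.
strip_dump_timestamp() {
  sed '/^-- Dump completed on /d'
}
```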
From: Michael Prokop <mprokop@sipwise.com>
The:
| ERROR 2002 (HY000): Can't connect to local MySQL server through socket ...
gets displayed several times, though it's not an issue.
Seems to be some async foo or race condition I couldn't identify
yet, until it's resolved display a message to the user so nobody
thinks that's a real error.
From: Michael Prokop <mprokop@sipwise.com>
Hopefully this solves our failing daily builds where we get:
| 06:09:57 curl: (28) Operation timed out after 30000 milliseconds with 0 bytes received
like in https://jenkins.mgm.sipwise.com/job/vmbuilder-ce/192/
From: Michael Prokop <mprokop@sipwise.com>
By default we use eth1 for the crosslink / interconnect device,
but there are VMs which use eth0 - so support this in the deployment
process so we get a happy heartbeat service...
From: Michael Prokop <mprokop@sipwise.com>
Pro looks different, at least for the 2nd node.
We might have to investigate that, but for the time
being just stay at CE to keep svn logs sane...
From: Michael Prokop <mprokop@sipwise.com>
I don't see any reason why we shouldn't have openssh-server on
each system we install. When manually testing installations of
ngcp-installer on plain Debian systems with the deployment ISO
it's annoying to not be able to log in after initial deployment,
so let's change this...
From: Michael Prokop <mprokop@sipwise.com>