deployment-iso

Commit Graph

Author	SHA1	Message	Date
Michael Prokop	236cb2d1a7	MT#58926 Vagrant: ensure to have libxmu6 available We get the following error message in /var/log/vboxadd-install.log, /var/log/deployment-installer-debug.log, /var/log/daemon.log + /var/log/syslog: \| /opt/VBoxGuestAdditions-7.0.6/bin/VBoxClient: error while loading shared libraries: libXmu.so.6: cannot open shared object file: No such file or directory This is caused by missing libxmu6: \| [sipwise-lab-trunk] sipwise@spce:~$ /opt/VBoxGuestAdditions-7.0.6/bin/VBoxClient --help \| /opt/VBoxGuestAdditions-7.0.6/bin/VBoxClient: error while loading shared libraries: libXmu.so.6: cannot open shared object file: No such file or directory \| [sipwise-lab-trunk] sipwise@spce:~$ sudo apt install libxmu6 \| Reading package lists... Done \| Building dependency tree... Done \| Reading state information... Done \| The following NEW packages will be installed: \| libxmu6 \| 0 upgraded, 1 newly installed, 0 to remove and 83 not upgraded. \| Need to get 60.1 kB of archives. \| After this operation, 143 kB of additional disk space will be used. \| Get:1 https://debian.sipwise.com/debian bookworm/main amd64 libxmu6 amd64 2:1.1.3-3 [60.1 kB] \| Fetched 60.1 kB in 0s (199 kB/s) \| [...] \| [sipwise-lab-trunk] sipwise@spce:~$ /opt/VBoxGuestAdditions-7.0.6/bin/VBoxClient --help \| Oracle VM VirtualBox VBoxClient 7.0.6 \| Copyright (C) 2005-2023 Oracle and/or its affiliates \| \| Usage: VBoxClient --clipboard\|--draganddrop\|--checkhostversion\|--seamless\|--vmsvga\|--vmsvga-session \| [-d\|--nodaemon] \| \| Options: \| [...] It looks like lack of libxmu6 doesn't cause any actual problems for our use case (we don't use X.org at all), though given that libxmu6 is a small library package, let's try to get it working as expected and avoid the alarming errors on the logs. Thanks Guillem Jover for spotting and reporting Change-Id: I65f3dd496a4026f04fd9944fd7cc43d6abbdf336	1 year ago
Michael Prokop	8c3ab6b241	MT#57559 Always include zstd when bootstrapping systems During initial deployment of a system, we get warnings about lack of zstd: \| Setting up linux-image-6.1.0-13-amd64 (6.1.55-1) ... \| I: /vmlinuz.old is now a symlink to boot/vmlinuz-6.1.0-13-amd64 \| I: /initrd.img.old is now a symlink to boot/initrd.img-6.1.0-13-amd64 \| I: /vmlinuz is now a symlink to boot/vmlinuz-6.1.0-13-amd64 \| I: /initrd.img is now a symlink to boot/initrd.img-6.1.0-13-amd64 \| /etc/kernel/postinst.d/initramfs-tools: \| update-initramfs: Generating /boot/initrd.img-6.1.0-13-amd64 \| W: No zstd in /usr/bin:/sbin:/bin, using gzip \| [...] The initramfs generation and update overall runs four times within the initial bootstrapping of a system (we'll try to do something about this, but this is outside the scope of this). As of initramfs-tools v0.141, initramfs-tools uses zstd as default compression for initramfs. Version 0.142 is shipped with Debian/bookworm, and therefore it makes sense to have it available upfront. Note that also the initrd generation is faster with zstd (~10sec for zstd vs. ~13sec for gzip) and also the resulting initrd is smaller (~33MB for zstd vs ~39MB for gzip). By making sure that zstd is available straight from the very beginning and before ngcp-installer pulls it in later, we can avoid the warning message but also save >10 seconds of install time. Given that zstd is available even in Debian oldoldstable, let's install it unconditionally in all our systems. Thanks: Volodymyr Fedorov for reporting Change-Id: I56674c3c213f7c7a6e6cbce3c8e2e00a4cfbdbd4	1 year ago
Guillem Jover	9cceb8d655	MT#58356 ntp: Use ntpsec.service instead of ntp.service Even though the ntpsec.service contains an Alias for ntp.service, that does not work for us when the service has not yet been installed, so the first run will fail. Use the actual name to avoid this issue. Change-Id: I8f0ee3b38390a7e58c3bbee65fd96bfd4b717dfa	2 years ago
Michael Prokop	793a93bc43	MT#57453 vagrant_configuration: remove fake systemd presence after execution Let's restore system state of /run/systemd/system for VBoxLinuxAdditions, to avoid any unexpected side effects. Followup for git rev `8601193` Change-Id: I632c7d60ebb627c3a80d4c1f9b264d6d0a13b4f1	2 years ago
Michael Prokop	561303359e	MT#57453 Use tty1 for stdin when running under grml-autoconfig service Recent Grml ISOs, including our Grml-Sipwise ISO (v2023-06-01), include grml-autoconfig v0.20.3 which execute the grml-autoconfig service under `StandardInput=null`. This is necessary to not conflict with tty usage, like used with serial console. See `1e268ffe4f` Now that we run with /dev/null for stdin, we can't interact with the user, so let's try to detect when running from within grml-autoconfig's systemd unit, and if so assume that we're executing on /dev/tty1 and use/reopen that for stdin. Change-Id: Id55283c7f862487a6ef8acb8ab01f67a05bd8dd7	2 years ago
Michael Prokop	8601193128	MT#57453 vagrant_configuration: fake systemd presence As of git rev `6c960afee4` we're using the virtualbox-guest-additions-iso from bookworm. Previous versions of VBoxGuestAdditions had a simple test to check for presence of systemd, quoting from /opt/VBoxGuestAdditions-6.1.22/routines.sh: \| use_systemd() \| { \| test ! -f /sbin/init \|\| test -L /sbin/init \| } Now in more recent versions of VBoxGuestAdditions[1], the systemd check was modified, quoting from /opt/VBoxGuestAdditions-7.0.6/routines.sh: \| use_systemd() \| { \| # First condition is what halfway recent systemd uses itself, and the \| # other two checks should cover everything back to v1. \| test -e /run/systemd/system \|\| test -e /sys/fs/cgroup/systemd \|\| test -e /cgroup/systemd \| } So if we're running inside a chroot as with our deployment.sh, it looks like a non-systemd system for VBoxGuestAdditions's installer, and we end up with installation and presence of /etc/init.d/vboxadd, leading to: \| root@spce:~# ls -lah /run/systemd/generator.late/ \| total 4.0K \| drwxr-xr-x 4 root root 100 Jul 18 00:20 . \| drwxr-xr-x 23 root root 580 Jul 18 00:20 .. 
\| drwxr-xr-x 2 root root 60 Jul 18 00:20 graphical.target.wants \| drwxr-xr-x 2 root root 60 Jul 18 00:20 multi-user.target.wants \| -rw-r--r-- 1 root root 537 Jul 18 00:20 vboxadd.service \| \| root@spce:~# systemctl cat vboxadd.service \| # /run/systemd/generator.late/vboxadd.service \| # Automatically generated by systemd-sysv-generator \| \| [Unit] \| Documentation=man:systemd-sysv-generator(8) \| SourcePath=/etc/init.d/vboxadd \| Description=LSB: VirtualBox Linux Additions kernel modules \| Before=multi-user.target \| Before=multi-user.target \| Before=multi-user.target \| Before=graphical.target \| Before=display-manager.service \| \| [Service] \| Type=forking \| Restart=no \| TimeoutSec=5min \| IgnoreSIGPIPE=no \| KillMode=process \| GuessMainPID=no \| RemainAfterExit=yes \| SuccessExitStatus=5 6 \| ExecStart=/etc/init.d/vboxadd start \| ExecStop=/etc/init.d/vboxadd stop We don't expect any init scripts to be present, as all our services must have systemd unit files. Therefore we check for absence of systemd's /run/systemd/generator.late in our system-tests, which started to fail with the upgrade to VBoxGuestAdditions-v7.0.6 due to the systemd presence detection mentioned above. Let's fake presence of systemd before invoking VBoxGuestAdditions's installer, to avoid ending up with unexpected vbox* init scripts. [1] See svn rev 92682: https://www.virtualbox.org/browser/vbox/trunk/src/VBox/Installer/linux/routines.sh?rev=92682 https://www.virtualbox.org/changeset?old=92681&old_path=vbox%2Ftrunk%2Fsrc%2FVBox%2FInstaller%2Flinux%2Froutines.sh&new=92682&new_path=vbox%2Ftrunk%2Fsrc%2FVBox%2FInstaller%2Flinux%2Froutines.sh Change-Id: Ifd11460e3a8fd4f4c1269453a9b8376065861b8e	2 years ago
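The idea described in this commit, together with the cleanup from the follow-up commit `793a93bc43`, can be sketched roughly as follows. This is only a minimal sketch and not the code from deployment.sh: the function name `with_fake_systemd` and its arguments are hypothetical, and only the `/run/systemd/system` path is taken from the quoted `routines.sh`.

```shell
# Hypothetical helper: run a command while <root>/run/systemd/system exists.
# If the directory had to be created, it is removed again afterwards so the
# previous state is restored. Parent directories created on the way are kept.
with_fake_systemd() {
  local root="$1"
  shift
  local marker="${root}/run/systemd/system"
  local created="false"
  local rc=0

  if ! [ -e "${marker}" ]; then
    mkdir -p "${marker}"
    created="true"
  fi

  # Run the wrapped command (e.g. the VBoxGuestAdditions installer) and
  # remember its exit code without aborting the cleanup.
  "$@" || rc=$?

  if [ "${created}" = "true" ]; then
    rmdir "${marker}"
  fi
  return "${rc}"
}
```

A call could then look like `with_fake_systemd "${TARGET}" chroot "${TARGET}" <installer command>`, where `TARGET` is an assumed variable naming the chroot directory.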
Victor Seva	6c960afee4	TT#104221 Use bookworm repos in ensure_packages_installed appropriately Support bookworm option in DEBIAN_RELEASE selection. We have support for it already. Use bookworm as fallback since nowadays we jumped to it. Change-Id: I118c1b5cf81fe57394495b5f745fc81032406c78	2 years ago
Michael Prokop	37163532ee	MT#56773 Use bullseye puppetlabs repository for bookworm To be able to upgrade our internal systems to Debian/bookworm we need to have puppet packages available. Upstream still doesn't provide any Debian packages (see https://tickets.puppetlabs.com/browse/PA-4995), though their AIO (All In One) packages for Debian/bullseye seem to be working on Debian/bookworm as well (at least for puppet-agent). So until we either migrated to puppet-agent as present in Debian/bookworm or upstream provides according AIO packages, let's use the puppet-agent packages we already use for our Debian/bullseye systems. Change-Id: I2211ffd79f70a2a79873e737b0b512bfb7492328	2 years ago
Michael Prokop	0fedba6144	MT#57643 Ensure /var/lib/dpkg/available exists on Debian releases <=buster Since version 1.20.0, dpkg no longer creates /var/lib/dpkg/available (see #647911). Now that we upgraded our Grml-Sipwise deployment system to bookworm, we have dpkg v1.21.22 on our live system, and mmdebstrap relies on dpkg of the host system for execution. But on Debian releases until and including buster, dpkg fails to operate with e.g. `dpkg --set-selections`, if /var/lib/dpkg/available doesn't exist: \| The following NEW packages will be installed: \| nullmailer \| [...] \| debconf: delaying package configuration, since apt-utils is not installed \| dpkg: error: failed to open package info file '/var/lib/dpkg/available' for reading: No such file or directory We could also switch from mmdebstrap to debootstrap for deploying Debian releases <=buster, but this would be slower and we use mmdebstrap since quite some time for everything. So instead let's create /var/lib/dpkg/available after bootstrapping the system. Reported towards mmdebstrap as #1037946. Change-Id: I0a87ca255d5eb7144a9c093051c0a6a3114a3c0b	2 years ago
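The workaround from this commit (creating the file after bootstrapping) might look roughly like the following sketch. The function name and the `target` argument are hypothetical; the real script may also restrict this to releases up to and including buster, which is not shown here.

```shell
# Hypothetical helper: make sure <target>/var/lib/dpkg/available exists.
# An existing file is left untouched; a missing one is created empty.
ensure_dpkg_available() {
  local target="$1"
  local available="${target}/var/lib/dpkg/available"

  if [ -d "${target}/var/lib/dpkg" ] && ! [ -e "${available}" ]; then
    : > "${available}"
  fi
}
```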
Michael Prokop	eccdc586ae	MT#57644 puppet/git: allow ssh-rsa pubkey usage Now that our deployment system is based on Debian/bookworm, but our gerrit/git server still runs on Debian/bullseye, we run into the OpenSSH RSA issue (RSA signatures using the SHA-1 hash algorithm got disabled by default), see https://michael-prokop.at/blog/2023/06/11/what-to-expect-from-debian-bookworm-newinbookworm/ and https://www.jhanley.com/blog/ssh-signature-algorithm-ssh-rsa-error/ We need to enable ssh-rsa usage, otherwise deployment fails with: \| Warning: Permanently added '[gerrit.mgm.sipwise.com]:29418' (ED25519) to the list of known hosts. \| sign_and_send_pubkey: no mutual signature supported \| puppet-r10k@gerrit.mgm.sipwise.com: Permission denied (publickey). \| fatal: Could not read from remote repository. Change-Id: I5894170dab033d52a2612beea7b6f27ab06cc586	2 years ago
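One common way to re-enable ssh-rsa for a single host is OpenSSH's client option `PubkeyAcceptedAlgorithms`. The commit message does not say which mechanism the actual change uses, so the following is only an illustration; the helper name is hypothetical and the host name is the one from the log output above.

```shell
# Hypothetical helper: print an ssh_config stanza that allows ssh-rsa
# public key signatures for the given host only.
ssh_rsa_stanza() {
  local host="$1"
  printf 'Host %s\n' "${host}"
  printf '    PubkeyAcceptedAlgorithms +ssh-rsa\n'
}

# Usage sketch:
#   ssh_rsa_stanza gerrit.mgm.sipwise.com >> ~/.ssh/config
```

Limiting the option to one `Host` block keeps the weaker algorithm disabled for all other connections.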
Michael Prokop	8cfb8c8392	MT#57630 Check online connectivity to work around Intel E810 / ice issue Deploying the Debian/bookworm based NGCP system fails on a Lenovo sr250 v2 node with an Intel E810 network card: \| # lshw -c net -businfo \| Bus info Device Class Description \| ======================================================= \| pci@0000:01:00.0 eth0 network Ethernet Controller E810-XXV for SFP \| pci@0000:01:00.1 eth1 network Ethernet Controller E810-XXV for SFP \| # lshw -c net \| -network:0 \| description: Ethernet interface \| product: Ethernet Controller E810-XXV for SFP \| vendor: Intel Corporation \| physical id: 0 \| bus info: pci@0000:01:00.0 \| logical name: eth0 \| version: 02 \| serial: [...] \| size: 10Gbit/s \| capacity: 25Gbit/s \| width: 64 bits \| clock: 33MHz \| capabilities: pm msi msix pciexpress vpd bus_master cap_list rom ethernet physical fibre 1000bt-fd 25000bt-fd \| configuration: autonegotiation=off broadcast=yes driver=ice driverversion=1.11.14 duplex=full firmware=2.25 0x80007027 1.2934.0 ip=192.168.90.51 latency=0 link=yes multicast=yes port=fibre speed=10Gbit/s \| resources: iomemory:400-3ff iomemory:400-3ff irq:16 memory:4002000000-4003ffffff memory:4006010000-400601ffff memory:a1d00000-a1dfffff memory:4005000000-4005ffffff memory:4006220000-400641ffff We set up the /etc/network/interfaces file by invoking Grml's netcardconfig script in automated mode, like: NET_DEV=eth0 METHOD=static IPADDR=192.168.90.51 NETMASK=255.255.255.248 GATEWAY=192.168.90.49 /usr/sbin/netcardconfig The resulting /etc/network/interfaces gets used as base for usage inside the NGCP chroot/target system. netcardconfig shuts down the network interface (eth0 in the example above) via ifdown, then sleeps for 3 seconds and re-enables the interface (via ifup) with the new configuration. 
This used to work fine so far, but with the Intel e810 network card and kernel version 6.1.0-9-amd64 from Debian/bookworm we see a link failure and it takes ~10 seconds until the network device is up and running again. The following vagrant_configuration() execution from deployment.sh then fails: \| +11:41:01 (netscript.grml:1022): vagrant_configuration(): wget -O /var/tmp/id_rsa_sipwise.pub http://builder.mgm.sipwise.com/vagrant-ngcp/id_rsa_sipwise.pub \| --2023-06-11 11:41:01-- http://builder.mgm.sipwise.com/vagrant-ngcp/id_rsa_sipwise.pub \| Resolving builder.mgm.sipwise.com (builder.mgm.sipwise.com)... failed: Name or service not known. \| wget: unable to resolve host address 'builder.mgm.sipwise.com' However, when we retry it again just a bit later, the network works fine again. During investigation we identified that the network card flips the port, quoting the related log from the connected Cisco nexus 5020 switch (with fast stp learning mode): \| nexus5k %ETHPORT-5-IF_DOWN_LINK_FAILURE: Interface Ethernet1/33 is down (Link failure) It seems to be related to some autonegotiation problem, as when we execute `ethtool -A eth0 rx on tx on` (no matter whether with `on` or `off`), we see: \| [Tue Jun 13 08:51:37 2023] ice 0000:01:00.0 eth0: Autoneg did not complete so changing settings may not result in an actual change. 
\| [Tue Jun 13 08:51:37 2023] ice 0000:01:00.0 eth0: NIC Link is Down \| [Tue Jun 13 08:51:45 2023] ice 0000:01:00.0 eth0: NIC Link is up 10 Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: NONE, Autoneg Advertised: On, Autoneg Negotiated: False, Flow Control: Rx/Tx FTR: \| root@sp1 ~ # ethtool -A eth0 autoneg off \| netlink error: Operation not supported \| 76 root@sp1 ~ # ethtool eth0 \| grep -C1 Auto-negotiation \| Duplex: Full \| Auto-negotiation: off \| Port: FIBRE \| root@sp1 ~ # ethtool -A eth0 autoneg on \| root@sp1 ~ # ethtool eth0 \| grep -C1 Auto-negotiation \| Duplex: Full \| Auto-negotiation: off \| Port: FIBRE \| root@sp1 ~ # dmesg -T \| tail -1 \| [Tue Jun 13 08:53:26 2023] ice 0000:01:00.0 eth0: To change autoneg please use: ethtool -s <dev> autoneg <on\|off> \| root@sp1 ~ # ethtool -s eth0 autoneg off \| root@sp1 ~ # ethtool -s eth0 autoneg on \| netlink error: link settings update failed \| netlink error: Operation not supported \| 75 root@sp1 ~ # As a workaround, at least until we have a better fix/solution, we try to reach the default gateway (or fall back to the repository host if gateway couldn't be identified) via ICMP/ping, and once that works we continue as usual. But even if that should fail we continue execution, to minimize behavior change but have a workaround for this specific situation available. FTR, broken system: \| root@sp1 ~ # ethtool -i eth0 \| driver: ice \| version: 6.1.0-9-amd64 \| firmware-version: 2.25 0x80007027 1.2934.0 \| [...] Whereas with kernel 5.10.0-23-amd64 from Debian/bullseye we don't seem to see that behavior: \| root@sp1:~# ethtool -i neth0 \| driver: ice \| version: 5.10.0-23-amd64 \| firmware-version: 2.25 0x80007027 1.2934.0 \| [...] 
Also using latest available ice v1.11.14 (from https://sourceforge.net/projects/e1000/files/ice%20stable/1.11.14/) on Kernel version 6.1.0-9-amd64 doesn't bring any change: \| root@sp1 ~ # modinfo ice \| filename: /lib/modules/6.1.0-9-amd64/updates/drivers/net/ethernet/intel/ice/ice.ko \| firmware: intel/ice/ddp/ice.pkg \| version: 1.11.14 \| license: GPL v2 \| description: Intel(R) Ethernet Connection E800 Series Linux Driver \| author: Intel Corporation, <linux.nics@intel.com> \| srcversion: 818E9C817731C98A25470C0 \| alias: pci:v00008086d00001888svsdbcsci \| [...] \| alias: pci:v00008086d00001591svsdbcsci* \| depends: ptp \| retpoline: Y \| name: ice \| vermagic: 6.1.0-9-amd64 SMP preempt mod_unload modversions \| parm: debug:netif level (0=none,...,16=all) (int) \| parm: fwlog_level:FW event level to log. All levels <= to the specified value are enabled. Values: 0=none, 1=error, 2=warning, 3=normal, 4=verbose. Invalid values: >=5 \| (ushort) \| parm: fwlog_events:FW events to log (32-bit mask) \| (ulong) \| root@sp1 ~ # ethtool -i eth0 \| head -3 \| driver: ice \| version: 1.11.14 \| firmware-version: 2.25 0x80007027 1.2934.0 \| root@sp1 ~ # Change-Id: Ieafe648be4e06ed0d936611ebaf8ee54266b6f3c	2 years ago
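The connectivity workaround described in this commit (ping the default gateway, fall back to the repository host, continue even on failure) could be sketched as below. All function names, the retry count, the delay and the `REPO_HOST` variable are assumptions for illustration and are not taken from deployment.sh.

```shell
# Probe a host once via ICMP. Kept as a separate function so that it can be
# replaced, e.g. in tests.
probe_host() {
  ping -c 1 -W 1 "$1" >/dev/null 2>&1
}

# Print the IPv4 default gateway, or nothing if none is configured.
default_gateway() {
  ip -4 route show default 2>/dev/null |
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Try to reach <host> up to <tries> times, sleeping <delay> seconds between
# attempts. Returns 0 as soon as one probe succeeds, 1 otherwise.
wait_for_host() {
  local host="$1"
  local tries="${2:-30}"
  local delay="${3:-1}"
  local i=1

  while [ "${i}" -le "${tries}" ]; do
    if probe_host "${host}"; then
      return 0
    fi
    if [ "${i}" -lt "${tries}" ]; then
      sleep "${delay}"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage sketch (REPO_HOST is a hypothetical variable):
#   target="$(default_gateway)"
#   wait_for_host "${target:-${REPO_HOST}}" ||
#     echo "Warning: host not reachable, continuing anyway"
```

Continuing after a failed wait mirrors the intent stated above, namely to keep the behavior change minimal.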
Michael Prokop	f4da3e094e	MT#57049 Ensure SW-RAID device is inactive before re-reading partition table Re-reading of disks fails if the mdadm SW-RAID device is still active: \| root@sp1 ~ # cat /proc/mdstat \| Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] \| md0 : active raid1 sdb3[1] sda3[0] \| 468218880 blocks super 1.2 [2/2] [UU] \| [========>............] resync = 42.2% (197855168/468218880) finish=22.4min speed=200756K/sec \| bitmap: 3/4 pages [12KB], 65536KB chunk \| \| unused devices: <none> \| root@sp1 ~ # blockdev --rereadpt /dev/sdb \| blockdev: ioctl error on BLKRRPART: Device or resource busy \| 1 root@sp1 ~ # blockdev --rereadpt /dev/sda \| blockdev: ioctl error on BLKRRPART: Device or resource busy \| 1 root@sp1 ~ # Only if we stop the mdadm SW-RAID device, then we can re-read the partition table: \| root@sp1 ~ # mdadm --stop /dev/md0 \| mdadm: stopped /dev/md0 \| root@sp1 ~ # blockdev --rereadpt /dev/sda \| root@sp1 ~ # This behavior isn't new and unrelated to Debian/bookworm but was spotted while debugging an unrelated issue. FTR: we re-read the partition table (via `blockdev --rereadpt`) to ensure that /etc/fstab of the live system is up2date and matches the current system state. While this isn't strictly needed, we preserve existing behavior and also try to avoid a hard "cut" of a possibly ongoing SW-RAID sync. Change-Id: I735b00423e6efa932f74b78a38ed023576e5d306	2 years ago
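Finding the active SW-RAID devices before calling `blockdev --rereadpt` could be done along these lines. The helper name is hypothetical and the sketch does not try to handle an ongoing resync gracefully, which the commit message mentions as a concern.

```shell
# Hypothetical helper: list active md devices as /dev/mdX, one per line,
# by parsing an mdstat-style file (defaults to /proc/mdstat).
active_md_devices() {
  local mdstat="${1:-/proc/mdstat}"
  awk '$2 == ":" && $3 == "active" { print "/dev/" $1 }' "${mdstat}"
}

# Usage sketch (needs root; DISK is a hypothetical variable):
#   for md in $(active_md_devices); do
#     mdadm --stop "${md}"
#   done
#   blockdev --rereadpt "${DISK}"
```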
Michael Prokop	2ad306c465	MT#57556 Prompt for reboot/halt only in interactive mode With our newer Grml-Sipwise ISO (v2023-06-01) being based on Debian/bookworm and recent Grml packages, our automated deployment suddenly started to fail for us: \| +04:28:12 (netscript.grml:2453): echo 'Successfully finished deployment process [Fri Jun 2 04:28:12 UTC 2023 - running 576 seconds]' \| ++04:28:12 (netscript.grml:2455): get_deploy_status \| ++04:28:12 (netscript.grml:95): get_deploy_status(): '[' -r /srv/deployment//status ']' \| ++04:28:12 (netscript.grml:96): get_deploy_status(): cat /srv/deployment//status \| Successfully finished deployment process [Fri Jun 2 04:28:12 UTC 2023 - running 576 seconds] \| +04:28:12 (netscript.grml:2455): '[' copylogfiles '!=' error ']' \| +04:28:12 (netscript.grml:2456): set_deploy_status finished \| +04:28:12 (netscript.grml:103): set_deploy_status(): '[' -n finished ']' \| +04:28:12 (netscript.grml:104): set_deploy_status(): echo finished \| +04:28:12 (netscript.grml:2459): false \| +04:28:12 (netscript.grml:2463): status_wait \| +04:28:12 (netscript.grml:329): status_wait(): [[ -n 0 ]] \| +04:28:12 (netscript.grml:329): status_wait(): [[ 0 != 0 ]] \| +04:28:12 (netscript.grml:2466): false \| +04:28:12 (netscript.grml:2471): false \| +04:28:12 (netscript.grml:2476): echo 'Do you want to [r]eboot or [h]alt the system now? (Press any other key to cancel.)' \| Do you want to [r]eboot or [h]alt the system now? (Press any other key to cancel.) 
\| +04:28:12 (netscript.grml:2477): unset a \| +04:28:12 (netscript.grml:2478): read -r a \| ++04:28:12 (netscript.grml:2478): wait_exit \| ++04:28:12 (netscript.grml:339): wait_exit(): local e_code=1 \| ++04:28:12 (netscript.grml:340): wait_exit(): [[ 1 -ne 0 ]] \| ++04:28:12 (netscript.grml:341): wait_exit(): set_deploy_status error \| ++04:28:12 (netscript.grml:103): set_deploy_status(): '[' -n error ']' \| ++04:28:12 (netscript.grml:104): set_deploy_status(): echo error \| ++04:28:12 (netscript.grml:343): wait_exit(): trap '' 1 2 3 6 15 ERR EXIT \| ++04:28:12 (netscript.grml:344): wait_exit(): status_wait \| ++04:28:12 (netscript.grml:329): status_wait(): [[ -n 0 ]] \| ++04:28:12 (netscript.grml:329): status_wait(): [[ 0 != 0 ]] \| ++04:28:12 (netscript.grml:345): wait_exit(): exit 1 As of grml-autoconfig v0.20.3 and newer, the grml-autoconfig systemd service that invokes the deployment netscript uses `StandardInput=null` instead of `StandardInput=tty` (see https://github.com/grml/grml/issues/176). Thanks to this, a logic error in our deployment script showed up. We exit the script in interactive mode, though only afterwards prompting for reboot/halt with `read -r a` - which of course fails if stdin is missing. As a result, we end up in our signal handler `trap 'wait_exit;' 1 2 3 6 15 ERR EXIT` and then fail the deployment. So instead prompt for "Do you want to [r]eboot or [h]alt ..." only in interactive mode, and while at it drop the "if "$INTERACTIVE" ; then exit 0 ; fi" so the prompt is actually presented to the user. Change-Id: Ia89beaf3c446f3701cc30ab21cfdff7b5808a6d3	2 years ago
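The corrected control flow (prompt only in interactive mode, never read from a missing stdin otherwise) can be illustrated with the sketch below. The function name is hypothetical, and instead of rebooting or halting it only prints the chosen action so that it is safe to try out.

```shell
# Hypothetical helper: ask for reboot/halt only when <interactive> is "true".
# Prints "reboot", "halt" or "cancel" as the last line of output in
# interactive mode and prints nothing at all in non-interactive mode.
prompt_reboot_or_halt() {
  local interactive="$1"
  local a=""

  if [ "${interactive}" != "true" ]; then
    # Non-interactive: stdin may be /dev/null, so do not call read at all.
    return 0
  fi

  echo "Do you want to [r]eboot or [h]alt the system now? (Press any other key to cancel.)"
  # A failing read (e.g. EOF) is treated like "cancel" instead of an error.
  read -r a || true
  case "${a}" in
    r) echo "reboot" ;;
    h) echo "halt" ;;
    *) echo "cancel" ;;
  esac
}
```

Treating a failed `read` as "cancel" avoids ending up in an ERR/EXIT trap like the `wait_exit` handler shown in the trace above.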
Michael Prokop	98d11bfc28	MT#57280 Run deployment status server under systemd Manual execution of python's http.server has multiple drawbacks, like no proper logging and no service tracking/restart options, but most notably the deployment status server no longer runs when our deployment script fails. While /srv/deployment/status then still might contain "error", no one is serving that information on port 4242 any longer[1], and our daily-build-install-vm Jenkins job might then report: \| VM '192.168.209.162' current state is '' - retrying up to another 1646 times, sleeping for a second \| VM '192.168.209.162' current state is '' - retrying up to another 1645 times, sleeping for a second \| [...] It then runs for ~1/2 hour without doing anything useful, until the Jenkins job itself gives up. By running our deployment status server under systemd, we keep the service alive also when the deployment script terminates. In case of errors we get immediate feedback: \| VM '192.168.209.162' current state is 'puppet' - retrying up to another 1648 times, sleeping for a second \| VM '192.168.209.162' current state is 'puppet' - retrying up to another 1647 times, sleeping for a second \| VM '192.168.209.162' current state is 'error' - retrying up to another 1646 times, sleeping for a second \| + '[' error '!=' finished ']' \| + echo 'Failed to install Proxom VM '\''162'\'' (IP '\''192.168.209.162'\'')' [1] For our NGCP based installations we use the ngcpstatus boot option, where its status_wait trap kicks in and avoids premature exit of deployment status server. But e.g. our non-NGCP systems don't use that boot option and with this change we could get rid of the status_wait overall. Change-Id: Ibaa799358caedf31c64c37b48e3c5e889808086a	2 years ago
Michael Prokop	e6819fe674	MT#55944 Use ngcp-initialize-udev-rules-net to deploy 70-persistent-net.rules Use system-tools' ngcp-initialize-udev-rules-net script to deploy the /etc/udev/rules.d/70-persistent-net.rules, no need to maintain code at multiple places. Change-Id: I81925262a8c687aa9976cbc1113568989fa53281	2 years ago
Michael Prokop	ae7db13232	MT#55944 Fix networking for plain Debian systems When building our Debian boxes for buster, bullseye + bookworm (via daily-build-matrix-debian-boxes Jenkins job), we get broken networking, so e.g. `vagrant up debian-bookworm` doesn't work. This is caused by /etc/network/interfaces (using e.g. "neth0", being our naming schema which we use in NGCP, as adjusted by the deployment script) not matching the actual system network devices (like enp0s3). TL;DR: no behavior change for NGCP systems, only when building non-NGCP systems then enable net.ifnames=0 (via set_custom_grub_boot_options), but do not generate /etc/udev/rules.d/70-persistent-net.rules (via invoke generate_udev_network_rules) nor rename eth->neth in /etc/network/interfaces. More verbose version: * rename the "eth" networking interfaces into "neth" in /etc/network/interfaces only when running in ngcp-installer mode (this is the behavior we rely on in NGCP, but it doesn't matter for plain Debian systems) * generate /etc/udev/rules.d/70-persistent-net.rules only when running in ngcp-installer mode. While our jenkins-configs.git's jobs/daily-build/scripts/vm_clean-fs.sh removes the file anyways (for the VM use case), between the initial deployment run and the next reboot the configuration inside the PVE VM still applies, so we end up with an existing /etc/udev/rules.d/70-persistent-net.rules, referring to neth0, while our /etc/network/interfaces configures eth0 instead. * when not running in ngcp-installer mode, enable net.ifnames=0 usage in GRUB to disable persistent network interface naming. FTR, this change is not needed for NGCP, as on NGCP systems we use /etc/udev/rules.d/70-persistent-net.rules, generated by ngcp-system-tools' ngcp-initialize-udev-rules-net script also in VM use case This is a fixup for a change in git commit `a50903a30c` (see also commit message of git commit `ab62171`), that should have been adjusted for ngcp-installer-only mode instead. 
Change-Id: I6d0021dbdc2c1587127f0e115c6ff9844460a761	2 years ago
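The eth to neth renaming mentioned in this commit could be expressed as a small filter like the one below. This is an assumption about how such a rename might be written, not the code from the deployment script; the function name and the `NGCP_INSTALLER` variable in the usage note are hypothetical.

```shell
# Hypothetical filter: rename ethN interface names to nethN on stdin.
# Names that are already prefixed (e.g. "neth0") are left unchanged, because
# "eth" must be at the start of the line or follow a non-word character.
rename_eth_to_neth() {
  sed -E 's/(^|[^[:alnum:]_])eth([0-9]+)/\1neth\2/g'
}

# Usage sketch, only in ngcp-installer mode:
#   if [ "${NGCP_INSTALLER}" = "true" ]; then
#     rename_eth_to_neth < interfaces.orig > interfaces.new
#   fi
```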
Michael Prokop	6412814e6b	MT#55949 Ensure we have proper date/time configuration If the date of the running system isn't appropriate enough, then apt runs might fail with something like: \| E: Release file for https://deb/sipwise/com/spce/mr10.5.2/dists/bullseye/InRelease is not valid yet (invalid for another 6h 19min 2s) So let's try to sync date/time of the system via NTP. Given that chrony is a small (only 650 kB disk space) and secure replacement for ntp, let's ship chrony with the Grml deployment ISO (and fall back to ntp usage in deployment script if chrony shouldn't be available). Also, if the system is configured to read the RTC time in the local time zone, this is known as another source of problems, so let's make sure to use the RTC in UTC. Change-Id: I747665d1cee3b6f835c62812157d0203bcfa96e2	2 years ago
Michael Prokop	245c7ef702	MT#55861 Update Grml ISO + update to Debian/bookworm For deploying Debian/bookworm (see MT#55524), we'd like to have an updated Grml ISO. With such a Debian/bookworm based live system, we can still deploy older target systems (like Debian/bullseye). Relevant changes: 1) Add jo as new build-dependency, to generate build information in conf/buildinfo.json (new dependency of grml-live) 2) Always include ca-certificates, as this is required with more recent mmdebstrap versions (>=0.8.0), when using apt repositories with https, otherwise bootstrapping Debian fails. 3) Update to latest stable grml-live version v0.42.0, which: a) added support for "bookworm" as suite name `cff66073a7` b) provides corresponding templates for memtest support: `c01a86b3fc` c) and a workaround for a kmod/initramfs-tools issue with PXE/NFS boot: `ea1e5ea330` 4) Update memtest86+ to v6.00-1 as present in Debian/bookworm and add corresponding UEFI support (based on grml-live's upstream change, though as we don't support i386, dropped the 32bit related bits) Change-Id: I327c0e25c28f46e097212ef4329d75fc8d34767c	2 years ago
Guillem Jover	ad9e94efb6	MT#55861 Load the fake-uname.so pre-loaded library from within the chroot We build the pre-loaded library targeting a specific Debian release, which might be different (and newer) to the release Grml was built for. This can cause missing versioned symbols (and a loading failure) if the libc in the outer system is older than the inner system. Change-Id: I84f4f307863e534fe0fff85274ae1d5db809012c	2 years ago
Michael Prokop	d1d0e61512	MT#55379 Use usrmerge for Debian/bookworm based systems The transition to usrmerge has started in Debian, see https://lists.debian.org/debian-devel-announce/2022/09/msg00001.html Debian/bookworm AKA v12 will only support the merged-/usr layout. Systemd is also dropping support for unmerged-usr systems (see https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html). Deploy the expected filesystem layout accordingly, as in: 1) no-merged-usr for Debian release up and including bullseye, and 2) merged-usr starting with bookworm and newer Change-Id: I7b7b294ce12ca245cf978a787bcc20aa9753e73d	3 years ago
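The release-dependent choice of filesystem layout described in this commit boils down to a simple mapping, sketched here. The function name is hypothetical, and the list of older release names is an assumption beyond the "up to and including bullseye" rule stated above.

```shell
# Hypothetical helper: print the /usr layout to deploy for a Debian release.
# Releases up to and including bullseye keep the unmerged layout, bookworm
# and anything newer (or unknown) gets merged-/usr.
usr_layout_for_release() {
  case "$1" in
    stretch|buster|bullseye) echo "no-merged-usr" ;;
    *) echo "merged-usr" ;;
  esac
}
```

The printed value would then have to be translated into whatever option or hook the bootstrapping tool in use expects.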
Michael Prokop	b372471a20	TT#15305 Fix ngcp-deployment-scripts usage for daily-build-matrix-debian-boxes Git commit `6661b04af0` broke all our bullseye based builds (debian, sipwise + docker), see https://jenkins.mgm.sipwise.com/view/All/job/daily-build-matrix-debian-boxes/ For plain Debian installations we don't have SP_VERSION available, so default to what was used before supporting trunk-weekly next to trunk. Change-Id: I61958f0c67d165d2f6dcb059fe4991ed24a328c9	3 years ago
Victor Seva	1d4f08b7ed	TT#15305 development.sh: support trunk-weekly, take two Change-Id: I83e635dc5916833d0699fd0be5a8a742ef7b40c8	3 years ago
Victor Seva	6661b04af0	TT#15305 deployment.sh: support trunk-weekly Change-Id: Ie98ac5fa0de848cf54a96039af5532eb8012bab9	3 years ago
Michael Prokop	8e063362ef	TT#173500 Create tmpfiles with template name We want to be able to track down any left-behind tmp files, so ensure we're creating them with according file names. Change-Id: I4eb44047f2eb86ba9f0a8aeeb8d6555290f60c00	3 years ago
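Creating temporary files with a recognizable template name, as this commit asks for, might look like the following sketch. The `deployment-` prefix and the function name are hypothetical; the commit message does not state which template names are used.

```shell
# Hypothetical helper: create a temporary file whose name reveals its origin
# and purpose, so that left-behind files can be tracked down later.
# Prints the path of the created file.
make_tmpfile() {
  local purpose="$1"
  mktemp "${TMPDIR:-/tmp}/deployment-${purpose}.XXXXXXXXXX"
}

# Usage sketch:
#   cfg="$(make_tmpfile netcfg)"
#   ...
#   rm -f "${cfg}"
```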
Mykola Malkov	15aaad8edb	TT#161150 Replace ngcpsp* with ngcpnodename option It's needed for support of spN nodes. Sort options in deployment.sh. Remove unused boot options ngcpnonwrecfg and ngcpfillcache. Change-Id: I300e533c15b71d65e768ca2ed4b3a73eb7ec6954	3 years ago
Mykola Malkov	be237917d7	TT#161150 Refactor options parsing Merge all options parsing to single point. Move options parsing to the top of the script. Parse boot options first then cmd options if they exist. Simplify some checks. Remove unused options. Change-Id: Ibcb099d9bb2ba26ffed9904c8e5065b392ecb78a	3 years ago
Mykola Malkov	a99d9ff6e2	TT#161150 Refactoring default values and parameter parsing Sort default values. Rework cmd parameters parsing - remove some reassign, reformat to be more clear, etc. Add some default options CROLE, EADDR, EXTERNAL_NETMASK, ROLE. Change-Id: I287facafeb53dc5390517424935c8a50932246dc	3 years ago
Volodymyr Fedorov	7b53916c30	TT#157450 Add extra logging entries and copy logs later Add extra deployment statuses for grub-install and try to have more data logged. Change-Id: Id06dfad1264f781157631c51035ab219cfc30070	3 years ago
Michael Prokop	3073c27a40	TT#118659 EFI support: ensure to always have a proper FAT filesystem available If grml-debootstrap detects an existing FAT filesystem on the EFI partition, it doesn't modify/re-create it: \| EFI partition /dev/nvme0n1p2 seems to have a FAT filesystem, not modifying. The underlying check is execution of `fsck.vfat -bn $DEVICE`. Now with fsck.fat from dosfstools v4.1-2 as present in Debian/buster we got: \| root@grml ~ # fsck.vfat -bn /dev/nvme0n1p2 \| fsck.fat 4.1 (2017-01-24) \| 0x41: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt. \| Automatically removing dirty bit. \| There are differences between boot sector and its backup. \| This is mostly harmless. Differences: (offset:original/backup) \| 0:00/eb, 82:00/46, 83:00/41, 84:00/54, 85:00/33, 86:00/32, 87:00/20 \| , 88:00/20, 89:00/20, 510:00/55, 511:00/aa \| Not automatically fixing this. \| Leaving filesystem unchanged. \| 1 root@grml ~ # Now with dosfstools v4.2-1 as present in Debian/bullseye, this might become: \| root@grml ~ # fsck.vfat -bn /dev/nvme0n1p2 \| fsck.fat 4.2 (2021-01-31) \| There are differences between boot sector and its backup. \| This is mostly harmless. Differences: (offset:original/backup) \| 0:00/eb, 65:01/00, 82:00/46, 83:00/41, 84:00/54, 85:00/33, 86:00/32 \| , 87:00/20, 88:00/20, 89:00/20, 510:00/55, 511:00/aa \| Not automatically fixing this. In such situations we end up with an incomplete/broken EFI partition, which breaks within our efivarfs post-script: \| Mounting /dev/nvme0n1p2 on /boot/efi \| mount: /boot/efi: wrong fs type, bad option, bad superblock on /dev/nvme0n1p2, missing codepage or helper program, or other error. 
\| -> Failed (rc=1) \| Removing chroot-script again \| Executing post-script /etc/debootstrap/post-scripts//efivarfs \| Executing /etc/debootstrap/post-scripts//efivarfs \| Mounting /dev (via bind mount) \| Mounting /boot/efi \| mount: /boot/efi: special device UUID= does not exist. Change-Id: I46939b4e191982a84792f3aca27c6cc415dbdaf4	4 years ago
Michael Prokop	9ec2c3d459	TT#118659 EFI support: provide workaround for grml-debootstrap versions <=0.96 When we run current versions of deployment.sh, which include the fix from commit `f9aea18c`, in combination with grml-debootstrap <=0.96 (as shipped by our Grml deployment ISO version sipwise20210511), deployments using EFI might fail with: \| Mounting /dev/nvme0n1p2 on /boot/efi \| Invoking efibootmgr \| EFI variables are not supported on this system. \| -> Failed (rc=1) \| [...] \| Mounting /dev (via bind mount) \| Mounting efivarfs on /sys/firmware/efi/efivars \| Invoking grub-install with proper EFI environment \| chroot: failed to run command 'grub-install': No such file or directory \| -> Failed (rc=127) This is caused by a failing invocation of efibootmgr from within grml-debootstrap (versions <=0.96 and running with Debian kernel >=5.10), causing grml-debootstrap to exit then. As a result, the EFI specific GRUB steps in grml-debootstrap's grub_install() from within chroot-script don't get executed. Therefore the grub-efi-amd64 package is missing for usage by our efivarfs post-script. By re-introducing the efivarfs pre-script from commit `535e6df3` we can work around this bug. Furthermore, when /boot/efi should be mounted within the target system by our efivarfs post-script, it might fail when /proc isn't available, like: \| # chroot /mnt mount /boot/efi \| mount: /boot/efi: can't find UUID=FE60-5B75. This can be fixed by ensuring to mount /proc, /sys etc before /boot/efi. Then scanning for the UUID device (as configured in /etc/fstab) works as expected. While at it fix a comment regarding grml-debootstrap >=v0.97 vs >=v0.99, as only v0.99 behaves as expected with our EFI requirements. Change-Id: I9db677a06f7e161f971743fc18b034ad3191a449	4 years ago
Michael Prokop	cf01ec9257	TT#118659 Ensure that wiping disk signatures works more reliably Noticed while debugging the EFI situation, that wipefs calls might fail, like: \| # wipefs -a /dev/nvme0n1 \| wipefs: error: /dev/nvme0n1: probing initialization failed: Device or resource busy Using the force option, we could get past this error: \| # wipefs -af /dev/nvme0n1 \| /dev/nvme0n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54 \| /dev/nvme0n1: 8 bytes were erased at offset 0x3a38b2de00 (gpt): 45 46 49 20 50 41 52 54 \| /dev/nvme0n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa But quoting from wipefs(8): \| -f, --force \| Force erasure, even if the filesystem is mounted. This is required in order to erase a partition-table signature on a block device. So while this would work, there might be unexpected side effects. Instead let's use a different approach: if we remove the LVM signatures before running wipefs, it behaves as expected: \| root@grml ~ # pvs \| PV VG Fmt Attr PSize PFree \| /dev/nvme0n1p3 ngcp lvm2 a-- <232.41g <222.41g \| root@grml ~ # vgs \| VG #PV #LV #SN Attr VSize VFree \| ngcp 1 1 0 wz--n- <232.41g <222.41g \| root@grml ~ # lvs \| LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert \| root ngcp -wi-a----- 10.00g \| root@grml ~ # wipefs -a /dev/nvme0n1 \| wipefs: error: /dev/nvme0n1: probing initialization failed: Device or resource busy \| 1 root@grml ~ # vgremove -ff ngcp \| Logical volume "root" successfully removed \| Volume group "ngcp" successfully removed \| root@grml ~ # pvremove /dev/nvme0n1p3 --force --force --yes \| Labels on physical volume "/dev/nvme0n1p3" successfully wiped. 
\| root@grml ~ # wipefs -a /dev/nvme0n1 \| /dev/nvme0n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54 \| /dev/nvme0n1: 8 bytes were erased at offset 0x3a38b2de00 (gpt): 45 46 49 20 50 41 52 54 \| /dev/nvme0n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa \| /dev/nvme0n1: calling ioctl to re-read partition table: Success FTR, when using wipefs' --force option, it still leaves behind the LVM signatures anyways: \| root@grml ~ # pvs \| PV VG Fmt Attr PSize PFree \| /dev/nvme0n1p3 ngcp lvm2 a-- <232.41g <222.41g \| root@grml ~ # vgs \| VG #PV #LV #SN Attr VSize VFree \| ngcp 1 1 0 wz--n- <232.41g <222.41g \| root@grml ~ # lvs \| LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert \| root ngcp -wi-a----- 10.00g \| root@grml ~ # wipefs -a /dev/nvme0n1 \| wipefs: error: /dev/nvme0n1: probing initialization failed: Device or resource busy \| 1 root@grml ~ # wipefs -af /dev/nvme0n1 \| /dev/nvme0n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54 \| /dev/nvme0n1: 8 bytes were erased at offset 0x3a38b2de00 (gpt): 45 46 49 20 50 41 52 54 \| /dev/nvme0n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa \| root@grml ~ # pvs \| PV VG Fmt Attr PSize PFree \| /dev/nvme0n1p3 ngcp lvm2 a-- <232.41g <222.41g \| root@grml ~ # vgs \| VG #PV #LV #SN Attr VSize VFree \| ngcp 1 1 0 wz--n- <232.41g <222.41g \| root@grml ~ # lvs \| LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert \| root ngcp -wi-a----- 10.00g So we'd still have to wipe the LVM signatures, while enabling wipefs' --force option could lead to unexpected behaviors. 
Verified with: \| root@grml ~ # wipefs --version \| wipefs from util-linux 2.36.1 \| root@grml ~ # uname -a \| Linux web01a 5.10.0-6-amd64 #1 SMP Debian 5.10.28-1 (2021-04-09) x86_64 GNU/Linux \| root@grml ~ # lvm version \| head -3 \| LVM version: 2.03.11(2) (2021-01-08) \| Library version: 1.02.175 (2021-01-08) \| Driver version: 4.43.0 Change-Id: Ie4f7b2797d2dcfc27601792d6102a765e4c60c47	4 years ago
Michael Prokop	f9aea18c19	TT#118659 Fixup for efivarfs handling with grml-debootstrap v0.98 This is a followup fixup for commit `535e6df` / Change-Id: I5374322cb0a39cfed6563df6c4c30f1eafe560c1 We had to apply fixes due to efivars vs efivarfs in kernel versions >=5.10, and addressed them in commit `535e6df`. Those changes were incomplete though, as the fix included in grml-debootstrap v0.97 is incomplete: while efibootmgr was properly invoked and working, invocation of grub-install doesn't reliably work (as at that time /sys/firmware/efi/efivars is no longer accessible). GRUB installation on EFI systems without /sys/firmware/efi/efivars present warns with "EFI variables are not supported on this system" (see https://sources.debian.org/src/grub2/2.04-20/debian/patches/efi-variable-storage-minimise-writes.patch/?hl=650#L650), though returns with exit code 0. This leaves us with an incomplete and therefore not booting GRUB EFI environment. This used to work with mr9.5.1 only, because there we install(ed) systems using grml-debootstrap v0.96, which is older than the version v0.97 (which included the EFI workaround) we check for in deployment.sh. Since the grml-debootstrap version v0.96 isn't recent enough there, we applied the fallback to our local scripts, which took care of proper installation of GRUB in EFI environments. On the other side, in recent trunk deployments we have grml-debootstrap v0.98 available, which includes the EFI workaround - therefore our local scripts aren't applied. The resulting installation is incomplete, and recent trunk deployments fail to boot in EFI environments. The according fix for grml-debootstrap has been made and is going to be released in the next few days as v0.99. But to ensure that it's working also with older grml-debootstrap versions (and we don't have to rebuild our squashfs environments), the local scripts have been adjusted. 
We don't even need any pre-script at all, instead we handle all of the GRUB EFI installation through /etc/debootstrap/post-scripts/efivarfs. FTR: this issue didn't show up on some of our test systems, because SW-RAID is used there. In deployment.sh we have special handling of SW-RAID regarding efibootmgr and grub-install, see line 2330 ff. Change-Id: Ifa90fbfab7d69bc331acfec15a6cc9318c84ee8f	4 years ago
Manuel Montecelo	a56c4454a3	TT#105151 Do the renaming eth->neth outside of the "if $NGCP_INSTALLER" block Jobs like daily-build-matrix-debian-boxes build plain Debian machines, not NGCP-based ones. At the moment we're generating the udev-rules for network renaming unconditionally, so we have to do it consistently, either both conditionally and not for "plain" systems, or both unconditionally, so network can be brought up by a correct /etc/network/interfaces after the devices are brought up with the new names. There is a good-ish argument for keeping using eth0, as it is more of a default, but we're already deviating from the default for several years and Debian stable releases by having these names and not ones like "ens18" or "enp4s0f2" which is the default in Debian nowadays, at least since buster. So it is probably better to keep it consistent with our other machines and use "neth*" naming for those too. Change-Id: I6b3b49a1769894580df768abb817ae5196e65963	4 years ago
Manuel Montecelo	eaecf474c2	TT#105151 Stop removing just-generated udev-rules for network in VMs The code removed was enabled when $VAGRANT=true, and this happened when passing "vagrant" parameter to deployment.sh, which is done in places like proxmox-vm-clone job, the base of many of our tests machines. VMs do not necessarily have the same hardware configuration, so removing udev-rules for network devices makes sense in principle. Especially since, from the beginning, we were using network devices named "eth" everywhere, even if in the last years we had to use net.ifnames=0 and udev-rules files in hardware to keep using "eth" names. However, now with mr9.5 and the move to Debian bullseye we have to start using different names, and we settled on the direct translation to "neth". So we need a way to assign whatever network devices the machines come with, including VMs, to names "neth". (If we used the new-permanent device names like ens18 or enp3s0f1 we would have to adapt network.yml and files like network interface, and they would be different across all the different machines (HW and VM) so this is not a better or faster solution to the problem.) So, back to the topic of removal of this udev-rules file: in many cases in our test infra, the machines are built "in place" and then rebooted for upgrades or tests, in principle with the same hardware configuration, so there is no need to remove these files. In cases where the underlying (virtualized) hardware changes, e.g. for use as local VirtualBox-based vagrant machines, we will need to adapt the rules for the existing devices. Change-Id: I57e39a2ec6849f3b5bb8f6cf518e2a2923ec19cb	4 years ago
Manuel Montecelo	44750996be	TT#105151 Rename network interfaces eth->neth Using "eth*" names was discouraged for many years, we've been finding problems here and there and working around them with the help of udev-rules (/etc/udev/rules.d/70-persistent-net.rules) to map address interfaces according to PCIIDs, using "net.ifnames=0" as Linux kernel boot parameter when booting in GRUB, etc. Finally we found unsurmountable problems when moving to Debian bullseye (mr9.5), because as we attempt to rename interfaces in some hardware systems that we use, we got race conditions and clashes with renaming that we could not solve in other ways. We had different alternatives: - Use purely deterministic names, based on PCI paths (for example "enp4s0f1"), MAC address or other of the alternatives, which would be "definitive", but given that we have a diversity of hardware and VM installations in customers the devices in different systems would be different, and the fact that it would be easier to mistype or confuse them makes this not ideal. - Use names purely based on functionality, like for example "ha0", "ext0" or "int0". The problem in this case is that we would have to find names that would satisfy everyone (and there's no time for doing this at this point), that our system types are quite different from each other (e.g. Pro without bonds, Carrier with bonds and many vlans by default; using the same hardware), and some customers with different installations or needs (e.g. using VMs) have also totally different network configuration -- so any attempt to unify this to make good use of the functionality-based names would be very challenging. - Finally, there's the option to use some symbolic names similar to traditional names like "eth0", but without being exactly this. Popular names in general, although there's no wide consensus, are names like "net0" and "lan0". 
After talking with the groups involved in deploying and maintaining the system, the decision was taken not to move to purely deterministic names; there's no time for purely symbolic names (they also didn't express much interest in them), and they prefer something more traditional that they are already used to. Instead of names like "net0" or "lan0", they prefer the more direct mapping to existing interfaces like "neth0". This is ugly or slightly discomforting to use for some, but since the main users (among us) of these names prefer them, so be it. It has the advantage of having a very simple and mechanical translation based on the current names, which is an advantage especially at the critical time of upgrading existing systems to the new name. Change-Id: I4a168c7d81e40f609749f77a509d2acb72d3a9d3	4 years ago
Manuel Montecelo	a50903a30c	TT#105151 Stop adding "net.ifnames=0" to grub config This is commit `cd50e4934c` applied again. As explained in `ab62171c49`, the original change had to be reverted because even if things work perfectly fine, in the case of Vagrant machines (or when passing "vagrant" parameter to the script) the udev-rules for persistent-net devices get removed, so then the network interfaces get "random" names and the configuration in /etc/network/interfaces doesn't match, the network is not brought up. This removal happens in the case of {ce,pro,carrier}-trunk.mgm machines of our tests, which shouldn't be needed, and also in the images created for Vagrant machines, which is understandable because the machines could be brought up with different PCIIDs in different versions of VirtualBox, or due to some other difference -- not sure how we can ensure that the PCIIDs as written in the udev-rules files will work in that case. But in principle this change must go ahead when we solve these problems, so submitting it again to be ready. Change-Id: Ib39481a2608aa56e6ec6c9255e290787a6ce3af7	4 years ago
Manuel Montecelo	d6b5097a86	TT#105151 Run installer under "eatmydata", unless disabled by parameter Run the installer under "eatmydata" to speed up the process. Also add some more information about timing. In some VMs that we install daily ({ce,pro,carrier}-trunk.mgm) we have the following timings: ce-runner, no eatmydata: 162 seconds, 2 mins 42 secs ce-runner, with eatmydata: 142 seconds, 2 mins 22 secs pro-runner, no eatmydata: 246 seconds, 4 mins 06 secs pro-runner, with eatmydata: 217 seconds, 3 mins 37 secs So in these machines, for CE we save about 20 seconds, which is not much in total but it's about 12.5% saving; and in Pro about 30 seconds (and twice, once per machine, so about a minute in total), which is about 12.2% as well. In Carrier, which is mostly equivalent to Pro in this respect and typically at least 8 machines, it would mean about 4 mins in total. When installing in hardware in previous days, maybe due to the disks being slower, the total installation time was slightly slower: pro-hardware (Lenovo ThinkSystem SR250), with eatmydata: 226 seconds, 3 mins 46 secs Installing without eatmydata was not measured yet in hardware, but given that the time to install is similar to the case of pro-runner, probably the performance gain is similar too. This looks like a relevant saving, and the risk of things going wrong is minimal, so enable it by default. Change-Id: I8267fad08ff337c02801fb8fad0433d9b6d9f4c2	4 years ago
Manuel Montecelo	ab62171c49	TT#105151 Revert "TT#105151 Stop adding "net.ifnames=0" to grub config" This reverts commit `cd50e4934c`. In principle this works fine when using /etc/udev/rules.d/70-persistent-net.rules, but it turns out that in the test infrastructure (including {ce,pro,carrier}-trunk.mgm machines and build-matrix) we remove the generated rules in many places: if $VAGRANT; then ... # MACs are different on buildbox and on local VirtualBox # see http://ablecoder.com/b/2012/04/09/vagrant-broken-networking-when-packaging-ubuntu-boxes/ echo "Removing '${TARGET_UDEV_PERSISTENT_NET_RULES}'" rm -f "${TARGET_UDEV_PERSISTENT_NET_RULES:?}" So in this way, the interfaces that we get are ens18 in our infra for {ce,pro,carrier}-trunk.mgm machines, and so the generated /etc/network/interfaces using the fixed names "eth" (in process to be renamed "neth") cannot be found in those systems, and all build-install-vm jobs fail. In a local vagrant machine (ce-trunk from just before the change) we have names like these for the network devices: root@spce:~# dmesg \| grep rename [ 2.051263] e1000 0000:00:09.0 enp0s9: renamed from eth1 [ 2.065876] e1000 0000:00:03.0 enp0s3: renamed from eth0 [ 3.950540] e1000 0000:00:03.0 eth0: renamed from enp0s3 [ 4.049842] e1000 0000:00:09.0 eth1: renamed from enp0s9 The boot session from which the logs above are taken was booted with grub without "net.ifnames=0", and with udev "70-persistent-net.rules" generated in place with the right information, and then of course things work fine. So we need some solution for this before moving on with the change now reverted. Change-Id: I25d3b9c175b92214670ebb63a7916b60e0e4e5f9	4 years ago
Manuel Montecelo	cd50e4934c	TT#105151 Stop adding "net.ifnames=0" to grub config Change-Id: I9a2af93c31f7bd4ab93f4e629c3faa2624291be0	4 years ago
Manuel Montecelo	0c746e0515	TT#104381 '-' is a valid character that appears in PCIID sometimes Change-Id: Id94023afa1df8377f023e69f21601d07b15f2fd4	4 years ago
Michael Prokop	535e6df392	TT#118659 Use "efivarfs" instead of "efivars" + mount /sys/firmware/efi/efivars for efibootmgr Current trunk installations based on bullseye using recent Grml environments are broken, as EFI environments running with recent kernel versions (>=5.10) aren't properly detected anymore. This is caused by the missing efivars kernel module. CONFIG_EFI_VARS is no longer available since `20146398c4` (tagged initially as debian/5.10.1-1_exp1 + shipped with kernel package 5.10.1-1~exp1 and newer, incl. 5.10.38-1 as present in current Debian/unstable). Therefore the kernel module efivars is no longer available on more recent Debian kernel systems. Quoting from https://wiki.debian.org/UEFI: \| The older interface was efivars, showing files under \| /sys/firmware/efi/vars, and this is what was used by default in both \| Wheezy and Jessie. \| \| The new interface is efivarfs, which will expose things in a slightly \| different format under /sys/firmware/efi/efivars. This is the new \| preferred way of using UEFI configuration variables, and Debian switched \| to it by default from Stretch onwards. CONFIG_EFI_VARS is no longer required, instead efivarfs seems to be available starting with kernel v3.10 and newer (see linux.git): \| commit a9499fa7cd3fd4824a7202d00c766b269fa3bda6 \| Author: Tom Gundersen teg@jklm.no \| Date: Fri Feb 8 15:37:06 2013 +0000 \| \| efi: split efisubsystem from efivars \| \| This registers /sys/firmware/efi/{,systab,efivars/} whenever EFI is enabled \| and the system is booted with EFI. \| \| This allows \| ) userspace to check for the existence of /sys/firmware/efi as a way \| to determine whether or it is running on an EFI system. \| ) 'mount -t efivarfs none /sys/firmware/efi/efivars' without manually \| loading any modules. \| \| [ Also, move the efivar API into vars.c and unconditionally compile it. \| This allows us to move efivars.c, which now only contains the sysfs \| variable code, into the firmware/efi directory. 
Note that the efivars.c \| filename is kept to maintain backwards compatability with the old \| efivars.ko module. With this patch it is now possible for efivarfs \| to be built without CONFIG_EFI_VARS - Matt ] and: \| commit d68772b7c83f4b518be15ae96f4827c8ed02f684 \| Author: Matt Fleming matt.fleming@intel.com \| Date: Fri Feb 8 16:27:24 2013 +0000 \| \| efivarfs: Move to fs/efivarfs \| \| Now that efivarfs uses the efivar API, move it out of efivars.c and \| into fs/efivarfs where it belongs. This move will eventually allow us \| to enable the efivarfs code without having to also enable \| CONFIG_EFI_VARS built, and vice versa. \| \| Furthermore, things like, \| \| mount -t efivarfs none /sys/firmware/efi/efivars \| \| will now work if efivarfs is built as a module without requiring the \| use of MODULE_ALIAS(), which would have been necessary when the \| efivarfs code was part of efivars.c. But we also need to ensure /sys/firmware/efi/efivars is mounted, otherwise efibootmgr fails to execute: \| # efibootmgr \| EFI variables are not supported on this system. \| # lsmod\| grep efi \| efi_pstore 16384 0 \| efivarfs 16384 1 \| # mount -t efivarfs none /sys/firmware/efi/efivars \| # efibootmgr \| BootCurrent: 0002 \| Timeout: 3 seconds \| BootOrder: 0001,0002,0003,0000,0004 \| Boot0000* UiApp \| Boot0001* UEFI QEMU QEMU HARDDISK \| Boot0002* UEFI PXEv4 (MAC:02B31C8CA0AA) \| Boot0003* UEFI PXEv4 (MAC:92097BD02A48) \| Boot0004* EFI Internal Shell FTR: we can't test only for existence of directory /sys/firmware/efi/efivars, as it exists but is empty by default, so we need to look inside the directory instead. See https://github.com/grml/grml-debootstrap/pull/174 for the related grml-debootstrap upstream change, which is supposed to be released as of grml-debootstrap v0.97. 
But as a) grml-debootstrap v0.97 isn't released yet, b) it's unclear whether grml-debootstrap v0.97 will make it into bullseye (soonish, or if at all) and c) we don't have the Grml repositories available via our approx Debian mirror (as used in our PRO/Carrier environments) and don't want to update our Grml squashfs system for this change neither, we need to apply a workaround for this efivars vs efivarfs situation. Otherwise Debian installation fails in EFI environments using Debian kernel >=5.10. Thankfully we can work around this using according pre/post scripts in grml-debootstrap, that's what efivars_workaround() is all about. Thanks: Manuel Montecelo <mmontecelo@sipwise.com> for the initial patch and Volodymyr Fedorov <vfedorov@sipwise.com> for underlying research Change-Id: I5374322cb0a39cfed6563df6c4c30f1eafe560c1	4 years ago
Michael Prokop	93209fb893	TT#122950 Disable building database of manual pages The "Building database of manual pages ..." of mandb(8) is invoked during Debian package installations, and takes a considerable amount of time[1]. By disabling this, we can speed up our installation process, similar to what we already do with all our build environments. If someone really needs the man-db database (for apropos(1) or whatis(1) usage), then invoking `systemctl restart man-db.service` provides that on demand. FTR: there are also /etc/cron.daily/man-db + /etc/cron.weekly/man-db, though they don't do anything when running under systemd. There's also man-db.timer, though we don't have it enabled by default on our NGCP systems. [1] Demo from a running PRO system: \| root@sp2:~# rm -rf /var/cache/man \| root@sp2:~# time systemctl restart man-db.service \| \| real 1m18.357s \| user 0m0.000s \| sys 0m0.009s Change-Id: If98007860490adc5ad954e8c36000abd7281931b	4 years ago
Manuel Montecelo	c73a063f52	TT#118659 Add options to install bullseye Add options to install bullseye in all places where buster is used, use it as default when possible, and keep these for the moment. Switch to bullseye in Dockerfile. Change-Id: I2f693982ba92a671a6f2254c5a245a1d05231404	4 years ago
Mykola Malkov	6e1c841305	TT#119602 Hide errexit on VBoxLinuxAdditions.run call The call: UTS_RELEASE="${KERNELVERSION}" LD_PRELOAD="${FAKE_UNAME}" \ grml-chroot "${TARGET}" /media/cdrom/VBoxLinuxAdditions.run --nox11 fails with: Running in chroot, ignoring request: daemon-reload Before `8a54cd1374` it was skipped so hide it with '\|\| true'. Use 'grml-chroot' instead of 'chroot' as 'grml-chroot' is a wrapper which also cares about required mountpoints. Use single style for "${TARGET}" variable. Change-Id: Icc625c9a58b114f62350fc1e540ddac8a4147f28	4 years ago
Michael Prokop	8a54cd1374	TT#119602 Properly handle trap also in case of errors in functions Quoting from "man bash" about `-E` (AKA errtrace): \| If set, any trap on ERR is inherited by shell functions, command \| substitutions, and commands executed in a subshell environment. \| The ERR trap is normally not inherited in such cases. To demonstrate the problem see this short shell script: \| % cat foo \| set -eu -o pipefail \| \| bailout() { \| echo "Bailing out because of error" >&2 \| exit 1 \| } \| trap bailout 1 2 3 6 9 14 15 ERR \| \| foo() { \| echo "Executing magic" \| magic \| } \| \| foo \| echo end If "magic" can't be executed, then this fails as follows: \| % bash ./foo \| Executing magic \| ./foo: line 11: magic: command not found But it doesn't invoke the bailout function via trap. When using `set -eE` (AKA errexit + errtrace), instead of only `set -e` (errexit), then it behaves as expected though: \| % bash ./foo \| Executing magic \| ./foo: line 11: magic: command not found \| Bailing out because of error Change-Id: I26396b87d4a391a75997c061e866709daa57870e	4 years ago
Michael Prokop	91e047a486	TT#105407 Ensure lvm2 is present before grub-install is executed grub-pc >=2.04-11 has a new behavior regarding /boot/grub/i386-pc/ handling, where we end up with an empty /boot/grub/i386-pc/ after successful grub-install execution: \| root@grml ~ # vgchange -ay \| 3 logical volume(s) in volume group "ngcp" now active \| root@grml ~ # mount /dev/mapper/ngcp-root /mnt \| root@grml ~ # grml-chroot /mnt /bin/bash \| Writing /etc/debian_chroot ... \| (spce)root@grml:/# cd \| (spce)root@grml:~# grub-install /dev/sda \| Installing for i386-pc platform. \| Installation finished. No error reported. \| (spce)root@grml:~# ls -la /boot/grub/i386-pc/ \| total 16 \| drwxr-xr-x 2 root root 12288 Dec 16 12:04 . \| drwxr-xr-x 4 root root 4096 Dec 16 12:07 .. This causes the installed system to fail to boot with: \| GRUB loading.. \| Welcome to GRUB! \| \| error: file `/boot/grub/i386-pc/normal.mod' not found. \| grub rescue> _ The underlying issue is that recent grub versions unlink the files inside /boot/grub/i386-pc, though it doesn't report anything about it (even under `--verbose` execution). This is triggered in our situation, as lvm2's vgs binary isn't present yet. In earlier versions of grub this wasn't causing any problems and grub-install happily installed the files inside /boot/grub/i386-pc, even though we installed lvm2 only afterwards via our metapackages. To ensure lvm2 is available during installation time within grml-debootstrap, explicitly add it to the list of packages to be installed. See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=977544 for further details regarding the grub bug. Change-Id: I27a1cd18777526eb26b838fae88d4d87b6e93467	4 years ago
Michael Prokop	6ce51a8c0d	TT#104221 Ensure to have fake-uname.so available also for plain images We install virtualbox-guest-additions in the target system for usage with VirtualBox and shared folders via Vagrant. We invoke the VBoxLinuxAdditions.run machinery from the running Grml live system. But the target system usually has a different kernel package and version installed, so we have to apply some tricks to get it working. This is where we rely on fake-uname.so. Since commit `a91baa2` (TT#48647 Ship fake_uname lib in package) we're relying on fake-uname.so from ngcp-deployment-scripts, instead of building and shipping it via deployment.sh itself. But we have ngcp-deployment-scripts available only when installing NGCP - as we're installing it there and only afterwards invoke vagrant_configuration() - whereas it's missing when we install a plain Debian system (like with our debian_bullseye_plain_vagrant.box), therefore failing with: \| cp: cannot stat '/mnt/usr/lib/ngcp-deployment-scripts/fake-uname.so': No such file or directory \| ERROR: ld.so: object '/usr/lib/ngcp-deployment-scripts/fake-uname.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. Change-Id: I639a43c3deafd2fc188350936e15f48482103209	4 years ago
Michael Prokop	ed52e8fe7a	TT#104221 Use bullseye repos in ensure_packages_installed appropriately The ensure_packages_installed function ensures that specified packages are present during runtime. This is used e.g. for installation of virtualbox-guest-additions-iso Debian package from within vagrant_configuration(), which is used to execute /media/cdrom/VBoxLinuxAdditions.run inside the target system. We can't use random Debian repositories though, as the package dependencies need to match the running live system. So far we only used the buster repository, as our current grml-sipwise ISOs are based on something close to buster. On the other hand we can't use virtualbox-guest-additions-iso from Debian/buster in our Debian/bullseye Vagrant boxes, as /sbin/mount.vboxsf doesn't work then. So use the bullseye repository if the release of the target system is bullseye, which seems to work with our current Grml ISOs and current state of bullseye. Change-Id: Iaf965daa6ff7a62e2b3bd8c55b8f761abd94c241	4 years ago
Michael Prokop	3a5149e01c	TT#100201 Support Debian/bullseye by dropping stretch+buster checks Nowadays we only deploy stretch + buster based Debian systems, so drop those release specific checks to also support bullseye and newer Debian releases. Change-Id: Ibf3d1527ccaeba60526a730e6886e6521c08d20e	5 years ago
Michael Prokop	862fb155f0	TT#83753 Port status server to py3 The /usr/bin/python symlink/binary no longer exists in recent Grml-Sipwise ISOs and python3 doesn't ship SimpleHTTPServer but http.server instead. Change-Id: I6677e8a416b142034d99d5b1d2b11ba74d87a6ec	5 years ago
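
For reference, the SimpleHTTPServer to http.server move mentioned in the last commit above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual status server shipped in deployment-iso: the helper names `start_status_server` and `write_status`, the `status` file name, and the loopback bind address are all hypothetical.

```python
import functools
import http.server
import threading
from pathlib import Path


def start_status_server(directory, port=0, bind="127.0.0.1"):
    """Serve `directory` over HTTP in a background thread (hypothetical helper).

    Python 2's SimpleHTTPServer module became http.server in Python 3;
    SimpleHTTPRequestHandler itself kept its name.
    port=0 lets the OS pick a free port (see server.server_address).
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer((bind, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


def write_status(directory, status):
    """Publish the current deployment status as a plain-text file (assumed layout)."""
    Path(directory, "status").write_text(status + "\n")
```

A caller would write a status such as `installing` with `write_status()`, start the server once, and let a client poll `/status`; `server.shutdown()` followed by `server.server_close()` stops it again.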

225 Commits (236cb2d1a76624c8ac9446470e07f318299b9298)