deployment-iso

Commit Graph

Author	SHA1	Message	Date
Michael Prokop	c476cca52d	MT#57453 Use tty1 for stdin when running under grml-autoconfig service Recent Grml ISOs, including our Grml-Sipwise ISO (v2023-06-01), include grml-autoconfig v0.20.3 which executes the grml-autoconfig service under `StandardInput=null`. This is necessary to not conflict with tty usage, like used with serial console. See `1e268ffe4f` Now that we run with /dev/null for stdin, we can't interact with the user, so let's try to detect when running from within grml-autoconfig's systemd unit, and if so assume that we're executing on /dev/tty1 and use/reopen that for stdin. Change-Id: Id55283c7f862487a6ef8acb8ab01f67a05bd8dd7 (cherry picked from commit `561303359e`)	2 years ago
Sipwise Jenkins Builder	0e14b09149	Release new version 11.4.2.0+0~mr11.4.2.0	2 years ago
Mykola Malkov	f8569926c2	MT#57453 Switch docker image to bookworm Change-Id: I9cfc7f0f6062d5e4916c7ba18b72cbc3e8c8ebbb (cherry picked from commit `3a942b1b8c`)	2 years ago
Sipwise Jenkins Builder	7fed77e813	Release new version 11.4.1.0+0~mr11.4.1.0	2 years ago
Michael Prokop	0fedba6144	MT#57643 Ensure /var/lib/dpkg/available exists on Debian releases <=buster Since version 1.20.0, dpkg no longer creates /var/lib/dpkg/available (see #647911). Now that we upgraded our Grml-Sipwise deployment system to bookworm, we have dpkg v1.21.22 on our live system, and mmdebstrap relies on dpkg of the host system for execution. But on Debian releases until and including buster, dpkg fails to operate with e.g. `dpkg --set-selections`, if /var/lib/dpkg/available doesn't exist: \| The following NEW packages will be installed: \| nullmailer \| [...] \| debconf: delaying package configuration, since apt-utils is not installed \| dpkg: error: failed to open package info file '/var/lib/dpkg/available' for reading: No such file or directory We could also switch from mmdebstrap to debootstrap for deploying Debian releases <=buster, but this would be slower and we have been using mmdebstrap for everything for quite some time. So instead let's create /var/lib/dpkg/available after bootstrapping the system. Reported towards mmdebstrap as #1037946. Change-Id: I0a87ca255d5eb7144a9c093051c0a6a3114a3c0b	2 years ago
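The workaround from the commit above (MT#57643) could be sketched roughly as follows. The function name `ensure_dpkg_available` and its `target` argument are hypothetical and not taken from deployment.sh; the sketch only assumes that the bootstrapped system is mounted below some directory.

```shell
#!/bin/bash
# Hypothetical helper (not the actual deployment.sh code): make sure an
# (empty) /var/lib/dpkg/available exists inside a bootstrapped target system.
ensure_dpkg_available() {
  local target="$1"   # assumed mount point of the target system, e.g. /mnt
  local available="${target}/var/lib/dpkg/available"

  # Only create the file if it is missing, so existing content is preserved.
  if [ ! -e "${available}" ]; then
    mkdir -p "${target}/var/lib/dpkg"
    : > "${available}"
  fi
}
```

Calling it again on a target that already has the file is a no-op, which keeps the step safe for releases where dpkg still creates the file itself.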
Michael Prokop	eccdc586ae	MT#57644 puppet/git: allow ssh-rsa pubkey usage Now that our deployment system is based on Debian/bookworm, but our gerrit/git server still runs on Debian/bullseye, we run into the OpenSSH RSA issue (RSA signatures using the SHA-1 hash algorithm got disabled by default), see https://michael-prokop.at/blog/2023/06/11/what-to-expect-from-debian-bookworm-newinbookworm/ and https://www.jhanley.com/blog/ssh-signature-algorithm-ssh-rsa-error/ We need to enable ssh-rsa usage, otherwise deployment fails with: \| Warning: Permanently added '[gerrit.mgm.sipwise.com]:29418' (ED25519) to the list of known hosts. \| sign_and_send_pubkey: no mutual signature supported \| puppet-r10k@gerrit.mgm.sipwise.com: Permission denied (publickey). \| fatal: Could not read from remote repository. Change-Id: I5894170dab033d52a2612beea7b6f27ab06cc586	2 years ago
Michael Prokop	8cfb8c8392	MT#57630 Check online connectivity to work around Intel E810 / ice issue Deploying the Debian/bookworm based NGCP system fails on a Lenovo sr250 v2 node with an Intel E810 network card: \| # lshw -c net -businfo \| Bus info Device Class Description \| ======================================================= \| pci@0000:01:00.0 eth0 network Ethernet Controller E810-XXV for SFP \| pci@0000:01:00.1 eth1 network Ethernet Controller E810-XXV for SFP \| # lshw -c net \| -network:0 \| description: Ethernet interface \| product: Ethernet Controller E810-XXV for SFP \| vendor: Intel Corporation \| physical id: 0 \| bus info: pci@0000:01:00.0 \| logical name: eth0 \| version: 02 \| serial: [...] \| size: 10Gbit/s \| capacity: 25Gbit/s \| width: 64 bits \| clock: 33MHz \| capabilities: pm msi msix pciexpress vpd bus_master cap_list rom ethernet physical fibre 1000bt-fd 25000bt-fd \| configuration: autonegotiation=off broadcast=yes driver=ice driverversion=1.11.14 duplex=full firmware=2.25 0x80007027 1.2934.0 ip=192.168.90.51 latency=0 link=yes multicast=yes port=fibre speed=10Gbit/s \| resources: iomemory:400-3ff iomemory:400-3ff irq:16 memory:4002000000-4003ffffff memory:4006010000-400601ffff memory:a1d00000-a1dfffff memory:4005000000-4005ffffff memory:4006220000-400641ffff We set up the /etc/network/interfaces file by invoking Grml's netcardconfig script in automated mode, like: NET_DEV=eth0 METHOD=static IPADDR=192.168.90.51 NETMASK=255.255.255.248 GATEWAY=192.168.90.49 /usr/sbin/netcardconfig The resulting /etc/network/interfaces gets used as base for usage inside the NGCP chroot/target system. netcardconfig shuts down the network interface (eth0 in the example above) via ifdown, then sleeps for 3 seconds and re-enables the interface (via ifup) with the new configuration. 
This used to work fine so far, but with the Intel e810 network card and kernel version 6.1.0-9-amd64 from Debian/bookworm we see a link failure and it takes ~10 seconds until the network device is up and running again. The following vagrant_configuration() execution from deployment.sh then fails: \| +11:41:01 (netscript.grml:1022): vagrant_configuration(): wget -O /var/tmp/id_rsa_sipwise.pub http://builder.mgm.sipwise.com/vagrant-ngcp/id_rsa_sipwise.pub \| --2023-06-11 11:41:01-- http://builder.mgm.sipwise.com/vagrant-ngcp/id_rsa_sipwise.pub \| Resolving builder.mgm.sipwise.com (builder.mgm.sipwise.com)... failed: Name or service not known. \| wget: unable to resolve host address 'builder.mgm.sipwise.com' However, when we retry it again just a bit later, the network works fine again. During investigation we identified that the network card flips the port, quoting the related log from the connected Cisco nexus 5020 switch (with fast stp learning mode): \| nexus5k %ETHPORT-5-IF_DOWN_LINK_FAILURE: Interface Ethernet1/33 is down (Link failure) It seems to be related to some autonegotiation problem, as when we execute `ethtool -A eth0 rx on tx on` (no matter whether with `on` or `off`), we see: \| [Tue Jun 13 08:51:37 2023] ice 0000:01:00.0 eth0: Autoneg did not complete so changing settings may not result in an actual change. 
\| [Tue Jun 13 08:51:37 2023] ice 0000:01:00.0 eth0: NIC Link is Down \| [Tue Jun 13 08:51:45 2023] ice 0000:01:00.0 eth0: NIC Link is up 10 Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: NONE, Autoneg Advertised: On, Autoneg Negotiated: False, Flow Control: Rx/Tx FTR: \| root@sp1 ~ # ethtool -A eth0 autoneg off \| netlink error: Operation not supported \| 76 root@sp1 ~ # ethtool eth0 \| grep -C1 Auto-negotiation \| Duplex: Full \| Auto-negotiation: off \| Port: FIBRE \| root@sp1 ~ # ethtool -A eth0 autoneg on \| root@sp1 ~ # ethtool eth0 \| grep -C1 Auto-negotiation \| Duplex: Full \| Auto-negotiation: off \| Port: FIBRE \| root@sp1 ~ # dmesg -T \| tail -1 \| [Tue Jun 13 08:53:26 2023] ice 0000:01:00.0 eth0: To change autoneg please use: ethtool -s <dev> autoneg <on\|off> \| root@sp1 ~ # ethtool -s eth0 autoneg off \| root@sp1 ~ # ethtool -s eth0 autoneg on \| netlink error: link settings update failed \| netlink error: Operation not supported \| 75 root@sp1 ~ # As a workaround, at least until we have a better fix/solution, we try to reach the default gateway (or fall back to the repository host if gateway couldn't be identified) via ICMP/ping, and once that works we continue as usual. But even if that should fail we continue execution, to minimize behavior change but have a workaround for this specific situation available. FTR, broken system: \| root@sp1 ~ # ethtool -i eth0 \| driver: ice \| version: 6.1.0-9-amd64 \| firmware-version: 2.25 0x80007027 1.2934.0 \| [...] Whereas with kernel 5.10.0-23-amd64 from Debian/bullseye we don't seem to see that behavior: \| root@sp1:~# ethtool -i neth0 \| driver: ice \| version: 5.10.0-23-amd64 \| firmware-version: 2.25 0x80007027 1.2934.0 \| [...] 
Also using latest available ice v1.11.14 (from https://sourceforge.net/projects/e1000/files/ice%20stable/1.11.14/) on Kernel version 6.1.0-9-amd64 doesn't bring any change: \| root@sp1 ~ # modinfo ice \| filename: /lib/modules/6.1.0-9-amd64/updates/drivers/net/ethernet/intel/ice/ice.ko \| firmware: intel/ice/ddp/ice.pkg \| version: 1.11.14 \| license: GPL v2 \| description: Intel(R) Ethernet Connection E800 Series Linux Driver \| author: Intel Corporation, <linux.nics@intel.com> \| srcversion: 818E9C817731C98A25470C0 \| alias: pci:v00008086d00001888svsdbcsci \| [...] \| alias: pci:v00008086d00001591svsdbcsci* \| depends: ptp \| retpoline: Y \| name: ice \| vermagic: 6.1.0-9-amd64 SMP preempt mod_unload modversions \| parm: debug:netif level (0=none,...,16=all) (int) \| parm: fwlog_level:FW event level to log. All levels <= to the specified value are enabled. Values: 0=none, 1=error, 2=warning, 3=normal, 4=verbose. Invalid values: >=5 \| (ushort) \| parm: fwlog_events:FW events to log (32-bit mask) \| (ulong) \| root@sp1 ~ # ethtool -i eth0 \| head -3 \| driver: ice \| version: 1.11.14 \| firmware-version: 2.25 0x80007027 1.2934.0 \| root@sp1 ~ # Change-Id: Ieafe648be4e06ed0d936611ebaf8ee54266b6f3c	2 years ago
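The connectivity check described in the commit above (MT#57630) might look something like this sketch. The function names, the retry count and the `PING_CMD` override are assumptions made for illustration; the actual implementation in deployment.sh may differ.

```shell
#!/bin/bash
# Hypothetical sketch: wait until a host answers ICMP echo requests, but
# never abort the caller if it does not (to minimize behavior change).
wait_for_host() {
  local host="$1"
  local tries="${2:-30}"              # assumed retry count, not from the commit
  local ping_cmd="${PING_CMD:-ping}"  # override hook, only useful for testing

  local i
  for ((i = 1; i <= tries; i++)); do
    # One echo request, wait at most one second for the reply.
    if "${ping_cmd}" -c 1 -W 1 "${host}" >/dev/null 2>&1; then
      echo "host ${host} reachable after ${i} attempt(s)"
      return 0
    fi
    sleep 1
  done

  echo "WARN: host ${host} not reachable, continuing anyway" >&2
  return 0
}

# Identify the default gateway; "ip route show default" prints lines such as
# "default via 192.168.90.49 dev eth0". Prints nothing if no gateway is set.
default_gateway() {
  ip route show default 2>/dev/null | awk '/^default via/ {print $3; exit}'
}
```

A caller could then use the gateway when one is found and fall back to a repository host otherwise, for example `wait_for_host "$(default_gateway)"` guarded by a check that the result is non-empty.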
Michael Prokop	f4da3e094e	MT#57049 Ensure SW-RAID device is inactive before re-reading partition table Re-reading of disks fails if the mdadm SW-RAID device is still active: \| root@sp1 ~ # cat /proc/mdstat \| Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] \| md0 : active raid1 sdb3[1] sda3[0] \| 468218880 blocks super 1.2 [2/2] [UU] \| [========>............] resync = 42.2% (197855168/468218880) finish=22.4min speed=200756K/sec \| bitmap: 3/4 pages [12KB], 65536KB chunk \| \| unused devices: <none> \| root@sp1 ~ # blockdev --rereadpt /dev/sdb \| blockdev: ioctl error on BLKRRPART: Device or resource busy \| 1 root@sp1 ~ # blockdev --rereadpt /dev/sda \| blockdev: ioctl error on BLKRRPART: Device or resource busy \| 1 root@sp1 ~ # Only if we stop the mdadm SW-RAID device, then we can re-read the partition table: \| root@sp1 ~ # mdadm --stop /dev/md0 \| mdadm: stopped /dev/md0 \| root@sp1 ~ # blockdev --rereadpt /dev/sda \| root@sp1 ~ # This behavior isn't new and unrelated to Debian/bookworm but was spotted while debugging an unrelated issue. FTR: we re-read the partition table (via `blockdev --rereadpt`) to ensure that /etc/fstab of the live system is up to date and matches the current system state. While this isn't strictly needed, we preserve existing behavior and also try to avoid a hard "cut" of a possibly ongoing SW-RAID sync. Change-Id: I735b00423e6efa932f74b78a38ed023576e5d306	2 years ago
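As an illustration of the commit above (MT#57049), the following sketch lists active md devices from /proc/mdstat and stops them before re-reading a partition table. Both function names are hypothetical, and the sketch simply stops every active array; how deployment.sh treats an ongoing resync is not covered here.

```shell
#!/bin/bash
# List active md devices from an mdstat-formatted file (default: /proc/mdstat).
# A matching line looks like: "md0 : active raid1 sdb3[1] sda3[0]".
active_md_devices() {
  local mdstat="${1:-/proc/mdstat}"
  awk '$2 == ":" && $3 == "active" {print "/dev/" $1}' "${mdstat}"
}

# Hypothetical helper: stop all active arrays, then re-read the partition
# table of the given disk. Requires root, mdadm and blockdev.
reread_partition_table() {
  local disk="$1"
  local md
  for md in $(active_md_devices); do
    mdadm --stop "${md}"
  done
  blockdev --rereadpt "${disk}"
}
```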
Michael Prokop	2ad306c465	MT#57556 Prompt for reboot/halt only in interactive mode With our newer Grml-Sipwise ISO (v2023-06-01) being based on Debian/bookworm and recent Grml packages, our automated deployment suddenly started to fail for us: \| +04:28:12 (netscript.grml:2453): echo 'Successfully finished deployment process [Fri Jun 2 04:28:12 UTC 2023 - running 576 seconds]' \| ++04:28:12 (netscript.grml:2455): get_deploy_status \| ++04:28:12 (netscript.grml:95): get_deploy_status(): '[' -r /srv/deployment//status ']' \| ++04:28:12 (netscript.grml:96): get_deploy_status(): cat /srv/deployment//status \| Successfully finished deployment process [Fri Jun 2 04:28:12 UTC 2023 - running 576 seconds] \| +04:28:12 (netscript.grml:2455): '[' copylogfiles '!=' error ']' \| +04:28:12 (netscript.grml:2456): set_deploy_status finished \| +04:28:12 (netscript.grml:103): set_deploy_status(): '[' -n finished ']' \| +04:28:12 (netscript.grml:104): set_deploy_status(): echo finished \| +04:28:12 (netscript.grml:2459): false \| +04:28:12 (netscript.grml:2463): status_wait \| +04:28:12 (netscript.grml:329): status_wait(): [[ -n 0 ]] \| +04:28:12 (netscript.grml:329): status_wait(): [[ 0 != 0 ]] \| +04:28:12 (netscript.grml:2466): false \| +04:28:12 (netscript.grml:2471): false \| +04:28:12 (netscript.grml:2476): echo 'Do you want to [r]eboot or [h]alt the system now? (Press any other key to cancel.)' \| Do you want to [r]eboot or [h]alt the system now? (Press any other key to cancel.) 
\| +04:28:12 (netscript.grml:2477): unset a \| +04:28:12 (netscript.grml:2478): read -r a \| ++04:28:12 (netscript.grml:2478): wait_exit \| ++04:28:12 (netscript.grml:339): wait_exit(): local e_code=1 \| ++04:28:12 (netscript.grml:340): wait_exit(): [[ 1 -ne 0 ]] \| ++04:28:12 (netscript.grml:341): wait_exit(): set_deploy_status error \| ++04:28:12 (netscript.grml:103): set_deploy_status(): '[' -n error ']' \| ++04:28:12 (netscript.grml:104): set_deploy_status(): echo error \| ++04:28:12 (netscript.grml:343): wait_exit(): trap '' 1 2 3 6 15 ERR EXIT \| ++04:28:12 (netscript.grml:344): wait_exit(): status_wait \| ++04:28:12 (netscript.grml:329): status_wait(): [[ -n 0 ]] \| ++04:28:12 (netscript.grml:329): status_wait(): [[ 0 != 0 ]] \| ++04:28:12 (netscript.grml:345): wait_exit(): exit 1 As of grml-autoconfig v0.20.3 and newer, the grml-autoconfig systemd service that invokes the deployment netscript uses `StandardInput=null` instead of `StandardInput=tty` (see https://github.com/grml/grml/issues/176). Thanks to this, a logic error in our deployment script showed up. We exit the script in interactive mode, though only afterwards prompting for reboot/halt with `read -r a` - which of course fails if stdin is missing. As a result, we end up in our signal handler `trap 'wait_exit;' 1 2 3 6 15 ERR EXIT` and then fail the deployment. So instead prompt for "Do you want to [r]eboot or [h]alt ..." only in interactive mode, and while at it drop the "if "$INTERACTIVE" ; then exit 0 ; fi" so the prompt is actually presented to the user. Change-Id: Ia89beaf3c446f3701cc30ab21cfdff7b5808a6d3	2 years ago
Michael Prokop	98d11bfc28	MT#57280 Run deployment status server under systemd Manual execution of python's http.server has multiple drawbacks, like no proper logging and no service tracking/restart options, but most notably the deployment status server no longer runs when our deployment script fails. While /srv/deployment/status then still might contain "error", no one is serving that information on port 4242 any longer[1], and our daily-build-install-vm Jenkins job might then report: \| VM '192.168.209.162' current state is '' - retrying up to another 1646 times, sleeping for a second \| VM '192.168.209.162' current state is '' - retrying up to another 1645 times, sleeping for a second \| [...] It then runs for ~1/2 hour without doing anything useful, until the Jenkins job itself gives up. By running our deployment status server under systemd, we keep the service alive also when the deployment script terminates. In case of errors we get immediate feedback: \| VM '192.168.209.162' current state is 'puppet' - retrying up to another 1648 times, sleeping for a second \| VM '192.168.209.162' current state is 'puppet' - retrying up to another 1647 times, sleeping for a second \| VM '192.168.209.162' current state is 'error' - retrying up to another 1646 times, sleeping for a second \| + '[' error '!=' finished ']' \| + echo 'Failed to install Proxom VM '\''162'\'' (IP '\''192.168.209.162'\'')' [1] For our NGCP based installations we use the ngcpstatus boot option, where its status_wait trap kicks in and avoids premature exit of deployment status server. But e.g. our non-NGCP systems don't use that boot option and with this change we could get rid of the status_wait overall. Change-Id: Ibaa799358caedf31c64c37b48e3c5e889808086a	2 years ago
Sipwise Jenkins Builder	583ab91c89	Release new version 11.4.0.0+0~mr11.4.0.0	2 years ago
Michael Prokop	54d48f2716	MT#55861 Update grml-live version to 0.43.0 Packages like 'firmware-linux', 'firmware-linux-nonfree', 'firmware-misc-nonfree' and further 'firmware-*' got moved from non-free to the new non-free-firmware component/repository (related to https://www.debian.org/vote/2022/vote_003). grml-live v0.43.0 provides support for this new component, so let's make sure we have proper support for firmware related packages by updating to the corresponding grml-live version. Change-Id: I4704e8be051ab6b5496021f07f42208b34963739	2 years ago
Sipwise Jenkins Builder	7a783ce25c	Release new version 11.3.0.0+0~mr11.3.0.0	3 years ago
Michael Prokop	e6819fe674	MT#55944 Use ngcp-initialize-udev-rules-net to deploy 70-persistent-net.rules Use system-tools' ngcp-initialize-udev-rules-net script to deploy the /etc/udev/rules.d/70-persistent-net.rules, no need to maintain code at multiple places. Change-Id: I81925262a8c687aa9976cbc1113568989fa53281	3 years ago
Michael Prokop	ae7db13232	MT#55944 Fix networking for plain Debian systems When building our Debian boxes for buster, bullseye + bookworm (via daily-build-matrix-debian-boxes Jenkins job), we get broken networking, so e.g. `vagrant up debian-bookworm` doesn't work. This is caused by /etc/network/interfaces (using e.g. "neth0", being our naming schema which we use in NGCP, as adjusted by the deployment script) not matching the actual system network devices (like enp0s3). TL;DR: no behavior change for NGCP systems, only when building non-NGCP systems then enable net.ifnames=0 (via set_custom_grub_boot_options), but do not generate /etc/udev/rules.d/70-persistent-net.rules (via invoke generate_udev_network_rules) nor rename eth->neth in /etc/network/interfaces. More verbose version: * rename the "eth" networking interfaces into "neth" in /etc/network/interfaces only when running in ngcp-installer mode (this is the behavior we rely on in NGCP, but it doesn't matter for plain Debian systems) * generate /etc/udev/rules.d/70-persistent-net.rules only when running in ngcp-installer mode. While our jenkins-configs.git's jobs/daily-build/scripts/vm_clean-fs.sh removes the file anyways (for the VM use case), between the initial deployment run and the next reboot the configuration inside the PVE VM still applies, so we end up with an existing /etc/udev/rules.d/70-persistent-net.rules, referring to neth0, while our /etc/network/interfaces configures eth0 instead. * when not running in ngcp-installer mode, enable net.ifnames=0 usage in GRUB to disable persistent network interface naming. FTR, this change is not needed for NGCP, as on NGCP systems we use /etc/udev/rules.d/70-persistent-net.rules, generated by ngcp-system-tools' ngcp-initialize-udev-rules-net script also in VM use case. This is a fixup for a change in git commit `a50903a30c` (see also commit message of git commit `ab62171`), that should have been adjusted for ngcp-installer-only mode instead. 
Change-Id: I6d0021dbdc2c1587127f0e115c6ff9844460a761	3 years ago
Michael Prokop	d44bcef4e6	MT#55988 Update kernel command line for installing Debian w/wo puppet The public name servers resolve deb.sipwise.com to our public OVH IP address 164.132.119.186, while internally we want to use its cname haproxy.mgm.sipwise.com. This only works with using our internal nameservers (like 192.168.212.30 and 192.168.88.20). Default to 192.168.212.30, so deployments work as expected, otherwise we're failing during deployment with: \| Err:5 https://deb.sipwise.com/autobuild release-trunk-bookworm InRelease \| 403 Forbidden [IP: 164.132.119.186 443] While at it also update the ip=... kernel option, to use 168.192.91.XX/24 by default, and also use a FQDN for the hostname (since that's our current policy for puppet hostname/certificates). Change-Id: I1ce6541f7a31baa437e679b67056bb7851b1b33d	3 years ago
Michael Prokop	338ba4fab3	MT#55861 Update Grml ISO + latest grml-live version Relevant changes: * GRMLBASE/39-modprobe: avoid usage of /lib/modprobe.d/50-nfs.conf * GRMLBASE/39-modprobe: do not expect all files in /etc/modprobe.d to be used This gives us working netboot images and avoids sysctl errors during bootup, if nfs-kernel-server should be present on the ISO. Change-Id: I0012199658c186b69c45ac51bc249ce75b8d81ce	3 years ago
Michael Prokop	6412814e6b	MT#55949 Ensure we have proper date/time configuration If the date of the running system isn't appropriate enough, then apt runs might fail with something like: \| E: Release file for https://deb/sipwise/com/spce/mr10.5.2/dists/bullseye/InRelease is not valid yet (invalid for another 6h 19min 2s) So let's try to sync date/time of the system via NTP. Given that chrony is a small (only 650 kB disk space) and secure replacement for ntp, let's ship chrony with the Grml deployment ISO (and fall back to ntp usage in deployment script if chrony shouldn't be available). Also, if the system is configured to read the RTC time in the local time zone, this is known as another source of problems, so let's make sure to use the RTC in UTC. Change-Id: I747665d1cee3b6f835c62812157d0203bcfa96e2	3 years ago
Michael Prokop	245c7ef702	MT#55861 Update Grml ISO + update to Debian/bookworm For deploying Debian/bookworm (see MT#55524), we'd like to have an updated Grml ISO. With such a Debian/bookworm based live system, we can still deploy older target systems (like Debian/bullseye). Relevant changes: 1) Add jo as new build-dependency, to generate build information in conf/buildinfo.json (new dependency of grml-live) 2) Always include ca-certificates, as this is required with more recent mmdebstrap versions (>=0.8.0), when using apt repositories with https, otherwise bootstrapping Debian fails. 3) Update to latest stable grml-live version v0.42.0, which: a) added support for "bookworm" as suite name `cff66073a7` b) provides corresponding templates for memtest support: `c01a86b3fc` c) and a workaround for a kmod/initramfs-tools issue with PXE/NFS boot: `ea1e5ea330` 4) Update memtest86+ to v6.00-1 as present in Debian/bookworm and add corresponding UEFI support (based on grml-live's upstream change, though as we don't support i386, dropped the 32bit related bits) Change-Id: I327c0e25c28f46e097212ef4329d75fc8d34767c	3 years ago
Guillem Jover	ad9e94efb6	MT#55861 Load the fake-uname.so pre-loaded library from within the chroot We build the pre-loaded library targeting a specific Debian release, which might be different (and newer) to the release Grml was built for. This can cause missing versioned symbols (and a loading failure) if the libc in the outer system is older than the inner system. Change-Id: I84f4f307863e534fe0fff85274ae1d5db809012c	3 years ago
Michael Prokop	d1d0e61512	MT#55379 Use usrmerge for Debian/bookworm based systems The transition to usrmerge has started in Debian, see https://lists.debian.org/debian-devel-announce/2022/09/msg00001.html Debian/bookworm AKA v12 will only support the merged-/usr layout. Systemd is also dropping support for unmerged-usr systems (see https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html). Deploy the expected filesystem layout accordingly, as in: 1) no-merged-usr for Debian releases up to and including bullseye, and 2) merged-usr starting with bookworm and newer Change-Id: I7b7b294ce12ca245cf978a787bcc20aa9753e73d	3 years ago
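The release-dependent layout decision from the commit above (MT#55379) can be sketched as a small mapping. The function name `usr_layout_for_release` and the explicit list of older codenames are assumptions for illustration only; how the result is passed on to the bootstrapping tool is left out.

```shell
#!/bin/bash
# Hypothetical helper: map a Debian release codename to the /usr layout
# named in the commit message.
usr_layout_for_release() {
  case "$1" in
    # Releases up to and including bullseye keep the unmerged layout.
    # (The set of older codenames listed here is an assumption.)
    stretch|buster|bullseye)
      echo "no-merged-usr"
      ;;
    # bookworm and anything newer gets the merged-/usr layout.
    *)
      echo "merged-usr"
      ;;
  esac
}
```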
Sipwise Jenkins Builder	7bb58e612a	Release new version 11.2.0.0+0~mr11.2.0.0	3 years ago
Michael Prokop	b372471a20	TT#15305 Fix ngcp-deployment-scripts usage for daily-build-matrix-debian-boxes Git commit `6661b04af0` broke all our bullseye based builds (debian, sipwise + docker), see https://jenkins.mgm.sipwise.com/view/All/job/daily-build-matrix-debian-boxes/ For plain Debian installations we don't have SP_VERSION available, so default to what was used before supporting trunk-weekly next to trunk. Change-Id: I61958f0c67d165d2f6dcb059fe4991ed24a328c9	3 years ago
Sipwise Jenkins Builder	2c9d783498	Release new version 11.1.0.0+0~mr11.1.0.0	3 years ago
Victor Seva	1d4f08b7ed	TT#15305 development.sh: support trunk-weekly, take two Change-Id: I83e635dc5916833d0699fd0be5a8a742ef7b40c8	3 years ago
Victor Seva	6661b04af0	TT#15305 deployment.sh: support trunk-weekly Change-Id: Ie98ac5fa0de848cf54a96039af5532eb8012bab9	3 years ago
Sipwise Jenkins Builder	07a556c4dd	Release new version 11.0.0.0+0~mr11.0.0.0	3 years ago
Mykola Malkov	c177a98100	TT#179354 Add mr10.5 LTS key to bootstrap Now it contains: pub rsa4096 2015-03-05 [SC] [expires: 2029-10-12] 68A702B1FD8E422AAAA1ADA3773236EFF411A836 uid [ unknown] Sipwise GmbH (Sipwise Repository Key) <support@sipwise.com> sub rsa4096 2015-03-05 [E] [expires: 2029-10-12] pub rsa4096 2011-06-06 [SC] F7B8A739CE638D719A078C9859104633EE5E097D uid [ unknown] Sipwise autobuilder (Used to sign packages for autobuild) <development@sipwise.com> sub rsa4096 2011-06-06 [E] pub rsa4096 2021-05-04 [SCEA] [expires: 2031-05-02] AB7FE3DCD53767F6160406442A5CA71B542B9A22 uid [ unknown] Sipwise autobuilder <development@sipwise.com> pub rsa4096 2022-05-31 [SCEA] [expires: 2032-05-28] 39EB73D5B54870181632E48786C3B4395CB844A2 uid [ unknown] Sipwise autobuilder <development@sipwise.com> Change-Id: Ic851724f3580a4f6addbba41b42d97c02acf4ff2	3 years ago
Michael Prokop	8e063362ef	TT#173500 Create tmpfiles with template name We want to be able to track down any left-behind tmp files, so ensure we're creating them with according file names. Change-Id: I4eb44047f2eb86ba9f0a8aeeb8d6555290f60c00	3 years ago
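Creating temporary files with a recognizable template name, as the commit above (TT#173500) describes, could be done along these lines. The `deployment-` prefix and the helper name `make_tmpfile` are hypothetical; the actual names used in deployment.sh are not shown in the commit message.

```shell
#!/bin/bash
# Hypothetical helper: create a temporary file whose name reveals its origin
# and purpose, so left-behind files can be traced back to this script.
make_tmpfile() {
  local purpose="$1"
  # -t places the file below $TMPDIR (or /tmp); the X's are replaced by
  # random characters.
  mktemp -t "deployment-${purpose}.XXXXXXXXXX"
}
```

A leftover file such as `/tmp/deployment-fstab.Ab3dE9xYz1` then points directly at the step that created it.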
Mykola Malkov	15aaad8edb	TT#161150 Replace ngcpsp* with ngcpnodename option It's needed for support of spN nodes. Sort options in deployment.sh. Remove unused boot options ngcpnonwrecfg and ngcpfillcache. Change-Id: I300e533c15b71d65e768ca2ed4b3a73eb7ec6954	3 years ago
Mykola Malkov	be237917d7	TT#161150 Refactor options parsing Merge all options parsing to single point. Move options parsing to the top of the script. Parse boot options first then cmd options if they exist. Simplify some checks. Remove unused options. Change-Id: Ibcb099d9bb2ba26ffed9904c8e5065b392ecb78a	3 years ago
Sipwise Jenkins Builder	b87e6c0efe	Release new version 10.5.0.0+0~mr10.5.0.0	3 years ago
Michael Prokop	f27f51c6c8	TT#165600 Add support for NVMe disks The logic to detect disks via /proc/partitions didn't cover NVMe disks, as the regex '[a-z]$' fails for the "nvme0n1" pattern: \| % cat /proc/partitions \| major minor #blocks name \| \| 259 0 500107608 nvme0n1 \| 259 1 524288 nvme0n1p1 \| 259 2 499582279 nvme0n1p2 \| [...] \| 8 0 384638976 sda \| 8 1 384606208 sda1 Instead, let's use lsblk to detect present disks, which works fine for all kinds of disks, incl. NVMe devices. Change-Id: I586877da8b4fadf3d05b4e6c8e88bfdeae6d7f15	3 years ago
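To illustrate why the old detection missed NVMe devices (commit TT#165600 above), the sketch below contrasts a regex-based scan of /proc/partitions with an lsblk-based one. Both function names are hypothetical, and the exact lsblk invocation used by deployment.sh is not given in the commit message; the options shown are just one way to list whole disks.

```shell
#!/bin/bash
# Old approach (sketch): treat every name ending in a letter as a whole disk.
# "sda" matches, but "nvme0n1" ends in a digit and is silently skipped.
disks_from_partitions() {
  # Skip the header and the blank line; the device name is the 4th column.
  awk 'NR > 2 && $4 ~ /[a-z]$/ {print $4}' "${1:-/proc/partitions}"
}

# lsblk-based approach: --nodeps omits partitions and holder devices, and
# the TYPE column lets us keep only real disks (e.g. no loop or rom devices).
disks_from_lsblk() {
  lsblk --nodeps --noheadings --output NAME,TYPE | awk '$2 == "disk" {print $1}'
}
```

With the /proc/partitions excerpt from the commit message, `disks_from_partitions` reports only `sda`, whereas the lsblk variant does not depend on the naming pattern at all.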
Mykola Malkov	a99d9ff6e2	TT#161150 Refactoring default values and parameter parsing Sort default values. Rework cmd parameters parsing - remove some reassign, reformat to be more clear, etc. Add some default options CROLE, EADDR, EXTERNAL_NETMASK, ROLE. Change-Id: I287facafeb53dc5390517424935c8a50932246dc	3 years ago
Sipwise Jenkins Builder	4413beff39	Release new version 10.4.0.0+0~mr10.4.0.0	4 years ago
Volodymyr Fedorov	7b53916c30	TT#157450 Add extra logging entries and copy logs later Add extra deployment statuses for grub-install and try to have more data logged. Change-Id: Id06dfad1264f781157631c51035ab219cfc30070	4 years ago
Sipwise Jenkins Builder	50d8966747	Release new version 10.3.0.0+0~mr10.3.0.0	4 years ago
Guillem Jover	6b9820eaa2	TT#124273 Use $(MAKE) instead of make Otherwise things like parallel execution will not take effect. Change-Id: Ie63260693c1f03462cb7346d96cf799875e26a0b	4 years ago
Guillem Jover	13bde60c6d	TT#124273 Update packaging for bullseye - Switch to debhelper compat level 13. - Switch to Standards-Version 4.5.1. - Update copyright years. Change-Id: I9bd03ec6fef5c8249194d145000fbbad9d853e0f	4 years ago
Sipwise Jenkins Builder	a29f0919d9	Release new version 10.2.0.0+0~mr10.2.0.0	4 years ago
Michael Prokop	3073c27a40	TT#118659 EFI support: ensure to always have a proper FAT filesystem available If grml-debootstrap detects an existing FAT filesystem on the EFI partition, it doesn't modify/re-create it: \| EFI partition /dev/nvme0n1p2 seems to have a FAT filesystem, not modifying. The underlying check is execution of `fsck.vfat -bn $DEVICE`. Now with fsck.fat from dosfstools v4.1-2 as present in Debian/buster we got: \| root@grml ~ # fsck.vfat -bn /dev/nvme0n1p2 \| fsck.fat 4.1 (2017-01-24) \| 0x41: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt. \| Automatically removing dirty bit. \| There are differences between boot sector and its backup. \| This is mostly harmless. Differences: (offset:original/backup) \| 0:00/eb, 82:00/46, 83:00/41, 84:00/54, 85:00/33, 86:00/32, 87:00/20 \| , 88:00/20, 89:00/20, 510:00/55, 511:00/aa \| Not automatically fixing this. \| Leaving filesystem unchanged. \| 1 root@grml ~ # Now with dosfstools v4.2-1 as present in Debian/bullseye, this might become: \| root@grml ~ # fsck.vfat -bn /dev/nvme0n1p2 \| fsck.fat 4.2 (2021-01-31) \| There are differences between boot sector and its backup. \| This is mostly harmless. Differences: (offset:original/backup) \| 0:00/eb, 65:01/00, 82:00/46, 83:00/41, 84:00/54, 85:00/33, 86:00/32 \| , 87:00/20, 88:00/20, 89:00/20, 510:00/55, 511:00/aa \| Not automatically fixing this. In such situations we end up with an incomplete/broken EFI partition, which breaks within our efivarfs post-script: \| Mounting /dev/nvme0n1p2 on /boot/efi \| mount: /boot/efi: wrong fs type, bad option, bad superblock on /dev/nvme0n1p2, missing codepage or helper program, or other error. 
\| ESC[31;01m-> Failed (rc=1)ESC[0m \| ESC[32;01mESC[0m Removing chroot-script again \| ESC[32;01mESC[0m Executing post-script /etc/debootstrap/post-scripts//efivarfs \| Executing /etc/debootstrap/post-scripts//efivarfs \| Mounting /dev (via bind mount) \| Mounting /boot/efi \| mount: /boot/efi: special device UUID= does not exist. Change-Id: I46939b4e191982a84792f3aca27c6cc415dbdaf4	4 years ago
Michael Prokop	9ec2c3d459	TT#118659 EFI support: provide workaround for grml-debootstrap versions <=0.96 When we run current versions of deployment.sh, which include the fix from commit `f9aea18c`, in combination with grml-debootstrap <=0.96 (as shipped by our Grml deployment ISO version sipwise20210511), deployments using EFI might fail with: \| Mounting /dev/nvme0n1p2 on /boot/efi \| Invoking efibootmgr \| EFI variables are not supported on this system. \| -> Failed (rc=1) \| [...] \| Mounting /dev (via bind mount) \| Mounting efivarfs on /sys/firmware/efi/efivars \| Invoking grub-install with proper EFI environment \| chroot: failed to run command 'grub-install': No such file or directory \| -> Failed (rc=127) This is caused by a failing invocation of efibootmgr from within grml-debootstrap (versions <=0.96 and running with Debian kernel >=5.10), causing grml-debootstrap to exit then. As a result, the EFI specific GRUB steps in grml-debootstrap's grub_install() from within chroot-script don't get executed. Therefore the grub-efi-amd64 package is missing for usage by our efivarfs post-script. By re-introducing the efivarfs pre-script from commit `535e6df3` we can work around this bug. Furthermore, when /boot/efi should be mounted within the target system by our efivarfs post-script, it might fail when /proc isn't available, like: \| # chroot /mnt mount /boot/efi \| mount: /boot/efi: can't find UUID=FE60-5B75. This can be fixed by ensuring to mount /proc, /sys etc before /boot/efi. Then scanning for the UUID device (as configured in /etc/fstab) works as expected. While at it fix a comment regarding grml-debootstrap >=v0.97 vs >=v0.99, as only v0.99 behaves as expected with our EFI requirements. Change-Id: I9db677a06f7e161f971743fc18b034ad3191a449	4 years ago
Michael Prokop	cf01ec9257	TT#118659 Ensure that wiping disk signatures works more reliably Noticed while debugging the EFI situation, that wipefs calls might fail, like: \| # wipefs -a /dev/nvme0n1 \| wipefs: error: /dev/nvme0n1: probing initialization failed: Device or resource busy Using the force option, we could get past this error: \| # wipefs -af /dev/nvme0n1 \| /dev/nvme0n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54 \| /dev/nvme0n1: 8 bytes were erased at offset 0x3a38b2de00 (gpt): 45 46 49 20 50 41 52 54 \| /dev/nvme0n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa But quoting from wipefs(8): \| -f, --force \| Force erasure, even if the filesystem is mounted. This is required in order to erase a partition-table signature on a block device. So while this would work, there might be unexpected side effects. Instead let's use a different approach: if we remove the LVM signatures before running wipefs, it behaves as expected: \| root@grml ~ # pvs \| PV VG Fmt Attr PSize PFree \| /dev/nvme0n1p3 ngcp lvm2 a-- <232.41g <222.41g \| root@grml ~ # vgs \| VG #PV #LV #SN Attr VSize VFree \| ngcp 1 1 0 wz--n- <232.41g <222.41g \| root@grml ~ # lvs \| LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert \| root ngcp -wi-a----- 10.00g \| root@grml ~ # wipefs -a /dev/nvme0n1 \| wipefs: error: /dev/nvme0n1: probing initialization failed: Device or resource busy \| 1 root@grml ~ # vgremove -ff ngcp \| Logical volume "root" successfully removed \| Volume group "ngcp" successfully removed \| root@grml ~ # pvremove /dev/nvme0n1p3 --force --force --yes \| Labels on physical volume "/dev/nvme0n1p3" successfully wiped. 
\| root@grml ~ # wipefs -a /dev/nvme0n1 \| /dev/nvme0n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54 \| /dev/nvme0n1: 8 bytes were erased at offset 0x3a38b2de00 (gpt): 45 46 49 20 50 41 52 54 \| /dev/nvme0n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa \| /dev/nvme0n1: calling ioctl to re-read partition table: Success FTR, when using wipefs' --force option, it still leaves behind the LVM signatures anyways: \| root@grml ~ # pvs \| PV VG Fmt Attr PSize PFree \| /dev/nvme0n1p3 ngcp lvm2 a-- <232.41g <222.41g \| root@grml ~ # vgs \| VG #PV #LV #SN Attr VSize VFree \| ngcp 1 1 0 wz--n- <232.41g <222.41g \| root@grml ~ # lvs \| LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert \| root ngcp -wi-a----- 10.00g \| root@grml ~ # wipefs -a /dev/nvme0n1 \| wipefs: error: /dev/nvme0n1: probing initialization failed: Device or resource busy \| 1 root@grml ~ # wipefs -af /dev/nvme0n1 \| /dev/nvme0n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54 \| /dev/nvme0n1: 8 bytes were erased at offset 0x3a38b2de00 (gpt): 45 46 49 20 50 41 52 54 \| /dev/nvme0n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa \| root@grml ~ # pvs \| PV VG Fmt Attr PSize PFree \| /dev/nvme0n1p3 ngcp lvm2 a-- <232.41g <222.41g \| root@grml ~ # vgs \| VG #PV #LV #SN Attr VSize VFree \| ngcp 1 1 0 wz--n- <232.41g <222.41g \| root@grml ~ # lvs \| LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert \| root ngcp -wi-a----- 10.00g So we'd still have to wipe the LVM signatures, while enabling wipefs' --force option could lead to unexpected behaviors. 
Verified with: \| root@grml ~ # wipefs --version \| wipefs from util-linux 2.36.1 \| root@grml ~ # uname -a \| Linux web01a 5.10.0-6-amd64 #1 SMP Debian 5.10.28-1 (2021-04-09) x86_64 GNU/Linux \| root@grml ~ # lvm version \| head -3 \| LVM version: 2.03.11(2) (2021-01-08) \| Library version: 1.02.175 (2021-01-08) \| Driver version: 4.43.0 Change-Id: Ie4f7b2797d2dcfc27601792d6102a765e4c60c47	4 years ago
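The sequence from the commit above (remove the LVM signatures first, then run wipefs without --force) can be sketched as a dry-run plan. The function name `plan_disk_wipe` and its arguments are hypothetical; the printed commands are the ones quoted in the commit message. Nothing is executed, since running these against a real disk is destructive.

```shell
#!/bin/sh
# Dry-run sketch: print the wipe commands in the order the commit describes.
# plan_disk_wipe is a hypothetical helper; it does not touch any device.
plan_disk_wipe() {
  disk="$1"; vg="$2"; pv="$3"
  # 1. Remove the volume group together with its logical volumes.
  printf 'vgremove -ff %s\n' "$vg"
  # 2. Wipe the LVM label from the physical volume.
  printf 'pvremove %s --force --force --yes\n' "$pv"
  # 3. Only now wipe the remaining signatures, without --force.
  printf 'wipefs -a %s\n' "$disk"
}
```

For the system shown in the commit, `plan_disk_wipe /dev/nvme0n1 ngcp /dev/nvme0n1p3` prints the three commands in that order.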
Sipwise Jenkins Builder	431a1b6dfa	Release new version 10.1.0.0+0~mr10.1.0.0	4 years ago
Michael Prokop	f9aea18c19	TT#118659 Fixup for efivarfs handling with grml-debootstrap v0.98 This is a followup fixup for commit `535e6df` / Change-Id: I5374322cb0a39cfed6563df6c4c30f1eafe560c1 We had to apply fixes due to efivars vs efivarfs in kernel versions >=5.10, and addressed them in commit `535e6df`. Those changes were incomplete though, as the fix included in grml-debootstrap v0.97 is incomplete: while efibootmgr was properly invoked and working, invocation of grub-install doesn't reliably work (as at that time /sys/firmware/efi/efivars is no longer accessible). GRUB installation on EFI systems without /sys/firmware/efi/efivars present warns with "EFI variables are not supported on this system" (see https://sources.debian.org/src/grub2/2.04-20/debian/patches/efi-variable-storage-minimise-writes.patch/?hl=650#L650), though returns with exit code 0. This leaves us with an incomplete and therefore not booting GRUB EFI environment. This used to work with mr9.5.1 only, because there we install(ed) systems using grml-debootstrap v0.96, which is older than the version v0.97 (which included the EFI workaround) we check for in deployment.sh. Since the grml-debootstrap version v0.96 isn't recent enough there, we applied the fallback to our local scripts, which took care of proper installation of GRUB in EFI environments. On the other side, in recent trunk deployments we have grml-debootstrap v0.98 available, which includes the EFI workaround - therefore our local scripts aren't applied. The resulting installation is incomplete, and recent trunk deployments fail to boot in EFI environments. The according fix for grml-debootstrap has been made and is going to be released in the next few days as v0.99. But to ensure that it's working also with older grml-debootstrap versions (and we don't have to rebuild our squashfs environments), the local scripts have been adjusted. 
We don't even need any pre-script at all, instead we handle all of the GRUB EFI installation through /etc/debootstrap/post-scripts/efivarfs. FTR: this issue didn't show up on certain test systems of us, because SW-RAID is used there. In deployment.sh we have special handling of SW-RAID regarding efibootmgr and grub-install, see line 2330 ff. Change-Id: Ifa90fbfab7d69bc331acfec15a6cc9318c84ee8f	4 years ago
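The version check mentioned in the commit above could look roughly like the following sketch. The actual check in deployment.sh is not shown here, so both helper names are hypothetical, and the 0.99 threshold is taken from the statement that only v0.99 behaves as expected. The comparison assumes a `sort` that supports `-V` (GNU coreutils or busybox).

```shell
#!/bin/sh
# version_ge A B: succeeds when version A >= version B.
# Assumption: sort supports -V (version ordering).
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical gate: apply the local EFI scripts for every
# grml-debootstrap version older than 0.99.
needs_local_efi_scripts() {
  ! version_ge "$1" 0.99
}
```

With this gate, versions 0.96 and 0.98 would both get the local scripts, while 0.99 and later would rely on grml-debootstrap itself.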
Michael Prokop	51b4ba2444	TT#82852 Update grml-live version to latest upstream grml-live version v0.38.5 is available for Grml 2021.07(-rc1) usage Change-Id: Ic6ff7b98fff2ce32a07914a975967eec2d5726f2	4 years ago
Michael Prokop	4a5e73a6f6	TT#82852 Update grml2usb version to latest upstream Update to version 0.18.5, as available with Grml 2021.07-rc1 and in Debian/bullseye Change-Id: Idc4ddf633e67e21b4a955b0fede1d81c815e9dc1	4 years ago
Manuel Montecelo	a56c4454a3	TT#105151 Do the renaming eth->neth outside of the "if $NGCP_INSTALLER" block Jobs like daily-build-matrix-debian-boxes build plain Debian machines, not NGCP-based ones. At the moment we're generating the udev-rules for network renaming unconditionally, so we have to do it consistently, either both conditionally and not for "plain" systems, or both unconditionally, so network can be brought up by a correct /etc/network/interfaces after the devices are brought up with the new names. There is a good-ish argument for keeping using eth0, as it is more of a default, but we're already deviating from the default for several years and Debian stable releases by having these names and not ones like "ens18" or "enp4s0f2" which is the default in Debian nowadays, at least since buster. So it is probably better to keep it consistent with our other machines and use "neth*" naming for those too. Change-Id: I6b3b49a1769894580df768abb817ae5196e65963	4 years ago
Manuel Montecelo	eaecf474c2	TT#105151 Stop removing just-generated udev-rules for network in VMs The code removed was enabled when $VAGRANT=true, and this happened when passing "vagrant" parameter to deployment.sh, which is done in places like proxmox-vm-clone job, the base of many of our test machines. VMs do not necessarily have the same hardware configuration, so removing udev-rules for network devices makes sense in principle. Especially since from the beginning we were using network devices named "eth" everywhere, even if in the last years we had to use net.ifnames=0 and udev-rules files in hardware to keep using "eth" names. However, now with mr9.5 and the move to Debian bullseye we have to start using different names, and we settled on the direct translation to "neth". So we need a way to assign whatever network devices the machines come with, including VMs, to names "neth". (If we used the new-permanent device names like ens18 or enp3s0f1 we would have to adapt network.yml and files like network interface, and they would be different across all the different machines (HW and VM) so this is not a better or faster solution to the problem.) So, back to the topic of removal of this udev-rules file: in many cases in our test infra, the machines are built "in place" and then rebooted for upgrades or tests, in principle with the same hardware configuration, so there is no need to remove these files. In cases where the underlying (virtualized) hardware changes, e.g. to use local VirtualBox-based vagrant machines, we will need to adapt the rules for the existing devices. Change-Id: I57e39a2ec6849f3b5bb8f6cf518e2a2923ec19cb	4 years ago
Manuel Montecelo	44750996be	TT#105151 Rename network interfaces eth->neth Using "eth*" names was discouraged for many years, we've been finding problems here and there and working around them with the help of udev-rules (/etc/udev/rules.d/70-persistent-net.rules) to map address interfaces according to PCIIDs, using "net.ifnames=0" as Linux kernel boot parameter when booting in GRUB, etc. Finally we found unsurmountable problems when moving to Debian bullseye (mr9.5), because as we attempt to rename interfaces in some hardware systems that we use, we got race conditions and clashes with renaming that we could not solve in other ways. We had different alternatives: - Use names purely deterministic, based on PCI paths (for example "enp4s0f1"), MAC address or other of the alternatives, which would be "definitive", but given that we have a diversity of hardware and VM installations in customers the devices in different systems would be different, and the fact that it would be easier to mistype or confuse them makes this not ideal. - Use names purely based on functionality, like for example "ha0", "ext0" or "int0". The problem in this case is that we would have to find names that would satisfy everyone (and there's no time for doing this at this point), that different of our system types are quite different (e.g. Pro without bonds, Carrier with bonds and many vlans by default; using the same hardware), and some customers with different installations or needs (e.g. using VMs) have also totally different network configuration -- so any attempt to unify this to make good use of the functionality-based names would be very challenging. - Finally, there's the option to use some symbolic names similar to traditional names like "eth0", but without being exactly this. Popular names in general, although there's no wide consensus, are names like "net0" and "lan0". 
Talking with groups involved in deploying and maintaining the system, the decision was taken to move to names not purely deterministic, and there's no time for purely symbolic (they also didn't express much interest in them), and prefer something more traditional that they are already used to. Instead of names like "net0" or "lan0", they prefer the more direct mapping to existing interfaces like "neth0". This is ugly or slightly discomforting to use for some, but since the main users (among us) of these names prefer them, so be it. It has the advantage of having a very simple and mechanical translation based on the current names, which is an advantage especially at the critical time of upgrading existing systems to the new name. Change-Id: I4a168c7d81e40f609749f77a509d2acb72d3a9d3	4 years ago
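The "simple and mechanical translation" mentioned in the commit above can be illustrated with a small sketch. The helper name `to_neth` is hypothetical and not taken from deployment.sh; it only shows the name mapping, not the udev rules that perform the actual renaming.

```shell
#!/bin/sh
# to_neth NAME: map a traditional "ethN" name to "nethN".
# Hypothetical helper; any other name is returned unchanged,
# so applying it twice is harmless.
to_neth() {
  case "$1" in
    eth[0-9]*) printf 'n%s\n' "$1" ;;
    *)         printf '%s\n' "$1" ;;
  esac
}
```

For example, `to_neth eth0` yields `neth0`, while `to_neth neth0` and `to_neth bond0` pass through untouched.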


500 Commits (mr11.4)