542 Commits (master)
c828990503 | MT#62436 Use virtualbox-guest-additions ISO from upstream on Debian/trixie
virtualbox-guest-additions-iso v7.0.20-1 as present in current Debian/trixie doesn't yet support kernel v6.12.22-1 (being the current kernel version in Debian/trixie), while upstream supports kernel 6.12 as of VirtualBox 7.1.4. Reported towards Debian as https://bugs.debian.org/1104024 FTR: | mprokop@jenkins1 ~ % cd /var/www/files | mprokop@jenkins1 ~www/files % wget https://download.virtualbox.org/virtualbox/7.1.8/VBoxGuestAdditions_7.1.8.iso | [...] | mprokop@jenkins1 ~www/files % curl -s https://download.virtualbox.org/virtualbox/7.1.8/SHA256SUMS | sha256sum -c --ignore-missing | VBoxGuestAdditions_7.1.8.iso: OK Change-Id: I32aa7806e375c4b85084a99d5a6903f632807694 |
1 day ago

112f883d49 | MT#62436 ensure_packages_installed: to not get stuck on conf file conflicts
Our deployment ISO might be outdated and when installing any additional packages, we might get stuck in dpkg: | +10:10:34 (netscript.grml:311): ensure_packages_installed(): DEBIAN_FRONTEND=noninteractive | +10:10:34 (netscript.grml:311): ensure_packages_installed(): apt-get -o dir::cache=/tmp/ngcp-deployment-ensure-tmp.BKSocMV4KB/cachedir -o dir::state=/tmp/ngcp-deployment-ensure-tmp.BKSocMV4KB/statedir -o dir::etc=/tmp/ngcp-deployment-ensure-tmp.BKSocMV4KB/etc -o dir::e | tc::trustedparts=/etc/apt/trusted.gpg.d/ -y --no-install-recommends install jq | Reading package lists... | Building dependency tree... | The following additional packages will be installed: | [...] | Get:33 https://debian.sipwise.com/debian trixie/main amd64 libnss-myhostname amd64 257.5-2 [113 kB] | Preconfiguring packages ... | Fetched 25.3 MB in 4s (6777 kB/s) | (Reading database ... 32224 files and directories currently installed.) | Preparing to unpack .../base-files_13.7_amd64.deb ... | Unpacking base-files (13.7) over (12.4+deb12u10) ... | Setting up base-files (13.7) ... | Installing new version of config file /etc/debian_version ... | | Configuration file '/etc/issue' | ==> Modified (by you or by a script) since installation. | ==> Package distributor has shipped an updated version. | What would you like to do about it ? Your options are: | Y or I : install the package maintainer's version | N or O : keep your currently-installed version | D : show the differences between the versions | Z : start a shell to examine the situation | The default action is to keep your current version. | | *** issue (Y/I/N/O/D/Z) [default=N] ? # Avoid this, by setting DPKG option `--force-confnew`. Change-Id: Ic5fed3dbe4744e07290159cec6952468c0557c29 |
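The fix boils down to passing the dpkg option through apt-get; a minimal sketch of that pattern (not the exact ensure_packages_installed invocation, jq is just the example package from the log above):

```bash
# Install non-interactively and always take the package maintainer's version
# of changed conffiles, so dpkg never stops to ask about /etc/issue and friends.
DEBIAN_FRONTEND=noninteractive apt-get -y --no-install-recommends \
  -o DPkg::Options::=--force-confnew \
  install jq
```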
1 day ago

779b43b915 | MT#62436 Support Debian/trixie in ensure_packages_installed
vboxadd-service.service fails on our Debian/trixie systems: | root@spce:~# lsb_release -c | Codename: trixie | | root@spce:~# systemctl --failed | UNIT LOAD ACTIVE SUB DESCRIPTION | ● vboxadd-service.service loaded failed failed VirtualBox Guest Additions Services Daemon | | Legend: LOAD → Reflects whether the unit definition was properly loaded. | ACTIVE → The high-level unit activation state, i.e. generalization of SUB. | SUB → The low-level unit activation state, values depend on unit type. | | 1 loaded units listed. | | root@spce:~# sudo systemctl status vboxadd-service.service | × vboxadd-service.service - VirtualBox Guest Additions Services Daemon | Loaded: loaded (/etc/systemd/system/vboxadd-service.service; disabled; preset: disabled) | Drop-In: /etc/systemd/system/vboxadd-service.service.d | └─override.conf | Active: failed (Result: exit-code) since Thu 2025-04-24 09:08:15 CEST; 34min ago | Invocation: 4e151a29f0054a90a717a928fcfb3f8d | Mem peak: 2.2M | CPU: 17ms | | Apr 24 09:08:15 spce systemd[1]: Starting vboxadd-service.service... | Apr 24 09:08:15 spce vboxadd-service[1934]: vboxadd-service.sh: Starting VirtualBox Guest Addition service. | Apr 24 09:08:15 spce vboxadd-service.sh[1937]: Starting VirtualBox Guest Addition service. | Apr 24 09:08:15 spce vboxadd-service[1940]: VBoxService: error: VbglR3Init failed with rc=VERR_FILE_NOT_FOUND | Apr 24 09:08:15 spce vboxadd-service.sh[1943]: VirtualBox Guest Addition service started. | Apr 24 09:08:15 spce systemd[1]: vboxadd-service.service: Control process exited, code=exited, status=1/FAILURE | Apr 24 09:08:15 spce systemd[1]: vboxadd-service.service: Failed with result 'exit-code'. | Apr 24 09:08:15 spce systemd[1]: Failed to start vboxadd-service.service. | | root@spce:~# cat /etc/systemd/system/vboxadd.service.d/override.conf | [Unit] | ConditionVirtualization=oracle | | root@spce:~# cat /var/log/vboxadd-setup.log | Building the main Guest Additions 7.0.6 module for kernel 6.12.22-amd64. | Error building the module. Build output follows. | make V=1 CONFIG_MODULE_SIG= CONFIG_MODULE_SIG_ALL= -C /lib/modules/6.12.22-amd64/build M=/tmp/vbox.0 SRCROOT=/tmp/vbox.0 -j2 modules | make[1]: warning: -j2 forced in submake: resetting jobserver mode. | [...] | [,,,] /tmp/vbox.0/VBoxGuest-common.c | /tmp/vbox.0/VBoxGuest-linux.c:196:21: error: ‘no_llseek’ undeclared here (not in a function); did you mean ‘noop_llseek’? | 196 | llseek: no_llseek, | | ^~~~~~~~~ | | noop_llseek | /tmp/vbox.0/VBoxGuest-linux.c: In function ‘vgdrvLinuxParamLogGrpSet’: | /tmp/vbox.0/VBoxGuest-linux.c:1364:9: error: implicit declaration of function ‘strlcpy’; did you mean ‘strncpy’? [-Wimplicit-function-declaration] | 1364 | strlcpy(&g_szLogGrp[0], pszValue, sizeof(g_szLogGrp)); | | ^~~~~~~ | | strncpy | make[2]: *** [/usr/src/linux-headers-6.12.22-common/scripts/Makefile.build:234: /tmp/vbox.0/VBoxGuest-linux.o] Error 1 | make[2]: *** Waiting for unfinished jobs.... | [...] We get virtualbox-guest-additions-iso v7.0.6-1 for Debian stable/bookworm, but virtualbox-guest-additions-iso v7.0.20-1 is available in current Debian testing AKA trixie. Ensure we use the package from trixie for trixie based systems, even though the the VirtualBox Guest Additions v7.0.20 don't work for kernel 6.12.22 either, yet. Also adjust ensure_packages_installed to fail installation, if we're using a yet unknown/unexpected Debian release, to not fall back to Debian/bookworm, to prevent issue like it has been observed here. 
See MT#60815 for main tracking issue WRT Debian/trixie Change-Id: I030525d37edbe1cf75065d021b51d38273ce81ef |
1 day ago

b2e2954852 | MT#62436 Fix shellcheck issues + parse IP information programmatically
As reported when sending new deployment-iso reviews, triggered by newer docker image / shellcheck: | not ok 1 source/templates/scripts/includes/deployment.sh:1543:10: warning: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a. [SC2206] | not ok 2 source/templates/scripts/includes/deployment.sh:1903:22: warning: Prefer mapfile or read -a to split command output (or quote to avoid splitting). [SC2207] | not ok 3 source/templates/scripts/includes/deployment.sh:2275:20: warning: Prefer mapfile or read -a to split command output (or quote to avoid splitting). [SC2207] | not ok 4 source/templates/scripts/includes/deployment.sh:2486:12: note: Not following: ./etc/profile.d/puppet-agent.sh was not specified as input (see shellcheck -x). [SC1091] Let's take this as a chance to properly parse ip(8) output via its JSON output, instead of awk/sed magic. Change-Id: I723959626fb514ab9e57202b0e5f415b411f5a01 |
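A minimal sketch of both ideas, assuming jq is available and using eth0 as an example interface (the real deployment.sh code differs):

```bash
# Parse ip(8) output as JSON instead of awk/sed text scraping.
dev=eth0
ipaddr=$(ip -json addr show dev "$dev" \
  | jq -r '.[0].addr_info[] | select(.family == "inet") | .local')

# SC2206/SC2207-style fix: split command output into an array robustly.
mapfile -t interfaces < <(ip -json link show | jq -r '.[].ifname')

echo "IPv4 on ${dev}: ${ipaddr}; interfaces: ${interfaces[*]}"
```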
1 day ago

4b7a0e518b | Release new version 13.4.0.0+0~mr13.4.0.0
2 weeks ago

dfd46069e7 | MT#62436 Remove workaround for vboxadd services
We have made these services conditional on running inside a VirtualBox VM, so we do not need to remove them anymore. Change-Id: I6dc563688ba5b0c5e935b0cb88767fcb05ab9a19 |
3 weeks ago

8dbd67c82d | Release new version 13.3.0.0+0~mr13.3.0.0
3 months ago

6dac69d9df | Release new version 13.2.0.0+0~mr13.2.0.0
5 months ago

41029ed891 | MT#61264 Mark EFI partition as such only when running in an EFI environment
On Debian/trixie we get a failing efi.mount systemd unit:
| root@sp1:~# systemctl --failed
| UNIT LOAD ACTIVE SUB DESCRIPTION
| ● efi.mount loaded failed failed EFI System Partition Automount
|
| Legend: LOAD → Reflects whether the unit definition was properly loaded.
| ACTIVE → The high-level unit activation state, i.e. generalization of SUB.
| SUB → The low-level unit activation state, values depend on unit type.
|
| 1 loaded units listed.
|
| root@sp1:~# systemctl status efi.mount
| × efi.mount - EFI System Partition Automount
| Loaded: loaded (/run/systemd/generator.late/efi.mount; generated)
| Active: failed (Result: exit-code) since Fri 2024-11-15 17:20:59 CET; 28min ago
| Invocation: 62c7b659dfd540e294f4b1f6fcda5e13
| TriggeredBy: ● efi.automount
| Where: /efi
| What: /dev/disk/by-diskseq/9-part2
| Docs: man:systemd-gpt-auto-generator(8)
| Mem peak: 1.5M
| CPU: 8ms
|
| Nov 15 17:20:59 sp1 systemd[1]: Mounting efi.mount - EFI System Partition Automount...
| Nov 15 17:20:59 sp1 mount[631]: mount: /efi: wrong fs type, bad option, bad superblock on /dev/sda2, missing codepage or helper program, or other error.
| Nov 15 17:20:59 sp1 mount[631]: dmesg(1) may have more information after failed mount system call.
| Nov 15 17:20:59 sp1 systemd[1]: efi.mount: Mount process exited, code=exited, status=32/n/a
| Nov 15 17:20:59 sp1 systemd[1]: efi.mount: Failed with result 'exit-code'.
| Nov 15 17:20:59 sp1 systemd[1]: Failed to mount efi.mount - EFI System Partition Automount.
|
| root@sp1:~# ls -la /efi
| ls: cannot open directory '/efi': No such device
|
| root@sp1:~# ls -la /dev/disk/by-diskseq/9-part2
| lrwxrwxrwx 1 root root 10 Nov 15 17:20 /dev/disk/by-diskseq/9-part2 -> ../../sda2
|
| root@sp1:~# blkid /dev/sda2
| /dev/sda2: PARTLABEL="EFI System" PARTUUID="fa67b52e-c018-401d-ac71-fad324cad193"
The efi.mount systemd unit is automatically generated by
systemd-gpt-auto-generator. Quoting from systemd-gpt-auto-generator(8):
| The ESP is mounted to /boot/ if that directory exists and is not used
| for XBOOTLDR, and otherwise to /efi/
This got introduced as of systemd v254, see
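The fix from the subject line amounts to tagging the partition as an ESP only when the installer itself runs in an EFI environment; a rough sketch of that idea (device, partition number and the sgdisk call are illustrative assumptions):

```bash
# Only mark partition 2 as "EFI System" (type ef00) when booted via EFI;
# otherwise keep it as a plain Linux partition so systemd-gpt-auto-generator
# doesn't generate an efi.mount unit for it.
if [ -d /sys/firmware/efi ]; then
  sgdisk --typecode=2:ef00 /dev/sda
else
  sgdisk --typecode=2:8300 /dev/sda
fi
```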
5 months ago

cfe9cceb6a | MT#61271 trixie: adjust sshd_config after system is installed
If we set up /etc/ssh/sshd_config early in early system deployment, we end up with an empty /etc/ssh/sshd_config configuration file with only our own changes: | root@spce:~# cat /etc/ssh/sshd_config | # added by deployment.sh | PerSourcePenalties no | # end of deployment.sh changes | ### Added by ngcp-installer | PermitRootLogin yes The other defaults of sshd are OK for us, but for automated SSH logins we also need: AuthorizedKeysFile %h/.ssh/authorized_keys %h/.ssh/sipwise_vagrant_key And for SCP-ing files we also need: Subsystem sftp /usr/lib/openssh/sftp-server Otherwise our Jenkins job fail due to failing ssh/scp actions. So instead move our trixie specific code in deployment.sh for adjusting /etc/ssh/sshd_config to be executed *after* installing base system. Then the openssh-server package sets up /etc/ssh/sshd_config as expected, and we only extend its configuration then. While at it, explicitly mark beginning and end of our changes. Change-Id: I68a235b55e9cf18c39e9034b7f3b2ed0ffd237f0 |
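A minimal sketch of the resulting approach, assuming the freshly installed system is mounted under /mnt (the settings shown are the ones mentioned above):

```bash
# Extend the sshd_config shipped by openssh-server instead of replacing it,
# and clearly mark the beginning and end of our additions.
cat >> /mnt/etc/ssh/sshd_config << 'EOF'
# added by deployment.sh
PerSourcePenalties no
AuthorizedKeysFile %h/.ssh/authorized_keys %h/.ssh/sipwise_vagrant_key
# end of deployment.sh changes
EOF
```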
6 months ago

6eee97de7b | MT#61265 trixie: avoid SSH login failures due to OpenSSH penalize feature
Our https://jenkins.mgm.sipwise.com/job/daily-build-matrix-debian-boxes/ matrix no longer provides builds for debian/trixie, because its daily-build-images subproject Jenkins job with its proxmox-vm-clean-fs job failed to run. After running proxmox-vm-clean-fs under `set -x`, and also overriding the ssh_wrapper function with `ssh -v ...`, I managed to grab this from the Jenkins job execution: | + ssh -v -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o 'ServerAliveInterval 10' -o 'ConnectTimeout 15' 192.168.210.101 'rm -vf /etc/udev/rules.d/70-persistent-net.rules' | OpenSSH_9.2p1 Debian-2+deb12u3, OpenSSL 3.0.14 4 Jun 2024 | debug1: Reading configuration data /var/lib/jenkins/.ssh/config | debug1: /var/lib/jenkins/.ssh/config line 7: Applying options for 192.168.* | debug1: Reading configuration data /etc/ssh/ssh_config | debug1: /etc/ssh/ssh_config line 50: Applying options for * | debug1: /etc/ssh/ssh_config line 57: Deprecated option "useroaming" | debug1: Connecting to 192.168.210.101 [192.168.210.101] port 22. | debug1: fd 3 clearing O_NONBLOCK | debug1: Connection established. | debug1: identity file /var/lib/jenkins/.ssh/id_rsa_sipwise type 0 | debug1: identity file /var/lib/jenkins/.ssh/id_rsa_sipwise-cert type -1 | debug1: identity file /var/lib/jenkins/.ssh/id_rsa type 0 | debug1: identity file /var/lib/jenkins/.ssh/id_rsa-cert type -1 | debug1: identity file /var/lib/jenkins/.ssh/id_dsa type -1 | debug1: identity file /var/lib/jenkins/.ssh/id_dsa-cert type -1 | debug1: Local version string SSH-2.0-OpenSSH_9.2p1 Debian-2+deb12u3 | debug1: kex_exchange_identification: banner line 0: Not allowed at this time The `Not allowed at this time` pointed to a new OpenSSH feature, which triggered the regression for us. OpenSSH introduced options to penalize undesirable behavior, see https://undeadly.org/cgi?action=article;sid=20240607042157 and https://www.openssh.com/releasenotes.html#9.9p1 and https://sources.debian.org/src/openssh/1:9.9p1-1/sshd.c/?hl=576#L573 This is now present as of openssh-server v1:9.9p1-1 since end of September 2024 also in Debian/trixie. Now, when too many SSH logins fail, a client system can't necessarily no longer connect via SSH due this new penalty behavior. 
And indeed, within our Jenkins job "daily-build-install-vm" we try to collect several log files through our grab_log and SSH wrapper: | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /mnt/tmp/ngcp-installer-debug.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/ngcp-installer-debug.log | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/ngcp-installer-debug.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/ngcp-installer-debug.log | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /mnt/tmp/ngcp-installer.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/ngcp-installer.log | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/ngcp-installer.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/ngcp-installer.log | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /tmp/ngcp-installer-cmdline.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/ngcp-installer-cmdline.log | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /mnt/var/log/deployment.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/deployment.log | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/deployment.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/deployment.log | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /mnt/var/log/grml-debootstrap.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/grml-debootstrap.log | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/grml-debootstrap.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/grml-debootstrap.log | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/syslog /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/syslog | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/boot /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/boot We even execute this grab_log wrapper twice: once for the running Grml live system, and once when we booted into the actually deployed system. This works fine for the Grml live system situation, but as root logins aren't allowed by default in OpenSSH since quite some time, all the sipwise-ssh-copier runs with user/password against a plain Debian system then fail. As a consequence, we lock ourselves out of the system with all those SSH login failures, and the Jenkins job proxmox-vm-clean-fs then runs into the OpenSSH penalty, which causes the trixie/debian job to fail. We use our Debian images as base for further configuration, where we control the sshd_config file through our ngcpcfg system anyways, so the `PerSourcePenalties no` setting is supposed to disappear then. FTR: We could also enable `PermitRootLogin yes` in sshd_config to get the grab_log working, though this didn't have any relevance for us so far. Disabling only the `PerSourcePenalties` feature feels like a better trade-off, at least security wise, for now. Change-Id: Ibf16019b4787cc63d450501c8bccebeac77dd9f1 |
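A quick way to confirm the effective setting on an affected host (a sketch, assuming OpenSSH 9.8 or newer):

```bash
# Dump the effective sshd configuration and check the penalty settings;
# "persourcepenalties no" is what we want on these build images.
sshd -T | grep -i persourcepenalt
```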
6 months ago

4debc55f6b | Release new version 13.1.0.0+0~mr13.1.0.0
7 months ago

862c84ccc6 | MT#60698 Add mr12.5 LTS key to bootstrap
Now it contains: pub rsa4096 2015-03-05 [SC] [expires: 2029-10-12] 68A702B1FD8E422AAAA1ADA3773236EFF411A836 uid [ unknown] Sipwise GmbH (Sipwise Repository Key) <support@sipwise.com> sub rsa4096 2015-03-05 [E] [expires: 2029-10-12] pub rsa4096 2011-06-06 [SC] F7B8A739CE638D719A078C9859104633EE5E097D uid [ unknown] Sipwise autobuilder (Used to sign packages for autobuild) <development@sipwise.com> sub rsa4096 2011-06-06 [E] pub rsa4096 2022-05-31 [SCEA] [expires: 2032-05-28] 39EB73D5B54870181632E48786C3B4395CB844A2 uid [ unknown] Sipwise autobuilder <development@sipwise.com> pub rsa4096 2023-08-04 [SCEA] [expires: 2033-08-01] F0A595D85C375447BB09F25E34A72CE4979CA98A uid [ unknown] Sipwise autobuilder <development@sipwise.com> pub rsa4096 2024-08-14 [SCEA] [expires: 2034-08-12] A164D3A12AC0F6AB8F737EF66D1B7D01D2AD9C24 uid [ unknown] Sipwise autobuilder <development@sipwise.com> Change-Id: I142de8611572fd35fa6bbac3695b236a1b3f9a97 |
8 months ago

88efd48cad | Release new version 13.0.0.0+0~mr13.0.0.0
9 months ago

cf94193f88 | MT#60284 Ensure to start qemu-guest-agent only after package got installed
We install the qemu-guest-agent package in ensure_packages_installed().
Therefore, try to start the qemu-guest-agent service only afterwards.
Fixup for commit
11 months ago

4a292ab4be | MT#60284 Only check whether /dev/virtio-ports/org.qemu.guest_agent.0 exists
/dev/virtio-ports/org.qemu.guest_agent.0 usually is a symlink to the
character device /dev/vport1p1. So adjust the device check accordingly
and only verify that it exists, but don't expect any special file type.
This actually matches the behavior we also have in ngcp-installer.
Fixup for commit
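A minimal sketch of such an existence-only check (the service start is illustrative):

```bash
# /dev/virtio-ports/org.qemu.guest_agent.0 is typically a symlink to a
# character device like /dev/vport1p1, so just test that it exists at all.
if [ -e /dev/virtio-ports/org.qemu.guest_agent.0 ]; then
  systemctl start qemu-guest-agent
fi
```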
11 months ago

82e6638b40 | MT#60284 Make sure qemu-guest-agent is available
Now that we enabled the QEMU Guest Agent option for our PVE VMs, we need
to have qemu-guest-agent present and active. Otherwise the VMs might
fail to shut down, like with our debian/sipwise/docker Debian systems
which are created via
https://jenkins.mgm.sipwise.com/job/daily-build-matrix-debian-boxes/:
| [proxmox-vm-shutdown] $ /bin/sh -e /tmp/env-proxmox-vm-shutdown7956268380939677154.sh
| [environment-script] Adding variable 'vm1reset' with value 'NO'
| [environment-script] Adding variable 'vm2' with value 'none'
| [environment-script] Adding variable 'vm1' with value 'none'
| [environment-script] Adding variable 'vm2reset' with value 'NO'
| [proxmox-vm-shutdown] $ /bin/bash /tmp/jenkins14192704603218787414.sh
| Using safe VM 'shutdown' for modern releases (mr6.5+). Executing action 'shutdown'...
| Shutting down VM 106
| Build timed out (after 10 minutes). Marking the build as aborted.
| Build was aborted
| [WS-CLEANUP] Deleting project workspace...
Let's make sure qemu-guest-agent is available in our Grml live system.
We added qemu-guest-agent to the package list of our Grml Sipwise ISO
(see git rev
11 months ago

24841c09eb | MT#60283 Update grml-live to latest stable release v0.47.7
Change-Id: Ia157034ebfadb884f475802046a596937b4afac4 |
11 months ago

c30b0b5af6 | MT#60283 Update grml2usb to latest stable release v0.19.2
Change-Id: Ic74d4f00c5b67baf135f6249acc81dfc214ac77c |
11 months ago

65c3fea4c5 | MT#60284 Provide qemu-guest-agent in our Grml Sipwise ISO
Otherwise we lack qemu-guest-agent integration in our VMs when running Grml live system. Change-Id: Ie61d85c36dfbddddfbd59b46b6bfc4f0e98b587a |
11 months ago

aff8154df7 | Release new version 12.5.0.0+0~mr12.5.0.0
11 months ago

6cf4786735 | MT#59872 Remove NGCP_PXE_INSTALL variable
With this variable we had some tricks in ngcp-initial-configuration if the Pro sp2 node is installed via iPXE/cm image. Now we support installation of sp2 via iPXE only, so there is no need to pass this variable. But we need to keep the parent ngcppxeinstall parameter, as we need this information for netcardconfig. Change-Id: I20491289917cbb427ad6f5670f108c632838be71
1 year ago

0fb8327415 | MT#59872 Remove Pro sp2 from boot menu
We are dropping the scenario where the sp2 node is installed from a CD image, so remove the corresponding part of the code. Change-Id: Idced6b43a21add903dca070aa68f84b77acba28e
1 year ago

0a91a49826 | MT#58014 Remove support for fetching OpenPGP certificates from keyservers
The code trying to fetch the OpenPGP certificate from a keyserver has
been non-functional for a while as the GPG_KEY_SERVER variable was
removed in commit
1 year ago

362f7cbea1 | Release new version 12.4.0.0+0~mr12.4.0.0
1 year ago

e99f33e11a | TT#118659 Do not fail when deploying SW-RAID if no RAID was present yet
Followup fix for commit
1 year ago

1d59d89d04 | TT#118659 Do not abort on disk partition listing failures
We identify any existing partitions of the disk we need to wipe via:
| root@license42 ~ # lsblk --noheadings --output KNAME /dev/sda
| sda
| sda1
| sda2
| sda3
| root@license42 ~ # blockdevice="/dev/sda"
| root@license42 ~ # lsblk --noheadings --output KNAME /dev/sda | grep -v "^${blockdevice#\/dev\/}$"
| sda1
| sda2
| sda3
This might fail though, if there are no partitions present:
| root@license42 ~ # dd if=/dev/zero of=/dev/sda bs=10M count=1
| 1+0 records in
| 1+0 records out
| 10485760 bytes (10 MB, 10 MiB) copied, 0.0487036 s, 215 MB/s
| root@license42 ~ # pvremove /dev/sda --force --force --yes
| Labels on physical volume "/dev/sda" successfully wiped.
| root@license42 ~ # blockdevice="/dev/sda"
| root@license42 ~ # lsblk --noheadings --output KNAME /dev/sda | grep -v "^${blockdevice#\/dev\/}$"
| 1 root@license42 ~ #
Ending up in our daily-build-install-vm Jenkins jobs like this:
| +13:08:19 (netscript.grml:489): clear_partition_table(): echo 'Removing possibly existing LVM/PV label from /dev/sda'
| +13:08:19 (netscript.grml:490): clear_partition_table(): pvremove /dev/sda --force --force --yes
| Labels on physical volume "/dev/sda" successfully wiped.
| ++13:08:19 (netscript.grml:495): clear_partition_table(): grep -v '^sda$'
| ++13:08:19 (netscript.grml:495): clear_partition_table(): lsblk --noheadings --output KNAME /dev/sda
| +++13:08:19 (netscript.grml:495): clear_partition_table(): wait_exit
| +++13:08:19 (netscript.grml:339): wait_exit(): local e_code=1
| +++13:08:19 (netscript.grml:340): wait_exit(): [[ 1 -ne 0 ]]
| +++13:08:19 (netscript.grml:341): wait_exit(): set_deploy_status error
| +++13:08:19 (netscript.grml:103): set_deploy_status(): '[' -n error ']'
| +++13:08:19 (netscript.grml:104): set_deploy_status(): echo error
| Wiping disk signatures from /dev/sda
| +++13:08:19 (netscript.grml:343): wait_exit(): trap '' 1 2 3 6 15 ERR EXIT
| +++13:08:19 (netscript.grml:344): wait_exit(): status_wait
| +++13:08:19 (netscript.grml:329): status_wait(): [[ -n 0 ]]
| +++13:08:19 (netscript.grml:329): status_wait(): [[ 0 != 0 ]]
Followup change for
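A minimal sketch of making that listing step tolerant of an empty result (variable names are illustrative):

```bash
blockdevice="/dev/sda"
# grep -v exits non-zero when it filters everything away (i.e. no partitions
# exist), which would otherwise trip the ERR trap; '|| true' keeps the script
# going in that case.
mapfile -t partitions < <(lsblk --noheadings --output KNAME "${blockdevice}" \
  | grep -v "^${blockdevice#/dev/}$" || true)
echo "found ${#partitions[@]} partition(s) on ${blockdevice}"
```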
1 year ago

fc9b43f92e | TT#118659 Fix re-deploying over existing SW-RAID arrays
Fresh deployments with SW-RAID (Software-RAID) might fail if the present disks were already part of an SW-RAID setup: | Error: disk nvme1n1 seems to be part of an existing SW-RAID setup. We could also reproduce this inside PVE VMs: | mdadm: /dev/md/127 has been started with 2 drives. | Error: disk sda seems to be part of an existing SW-RAID setup. This is caused by the following behavior: | + SWRAID_DEVICE="/dev/md0" | [...] | + mdadm --assemble --scan | + true | + [[ -b /dev/md0 ]] | + for disk in "${SWRAID_DISK1}" "${SWRAID_DISK2}" | + grep -q nvme1n1 /proc/mdstat | + die 'Error: disk nvme1n1 seems to be part of an existing SW-RAID setup.' | + echo 'Error: disk nvme1n1 seems to be part of an existing SW-RAID setup.' | Error: disk nvme1n1 seems to be part of an existing SW-RAID setup. By default we expect and set the SWRAID_DEVICE to be /dev/md0. But only "local" arrays get assembled as /dev/md0 and upwards, whereas "foreign" arrays start at md127 downwards. This is exactly what we get when booting our deployment live system on top of an existing installation, and assemble existing SW-RAIDs (to not overwrite unexpected disks by mistake): | root@grml ~ # lsblk | NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS | loop0 7:0 0 428.8M 1 loop /usr/lib/live/mount/rootfs/ngcp.squashfs | /run/live/rootfs/ngcp.squashfs | nvme0n1 259:0 0 447.1G 0 disk | └─md127 9:127 0 447.1G 0 raid1 | ├─md127p1 259:14 0 18G 0 part | ├─md127p2 259:15 0 18G 0 part | ├─md127p3 259:16 0 405.6G 0 part | ├─md127p4 259:17 0 512M 0 part | ├─md127p5 259:18 0 4G 0 part | └─md127p6 259:19 0 1G 0 part | nvme1n1 259:7 0 447.1G 0 disk | └─md127 9:127 0 447.1G 0 raid1 | ├─md127p1 259:14 0 18G 0 part | ├─md127p2 259:15 0 18G 0 part | ├─md127p3 259:16 0 405.6G 0 part | ├─md127p4 259:17 0 512M 0 part | ├─md127p5 259:18 0 4G 0 part | └─md127p6 259:19 0 1G 0 part | | root@grml ~ # lsblk -l -n -o TYPE,NAME | loop loop0 | raid1 md127 | disk nvme0n1 | disk nvme1n1 | part md127p1 | part md127p2 | part md127p3 | part md127p4 | part md127p5 | part md127p6 | | root@grml ~ # cat /proc/cmdline | vmlinuz initrd=initrd.img swraiddestroy swraiddisk2=nvme0n1 swraiddisk1=nvme1n1 [...] Let's identify existing RAID devices and check their configuration by going through the disks and comparing them with our SWRAID_DISK1 and SWRAID_DISK2. If they don't match with each other, we stop execution to prevent any possible data damage. Furthermore, we need to assemble the mdadm array without relying on a possibly existing local `/etc/mdadm/mdadm.conf` configuration file. 
Otherwise assembling might fail: | root@grml ~ # cat /proc/mdstat | Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] | unused devices: <none> | root@grml ~ # lsblk -l -n -o TYPE,NAME | awk '/^raid/ {print $2}' | root@grml ~ # grep ARRAY /etc/mdadm/mdadm.conf | ARRAY /dev/md/127 metadata=1.0 UUID=0d44774e:7269bac6:2f02f337:4551597b name=localhost:127 | root@grml ~ # mdadm --assemble --scan | 2 root@grml ~ # mdadm --assemble --scan --verbose | mdadm: looking for devices for /dev/md/127 | mdadm: No super block found on /dev/loop0 (Expected magic a92b4efc, got 800989c0) | mdadm: no RAID superblock on /dev/loop0 | mdadm: No super block found on /dev/nvme1n1p3 (Expected magic a92b4efc, got 00000000) | mdadm: no RAID superblock on /dev/nvme1n1p3 | mdadm: No super block found on /dev/nvme1n1p2 (Expected magic a92b4efc, got 00000000) | mdadm: no RAID superblock on /dev/nvme1n1p2 | mdadm: No super block found on /dev/nvme1n1p1 (Expected magic a92b4efc, got 000080fe) | mdadm: no RAID superblock on /dev/nvme1n1p1 | mdadm: No super block found on /dev/nvme1n1 (Expected magic a92b4efc, got 00000000) | mdadm: no RAID superblock on /dev/nvme1n1 | mdadm: No super block found on /dev/nvme0n1p3 (Expected magic a92b4efc, got 00000000) | mdadm: no RAID superblock on /dev/nvme0n1p3 | mdadm: No super block found on /dev/nvme0n1p2 (Expected magic a92b4efc, got 00000000) | mdadm: no RAID superblock on /dev/nvme0n1p2 | mdadm: No super block found on /dev/nvme0n1p1 (Expected magic a92b4efc, got 000080fe) | mdadm: no RAID superblock on /dev/nvme0n1p1 | mdadm: No super block found on /dev/nvme0n1 (Expected magic a92b4efc, got 00000000) | mdadm: no RAID superblock on /dev/nvme0n1 | 2 root@grml ~ # mdadm --assemble --scan --config /dev/null | mdadm: /dev/md/grml:127 has been started with 2 drives. | root@grml ~ # lsblk -l -n -o TYPE,NAME | awk '/^raid/ {print $2}' | md127 By running mdadm assemble with `--config /dev/null`, we prevent consideration and usage of a possibly existing /etc/mdadm/mdadm.conf configuration file. Example output of running the new code: | [...] | mdadm: No arrays found in config file or automatically | NOTE: default SWRAID_DEVICE set to /dev/md0 though we identified active md127 | NOTE: will continue with '/dev/md127' as SWRAID_DEVICE for mdadm cleanup | Wiping signatures from /dev/md127 | /dev/md127: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31 | Removing mdadm device /dev/md127 | Stopping mdadm device /dev/md127 | mdadm: stopped /dev/md127 | Zero-ing superblock from /dev/nvme1n1 | mdadm: Unrecognised md component device - /dev/nvme1n1 | Zero-ing superblock from /dev/nvme0n1 | mdadm: Unrecognised md component device - /dev/nvme0n1 | NOTE: modified RAID array detected, setting SWRAID_DEVICE back to original setting '/dev/md0' | Removing possibly existing LVM/PV label from /dev/nvme1n1 | Cannot use /dev/nvme1n1: device is partitioned | Removing possibly existing LVM/PV label from /dev/nvme1n1p1 | Cannot use /dev/nvme1n1p1: device is too small (pv_min_size) | Removing possibly existing LVM/PV label from /dev/nvme1n1p2 | Labels on physical volume "/dev/nvme1n1p2" successfully wiped. 
| Removing possibly existing LVM/PV label from /dev/nvme1n1p3 | Cannot use /dev/nvme1n1p3: device is an md component | Wiping disk signatures from /dev/nvme1n1 | /dev/nvme1n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54 | /dev/nvme1n1: 8 bytes were erased at offset 0x6fc86d5e00 (gpt): 45 46 49 20 50 41 52 54 | /dev/nvme1n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa | /dev/nvme1n1: calling ioctl to re-read partition table: Success | 1+0 records in | 1+0 records out | 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0027866 s, 376 MB/s | Removing possibly existing LVM/PV label from /dev/nvme0n1 | Cannot use /dev/nvme0n1: device is partitioned | Removing possibly existing LVM/PV label from /dev/nvme0n1p1 | Cannot use /dev/nvme0n1p1: device is too small (pv_min_size) | Removing possibly existing LVM/PV label from /dev/nvme0n1p2 | Labels on physical volume "/dev/nvme0n1p2" successfully wiped. | Removing possibly existing LVM/PV label from /dev/nvme0n1p3 | Cannot use /dev/nvme0n1p3: device is an md component | Wiping disk signatures from /dev/nvme0n1 | /dev/nvme0n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54 | /dev/nvme0n1: 8 bytes were erased at offset 0x6fc86d5e00 (gpt): 45 46 49 20 50 41 52 54 | /dev/nvme0n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa | /dev/nvme0n1: calling ioctl to re-read partition table: Success | 1+0 records in | 1+0 records out | 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00278955 s, 376 MB/s | Creating partition table | Get path of EFI partition | pvdevice is now available: /dev/nvme1n1p2 | The operation has completed successfully. | The operation has completed successfully. | pvdevice is now available: /dev/nvme1n1p3 | pvdevice is now available: /dev/nvme0n1p3 | mdadm: /dev/nvme1n1p3 appears to be part of a raid array: | level=raid1 devices=2 ctime=Wed Jan 24 10:31:43 2024 | mdadm: Note: this array has metadata at the start and | may not be suitable as a boot device. If you plan to | store '/boot' on this device please ensure that | your boot-loader understands md/v1.x metadata, or use | --metadata=0.90 | mdadm: /dev/nvme0n1p3 appears to be part of a raid array: | level=raid1 devices=2 ctime=Wed Jan 24 10:31:43 2024 | mdadm: size set to 468218880K | mdadm: automatically enabling write-intent bitmap on large array | Continue creating array? mdadm: Defaulting to version 1.2 metadata | mdadm: array /dev/md0 started. | Creating PV + VG on /dev/md0 | Physical volume "/dev/md0" successfully created. | Volume group "ngcp" successfully created | 0 logical volume(s) in volume group "ngcp" now active | Creating LV 'root' with 10G | [...] | | mdadm: stopped /dev/md127 | mdadm: No arrays found in config file or automatically | NOTE: will continue with '/dev/md127' as SWRAID_DEVICE for mdadm cleanup | Removing mdadm device /dev/md127 | Stopping mdadm device /dev/md127 | mdadm: stopped /dev/md127 | mdadm: Unrecognised md component device - /dev/nvme1n1 | mdadm: Unrecognised md component device - /dev/nvme0n1 | mdadm: /dev/nvme1n1p3 appears to be part of a raid array: | mdadm: Note: this array has metadata at the start and | mdadm: /dev/nvme0n1p3 appears to be part of a raid array: | mdadm: size set to 468218880K | mdadm: automatically enabling write-intent bitmap on large array | Continue creating array? mdadm: Defaulting to version 1.2 metadata | mdadm: array /dev/md0 started. 
| lvm2 mdadm wget | Get:1 http://http-proxy.lab.sipwise.com/debian bookworm/main amd64 mdadm amd64 4.2-5 [443 kB] | Selecting previously unselected package mdadm. | Preparing to unpack .../0-mdadm_4.2-5_amd64.deb ... | Unpacking mdadm (4.2-5) ... | Setting up mdadm (4.2-5) ... | [...] | mdadm: stopped /dev/md0 Change-Id: Ib5875248e9c01dd4251bfab2cc4c94daace503fa |
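The core of the cleanup can be sketched roughly like this (device handling simplified, the real deployment.sh code does more):

```bash
# Assemble any existing ("foreign") arrays while ignoring a possibly stale
# /etc/mdadm/mdadm.conf from a previous installation.
mdadm --assemble --scan --config /dev/null || true

# Whatever array actually came up (often md127, not md0) is what we wipe/stop.
active_md=$(lsblk -l -n -o TYPE,NAME | awk '/^raid/ {print $2; exit}')
if [ -n "${active_md}" ]; then
  wipefs --all "/dev/${active_md}"
  mdadm --stop "/dev/${active_md}"
fi
```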
1 year ago

e9244a289b | TT#118659 Wipe disk signatures more reliably with SW-RAID and NVMe setup
Deployed current NGCP trunk on NVMe powered SW-RAID setup failed with: | mdadm: size set to 468218880K | mdadm: automatically enabling write-intent bitmap on large array | Continue creating array? mdadm: Defaulting to version 1.2 metadata | mdadm: array /dev/md0 started. | Creating PV + VG on /dev/md0 | Cannot use /dev/md0: device is partitioned This is caused because /dev/md0 still contains partition data, and its nvme1n1p3 also still has disk signature about linux_raid_member. So it's *not* enough to stop the mdadm array, remove PV/LVM information from the partitions and finally wipe SW-RAID disks /dev/nvme1n1 + /dev/nvme0n1 (example output from such a failing run): | mdadm: /dev/md/0 has been started with 2 drives. | mdadm: stopped /dev/md0 | mdadm: Unrecognised md component device - /dev/nvme1n1 | mdadm: Unrecognised md component device - /dev/nvme0n1 | Removing possibly existing LVM/PV label from /dev/nvme1n1 | Cannot use /dev/nvme1n1: device is partitioned | Removing possibly existing LVM/PV label from /dev/nvme1n1p1 | Cannot use /dev/nvme1n1p1: device is too small (pv_min_size) | Removing possibly existing LVM/PV label from /dev/nvme1n1p2 | Labels on physical volume "/dev/nvme1n1p2" successfully wiped. | Removing possibly existing LVM/PV label from /dev/nvme1n1p3 | Cannot use /dev/nvme1n1p3: device is an md component | Wiping disk signatures from /dev/nvme1n1 | /dev/nvme1n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54 | /dev/nvme1n1: 8 bytes were erased at offset 0x6fc86d5e00 (gpt): 45 46 49 20 50 41 52 54 | /dev/nvme1n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa | /dev/nvme1n1: calling ioctl to re-read partition table: Success | 1+0 records in | 1+0 records out | 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00314195 s, 334 MB/s | Removing possibly existing LVM/PV label from /dev/nvme0n1 | Cannot use /dev/nvme0n1: device is partitioned | Removing possibly existing LVM/PV label from /dev/nvme0n1p1 | Cannot use /dev/nvme0n1p1: device is too small (pv_min_size) | Removing possibly existing LVM/PV label from /dev/nvme0n1p2 | Labels on physical volume "/dev/nvme0n1p2" successfully wiped. | Removing possibly existing LVM/PV label from /dev/nvme0n1p3 | Cannot use /dev/nvme0n1p3: device is an md component | Wiping disk signatures from /dev/nvme0n1 | /dev/nvme0n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54 | /dev/nvme0n1: 8 bytes were erased at offset 0x6fc86d5e00 (gpt): 45 46 49 20 50 41 52 54 | /dev/nvme0n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa | /dev/nvme0n1: calling ioctl to re-read partition table: Success | 1+0 records in | 1+0 records out | 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00893285 s, 117 MB/s | Creating partition table | Get path of EFI partition | pvdevice is now available: /dev/nvme1n1p2 | The operation has completed successfully. | The operation has completed successfully. | pvdevice is now available: /dev/nvme1n1p3 | pvdevice is now available: /dev/nvme0n1p3 | mdadm: /dev/nvme1n1p3 appears to be part of a raid array: | level=raid1 devices=2 ctime=Wed Dec 20 20:35:21 2023 | mdadm: Note: this array has metadata at the start and | may not be suitable as a boot device. 
If you plan to | store '/boot' on this device please ensure that | your boot-loader understands md/v1.x metadata, or use | --metadata=0.90 | mdadm: /dev/nvme0n1p3 appears to be part of a raid array: | level=raid1 devices=2 ctime=Wed Dec 20 20:35:21 2023 | mdadm: size set to 468218880K | mdadm: automatically enabling write-intent bitmap on large array | Continue creating array? mdadm: Defaulting to version 1.2 metadata | mdadm: array /dev/md0 started. | Creating PV + VG on /dev/md0 | Cannot use /dev/md0: device is partitioned Instead we also need to wipe signatures from the SW-RAID device (like /dev/md0), only then stop it, ensure we wipe disk signatures also from all the partitions (like /dev/nvme1n1p3) and only then finally remove the disk signatures from the main block device (like /dev/nvme1n1). Example from a successful run with this change: | root@grml ~ # grep -e mdadm -e Wiping /tmp/deployment-installer-debug.log | mdadm: /dev/md/0 has been started with 2 drives. | Wiping signatures from /dev/md0 | Removing mdadm device /dev/md0 | Stopping mdadm device /dev/md0 | mdadm: stopped /dev/md0 | mdadm: Unrecognised md component device - /dev/nvme1n1 | mdadm: Unrecognised md component device - /dev/nvme0n1 | Wiping disk signatures from partition /dev/nvme1n1p1 | Wiping disk signatures from partition /dev/nvme1n1p2 | Wiping disk signatures from partition /dev/nvme1n1p3 | Wiping disk signatures from /dev/nvme1n1 | Wiping disk signatures from partition /dev/nvme0n1p1 | Wiping disk signatures from partition /dev/nvme0n1p2 | Wiping disk signatures from partition /dev/nvme0n1p3 | Wiping disk signatures from /dev/nvme0n1 | mdadm: Note: this array has metadata at the start and | mdadm: size set to 468218880K | mdadm: automatically enabling write-intent bitmap on large array | Continue creating array? mdadm: Defaulting to version 1.2 metadata | mdadm: array /dev/md0 started. | Wiping ext3 signature on /dev/ngcp/root. | Wiping ext4 signature on /dev/ngcp/fallback. | Wiping ext4 signature on /dev/ngcp/data. While at it, be more verbose about the executed steps. FTR, disk and setup information of such a system where we noticed the failure and worked on this change: | root@grml ~ # fdisk -l | Disk /dev/nvme0n1: 447.13 GiB, 480103981056 bytes, 937703088 sectors | Disk model: DELL NVME ISE PE8010 RI M.2 480GB | Units: sectors of 1 * 512 = 512 bytes | Sector size (logical/physical): 512 bytes / 512 bytes | I/O size (minimum/optimal): 512 bytes / 512 bytes | Disklabel type: gpt | Disk identifier: 5D296676-52CF-49CF-863A-6D3A3BD0604F | | Device Start End Sectors Size Type | /dev/nvme0n1p1 2048 4095 2048 1M BIOS boot | /dev/nvme0n1p2 4096 999423 995328 486M EFI System | /dev/nvme0n1p3 999424 937701375 936701952 446.7G Linux RAID | | | Disk /dev/nvme1n1: 447.13 GiB, 480103981056 bytes, 937703088 sectors | Disk model: DELL NVME ISE PE8010 RI M.2 480GB | Units: sectors of 1 * 512 = 512 bytes | Sector size (logical/physical): 512 bytes / 512 bytes | I/O size (minimum/optimal): 512 bytes / 512 bytes | Disklabel type: gpt | Disk identifier: 9AFA8ACF-D2CD-4224-BA0C-D38A6581D0F9 | | Device Start End Sectors Size Type | /dev/nvme1n1p1 2048 4095 2048 1M BIOS boot | /dev/nvme1n1p2 4096 999423 995328 486M EFI System | /dev/nvme1n1p3 999424 937701375 936701952 446.7G Linux RAID | [...] 
| | root@grml ~ # lsblk | NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS | loop0 7:0 0 428.8M 1 loop /usr/lib/live/mount/rootfs/ngcp.squashfs | /run/live/rootfs/ngcp.squashfs | nvme0n1 259:0 0 447.1G 0 disk | ├─nvme0n1p1 259:5 0 1M 0 part | ├─nvme0n1p2 259:8 0 486M 0 part | └─nvme0n1p3 259:9 0 446.7G 0 part | └─md0 9:0 0 446.5G 0 raid1 | ├─ngcp-root 253:0 0 10G 0 lvm /mnt | ├─ngcp-fallback 253:1 0 10G 0 lvm | └─ngcp-data 253:2 0 383.9G 0 lvm /mnt/ngcp-data | nvme1n1 259:4 0 447.1G 0 disk | ├─nvme1n1p1 259:2 0 1M 0 part | ├─nvme1n1p2 259:6 0 486M 0 part | └─nvme1n1p3 259:7 0 446.7G 0 part | └─md0 9:0 0 446.5G 0 raid1 | ├─ngcp-root 253:0 0 10G 0 lvm /mnt | ├─ngcp-fallback 253:1 0 10G 0 lvm | └─ngcp-data 253:2 0 383.9G 0 lvm /mnt/ngcp-data | | root@grml ~ # cat /proc/mdstat | Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] | md0 : active raid1 nvme0n1p3[1] nvme1n1p3[0] | 468218880 blocks super 1.2 [2/2] [UU] | [==>..................] resync = 12.7% (59516864/468218880) finish=33.1min speed=205685K/sec | bitmap: 4/4 pages [16KB], 65536KB chunk | | unused devices: <none> Change-Id: Iaa7f49eef11ef6ad6209fe962bb8940a75a87c95 |
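In short, the wiping has to happen in this order (a sketch using the example device names from the log above):

```bash
wipefs --all /dev/md0          # 1. wipe signatures from the assembled RAID device
mdadm --stop /dev/md0          # 2. only then stop the array
for part in /dev/nvme1n1p1 /dev/nvme1n1p2 /dev/nvme1n1p3; do
  wipefs --all "${part}"       # 3. wipe signatures from every partition
done
wipefs --all /dev/nvme1n1      # 4. finally wipe the whole block device
```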
1 year ago

76893e3acb | Release new version 12.3.0.0+0~mr12.3.0.0
1 year ago

236cb2d1a7 | MT#58926 Vagrant: ensure to have libxmu6 available
We get the following error message in /var/log/vboxadd-install.log, /var/log/deployment-installer-debug.log, /var/log/daemon.log + /var/log/syslog: | /opt/VBoxGuestAdditions-7.0.6/bin/VBoxClient: error while loading shared libraries: libXmu.so.6: cannot open shared object file: No such file or directory This is caused by missing libxmu6: | [sipwise-lab-trunk] sipwise@spce:~$ /opt/VBoxGuestAdditions-7.0.6/bin/VBoxClient --help | /opt/VBoxGuestAdditions-7.0.6/bin/VBoxClient: error while loading shared libraries: libXmu.so.6: cannot open shared object file: No such file or directory | [sipwise-lab-trunk] sipwise@spce:~$ sudo apt install libxmu6 | Reading package lists... Done | Building dependency tree... Done | Reading state information... Done | The following NEW packages will be installed: | libxmu6 | 0 upgraded, 1 newly installed, 0 to remove and 83 not upgraded. | Need to get 60.1 kB of archives. | After this operation, 143 kB of additional disk space will be used. | Get:1 https://debian.sipwise.com/debian bookworm/main amd64 libxmu6 amd64 2:1.1.3-3 [60.1 kB] | Fetched 60.1 kB in 0s (199 kB/s) | [...] | [sipwise-lab-trunk] sipwise@spce:~$ /opt/VBoxGuestAdditions-7.0.6/bin/VBoxClient --help | Oracle VM VirtualBox VBoxClient 7.0.6 | Copyright (C) 2005-2023 Oracle and/or its affiliates | | Usage: VBoxClient --clipboard|--draganddrop|--checkhostversion|--seamless|--vmsvga|--vmsvga-session | [-d|--nodaemon] | | Options: | [...] It looks like lack of libxmu6 doesn't cause any actual problems for our use case (we don't use X.org at all), though given that libxmu6 is a small library package, let's try to get it working as expected and avoid the alarming errors on the logs. Thanks Guillem Jover for spotting and reporting Change-Id: I65f3dd496a4026f04fd9944fd7cc43d6abbdf336 |
1 year ago

0f384353f8 | Release new version 12.2.0.0+0~mr12.2.0.0
1 year ago

8c3ab6b241 | MT#57559 Always include zstd when bootstrapping systems
During initial deployment of a system, we get warnings about lack of zstd: | Setting up linux-image-6.1.0-13-amd64 (6.1.55-1) ... | I: /vmlinuz.old is now a symlink to boot/vmlinuz-6.1.0-13-amd64 | I: /initrd.img.old is now a symlink to boot/initrd.img-6.1.0-13-amd64 | I: /vmlinuz is now a symlink to boot/vmlinuz-6.1.0-13-amd64 | I: /initrd.img is now a symlink to boot/initrd.img-6.1.0-13-amd64 | /etc/kernel/postinst.d/initramfs-tools: | update-initramfs: Generating /boot/initrd.img-6.1.0-13-amd64 | W: No zstd in /usr/bin:/sbin:/bin, using gzip | [...] The initramfs generation and update overall runs *four* times within the initial bootstrapping of a system (we'll try to do something about this, but this is outside the scope of this). As of initramfs-tools v0.141, initramfs-tools uses zstd as default compression for initramfs. Version 0.142 is shipped with Debian/bookworm, and therefore it makes sense to have it available upfront. Note that also the initrd generation is faster with zstd (~10sec for zstd vs. ~13sec for gzip) and also the resulting initrd is smaller (~33MB for zstd vs ~39MB for gzip). By making sure that zstd is available straight from the very beginning and before ngcp-installer pulls it in later, we can avoid the warning message but also save >10 seconds of install time. Given that zstd is available even in Debian oldoldstable, let's install it unconditionally in all our systems. Thanks: Volodymyr Fedorov for reporting Change-Id: I56674c3c213f7c7a6e6cbce3c8e2e00a4cfbdbd4 |
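A minimal sketch of the idea, assuming an mmdebstrap-based bootstrap (the actual invocation in deployment.sh carries many more options, and the mirror here is just an example):

```bash
# Include zstd right away so update-initramfs can use its default compressor
# during the very first kernel/initramfs runs inside the target.
mmdebstrap --include=zstd bookworm /mnt http://deb.debian.org/debian
```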
1 year ago

9cceb8d655 | MT#58356 ntp: Use ntpsec.service instead of ntp.service
Even though the ntpsec.service contains an Alias for ntp.service, that does not work for us when the service has not yet been installed, so the first run will fail. Use the actual name to avoid this issue. Change-Id: I8f0ee3b38390a7e58c3bbee65fd96bfd4b717dfa |
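A minimal sketch of the difference:

```bash
# This fails on the first run, because the ntp.service alias only exists
# once ntpsec is installed and its units have been loaded:
#   systemctl enable ntp.service
# Refer to the real unit name instead:
systemctl enable ntpsec.service
```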
2 years ago

f483c18b82 | Release new version 12.1.0.0+0~mr12.1.0.0
2 years ago

39949fcd06 | MT#58356 Update packaging for bookworm
- Add Rules-Requires-Root field.
- Switch to Standards-Version 4.6.2.
- Update copyright years.

Change-Id: Ia24821937c439718750b1832b782cd3832dc9c19
2 years ago

d132ecc4bc | MT#57165 Add ngcp-kernel-firmware package to grml-sipwise
It's better to have this package in the grml-sipwise image, so any system with this network card can use all its power even in the deployment stage. Change-Id: I765efcf446a410a42ef156b2ccc2e6612a33ddd6
2 years ago

1239aeab8b | Release new version 12.0.1.0+0~mr12.0.1.0
2 years ago

366c412c1f | MT#57980 Add mr11.5 LTS key to bootstrap
Now it contains: pub rsa4096 2015-03-05 [SC] [expires: 2029-10-12] 68A702B1FD8E422AAAA1ADA3773236EFF411A836 uid [ unknown] Sipwise GmbH (Sipwise Repository Key) <support@sipwise.com> sub rsa4096 2015-03-05 [E] [expires: 2029-10-12] pub rsa4096 2011-06-06 [SC] F7B8A739CE638D719A078C9859104633EE5E097D uid [ unknown] Sipwise autobuilder (Used to sign packages for autobuild) <development@sipwise.com> sub rsa4096 2011-06-06 [E] pub rsa4096 2022-05-31 [SCEA] [expires: 2032-05-28] 39EB73D5B54870181632E48786C3B4395CB844A2 uid [ unknown] Sipwise autobuilder <development@sipwise.com> pub rsa4096 2023-08-04 [SCEA] [expires: 2033-08-01] F0A595D85C375447BB09F25E34A72CE4979CA98A uid [ unknown] Sipwise autobuilder <development@sipwise.com> pub rsa4096 2021-05-04 [SCEA] [expires: 2031-05-02] AB7FE3DCD53767F6160406442A5CA71B542B9A22 uid [ unknown] Sipwise autobuilder <development@sipwise.com> Change-Id: I33c8a4e666f1a7f8b64d823c3d4e2550ca8dcf11 |
2 years ago

793a93bc43 | MT#57453 vagrant_configuration: remove fake systemd presence after execution
Let's restore system state of /run/systemd/system for
VBoxLinuxAdditions, to avoid any unexpected side effects.
Followup for git rev
2 years ago

561303359e | MT#57453 Use tty1 for stdin when running under grml-autoconfig service
Recent Grml ISOs, including our Grml-Sipwise ISO (v2023-06-01), include
grml-autoconfig v0.20.3, which executes the grml-autoconfig service under
`StandardInput=null`. This is necessary to avoid conflicts with tty usage,
such as with a serial console. See
2 years ago

8601193128 | MT#57453 vagrant_configuration: fake systemd presence
As of git rev
2 years ago

6c960afee4 | TT#104221 Use bookworm repos in ensure_packages_installed appropriately
Support the bookworm option in the DEBIAN_RELEASE selection; we have support for it already. Use bookworm as the fallback, since we have switched to it by now. Change-Id: I118c1b5cf81fe57394495b5f745fc81032406c78
2 years ago

37163532ee | MT#56773 Use bullseye puppetlabs repository for bookworm
To be able to upgrade our internal systems to Debian/bookworm we need to have puppet packages available. Upstream still doesn't provide any Debian packages (see https://tickets.puppetlabs.com/browse/PA-4995), though their AIO (All In One) packages for Debian/bullseye seem to be working on Debian/bookworm as well (at least for puppet-agent). So until we either migrated to puppet-agent as present in Debian/bookworm or upstream provides according AIO packages, let's use the puppet-agent packages we already use for our Debian/bullseye systems. Change-Id: I2211ffd79f70a2a79873e737b0b512bfb7492328 |
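A minimal sketch of what that amounts to as an apt sources entry (repository URL and component are assumptions based on the usual puppetlabs layout):

```bash
# Keep pointing bookworm hosts at the bullseye suite of the puppetlabs
# repository until proper bookworm packages exist.
echo 'deb https://apt.puppet.com bullseye puppet7' \
  > /etc/apt/sources.list.d/puppetlabs.list
apt-get update
```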
2 years ago

3a942b1b8c | MT#57453 Switch docker image to bookworm
Change-Id: I9cfc7f0f6062d5e4916c7ba18b72cbc3e8c8ebbb |
2 years ago

1cb15c866e | Release new version 11.5.0.0+0~mr11.5.0.0
2 years ago

0fedba6144 | MT#57643 Ensure /var/lib/dpkg/available exists on Debian releases <=buster
Since version 1.20.0, dpkg no longer creates /var/lib/dpkg/available (see #647911). Now that we upgraded our Grml-Sipwise deployment system to bookworm, we have dpkg v1.21.22 on our live system, and mmdebstrap relies on dpkg of the host system for execution. But on Debian releases until and including buster, dpkg fails to operate with e.g. `dpkg --set-selections`, if /var/lib/dpkg/available doesn't exist: | The following NEW packages will be installed: | nullmailer | [...] | debconf: delaying package configuration, since apt-utils is not installed | dpkg: error: failed to open package info file '/var/lib/dpkg/available' for reading: No such file or directory We *could* also switch from mmdebstrap to debootstrap for deploying Debian releases <=buster, but this would be slower and we use mmdebstrap since quite some time for everything. So instead let's create /var/lib/dpkg/available after bootstrapping the system. Reported towards mmdebstrap as #1037946. Change-Id: I0a87ca255d5eb7144a9c093051c0a6a3114a3c0b |
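The workaround itself is tiny; a sketch, assuming the bootstrapped target is mounted at /mnt:

```bash
# dpkg >= 1.20.0 no longer creates this file, but dpkg on <= buster targets
# still needs it for e.g. 'dpkg --set-selections'.
mkdir -p /mnt/var/lib/dpkg
touch /mnt/var/lib/dpkg/available
```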
2 years ago

eccdc586ae | MT#57644 puppet/git: allow ssh-rsa pubkey usage
Now that our deployment system is based on Debian/bookworm, but our gerrit/git server still runs on Debian/bullseye, we run into the OpenSSH RSA issue (RSA signatures using the SHA-1 hash algorithm got disabled by default), see https://michael-prokop.at/blog/2023/06/11/what-to-expect-from-debian-bookworm-newinbookworm/ and https://www.jhanley.com/blog/ssh-signature-algorithm-ssh-rsa-error/ We need to enable ssh-rsa usage, otherwise deployment fails with: | Warning: Permanently added '[gerrit.mgm.sipwise.com]:29418' (ED25519) to the list of known hosts. | sign_and_send_pubkey: no mutual signature supported | puppet-r10k@gerrit.mgm.sipwise.com: Permission denied (publickey). | fatal: Could not read from remote repository. Change-Id: I5894170dab033d52a2612beea7b6f27ab06cc586 |
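A minimal sketch of re-enabling RSA/SHA-1 signatures for that one host (the config file location is an assumption):

```bash
# Re-allow RSA/SHA-1 pubkey signatures towards the bullseye-based gerrit host;
# drop this again once the server side is upgraded.
cat >> /root/.ssh/config << 'EOF'
Host gerrit.mgm.sipwise.com
    PubkeyAcceptedAlgorithms +ssh-rsa
EOF
```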
2 years ago

8cfb8c8392 | MT#57630 Check online connectivity to work around Intel E810 / ice issue
Deploying the Debian/bookworm based NGCP system fails on a Lenovo sr250 v2 node with an Intel E810 network card: | # lshw -c net -businfo | Bus info Device Class Description | ======================================================= | pci@0000:01:00.0 eth0 network Ethernet Controller E810-XXV for SFP | pci@0000:01:00.1 eth1 network Ethernet Controller E810-XXV for SFP | # lshw -c net | *-network:0 | description: Ethernet interface | product: Ethernet Controller E810-XXV for SFP | vendor: Intel Corporation | physical id: 0 | bus info: pci@0000:01:00.0 | logical name: eth0 | version: 02 | serial: [...] | size: 10Gbit/s | capacity: 25Gbit/s | width: 64 bits | clock: 33MHz | capabilities: pm msi msix pciexpress vpd bus_master cap_list rom ethernet physical fibre 1000bt-fd 25000bt-fd | configuration: autonegotiation=off broadcast=yes driver=ice driverversion=1.11.14 duplex=full firmware=2.25 0x80007027 1.2934.0 ip=192.168.90.51 latency=0 link=yes multicast=yes port=fibre speed=10Gbit/s | resources: iomemory:400-3ff iomemory:400-3ff irq:16 memory:4002000000-4003ffffff memory:4006010000-400601ffff memory:a1d00000-a1dfffff memory:4005000000-4005ffffff memory:4006220000-400641ffff We set up the /etc/network/interfaces file by invoking Grml's netcardconfig script in automated mode, like: NET_DEV=eth0 METHOD=static IPADDR=192.168.90.51 NETMASK=255.255.255.248 GATEWAY=192.168.90.49 /usr/sbin/netcardconfig The resulting /etc/network/interfaces gets used as base for usage inside the NGCP chroot/target system. netcardconfig shuts down the network interface (eth0 in the example above) via ifdown, then sleeps for 3 seconds and re-enables the interface (via ifup) with the new configuration. This used to work fine so far, but with the Intel e810 network card and kernel version 6.1.0-9-amd64 from Debian/bookworm we see a link failure and it takes ~10 seconds until the network device is up and running again. The following vagrant_configuration() execution from deployment.sh then fails: | +11:41:01 (netscript.grml:1022): vagrant_configuration(): wget -O /var/tmp/id_rsa_sipwise.pub http://builder.mgm.sipwise.com/vagrant-ngcp/id_rsa_sipwise.pub | --2023-06-11 11:41:01-- http://builder.mgm.sipwise.com/vagrant-ngcp/id_rsa_sipwise.pub | Resolving builder.mgm.sipwise.com (builder.mgm.sipwise.com)... failed: Name or service not known. | wget: unable to resolve host address 'builder.mgm.sipwise.com' However, when we retry it again just a bit later, the network works fine again. During investigation we identified that the network card flips the port, quoting the related log from the connected Cisco nexus 5020 switch (with fast stp learning mode): | nexus5k %ETHPORT-5-IF_DOWN_LINK_FAILURE: Interface Ethernet1/33 is down (Link failure) It seems to be related to some autonegotiation problem, as when we execute `ethtool -A eth0 rx on tx on` (no matter whether with `on` or `off`), we see: | [Tue Jun 13 08:51:37 2023] ice 0000:01:00.0 eth0: Autoneg did not complete so changing settings may not result in an actual change. 
| [Tue Jun 13 08:51:37 2023] ice 0000:01:00.0 eth0: NIC Link is Down | [Tue Jun 13 08:51:45 2023] ice 0000:01:00.0 eth0: NIC Link is up 10 Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: NONE, Autoneg Advertised: On, Autoneg Negotiated: False, Flow Control: Rx/Tx FTR: | root@sp1 ~ # ethtool -A eth0 autoneg off | netlink error: Operation not supported | 76 root@sp1 ~ # ethtool eth0 | grep -C1 Auto-negotiation | Duplex: Full | Auto-negotiation: off | Port: FIBRE | root@sp1 ~ # ethtool -A eth0 autoneg on | root@sp1 ~ # ethtool eth0 | grep -C1 Auto-negotiation | Duplex: Full | Auto-negotiation: off | Port: FIBRE | root@sp1 ~ # dmesg -T | tail -1 | [Tue Jun 13 08:53:26 2023] ice 0000:01:00.0 eth0: To change autoneg please use: ethtool -s <dev> autoneg <on|off> | root@sp1 ~ # ethtool -s eth0 autoneg off | root@sp1 ~ # ethtool -s eth0 autoneg on | netlink error: link settings update failed | netlink error: Operation not supported | 75 root@sp1 ~ # As a workaround, at least until we have a better fix/solution, we try to reach the default gateway (or fall back to the repository host if gateway couldn't be identified) via ICMP/ping, and once that works we we continue as usual. But even if that should fail we continue execution, to minimize behavior change but have a workaround for this specific situation available. FTR, broken system: | root@sp1 ~ # ethtool -i eth0 | driver: ice | version: 6.1.0-9-amd64 | firmware-version: 2.25 0x80007027 1.2934.0 | [...] Whereas with kernel 5.10.0-23-amd64 from Debian/bullseye we don't seem to see that behavior: | root@sp1:~# ethtool -i neth0 | driver: ice | version: 5.10.0-23-amd64 | firmware-version: 2.25 0x80007027 1.2934.0 | [...] Also using latest available ice v1.11.14 (from https://sourceforge.net/projects/e1000/files/ice%20stable/1.11.14/) on Kernel version 6.1.0-9-amd64 doesn't bring any change: | root@sp1 ~ # modinfo ice | filename: /lib/modules/6.1.0-9-amd64/updates/drivers/net/ethernet/intel/ice/ice.ko | firmware: intel/ice/ddp/ice.pkg | version: 1.11.14 | license: GPL v2 | description: Intel(R) Ethernet Connection E800 Series Linux Driver | author: Intel Corporation, <linux.nics@intel.com> | srcversion: 818E9C817731C98A25470C0 | alias: pci:v00008086d00001888sv*sd*bc*sc*i* | [...] | alias: pci:v00008086d00001591sv*sd*bc*sc*i* | depends: ptp | retpoline: Y | name: ice | vermagic: 6.1.0-9-amd64 SMP preempt mod_unload modversions | parm: debug:netif level (0=none,...,16=all) (int) | parm: fwlog_level:FW event level to log. All levels <= to the specified value are enabled. Values: 0=none, 1=error, 2=warning, 3=normal, 4=verbose. Invalid values: >=5 | (ushort) | parm: fwlog_events:FW events to log (32-bit mask) | (ulong) | root@sp1 ~ # ethtool -i eth0 | head -3 | driver: ice | version: 1.11.14 | firmware-version: 2.25 0x80007027 1.2934.0 | root@sp1 ~ # Change-Id: Ieafe648be4e06ed0d936611ebaf8ee54266b6f3c |
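The workaround can be sketched roughly like this (gateway detection via JSON route output; fallback host and retry count are illustrative):

```bash
# After re-configuring the NIC, wait until the default gateway answers pings
# before continuing; never make this fatal.
gw=$(ip -json route show default | jq -r '.[0].gateway // empty')
[ -n "${gw}" ] || gw=debian.sipwise.com   # fall back to the repository host
for _ in $(seq 1 30); do
  if ping -c 1 -W 2 "${gw}" > /dev/null 2>&1; then
    break
  fi
  sleep 1
done
```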
2 years ago

f4da3e094e | MT#57049 Ensure SW-RAID device is inactive before re-reading partition table
Re-reading of disks fails if the mdadm SW-RAID device is still active: | root@sp1 ~ # cat /proc/mdstat | Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] | md0 : active raid1 sdb3[1] sda3[0] | 468218880 blocks super 1.2 [2/2] [UU] | [========>............] resync = 42.2% (197855168/468218880) finish=22.4min speed=200756K/sec | bitmap: 3/4 pages [12KB], 65536KB chunk | | unused devices: <none> | root@sp1 ~ # blockdev --rereadpt /dev/sdb | blockdev: ioctl error on BLKRRPART: Device or resource busy | 1 root@sp1 ~ # blockdev --rereadpt /dev/sda | blockdev: ioctl error on BLKRRPART: Device or resource busy | 1 root@sp1 ~ # Only if we stop the mdadm SW-RAID device, then we can re-read the partition table: | root@sp1 ~ # mdadm --stop /dev/md0 | mdadm: stopped /dev/md0 | root@sp1 ~ # blockdev --rereadpt /dev/sda | root@sp1 ~ # This behavior isn't new and unrelated to Debian/bookworm but was spotted while debugging an unrelated issue. FTR: we re-read the partition table (via `blockdev --rereadpt`) to ensure that /etc/fstab of the live system is up2date and matches the current system state. While this isn't stricly needed, we preserve existing behavior and also try to avoid a hard "cut" of a possibly ongoing SW-RAID sync. Change-Id: I735b00423e6efa932f74b78a38ed023576e5d306 |
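A minimal sketch of that ordering, using the device names from the log above:

```bash
# blockdev --rereadpt fails with "Device or resource busy" while the array is
# still active, so stop the SW-RAID device first, then re-read the member disks.
mdadm --stop /dev/md0
blockdev --rereadpt /dev/sda
blockdev --rereadpt /dev/sdb
```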
2 years ago