542 Commits (master)
c828990503 | MT#62436 Use virtualbox-guest-additions ISO from upstream on Debian/trixie
virtualbox-guest-additions-iso v7.0.20-1 as present in current Debian/trixie doesn't yet support kernel v6.12.22-1 (being the current kernel version in Debian/trixie), while upstream supports kernel 6.12 as of VirtualBox 7.1.4. Reported towards Debian as https://bugs.debian.org/1104024 FTR: | mprokop@jenkins1 ~ % cd /var/www/files | mprokop@jenkins1 ~www/files % wget https://download.virtualbox.org/virtualbox/7.1.8/VBoxGuestAdditions_7.1.8.iso | [...] | mprokop@jenkins1 ~www/files % curl -s https://download.virtualbox.org/virtualbox/7.1.8/SHA256SUMS | sha256sum -c --ignore-missing | VBoxGuestAdditions_7.1.8.iso: OK Change-Id: I32aa7806e375c4b85084a99d5a6903f632807694 |
1 day ago

112f883d49 | MT#62436 ensure_packages_installed: to not get stuck on conf file conflicts
Our deployment ISO might be outdated and when installing any additional packages, we might get stuck in dpkg: | +10:10:34 (netscript.grml:311): ensure_packages_installed(): DEBIAN_FRONTEND=noninteractive | +10:10:34 (netscript.grml:311): ensure_packages_installed(): apt-get -o dir::cache=/tmp/ngcp-deployment-ensure-tmp.BKSocMV4KB/cachedir -o dir::state=/tmp/ngcp-deployment-ensure-tmp.BKSocMV4KB/statedir -o dir::etc=/tmp/ngcp-deployment-ensure-tmp.BKSocMV4KB/etc -o dir::e | tc::trustedparts=/etc/apt/trusted.gpg.d/ -y --no-install-recommends install jq | Reading package lists... | Building dependency tree... | The following additional packages will be installed: | [...] | Get:33 https://debian.sipwise.com/debian trixie/main amd64 libnss-myhostname amd64 257.5-2 [113 kB] | Preconfiguring packages ... | Fetched 25.3 MB in 4s (6777 kB/s) | (Reading database ... 32224 files and directories currently installed.) | Preparing to unpack .../base-files_13.7_amd64.deb ... | Unpacking base-files (13.7) over (12.4+deb12u10) ... | Setting up base-files (13.7) ... | Installing new version of config file /etc/debian_version ... | | Configuration file '/etc/issue' | ==> Modified (by you or by a script) since installation. | ==> Package distributor has shipped an updated version. | What would you like to do about it ? Your options are: | Y or I : install the package maintainer's version | N or O : keep your currently-installed version | D : show the differences between the versions | Z : start a shell to examine the situation | The default action is to keep your current version. | | *** issue (Y/I/N/O/D/Z) [default=N] ? # Avoid this, by setting DPKG option `--force-confnew`. Change-Id: Ic5fed3dbe4744e07290159cec6952468c0557c29 |
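The fix boils down to passing the dpkg option through apt-get; a minimal sketch of that pattern (not the exact ensure_packages_installed invocation, jq is just the example package from the log above):

```bash
# Install non-interactively and always take the package maintainer's version
# of changed conffiles, so dpkg never stops to ask about /etc/issue and friends.
DEBIAN_FRONTEND=noninteractive apt-get -y --no-install-recommends \
  -o DPkg::Options::=--force-confnew \
  install jq
```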
1 day ago

779b43b915 | MT#62436 Support Debian/trixie in ensure_packages_installed
vboxadd-service.service fails on our Debian/trixie systems: | root@spce:~# lsb_release -c | Codename: trixie | | root@spce:~# systemctl --failed | UNIT LOAD ACTIVE SUB DESCRIPTION | ● vboxadd-service.service loaded failed failed VirtualBox Guest Additions Services Daemon | | Legend: LOAD → Reflects whether the unit definition was properly loaded. | ACTIVE → The high-level unit activation state, i.e. generalization of SUB. | SUB → The low-level unit activation state, values depend on unit type. | | 1 loaded units listed. | | root@spce:~# sudo systemctl status vboxadd-service.service | × vboxadd-service.service - VirtualBox Guest Additions Services Daemon | Loaded: loaded (/etc/systemd/system/vboxadd-service.service; disabled; preset: disabled) | Drop-In: /etc/systemd/system/vboxadd-service.service.d | └─override.conf | Active: failed (Result: exit-code) since Thu 2025-04-24 09:08:15 CEST; 34min ago | Invocation: 4e151a29f0054a90a717a928fcfb3f8d | Mem peak: 2.2M | CPU: 17ms | | Apr 24 09:08:15 spce systemd[1]: Starting vboxadd-service.service... | Apr 24 09:08:15 spce vboxadd-service[1934]: vboxadd-service.sh: Starting VirtualBox Guest Addition service. | Apr 24 09:08:15 spce vboxadd-service.sh[1937]: Starting VirtualBox Guest Addition service. | Apr 24 09:08:15 spce vboxadd-service[1940]: VBoxService: error: VbglR3Init failed with rc=VERR_FILE_NOT_FOUND | Apr 24 09:08:15 spce vboxadd-service.sh[1943]: VirtualBox Guest Addition service started. | Apr 24 09:08:15 spce systemd[1]: vboxadd-service.service: Control process exited, code=exited, status=1/FAILURE | Apr 24 09:08:15 spce systemd[1]: vboxadd-service.service: Failed with result 'exit-code'. | Apr 24 09:08:15 spce systemd[1]: Failed to start vboxadd-service.service. | | root@spce:~# cat /etc/systemd/system/vboxadd.service.d/override.conf | [Unit] | ConditionVirtualization=oracle | | root@spce:~# cat /var/log/vboxadd-setup.log | Building the main Guest Additions 7.0.6 module for kernel 6.12.22-amd64. | Error building the module. Build output follows. | make V=1 CONFIG_MODULE_SIG= CONFIG_MODULE_SIG_ALL= -C /lib/modules/6.12.22-amd64/build M=/tmp/vbox.0 SRCROOT=/tmp/vbox.0 -j2 modules | make[1]: warning: -j2 forced in submake: resetting jobserver mode. | [...] | [,,,] /tmp/vbox.0/VBoxGuest-common.c | /tmp/vbox.0/VBoxGuest-linux.c:196:21: error: ‘no_llseek’ undeclared here (not in a function); did you mean ‘noop_llseek’? | 196 | llseek: no_llseek, | | ^~~~~~~~~ | | noop_llseek | /tmp/vbox.0/VBoxGuest-linux.c: In function ‘vgdrvLinuxParamLogGrpSet’: | /tmp/vbox.0/VBoxGuest-linux.c:1364:9: error: implicit declaration of function ‘strlcpy’; did you mean ‘strncpy’? [-Wimplicit-function-declaration] | 1364 | strlcpy(&g_szLogGrp[0], pszValue, sizeof(g_szLogGrp)); | | ^~~~~~~ | | strncpy | make[2]: *** [/usr/src/linux-headers-6.12.22-common/scripts/Makefile.build:234: /tmp/vbox.0/VBoxGuest-linux.o] Error 1 | make[2]: *** Waiting for unfinished jobs.... | [...] We get virtualbox-guest-additions-iso v7.0.6-1 for Debian stable/bookworm, but virtualbox-guest-additions-iso v7.0.20-1 is available in current Debian testing AKA trixie. Ensure we use the package from trixie for trixie based systems, even though the the VirtualBox Guest Additions v7.0.20 don't work for kernel 6.12.22 either, yet. Also adjust ensure_packages_installed to fail installation, if we're using a yet unknown/unexpected Debian release, to not fall back to Debian/bookworm, to prevent issue like it has been observed here. 
See MT#60815 for main tracking issue WRT Debian/trixie Change-Id: I030525d37edbe1cf75065d021b51d38273ce81ef |
1 day ago

b2e2954852 | MT#62436 Fix shellcheck issues + parse IP information programmatically
As reported when sending new deployment-iso reviews, triggered by newer docker image / shellcheck: | not ok 1 source/templates/scripts/includes/deployment.sh:1543:10: warning: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a. [SC2206] | not ok 2 source/templates/scripts/includes/deployment.sh:1903:22: warning: Prefer mapfile or read -a to split command output (or quote to avoid splitting). [SC2207] | not ok 3 source/templates/scripts/includes/deployment.sh:2275:20: warning: Prefer mapfile or read -a to split command output (or quote to avoid splitting). [SC2207] | not ok 4 source/templates/scripts/includes/deployment.sh:2486:12: note: Not following: ./etc/profile.d/puppet-agent.sh was not specified as input (see shellcheck -x). [SC1091] Let's take this as a chance to properly parse ip(8) output via its JSON output, instead of awk/sed magic. Change-Id: I723959626fb514ab9e57202b0e5f415b411f5a01 |
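A minimal sketch of both ideas, assuming jq is available and using eth0 as an example interface (the real deployment.sh code differs):

```bash
# Parse ip(8) output as JSON instead of awk/sed text scraping.
dev=eth0
ipaddr=$(ip -json addr show dev "$dev" \
  | jq -r '.[0].addr_info[] | select(.family == "inet") | .local')

# SC2206/SC2207-style fix: split command output into an array robustly.
mapfile -t interfaces < <(ip -json link show | jq -r '.[].ifname')

echo "IPv4 on ${dev}: ${ipaddr}; interfaces: ${interfaces[*]}"
```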
1 day ago

4b7a0e518b | Release new version 13.4.0.0+0~mr13.4.0.0
2 weeks ago

dfd46069e7 | MT#62436 Remove workaround for vboxadd services
We have made these services conditional on running inside a VirtualBox VM, so we do not need to remove them anymore. Change-Id: I6dc563688ba5b0c5e935b0cb88767fcb05ab9a19 |
3 weeks ago

8dbd67c82d | Release new version 13.3.0.0+0~mr13.3.0.0
3 months ago

6dac69d9df | Release new version 13.2.0.0+0~mr13.2.0.0
5 months ago

41029ed891 | MT#61264 Mark EFI partition as such only when running in an EFI environment
On Debian/trixie we get a failing efi.mount systemd unit:
| root@sp1:~# systemctl --failed
| UNIT LOAD ACTIVE SUB DESCRIPTION
| ● efi.mount loaded failed failed EFI System Partition Automount
|
| Legend: LOAD → Reflects whether the unit definition was properly loaded.
| ACTIVE → The high-level unit activation state, i.e. generalization of SUB.
| SUB → The low-level unit activation state, values depend on unit type.
|
| 1 loaded units listed.
|
| root@sp1:~# systemctl status efi.mount
| × efi.mount - EFI System Partition Automount
| Loaded: loaded (/run/systemd/generator.late/efi.mount; generated)
| Active: failed (Result: exit-code) since Fri 2024-11-15 17:20:59 CET; 28min ago
| Invocation: 62c7b659dfd540e294f4b1f6fcda5e13
| TriggeredBy: ● efi.automount
| Where: /efi
| What: /dev/disk/by-diskseq/9-part2
| Docs: man:systemd-gpt-auto-generator(8)
| Mem peak: 1.5M
| CPU: 8ms
|
| Nov 15 17:20:59 sp1 systemd[1]: Mounting efi.mount - EFI System Partition Automount...
| Nov 15 17:20:59 sp1 mount[631]: mount: /efi: wrong fs type, bad option, bad superblock on /dev/sda2, missing codepage or helper program, or other error.
| Nov 15 17:20:59 sp1 mount[631]: dmesg(1) may have more information after failed mount system call.
| Nov 15 17:20:59 sp1 systemd[1]: efi.mount: Mount process exited, code=exited, status=32/n/a
| Nov 15 17:20:59 sp1 systemd[1]: efi.mount: Failed with result 'exit-code'.
| Nov 15 17:20:59 sp1 systemd[1]: Failed to mount efi.mount - EFI System Partition Automount.
|
| root@sp1:~# ls -la /efi
| ls: cannot open directory '/efi': No such device
|
| root@sp1:~# ls -la /dev/disk/by-diskseq/9-part2
| lrwxrwxrwx 1 root root 10 Nov 15 17:20 /dev/disk/by-diskseq/9-part2 -> ../../sda2
|
| root@sp1:~# blkid /dev/sda2
| /dev/sda2: PARTLABEL="EFI System" PARTUUID="fa67b52e-c018-401d-ac71-fad324cad193"
The efi.mount systemd unit is automatically generated by
systemd-gpt-auto-generator. Quoting from systemd-gpt-auto-generator(8):
| The ESP is mounted to /boot/ if that directory exists and is not used
| for XBOOTLDR, and otherwise to /efi/
This got introduced as of systemd v254, see
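The fix from the subject line amounts to tagging the partition as an ESP only when the installer itself runs in an EFI environment; a rough sketch of that idea (device, partition number and the sgdisk call are illustrative assumptions):

```bash
# Only mark partition 2 as "EFI System" (type ef00) when booted via EFI;
# otherwise keep it as a plain Linux partition so systemd-gpt-auto-generator
# doesn't generate an efi.mount unit for it.
if [ -d /sys/firmware/efi ]; then
  sgdisk --typecode=2:ef00 /dev/sda
else
  sgdisk --typecode=2:8300 /dev/sda
fi
```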
5 months ago

cfe9cceb6a | MT#61271 trixie: adjust sshd_config after system is installed
If we set up /etc/ssh/sshd_config early in early system deployment, we end up with an empty /etc/ssh/sshd_config configuration file with only our own changes: | root@spce:~# cat /etc/ssh/sshd_config | # added by deployment.sh | PerSourcePenalties no | # end of deployment.sh changes | ### Added by ngcp-installer | PermitRootLogin yes The other defaults of sshd are OK for us, but for automated SSH logins we also need: AuthorizedKeysFile %h/.ssh/authorized_keys %h/.ssh/sipwise_vagrant_key And for SCP-ing files we also need: Subsystem sftp /usr/lib/openssh/sftp-server Otherwise our Jenkins job fail due to failing ssh/scp actions. So instead move our trixie specific code in deployment.sh for adjusting /etc/ssh/sshd_config to be executed *after* installing base system. Then the openssh-server package sets up /etc/ssh/sshd_config as expected, and we only extend its configuration then. While at it, explicitly mark beginning and end of our changes. Change-Id: I68a235b55e9cf18c39e9034b7f3b2ed0ffd237f0 |
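A minimal sketch of the resulting approach, assuming the freshly installed system is mounted under /mnt (the settings shown are the ones mentioned above):

```bash
# Extend the sshd_config shipped by openssh-server instead of replacing it,
# and clearly mark the beginning and end of our additions.
cat >> /mnt/etc/ssh/sshd_config << 'EOF'
# added by deployment.sh
PerSourcePenalties no
AuthorizedKeysFile %h/.ssh/authorized_keys %h/.ssh/sipwise_vagrant_key
# end of deployment.sh changes
EOF
```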
6 months ago

6eee97de7b | MT#61265 trixie: avoid SSH login failures due to OpenSSH penalize feature
Our https://jenkins.mgm.sipwise.com/job/daily-build-matrix-debian-boxes/ matrix no longer provides builds for debian/trixie, because its daily-build-images subproject Jenkins job with its proxmox-vm-clean-fs job failed to run. After running proxmox-vm-clean-fs under `set -x`, and also overriding the ssh_wrapper function with `ssh -v ...`, I managed to grab this from the Jenkins job execution: | + ssh -v -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o 'ServerAliveInterval 10' -o 'ConnectTimeout 15' 192.168.210.101 'rm -vf /etc/udev/rules.d/70-persistent-net.rules' | OpenSSH_9.2p1 Debian-2+deb12u3, OpenSSL 3.0.14 4 Jun 2024 | debug1: Reading configuration data /var/lib/jenkins/.ssh/config | debug1: /var/lib/jenkins/.ssh/config line 7: Applying options for 192.168.* | debug1: Reading configuration data /etc/ssh/ssh_config | debug1: /etc/ssh/ssh_config line 50: Applying options for * | debug1: /etc/ssh/ssh_config line 57: Deprecated option "useroaming" | debug1: Connecting to 192.168.210.101 [192.168.210.101] port 22. | debug1: fd 3 clearing O_NONBLOCK | debug1: Connection established. | debug1: identity file /var/lib/jenkins/.ssh/id_rsa_sipwise type 0 | debug1: identity file /var/lib/jenkins/.ssh/id_rsa_sipwise-cert type -1 | debug1: identity file /var/lib/jenkins/.ssh/id_rsa type 0 | debug1: identity file /var/lib/jenkins/.ssh/id_rsa-cert type -1 | debug1: identity file /var/lib/jenkins/.ssh/id_dsa type -1 | debug1: identity file /var/lib/jenkins/.ssh/id_dsa-cert type -1 | debug1: Local version string SSH-2.0-OpenSSH_9.2p1 Debian-2+deb12u3 | debug1: kex_exchange_identification: banner line 0: Not allowed at this time The `Not allowed at this time` pointed to a new OpenSSH feature, which triggered the regression for us. OpenSSH introduced options to penalize undesirable behavior, see https://undeadly.org/cgi?action=article;sid=20240607042157 and https://www.openssh.com/releasenotes.html#9.9p1 and https://sources.debian.org/src/openssh/1:9.9p1-1/sshd.c/?hl=576#L573 This is now present as of openssh-server v1:9.9p1-1 since end of September 2024 also in Debian/trixie. Now, when too many SSH logins fail, a client system can't necessarily no longer connect via SSH due this new penalty behavior. 
And indeed, within our Jenkins job "daily-build-install-vm" we try to collect several log files through our grab_log and SSH wrapper: | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /mnt/tmp/ngcp-installer-debug.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/ngcp-installer-debug.log | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/ngcp-installer-debug.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/ngcp-installer-debug.log | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /mnt/tmp/ngcp-installer.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/ngcp-installer.log | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/ngcp-installer.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/ngcp-installer.log | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /tmp/ngcp-installer-cmdline.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/ngcp-installer-cmdline.log | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /mnt/var/log/deployment.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/deployment.log | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/deployment.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/deployment.log | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /mnt/var/log/grml-debootstrap.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/grml-debootstrap.log | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/grml-debootstrap.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/grml-debootstrap.log | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/syslog /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/syslog | + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/boot /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/boot We even execute this grab_log wrapper twice: once for the running Grml live system, and once when we booted into the actually deployed system. This works fine for the Grml live system situation, but as root logins aren't allowed by default in OpenSSH since quite some time, all the sipwise-ssh-copier runs with user/password against a plain Debian system then fail. As a consequence, we lock ourselves out of the system with all those SSH login failures, and the Jenkins job proxmox-vm-clean-fs then runs into the OpenSSH penalty, which causes the trixie/debian job to fail. We use our Debian images as base for further configuration, where we control the sshd_config file through our ngcpcfg system anyways, so the `PerSourcePenalties no` setting is supposed to disappear then. FTR: We could also enable `PermitRootLogin yes` in sshd_config to get the grab_log working, though this didn't have any relevance for us so far. Disabling only the `PerSourcePenalties` feature feels like a better trade-off, at least security wise, for now. Change-Id: Ibf16019b4787cc63d450501c8bccebeac77dd9f1 |
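A quick way to confirm the effective setting on an affected host (a sketch, assuming OpenSSH 9.8 or newer):

```bash
# Dump the effective sshd configuration and check the penalty settings;
# "persourcepenalties no" is what we want on these build images.
sshd -T | grep -i persourcepenalt
```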
6 months ago

4debc55f6b | Release new version 13.1.0.0+0~mr13.1.0.0
7 months ago

862c84ccc6 | MT#60698 Add mr12.5 LTS key to bootstrap
Now it contains: pub rsa4096 2015-03-05 [SC] [expires: 2029-10-12] 68A702B1FD8E422AAAA1ADA3773236EFF411A836 uid [ unknown] Sipwise GmbH (Sipwise Repository Key) <support@sipwise.com> sub rsa4096 2015-03-05 [E] [expires: 2029-10-12] pub rsa4096 2011-06-06 [SC] F7B8A739CE638D719A078C9859104633EE5E097D uid [ unknown] Sipwise autobuilder (Used to sign packages for autobuild) <development@sipwise.com> sub rsa4096 2011-06-06 [E] pub rsa4096 2022-05-31 [SCEA] [expires: 2032-05-28] 39EB73D5B54870181632E48786C3B4395CB844A2 uid [ unknown] Sipwise autobuilder <development@sipwise.com> pub rsa4096 2023-08-04 [SCEA] [expires: 2033-08-01] F0A595D85C375447BB09F25E34A72CE4979CA98A uid [ unknown] Sipwise autobuilder <development@sipwise.com> pub rsa4096 2024-08-14 [SCEA] [expires: 2034-08-12] A164D3A12AC0F6AB8F737EF66D1B7D01D2AD9C24 uid [ unknown] Sipwise autobuilder <development@sipwise.com> Change-Id: I142de8611572fd35fa6bbac3695b236a1b3f9a97 |
8 months ago

88efd48cad | Release new version 13.0.0.0+0~mr13.0.0.0
9 months ago

cf94193f88 | MT#60284 Ensure to start qemu-guest-agent only after package got installed
We install the qemu-guest-agent package in ensure_packages_installed().
Therefore, try to start the qemu-guest-agent service only afterwards.
Fixup for commit
11 months ago

4a292ab4be | MT#60284 Only check whether /dev/virtio-ports/org.qemu.guest_agent.0 exists
/dev/virtio-ports/org.qemu.guest_agent.0 usually is a symlink to the
character device /dev/vport1p1. So adjust the device check accordingly
and only verify that it exists, but don't expect any special file type.
This actually matches the behavior we also have in ngcp-installer.
Fixup for commit
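A minimal sketch of such an existence-only check (the service start is illustrative):

```bash
# /dev/virtio-ports/org.qemu.guest_agent.0 is typically a symlink to a
# character device like /dev/vport1p1, so just test that it exists at all.
if [ -e /dev/virtio-ports/org.qemu.guest_agent.0 ]; then
  systemctl start qemu-guest-agent
fi
```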
11 months ago

82e6638b40 | MT#60284 Make sure qemu-guest-agent is available
Now that we enabled the QEMU Guest Agent option for our PVE VMs, we need
to have qemu-guest-agent present and active. Otherwise the VMs might
fail to shut down, like with our debian/sipwise/docker Debian systems
which are created via
https://jenkins.mgm.sipwise.com/job/daily-build-matrix-debian-boxes/:
| [proxmox-vm-shutdown] $ /bin/sh -e /tmp/env-proxmox-vm-shutdown7956268380939677154.sh
| [environment-script] Adding variable 'vm1reset' with value 'NO'
| [environment-script] Adding variable 'vm2' with value 'none'
| [environment-script] Adding variable 'vm1' with value 'none'
| [environment-script] Adding variable 'vm2reset' with value 'NO'
| [proxmox-vm-shutdown] $ /bin/bash /tmp/jenkins14192704603218787414.sh
| Using safe VM 'shutdown' for modern releases (mr6.5+). Executing action 'shutdown'...
| Shutting down VM 106
| Build timed out (after 10 minutes). Marking the build as aborted.
| Build was aborted
| [WS-CLEANUP] Deleting project workspace...
Let's make sure qemu-guest-agent is available in our Grml live system.
We added qemu-guest-agent to the package list of our Grml Sipwise ISO
(see git rev
11 months ago

24841c09eb | MT#60283 Update grml-live to latest stable release v0.47.7
Change-Id: Ia157034ebfadb884f475802046a596937b4afac4 |
11 months ago

c30b0b5af6 | MT#60283 Update grml2usb to latest stable release v0.19.2
Change-Id: Ic74d4f00c5b67baf135f6249acc81dfc214ac77c |
11 months ago

65c3fea4c5 | MT#60284 Provide qemu-guest-agent in our Grml Sipwise ISO
Otherwise we lack qemu-guest-agent integration in our VMs when running Grml live system. Change-Id: Ie61d85c36dfbddddfbd59b46b6bfc4f0e98b587a |
11 months ago

aff8154df7 | Release new version 12.5.0.0+0~mr12.5.0.0
11 months ago

6cf4786735 | MT#59872 Remove NGCP_PXE_INSTALL variable
With this variable we had some tricks in ngcp-initial-configuration if the Pro sp2 node is installed via iPXE/cm image. Now we support installation of sp2 via iPXE only, so there is no need to pass this variable. But we need to keep the parent ngcppxeinstall parameter, as we need this information for netcardconfig. Change-Id: I20491289917cbb427ad6f5670f108c632838be71
1 year ago

0fb8327415 | MT#59872 Remove Pro sp2 from boot menu
We are dropping the scenario where the sp2 node is installed from a CD image, so remove the corresponding part of the code. Change-Id: Idced6b43a21add903dca070aa68f84b77acba28e
1 year ago

0a91a49826 | MT#58014 Remove support for fetching OpenPGP certificates from keyservers
The code trying to fetch the OpenPGP certificate from a keyserver has
been non-functional for a while as the GPG_KEY_SERVER variable was
removed in commit
1 year ago

362f7cbea1 | Release new version 12.4.0.0+0~mr12.4.0.0
1 year ago

e99f33e11a | TT#118659 Do not fail when deploying SW-RAID if no RAID was present yet
Followup fix for commit
1 year ago

1d59d89d04 | TT#118659 Do not abort on disk partition listing failures
We identify any existing partitions of the disk we need to wipe via:
| root@license42 ~ # lsblk --noheadings --output KNAME /dev/sda
| sda
| sda1
| sda2
| sda3
| root@license42 ~ # blockdevice="/dev/sda"
| root@license42 ~ # lsblk --noheadings --output KNAME /dev/sda | grep -v "^${blockdevice#\/dev\/}$"
| sda1
| sda2
| sda3
This might fail though, if there are no partitions present:
| root@license42 ~ # dd if=/dev/zero of=/dev/sda bs=10M count=1
| 1+0 records in
| 1+0 records out
| 10485760 bytes (10 MB, 10 MiB) copied, 0.0487036 s, 215 MB/s
| root@license42 ~ # pvremove /dev/sda --force --force --yes
| Labels on physical volume "/dev/sda" successfully wiped.
| root@license42 ~ # blockdevice="/dev/sda"
| root@license42 ~ # lsblk --noheadings --output KNAME /dev/sda | grep -v "^${blockdevice#\/dev\/}$"
| 1 root@license42 ~ #
Ending up in our daily-build-install-vm Jenkins jobs like this:
| +13:08:19 (netscript.grml:489): clear_partition_table(): echo 'Removing possibly existing LVM/PV label from /dev/sda'
| +13:08:19 (netscript.grml:490): clear_partition_table(): pvremove /dev/sda --force --force --yes
| Labels on physical volume "/dev/sda" successfully wiped.
| ++13:08:19 (netscript.grml:495): clear_partition_table(): grep -v '^sda$'
| ++13:08:19 (netscript.grml:495): clear_partition_table(): lsblk --noheadings --output KNAME /dev/sda
| +++13:08:19 (netscript.grml:495): clear_partition_table(): wait_exit
| +++13:08:19 (netscript.grml:339): wait_exit(): local e_code=1
| +++13:08:19 (netscript.grml:340): wait_exit(): [[ 1 -ne 0 ]]
| +++13:08:19 (netscript.grml:341): wait_exit(): set_deploy_status error
| +++13:08:19 (netscript.grml:103): set_deploy_status(): '[' -n error ']'
| +++13:08:19 (netscript.grml:104): set_deploy_status(): echo error
| Wiping disk signatures from /dev/sda
| +++13:08:19 (netscript.grml:343): wait_exit(): trap '' 1 2 3 6 15 ERR EXIT
| +++13:08:19 (netscript.grml:344): wait_exit(): status_wait
| +++13:08:19 (netscript.grml:329): status_wait(): [[ -n 0 ]]
| +++13:08:19 (netscript.grml:329): status_wait(): [[ 0 != 0 ]]
Followup change for
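A minimal sketch of making that listing step tolerant of an empty result (variable names are illustrative):

```bash
blockdevice="/dev/sda"
# grep -v exits non-zero when it filters everything away (i.e. no partitions
# exist), which would otherwise trip the ERR trap; '|| true' keeps the script
# going in that case.
mapfile -t partitions < <(lsblk --noheadings --output KNAME "${blockdevice}" \
  | grep -v "^${blockdevice#/dev/}$" || true)
echo "found ${#partitions[@]} partition(s) on ${blockdevice}"
```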
1 year ago

fc9b43f92e | TT#118659 Fix re-deploying over existing SW-RAID arrays
Fresh deployments with SW-RAID (Software-RAID) might fail if the present disks were already part of an SW-RAID setup: | Error: disk nvme1n1 seems to be part of an existing SW-RAID setup. We could also reproduce this inside PVE VMs: | mdadm: /dev/md/127 has been started with 2 drives. | Error: disk sda seems to be part of an existing SW-RAID setup. This is caused by the following behavior: | + SWRAID_DEVICE="/dev/md0" | [...] | + mdadm --assemble --scan | + true | + [[ -b /dev/md0 ]] | + for disk in "${SWRAID_DISK1}" "${SWRAID_DISK2}" | + grep -q nvme1n1 /proc/mdstat | + die 'Error: disk nvme1n1 seems to be part of an existing SW-RAID setup.' | + echo 'Error: disk nvme1n1 seems to be part of an existing SW-RAID setup.' | Error: disk nvme1n1 seems to be part of an existing SW-RAID setup. By default we expect and set the SWRAID_DEVICE to be /dev/md0. But only "local" arrays get assembled as /dev/md0 and upwards, whereas "foreign" arrays start at md127 downwards. This is exactly what we get when booting our deployment live system on top of an existing installation, and assemble existing SW-RAIDs (to not overwrite unexpected disks by mistake): | root@grml ~ # lsblk | NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS | loop0 7:0 0 428.8M 1 loop /usr/lib/live/mount/rootfs/ngcp.squashfs | /run/live/rootfs/ngcp.squashfs | nvme0n1 259:0 0 447.1G 0 disk | └─md127 9:127 0 447.1G 0 raid1 | ├─md127p1 259:14 0 18G 0 part | ├─md127p2 259:15 0 18G 0 part | ├─md127p3 259:16 0 405.6G 0 part | ├─md127p4 259:17 0 512M 0 part | ├─md127p5 259:18 0 4G 0 part | └─md127p6 259:19 0 1G 0 part | nvme1n1 259:7 0 447.1G 0 disk | └─md127 9:127 0 447.1G 0 raid1 | ├─md127p1 259:14 0 18G 0 part | ├─md127p2 259:15 0 18G 0 part | ├─md127p3 259:16 0 405.6G 0 part | ├─md127p4 259:17 0 512M 0 part | ├─md127p5 259:18 0 4G 0 part | └─md127p6 259:19 0 1G 0 part | | root@grml ~ # lsblk -l -n -o TYPE,NAME | loop loop0 | raid1 md127 | disk nvme0n1 | disk nvme1n1 | part md127p1 | part md127p2 | part md127p3 | part md127p4 | part md127p5 | part md127p6 | | root@grml ~ # cat /proc/cmdline | vmlinuz initrd=initrd.img swraiddestroy swraiddisk2=nvme0n1 swraiddisk1=nvme1n1 [...] Let's identify existing RAID devices and check their configuration by going through the disks and comparing them with our SWRAID_DISK1 and SWRAID_DISK2. If they don't match with each other, we stop execution to prevent any possible data damage. Furthermore, we need to assemble the mdadm array without relying on a possibly existing local `/etc/mdadm/mdadm.conf` configuration file. 
Otherwise assembling might fail: | root@grml ~ # cat /proc/mdstat | Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] | unused devices: <none> | root@grml ~ # lsblk -l -n -o TYPE,NAME | awk '/^raid/ {print $2}' | root@grml ~ # grep ARRAY /etc/mdadm/mdadm.conf | ARRAY /dev/md/127 metadata=1.0 UUID=0d44774e:7269bac6:2f02f337:4551597b name=localhost:127 | root@grml ~ # mdadm --assemble --scan | 2 root@grml ~ # mdadm --assemble --scan --verbose | mdadm: looking for devices for /dev/md/127 | mdadm: No super block found on /dev/loop0 (Expected magic a92b4efc, got 800989c0) | mdadm: no RAID superblock on /dev/loop0 | mdadm: No super block found on /dev/nvme1n1p3 (Expected magic a92b4efc, got 00000000) | mdadm: no RAID superblock on /dev/nvme1n1p3 | mdadm: No super block found on /dev/nvme1n1p2 (Expected magic a92b4efc, got 00000000) | mdadm: no RAID superblock on /dev/nvme1n1p2 | mdadm: No super block found on /dev/nvme1n1p1 (Expected magic a92b4efc, got 000080fe) | mdadm: no RAID superblock on /dev/nvme1n1p1 | mdadm: No super block found on /dev/nvme1n1 (Expected magic a92b4efc, got 00000000) | mdadm: no RAID superblock on /dev/nvme1n1 | mdadm: No super block found on /dev/nvme0n1p3 (Expected magic a92b4efc, got 00000000) | mdadm: no RAID superblock on /dev/nvme0n1p3 | mdadm: No super block found on /dev/nvme0n1p2 (Expected magic a92b4efc, got 00000000) | mdadm: no RAID superblock on /dev/nvme0n1p2 | mdadm: No super block found on /dev/nvme0n1p1 (Expected magic a92b4efc, got 000080fe) | mdadm: no RAID superblock on /dev/nvme0n1p1 | mdadm: No super block found on /dev/nvme0n1 (Expected magic a92b4efc, got 00000000) | mdadm: no RAID superblock on /dev/nvme0n1 | 2 root@grml ~ # mdadm --assemble --scan --config /dev/null | mdadm: /dev/md/grml:127 has been started with 2 drives. | root@grml ~ # lsblk -l -n -o TYPE,NAME | awk '/^raid/ {print $2}' | md127 By running mdadm assemble with `--config /dev/null`, we prevent consideration and usage of a possibly existing /etc/mdadm/mdadm.conf configuration file. Example output of running the new code: | [...] | mdadm: No arrays found in config file or automatically | NOTE: default SWRAID_DEVICE set to /dev/md0 though we identified active md127 | NOTE: will continue with '/dev/md127' as SWRAID_DEVICE for mdadm cleanup | Wiping signatures from /dev/md127 | /dev/md127: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31 | Removing mdadm device /dev/md127 | Stopping mdadm device /dev/md127 | mdadm: stopped /dev/md127 | Zero-ing superblock from /dev/nvme1n1 | mdadm: Unrecognised md component device - /dev/nvme1n1 | Zero-ing superblock from /dev/nvme0n1 | mdadm: Unrecognised md component device - /dev/nvme0n1 | NOTE: modified RAID array detected, setting SWRAID_DEVICE back to original setting '/dev/md0' | Removing possibly existing LVM/PV label from /dev/nvme1n1 | Cannot use /dev/nvme1n1: device is partitioned | Removing possibly existing LVM/PV label from /dev/nvme1n1p1 | Cannot use /dev/nvme1n1p1: device is too small (pv_min_size) | Removing possibly existing LVM/PV label from /dev/nvme1n1p2 | Labels on physical volume "/dev/nvme1n1p2" successfully wiped. 
| Removing possibly existing LVM/PV label from /dev/nvme1n1p3 | Cannot use /dev/nvme1n1p3: device is an md component | Wiping disk signatures from /dev/nvme1n1 | /dev/nvme1n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54 | /dev/nvme1n1: 8 bytes were erased at offset 0x6fc86d5e00 (gpt): 45 46 49 20 50 41 52 54 | /dev/nvme1n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa | /dev/nvme1n1: calling ioctl to re-read partition table: Success | 1+0 records in | 1+0 records out | 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0027866 s, 376 MB/s | Removing possibly existing LVM/PV label from /dev/nvme0n1 | Cannot use /dev/nvme0n1: device is partitioned | Removing possibly existing LVM/PV label from /dev/nvme0n1p1 | Cannot use /dev/nvme0n1p1: device is too small (pv_min_size) | Removing possibly existing LVM/PV label from /dev/nvme0n1p2 | Labels on physical volume "/dev/nvme0n1p2" successfully wiped. | Removing possibly existing LVM/PV label from /dev/nvme0n1p3 | Cannot use /dev/nvme0n1p3: device is an md component | Wiping disk signatures from /dev/nvme0n1 | /dev/nvme0n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54 | /dev/nvme0n1: 8 bytes were erased at offset 0x6fc86d5e00 (gpt): 45 46 49 20 50 41 52 54 | /dev/nvme0n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa | /dev/nvme0n1: calling ioctl to re-read partition table: Success | 1+0 records in | 1+0 records out | 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00278955 s, 376 MB/s | Creating partition table | Get path of EFI partition | pvdevice is now available: /dev/nvme1n1p2 | The operation has completed successfully. | The operation has completed successfully. | pvdevice is now available: /dev/nvme1n1p3 | pvdevice is now available: /dev/nvme0n1p3 | mdadm: /dev/nvme1n1p3 appears to be part of a raid array: | level=raid1 devices=2 ctime=Wed Jan 24 10:31:43 2024 | mdadm: Note: this array has metadata at the start and | may not be suitable as a boot device. If you plan to | store '/boot' on this device please ensure that | your boot-loader understands md/v1.x metadata, or use | --metadata=0.90 | mdadm: /dev/nvme0n1p3 appears to be part of a raid array: | level=raid1 devices=2 ctime=Wed Jan 24 10:31:43 2024 | mdadm: size set to 468218880K | mdadm: automatically enabling write-intent bitmap on large array | Continue creating array? mdadm: Defaulting to version 1.2 metadata | mdadm: array /dev/md0 started. | Creating PV + VG on /dev/md0 | Physical volume "/dev/md0" successfully created. | Volume group "ngcp" successfully created | 0 logical volume(s) in volume group "ngcp" now active | Creating LV 'root' with 10G | [...] | | mdadm: stopped /dev/md127 | mdadm: No arrays found in config file or automatically | NOTE: will continue with '/dev/md127' as SWRAID_DEVICE for mdadm cleanup | Removing mdadm device /dev/md127 | Stopping mdadm device /dev/md127 | mdadm: stopped /dev/md127 | mdadm: Unrecognised md component device - /dev/nvme1n1 | mdadm: Unrecognised md component device - /dev/nvme0n1 | mdadm: /dev/nvme1n1p3 appears to be part of a raid array: | mdadm: Note: this array has metadata at the start and | mdadm: /dev/nvme0n1p3 appears to be part of a raid array: | mdadm: size set to 468218880K | mdadm: automatically enabling write-intent bitmap on large array | Continue creating array? mdadm: Defaulting to version 1.2 metadata | mdadm: array /dev/md0 started. 
| lvm2 mdadm wget | Get:1 http://http-proxy.lab.sipwise.com/debian bookworm/main amd64 mdadm amd64 4.2-5 [443 kB] | Selecting previously unselected package mdadm. | Preparing to unpack .../0-mdadm_4.2-5_amd64.deb ... | Unpacking mdadm (4.2-5) ... | Setting up mdadm (4.2-5) ... | [...] | mdadm: stopped /dev/md0 Change-Id: Ib5875248e9c01dd4251bfab2cc4c94daace503fa |
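The core of the cleanup can be sketched roughly like this (device handling simplified, the real deployment.sh code does more):

```bash
# Assemble any existing ("foreign") arrays while ignoring a possibly stale
# /etc/mdadm/mdadm.conf from a previous installation.
mdadm --assemble --scan --config /dev/null || true

# Whatever array actually came up (often md127, not md0) is what we wipe/stop.
active_md=$(lsblk -l -n -o TYPE,NAME | awk '/^raid/ {print $2; exit}')
if [ -n "${active_md}" ]; then
  wipefs --all "/dev/${active_md}"
  mdadm --stop "/dev/${active_md}"
fi
```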
1 year ago

e9244a289b | TT#118659 Wipe disk signatures more reliably with SW-RAID and NVMe setup
Deployed current NGCP trunk on NVMe powered SW-RAID setup failed with: | mdadm: size set to 468218880K | mdadm: automatically enabling write-intent bitmap on large array | Continue creating array? mdadm: Defaulting to version 1.2 metadata | mdadm: array /dev/md0 started. | Creating PV + VG on /dev/md0 | Cannot use /dev/md0: device is partitioned This is caused because /dev/md0 still contains partition data, and its nvme1n1p3 also still has disk signature about linux_raid_member. So it's *not* enough to stop the mdadm array, remove PV/LVM information from the partitions and finally wipe SW-RAID disks /dev/nvme1n1 + /dev/nvme0n1 (example output from such a failing run): | mdadm: /dev/md/0 has been started with 2 drives. | mdadm: stopped /dev/md0 | mdadm: Unrecognised md component device - /dev/nvme1n1 | mdadm: Unrecognised md component device - /dev/nvme0n1 | Removing possibly existing LVM/PV label from /dev/nvme1n1 | Cannot use /dev/nvme1n1: device is partitioned | Removing possibly existing LVM/PV label from /dev/nvme1n1p1 | Cannot use /dev/nvme1n1p1: device is too small (pv_min_size) | Removing possibly existing LVM/PV label from /dev/nvme1n1p2 | Labels on physical volume "/dev/nvme1n1p2" successfully wiped. | Removing possibly existing LVM/PV label from /dev/nvme1n1p3 | Cannot use /dev/nvme1n1p3: device is an md component | Wiping disk signatures from /dev/nvme1n1 | /dev/nvme1n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54 | /dev/nvme1n1: 8 bytes were erased at offset 0x6fc86d5e00 (gpt): 45 46 49 20 50 41 52 54 | /dev/nvme1n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa | /dev/nvme1n1: calling ioctl to re-read partition table: Success | 1+0 records in | 1+0 records out | 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00314195 s, 334 MB/s | Removing possibly existing LVM/PV label from /dev/nvme0n1 | Cannot use /dev/nvme0n1: device is partitioned | Removing possibly existing LVM/PV label from /dev/nvme0n1p1 | Cannot use /dev/nvme0n1p1: device is too small (pv_min_size) | Removing possibly existing LVM/PV label from /dev/nvme0n1p2 | Labels on physical volume "/dev/nvme0n1p2" successfully wiped. | Removing possibly existing LVM/PV label from /dev/nvme0n1p3 | Cannot use /dev/nvme0n1p3: device is an md component | Wiping disk signatures from /dev/nvme0n1 | /dev/nvme0n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54 | /dev/nvme0n1: 8 bytes were erased at offset 0x6fc86d5e00 (gpt): 45 46 49 20 50 41 52 54 | /dev/nvme0n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa | /dev/nvme0n1: calling ioctl to re-read partition table: Success | 1+0 records in | 1+0 records out | 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00893285 s, 117 MB/s | Creating partition table | Get path of EFI partition | pvdevice is now available: /dev/nvme1n1p2 | The operation has completed successfully. | The operation has completed successfully. | pvdevice is now available: /dev/nvme1n1p3 | pvdevice is now available: /dev/nvme0n1p3 | mdadm: /dev/nvme1n1p3 appears to be part of a raid array: | level=raid1 devices=2 ctime=Wed Dec 20 20:35:21 2023 | mdadm: Note: this array has metadata at the start and | may not be suitable as a boot device. 
If you plan to | store '/boot' on this device please ensure that | your boot-loader understands md/v1.x metadata, or use | --metadata=0.90 | mdadm: /dev/nvme0n1p3 appears to be part of a raid array: | level=raid1 devices=2 ctime=Wed Dec 20 20:35:21 2023 | mdadm: size set to 468218880K | mdadm: automatically enabling write-intent bitmap on large array | Continue creating array? mdadm: Defaulting to version 1.2 metadata | mdadm: array /dev/md0 started. | Creating PV + VG on /dev/md0 | Cannot use /dev/md0: device is partitioned Instead we also need to wipe signatures from the SW-RAID device (like /dev/md0), only then stop it, ensure we wipe disk signatures also from all the partitions (like /dev/nvme1n1p3) and only then finally remove the disk signatures from the main block device (like /dev/nvme1n1). Example from a successful run with this change: | root@grml ~ # grep -e mdadm -e Wiping /tmp/deployment-installer-debug.log | mdadm: /dev/md/0 has been started with 2 drives. | Wiping signatures from /dev/md0 | Removing mdadm device /dev/md0 | Stopping mdadm device /dev/md0 | mdadm: stopped /dev/md0 | mdadm: Unrecognised md component device - /dev/nvme1n1 | mdadm: Unrecognised md component device - /dev/nvme0n1 | Wiping disk signatures from partition /dev/nvme1n1p1 | Wiping disk signatures from partition /dev/nvme1n1p2 | Wiping disk signatures from partition /dev/nvme1n1p3 | Wiping disk signatures from /dev/nvme1n1 | Wiping disk signatures from partition /dev/nvme0n1p1 | Wiping disk signatures from partition /dev/nvme0n1p2 | Wiping disk signatures from partition /dev/nvme0n1p3 | Wiping disk signatures from /dev/nvme0n1 | mdadm: Note: this array has metadata at the start and | mdadm: size set to 468218880K | mdadm: automatically enabling write-intent bitmap on large array | Continue creating array? mdadm: Defaulting to version 1.2 metadata | mdadm: array /dev/md0 started. | Wiping ext3 signature on /dev/ngcp/root. | Wiping ext4 signature on /dev/ngcp/fallback. | Wiping ext4 signature on /dev/ngcp/data. While at it, be more verbose about the executed steps. FTR, disk and setup information of such a system where we noticed the failure and worked on this change: | root@grml ~ # fdisk -l | Disk /dev/nvme0n1: 447.13 GiB, 480103981056 bytes, 937703088 sectors | Disk model: DELL NVME ISE PE8010 RI M.2 480GB | Units: sectors of 1 * 512 = 512 bytes | Sector size (logical/physical): 512 bytes / 512 bytes | I/O size (minimum/optimal): 512 bytes / 512 bytes | Disklabel type: gpt | Disk identifier: 5D296676-52CF-49CF-863A-6D3A3BD0604F | | Device Start End Sectors Size Type | /dev/nvme0n1p1 2048 4095 2048 1M BIOS boot | /dev/nvme0n1p2 4096 999423 995328 486M EFI System | /dev/nvme0n1p3 999424 937701375 936701952 446.7G Linux RAID | | | Disk /dev/nvme1n1: 447.13 GiB, 480103981056 bytes, 937703088 sectors | Disk model: DELL NVME ISE PE8010 RI M.2 480GB | Units: sectors of 1 * 512 = 512 bytes | Sector size (logical/physical): 512 bytes / 512 bytes | I/O size (minimum/optimal): 512 bytes / 512 bytes | Disklabel type: gpt | Disk identifier: 9AFA8ACF-D2CD-4224-BA0C-D38A6581D0F9 | | Device Start End Sectors Size Type | /dev/nvme1n1p1 2048 4095 2048 1M BIOS boot | /dev/nvme1n1p2 4096 999423 995328 486M EFI System | /dev/nvme1n1p3 999424 937701375 936701952 446.7G Linux RAID | [...] 
| | root@grml ~ # lsblk | NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS | loop0 7:0 0 428.8M 1 loop /usr/lib/live/mount/rootfs/ngcp.squashfs | /run/live/rootfs/ngcp.squashfs | nvme0n1 259:0 0 447.1G 0 disk | ├─nvme0n1p1 259:5 0 1M 0 part | ├─nvme0n1p2 259:8 0 486M 0 part | └─nvme0n1p3 259:9 0 446.7G 0 part | └─md0 9:0 0 446.5G 0 raid1 | ├─ngcp-root 253:0 0 10G 0 lvm /mnt | ├─ngcp-fallback 253:1 0 10G 0 lvm | └─ngcp-data 253:2 0 383.9G 0 lvm /mnt/ngcp-data | nvme1n1 259:4 0 447.1G 0 disk | ├─nvme1n1p1 259:2 0 1M 0 part | ├─nvme1n1p2 259:6 0 486M 0 part | └─nvme1n1p3 259:7 0 446.7G 0 part | └─md0 9:0 0 446.5G 0 raid1 | ├─ngcp-root 253:0 0 10G 0 lvm /mnt | ├─ngcp-fallback 253:1 0 10G 0 lvm | └─ngcp-data 253:2 0 383.9G 0 lvm /mnt/ngcp-data | | root@grml ~ # cat /proc/mdstat | Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] | md0 : active raid1 nvme0n1p3[1] nvme1n1p3[0] | 468218880 blocks super 1.2 [2/2] [UU] | [==>..................] resync = 12.7% (59516864/468218880) finish=33.1min speed=205685K/sec | bitmap: 4/4 pages [16KB], 65536KB chunk | | unused devices: <none> Change-Id: Iaa7f49eef11ef6ad6209fe962bb8940a75a87c95 |
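In short, the wiping has to happen in this order (a sketch using the example device names from the log above):

```bash
wipefs --all /dev/md0          # 1. wipe signatures from the assembled RAID device
mdadm --stop /dev/md0          # 2. only then stop the array
for part in /dev/nvme1n1p1 /dev/nvme1n1p2 /dev/nvme1n1p3; do
  wipefs --all "${part}"       # 3. wipe signatures from every partition
done
wipefs --all /dev/nvme1n1      # 4. finally wipe the whole block device
```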
1 year ago

76893e3acb | Release new version 12.3.0.0+0~mr12.3.0.0
1 year ago

236cb2d1a7 | MT#58926 Vagrant: ensure to have libxmu6 available
We get the following error message in /var/log/vboxadd-install.log, /var/log/deployment-installer-debug.log, /var/log/daemon.log + /var/log/syslog: | /opt/VBoxGuestAdditions-7.0.6/bin/VBoxClient: error while loading shared libraries: libXmu.so.6: cannot open shared object file: No such file or directory This is caused by missing libxmu6: | [sipwise-lab-trunk] sipwise@spce:~$ /opt/VBoxGuestAdditions-7.0.6/bin/VBoxClient --help | /opt/VBoxGuestAdditions-7.0.6/bin/VBoxClient: error while loading shared libraries: libXmu.so.6: cannot open shared object file: No such file or directory | [sipwise-lab-trunk] sipwise@spce:~$ sudo apt install libxmu6 | Reading package lists... Done | Building dependency tree... Done | Reading state information... Done | The following NEW packages will be installed: | libxmu6 | 0 upgraded, 1 newly installed, 0 to remove and 83 not upgraded. | Need to get 60.1 kB of archives. | After this operation, 143 kB of additional disk space will be used. | Get:1 https://debian.sipwise.com/debian bookworm/main amd64 libxmu6 amd64 2:1.1.3-3 [60.1 kB] | Fetched 60.1 kB in 0s (199 kB/s) | [...] | [sipwise-lab-trunk] sipwise@spce:~$ /opt/VBoxGuestAdditions-7.0.6/bin/VBoxClient --help | Oracle VM VirtualBox VBoxClient 7.0.6 | Copyright (C) 2005-2023 Oracle and/or its affiliates | | Usage: VBoxClient --clipboard|--draganddrop|--checkhostversion|--seamless|--vmsvga|--vmsvga-session | [-d|--nodaemon] | | Options: | [...] It looks like lack of libxmu6 doesn't cause any actual problems for our use case (we don't use X.org at all), though given that libxmu6 is a small library package, let's try to get it working as expected and avoid the alarming errors on the logs. Thanks Guillem Jover for spotting and reporting Change-Id: I65f3dd496a4026f04fd9944fd7cc43d6abbdf336 |
1 year ago

0f384353f8 | Release new version 12.2.0.0+0~mr12.2.0.0
1 year ago

8c3ab6b241 | MT#57559 Always include zstd when bootstrapping systems
During initial deployment of a system, we get warnings about lack of zstd: | Setting up linux-image-6.1.0-13-amd64 (6.1.55-1) ... | I: /vmlinuz.old is now a symlink to boot/vmlinuz-6.1.0-13-amd64 | I: /initrd.img.old is now a symlink to boot/initrd.img-6.1.0-13-amd64 | I: /vmlinuz is now a symlink to boot/vmlinuz-6.1.0-13-amd64 | I: /initrd.img is now a symlink to boot/initrd.img-6.1.0-13-amd64 | /etc/kernel/postinst.d/initramfs-tools: | update-initramfs: Generating /boot/initrd.img-6.1.0-13-amd64 | W: No zstd in /usr/bin:/sbin:/bin, using gzip | [...] The initramfs generation and update overall runs *four* times within the initial bootstrapping of a system (we'll try to do something about this, but this is outside the scope of this). As of initramfs-tools v0.141, initramfs-tools uses zstd as default compression for initramfs. Version 0.142 is shipped with Debian/bookworm, and therefore it makes sense to have it available upfront. Note that also the initrd generation is faster with zstd (~10sec for zstd vs. ~13sec for gzip) and also the resulting initrd is smaller (~33MB for zstd vs ~39MB for gzip). By making sure that zstd is available straight from the very beginning and before ngcp-installer pulls it in later, we can avoid the warning message but also save >10 seconds of install time. Given that zstd is available even in Debian oldoldstable, let's install it unconditionally in all our systems. Thanks: Volodymyr Fedorov for reporting Change-Id: I56674c3c213f7c7a6e6cbce3c8e2e00a4cfbdbd4 |
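A minimal sketch of the idea, assuming an mmdebstrap-based bootstrap (the actual invocation in deployment.sh carries many more options, and the mirror here is just an example):

```bash
# Include zstd right away so update-initramfs can use its default compressor
# during the very first kernel/initramfs runs inside the target.
mmdebstrap --include=zstd bookworm /mnt http://deb.debian.org/debian
```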
1 year ago

9cceb8d655 | MT#58356 ntp: Use ntpsec.service instead of ntp.service
Even though the ntpsec.service contains an Alias for ntp.service, that does not work for us when the service has not yet been installed, so the first run will fail. Use the actual name to avoid this issue. Change-Id: I8f0ee3b38390a7e58c3bbee65fd96bfd4b717dfa |
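A minimal sketch of the difference:

```bash
# This fails on the first run, because the ntp.service alias only exists
# once ntpsec is installed and its units have been loaded:
#   systemctl enable ntp.service
# Refer to the real unit name instead:
systemctl enable ntpsec.service
```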
2 years ago

f483c18b82 | Release new version 12.1.0.0+0~mr12.1.0.0
2 years ago

39949fcd06 | MT#58356 Update packaging for bookworm
- Add Rules-Requires-Root field.
- Switch to Standards-Version 4.6.2.
- Update copyright years.

Change-Id: Ia24821937c439718750b1832b782cd3832dc9c19
2 years ago

d132ecc4bc | MT#57165 Add ngcp-kernel-firmware package to grml-sipwise
It's better to have this package in the grml-sipwise image, so any system with this network card can use all its power even in the deployment stage. Change-Id: I765efcf446a410a42ef156b2ccc2e6612a33ddd6
2 years ago

1239aeab8b | Release new version 12.0.1.0+0~mr12.0.1.0
2 years ago

366c412c1f | MT#57980 Add mr11.5 LTS key to bootstrap
Now it contains: pub rsa4096 2015-03-05 [SC] [expires: 2029-10-12] 68A702B1FD8E422AAAA1ADA3773236EFF411A836 uid [ unknown] Sipwise GmbH (Sipwise Repository Key) <support@sipwise.com> sub rsa4096 2015-03-05 [E] [expires: 2029-10-12] pub rsa4096 2011-06-06 [SC] F7B8A739CE638D719A078C9859104633EE5E097D uid [ unknown] Sipwise autobuilder (Used to sign packages for autobuild) <development@sipwise.com> sub rsa4096 2011-06-06 [E] pub rsa4096 2022-05-31 [SCEA] [expires: 2032-05-28] 39EB73D5B54870181632E48786C3B4395CB844A2 uid [ unknown] Sipwise autobuilder <development@sipwise.com> pub rsa4096 2023-08-04 [SCEA] [expires: 2033-08-01] F0A595D85C375447BB09F25E34A72CE4979CA98A uid [ unknown] Sipwise autobuilder <development@sipwise.com> pub rsa4096 2021-05-04 [SCEA] [expires: 2031-05-02] AB7FE3DCD53767F6160406442A5CA71B542B9A22 uid [ unknown] Sipwise autobuilder <development@sipwise.com> Change-Id: I33c8a4e666f1a7f8b64d823c3d4e2550ca8dcf11 |
2 years ago

793a93bc43 | MT#57453 vagrant_configuration: remove fake systemd presence after execution
Let's restore system state of /run/systemd/system for
VBoxLinuxAdditions, to avoid any unexpected side effects.
Followup for git rev
2 years ago

561303359e | MT#57453 Use tty1 for stdin when running under grml-autoconfig service
Recent Grml ISOs, including our Grml-Sipwise ISO (v2023-06-01), include
grml-autoconfig v0.20.3, which executes the grml-autoconfig service under
`StandardInput=null`. This is necessary to avoid conflicts with tty usage,
such as with a serial console. See
2 years ago

8601193128 | MT#57453 vagrant_configuration: fake systemd presence
As of git rev
2 years ago

6c960afee4 | TT#104221 Use bookworm repos in ensure_packages_installed appropriately
Support the bookworm option in the DEBIAN_RELEASE selection; we have support for it already. Use bookworm as the fallback, since we have switched to it by now. Change-Id: I118c1b5cf81fe57394495b5f745fc81032406c78
2 years ago

37163532ee | MT#56773 Use bullseye puppetlabs repository for bookworm
To be able to upgrade our internal systems to Debian/bookworm we need to have puppet packages available. Upstream still doesn't provide any Debian packages (see https://tickets.puppetlabs.com/browse/PA-4995), though their AIO (All In One) packages for Debian/bullseye seem to be working on Debian/bookworm as well (at least for puppet-agent). So until we either migrated to puppet-agent as present in Debian/bookworm or upstream provides according AIO packages, let's use the puppet-agent packages we already use for our Debian/bullseye systems. Change-Id: I2211ffd79f70a2a79873e737b0b512bfb7492328 |
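A minimal sketch of what that amounts to as an apt sources entry (repository URL and component are assumptions based on the usual puppetlabs layout):

```bash
# Keep pointing bookworm hosts at the bullseye suite of the puppetlabs
# repository until proper bookworm packages exist.
echo 'deb https://apt.puppet.com bullseye puppet7' \
  > /etc/apt/sources.list.d/puppetlabs.list
apt-get update
```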
2 years ago

3a942b1b8c | MT#57453 Switch docker image to bookworm
Change-Id: I9cfc7f0f6062d5e4916c7ba18b72cbc3e8c8ebbb |
2 years ago

1cb15c866e | Release new version 11.5.0.0+0~mr11.5.0.0
2 years ago

0fedba6144 | MT#57643 Ensure /var/lib/dpkg/available exists on Debian releases <=buster
Since version 1.20.0, dpkg no longer creates /var/lib/dpkg/available (see #647911). Now that we upgraded our Grml-Sipwise deployment system to bookworm, we have dpkg v1.21.22 on our live system, and mmdebstrap relies on dpkg of the host system for execution. But on Debian releases until and including buster, dpkg fails to operate with e.g. `dpkg --set-selections`, if /var/lib/dpkg/available doesn't exist: | The following NEW packages will be installed: | nullmailer | [...] | debconf: delaying package configuration, since apt-utils is not installed | dpkg: error: failed to open package info file '/var/lib/dpkg/available' for reading: No such file or directory We *could* also switch from mmdebstrap to debootstrap for deploying Debian releases <=buster, but this would be slower and we use mmdebstrap since quite some time for everything. So instead let's create /var/lib/dpkg/available after bootstrapping the system. Reported towards mmdebstrap as #1037946. Change-Id: I0a87ca255d5eb7144a9c093051c0a6a3114a3c0b |
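The workaround itself is tiny; a sketch, assuming the bootstrapped target is mounted at /mnt:

```bash
# dpkg >= 1.20.0 no longer creates this file, but dpkg on <= buster targets
# still needs it for e.g. 'dpkg --set-selections'.
mkdir -p /mnt/var/lib/dpkg
touch /mnt/var/lib/dpkg/available
```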
2 years ago

eccdc586ae | MT#57644 puppet/git: allow ssh-rsa pubkey usage
Now that our deployment system is based on Debian/bookworm, but our gerrit/git server still runs on Debian/bullseye, we run into the OpenSSH RSA issue (RSA signatures using the SHA-1 hash algorithm got disabled by default), see https://michael-prokop.at/blog/2023/06/11/what-to-expect-from-debian-bookworm-newinbookworm/ and https://www.jhanley.com/blog/ssh-signature-algorithm-ssh-rsa-error/ We need to enable ssh-rsa usage, otherwise deployment fails with: | Warning: Permanently added '[gerrit.mgm.sipwise.com]:29418' (ED25519) to the list of known hosts. | sign_and_send_pubkey: no mutual signature supported | puppet-r10k@gerrit.mgm.sipwise.com: Permission denied (publickey). | fatal: Could not read from remote repository. Change-Id: I5894170dab033d52a2612beea7b6f27ab06cc586 |
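A minimal sketch of re-enabling RSA/SHA-1 signatures for that one host (the config file location is an assumption):

```bash
# Re-allow RSA/SHA-1 pubkey signatures towards the bullseye-based gerrit host;
# drop this again once the server side is upgraded.
cat >> /root/.ssh/config << 'EOF'
Host gerrit.mgm.sipwise.com
    PubkeyAcceptedAlgorithms +ssh-rsa
EOF
```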
2 years ago

8cfb8c8392 | MT#57630 Check online connectivity to work around Intel E810 / ice issue
Deploying the Debian/bookworm based NGCP system fails on a Lenovo sr250 v2 node with an Intel E810 network card: | # lshw -c net -businfo | Bus info Device Class Description | ======================================================= | pci@0000:01:00.0 eth0 network Ethernet Controller E810-XXV for SFP | pci@0000:01:00.1 eth1 network Ethernet Controller E810-XXV for SFP | # lshw -c net | *-network:0 | description: Ethernet interface | product: Ethernet Controller E810-XXV for SFP | vendor: Intel Corporation | physical id: 0 | bus info: pci@0000:01:00.0 | logical name: eth0 | version: 02 | serial: [...] | size: 10Gbit/s | capacity: 25Gbit/s | width: 64 bits | clock: 33MHz | capabilities: pm msi msix pciexpress vpd bus_master cap_list rom ethernet physical fibre 1000bt-fd 25000bt-fd | configuration: autonegotiation=off broadcast=yes driver=ice driverversion=1.11.14 duplex=full firmware=2.25 0x80007027 1.2934.0 ip=192.168.90.51 latency=0 link=yes multicast=yes port=fibre speed=10Gbit/s | resources: iomemory:400-3ff iomemory:400-3ff irq:16 memory:4002000000-4003ffffff memory:4006010000-400601ffff memory:a1d00000-a1dfffff memory:4005000000-4005ffffff memory:4006220000-400641ffff We set up the /etc/network/interfaces file by invoking Grml's netcardconfig script in automated mode, like: NET_DEV=eth0 METHOD=static IPADDR=192.168.90.51 NETMASK=255.255.255.248 GATEWAY=192.168.90.49 /usr/sbin/netcardconfig The resulting /etc/network/interfaces gets used as base for usage inside the NGCP chroot/target system. netcardconfig shuts down the network interface (eth0 in the example above) via ifdown, then sleeps for 3 seconds and re-enables the interface (via ifup) with the new configuration. This used to work fine so far, but with the Intel e810 network card and kernel version 6.1.0-9-amd64 from Debian/bookworm we see a link failure and it takes ~10 seconds until the network device is up and running again. The following vagrant_configuration() execution from deployment.sh then fails: | +11:41:01 (netscript.grml:1022): vagrant_configuration(): wget -O /var/tmp/id_rsa_sipwise.pub http://builder.mgm.sipwise.com/vagrant-ngcp/id_rsa_sipwise.pub | --2023-06-11 11:41:01-- http://builder.mgm.sipwise.com/vagrant-ngcp/id_rsa_sipwise.pub | Resolving builder.mgm.sipwise.com (builder.mgm.sipwise.com)... failed: Name or service not known. | wget: unable to resolve host address 'builder.mgm.sipwise.com' However, when we retry it again just a bit later, the network works fine again. During investigation we identified that the network card flips the port, quoting the related log from the connected Cisco nexus 5020 switch (with fast stp learning mode): | nexus5k %ETHPORT-5-IF_DOWN_LINK_FAILURE: Interface Ethernet1/33 is down (Link failure) It seems to be related to some autonegotiation problem, as when we execute `ethtool -A eth0 rx on tx on` (no matter whether with `on` or `off`), we see: | [Tue Jun 13 08:51:37 2023] ice 0000:01:00.0 eth0: Autoneg did not complete so changing settings may not result in an actual change. 
| [Tue Jun 13 08:51:37 2023] ice 0000:01:00.0 eth0: NIC Link is Down | [Tue Jun 13 08:51:45 2023] ice 0000:01:00.0 eth0: NIC Link is up 10 Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: NONE, Autoneg Advertised: On, Autoneg Negotiated: False, Flow Control: Rx/Tx FTR: | root@sp1 ~ # ethtool -A eth0 autoneg off | netlink error: Operation not supported | 76 root@sp1 ~ # ethtool eth0 | grep -C1 Auto-negotiation | Duplex: Full | Auto-negotiation: off | Port: FIBRE | root@sp1 ~ # ethtool -A eth0 autoneg on | root@sp1 ~ # ethtool eth0 | grep -C1 Auto-negotiation | Duplex: Full | Auto-negotiation: off | Port: FIBRE | root@sp1 ~ # dmesg -T | tail -1 | [Tue Jun 13 08:53:26 2023] ice 0000:01:00.0 eth0: To change autoneg please use: ethtool -s <dev> autoneg <on|off> | root@sp1 ~ # ethtool -s eth0 autoneg off | root@sp1 ~ # ethtool -s eth0 autoneg on | netlink error: link settings update failed | netlink error: Operation not supported | 75 root@sp1 ~ # As a workaround, at least until we have a better fix/solution, we try to reach the default gateway (or fall back to the repository host if gateway couldn't be identified) via ICMP/ping, and once that works we we continue as usual. But even if that should fail we continue execution, to minimize behavior change but have a workaround for this specific situation available. FTR, broken system: | root@sp1 ~ # ethtool -i eth0 | driver: ice | version: 6.1.0-9-amd64 | firmware-version: 2.25 0x80007027 1.2934.0 | [...] Whereas with kernel 5.10.0-23-amd64 from Debian/bullseye we don't seem to see that behavior: | root@sp1:~# ethtool -i neth0 | driver: ice | version: 5.10.0-23-amd64 | firmware-version: 2.25 0x80007027 1.2934.0 | [...] Also using latest available ice v1.11.14 (from https://sourceforge.net/projects/e1000/files/ice%20stable/1.11.14/) on Kernel version 6.1.0-9-amd64 doesn't bring any change: | root@sp1 ~ # modinfo ice | filename: /lib/modules/6.1.0-9-amd64/updates/drivers/net/ethernet/intel/ice/ice.ko | firmware: intel/ice/ddp/ice.pkg | version: 1.11.14 | license: GPL v2 | description: Intel(R) Ethernet Connection E800 Series Linux Driver | author: Intel Corporation, <linux.nics@intel.com> | srcversion: 818E9C817731C98A25470C0 | alias: pci:v00008086d00001888sv*sd*bc*sc*i* | [...] | alias: pci:v00008086d00001591sv*sd*bc*sc*i* | depends: ptp | retpoline: Y | name: ice | vermagic: 6.1.0-9-amd64 SMP preempt mod_unload modversions | parm: debug:netif level (0=none,...,16=all) (int) | parm: fwlog_level:FW event level to log. All levels <= to the specified value are enabled. Values: 0=none, 1=error, 2=warning, 3=normal, 4=verbose. Invalid values: >=5 | (ushort) | parm: fwlog_events:FW events to log (32-bit mask) | (ulong) | root@sp1 ~ # ethtool -i eth0 | head -3 | driver: ice | version: 1.11.14 | firmware-version: 2.25 0x80007027 1.2934.0 | root@sp1 ~ # Change-Id: Ieafe648be4e06ed0d936611ebaf8ee54266b6f3c |
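The workaround can be sketched roughly like this (gateway detection via JSON route output; fallback host and retry count are illustrative):

```bash
# After re-configuring the NIC, wait until the default gateway answers pings
# before continuing; never make this fatal.
gw=$(ip -json route show default | jq -r '.[0].gateway // empty')
[ -n "${gw}" ] || gw=debian.sipwise.com   # fall back to the repository host
for _ in $(seq 1 30); do
  if ping -c 1 -W 2 "${gw}" > /dev/null 2>&1; then
    break
  fi
  sleep 1
done
```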
2 years ago

f4da3e094e | MT#57049 Ensure SW-RAID device is inactive before re-reading partition table
Re-reading of disks fails if the mdadm SW-RAID device is still active: | root@sp1 ~ # cat /proc/mdstat | Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] | md0 : active raid1 sdb3[1] sda3[0] | 468218880 blocks super 1.2 [2/2] [UU] | [========>............] resync = 42.2% (197855168/468218880) finish=22.4min speed=200756K/sec | bitmap: 3/4 pages [12KB], 65536KB chunk | | unused devices: <none> | root@sp1 ~ # blockdev --rereadpt /dev/sdb | blockdev: ioctl error on BLKRRPART: Device or resource busy | 1 root@sp1 ~ # blockdev --rereadpt /dev/sda | blockdev: ioctl error on BLKRRPART: Device or resource busy | 1 root@sp1 ~ # Only if we stop the mdadm SW-RAID device, then we can re-read the partition table: | root@sp1 ~ # mdadm --stop /dev/md0 | mdadm: stopped /dev/md0 | root@sp1 ~ # blockdev --rereadpt /dev/sda | root@sp1 ~ # This behavior isn't new and unrelated to Debian/bookworm but was spotted while debugging an unrelated issue. FTR: we re-read the partition table (via `blockdev --rereadpt`) to ensure that /etc/fstab of the live system is up2date and matches the current system state. While this isn't stricly needed, we preserve existing behavior and also try to avoid a hard "cut" of a possibly ongoing SW-RAID sync. Change-Id: I735b00423e6efa932f74b78a38ed023576e5d306 |
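A minimal sketch of that ordering, using the device names from the log above:

```bash
# blockdev --rereadpt fails with "Device or resource busy" while the array is
# still active, so stop the SW-RAID device first, then re-read the member disks.
mdadm --stop /dev/md0
blockdev --rereadpt /dev/sda
blockdev --rereadpt /dev/sdb
```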
2 years ago