Our https://jenkins.mgm.sipwise.com/job/daily-build-matrix-debian-boxes/
matrix no longer provides builds for debian/trixie, because its
daily-build-images subproject Jenkins job with its proxmox-vm-clean-fs
job failed to run.

After running proxmox-vm-clean-fs under `set -x`, and also overriding
the ssh_wrapper function with `ssh -v ...`, I managed to grab this from
the Jenkins job execution:

| + ssh -v -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o 'ServerAliveInterval 10' -o 'ConnectTimeout 15' 192.168.210.101 'rm -vf /etc/udev/rules.d/70-persistent-net.rules'
| OpenSSH_9.2p1 Debian-2+deb12u3, OpenSSL 3.0.14 4 Jun 2024
| debug1: Reading configuration data /var/lib/jenkins/.ssh/config
| debug1: /var/lib/jenkins/.ssh/config line 7: Applying options for 192.168.*
| debug1: Reading configuration data /etc/ssh/ssh_config
| debug1: /etc/ssh/ssh_config line 50: Applying options for *
| debug1: /etc/ssh/ssh_config line 57: Deprecated option "useroaming"
| debug1: Connecting to 192.168.210.101 [192.168.210.101] port 22.
| debug1: fd 3 clearing O_NONBLOCK
| debug1: Connection established.
| debug1: identity file /var/lib/jenkins/.ssh/id_rsa_sipwise type 0
| debug1: identity file /var/lib/jenkins/.ssh/id_rsa_sipwise-cert type -1
| debug1: identity file /var/lib/jenkins/.ssh/id_rsa type 0
| debug1: identity file /var/lib/jenkins/.ssh/id_rsa-cert type -1
| debug1: identity file /var/lib/jenkins/.ssh/id_dsa type -1
| debug1: identity file /var/lib/jenkins/.ssh/id_dsa-cert type -1
| debug1: Local version string SSH-2.0-OpenSSH_9.2p1 Debian-2+deb12u3
| debug1: kex_exchange_identification: banner line 0: Not allowed at this time

The `Not allowed at this time` pointed to a new OpenSSH feature, which
triggered the regression for us.
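
For reference, a minimal sketch of how an `ssh -v` run can be checked
for this banner. The helper name is_ssh_penalized is hypothetical and
not part of our Jenkins tooling; it only matches the banner text seen
in the debug output of this job and assumes that text is stable:

```shell
# Hypothetical helper (not part of the Jenkins jobs): reads `ssh -v`
# debug output on stdin and succeeds if it contains the banner that
# sshd sent in the failing run quoted in this commit message.
is_ssh_penalized() {
  grep -qF 'banner line 0: Not allowed at this time'
}

# Possible usage (assumption: debug output goes to stderr, as with -v):
#   ssh -v 192.168.210.101 true 2>&1 | is_ssh_penalized \
#     && echo "sshd is refusing this source address for now"
```

A match only says that sshd refused the connection before key
exchange; it does not say which earlier failures caused the refusal.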

OpenSSH introduced options to penalize undesirable behavior, see:

  https://undeadly.org/cgi?action=article;sid=20240607042157
  https://www.openssh.com/releasenotes.html#9.9p1
  https://sources.debian.org/src/openssh/1:9.9p1-1/sshd.c/?hl=576#L573

This is now also present in Debian/trixie, as of openssh-server
v1:9.9p1-1, since end of September 2024.

Now, when too many SSH logins fail, a client system might no longer be
able to connect via SSH, due to this new penalty behavior. And indeed,
within our Jenkins job "daily-build-install-vm" we try to collect
several log files through our grab_log and SSH wrapper:

| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /mnt/tmp/ngcp-installer-debug.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/ngcp-installer-debug.log
| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/ngcp-installer-debug.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/ngcp-installer-debug.log
| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /mnt/tmp/ngcp-installer.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/ngcp-installer.log
| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/ngcp-installer.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/ngcp-installer.log
| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /tmp/ngcp-installer-cmdline.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/ngcp-installer-cmdline.log
| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /mnt/var/log/deployment.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/deployment.log
| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/deployment.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/deployment.log
| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /mnt/var/log/grml-debootstrap.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/grml-debootstrap.log
| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/grml-debootstrap.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/grml-debootstrap.log
| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/syslog /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/syslog
| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/boot /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/boot

We even execute this grab_log wrapper twice: once for the running Grml
live system, and once after we have booted into the actually deployed
system.

This works fine for the Grml live system, but as password-based root
logins haven't been allowed by default in OpenSSH for quite some time,
all the sipwise-ssh-copier runs with user/password against a plain
Debian system fail. As a consequence, we lock ourselves out of the
system with all those SSH login failures, and the Jenkins job
proxmox-vm-clean-fs then runs into the OpenSSH penalty, which causes
the debian/trixie job to fail.

We use our Debian images as base for further configuration, where we
control the sshd_config file through our ngcpcfg system anyway, so the
`PerSourcePenalties no` setting is supposed to disappear then.

FTR: We could also enable `PermitRootLogin yes` in sshd_config to get
grab_log working, though this didn't have any relevance for us so far.
Disabling only the `PerSourcePenalties` feature feels like the better
trade-off for now, at least security-wise.

Change-Id: Ibf16019b4787cc63d450501c8bccebeac77dd9f1
mr13.1.1
parent 4debc55f6b
commit 6eee97de7b