MT#61265 trixie: avoid SSH login failures due to OpenSSH penalize feature

Our https://jenkins.mgm.sipwise.com/job/daily-build-matrix-debian-boxes/
matrix no longer provides builds for debian/trixie, because its
daily-build-images subproject Jenkins job with its proxmox-vm-clean-fs
job failed to run.

After running proxmox-vm-clean-fs under `set -x`, and also overriding
the ssh_wrapper function with `ssh -v ...`, I managed to grab this from
the Jenkins job execution:

| + ssh -v -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o 'ServerAliveInterval 10' -o 'ConnectTimeout 15' 192.168.210.101 'rm -vf /etc/udev/rules.d/70-persistent-net.rules'
| OpenSSH_9.2p1 Debian-2+deb12u3, OpenSSL 3.0.14 4 Jun 2024
| debug1: Reading configuration data /var/lib/jenkins/.ssh/config
| debug1: /var/lib/jenkins/.ssh/config line 7: Applying options for 192.168.*
| debug1: Reading configuration data /etc/ssh/ssh_config
| debug1: /etc/ssh/ssh_config line 50: Applying options for *
| debug1: /etc/ssh/ssh_config line 57: Deprecated option "useroaming"
| debug1: Connecting to 192.168.210.101 [192.168.210.101] port 22.
| debug1: fd 3 clearing O_NONBLOCK
| debug1: Connection established.
| debug1: identity file /var/lib/jenkins/.ssh/id_rsa_sipwise type 0
| debug1: identity file /var/lib/jenkins/.ssh/id_rsa_sipwise-cert type -1
| debug1: identity file /var/lib/jenkins/.ssh/id_rsa type 0
| debug1: identity file /var/lib/jenkins/.ssh/id_rsa-cert type -1
| debug1: identity file /var/lib/jenkins/.ssh/id_dsa type -1
| debug1: identity file /var/lib/jenkins/.ssh/id_dsa-cert type -1
| debug1: Local version string SSH-2.0-OpenSSH_9.2p1 Debian-2+deb12u3
| debug1: kex_exchange_identification: banner line 0: Not allowed at this time

The `Not allowed at this time` pointed to a new OpenSSH feature, which
triggered the regression for us.
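
A minimal sketch of how a job could tell this specific failure apart from
other SSH problems, based only on the banner text in the debug output above.
The helper name `is_sshd_penalty_banner` is hypothetical and not part of our
Jenkins jobs or of the ssh_wrapper function:

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration): returns 0 if the
# given `ssh -v` output contains the banner line seen in the log above.
is_sshd_penalty_banner() {
  printf '%s\n' "$1" | grep -q 'banner line [0-9]*: Not allowed at this time'
}

# Usage sketch with the line taken from the Jenkins job output
ssh_output='debug1: kex_exchange_identification: banner line 0: Not allowed at this time'
if is_sshd_penalty_banner "$ssh_output"; then
  echo "connection rejected by sshd penalty"
else
  echo "some other SSH failure"
fi
```

Matching on the banner text is fragile by nature, so this is meant as a
debugging aid rather than something to build job logic on.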

OpenSSH introduced options to penalize undesirable behavior, see
https://undeadly.org/cgi?action=article;sid=20240607042157 and
https://www.openssh.com/releasenotes.html#9.9p1 and
https://sources.debian.org/src/openssh/1:9.9p1-1/sshd.c/?hl=576#L573
As of openssh-server v1:9.9p1-1, this has also been present in
Debian/trixie since the end of September 2024.

Now, when too many SSH logins fail, a client system may no longer be
able to connect via SSH due to this new penalty behavior. And indeed,
within our Jenkins job "daily-build-install-vm" we try to collect
several log files through our grab_log and SSH wrapper:

| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /mnt/tmp/ngcp-installer-debug.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/ngcp-installer-debug.log
| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/ngcp-installer-debug.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/ngcp-installer-debug.log
| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /mnt/tmp/ngcp-installer.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/ngcp-installer.log
| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/ngcp-installer.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/ngcp-installer.log
| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /tmp/ngcp-installer-cmdline.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/ngcp-installer-cmdline.log
| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /mnt/var/log/deployment.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/deployment.log
| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/deployment.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/deployment.log
| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /mnt/var/log/grml-debootstrap.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/grml-debootstrap.log
| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/grml-debootstrap.log /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/grml-debootstrap.log
| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/syslog /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/syslog
| + timeout 20 sipwise-ssh-copier 192.168.210.101 root sipwise /var/log/boot /buildtmpfs/tmp_jenkins-vm-builder/vmbuilder101/192.168.210.101/boot

We even execute this grab_log wrapper twice: once for the running Grml
live system, and once after we booted into the actually deployed system.
This works fine for the Grml live system, but as password-based root
logins haven't been allowed by default in OpenSSH for quite some time,
all the sipwise-ssh-copier runs with user/password against a plain
Debian system then fail.

As a consequence, we lock ourselves out of the system with all those SSH
login failures, and the Jenkins job proxmox-vm-clean-fs then runs into
the OpenSSH penalty, which causes the trixie/debian job to fail.

We use our Debian images as a base for further configuration, where we
control the sshd_config file through our ngcpcfg system anyway, so the
`PerSourcePenalties no` setting is supposed to disappear then.

FTR: We could also enable `PermitRootLogin yes` in sshd_config to get
the grab_log working, though this hasn't had any relevance for us so
far. Disabling only the `PerSourcePenalties` feature feels like the
better trade-off, at least security-wise, for now.
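
For illustration, a minimal sketch of how the setting could be appended so
that repeated runs don't add it twice. This is not the code of this commit
(which uses plain `echo ... >>`); the function name
`disable_per_source_penalties` and the demo directory are assumptions. It
also assumes that no earlier `PerSourcePenalties` line exists in the file,
as sshd uses the first value it finds for a keyword:

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration): append
# 'PerSourcePenalties no' to the sshd_config below the given target root,
# unless that exact setting is already present.
disable_per_source_penalties() {
  target="$1"
  config="${target}/etc/ssh/sshd_config"

  # sshd_config keywords are case-insensitive, hence grep -i
  if grep -qiE '^[[:space:]]*PerSourcePenalties[[:space:]]+no[[:space:]]*$' "$config" 2>/dev/null; then
    return 0
  fi

  printf '%s\n' '# added by deployment.sh' 'PerSourcePenalties no' >> "$config"
}

# Usage sketch against a throwaway directory instead of a real ${TARGET}
demo_root="$(mktemp -d)"
mkdir -p "${demo_root}/etc/ssh"
disable_per_source_penalties "${demo_root}"
disable_per_source_penalties "${demo_root}"   # second call changes nothing
cat "${demo_root}/etc/ssh/sshd_config"
rm -rf "${demo_root}"
```

In deployment.sh the call would be along the lines of
`disable_per_source_penalties "${TARGET}"`, inside the trixie case branch.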

Change-Id: Ibf16019b4787cc63d450501c8bccebeac77dd9f1
mr13.1.1
Michael Prokop 7 months ago
parent 4debc55f6b
commit 6eee97de7b

@@ -2216,6 +2216,15 @@ case "${DEBIAN_RELEASE}" in
    ;;
esac

# MT#61265 avoid "penalty: failed authentication" in automated SSH/SCP actions in Jenkins jobs
case "${DEBIAN_RELEASE}" in
  trixie)
    echo "Disabling PerSourcePenalties in /etc/ssh/sshd_config for Debian release '${DEBIAN_RELEASE}'"
    echo '# added by deployment.sh' >> "${TARGET}"/etc/ssh/sshd_config
    echo 'PerSourcePenalties no' >> "${TARGET}"/etc/ssh/sshd_config
    ;;
esac

# MT#7805
if "$NGCP_INSTALLER" ; then
  cat << EOT | augtool --root="$TARGET"
