Follow-up fix for commit 5e6c5363a1:
grep interprets the provided argument `-backports` as an option. We need to
mark the end of the grep options accordingly via "--", to avoid failures
during grep execution.
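Here is a minimal sketch of the fix. The helper name `has_backports` is hypothetical and not taken from the script. "--" ends option parsing, so grep treats the following dash-prefixed word as a pattern:

```shell
# Hypothetical helper: succeeds if the given apt sources file mentions a
# *-backports suite. Without "--", grep would try to parse "-backports" as a
# bundle of short options and fail with an invalid-option error.
has_backports() {
  local sources_file="$1"
  grep -q -- '-backports' "${sources_file}"
}
```

`grep -q -e '-backports'` would work as well, since -e explicitly marks the next argument as the pattern.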
Change-Id: I6f0360a34583b2d0e961d282be16b3f6e90445a2
On recent EC2 bookworm AMIs (as observed with our mr13.2.1 EC2 build
with AMI ID ami-09ca7561204a0d1d4), the bookworm-backports repository is
enabled by default:
| admin@ip-10-0-0-204:~$ cat /etc/apt/sources.list.d/debian.sources
| Types: deb deb-src
| URIs: mirror+file:///etc/apt/mirrors/debian.list
| Suites: bookworm bookworm-updates bookworm-backports
| Components: main
| Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg
|
| Types: deb deb-src
| URIs: mirror+file:///etc/apt/mirrors/debian-security.list
| Suites: bookworm-security
| Components: main
| Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg
Our own valkey-server + valkey-tools backport has the identical *version*
number, though with a *higher* apt pin priority:
| admin@ip-10-0-0-204:~$ apt-cache policy valkey-server valkey-tools
| valkey-server:
| Installed: 8.0.1+dfsg1-1~bpo12+1
| Candidate: 8.0.1+dfsg1-1~bpo12+1
| Version table:
| *** 8.0.1+dfsg1-1~bpo12+1 100
| 100 mirror+file:/etc/apt/mirrors/debian.list bookworm-backports/main amd64 Packages
| 100 /var/lib/dpkg/status
| 8.0.1+dfsg1-1~bpo12+1 990
| 990 https://deb.sipwise.com/spce/mr13.2.1 bookworm/main amd64 Packages
| valkey-tools:
| Installed: 8.0.1+dfsg1-1~bpo12+1
| Candidate: 8.0.1+dfsg1-1~bpo12+1
| Version table:
| *** 8.0.1+dfsg1-1~bpo12+1 100
| 100 mirror+file:/etc/apt/mirrors/debian.list bookworm-backports/main amd64 Packages
| 100 /var/lib/dpkg/status
| 8.0.1+dfsg1-1~bpo12+1 990
| 990 https://deb.sipwise.com/spce/mr13.2.1 bookworm/main amd64 Packages
This therefore causes the installer to fail with:
| Making sure all packages are up to date ...
| Reading package lists...
| Building dependency tree...
| Reading state information...
| Calculating upgrade...
| The following packages will be DOWNGRADED:
| valkey-server valkey-tools
| 0 upgraded, 0 newly installed, 2 downgraded, 0 to remove and 0 not upgraded.
| E: Packages were downgraded and -y was used without --allow-downgrades.
| ABORTED: Error upgrading packages
We need to make sure that such a backports repository isn't enabled by
default.
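This message doesn't say how the repository gets disabled. One possible approach, assuming the deb822 layout quoted above, is to strip the backports suite from the `Suites:` line with GNU sed. The function name `disable_backports` is hypothetical, and an `apt-get update` would be needed afterwards for the change to take effect:

```shell
# Sketch only: drop every "<suite>-backports" entry from the "Suites:" lines
# of a deb822 sources file in place (GNU sed is needed for -i and -E).
# Note: a stanza whose *only* suite is a backports one would be left with an
# empty "Suites:" field and would need to be removed instead.
disable_backports() {
  local sources_file="${1:-/etc/apt/sources.list.d/debian.sources}"
  sed -i -E '/^Suites:/ s/[[:space:]]+[^[:space:]]+-backports//g' "${sources_file}"
}
```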
Change-Id: I3fbba0b897479109ba44df43bafd7fff89647a25
Usage of deb.debian.org for our EC2 instances was introduced in commit
b05a298 back in 2017, when cdn-aws.deb.debian.org often failed with HTTP 503
errors. This is no longer true, and cdn-aws.deb.debian.org seems to
work reliably.
Furthermore, nowadays the Debian bookworm AMIs use the deb822 format:
| $ cat /etc/apt/sources.list.d/debian.sources
| Types: deb deb-src
| URIs: mirror+file:///etc/apt/mirrors/debian.list
| Suites: bookworm bookworm-updates bookworm-backports
| Components: main
|
| Types: deb deb-src
| URIs: mirror+file:///etc/apt/mirrors/debian-security.list
| Suites: bookworm-security
| Components: main
|
| $ cat /etc/apt/mirrors/debian.list
| https://cdn-aws.deb.debian.org/debian
|
| $ cat /etc/apt/mirrors/debian-security.list
| https://cdn-aws.deb.debian.org/debian-security
|
| $ cat /etc/apt/sources.list
| # See /etc/apt/sources.list.d/debian.sources
Given that /etc/apt/sources.list consists only of comments and that
cdn-aws.deb.debian.org works as expected, let's drop this NOOP command.
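The "NOOP" reasoning can be expressed as a check. The following is a hypothetical sketch, not code from the script: it succeeds if sources.list contains nothing but comments and blank lines, in which case rewriting its mirror entries changes nothing:

```shell
# Hypothetical check: succeeds if the file has no entries apart from
# comments and blank lines, i.e. a sed rewrite of mirrors would be a no-op.
only_comments() {
  local file="${1:-/etc/apt/sources.list}"
  [ -e "${file}" ] || return 0   # a missing file has no entries either
  ! grep -qvE '^[[:space:]]*(#|$)' "${file}"
}
```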
Change-Id: If56d9d2a030db52e805286a0115f6e6e561ac6bf
The variable "${vmversion}" is not available within this script; it stems
from a copy/paste bug introduced in commit 8c26300a.
This fixes a build failure for all NGCP releases that don't provide status
information on port 4242 (which is available only with mr7.5 and newer) and
instead need to be checked via /etc/sipwise_ngcp_version.
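Here is a sketch of the /etc/sipwise_ngcp_version fallback. It is based on the file content quoted in the mr6.5.8 commit message in this log, and the function name `ngcp_configured` is hypothetical. On the build host it would be run on the instance, e.g. via ssh:

```shell
# Sketch of the check for releases without the port 4242 status service:
# succeed once the version file reports that the system has been configured.
ngcp_configured() {
  local version_file="${1:-/etc/sipwise_ngcp_version}"
  grep -q '^System configured\.' "${version_file}"
}
```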
Change-Id: Ic2b233a69737c5424e30d6971ed642c19b3adac7
This reverts commit 81fc3d2433.
This doesn't fix our issue, so revert this unnecessary change,
which might actually slow down SSH connections.
Change-Id: Ib6b2fdb81ce471d457788239468fb53ab75fbe1e
We have failing builds for mr6.5.8, and it is still unclear why we're running
into the timeout of the "Waiting when system is configured" check, because
the system gets configured fine:
| admin@ip-10-0-0-15:~$ cat /etc/sipwise_ngcp_version
| System installed. NGCP version mr6.5.8 on 2020-03-17 09:18:36
| System configured. NGCP version mr6.5.8 on 2020-03-17 09:26:18
Looking at /var/log/ngcp-installer*log, which we grab after running into
the check timeout, we don't see any SSH connection attempts between the
invocation of ngcp-initial-configuration and the timeout.
To avoid having an SSH connection running with ControlMaster, let's
see whether explicitly disabling it changes the behavior.
Change-Id: I7af5e7113b5c031acc5cdf78c9904dabf00e9ada
We need to reboot the system before the configuration, as the kernel might
have been upgraded. If that happens, ngcp-initial-configuration tries to
start ngcp-license-client.service and fails, as there is no module for
the currently running kernel.
Use a wrapper for ssh commands to reduce the length of the lines.
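A wrapper of this kind might look like the following sketch. `ssh_cmd`, `SSH_KEY` and `SSH_USER` are hypothetical names, and the key path is a placeholder. The options are the ones that appear in the ssh invocations quoted elsewhere in this log:

```shell
# Hypothetical defaults; the real script's variable names may differ.
SSH_KEY="${SSH_KEY:-/path/to/ec2-key.pem}"
SSH_USER="${SSH_USER:-admin}"

# Run a command on the given host with the options every call needs,
# so that call sites stay short.
ssh_cmd() {
  local host="$1"
  shift
  ssh -o StrictHostKeyChecking=no \
      -o UserKnownHostsFile=/dev/null \
      -i "${SSH_KEY}" \
      "${SSH_USER}@${host}" "$@"
}
```

A call site then shrinks to something like `ssh_cmd "${HOSTNAME}" sudo reboot`.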
Change-Id: I68042891aa6193abe44baf873c273d068c3191f6
ec2-api-tools have been deprecated for quite some time, and they don't
support Java v11 (as present on Debian/buster). So let's (finally!)
port our code from ec2-api-tools to awscli.
awscli gives us the option to use JSON output. This makes it possible
to parse it properly, instead of relying on the position of some key
within some section.
While at it, update the defaults for BASE_AMI and INSTANCE_TYPE to
current sane settings, update the coding style to match current best
practices (like "${FOOBAR}" instead of "$FOOBAR") and fix shellcheck
issues.
Nowadays there are also several new AWS regions (namely ca-central-1,
ap-east-1, ap-northeast-2, ap-south-1, eu-north-1, eu-west-2, eu-west-3,
me-south-1 + us-east-2) which we could enable but don't *yet*.
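The JSON-based parsing can be sketched with awscli's standard `--query` (a JMESPath expression evaluated against the JSON response) and `--output text` options. The function names are hypothetical. This message does not say whether the script uses `--query` or an external JSON parser such as jq:

```shell
# Sketch: select fields by name from the JSON response instead of
# picking columns out of tabular ec2-api-tools output.
instance_state() {
  local region="$1" instance_id="$2"
  aws ec2 describe-instances --region "${region}" --instance-ids "${instance_id}" \
    --query 'Reservations[0].Instances[0].State.Name' --output text
}

instance_public_dns() {
  local region="$1" instance_id="$2"
  aws ec2 describe-instances --region "${region}" --instance-ids "${instance_id}" \
    --query 'Reservations[0].Instances[0].PublicDnsName' --output text
}
```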
Change-Id: I147c8a6c2ae1fca4e680df8d2fde170b2f33f856
While debugging our AMI issues, it turned out that it would be useful to be
able to provide a custom ngcp-installer.deb during ec2-ami-ce execution.
Provide corresponding support via `--installer-url URL`.
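The option handling could be sketched as follows. `parse_args` and `INSTALLER_URL` are hypothetical names, and the real script accepts more options than this:

```shell
# Hypothetical parser for "--installer-url URL". It refuses a missing
# argument instead of looping forever on a failed "shift 2".
parse_args() {
  INSTALLER_URL=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --installer-url)
        if [ "$#" -lt 2 ]; then
          echo "ERROR: --installer-url requires a URL argument" >&2
          return 1
        fi
        INSTALLER_URL="$2"
        shift 2
        ;;
      *)
        echo "ERROR: unknown option: $1" >&2
        return 1
        ;;
    esac
  done
}
```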
Change-Id: Icf9b6b98dcb956469004a2fe182b9a10ba2ccdec
Instead of hardcoding the IP address that the network interface had
at the time ngcp-initial-configuration was invoked, let's use DHCP.
This should prevent the system from assigning and using an IP address
that is no longer present once it has changed from within the AWS EC2
instance.
Change-Id: I298a2ac1a08f57e45de51c46c02d8832d2db740a
For mr6.5.1+ it is necessary to modify config_deploy.inc and then run
the installer and the configuration.
For <mr6.5.1 it is necessary to run the installer with the environment
variable FORCE=yes.
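Choosing between the two procedures needs a version comparison. The following sketch relies on GNU sort's `-V` ordering, which sorts "mr6.5.1" before "mr6.5.10" and "mr13.2.1". The function names and the returned mode strings are hypothetical:

```shell
# Sketch: succeed if NGCP version $1 is >= version $2 (e.g. "mr6.5.8" vs
# "mr6.5.1"), using GNU sort's version ordering.
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical dispatcher: report which procedure applies to a release.
install_mode() {
  if version_ge "$1" mr6.5.1; then
    echo "config_deploy"   # modify config_deploy.inc, then installer + configuration
  else
    echo "force"           # run the installer with FORCE=yes
  fi
}
```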
Change-Id: Ib403f6686bb77f3c3465187bab01609358869237
At the moment the following error happens:
> + ec2-associate-address --region eu-west-1 --allocation-id eipalloc-0a96e07d4355e12fb --instance i-0895e8ae5e15da659
> Client.Resource.AlreadyAssociated: resource eipalloc-0a96e07d4355e12fb is already associated with associate-id eipassoc-02667670aacf004de (Service: AmazonEC2; Status Code: 400; Error Code: Resource.AlreadyAssociated; Request ID: 4f646755-90eb-49bc-9100-2603fa5eb3f9)
This in turn caused the hostname resolution to fail:
> ++ ec2-describe-instances --region eu-west-1 --filter instance-id=i-0895e8ae5e15da659 | awk '/INSTANCE/ {print $4}'
> + HOSTNAME=ip-10-0-0-109.eu-west-1.compute.internal
> ...
> + ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /var/lib/jenkins/.ssh/jenkins-ngcp-create-ami.pem admin@ip-10-0-0-109.eu-west-1.compute.internal 'grep -q '\''Installation finished. Thanks for choosing NGCP'\'' /var/log/ngcp-installer-debug.log'
> ssh: Could not resolve hostname ip-10-0-0-109.eu-west-1.compute.internal: Name or service not known
We need to ensure that the public IP gets reassociated from the old Amazon
machine to the newly started Amazon VM, by using the --allow-reassociation option:
> --allow-reassociation
> Allows an Elastic IP address that is already associated with an
> instance or a network interface to be re-associated with the
> specified instance or network interface. Otherwise, the operation
> fails.
Change-Id: I5b813c7de313743e6ad8a44abb57c3c40216d067
AWS instances usually move from state 'stopped' to 'pending' and then
to 'running'. If we try to assign an elastic IP address to an
instance that's still in state 'pending', it fails with:
| Client.InvalidInstanceID: The pending instance 'i-0a6a707e93073a3a0' is not in a valid state for this operation. (Service: AmazonEC2; Status Code: 400; Error Code: InvalidInstanceID; Request ID: 0ac151fc-5d89-45c1-b2da-9b3bd677c7f2)
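With awscli, which this script later switched to, waiting for the 'running' state before associating the address can be sketched with the built-in `aws ec2 wait instance-running` waiter. The same sketch also passes `--allow-reassociation`. The function name is hypothetical:

```shell
# Sketch: block until the instance has left 'pending', then attach the
# Elastic IP, taking it over from any previously associated instance.
associate_eip_when_running() {
  local region="$1" instance_id="$2" allocation_id="$3"
  # The waiter polls until the instance is running and fails if it gives up.
  aws ec2 wait instance-running --region "${region}" --instance-ids "${instance_id}" || return 1
  aws ec2 associate-address --region "${region}" --instance-id "${instance_id}" \
    --allocation-id "${allocation_id}" --allow-reassociation
}
```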
Change-Id: I4c62da9c7ba5b2e65e0324abb2260c9f268b3865
If the IP is assigned later, the IP field in the ec2-describe-instances
output is not stable: sometimes it shows the automatically assigned
one (for instance ip-10-101-136-154.eu-west-1.compute.internal, with the
status in field 6), and sometimes it's just empty (with the status in
field 5). So it looks like assigning the IP at an earlier stage (as it
was before) makes sense and keeps the ec2-describe-instances output
stable under all circumstances.
So revert the change here.
This reverts commit 97b7d218e7.
Change-Id: I26bc165ddfe7c93d9bbaee0ad80811f77af89d6a
In commit 24ee66dd5a we moved the IP assignment after the status check,
which calls ec2-describe-instances and checks the 6th column.
But in that case (without an IP) the status is in the 5th column, so change
the column number accordingly.
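As an alternative to switching between column 5 and column 6 (this is not what these commits did), the state could be located by value rather than by position. With awk's default field splitting, runs of whitespace collapse, so an empty field shifts the following fields to the left. That is presumably why the status moves from field 6 to field 5. The sample lines in the test are made up, not real ec2-describe-instances output:

```shell
# Sketch: scan the INSTANCE line for the first field that is a known EC2
# instance state name, independent of how many fields precede it.
instance_state_from_output() {
  awk '/^INSTANCE/ {
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^(pending|running|shutting-down|terminated|stopping|stopped)$/) {
        print $i
        exit
      }
    }
  }'
}
```

Usage would be along the lines of `ec2-describe-instances ... | instance_state_from_output`.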
Change-Id: I6e528ba9dd0def88d588cf8dd7b31a0996098096
Sometimes it takes a long time for the instance to reach the running state,
so ec2-associate-address failed with an InvalidInstanceID error.
Change-Id: I9481a8b46e5e0ce061a31278f9c0338d53ca7294
Using the current hardware generation requires a VPC.
Support the options --allocation-id + --subnet to specify
the corresponding allocation + subnet IDs for usage with a VPC.
While at it, update the default base-ami to current Debian/stretch
and set the default instance-type to t2.medium.
Change-Id: I760889b938ee8ec62d82c15211a297dbbabc9bd6
This is supposed to fix the hanging SSH connection when
triggering the reboot:
| jenkins@jenkins-slave7:~$ ssh -o ServerAliveInterval=5 -o ServerAliveCountMax=1 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i [...] sudo reboot -f
| [...]
| Rebooting.
| Timeout, server ec2-[...].eu-west-1.compute.amazonaws.com not responding.
Without ServerAliveInterval + ServerAliveCountMax options
the session might get stuck at "Rebooting", timing out only
after 2 hours at worst (which is the TCP keepalive default
timeout, see /proc/sys/net/ipv4/tcp_keepalive_time).
Also drop the force option of reboot, which causes trouble
on sysv systems: the ssh process isn't terminated cleanly and
instead runs into the TCP keepalive timeout.
It also seems that our systemd backport with the async reboot/halt/poweroff
patch works fine without it, while with the "-f" option it fails similarly
to the sysv situation.
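The reboot step described above can be sketched as follows: reboot without "-f", use keepalive options so that the session is torn down quickly, then poll until SSH answers again. The function name, the retry defaults and the omission of the key/host-key options are assumptions for brevity:

```shell
# Sketch: trigger the reboot, then wait for the host to accept SSH again.
# Caveat: the first poll might still reach the old system if it is slow to
# go down; a stricter check could compare the boot ID before and after.
reboot_and_wait() {
  local target="$1" retries="${2:-30}" delay="${3:-10}"
  # The connection is expected to drop, so ignore the exit code here.
  ssh -o ServerAliveInterval=5 -o ServerAliveCountMax=1 "${target}" sudo reboot || true
  while [ "${retries}" -gt 0 ]; do
    sleep "${delay}"
    if ssh -o ConnectTimeout=5 "${target}" true; then
      return 0
    fi
    retries=$((retries - 1))
  done
  echo "ERROR: ${target} did not come back after reboot" >&2
  return 1
}
```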
Change-Id: I3c2c36234d6282f96073ef558fe287d0ac3fa192
When triggering 'sudo reboot' it fails with:
| System has not been booted with systemd as init system (PID 1). Can't operate.
We need to reboot the system though, to get a
functional system we can run tests against.
Quoting halt(8):
| -f, --force
|
| Force immediate halt, power-off, or reboot. When
| specified once, this results in an immediate but clean
| shutdown by the system manager. When specified twice,
| this results in an immediate shutdown without
| contacting the system manager. See the description of
| --force in systemctl(1) for more details.
Quoting systemctl(1):
| -f, --force
|
| [...]
| When used with halt, poweroff, reboot or kexec,
| execute the selected operation without shutting down
| all units. However, all processes will be killed
| forcibly and all file systems are unmounted or
| remounted read-only. This is hence a drastic but
| relatively safe option to request an immediate reboot.
Change-Id: Ic3dbadb81fe91540ee4d53d48bdb9e7d8bb6668c
At the end of October we had many running instances because of aborted
ec2-ami-ce Jenkins jobs. To ensure this doesn't happen again, let's
check for already running instances within ec2-create-ce and, if we
find any, abort before creating new ones.
While this prevents us from having several of them running at the
same time (within the same region and using the same tag!), this
shouldn't be a problem for us for the time being.
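This is a sketch of such a guard, expressed with awscli, which the script switched to later. It assumes that the instances are tagged with the key "Name". The actual tag key and the function names are not given in this message:

```shell
# Sketch: list pending/running instances carrying the given tag value.
running_instances() {
  local region="$1" tag_value="$2"
  aws ec2 describe-instances --region "${region}" \
    --filters "Name=tag:Name,Values=${tag_value}" \
              "Name=instance-state-name,Values=pending,running" \
    --query 'Reservations[].Instances[].InstanceId' --output text
}

# Abort (return non-zero) if any such instance exists or the query fails.
abort_if_running() {
  local ids
  ids="$(running_instances "$@")" || return 1
  if [ -n "${ids}" ]; then
    echo "ERROR: instance(s) already running: ${ids}" >&2
    return 1
  fi
}
```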
Change-Id: I5707f611c6c8a08664a1dbbef08332c36dd2150f
The stretch-based Amazon image uses cdn-aws.deb.debian.org as Debian mirror,
which fails very often with HTTP 503 errors. The previous jessie-based
Amazon images used a different mirror, cloudfront.debian.net, which
didn't fail, even though both terminate via/on cloudfront.net.
Let's use deb.debian.org instead.
Change-Id: I100a79e2e6fe4c1766b93586166aa29d1b48d543
Add an @reboot crontab entry before rebooting, and remove it again once
the installation has finished after the reboot.
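Adding and later removing such an entry can be sketched with plain `crontab -l` / `crontab -`. The job command passed in is hypothetical, and removal only drops the exact line that was added:

```shell
# Sketch: append an "@reboot <command>" entry to the current user's crontab.
add_reboot_job() {
  local job="@reboot $1"
  { crontab -l 2>/dev/null; echo "${job}"; } | crontab -
}

# Remove exactly that entry again, keeping all other crontab lines.
remove_reboot_job() {
  local job="@reboot $1"
  crontab -l 2>/dev/null | grep -vxF -- "${job}" | crontab -
}
```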
Change-Id: I76c0111cc2f0b76f6f8ff905b69816138ce14281
Otherwise it might fail with:
| $ cat /var/log/cloud-init-output.log
| [...]
| Package sysvinit-core is not available, but is referred to by another package.
| This may mean that the package is missing, has been obsoleted, or
| is only available from another source
| However the following packages replace it:
| systemd-sysv
|
| E: Package 'sysvinit-core' has no installation candidate
While at it, switch from apt-get to apt.
Change-Id: I6654a1a09fea2a963ca4fc7da26fe1d343462a07
Otherwise we're failing at:
| ABORTED: Error while installing sudo coreutils ssh mawk debsums libtemplate-perl sed pwgen ngcp-system-tools-ce sysvinit-core systemd-shim- systemd- libpam-systemd- cgmanager-ci-info
due to:
| systemd is the active init system, please switch to another before removing systemd.
While at it, fix shellcheck related warnings so that the change
passes gerrit review.
Change-Id: Ice0b8760c502c5b6d2acff2fe0990e03717f6f5c
For quite some time nginx has been listening on (a) specific IP
address(es) instead of just listening on all interfaces (see
previous commit a225c3a15f3e6d0 AKA Change-Id: I42257d8b2a1eedf7433158477983cfc5f9c97315).
Because we didn't error out if SSH-ing failed or port 1443 wasn't
in LISTEN state, the script just kept checking until it ran into the
retry limit and then continued without failing. Instead, let's
report this error to the Jenkins job/user by failing the run if
either SSH doesn't work or port 1443 isn't in LISTEN state.
While at it, move the "retry" variable assignment in front
of the reporting, so we don't end with "1 retries left",
and also get rid of shellcheck's `$/${} is unnecessary on arithmetic
variables` warning in the subshell (the $retry usage).
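A minimal sketch of this fail-instead-of-continue retry logic follows. The counter is decremented before anything is reported, and the arithmetic has no `$` inside `$(( ))`. `retry_until` is a hypothetical helper, not the script's actual code:

```shell
# Run "$@" until it succeeds or the retry budget is used up.
# Returns non-zero (instead of silently continuing) when all attempts fail.
retry_until() {
  local retries="$1" delay="$2"
  shift 2
  while true; do
    if "$@"; then
      return 0
    fi
    retries=$((retries - 1))
    if [ "${retries}" -le 0 ]; then
      echo "ERROR: '$*' still failing, giving up" >&2
      return 1
    fi
    echo "'$*' failed, ${retries} retries left" >&2
    sleep "${delay}"
  done
}
```

For example, `retry_until 30 10 ssh "${HOST}" "ss -ltn | grep -q ':1443 '"` would check port 1443 every 10 seconds. Both the check command and `HOST` are illustrative.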
Change-Id: Ic7a4a39e460c4c81866c12b4b8b90c69f213538f
If, e.g., the ngcp installation fails, the ec2-ami-ce Jenkins job
doesn't generate the ec2_report.txt file, and then ec2-ami-stop
doesn't get the information about which instance ID should be
stopped, leaving unnecessary instances running.
Example run available at
https://jenkins.mgm.sipwise.com/job/ec2-ami-ce/85/consoleFull
with its downstream job
https://jenkins.mgm.sipwise.com/job/ec2-ami-stop/70/console ->
| 12:11:35 Copied 0 artifacts from "ec2-ami-ce" build number 85
| 12:11:35 ERROR: Failed to copy artifacts from ec2-ami-ce with filter: ec2_report.txt
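One way to guarantee the artifact is to write it from an EXIT trap, so that it is produced on failure paths too. This is a sketch under assumptions: the file name matches the artifact above, but the `INSTANCE_ID=...` format and the variable names are hypothetical:

```shell
# Sketch: (re)write the report whenever the script exits, as long as an
# instance ID is already known, so the downstream stop job can find it.
REPORT_FILE="${REPORT_FILE:-ec2_report.txt}"
INSTANCE_ID=""

write_report() {
  if [ -n "${INSTANCE_ID}" ]; then
    printf 'INSTANCE_ID=%s\n' "${INSTANCE_ID}" > "${REPORT_FILE}"
  fi
}
```

The script would then run `trap write_report EXIT` early on and set `INSTANCE_ID` right after the instance has been created.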
Change-Id: I3a6a47c3fd2e19ef5a2e724451e8221bab4a8817
The ngcp-installer failed with the following error:
> 2016-01-29 11:43:21: Executing sync-db:
> insert new rtp_interface ext=ext from config into db
> fatal: $HOME not set
> fatal: $HOME not set
This might be related to the new Jessie base AMI image we switched to;
the old Wheezy AMI image probably had HOME defined.
We are not sure about this, though.
Change-Id: I3ab7dbb59f6081fb680c8447c18d29041a2f6563
We have added one more question for CE users in mr4.2:
> Do you want to proceed with replacing 'systemd' with 'sysv'? (y/N)
This blocks ngcp-installer during AMI image creation.
Change-Id: Ib56dd8c7048837e01f27a8b39026b275ce4bf7f8
It fails if the AMIs are pending:
| Client.InvalidAMIID.Unavailable: The AMI ID '...' is currently pending and may not be used for this operation
and we don't want to block (yet).
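To skip rather than block on pending images, the AMI state could be checked first. The following sketch uses awscli's `describe-images` with `--query`, while `aws ec2 wait image-available` would be the blocking alternative. The function names are hypothetical:

```shell
# Sketch: report the AMI state ("pending", "available", ...).
ami_state() {
  local region="$1" image_id="$2"
  aws ec2 describe-images --region "${region}" --image-ids "${image_id}" \
    --query 'Images[0].State' --output text
}

# Succeed only if the AMI can be used right now, without waiting.
ami_available() {
  [ "$(ami_state "$@")" = "available" ]
}
```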