TT#118659 Fix re-deploying over existing SW-RAID arrays

Fresh deployments with SW-RAID (Software-RAID) might fail if the disks
present in the system were already part of an SW-RAID setup:

| Error: disk nvme1n1 seems to be part of an existing SW-RAID setup.

We could also reproduce this inside PVE VMs:

| mdadm: /dev/md/127 has been started with 2 drives.
| Error: disk sda seems to be part of an existing SW-RAID setup.

This is caused by the following behavior:

| + SWRAID_DEVICE="/dev/md0"
| [...]
| + mdadm --assemble --scan
| + true
| + [[ -b /dev/md0 ]]
| + for disk in "${SWRAID_DISK1}" "${SWRAID_DISK2}"
| + grep -q nvme1n1 /proc/mdstat
| + die 'Error: disk nvme1n1 seems to be part of an existing SW-RAID setup.'
| + echo 'Error: disk nvme1n1 seems to be part of an existing SW-RAID setup.'
| Error: disk nvme1n1 seems to be part of an existing SW-RAID setup.

By default we expect and set the SWRAID_DEVICE to be /dev/md0. But only
"local" arrays get assembled as /dev/md0 and upwards, whereas "foreign"
arrays start at md127 downwards. This is exactly what we get when
booting our deployment live system on top of an existing installation,
and assembling existing SW-RAIDs (to not overwrite unexpected disks by
mistake):

| root@grml ~ # lsblk
| NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
| loop0         7:0    0 428.8M  1 loop  /usr/lib/live/mount/rootfs/ngcp.squashfs
|                                        /run/live/rootfs/ngcp.squashfs
| nvme0n1     259:0    0 447.1G  0 disk
| └─md127       9:127  0 447.1G  0 raid1
|   ├─md127p1 259:14   0    18G  0 part
|   ├─md127p2 259:15   0    18G  0 part
|   ├─md127p3 259:16   0 405.6G  0 part
|   ├─md127p4 259:17   0   512M  0 part
|   ├─md127p5 259:18   0     4G  0 part
|   └─md127p6 259:19   0     1G  0 part
| nvme1n1     259:7    0 447.1G  0 disk
| └─md127       9:127  0 447.1G  0 raid1
|   ├─md127p1 259:14   0    18G  0 part
|   ├─md127p2 259:15   0    18G  0 part
|   ├─md127p3 259:16   0 405.6G  0 part
|   ├─md127p4 259:17   0   512M  0 part
|   ├─md127p5 259:18   0     4G  0 part
|   └─md127p6 259:19   0     1G  0 part
|
| root@grml ~ # lsblk -l -n -o TYPE,NAME
| loop  loop0
| raid1 md127
| disk  nvme0n1
| disk  nvme1n1
| part  md127p1
| part  md127p2
| part  md127p3
| part  md127p4
| part  md127p5
| part  md127p6
|
| root@grml ~ # cat /proc/cmdline
| vmlinuz initrd=initrd.img swraiddestroy swraiddisk2=nvme0n1 swraiddisk1=nvme1n1 [...]
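
To detect such a foreign array from the live system, the fix queries lsblk
for assembled RAID devices. A minimal sketch (same lsblk invocation and
raid_device variable as in the diff below; the surrounding if/echo is just
illustrative):

| # pick up the first assembled RAID device, e.g. "md127":
| raid_device=$(lsblk --list --noheadings --output TYPE,NAME | awk '/^raid/ {print $2}' | head -1)
| if [[ -n "${raid_device:-}" ]] ; then
|   echo "NOTE: found already assembled RAID array /dev/${raid_device}"
| fi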

Let's identify existing RAID devices and check their configuration by
going through their member disks and comparing them with our SWRAID_DISK1
and SWRAID_DISK2. If they don't match, we stop execution to prevent any
possible data damage.
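
A condensed sketch of that check (taken from the diff below, with the
second die argument shortened; `die` is the deployment script's existing
error helper):

| raid_disks=$(lsblk -l -n -s /dev/"${raid_device}" | grep -vw "^${raid_device}" | awk '{print $1}')
| for d in ${raid_disks} ; do
|   if ! printf "%s\n" "$d" | grep -qE "(${SWRAID_DISK1}|${SWRAID_DISK2})" ; then
|     die "Error: unexpected disk in RAID array /dev/${raid_device}: $d"
|   fi
| done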

Furthermore, we need to assemble the mdadm array without relying on a
possibly existing local `/etc/mdadm/mdadm.conf` configuration file.
Otherwise assembling might fail:

| root@grml ~ # cat /proc/mdstat
| Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
| unused devices: <none>
| root@grml ~ # lsblk -l -n -o TYPE,NAME | awk '/^raid/ {print $2}'
| root@grml ~ # grep ARRAY /etc/mdadm/mdadm.conf
| ARRAY /dev/md/127  metadata=1.0 UUID=0d44774e:7269bac6:2f02f337:4551597b name=localhost:127
| root@grml ~ # mdadm --assemble --scan
| 2 root@grml ~ # mdadm --assemble --scan --verbose
| mdadm: looking for devices for /dev/md/127
| mdadm: No super block found on /dev/loop0 (Expected magic a92b4efc, got 800989c0)
| mdadm: no RAID superblock on /dev/loop0
| mdadm: No super block found on /dev/nvme1n1p3 (Expected magic a92b4efc, got 00000000)
| mdadm: no RAID superblock on /dev/nvme1n1p3
| mdadm: No super block found on /dev/nvme1n1p2 (Expected magic a92b4efc, got 00000000)
| mdadm: no RAID superblock on /dev/nvme1n1p2
| mdadm: No super block found on /dev/nvme1n1p1 (Expected magic a92b4efc, got 000080fe)
| mdadm: no RAID superblock on /dev/nvme1n1p1
| mdadm: No super block found on /dev/nvme1n1 (Expected magic a92b4efc, got 00000000)
| mdadm: no RAID superblock on /dev/nvme1n1
| mdadm: No super block found on /dev/nvme0n1p3 (Expected magic a92b4efc, got 00000000)
| mdadm: no RAID superblock on /dev/nvme0n1p3
| mdadm: No super block found on /dev/nvme0n1p2 (Expected magic a92b4efc, got 00000000)
| mdadm: no RAID superblock on /dev/nvme0n1p2
| mdadm: No super block found on /dev/nvme0n1p1 (Expected magic a92b4efc, got 000080fe)
| mdadm: no RAID superblock on /dev/nvme0n1p1
| mdadm: No super block found on /dev/nvme0n1 (Expected magic a92b4efc, got 00000000)
| mdadm: no RAID superblock on /dev/nvme0n1
| 2 root@grml ~ # mdadm --assemble --scan --config /dev/null
| mdadm: /dev/md/grml:127 has been started with 2 drives.
| root@grml ~ # lsblk -l -n -o TYPE,NAME | awk '/^raid/ {print $2}'
| md127

Running mdadm --assemble with `--config /dev/null` prevents mdadm from
considering and using a possibly existing /etc/mdadm/mdadm.conf
configuration file.
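
In short, the assemble call in the fix becomes (taken from the diff below):

| # assemble without consulting any existing /etc/mdadm/mdadm.conf:
| mdadm --assemble --scan --config /dev/null || true # fails if there's nothing to assemble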

Example output of running the new code:

| [...]
| mdadm: No arrays found in config file or automatically
| NOTE: default SWRAID_DEVICE set to /dev/md0 though we identified active md127
| NOTE: will continue with '/dev/md127' as SWRAID_DEVICE for mdadm cleanup
| Wiping signatures from /dev/md127
| /dev/md127: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31
| Removing mdadm device /dev/md127
| Stopping mdadm device /dev/md127
| mdadm: stopped /dev/md127
| Zero-ing superblock from /dev/nvme1n1
| mdadm: Unrecognised md component device - /dev/nvme1n1
| Zero-ing superblock from /dev/nvme0n1
| mdadm: Unrecognised md component device - /dev/nvme0n1
| NOTE: modified RAID array detected, setting SWRAID_DEVICE back to original setting '/dev/md0'
| Removing possibly existing LVM/PV label from /dev/nvme1n1
|   Cannot use /dev/nvme1n1: device is partitioned
| Removing possibly existing LVM/PV label from /dev/nvme1n1p1
|   Cannot use /dev/nvme1n1p1: device is too small (pv_min_size)
| Removing possibly existing LVM/PV label from /dev/nvme1n1p2
|   Labels on physical volume "/dev/nvme1n1p2" successfully wiped.
| Removing possibly existing LVM/PV label from /dev/nvme1n1p3
|   Cannot use /dev/nvme1n1p3: device is an md component
| Wiping disk signatures from /dev/nvme1n1
| /dev/nvme1n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
| /dev/nvme1n1: 8 bytes were erased at offset 0x6fc86d5e00 (gpt): 45 46 49 20 50 41 52 54
| /dev/nvme1n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
| /dev/nvme1n1: calling ioctl to re-read partition table: Success
| 1+0 records in
| 1+0 records out
| 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0027866 s, 376 MB/s
| Removing possibly existing LVM/PV label from /dev/nvme0n1
|   Cannot use /dev/nvme0n1: device is partitioned
| Removing possibly existing LVM/PV label from /dev/nvme0n1p1
|   Cannot use /dev/nvme0n1p1: device is too small (pv_min_size)
| Removing possibly existing LVM/PV label from /dev/nvme0n1p2
|   Labels on physical volume "/dev/nvme0n1p2" successfully wiped.
| Removing possibly existing LVM/PV label from /dev/nvme0n1p3
|   Cannot use /dev/nvme0n1p3: device is an md component
| Wiping disk signatures from /dev/nvme0n1
| /dev/nvme0n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
| /dev/nvme0n1: 8 bytes were erased at offset 0x6fc86d5e00 (gpt): 45 46 49 20 50 41 52 54
| /dev/nvme0n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
| /dev/nvme0n1: calling ioctl to re-read partition table: Success
| 1+0 records in
| 1+0 records out
| 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00278955 s, 376 MB/s
| Creating partition table
| Get path of EFI partition
| pvdevice is now available: /dev/nvme1n1p2
| The operation has completed successfully.
| The operation has completed successfully.
| pvdevice is now available: /dev/nvme1n1p3
| pvdevice is now available: /dev/nvme0n1p3
| mdadm: /dev/nvme1n1p3 appears to be part of a raid array:
|        level=raid1 devices=2 ctime=Wed Jan 24 10:31:43 2024
| mdadm: Note: this array has metadata at the start and
|     may not be suitable as a boot device.  If you plan to
|     store '/boot' on this device please ensure that
|     your boot-loader understands md/v1.x metadata, or use
|     --metadata=0.90
| mdadm: /dev/nvme0n1p3 appears to be part of a raid array:
|        level=raid1 devices=2 ctime=Wed Jan 24 10:31:43 2024
| mdadm: size set to 468218880K
| mdadm: automatically enabling write-intent bitmap on large array
| Continue creating array? mdadm: Defaulting to version 1.2 metadata
| mdadm: array /dev/md0 started.
| Creating PV + VG on /dev/md0
|   Physical volume "/dev/md0" successfully created.
|   Volume group "ngcp" successfully created
|   0 logical volume(s) in volume group "ngcp" now active
| Creating LV 'root' with 10G
| [...]
|
| mdadm: stopped /dev/md127
| mdadm: No arrays found in config file or automatically
| NOTE: will continue with '/dev/md127' as SWRAID_DEVICE for mdadm cleanup
| Removing mdadm device /dev/md127
| Stopping mdadm device /dev/md127
| mdadm: stopped /dev/md127
| mdadm: Unrecognised md component device - /dev/nvme1n1
| mdadm: Unrecognised md component device - /dev/nvme0n1
| mdadm: /dev/nvme1n1p3 appears to be part of a raid array:
| mdadm: Note: this array has metadata at the start and
| mdadm: /dev/nvme0n1p3 appears to be part of a raid array:
| mdadm: size set to 468218880K
| mdadm: automatically enabling write-intent bitmap on large array
| Continue creating array? mdadm: Defaulting to version 1.2 metadata
| mdadm: array /dev/md0 started.
|   lvm2 mdadm wget
| Get:1 http://http-proxy.lab.sipwise.com/debian bookworm/main amd64 mdadm amd64 4.2-5 [443 kB]
| Selecting previously unselected package mdadm.
| Preparing to unpack .../0-mdadm_4.2-5_amd64.deb ...
| Unpacking mdadm (4.2-5) ...
| Setting up mdadm (4.2-5) ...
| [...]
| mdadm: stopped /dev/md0

Change-Id: Ib5875248e9c01dd4251bfab2cc4c94daace503fa

@@ -609,10 +609,38 @@ set_up_partition_table_noswraid() {
 }
 
 set_up_partition_table_swraid() {
-  # make sure we don't overlook unassembled SW-RAIDs
-  local raidev1
-  local raidev2
-  mdadm --assemble --scan || true # fails if there's nothing to assemble
+  local orig_swraid_device raidev1 raidev2 raid_device raid_disks
+
+  # make sure we don't overlook unassembled SW-RAIDs:
+  mdadm --assemble --scan --config /dev/null || true # fails if there's nothing to assemble
+
+  # "local" arrays get assembled as /dev/md0 and upwards,
+  # whereas "foreign" arrays start at md127 downwards;
+  # since we need to also handle those, identify them:
+  raid_device=$(lsblk --list --noheadings --output TYPE,NAME | awk '/^raid/ {print $2}' | head -1)
+
+  # only consider changing SWRAID_DEVICE if we actually identified a RAID array:
+  if [[ -n "${raid_device:-}" ]] ; then
+    if ! [[ -b /dev/"${raid_device}" ]] ; then
+      die "Error: identified SW-RAID device '/dev/${raid_device}' not a valid block device."
+    fi
+
+    # identify which disks are part of the RAID array:
+    raid_disks=$(lsblk -l -n -s /dev/"${raid_device}" | grep -vw "^${raid_device}" | awk '{print $1}')
+
+    for d in ${raid_disks} ; do
+      # compare against expected SW-RAID disks to avoid unexpected behavior:
+      if ! printf "%s\n" "$d" | grep -qE "(${SWRAID_DISK1}|${SWRAID_DISK2})" ; then
+        die "Error: unexpected disk in RAID array /dev/${raid_device}: $d [expected SW-RAID disks: $SWRAID_DISK1 + $SWRAID_DISK2]"
+      fi
+    done
+
+    # remember the original setting, so we can use it after mdadm cleanup:
+    orig_swraid_device="${SWRAID_DEVICE}"
+    echo "NOTE: default SWRAID_DEVICE set to ${SWRAID_DEVICE} though we identified active ${raid_device}"
+    SWRAID_DEVICE="/dev/${raid_device}"
+    echo "NOTE: will continue with '${SWRAID_DEVICE}' as SWRAID_DEVICE for mdadm cleanup"
+  fi
 
   if [[ -b "${SWRAID_DEVICE}" ]] ; then
     if [[ "${SWRAID_DESTROY}" = "true" ]] ; then
@@ -638,6 +666,11 @@ set_up_partition_table_swraid() {
     fi
   fi
 
+  if [[ -n "${orig_swraid_device}" ]] ; then
+    echo "NOTE: modified RAID array detected, setting SWRAID_DEVICE back to original setting '${orig_swraid_device}'"
+    SWRAID_DEVICE="${orig_swraid_device}"
+  fi
+
   for disk in "${SWRAID_DISK1}" "${SWRAID_DISK2}" ; do
     if grep -q "$disk" /proc/mdstat ; then
       die "Error: disk $disk seems to be part of an existing SW-RAID setup."
