TT#118659 Wipe disk signatures more reliably with SW-RAID and NVMe setup

Deploying current NGCP trunk on an NVMe-powered SW-RAID setup failed with:

| mdadm: size set to 468218880K
| mdadm: automatically enabling write-intent bitmap on large array
| Continue creating array? mdadm: Defaulting to version 1.2 metadata
| mdadm: array /dev/md0 started.
| Creating PV + VG on /dev/md0
|   Cannot use /dev/md0: device is partitioned

This happens because /dev/md0 still contains stale partition data, and
its component partition nvme1n1p3 still carries a linux_raid_member
disk signature.
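
For illustration only (a sketch, not commands captured from the affected
system), such leftovers can be inspected with standard tools:

  wipefs /dev/md0            # without -a, wipefs just lists the signatures it finds
  blkid -p /dev/nvme1n1p3    # probes the partition directly, reports TYPE="linux_raid_member"
  lsblk --fs /dev/nvme1n1    # shows linux_raid_member as the type of nvme1n1p3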

So it is *not* enough to stop the mdadm array, remove the PV/LVM information
from the partitions and finally wipe the SW-RAID disks /dev/nvme1n1 +
/dev/nvme0n1 (example output from such a failing run, with the insufficient
sequence sketched after it):

| mdadm: /dev/md/0 has been started with 2 drives.
| mdadm: stopped /dev/md0
| mdadm: Unrecognised md component device - /dev/nvme1n1
| mdadm: Unrecognised md component device - /dev/nvme0n1
| Removing possibly existing LVM/PV label from /dev/nvme1n1
|   Cannot use /dev/nvme1n1: device is partitioned
| Removing possibly existing LVM/PV label from /dev/nvme1n1p1
|   Cannot use /dev/nvme1n1p1: device is too small (pv_min_size)
| Removing possibly existing LVM/PV label from /dev/nvme1n1p2
|   Labels on physical volume "/dev/nvme1n1p2" successfully wiped.
| Removing possibly existing LVM/PV label from /dev/nvme1n1p3
|   Cannot use /dev/nvme1n1p3: device is an md component
| Wiping disk signatures from /dev/nvme1n1
| /dev/nvme1n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
| /dev/nvme1n1: 8 bytes were erased at offset 0x6fc86d5e00 (gpt): 45 46 49 20 50 41 52 54
| /dev/nvme1n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
| /dev/nvme1n1: calling ioctl to re-read partition table: Success
| 1+0 records in
| 1+0 records out
| 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00314195 s, 334 MB/s
| Removing possibly existing LVM/PV label from /dev/nvme0n1
|   Cannot use /dev/nvme0n1: device is partitioned
| Removing possibly existing LVM/PV label from /dev/nvme0n1p1
|   Cannot use /dev/nvme0n1p1: device is too small (pv_min_size)
| Removing possibly existing LVM/PV label from /dev/nvme0n1p2
|   Labels on physical volume "/dev/nvme0n1p2" successfully wiped.
| Removing possibly existing LVM/PV label from /dev/nvme0n1p3
|   Cannot use /dev/nvme0n1p3: device is an md component
| Wiping disk signatures from /dev/nvme0n1
| /dev/nvme0n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
| /dev/nvme0n1: 8 bytes were erased at offset 0x6fc86d5e00 (gpt): 45 46 49 20 50 41 52 54
| /dev/nvme0n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
| /dev/nvme0n1: calling ioctl to re-read partition table: Success
| 1+0 records in
| 1+0 records out
| 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00893285 s, 117 MB/s
| Creating partition table
| Get path of EFI partition
| pvdevice is now available: /dev/nvme1n1p2
| The operation has completed successfully.
| The operation has completed successfully.
| pvdevice is now available: /dev/nvme1n1p3
| pvdevice is now available: /dev/nvme0n1p3
| mdadm: /dev/nvme1n1p3 appears to be part of a raid array:
|        level=raid1 devices=2 ctime=Wed Dec 20 20:35:21 2023
| mdadm: Note: this array has metadata at the start and
|     may not be suitable as a boot device.  If you plan to
|     store '/boot' on this device please ensure that
|     your boot-loader understands md/v1.x metadata, or use
|     --metadata=0.90
| mdadm: /dev/nvme0n1p3 appears to be part of a raid array:
|        level=raid1 devices=2 ctime=Wed Dec 20 20:35:21 2023
| mdadm: size set to 468218880K
| mdadm: automatically enabling write-intent bitmap on large array
| Continue creating array? mdadm: Defaulting to version 1.2 metadata
| mdadm: array /dev/md0 started.
| Creating PV + VG on /dev/md0
|   Cannot use /dev/md0: device is partitioned
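
In other words, the previous (insufficient) sequence roughly boiled down to
the following (simplified sketch, not the literal installer code):

  mdadm --stop /dev/md0                           # stop the freshly assembled array
  pvremove --force --force --yes /dev/nvme1n1p3   # fails: device is an md component
  wipefs -a /dev/nvme1n1                          # wipes GPT/PMBR of the whole disk only;
                                                  # the linux_raid_member superblock inside
                                                  # nvme1n1p3 survives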

Instead we first need to wipe the signatures from the SW-RAID device itself
(like /dev/md0), only then stop it, then wipe the disk signatures from all
partitions (like /dev/nvme1n1p3), and only at the very end remove the disk
signatures from the main block device (like /dev/nvme1n1); a simplified
sketch of this ordering follows the log below. Example from a successful
run with this change:

| root@grml ~ # grep -e mdadm -e Wiping /tmp/deployment-installer-debug.log
| mdadm: /dev/md/0 has been started with 2 drives.
| Wiping signatures from /dev/md0
| Removing mdadm device /dev/md0
| Stopping mdadm device /dev/md0
| mdadm: stopped /dev/md0
| mdadm: Unrecognised md component device - /dev/nvme1n1
| mdadm: Unrecognised md component device - /dev/nvme0n1
| Wiping disk signatures from partition /dev/nvme1n1p1
| Wiping disk signatures from partition /dev/nvme1n1p2
| Wiping disk signatures from partition /dev/nvme1n1p3
| Wiping disk signatures from /dev/nvme1n1
| Wiping disk signatures from partition /dev/nvme0n1p1
| Wiping disk signatures from partition /dev/nvme0n1p2
| Wiping disk signatures from partition /dev/nvme0n1p3
| Wiping disk signatures from /dev/nvme0n1
| mdadm: Note: this array has metadata at the start and
| mdadm: size set to 468218880K
| mdadm: automatically enabling write-intent bitmap on large array
| Continue creating array? mdadm: Defaulting to version 1.2 metadata
| mdadm: array /dev/md0 started.
|   Wiping ext3 signature on /dev/ngcp/root.
|   Wiping ext4 signature on /dev/ngcp/fallback.
|   Wiping ext4 signature on /dev/ngcp/data.
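
Conceptually, the fixed ordering boils down to the following (simplified
sketch; see the diff below for the actual implementation):

  wipefs -a /dev/md0         # drop the stale partition table inside the MD device
  mdadm --stop /dev/md0      # only then stop the array
  wipefs -a /dev/nvme1n1p3   # per-partition signatures (incl. linux_raid_member) next
  wipefs -a /dev/nvme1n1     # whole-disk signatures (GPT/PMBR) last
                             # (and the same for /dev/nvme0n1*)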

While at it, be more verbose about the executed steps.

FTR, disk and setup information of the system where we noticed the
failure and worked on this change:

| root@grml ~ # fdisk -l
| Disk /dev/nvme0n1: 447.13 GiB, 480103981056 bytes, 937703088 sectors
| Disk model: DELL NVME ISE PE8010 RI M.2 480GB
| Units: sectors of 1 * 512 = 512 bytes
| Sector size (logical/physical): 512 bytes / 512 bytes
| I/O size (minimum/optimal): 512 bytes / 512 bytes
| Disklabel type: gpt
| Disk identifier: 5D296676-52CF-49CF-863A-6D3A3BD0604F
|
| Device          Start       End   Sectors   Size Type
| /dev/nvme0n1p1   2048      4095      2048     1M BIOS boot
| /dev/nvme0n1p2   4096    999423    995328   486M EFI System
| /dev/nvme0n1p3 999424 937701375 936701952 446.7G Linux RAID
|
|
| Disk /dev/nvme1n1: 447.13 GiB, 480103981056 bytes, 937703088 sectors
| Disk model: DELL NVME ISE PE8010 RI M.2 480GB
| Units: sectors of 1 * 512 = 512 bytes
| Sector size (logical/physical): 512 bytes / 512 bytes
| I/O size (minimum/optimal): 512 bytes / 512 bytes
| Disklabel type: gpt
| Disk identifier: 9AFA8ACF-D2CD-4224-BA0C-D38A6581D0F9
|
| Device          Start       End   Sectors   Size Type
| /dev/nvme1n1p1   2048      4095      2048     1M BIOS boot
| /dev/nvme1n1p2   4096    999423    995328   486M EFI System
| /dev/nvme1n1p3 999424 937701375 936701952 446.7G Linux RAID
| [...]
|
| root@grml ~ # lsblk
| NAME                MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
| loop0                 7:0    0 428.8M  1 loop  /usr/lib/live/mount/rootfs/ngcp.squashfs
|                                                /run/live/rootfs/ngcp.squashfs
| nvme0n1             259:0    0 447.1G  0 disk
| ├─nvme0n1p1         259:5    0     1M  0 part
| ├─nvme0n1p2         259:8    0   486M  0 part
| └─nvme0n1p3         259:9    0 446.7G  0 part
|   └─md0               9:0    0 446.5G  0 raid1
|     ├─ngcp-root     253:0    0    10G  0 lvm   /mnt
|     ├─ngcp-fallback 253:1    0    10G  0 lvm
|     └─ngcp-data     253:2    0 383.9G  0 lvm   /mnt/ngcp-data
| nvme1n1             259:4    0 447.1G  0 disk
| ├─nvme1n1p1         259:2    0     1M  0 part
| ├─nvme1n1p2         259:6    0   486M  0 part
| └─nvme1n1p3         259:7    0 446.7G  0 part
|   └─md0               9:0    0 446.5G  0 raid1
|     ├─ngcp-root     253:0    0    10G  0 lvm   /mnt
|     ├─ngcp-fallback 253:1    0    10G  0 lvm
|     └─ngcp-data     253:2    0 383.9G  0 lvm   /mnt/ngcp-data
|
| root@grml ~ # cat /proc/mdstat
| Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
| md0 : active raid1 nvme0n1p3[1] nvme1n1p3[0]
|       468218880 blocks super 1.2 [2/2] [UU]
|       [==>..................]  resync = 12.7% (59516864/468218880) finish=33.1min speed=205685K/sec
|       bitmap: 4/4 pages [16KB], 65536KB chunk
|
| unused devices: <none>

Change-Id: Iaa7f49eef11ef6ad6209fe962bb8940a75a87c95
mr12.3
Michael Prokop 1 year ago
parent 76893e3acb
commit e9244a289b

@@ -490,6 +490,14 @@ clear_partition_table() {
pvremove "$disk" --force --force --yes || true
done
# ensure we remove signatures from partitions like /dev/nvme1n1p3 first,
# and only then get rid of signatures from main blockdevice /dev/nvme1n1
for partition in $(lsblk --noheadings --output KNAME "${blockdevice}" | grep -v "^${blockdevice#\/dev\/}$") ; do
[ -b "${partition}" ] || continue
echo "Wiping disk signatures from partition ${partition}"
wipefs -a "${partition}"
done
echo "Wiping disk signatures from ${blockdevice}"
wipefs -a "${blockdevice}"
@@ -608,9 +616,19 @@ set_up_partition_table_swraid() {
if [[ -b "${SWRAID_DEVICE}" ]] ; then
if [[ "${SWRAID_DESTROY}" = "true" ]] ; then
echo "Wiping signatures from ${SWRAID_DEVICE}"
wipefs -a "${SWRAID_DEVICE}"
echo "Removing mdadm device ${SWRAID_DEVICE}"
mdadm --remove "${SWRAID_DEVICE}"
echo "Stopping mdadm device ${SWRAID_DEVICE}"
mdadm --stop "${SWRAID_DEVICE}"
echo "Zero-ing superblock from /dev/${SWRAID_DISK1}"
mdadm --zero-superblock "/dev/${SWRAID_DISK1}"
echo "Zero-ing superblock from /dev/${SWRAID_DISK2}"
mdadm --zero-superblock "/dev/${SWRAID_DISK2}"
else
echo "NOTE: if you are sure you don't need it SW-RAID device any longer, execute:"
