Deployed current NGCP trunk on NVMe powered SW-RAID setup failed with:

| mdadm: size set to 468218880K
| mdadm: automatically enabling write-intent bitmap on large array
| Continue creating array? mdadm: Defaulting to version 1.2 metadata
| mdadm: array /dev/md0 started.
| Creating PV + VG on /dev/md0
| Cannot use /dev/md0: device is partitioned

This happens because /dev/md0 still contains partition data, and its
underlying partition nvme1n1p3 also still carries a linux_raid_member
disk signature. So it's *not* enough to stop the mdadm array, remove
PV/LVM information from the partitions and finally wipe the SW-RAID
disks /dev/nvme1n1 + /dev/nvme0n1 (example output from such a failing
run):

| mdadm: /dev/md/0 has been started with 2 drives.
| mdadm: stopped /dev/md0
| mdadm: Unrecognised md component device - /dev/nvme1n1
| mdadm: Unrecognised md component device - /dev/nvme0n1
| Removing possibly existing LVM/PV label from /dev/nvme1n1
| Cannot use /dev/nvme1n1: device is partitioned
| Removing possibly existing LVM/PV label from /dev/nvme1n1p1
| Cannot use /dev/nvme1n1p1: device is too small (pv_min_size)
| Removing possibly existing LVM/PV label from /dev/nvme1n1p2
| Labels on physical volume "/dev/nvme1n1p2" successfully wiped.
| Removing possibly existing LVM/PV label from /dev/nvme1n1p3
| Cannot use /dev/nvme1n1p3: device is an md component
| Wiping disk signatures from /dev/nvme1n1
| /dev/nvme1n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
| /dev/nvme1n1: 8 bytes were erased at offset 0x6fc86d5e00 (gpt): 45 46 49 20 50 41 52 54
| /dev/nvme1n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
| /dev/nvme1n1: calling ioctl to re-read partition table: Success
| 1+0 records in
| 1+0 records out
| 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00314195 s, 334 MB/s
| Removing possibly existing LVM/PV label from /dev/nvme0n1
| Cannot use /dev/nvme0n1: device is partitioned
| Removing possibly existing LVM/PV label from /dev/nvme0n1p1
| Cannot use /dev/nvme0n1p1: device is too small (pv_min_size)
| Removing possibly existing LVM/PV label from /dev/nvme0n1p2
| Labels on physical volume "/dev/nvme0n1p2" successfully wiped.
| Removing possibly existing LVM/PV label from /dev/nvme0n1p3
| Cannot use /dev/nvme0n1p3: device is an md component
| Wiping disk signatures from /dev/nvme0n1
| /dev/nvme0n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
| /dev/nvme0n1: 8 bytes were erased at offset 0x6fc86d5e00 (gpt): 45 46 49 20 50 41 52 54
| /dev/nvme0n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
| /dev/nvme0n1: calling ioctl to re-read partition table: Success
| 1+0 records in
| 1+0 records out
| 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00893285 s, 117 MB/s
| Creating partition table
| Get path of EFI partition
| pvdevice is now available: /dev/nvme1n1p2
| The operation has completed successfully.
| The operation has completed successfully.
| pvdevice is now available: /dev/nvme1n1p3
| pvdevice is now available: /dev/nvme0n1p3
| mdadm: /dev/nvme1n1p3 appears to be part of a raid array:
|        level=raid1 devices=2 ctime=Wed Dec 20 20:35:21 2023
| mdadm: Note: this array has metadata at the start and
|     may not be suitable as a boot device.  If you plan to
|     store '/boot' on this device please ensure that
|     your boot-loader understands md/v1.x metadata, or use
|     --metadata=0.90
| mdadm: /dev/nvme0n1p3 appears to be part of a raid array:
|        level=raid1 devices=2 ctime=Wed Dec 20 20:35:21 2023
| mdadm: size set to 468218880K
| mdadm: automatically enabling write-intent bitmap on large array
| Continue creating array? mdadm: Defaulting to version 1.2 metadata
| mdadm: array /dev/md0 started.
| Creating PV + VG on /dev/md0
| Cannot use /dev/md0: device is partitioned
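FTR, the leftover metadata that triggers this can be inspected
non-destructively, e.g. (illustrative commands only, using the device
names from the failing run above):

  # wipefs without options only lists the signatures it finds,
  # it does not erase anything
  wipefs /dev/md0          # shows the stale GPT/PMBR data on the assembled array
  wipefs /dev/nvme1n1p3    # shows the linux_raid_member signature on the partition

  # low-level probe; reports e.g. TYPE="linux_raid_member"
  blkid -p /dev/nvme1n1p3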
Instead we also need to wipe the signatures from the SW-RAID device
itself (like /dev/md0), only then stop it, make sure we wipe the disk
signatures from all the partitions (like /dev/nvme1n1p3) as well, and
only then finally remove the disk signatures from the main block device
(like /dev/nvme1n1). Example from a successful run with this change:

| root@grml ~ # grep -e mdadm -e Wiping /tmp/deployment-installer-debug.log
| mdadm: /dev/md/0 has been started with 2 drives.
| Wiping signatures from /dev/md0
| Removing mdadm device /dev/md0
| Stopping mdadm device /dev/md0
| mdadm: stopped /dev/md0
| mdadm: Unrecognised md component device - /dev/nvme1n1
| mdadm: Unrecognised md component device - /dev/nvme0n1
| Wiping disk signatures from partition /dev/nvme1n1p1
| Wiping disk signatures from partition /dev/nvme1n1p2
| Wiping disk signatures from partition /dev/nvme1n1p3
| Wiping disk signatures from /dev/nvme1n1
| Wiping disk signatures from partition /dev/nvme0n1p1
| Wiping disk signatures from partition /dev/nvme0n1p2
| Wiping disk signatures from partition /dev/nvme0n1p3
| Wiping disk signatures from /dev/nvme0n1
| mdadm: Note: this array has metadata at the start and
| mdadm: size set to 468218880K
| mdadm: automatically enabling write-intent bitmap on large array
| Continue creating array? mdadm: Defaulting to version 1.2 metadata
| mdadm: array /dev/md0 started.
| Wiping ext3 signature on /dev/ngcp/root.
| Wiping ext4 signature on /dev/ngcp/fallback.
| Wiping ext4 signature on /dev/ngcp/data.

While at it, be more verbose about the executed steps.
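For the record, the resulting wiping order boils down to something like
the following (a simplified sketch only, not the literal deployment
code; SWRAID_DEVICE and SWRAID_DISKS are placeholder variables):

  SWRAID_DEVICE=/dev/md0
  SWRAID_DISKS="/dev/nvme0n1 /dev/nvme1n1"

  if [ -b "${SWRAID_DEVICE}" ]; then
    echo "Wiping signatures from ${SWRAID_DEVICE}"
    wipefs --all "${SWRAID_DEVICE}"        # 1) wipe the assembled array first
    echo "Stopping mdadm device ${SWRAID_DEVICE}"
    mdadm --stop "${SWRAID_DEVICE}"        # 2) only then stop it
  fi

  for disk in ${SWRAID_DISKS} ; do
    for part in "${disk}"p[0-9]* ; do
      [ -b "${part}" ] || continue
      echo "Wiping disk signatures from partition ${part}"
      wipefs --all "${part}"               # 3) wipe every partition, incl. the RAID members
    done
    echo "Wiping disk signatures from ${disk}"
    wipefs --all "${disk}"                 # 4) finally the main block device itself
  done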
FTR, disk and setup information of such a system where we noticed the
failure and worked on this change:

| root@grml ~ # fdisk -l
| Disk /dev/nvme0n1: 447.13 GiB, 480103981056 bytes, 937703088 sectors
| Disk model: DELL NVME ISE PE8010 RI M.2 480GB
| Units: sectors of 1 * 512 = 512 bytes
| Sector size (logical/physical): 512 bytes / 512 bytes
| I/O size (minimum/optimal): 512 bytes / 512 bytes
| Disklabel type: gpt
| Disk identifier: 5D296676-52CF-49CF-863A-6D3A3BD0604F
|
| Device           Start       End   Sectors   Size Type
| /dev/nvme0n1p1    2048      4095      2048     1M BIOS boot
| /dev/nvme0n1p2    4096    999423    995328   486M EFI System
| /dev/nvme0n1p3  999424 937701375 936701952 446.7G Linux RAID
|
|
| Disk /dev/nvme1n1: 447.13 GiB, 480103981056 bytes, 937703088 sectors
| Disk model: DELL NVME ISE PE8010 RI M.2 480GB
| Units: sectors of 1 * 512 = 512 bytes
| Sector size (logical/physical): 512 bytes / 512 bytes
| I/O size (minimum/optimal): 512 bytes / 512 bytes
| Disklabel type: gpt
| Disk identifier: 9AFA8ACF-D2CD-4224-BA0C-D38A6581D0F9
|
| Device           Start       End   Sectors   Size Type
| /dev/nvme1n1p1    2048      4095      2048     1M BIOS boot
| /dev/nvme1n1p2    4096    999423    995328   486M EFI System
| /dev/nvme1n1p3  999424 937701375 936701952 446.7G Linux RAID
| [...]
|
| root@grml ~ # lsblk
| NAME              MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
| loop0               7:0    0 428.8M  1 loop  /usr/lib/live/mount/rootfs/ngcp.squashfs
|                                              /run/live/rootfs/ngcp.squashfs
| nvme0n1           259:0    0 447.1G  0 disk
| ├─nvme0n1p1       259:5    0     1M  0 part
| ├─nvme0n1p2       259:8    0   486M  0 part
| └─nvme0n1p3       259:9    0 446.7G  0 part
|   └─md0             9:0    0 446.5G  0 raid1
|     ├─ngcp-root   253:0    0    10G  0 lvm   /mnt
|     ├─ngcp-fallback 253:1  0    10G  0 lvm
|     └─ngcp-data   253:2    0 383.9G  0 lvm   /mnt/ngcp-data
| nvme1n1           259:4    0 447.1G  0 disk
| ├─nvme1n1p1       259:2    0     1M  0 part
| ├─nvme1n1p2       259:6    0   486M  0 part
| └─nvme1n1p3       259:7    0 446.7G  0 part
|   └─md0             9:0    0 446.5G  0 raid1
|     ├─ngcp-root   253:0    0    10G  0 lvm   /mnt
|     ├─ngcp-fallback 253:1  0    10G  0 lvm
|     └─ngcp-data   253:2    0 383.9G  0 lvm   /mnt/ngcp-data
|
| root@grml ~ # cat /proc/mdstat
| Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
| md0 : active raid1 nvme0n1p3[1] nvme1n1p3[0]
|       468218880 blocks super 1.2 [2/2] [UU]
|       [==>..................]  resync = 12.7% (59516864/468218880) finish=33.1min speed=205685K/sec
|       bitmap: 4/4 pages [16KB], 65536KB chunk
|
| unused devices: <none>

Change-Id: Iaa7f49eef11ef6ad6209fe962bb8940a75a87c95