Description
Describe the bug
Hey all,
As the title says, we've run into a very weird bug: the exact same installation media works fine on a system with a SATA SSD, but fails after the first boot on a system that is identical except for having an NVMe SSD.
The main cause seems to be that coreos-boot-edit injects the wrong root UUID into the kernel arguments. On NVMe SSDs with LUKS enabled, the UUID of the decrypted partition (dm-0) changes after it is grown during the first boot. On an NVMe SSD with LUKS disabled, or on a SATA SSD regardless of LUKS state, this UUID does not change.
This UUID change does not seem to affect anything on the "base" fedora-coreos-42.20250901.3.0-live-iso.x86_64.iso image, but our use case requires an offline installation and our devices are usually not connected to the internet. We therefore customize the base ISO to preinstall a few packages (see reproduction steps). Once we have added a certain amount or combination of packages, the issue seems to occur.
Here are a few logs and lsblk -f outputs from various installs for comparison:
Installing the "broken" image on an NVMe disk:
lsblk output:
$ lsblk -f
NAME        FSTYPE      FSVER LABEL      UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
nvme0n1
├─nvme0n1p1
├─nvme0n1p2 vfat        FAT16 EFI-SYSTEM 7B77-95E7
├─nvme0n1p3 ext4        1.0   boot       585cb74e-6e78-43a7-8df7-8de77e6a84d4  102.9M    64% /boot
└─nvme0n1p4 crypto_LUKS 2     luks-root  c3da1c0a-34bc-4f75-8de2-dac91ff126cd
  └─root    xfs               root       988645bb-c917-4d20-a71d-2e7f0df33413  464.3G     2% /var
                                                                                             /sysroot/ostree/deploy/fedora-coreos/var
                                                                                             /etc
                                                                                             /sysroot
Installing the exact same image as above on a SATA disk, where the UUID does not change:
lsblk output:
# lsblk -f
NAME     FSTYPE      FSVER LABEL      UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sda
├─sda1
├─sda2   vfat        FAT16 EFI-SYSTEM 7B77-95E7
├─sda3   ext4        1.0   boot       fe5d02e4-c11e-4423-8b29-ad8ba7b06314  102.9M    64% /boot
└─sda4   crypto_LUKS 2     luks-root  c5dea5fc-4ad3-455f-bfc4-60be6331e98c
  └─root xfs               root       22db5ae9-37fc-4836-a795-9d58ba6d7592  113.5G     4% /var
                                                                                          /sysroot/ostree/deploy/fedora-coreos/var
                                                                                          /etc
                                                                                          /sysroot
Installing the same image on an NVMe drive, but with the ntfs-3g, weston, weston-demo and chromium packages taken out, where the system continues to work after first boot:
lsblk output:
# lsblk -f
NAME        FSTYPE      FSVER LABEL      UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
nvme0n1
├─nvme0n1p1
├─nvme0n1p2 vfat        FAT16 EFI-SYSTEM 7B77-95E7
├─nvme0n1p3 ext4        1.0   boot       dd7a33a9-0bb4-4c4f-8e9d-608c7c62ef0c  104.1M    64% /boot
└─nvme0n1p4 crypto_LUKS 2     luks-root  336cabb0-35d7-4e9e-b11b-8de44ab36a8b
  └─root    xfs               root       7399c039-3242-4187-b43c-d6fe381c4b90  465.2G     2% /var
                                                                                             /sysroot/ostree/deploy/fedora-coreos/var
                                                                                             /etc
                                                                                             /sysroot
What stands out to me after several builds of trial and error and comparing logs is that the UUID changes after Ignition OSTree: Grow Root Filesystem has run only on NVMe drives with LUKS enabled. When LUKS is disabled or when using a SATA drive, the UUID does not change after this step.
Excerpts from the first log:
ignition-ostree-growfs.service runs and resizes dm-0 (dff64afa-e35d-4ce2-9887-c3c050306ddd):
[ 36.300870] systemd[1]: Starting ignition-ostree-growfs.service - Ignition OSTree: Grow Root Filesystem...
[ 36.353463] XFS (dm-0): Mounting V5 Filesystem dff64afa-e35d-4ce2-9887-c3c050306ddd
[ 36.361335] XFS (dm-0): Ending clean mount
[ 37.513113] ignition-ostree-growfs[4037]: CHANGED: partition=4 start=1050624 old: size=7815168 end=8865791 new: size=999164559 end=1000215182
...
[ 38.827230] XFS (dm-0): Unmounting Filesystem dff64afa-e35d-4ce2-9887-c3c050306ddd
[ 38.837478] systemd[1]: Finished ignition-ostree-growfs.service - Ignition OSTree: Grow Root Filesystem.
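To put the CHANGED line above in perspective, the sector counts can be converted to sizes. This is a minimal sketch that assumes 512-byte logical sectors (the report does not state the sector size); sectors_to_mib is a hypothetical helper name, not part of any CoreOS tool.

```shell
# Convert a sector count to whole MiB (integer division, rounds down).
# ASSUMPTION: 512-byte logical sectors unless a size is passed as $2.
sectors_to_mib() {
    sectors=$1
    sector_size=${2:-512}
    echo $(( sectors * sector_size / 1024 / 1024 ))
}

old_mib=$(sectors_to_mib 7815168)     # "old: size=" from the log line
new_mib=$(sectors_to_mib 999164559)   # "new: size=" from the log line
echo "partition 4 grew from ${old_mib} MiB to ${new_mib} MiB"
```

Under that assumption the partition grows from 3816 MiB to 487873 MiB (roughly 476 GiB), which is in line with the 464.3G FSAVAIL at 2% use in the first lsblk output.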
ignition-ostree-transposefs-restore.service runs and reveals that dm-0 has received a new UUID (988645bb-c917-4d20-a71d-2e7f0df33413):
[ 39.144992] systemd[1]: Starting ignition-ostree-transposefs-restore.service - Ignition OSTree: Restore Partitions...
[ 39.173496] ignition-ostree-transposefs[4353]: Restoring rootfs from RAM...
[ 39.185350] ignition-ostree-transposefs[4353]: Mounting /dev/disk/by-label/root rw (/dev/dm-0) to /sysroot
[ 39.199438] XFS (dm-0): Mounting V5 Filesystem 988645bb-c917-4d20-a71d-2e7f0df33413
[ 39.212860] XFS (dm-0): Ending clean mount
[ 42.137903] ignition-ostree-transposefs[4386]: changing security context of '/sysroot'
[ 43.905756] XFS (dm-0): Unmounting Filesystem 988645bb-c917-4d20-a71d-2e7f0df33413
[ 43.976155] systemd[1]: Finished ignition-ostree-transposefs-restore.service - Ignition OSTree: Restore Partitions.
coreos-boot-edit.service uses the UUID from before the resize (dff64afa-e35d-4ce2-9887-c3c050306ddd), resulting in a broken state that will not boot anymore:
[ 44.921380] systemd[1]: Starting coreos-boot-edit.service - CoreOS Boot Edit...
...
[ 45.024760] coreos-boot-edit[4572]: Injected kernel arguments into BLS: rd.luks.name=c3da1c0a-34bc-4f75-8de2-dac91ff126cd=root root=UUID=dff64afa-e35d-4ce2-9887-c3c050306ddd rw rootflags=prjquota
[ 45.025689] coreos-boot-edit[4563]: Prepared rootmap
[ 45.086114] coreos-boot-edit[4589]: Relabeled /sysroot//boot/.root_uuid from <no context> to system_u:object_r:boot_t:s0
[ 45.094043] coreos-boot-edit[4591]: Relabeled /sysroot//boot/grub2/bootuuid.cfg from <no context> to system_u:object_r:boot_t:s0
[ 45.095136] systemd[1]: Finished coreos-boot-edit.service - CoreOS Boot Edit.
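The comparison made by hand in these excerpts can be scripted against a saved first-boot log. The sketch below assumes the log lines look exactly like the ones quoted here (an "XFS (dm-0): Mounting V5 Filesystem" line and an "Injected kernel arguments into BLS" line); the function names are hypothetical helpers, not part of any CoreOS tooling.

```shell
# Print the UUID from the last "XFS (dm-0): Mounting V5 Filesystem <uuid>" line on stdin.
last_mounted_uuid() {
    sed -n 's/.*XFS (dm-0): Mounting V5 Filesystem \([0-9a-f-]*\).*/\1/p' | tail -n 1
}

# Print the UUID from the root=UUID=<uuid> argument logged by coreos-boot-edit on stdin.
injected_root_uuid() {
    sed -n 's/.*Injected kernel arguments into BLS:.* root=UUID=\([0-9a-f-]*\).*/\1/p' | tail -n 1
}

# Compare both values for the log file given as $1.
# Returns 0 on match, 1 on mismatch, 2 if either value is missing.
check_boot_log() {
    mounted=$(last_mounted_uuid < "$1")
    injected=$(injected_root_uuid < "$1")
    if [ -z "$mounted" ] || [ -z "$injected" ]; then
        echo "could not find both UUIDs in $1"
        return 2
    fi
    if [ "$mounted" = "$injected" ]; then
        echo "OK: root=UUID=$injected matches the mounted filesystem"
    else
        echo "MISMATCH: injected $injected, but dm-0 was last mounted as $mounted"
        return 1
    fi
}
```

Calling check_boot_log with a file containing the excerpts above would report a mismatch, since the last mount shows 988645bb-c917-4d20-a71d-2e7f0df33413 while the injected argument still carries dff64afa-e35d-4ce2-9887-c3c050306ddd.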
Reproduction steps
- Build customized base image
cosa init --branch stable https://github.com/coreos/fedora-coreos-config
- Modify src/config/manifests/fedora-coreos.yaml to remove the excludes for plymouth and python3, and add a new include for a custom manifest
- Create custom manifest:
repos:
- fedora
- fedora-updates
- fedora-coreos-pool
packages:
- plymouth
- plymouth-plugin-script
- plymouth-graphics-libs
- plymouth-plugin-label
- lm_sensors
- weston
- weston-demo
- chromium
- podman-compose
- ntfs-3g
- google-noto-emoji-fonts
- dejavu-fonts-all
cosa fetch --with-cosa-overrides
cosa build
cosa osbuild live
- Create a minimal butane file (ours just has a user for debugging) and most importantly:
boot_device:
  luks:
    tpm2: true
- Compile the butane file and build the image
podman run --interactive --rm --security-opt label=disable \
--volume ${PWD}:/pwd --workdir /pwd quay.io/coreos/butane:release \
--pretty --files-dir local-files --strict install.bu > config.ign
podman run --pull=always --privileged --rm \
-v /dev:/dev -v /run/udev:/run/udev -v .:/data -w /data \
quay.io/coreos/coreos-installer:release \
iso customize \
--dest-ignition config.ign \
--dest-device /dev/nvme0n1 \
--dest-console tty0 \
--dest-karg-append quiet \
--dest-karg-append rhgb \
--live-karg-append quiet \
--live-karg-append rhgb \
-o test.iso fedora-coreos-42.20250917.dev.1-live-iso.x86_64.iso
- Write test.iso to a flashdrive and install it on device
- Reboot device after the first boot completes
Expected behavior
coreos-boot-edit should inject the correct root UUID, and the system should reboot successfully after the first boot.
Actual behavior
coreos-boot-edit injects the wrong root UUID (seemingly the UUID from before growing the root filesystem?), and the system fails to boot again after the first boot.
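One way to confirm the mismatch on an installed system before rebooting might be to compare the root=UUID= argument in the boot loader entry with the UUID of the unlocked root device. The sketch below keeps the comparison in two small functions that only work on strings; the function names are hypothetical, and the paths in the trailing comment (/boot/loader/entries and /dev/mapper/root) are assumptions based on the device name "root" shown in the lsblk output.

```shell
# Print the value of root=UUID=<uuid> from a kernel argument string, if present.
karg_root_uuid() {
    for arg in $1; do
        case $arg in
            root=UUID=*)
                echo "${arg#root=UUID=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Succeed only if the root=UUID= argument in $1 equals the filesystem UUID in $2.
root_karg_matches() {
    want=$(karg_root_uuid "$1") || { echo "no root=UUID= argument found"; return 2; }
    if [ "$want" = "$2" ]; then
        echo "OK: $want"
    else
        echo "MISMATCH: kernel argument has $want, filesystem has $2"
        return 1
    fi
}

# Possible use on the installed system (paths are assumptions, run as root):
#   opts=$(grep -h '^options' /boot/loader/entries/*.conf | head -n 1)
#   root_karg_matches "$opts" "$(lsblk -no UUID /dev/mapper/root)"
```

Keeping the comparison separate from the commands that collect the values makes it possible to check it against the strings from this report without access to the affected hardware.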
System details
- Bare metal
- Fedora CoreOS 42 stable
Butane or Ignition config
variant: fcos
version: 1.6.0
boot_device:
  luks:
    tpm2: true
kernel_arguments:
  should_not_exist:
    - console=ttyS0,115200
  should_exist:
    - quiet
    - rhgb
passwd:
  users:
    - name: testuser
      ssh_authorized_keys:
        - [...]
      home_dir: /home/testuser
      no_create_home: false
      password_hash: [...]
      groups:
        - wheel
      shell: /bin/bash
Additional information
No response