Renaming multiple network interfaces in Bullseye is broken

Shdwdrgn@mander.xyz · 1 year ago

So I've run across some info on this problem, and it seems this was an intentional choice by the systemd developers. Judging by the closed tickets I keep running across, they are so opposed to anyone using the eth* naming scheme that they aren't even going to try to fix the problem they caused.

Here's the problem: there's no way to rearrange the network device names without them stepping on each other. If you keep the original device names, they are handed out in the order each interface is found, so, for example, you cannot rename eth3 to eth0 because eth0 already exists. If you let systemd give each interface a predictable name, you cannot then rename it back into the eth* sequence, because systemd starts renaming interfaces at seemingly random points while it is still finding device drivers (and in fact this was my problem). And apparently there is no plan to add a way to reference the original device name so the overlapping names can be sorted out.
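
To make the collision concrete, here is a small, purely illustrative Python model of a single interface-name namespace where a rename into a name that is still held by another interface is refused. `NameTable` and `try_renames` are hypothetical names; the model does not reproduce systemd or udev, it only encodes the rule described above.

```python
class NameTable:
    """Hypothetical model of one interface-name namespace."""

    def __init__(self, names):
        self.names = set(names)

    def rename(self, old, new):
        if old not in self.names:
            raise KeyError(f"no such device: {old}")
        if new in self.names:
            # The kernel refuses an equivalent request with EEXIST ("File exists").
            raise FileExistsError(f"cannot rename {old} to {new}: {new} already exists")
        self.names.remove(old)
        self.names.add(new)


def try_renames(table, plan):
    """Apply (old, new) renames in order and return the error text of each refusal."""
    failures = []
    for old, new in plan:
        try:
            table.rename(old, new)
        except FileExistsError as err:
            failures.append(str(err))
    return failures


# Four ports detected as eth0..eth3; the goal is to swap eth3 and eth0.
table = NameTable(["eth0", "eth1", "eth2", "eth3"])
print(try_renames(table, [("eth3", "eth0"), ("eth0", "eth3")]))  # both renames are refused
```

Both requests fail because each target name is occupied at the moment it is requested, which is exactly the "stepping on each other" problem.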

So why did I get such seemingly random results from one boot to the next? I had to take a very close look at dmesg to find the answer. It turns out that if I just let the biosnames populate, everything comes up in a predictable order. However, if I add link files to rename the interfaces, loading of ONE of the network drivers is delayed by 0.3 s, so the renaming happens right in the middle of loading the next set of biosnames. For whatever reason this delay didn't show up until the second reboot: on the first boot everything came up as expected, then the delay hit on the next boot and I lost my local interface connection again.
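
Digging the interleaving out of dmesg by eye is tedious, so here is a minimal sketch of doing it in Python. It assumes rename lines contain `<newname>: renamed from <oldname>` after the usual `[seconds]` timestamp, which matches what I'd expect from a Bullseye-era kernel, but check it against your own log. The sample log, the `driver_a`/`driver_b` names, and the helper functions are all made up for illustration.

```python
import re

# Assumed line shape: "[   2.350000] <driver> <bus-id> <newname>: renamed from <oldname>"
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
RENAME = re.compile(r"(\S+): renamed from (\S+)")


def rename_events(dmesg_text):
    """Return (timestamp, old_name, new_name) for every rename line, in log order."""
    events = []
    for line in dmesg_text.splitlines():
        m = TIMESTAMP.match(line)
        if not m:
            continue
        r = RENAME.search(m.group(2))
        if r:
            events.append((float(m.group(1)), r.group(2), r.group(1)))
    return events


def renames_before(dmesg_text, marker):
    """Renames logged before the first line containing `marker` (e.g. a driver name)."""
    cutoff = None
    for line in dmesg_text.splitlines():
        m = TIMESTAMP.match(line)
        if m and marker in m.group(2):
            cutoff = float(m.group(1))
            break
    if cutoff is None:
        return []
    return [event for event in rename_events(dmesg_text) if event[0] < cutoff]


# Hypothetical excerpt: the rename lands before the second driver has even started.
SAMPLE = """\
[    2.101000] driver_a 0000:03:00.0 eth0: link registered
[    2.350000] driver_a 0000:03:00.0 wan: renamed from eth0
[    2.650000] driver_b: loading
[    2.700000] driver_b 0000:04:00.0 eth0: link registered
"""

print(renames_before(SAMPLE, "driver_b"))
```

Feeding it the output of `dmesg` from a good boot and a bad boot makes it easy to see which renames moved relative to the driver loads.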

Now this whole issue could easily be sorted out if the MAC address were checked against each individual device as it was detected (of course, that would require link files for ALL of my devices to ensure none of them could step on each other). The problem could even be sorted out if systemd didn't immediately revert to biosnames for all devices as soon as even one of them was given a fixed name; then I could do my renaming away from the predictable names instead of having the biosnames step on each other again. The whole point of the new naming scheme seems to be avoiding race conditions when renaming devices, so I really don't understand why it was built to fall back to biosnames if I rename even a single interface. Yet I spent a couple of hours trying to get exactly that scenario to work, only to discover that the GRUB parameters had no effect, even when I specifically set net.ifnames=1 biosdevname=1. Maybe those aren't the correct values any more, but the info I could find said that setting both to 0 gives biosnames, and setting both to 1 forces predictable names.
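
Since the GRUB settings seemed to do nothing, one thing worth ruling out is whether they ever reached the running kernel. On Linux, /proc/cmdline holds the command line the system actually booted with, and edits to /etc/default/grub only land there after running update-grub and rebooting. This small sketch just reports what net.ifnames and biosdevname were set to; the function names are my own, and keeping the last value when a parameter is repeated is an assumption of this sketch, not documented behavior.

```python
def naming_params(cmdline):
    """Return the values of net.ifnames and biosdevname found on a kernel command line.

    A parameter that is absent maps to None. If one appears more than once,
    this sketch keeps the last occurrence; check duplicates by hand.
    """
    wanted = {"net.ifnames": None, "biosdevname": None}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in wanted:
            wanted[key] = value
    return wanted


def running_params(path="/proc/cmdline"):
    """Read the command line of the currently running kernel and extract the naming params."""
    with open(path) as f:
        return naming_params(f.read())


example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro net.ifnames=1 biosdevname=1 quiet"
print(naming_params(example))
```

On the live machine, `running_params()` shows whether the values you put in GRUB are the ones the system actually saw.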

Regardless, at this time there is only one solution I can find: stop using eth* names, because you simply cannot use them reliably, even if you only have two interfaces whose names you need to swap. I ended up naming mine wan, dmz, lan, wlan, and test, which seems painfully disorganized, but I suppose it makes it easier for hackers to figure out where they need to go if they get into my firewall.
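
For the non-eth* route, the renaming still lives in systemd .link files, one per interface. Below is a minimal Python sketch that writes them from a single name-to-MAC table, so a card swap comes down to editing that table again. The `[Match]` `MACAddress=` and `[Link]` `Name=` keys are standard systemd.link settings; the MAC addresses, the `10-` file prefix, and the helper names are hypothetical. The eth* guard simply encodes the conclusion above.

```python
from pathlib import Path

# Hypothetical MAC addresses; substitute the real ones shown by `ip link`.
INTERFACES = {
    "wan": "00:11:22:33:44:01",
    "dmz": "00:11:22:33:44:02",
    "lan": "00:11:22:33:44:03",
    "wlan": "00:11:22:33:44:04",
    "test": "00:11:22:33:44:05",
}


def link_file(name, mac):
    """Render a systemd .link file that gives the interface with `mac` the name `name`."""
    if name.startswith("eth"):
        # Stay out of the kernel's eth* namespace, where names collide during boot.
        raise ValueError(f"{name!r} would collide with kernel-assigned eth* names")
    return f"[Match]\nMACAddress={mac}\n\n[Link]\nName={name}\n"


def write_link_files(mapping, directory, prefix="10"):
    """Write one <prefix>-<name>.link file per interface and return the paths written."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for name, mac in sorted(mapping.items()):
        path = directory / f"{prefix}-{name}.link"
        path.write_text(link_file(name, mac))
        written.append(path)
    return written


print(link_file("wan", INTERFACES["wan"]))
```

On the real machine this would be `write_link_files(INTERFACES, "/etc/systemd/network")`, run as root. On Debian the initramfs can carry its own copy of these files, so regenerating it with update-initramfs -u afterwards is often recommended.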

Oh, and in case anyone is wondering why I don't simply use the predictable names: I have firewall scripts, load balancer scripts, DHCP and other stuff, plus various test scripts that monitor the interfaces. What happens when the four-port NIC dies (as happened just a couple of years ago)? Up until Buster, I booted up with the new hardware, updated udev with the new MAC addresses, rebooted, and I was up and running. Now I'm moving up to newer servers, machines that take longer just to start loading the operating system than the whole update process, including a couple of reboots, took on the old hardware. If the firewall goes down, services go offline and people can't get work done. As it stands now, a hardware change means a minimum of twenty minutes of downtime (which is making me consider sticking with the old rack server). Now consider predictable interface names, which are almost guaranteed to change when you replace the card. That means grepping through all of the scripts on this machine, finding every reference to the old names, and replacing them, and if I miss anything, that means a third reboot.
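
To show what that grep-and-replace chore involves, here is a rough Python sketch. It finds whole-word references to old interface names under a directory and rewrites them from an old-to-new mapping in a single pass, so a swap such as eno1 to eno2 and eno2 to eno1 cannot chain into a double replacement. The interface names are hypothetical, and names embedded inside longer identifiers (for example `$IF_enp3s0f0`) are deliberately not matched, so a manual review is still needed.

```python
import re
from pathlib import Path


def _name_pattern(names):
    # Longest names first, with \b on both sides, so "eno1" never matches inside "eno10".
    alternation = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
    return re.compile(r"\b(" + alternation + r")\b")


def find_references(root, names):
    """Return (path, line_no, name) for each whole-word use of any name under root."""
    pattern = _name_pattern(names)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text()
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        for line_no, line in enumerate(text.splitlines(), start=1):
            for m in pattern.finditer(line):
                hits.append((path, line_no, m.group(1)))
    return hits


def rename_references(text, mapping):
    """Replace old names with new ones in one pass, so renames cannot chain."""
    pattern = _name_pattern(mapping)
    return pattern.sub(lambda m: mapping[m.group(1)], text)


old_to_new = {"enp3s0f0": "enp4s0f0", "enp3s0f1": "enp4s0f1"}
print(rename_references("iptables -A FORWARD -i enp3s0f0 -o enp3s0f1 -j ACCEPT", old_to_new))
```

Even with a helper like this, every hit has to be checked by hand, which is the point: one table of MAC addresses was far less error-prone than this.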

The point is, where I used to be able to change four bits of info in a single file (very small chance of error), I would now have to update quite a few individual configs and scripts (a much larger chance of error, and it means touching files that should never need to be touched).