I have been struggling with this for over a month and still keep running into a brick wall. I am building a new firewall which has six network interfaces, and want to rename them to a known order (wan[0-1], and eth[0-3]). Since Bullseye has stopped honoring udev rules, I have created link files under /etc/systemd/network/ for each interface based on their MAC address. The two WAN interfaces seem to be working reliably but they're not actually plugged into anything yet (this may be an important but untested distinction).
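For readers who have not seen one, a link file of the kind described above might look roughly like the sketch below. The MAC address and the file name 10-eth0.link are placeholders, not values from this machine, and the file is written to a scratch directory rather than /etc/systemd/network/ so the snippet is safe to run as-is.

```shell
# Sketch only: write one example link file into a scratch directory.
# The MAC address and the name 10-eth0.link are placeholders.
outdir="${OUTDIR:-$(mktemp -d)}"

# [Match] selects the device by MAC; [Link] sets the name to assign.
cat > "$outdir/10-eth0.link" <<'EOF'
[Match]
MACAddress=00:11:22:33:44:55

[Link]
Name=eth0
EOF

echo "wrote $outdir/10-eth0.link"
```

One file per interface, each with its own MACAddress= and Name= pair, gives the setup described here.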
What I've found is that I might get the interfaces renamed correctly when logging in from the keyboard, and this continues to work for multiple reboots. However if I SSH into the machine (which of course is my standard method of working on my servers) it seems to destroy systemd's ability to rename the interface on the next boot. I have played around with the order of the link file numbers to ensure the renumbering doesn't have the devices trying to step on each other, but to no avail. Fixing this problem seems to come down to three different solutions...
- I can simply touch the eth*.link files and I'm back up after a reboot.
- Sometimes I have to get more drastic, actually opening and saving each of the files (without making any changes). WHY these two methods give me different results, I cannot say.
- When nothing else works, I simply rename one or more of the eth*.link files, giving them a different numerical order. So far it doesn't seem to matter which of the files I rename, but systemd sees that something has changed and re-reads them.
Another piece of information I ran across is that systemd does the interface renaming very early in the boot process, even before the filesystems are mounted, and that you need to run update-initramfs -u to create a new initrd.img file for grub. OK, sounds reasonable... however I would expect the boot behavior to be identical every time I reboot the machine, and not randomly stop working after I sign in remotely. I've also found that generating a new initrd.img does no good unless I also touch or change the link files first, so perhaps this is a false lead.
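One way to test whether the initramfs angle matters is to check if any link file has been modified more recently than the initrd image, which would suggest the copy inside the image is stale. The sketch below assumes the link files are actually copied into the initramfs on this setup. The function name stale_links is made up for illustration, and the -nt test is a common shell extension (bash and dash both have it) rather than strict POSIX.

```shell
# Sketch: print every *.link file in a directory that is newer than
# the given initramfs image. stale_links is a hypothetical helper name.
stale_links() {
    initrd=$1
    linkdir=$2
    for f in "$linkdir"/*.link; do
        [ -e "$f" ] || continue          # no link files at all
        if [ "$f" -nt "$initrd" ]; then  # modified after the image was built
            echo "$f"
        fi
    done
}

# Possible use on the firewall itself (paths are the usual Debian ones):
#   stale_links "/boot/initrd.img-$(uname -r)" /etc/systemd/network
```

If the function prints anything, rebuilding with update-initramfs -u and then listing the image with lsinitramfs should show whether the link files end up inside it.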
This behavior just completely baffles me. Renaming interfaces based on MAC addresses should be an extremely simple task, and yet systemd is completely failing unless I change the link files every time I remote connect? Surely someone must have found a reliable way to change multiple interface names in the years since Bullseye was released?
Sorry, I know this is a rant against systemd and this whole "predictable" naming scheme, but all of this stuff worked just fine for the last 24 years that I've been running linux servers, it's not something that should require any effort at all to set up. What do I need to change so that systemd does what it is configured to do, and why is something as simple as a remote connection enough to completely break it when I do get it to work? Please help save my sanity!
(I realize essential details are missing, but this post is already way too long -- ask what you need and I shall provide!)
tl;dr -- Systemd fails to rename network interfaces on the next cycle if I SSH in and type 'reboot'
So I've run across some info on this problem, and it seems this was an intentional choice by the systemd developers. It appears they are so opposed to anyone even using the eth* naming scheme that they're not even going to try to fix the problem they caused (judging by the closed tickets I've been running across).
Here's the problem... there's no method to rearrange the network device names without them stepping on each other. If you ask for the original device names they are given out in the order each interface is found, so for example you cannot change eth3 to eth0 because eth0 already exists. If you let systemd give each interface a predictable name, you cannot then change that name to fill in the eth* sequence because systemd randomly starts renaming interfaces while in the middle of still finding device drivers (and in fact this was my problem). And apparently there is no intention of adding a method where you can reference the original device name to sort out the overlapping given names.
So why did I apparently have such random results in my various boot processes? I had to take a very close look at dmesg to find the answer. It turns out that if I just let the biosnames populate, everything comes up in a predictable order. However if I add link files to rename the interfaces, it delays loading ONE of the network drivers by 0.3s, causing the renaming to occur right in the middle of loading the next set of biosnames. For whatever reason this delay didn't happen until the second time I rebooted, so the first boot everything came up as expected, then the delay occurred the next boot and I lost my local interface connection again.
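For anyone wanting to repeat that dmesg inspection, the kernel logs a "renamed from" line each time an interface changes name, so the rename timeline can be pulled out with a small filter. This is a sketch that assumes the default dmesg timestamp format; rename_events is a hypothetical helper name.

```shell
# Sketch: reduce kernel log lines to "<timestamp> <old name> -> <new name>".
# Reads the log on stdin; lines without a rename are dropped.
rename_events() {
    sed -nE 's/^\[ *([0-9.]+)\].* ([^ :]+): renamed from ([^ ]+).*/\1 \3 -> \2/p'
}

# Possible use (dmesg may need root; journalctl -k -o short-monotonic
# prints the same kernel messages in the same bracketed format):
#   dmesg | rename_events
```

Comparing the timestamps from a good boot and a bad boot makes it easier to see where a rename lands relative to the driver probes.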
Now this whole issue could be easily sorted if the MAC address were checked against each individual device as it was detected (of course requiring that I have link files for ALL of my devices to ensure none of them could step on each other). The problem could even be sorted if systemd didn't immediately revert back to using biosnames for all devices when even one of them was given a fixed name (by allowing me to do my renaming away from the predictable names instead of now causing biosnames to step on each other again). The whole point of this new naming scheme seems to be avoiding race conditions with renaming devices so I really don't understand why they've built it to revert back to biosnames if I try to rename even a single interface, and yet I spent a couple hours trying to get exactly that scenario to work only to discover the grub parameters had no effect even when I specifically set
net.ifnames=1 biosdevname=1
... maybe those aren't the correct values to use any more, but the info I could find said setting both to zero used biosnames, and setting both to 1 forced the use of predictable names.
Regardless, at this time there is only one possible solution -- you simply cannot use eth* names reliably even if you only have two interfaces that you need to swap names. I ended up naming mine as wan, dmz, lan, wlan, and test, which seems painfully disorganized but I suppose it makes it easier for hackers to figure out where they need to go if they get into my firewall.
Oh, and in case anyone is wondering why I don't simply use the predictable names? I have firewall scripts, I have load balancer scripts, dhcp and other stuff, plus various test scripts to monitor the interfaces. What happens when the four-port NIC dies (as happened just a couple years ago)? Up until Buster, I booted up with the new hardware, updated udev with the new MAC addresses, rebooted, and I'm up and running. Now I'm moving up to newer servers, machines that take more time to start loading the operating system than the whole update process including a couple reboots took on the old hardware. If the firewall goes down, services go offline, people can't get work done. As it stands now, a hardware change is looking at a minimum of twenty minutes of down time (which is making me consider sticking with the old rackserver)... Now consider the case for predictable interface names which are almost guaranteed to change when you replace the card. This means grepping through all of the scripts on this machine, finding every reference to the old names, and replacing them. Assuming I don't miss anything that would require a third reboot.
The point is, where I used to be able to change four bits of info in a single file (very small chance of error), I would now have to update quite a few individual configs and scripts (a much larger chance of error, and also working with items that should have no reason to ever be touched).
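One way to get back something close to the old "one file to edit" workflow is to keep the MAC-to-name mapping in a single table and generate the link files from it. The sketch below is only an illustration: the table format, the gen_links function, the numbering scheme, and the MAC addresses are all made up, and it writes into a scratch directory instead of /etc/systemd/network/.

```shell
# Sketch: generate one .link file per interface from a single table.
# Each non-comment line of the table is "<MAC> <name>".
gen_links() {
    table=$1
    outdir=$2
    n=10
    while read -r mac name; do
        case $mac in ''|'#'*) continue ;; esac   # skip blanks and comments
        printf '[Match]\nMACAddress=%s\n\n[Link]\nName=%s\n' "$mac" "$name" \
            > "$outdir/$n-$name.link"
        n=$((n + 1))
    done < "$table"
}

# Demo with placeholder MACs in a scratch directory.
workdir=$(mktemp -d)
mkdir "$workdir/network"
cat > "$workdir/ifnames.table" <<'EOF'
# MAC               name
00:11:22:33:44:50   wan
00:11:22:33:44:51   lan
EOF
gen_links "$workdir/ifnames.table" "$workdir/network"
ls "$workdir/network"
```

After a NIC swap, only the table would need new MAC addresses before regenerating the files and rebooting; the scripts that refer to wan, lan and so on stay untouched.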