I have a Ryzen 3 1300X at the moment and it's always had this soft lock freezing bug on Linux. I used to dual-boot Windows on this machine and Windows never had the same problem, so I think it is an issue with the Linux kernel. I've also replaced nearly every bit of hardware that I originally built the PC with, except for the CPU and motherboard, so it's probably an issue the kernel has with my CPU, or possibly with the motherboard firmware.

I've changed the kernel parameters as suggested by the Arch Wiki. The bug happens inconsistently, so only time will tell if this solves the issue. If it doesn't, I'd honestly consider just getting a new CPU that doesn't have this problem. Completely freezing up, with no way to get to a tty and no way to power off except physically holding down the power button, is a pretty major issue even if it only happens sometimes.
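Since the freeze is intermittent, it may be worth confirming after a reboot that the new parameters actually reached the running kernel before waiting weeks to judge them. A minimal Python sketch, assuming a Linux system where `/proc/cmdline` is readable; the function names are made up for illustration, and `example.param` in the usage note is a placeholder rather than a recommended setting.

```python
from pathlib import Path
from typing import Optional


def parse_cmdline(cmdline: str) -> dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None.

    Simplification: quoted values containing spaces are not handled.
    """
    params: dict[str, Optional[str]] = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(cmdline: str, wanted: list[str]) -> list[str]:
    """Return the entries of `wanted` that are absent from the command line."""
    active = parse_cmdline(cmdline)
    return [name for name in wanted if name not in active]


def read_running_cmdline() -> str:
    """Read the command line the running kernel was booted with."""
    return Path("/proc/cmdline").read_text()
```

Calling `missing_params(read_running_cmdline(), ["example.param"])` returns an empty list when every listed parameter is active, and otherwise the names that did not make it into the boot entry.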

So if I do get a new CPU, or maybe just for when I'm next buying a CPU for reasons unrelated to this bug (been considering an upgrade to something that's better for compiling anyway), are there any good options out there? Intel is investing $25 billion into Israel and the BNC has called for "divestment and exclusion" from it (it's not officially on the BDS consumer boycott list, but I'm still very much not comfortable buying from Intel). But the Arch Wiki article seems to suggest this bug is applicable to Ryzen CPUs in general, or at least it never specifies a particular model or range of models. So maybe I'm limited to non-Ryzen AMD CPUs?

I'm guessing this is one of the situations where two companies have a complete duopoly over the market and there isn't an all-round good solution, but thought I'd ask in case anyone had some useful input.

  • PaX [comrade/them, they/them]
    hexbear
    1
    3 months ago

    What motherboard do you have? Also what happens exactly when the lock-ups happen? Have you ever been playing audio when the lock-ups happen and does it loop or stop or keep playing?

    I recently had to "fix" (work around) a similar issue in the OpenBSD kernel with a specific hardware peripheral on my PC (running a 2nd-gen Ryzen), the High Definition Audio controller. For whatever reason (and only when I was running OpenBSD), interrupts from the HDA controller (which let the CPU know to refill audio buffers) would just randomly stop making it to the CPU, and audio would loop for a few seconds and then shut off. I spent a long time trying to figure out what caused it and reading Linux driver code, but I couldn't find a cause or why only OpenBSD would trigger it. I ended up having to write kind of a hacky polling mode into the HDA driver. My only guess is that some motherboards with AMD chipsets have faulty interrupt controllers.
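    For anyone wondering what a polling mode like that boils down to, here is a minimal Python sketch of the idea. It is not the actual OpenBSD driver change: the function name and the model of the audio ring as numbered blocks are assumptions for illustration. Instead of waiting for a "block finished" interrupt, a timer periodically reads the block the hardware is currently playing and refills everything it has finished since the last check.

```python
def blocks_to_refill(last_block: int, hw_block: int, n_blocks: int) -> list[int]:
    """Return the ring-buffer blocks the hardware finished since the last poll.

    last_block: first block that has not been refilled yet
    hw_block:   block the hardware is playing right now (read on each poll)
    n_blocks:   total number of blocks in the ring

    Assumption: polls happen more often than one full trip around the ring,
    otherwise a completed lap is indistinguishable from no progress.
    """
    count = (hw_block - last_block) % n_blocks
    return [(last_block + i) % n_blocks for i in range(count)]
```

    With an 8-block ring, `blocks_to_refill(6, 1, 8)` gives `[6, 7, 0]`, so wrapping past the end of the ring is handled by the modulo arithmetic.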

    Maybe there is a similar issue with your system and timer interrupts aren't making it to your CPU or something. But I'm not really an expert on PC architecture and idek if it even works like that on PCs lol

    Sorry for so many questions but do you also have any kernel logs available from when this happens?
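    On a systemd machine with a persistent journal, `journalctl -k -b -1` prints the kernel messages from the previous boot, which is the boot that froze if you run it right after the forced power-off. A minimal Python sketch for picking likely lockup reports out of that text; the function name and the pattern list are assumptions for illustration, and a hard freeze often leaves nothing behind because the last messages never get written to disk.

```python
import re

# Substrings the Linux kernel prints for lockups and stalls.
# This list is a starting point, not an exhaustive catalogue.
LOCKUP_PATTERN = re.compile(
    r"soft lockup|hard LOCKUP|detected stall|hung_task|blocked for more than",
    re.IGNORECASE,
)


def find_lockup_lines(log_text: str) -> list[str]:
    """Return the log lines that look like lockup or stall reports."""
    return [line for line in log_text.splitlines() if LOCKUP_PATTERN.search(line)]
```

    Saving the log with `journalctl -k -b -1 > prev-boot.log` and passing the file's contents to `find_lockup_lines` gives a short list to paste into a bug report or forum post.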

    • @communism@lemmy.ml
      hexagon
      hexbear
      2
      3 months ago

      This is my mobo

      > Also what happens exactly when the lock-ups happen?

      The screen is frozen and doesn't respond to keyboard or mouse input. I can't switch to a tty or kill the graphical session either (I launch my Wayland compositor from the tty and have a keybind to exit it, so the keybind normally drops me back to the tty. That is, if my computer isn't locked up lol).

      I don't remember if this has ever happened with audio playing, so idk what the audio does when it locks up.

      I think I did post kernel logs to a forum way back in the day when I first got this PC and started having this issue, to no avail. At this point I'd rather just get a new CPU and save the headache and stress, especially since this is a known issue with Ryzens.

      • PaX [comrade/them, they/them]
        hexbear
        2
        edit-2
        3 months ago

        I see. Our motherboards have different chipsets (I have an X570 in mine). It probably has nothing to do with my issue...

        Hoping those kernel parameters fix it. I wish I could help further. PCs are just a bottomless, mostly undocumented rabbit hole :(

        • @communism@lemmy.ml
          hexagon
          hexbear
          4
          3 months ago

          Afraid the kernel params didn't fix it. I've invested in a newer Ryzen CPU, as people are saying that the first-gen ones were particularly buggy, so I'm hoping it's fixed in the newer ones.