As of Linux 6.7, I’m getting hard freezes that require a power cut to reset (SysRq doesn’t work). They happen both at idle and under load, anywhere from 5 minutes to an hour in. Running journalctl --follow and dmesg -w (both as root) reveals nothing at the time of the crash. Kernel 6.6 continues to be 100% stable.
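SysRq can fail for two reasons: the kernel is completely hung, or the key combinations are masked off. You can check which before the next freeze. The kernel.sysrq sysctl (/proc/sys/kernel/sysrq) is a bitmask. 0 disables SysRq, 1 enables every function, and larger values enable only the functions whose bits are set. Below is a minimal sketch that decodes the current value. It assumes Python 3.9+ and uses the bit meanings from the kernel's SysRq admin-guide page. `decode_sysrq` and `SYSRQ_BITS` are hypothetical names, not part of any library:

```python
from pathlib import Path

# Hypothetical lookup table: bit values as listed in the kernel's SysRq docs.
SYSRQ_BITS = {
    0x2: "console log level control",
    0x4: "keyboard control (SAK, unraw)",
    0x8: "debugging dumps of processes",
    0x10: "sync",
    0x20: "remount read-only",
    0x40: "signal processes (term, kill, oom-kill)",
    0x80: "reboot/poweroff",
    0x100: "nice all RT tasks",
}


def decode_sysrq(mask: int) -> list[str]:
    """Return the SysRq functions a kernel.sysrq value allows (hypothetical helper)."""
    if mask == 0:
        return []  # SysRq disabled entirely
    if mask == 1:
        return ["all functions"]  # special value: everything enabled
    # Any other value is a bitmask of individually enabled functions.
    return [name for bit, name in SYSRQ_BITS.items() if mask & bit]


if __name__ == "__main__":
    path = Path("/proc/sys/kernel/sysrq")
    if path.exists():  # only present on Linux
        value = int(path.read_text().strip())
        print(value, decode_sysrq(value))
```

If the reboot bit (0x80) or the process-dump bit (0x8) is missing, run `sysctl kernel.sysrq=1` as root before the next freeze. That gives a fairer test of whether the machine is truly unresponsive.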

System:

  • Distro/Kernel: Arch Linux, linux 6.7.arch3-1
  • CPU: AMD Ryzen 5 2600X
  • GPU: AMD RX580 8GB via AMDGPU
  • RAM: Some configuration of 16GB at 2667 MT/s.
  • WM: SwayWM

I’m unsure how to go about properly reporting a bug if no errors are being generated.

Any advice?

Apparently I’m not alone in this (warning: it’s Reddit).

    • 0x0@social.rocketsfall.net (OP) · 8 months ago

      This will be my last resort, mostly because I’m fairly certain it’s a kernel issue. But yes, I’ve never run an extended memtest on this build and should probably let one run overnight at some point just to make sure.

  • lemmyreader@lemmy.ml · 8 months ago

    In the comments on the web link you shared, at least three people mention that the 6.7 Zen kernel works fine for them. (The link you wrote didn’t work for me, but I looked up the original and am adding it here so others can use their preferred libreddit or teddit frontend.) Care to try that?

    • 0x0@social.rocketsfall.net (OP) · 8 months ago

      Fixed the link. Thanks!

      I’ve also tried linux-tkg, which I believe rolls in the Zen patches. If it doesn’t, I’ll definitely try the Zen kernel.

  • Rockslide0482@discuss.tchncs.de · 8 months ago

    TL;DR: run memtest on your RAM.

    For quite some time, my computer would occasionally just hard crash. When it first started, I tried many of the common tests, including memcheck, but found nothing. For a while it wasn’t very frequent, so I just lived with it. I thought it was an OS thing, but it also happened on a different Linux distro and even on the ancient Windows 10 install I rarely use. I was about to pull the trigger on replacing the mobo, and maybe even the CPU and RAM. First, though, I followed someone’s suggestion to run a memtest. I could have sworn I’d already done that and it came back clean, but it was an easy enough test to run, so why not.

    Sure enough, it found an error. I isolated the faulty DIMM, pulled it out, and haven’t had a crash since. Crazy, since I’m all but certain I had already run both memtest from a Linux live ISO and the Windows memory checking utility.

    In short, test your RAM. Do multiple passes. Maybe even run on single DIMMs for a reasonable amount of time each to see if you can isolate a culprit. Bad RAM was my first thought when the crashes started, because it usually causes stuff like that. When the tests originally came up clean, I assumed it had to be something else. I was wrong.

    • 0x0@social.rocketsfall.net (OP) · 8 months ago

      This is what I’ll try next. After a few more hours of research, I do think memory is the problem. Kernel 6.7 has issues with elevated RAM usage, so it’s absolutely doing something funky with memory that might be exposing an underlying hardware issue. I also realized my stable kernel (6.6.10) was three point releases behind 6.6.13. I’m running 6.6.13 now to see whether the issue was introduced late in the 6.6 release cycle, which would be easier to bisect than 6.7.
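      The "easier to bisect" point is binary-search arithmetic. With N candidate releases (or commits) between a known-good and a known-bad kernel, the worst case is ⌈log₂ N⌉ test boots. Here is a worked example. It assumes 6.6.10 stays good and 6.6.13 turns out to freeze, so the first bad release is one of 6.6.11, 6.6.12 or 6.6.13:

```latex
N = 3 \quad\Rightarrow\quad \lceil \log_2 N \rceil = \lceil \log_2 3 \rceil = \lceil 1.585 \rceil = 2 \ \text{test boots}
```

      A 6.6 → 6.7 bisection instead has to search every commit merged for 6.7, so N is far larger. The boot count still grows only logarithmically with N.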

  • Shadow@lemmy.ca · 8 months ago

    If SysRq doesn’t work, you’re down to bisecting kernel releases to find the patch that introduced the issue. You could also review any new features that are enabled by default in 6.7.
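    Release-level bisection is a plain binary search. Below is a minimal sketch under these assumptions:
    - the releases are listed oldest to newest;
    - the first is known good and the last is known bad;
    - every release after the first bad one is also bad.

    `first_bad` and the `is_bad` callback are hypothetical names. `is_bad` stands in for "install this kernel, boot it, and wait long enough for the freeze to appear." Within a single release range, `git bisect` performs the same search over individual commits.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search an ordered release list for the first bad release.

    Hypothetical helper. Assumes versions[0] is known good, versions[-1] is
    known bad, and that every release after the first bad one is also bad.
    """
    if len(versions) < 2:
        raise ValueError("need at least a known-good and a known-bad release")
    good, bad = 0, len(versions) - 1  # invariant: good index is good, bad index is bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):  # one test boot per iteration
            bad = mid
        else:
            good = mid
    return versions[bad]
```

    The freezes here can take up to an hour to show up, so each "good" verdict needs a long soak. A kernel that survives five minutes proves little.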

    Have you updated your BIOS and all firmware to the latest versions?

    • 0x0@social.rocketsfall.net (OP) · 8 months ago

      I was afraid of that. Since I’m not the only one, maybe someone else is already doing it. If it’s still an issue in a few weeks, I might take it on as a weekend project. As for the motherboard, I believe the latest BIOS (from 2022 or 2023) is already installed.
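      You don't have to rely on memory for the BIOS date. On most x86 machines, Linux exposes the firmware's DMI strings under /sys/class/dmi/id. Below is a minimal sketch that prints them. It assumes those files exist and are world-readable; the serial-number fields are root-only, so they are skipped. `read_dmi` is a hypothetical helper name. Compare the reported version and date with the newest release on the board vendor's support page.

```python
from pathlib import Path
from typing import Optional

# Assumed location of DMI/SMBIOS strings on a typical x86 Linux system.
DMI_DIR = Path("/sys/class/dmi/id")


def read_dmi(field: str, base: Path = DMI_DIR) -> Optional[str]:
    """Return one DMI string, or None if it is missing or unreadable (hypothetical helper)."""
    try:
        return (base / field).read_text().strip()
    except OSError:  # covers missing files and permission errors
        return None


if __name__ == "__main__":
    for field in ("board_vendor", "board_name", "bios_version", "bios_date"):
        print(f"{field}: {read_dmi(field)}")
```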