This has apparently been a problem for a few months now, and can affect Intel and Nvidia graphics too, but AMD is the most susceptible, and GNOME on Wayland seems to trigger the issue the most. A KWin developer explains in a comment on the bug report what’s causing it.
My “Favorite”: Pageflip Timeouts
Judging by how often I come across this issue in bug triage, if you’re reading this, chances aren’t too terrible that you’ve heard of this one already, possibly even seen it yourself in the form of
kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver
kwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/-/issues
kwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'
in your own system logs at some point. To be clear, this is just an example and it does not only affect amdgpu. I’ve seen the same with NVidia and Intel too, but as amdgpu’s GPU resets have been a lot less reliable in the past, it’s been a bigger issue for them.
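To check whether your own logs contain this message, something like the following rough Python sketch could work. It assumes the message wording matches the example quoted above (it may differ between KWin versions), and the function name `find_pageflip_timeouts` is made up for illustration.

```python
import re

# Pattern built from the KWin message quoted above. The exact wording is an
# assumption and may differ between KWin versions.
PAGEFLIP_RE = re.compile(
    r"Pageflip timed out! This is a bug in the (\S+) kernel driver"
)


def find_pageflip_timeouts(log_text):
    """Return the driver name from every pageflip timeout message in log_text."""
    return PAGEFLIP_RE.findall(log_text)


sample = (
    "kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver\n"
    "kwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/-/issues\n"
)
print(find_pageflip_timeouts(sample))  # ['amdgpu']
```

To use it on a real system, save the output of the journalctl command from the log message to a file and pass the file contents in as `log_text`.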
Basically, pageflip timeouts are when the compositor does an atomic commit through KMS, and then waits for that to complete… forever. When this happens, the kernel literally doesn’t allow the compositor to present to the screen anymore, so the screen is completely frozen forever, which is very bad, to state the obvious.
Fixing all the individual causes of the problem hasn’t really worked out so well, and this is a bad enough situation that there should be a way out when it does happen. We discussed how to do this, and I’m happy to report that we figured out a way forward:
- we need a new callback in KMS that tells compositors when a pageflip failed and will never arrive
- drivers need to support resetting the display-driver bits of the GPU to recover it
- if the driver entirely fails to recover in the absolute worst case, it should send a device wedged event, which tells the compositor it should try to reload the entire driver / device
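One way to read the three points above is as a small decision table on the compositor side. The Python sketch below is only an illustration of that reading: the event names and the `compositor_reaction` function are hypothetical, none of them is an existing KMS or KWin API, and the actual design may look quite different.

```python
from enum import Enum, auto


class KmsEvent(Enum):
    """Hypothetical event names, invented for this sketch."""
    PAGEFLIP_DONE = auto()     # the commit completed normally
    PAGEFLIP_FAILED = auto()   # proposed callback: the pageflip will never arrive
    DEVICE_WEDGED = auto()     # the driver could not recover on its own


def compositor_reaction(event):
    """Map an event to what the compositor would do next, per the list above."""
    if event is KmsEvent.PAGEFLIP_DONE:
        return "render next frame"
    if event is KmsEvent.PAGEFLIP_FAILED:
        # Assumption: the driver resets its display bits, then the compositor retries.
        return "wait for display reset, then commit again"
    if event is KmsEvent.DEVICE_WEDGED:
        return "reload driver/device"
    raise ValueError(f"unknown event: {event!r}")


for event in KmsEvent:
    print(event.name, "->", compositor_reaction(event))
```

The point of the sketch is that the compositor always gets some answer: either the flip completes, or it is told the flip failed, or it is told the device is wedged, instead of waiting forever.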
Oh THAT’s what the random freezes were. Got them sporadically on my Arch machine for a couple of years, and never figured out exactly why. (Now I’m using a different laptop that still has AMD graphics, but I haven’t got a freeze yet.)
I’ve been able to avoid it from happening as long as the DE is using X11, seemingly.
I might have been experiencing this issue for the longest time. System fully locks up and is completely unresponsive. Happened on every distro I used.
Last distro I had it on was Artix Linux. Then I tried Alpine and I don’t think I’ve had it happen since.
I used to have an issue with my screen either freezing or going blank and staying this way until after initiating a soft reboot. Disabling Panel Self Refresh (PSR) on my daily driver is what fixed that for me, but this workaround only applies to laptops/tablets and seems to not be the exact same issue reported here.
I have this issue and it was plaguing me for 2 days straight and freezing/restarting my machine. Has gone away for a few days now on its own 🤷♂️
I had complete amdgpu freezes for the longest time. Logs said it was ‘lost from bus’. Turned out it only happened while running LibreOffice. Never found a fix/workaround, so I basically don’t do work in Linux on my AMD machine.
To this day I still wonder why they manage to do a reliable GPU reset on Windows but not on Linux.
With every timeout there’s a 90% chance that it takes the whole system with it on AMD.
Luckily I’m on Plasma and the timeouts have gotten really rare on a 7900 XTX. Most of the time my cause is exceeding the VRAM limit, which eventually causes a freeze, pretty much every time.
How do you exceed the 24 GB VRAM?
ollama, ComfyUI, vLLM, opensplat, and nerfstudio can exceed 24 GB VRAM fairly easily. Memory leaks in games are also sometimes an issue.
If this is the same bug I’ve been encountering, then the fix is to ssh in from another machine and restart gdm3. No restart necessary.
This isn’t a new thing. I believe the issue stems from a specific fix for a security flag that got implemented in 2021, but I can’t find it now.
Basically they’re implementing a flag to stop an overrun attack. This is still a thing that happens on Windows BTW, but the Radeon userspace software is supposed to intervene. There is no counterpart for any other platforms.
Interesting. I feel like 2021 might be the time I first noticed this freezing/crashing on my PC, but not my laptop. I always thought it was the GPU, but after switching to another AMD GPU it still happens.
The freezes happen irregularly, i.e. there’s been times I thought it was fixed for it just to happen again.
I thought maybe this is the issue I have been having, but this seems different. I think mine is a memory usage issue with Firefox, potentially with using YouTube through Firefox specifically.
@ProdigalFrog Interesting, also yes. Now if only I could read the bug report to see the details to see if I can resolve it without creating an account on their gitlab instance :/
Sorry! Messed the link up. Should work now.
> When this happens, the kernel literally doesn’t allow the compositor to present to the screen anymore, so the screen is completely frozen forever
Yup, that sounds awfully familiar! The log reports from that seem quite familiar too (GNOME/Wayland + 6.12.2 kernel over here). The 7900XTX seems less prone to the issue, however, so it doesn’t happen to me _too_ terribly often.
I’ve noticed it happens more often when I’m spawning / closing full screen 3D applications such as games, e.g. when working on mods.
Unfortunately they don’t seem to have a solid plan on how to resolve performing a reliable reset on the driver and the work-arounds don’t seem to reliably work. Guess I’ll just have to be patient and save frequently.
I don’t know if I’m having this exact issue but what stopped my AMD APU laptop with 2 external displays from freezing all the time is setting my main external display to 120Hz instead of the maximum 170. Pretty much never had a freeze after that.