- cross-posted to:
- sysadmin@lemmy.ml
- sysadmin@lemmy.world
All our servers and company laptops went down at pretty much the same time. Laptops have been bootlooping to blue screen of death. It’s all very exciting, personally, as someone not responsible for fixing it.
Apparently caused by a bad CrowdStrike update.
We had a bad CrowdStrike update years ago where their network scanning portion couldn’t handle a load of DNS queries on startup. When we asked how we could switch to manual updates, we were told that wasn’t possible. So we had to black hole the update endpoint via our firewall, which luckily was separate from their telemetry endpoint. When we were ready to update, we’d have FW rules allowing groups to update in batches. They have since changed that, but a lot of companies just hand control over to them. They have both a file system and a network shim, so it can basically intercept **everything**.
Yeah my plans of going to sleep last night were thoroughly dashed as every single windows server across every datacenter I manage between two countries all cried out at the same time lmao
Did you feel a great disturbance in the force?
I always wondered who even used Windows Server given how marginal its market share is. Now I know from the news.
This is a CrowdStrike issue specifically related to the Falcon sensor. It happens to affect only Windows hosts.
It’s only marginal for running custom code. Every large organization has at least a few of them running important out-of-the-box services.
Marginal? You must be joking. A vast amount of servers run on Windows Server. Where I work alone we have several hundred and many companies have a similar setup. Statista put the Windows Server OS market share over 70% in 2019. While I find it hard to believe it would be that high, it does clearly indicate it’s most certainly not a marginal percentage.
I’m not getting an account on Statista, and I agree that its marketshare isn’t “marginal” in practice, but something is up with those figures, since overwhelmingly internet hosted services are on top of Linux. Internal servers may be a bit different, but “servers” I’d expect to count internet servers…
Most servers aren’t Internet-facing.
Almost everyone, because the Windows server market share isn’t marginal at all.
Well, I’ve seen some, but they usually don’t have automatic updates and generally do not have access to the Internet.
My current company does and I hate it so much. Who even got that idea in the first place? Linux always dominated server-side stuff, no?
Not too long ago, a lot of Customer Relationship Management (CRM) software ran on MS SQL Server. Businesses made significant investments in software and training, and some of them don’t have the technical, financial, or logistical resources to adapt - momentum keeps them using Windows Server.
For example, small businesses that are physically located in rural areas can’t use cloud-based services because rural internet is too slow and unreliable. It’s not quite the case that there’s no amount of money you can pay for a good internet connection in rural America, but last time I looked into it, Verizon wanted to charge me $20,000 per mile to run a fiber optic cable from the nearest town to my client’s farm.
How many cups of coffee have you drunk in the last 12 hours?
There was a point where words lost all meaning and I think my heart was one continuous beat for a good hour.
I work in a data center
I lost count
I work in a datacenter, but no Windows. I slept so well.
Though a couple years back some ransomware that also impacted Linux ran through, but I got to sleep well because it only bit people with easily guessed root passwords. It bit a lot of other departments at the company though.
This time even the Windows folks were spared, because CrowdStrike wasn’t the solution they infested themselves with (they use other providers, who I fully expect to screw up the same way one day).
What was Dracula doing in your data centre?
Because he’s Dracula. He’s twelve million years old.
THE WORMS
Surely Dracula doesn’t use windows.
How’s it going, Obi-Wan?
I’m used to IT doing a lot of their work on the weekends so as not to impact operations.
Good ol microsloth
I’m so exhausted… This is madness. As a Linux user I’ve been busy all day telling people with bricked PCs that Linux is better, but there are just so many. It never ends. I think this outage is going to keep me busy all weekend.
AWS No!!!
Oh wait it’s not them for once.
This is a better article. It’s a CrowdStrike issue with an update (security software)
I agree that’s a better article, thanks for sharing
If these affected systems are boot looping, how will they be fixed? Reinstall?
It is possible to edit a folder name in windows drivers. But for IT departments that could be more work than a reimage
It’s just one file to delete.
Having had to fix >100 machines today, I’m not sure how a reimage would be less work. Restoring from backups maybe, but reimage and reconfig is so painful
There is a fix people have found which requires manual booting into safe mode and removal of a file causing the BSODs. No clue if/how they are going to implement a fix remotely when the affected machines can’t even boot.
Do you have any source on this?
It seems like it’s in like half of the news stories.
If you have an account you can view the support thread here: https://supportportal.crowdstrike.com/s/article/Tech-Alert-Windows-crashes-related-to-Falcon-Sensor-2024-07-19
Workaround Steps:
- Boot Windows into Safe Mode or the Windows Recovery Environment
- Navigate to the C:\Windows\System32\drivers\CrowdStrike directory
- Locate the file matching “C-00000291*.sys”, and delete it.
- Boot the host normally.
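For anyone who wants to script the "locate and delete" step, here is a rough Python sketch. It assumes Python is actually available wherever you run it (for example, with the affected disk mounted on another machine), which won't be true in a stock Safe Mode or Recovery Environment. The function name `remove_channel_files` and the `dry_run` flag are made up for illustration; only the directory and file pattern come from the steps above.

```python
from pathlib import Path

# File pattern taken from the workaround steps above.
CHANNEL_FILE_PATTERN = "C-00000291*.sys"


def remove_channel_files(driver_dir, dry_run=True):
    """Find files matching the workaround pattern in driver_dir.

    Hypothetical helper: returns the sorted list of matching paths and
    deletes them only when dry_run is False.
    """
    matches = sorted(Path(driver_dir).glob(CHANNEL_FILE_PATTERN))
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Calling `remove_channel_files(r"C:\Windows\System32\drivers\CrowdStrike")` with the default `dry_run=True` only lists what would be removed; pass `dry_run=False` once the list looks right. Adjust the drive letter if the volume is mounted somewhere else.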
I can confirm it works after applying it to >100 servers :/
Nice work, friend. 🤝 [back pat]
Probably have to go old-skool and actually be at the machine.
And hope you are not using BitLocker cause then you are screwed since BitLocker is tied to CS.
You just need console access. Which if any of the affected servers are VMs, you’ll have.
Yes, VMs will be more manageable.
Exactly, and super fun when all your systems are remote!!!
It’s not super awful as long as everything is virtual. It’s annoying, but not painful like it would be for physical systems.
Really don’t envy physical/desk side support folks today…
An offline server is a secure server!
Honestly my philosophy these days, when it comes to anything proprietary. They just can’t keep their grubby little fingers off of working software.
At least this time it was an accident.
There is nothing unsafer than local networks.
AV/XDR is not optional even in offline networks. If you don’t have visibility on your network, you are totally screwed.
Yep, this is the stupid timeline. Y2K happening due to the nuances of calendar systems might have sounded dumb at the time, but it doesn’t now. Y2K happening because of some unknown contractor’s YOLO Friday update definitely is.
It’s a fair point but I would rather diversify and also use something that is open / less opaque
ReactOS for the win!
Bahaha 😂😂 continue using proprietary software, that’s all you are going to get in addition to privacy issues… Switch to Linux.
A bunch of shitty sysadmins/cybersec people just learned why you don’t blindly deploy new updates to production without testing them first.
I’ve used CrowdStrike before. It has support for deploying version N to a pilot group, N-1 to the test environment, and N-2 to production.
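To make the N / N-1 / N-2 idea concrete, here is a minimal sketch of how such a staggered policy resolves to versions. The ring names, the `RING_OFFSETS` table, and `version_for_ring` are all hypothetical and have nothing to do with CrowdStrike's real policy format; the release labels in the usage note are placeholders too.

```python
# Hypothetical mapping: how many releases behind the newest each ring runs.
RING_OFFSETS = {"pilot": 0, "test": 1, "production": 2}


def version_for_ring(releases, ring):
    """Pick the release a ring should run.

    releases is ordered oldest to newest; the result is the release
    that sits RING_OFFSETS[ring] steps behind the newest one.
    """
    offset = RING_OFFSETS[ring]
    if offset >= len(releases):
        raise ValueError(f"not enough releases for ring {ring!r}")
    return releases[-1 - offset]
```

With `releases = ["v1", "v2", "v3"]`, the pilot group gets `v3`, test gets `v2`, and production stays on `v1` until a newer release pushes everything forward by one.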
So that’s why my work laptop is down for the count today. I’m even getting that same error as the thumbnail picture
Interesting how ARPANET (the early internet) was built to withstand these issues, but companies like Microsoft and Amazon (and a lack of regulation) have completely reversed its original intent. I actually didn’t even notice this since I use Lemmy and have my own internal network running Home Assistant, Synology, Emby, etc.