- cross-posted to:
- sysadmin@lemmy.ml
- sysadmin@lemmy.world
All our servers and company laptops went down at pretty much the same time. Laptops have been bootlooping to the blue screen of death. It’s all very exciting, personally, as someone not responsible for fixing it.
Apparently caused by a bad CrowdStrike update.
More than that: it’s an IT security and infrastructure admin issue. How was this third-party software update allowed to go out to so many systems and break them all at once, with no one testing it?
Bingo. I work for a small software company, so I expect shit like this to go out to production every so often and cause trouble for our couple tens of thousands of clients… But I can’t fathom how any company with worldwide reach can let it happen…
That’s because CrowdStrike likely has significantly worse leadership than your company.
They have a massive business development budget though.
From what I understand, CrowdStrike doesn’t have built-in functionality for that.
One admin was saying that they had to figure out which IPs were the update server vs the rest of the functionality servers, block the update server at the company firewall, and then set up special rules to let the traffic through to batches of their machines.
So… yeah. Lot of work, especially if you’re somewhere where the sysadmin and firewall duties are split across teams. Or if you’re somewhere that is understaffed and overworked. Spend time putting out fires, or jerry-rigging a custom way to do staggered updates on a piece of software that runs largely as a black box?
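For anyone curious, that workaround might look roughly like this. This is a hypothetical sketch only: the IPs and subnet are placeholders, and it assumes a Linux box with iptables sitting at the network edge. The real CrowdStrike update endpoints would have to be identified from your own traffic captures, as the admin described.

```shell
# Placeholder addresses -- NOT real CrowdStrike infrastructure.
UPDATE_SERVER="203.0.113.10"   # discovered update-server IP (hypothetical)
PILOT_BATCH="10.0.5.0/24"      # subnet of canary machines (hypothetical)

# 1. Block all forwarded traffic to the update server by default,
#    so no machine pulls the update.
iptables -A FORWARD -d "$UPDATE_SERVER" -j DROP

# 2. Insert an ACCEPT above the DROP so only the pilot batch
#    can reach the update server and receive the update first.
iptables -I FORWARD -s "$PILOT_BATCH" -d "$UPDATE_SERVER" -j ACCEPT

# 3. If the pilot machines stay healthy, repeat step 2 for the next
#    batch, and finally delete the DROP rule to open it up to everyone:
# iptables -D FORWARD -d "$UPDATE_SERVER" -j DROP
```

The ordering matters: `-I` inserts at the top of the chain while `-A` appends at the bottom, and iptables matches rules top-down, so the batch ACCEPTs win over the blanket DROP. It’s exactly the kind of jerry-rigged staged rollout the vendor should be providing natively.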
Edit: re-read your comment. My bad, I think you meant it was a failure of that on CrowdStrike’s end. Yeah, absolutely.