This isn’t a gloat post. In fact, I was completely oblivious to this massive outage until I tried to check my bank balance and it wouldn’t log in.
Apparently Visa Paywave, banks, some TV networks, EFTPOS, etc. have gone down. Flights have had to be cancelled as some airlines systems have also gone down. Gas stations and public transport systems inoperable. As well as numerous Windows systems and Microsoft services affected. (At least according to one of my local MSMs.)
Seems insane to me that one company’s messed up update could cause so much global disruption and so many systems gone down :/ This is exactly why centralisation of services and large corporations gobbling up smaller companies and becoming behemoth services is so dangerous.
Can’t you automate it?
Sadly not. Windows doesn’t boot. You can boot it into safe mode with networking, at which point maybe with anaible we could login to delete the file but since it’s still manual work to get windows into safe mode there’s not much point
It is theoretically automatable, but on bare metal it requires having hardware that’s not normally just sitting in every data centre, so it would still require someone to go and plug something into each machine.
On VMs it’s more feasible, but on those VMs most people are probably just mounting the disk images and deleting the bad file to begin with.
I guess it depends on numbers too. We had 200 to work on. If you’re talking hundreds more than looking at automation would be a better solution. In our scenario it was just easier to throw engineers at it. I honestly thought at first this was my weekend gone but we got through them easily in the end.
The real problem with VM setups is that the host system might have crashed too
Since it has to happen in windows safe mode it seems to be very hard to automate the process. I haven’t seen a solution yet.