The CrowdStrike Windows outage that hit the world this week stems back to an EU-Microsoft deal from 2009 that meant Microsoft had to give antivirus vendors the same Windows API access it had.
The thing is, Microsoft’s virus-scanning API shouldn’t be able to BSOD anything, no matter what third-party software makes calls to it, or the nature of those calls. They should have implemented some kind of error handler for when the calls are malformed.
So this is really a case of both Crowdstrike and Microsoft fucking up. Crowdstrike shoulders most of the blame, of course, but Microsoft really needs to harden their API to appropriately catch errors, or this will happen again.
I don’t believe there was any specific API in use here, for virus scanning or not. I suppose maybe the device driver API? I am not a kernel developer so I don’t know if that’s the right term for it.
Crowdstrike’s driver was loaded at boot and caused a null pointer dereference error, inside the kernel. In userspace, when this happens, the kernel is there to catch it so only the application that caused it crashes. In kernelspace, you get a BSOD because there’s really nothing else to do.
I stand corrected. For some reason, I was thinking they used the actual Windows Defender API, which can be called programmatically from third-party applications, but you’re correct, it was a driver loaded at boot. Microsoft isn’t at all at fault, here.
Nope. It’s a lower level kernel API that has to be accessed at boot via a driver. The API I was thinking of - and I use the term “thinking” loosely, here - is an API that userspace applications can take advantage of to scan files after boot is already complete.
The thing is, Microsoft’s virus-scanning API shouldn’t be able to BSOD anything, no matter what third-party software makes calls to it, or the nature of those calls. They should have implemented some kind of error handler for when the calls are malformed.
So this is really a case of both Crowdstrike and Microsoft fucking up. Crowdstrike shoulders most of the blame, of course, but Microsoft really needs to harden their API to appropriately catch errors, or this will happen again.
I don’t believe there was any specific API in use here, for virus scanning or not. I suppose maybe the device driver API? I am not a kernel developer so I don’t know if that’s the right term for it.
Crowdstrike’s driver was loaded at boot and caused a null pointer dereference error, inside the kernel. In userspace, when this happens, the kernel is there to catch it so only the application that caused it crashes. In kernelspace, you get a BSOD because there’s really nothing else to do.
https://youtube.com/watch?v=wAzEJxOo1ts
I stand corrected. For some reason, I was thinking they used the actual Windows Defender API, which can be called programmatically from third-party applications, but you’re correct, it was a driver loaded at boot. Microsoft isn’t at all at fault, here.
Isn’t that API what the article is talking about?
Nope. It’s a lower level kernel API that has to be accessed at boot via a driver. The API I was thinking of - and I use the term “thinking” loosely, here - is an API that userspace applications can take advantage of to scan files after boot is already complete.