The CrowdStrike Windows outage that hit the world this week traces back to a 2009 EU-Microsoft deal that required Microsoft to give antivirus vendors the same Windows API access it had.
Oh FFS. I love this era where companies will not accept the blame due to “liability”, even when they are explicitly to blame.
Fuck Microsoft and fuck Windows.
But if you inject hacky bullshit third party code into someone’s OS that breaks things, it’s not the OS’s fault.
But in this case Microsoft certified the driver. If they knew the driver included an interpreter that can run arbitrary code, they shouldn’t have certified it, because they cannot fully test it. If they didn’t know, then their certification tests are inadequate. Most of the blame lies with the security software. If Microsoft hadn’t certified it, they would have had zero fault.
Certifying a driver is not an endorsement.
It is a verification that it is legitimately from who it claims to be from. Microsoft has zero fault, period.
I had a read about WHQL (which I assume is what “certified” means). It uses the Windows HLK to run a series of tests, the results are submitted to Microsoft, and only then will the driver be signed.
While certification isn’t endorsement, the testing and the resulting certification imply basic compatibility and reliability. And causing bootloops and BSODs is anything but close to “basic compatibility and reliability.”
Crowdstrike bypassed WHQL because the update was not to the driver, it was to a configuration file that then gets ingested by the driver. It’s deliberate so they can push out updates for developing threats without being slowed down by the WHQL process.
And that means when they decide to just send it on a Friday with a buggy config file, nobody is responsible but Crowdstrike.
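To make the "config file ingested by the driver" point concrete, here is a rough sketch of the kind of sanity check a consumer of such a file could run before acting on it. Everything here is made up for illustration: the file layout (a 32-bit record count followed by fixed-size records), `parse_rules`, and `ContentError` are hypothetical and not CrowdStrike's actual format or code.

```python
import struct


class ContentError(ValueError):
    """Raised when a content file fails validation (hypothetical error type)."""


def parse_rules(blob, expected_fields):
    """Parse a made-up format: little-endian u32 record count, then that many
    records of `expected_fields` u32 values each. Reject anything else."""
    if len(blob) < 4:
        raise ContentError("truncated header")
    (count,) = struct.unpack_from("<I", blob, 0)
    record_size = 4 * expected_fields
    # Refuse the file unless the payload is exactly as long as the header claims.
    if len(blob) - 4 != count * record_size:
        raise ContentError("payload size does not match record count")
    fmt = "<%dI" % expected_fields
    return [struct.unpack_from(fmt, blob, 4 + i * record_size) for i in range(count)]


good = struct.pack("<I6I", 2, 1, 2, 3, 4, 5, 6)
print(parse_rules(good, 3))

try:
    parse_rules(bytes(8), 3)  # header says 0 records but 4 stray bytes follow
except ContentError as err:
    print("rejected:", err)
```

The point is only that a malformed file gets rejected with an error instead of being trusted; whether a check like this would have caught the actual bad update is not something the thread establishes.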
Oh wow. Then CS is definitely at fault. What a brilliant idea they had.
For the Nth time, crowdstrike circumvented the testing process
Edit: this is not to say that cs didn’t have to in order to provide their services, nor is this to say that ms didn’t know about the circumvention and/or delegate testing of config files to CS. I’ll take any opportunity to rag on MS, but in this case it is entirely on CS.
We all hate Microsoft for turning Windows into an ad platform but they aren’t wrong.
They are legally required to give Crowdstrike or anyone complete low level access to the OS. They are legally required to let Crowdstrike crash your computer. Because anything else means Microsoft is in control and not the software you installed.
It’s no different than Linux in that way. If you install a buggy device driver on Linux, that’s your/the driver’s fault, not Linux.
The thing is, Microsoft’s virus-scanning API shouldn’t be able to BSOD anything, no matter what third-party software makes calls to it, or the nature of those calls. They should have implemented some kind of error handler for when the calls are malformed.
So this is really a case of both Crowdstrike and Microsoft fucking up. Crowdstrike shoulders most of the blame, of course, but Microsoft really needs to harden their API to appropriately catch errors, or this will happen again.
Isn’t that API what the article is talking about?
Nope. It’s a lower level kernel API that has to be accessed at boot via a driver. The API I was thinking of - and I use the term “thinking” loosely, here - is an API that userspace applications can take advantage of to scan files after boot is already complete.
I don’t believe there was any specific API in use here, for virus scanning or not. I suppose maybe the device driver API? I am not a kernel developer so I don’t know if that’s the right term for it.
Crowdstrike’s driver was loaded at boot and caused a null pointer dereference error, inside the kernel. In userspace, when this happens, the kernel is there to catch it so only the application that caused it crashes. In kernelspace, you get a BSOD because there’s really nothing else to do.
https://youtube.com/watch?v=wAzEJxOo1ts
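The userspace half of that is easy to see for yourself. A minimal sketch, assuming a normal desktop OS with Python installed: the child process reads from address 0 via `ctypes`, which is a null pointer dereference, and the parent carries on regardless. The helper name `run_crashing_child` is just for this example.

```python
import subprocess
import sys

# Child program: read memory at address 0, i.e. dereference a null pointer.
CHILD = "import ctypes; ctypes.string_at(0)"


def run_crashing_child():
    """Run the faulty code in its own process and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


status = run_crashing_child()
# How the failure is reported depends on the platform, but it is never success.
print("child exit status:", status)
print("parent is still running")
```

Only the child dies because the kernel sits underneath it and cleans up. Kernel code has nothing underneath it to do the same, which is why the same bug there ends in a BSOD.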
I stand corrected. For some reason, I was thinking they used the actual Windows Defender API, which can be called programmatically from third-party applications, but you’re correct, it was a driver loaded at boot. Microsoft isn’t at all at fault, here.
But what if Windows had something similar to eBPF in Linux and CS had opted to use it? Would this disaster not have happened at all, or only on a much smaller and less impactful scale?
Crowdstrike managed to fuck up Linux through eBPF just as well.
https://access.redhat.com/solutions/7068083
If you load hacky shit into the kernel it can always find a way to make a nasty surprise. eBPF is a little bit better fence, not some miracle that automatically fixes shitty code.
But these eBPF loader bugs are fixed now. Windows drivers are still causing BSODs and will continue to do so until Microsoft adopts eBPF.
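For anyone wondering what the "fence" amounts to: eBPF programs are checked by a verifier before the kernel runs them, and programs that could read out of bounds or loop forever are rejected. Below is a toy sketch of that idea only. The three-instruction language, `BUF_SIZE`, and `verify` are invented for illustration and have nothing to do with the real verifier's implementation.

```python
BUF_SIZE = 8  # size of the only memory region a toy program may read


def verify(program):
    """Statically check a toy program, given as a list of (op, arg) pairs,
    before it is ever run. Returns (ok, reason)."""
    for pc, (op, arg) in enumerate(program):
        if op not in ("load", "jump", "exit"):
            return False, f"instruction {pc}: unknown op {op!r}"
        # A load must stay inside the buffer.
        if op == "load" and not 0 <= arg < BUF_SIZE:
            return False, f"instruction {pc}: load offset {arg} out of bounds"
        # Only forward jumps are allowed, so the program cannot loop forever.
        if op == "jump" and not pc < arg <= len(program):
            return False, f"instruction {pc}: only forward jumps are allowed"
    return True, "ok"


print(verify([("load", 0), ("exit", 0)]))
print(verify([("load", 99), ("exit", 0)]))
print(verify([("jump", 0)]))
```

As the Red Hat link above shows, the loader and verifier are themselves code and can have bugs, so this narrows what a bad program can do rather than making crashes impossible.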
I actually agree, I own my computer / OS and I should be able to do what you’re saying (install and break things). But Microsoft is a trillion-dollar multinational corporation and I am certainly going to give them grief about this, because I owe them less than nothing, let alone any goodwill.
You are going to give grief to Microsoft for allowing what you want?
???
That doesn’t make any sense. How does arguing against your position do anything but harm it?
Maybe just give them grief over the myriad negative things they do that don’t counter your position?
You are not wrong, but people don’t want to hear it. Do we want to retain control over what goes into kernel space or not? If so, we have to accept that whatever we stuff in there can crash the entire thing. That’s why we have stuff like driver signatures. Which Crowdstrike apparently bypassed with a technical loophole from how I understand it.
Sorry, how is that related to the stability of the kernel?
I explained in my second sentence.
“They are legally required to give Crowdstrike or anyone low level access to the OS.”
If you install a buggy driver into Linux and it crashes, that’s not a problem with the Linux kernel.
https://www.redhat.com/sysadmin/linux-kernel-panic
I fully agree with you on that front, but ads have nothing to do with kernel access, so how is that relevant to their legal requirements?
I was explaining why everyone hates on Microsoft but the Crowdstrike crash had nothing to do with the reasons people hate MS.
Gotcha.
Yeah I saw the article that says they’re legally required but until I can actually read that document where it says “thou shall give everyone ring-0” access I’m gonna call it bullshit.
If it’s not ring 0, it’s not full access. They are legally required to give full access.
I’ll believe it when I read it.
It might not be written literally like that, but if Microsoft didn’t let third-party developers write kernel drivers for Windows, it would very quickly be considered abuse of its market position. The problem isn’t that they allow kernel drivers (that part is just MS throwing everything it can at this); it’s that they certified this very driver as tested and stable. Without this certification, most IT teams would’ve been more reluctant to install CrowdStrike’s rootkit on their systems.
I call Bullshit.
If it had been Windows NT 3.5, there would have been no bluescreens around the world. It would have stopped the buggy software, shown a message accordingly, and continued its job. That Windows was not stupid enough to crash itself just because of a null pointer in another piece of software.
Now you tell me that Windows NT 3.5 is illegal?
I ran 3.5. Yes, a network driver crash would blue screen NT3.5. Graphics were in user space in 3.5 so a video driver couldn’t take NT 3.5 down but networking was in the kernel. https://en.m.wikipedia.org/wiki/Hybrid_kernel
OK, and… Were they legally required to make it crash?
A better comparison would be an iPhone. Apple has locked that down so much that it’s impossible to install something like CrowdStrike falcon, thus it’s not possible for something like this to happen.
Microsoft is saying if the EU would let them, they too could lock down their platform enough to prevent this from happening.
However, I would prefer to maintain control over my device and do what I want with it, instead of just what Apple/Microsoft want; even if that means I might break my device.
They were legally required to permit third parties to install a kernel-mode driver.
Not then, but European antitrust lawsuits resulted in laws that require Microsoft to allow 3rd parties complete access. That means if the 3rd-party software is a low-level driver and it is buggy, it can crash the system. They are legally required to allow vendors the level of access that can crash the system.
You could absolutely install software on Windows NT 3.5 that would crash the system.
Can confirm. I’ve crashed most Microsoft products from MS-DOS 5 onward.
@OfCourseNot @neme @apfelwoiSchoppen @Blue_Morpho @NeoNachtwaechter @MinFapper
My dislike for MS products goes back to 1979 and the BASIC interpreter on the Commodore Pet.
At the time I thought that’s just how computers are, but within just a few years I realised that non-Microsoft BASIC always seemed to be better…