Getting Past Crowdstruck Requires Access (Image source: YouTube)

Getting Past Crowdstruck Requires Access

Last Friday (July 19), cybersecurity firm CrowdStrike pushed an update to its threat sensors. Ultimately, that left over 8 million Windows PCs unable to boot, stuck on a BSOD for invalid references in a kernel-mode driver. Behind the scenes, all kinds of organizations, from hospitals to government agencies to airlines and more, found themselves unable to use updated machines after a post-update reboot. What really caused the heartburn? Getting past Crowdstruck requires access to affected machines on a one-at-a-time basis.

If you look at the BSOD screencap at the head of this blog post, you’ll see a driver named csagent.sys. This is the CrowdStrike Agent driver, which runs in kernel mode by design. That makes it hard for hackers to access or tamper with. But anything that runs as a kernel-mode driver must be rigorously and thoroughly tested and vetted, or it can crash any PC on which it runs. Errors, in short, cannot be tolerated. Oops!

Why Getting Past Crowdstruck Requires Access

Part of the CrowdStrike software runs as a Windows kernel-mode driver. That means it has the same level of access as privileged parts of the OS itself. If any of this code throws an error it can’t handle (as CrowdStrike has publicly admitted its update did), Windows crashes itself. That’s by design, out of an abundance of caution to avoid loss of data or other damage to affected systems.

Here’s where things get interesting. Windows can’t boot and run until the offending driver is removed. In turn, the affected PCs must boot into safe mode or a recovery image. Either can operate on the damaged Windows image, remove the bad driver, and stand Windows back up again. This is easy when admins or IT pros have physical access to affected PCs. Indeed, Copilot recommends using the “three strikes” method to get into Windows recovery. (Three consecutive boot failures automatically trigger Windows alternate boot.) Then, using WinRE (or Windows itself in safe mode, from the Advanced Boot Options), repairs can go forward.
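To make the “remove the bad driver” step concrete, here’s a minimal Python sketch, not an official tool. It assumes the damaged Windows volume is mounted somewhere a Python interpreter can reach it (stock WinRE has none, so think of a rescue environment or another PC with the disk attached). The folder System32\drivers\CrowdStrike and the pattern C-00000291*.sys are assumptions taken from the publicly circulated workaround, not from anything above beyond the “291” name. The function names are made up for illustration.

```python
from pathlib import Path

# Assumption: pattern and folder come from the publicly circulated workaround.
DEFAULT_PATTERN = "C-00000291*.sys"


def find_bad_channel_files(windows_root, pattern=DEFAULT_PATTERN):
    """List matching files under <windows_root>/System32/drivers/CrowdStrike."""
    target_dir = Path(windows_root) / "System32" / "drivers" / "CrowdStrike"
    if not target_dir.is_dir():
        # Nothing to do if the sensor folder is absent on this volume.
        return []
    return sorted(p for p in target_dir.glob(pattern) if p.is_file())


def remove_bad_channel_files(windows_root, pattern=DEFAULT_PATTERN, dry_run=True):
    """Delete matching files unless dry_run is True; return what was matched."""
    matches = find_bad_channel_files(windows_root, pattern)
    for path in matches:
        if not dry_run:
            path.unlink()
    return matches
```

Calling remove_bad_channel_files("D:/Windows") only reports what it would delete, because dry_run defaults to True. Pass dry_run=False once the list looks right. The drive letter is just an example; an offline volume often mounts under a different letter than C:.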

The problem is that many, if not virtually all, of the affected machines stayed down, stuck in a “boot loop.” They remained that way because their operators DIDN’T have physical access to those PCs. I’ll bet that most of them had to be teleoperated through a KVM device that can work around PC problems that extend all the way down to the hardware level (outside the scope of normal remote access and RDP). This kind of thing doesn’t scale well, either, so it takes time to work through hundreds to thousands of remote PCs. Think of the PC behind the counter at AA or Delta, where the gate or ticket agent is completely clueless about boot-level Windows repairs.

An “Interesting” Problem, Indeed!

Far too many cybersecurity and IT pros found themselves in the grip of the old Chinese curse (“May you live in interesting times”) after the *291* file for CrowdStrike (a channel file, often loosely called a driver because of its .sys extension) was loaded on Friday. Organizations that prepare and drill for these kinds of outages were doubtless at an advantage in already knowing how to broker and run boot repairs remotely. I can only imagine the hair-pulling that went on at other outfits less well-equipped to handle this outage.

Here’s a moral to ponder for those who run remote Windows PCs where physical access is impossible, difficult or impractical: Can your remote management infrastructure and automation work with a Windows PC that’s not booting, and won’t boot until it’s restarted in some special way? If your answer is “yes,” you’re probably over the Crowdstruck hump already. If your answer is “no,” you’ll probably make that a top priority as soon as you can kick-start and repair all remaining affected Windows nodes. In the meantime, my deepest sympathies…
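One way to turn that question into something checkable is to triage your inventory by how each PC can be reached when Windows won’t boot. The sketch below is a rough illustration only. The Machine fields (out_of_band, hands_on_site) and the three bucket names are hypothetical, not drawn from any real management product.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    out_of_band: bool    # hypothetical: KVM-over-IP or similar works before boot
    hands_on_site: bool  # hypothetical: someone nearby can follow repair steps


def triage(machines):
    """Group machine names by the cheapest repair path available to each."""
    plan = {"remote_repair": [], "guided_on_site": [], "dispatch_needed": []}
    for m in machines:
        if m.out_of_band:
            plan["remote_repair"].append(m.name)
        elif m.hands_on_site:
            plan["guided_on_site"].append(m.name)
        else:
            plan["dispatch_needed"].append(m.name)
    return plan
```

If the dispatch_needed bucket is large, your answer to the question above is effectively “no,” and that’s where the preparation work belongs.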


3 thoughts on “Getting Past Crowdstruck Requires Access”

  1. “Can your remote management infrastructure and automation work with a Windows PC that’s not booting, and won’t boot until it’s restarted in some special way? If your answer is “yes,” you’re probably over the Crowdstruck hump already. If your answer is “no,” you’ll probably make that a top priority as soon as you can kick-start and repair all remaining affected Windows nodes. ”

    My guess is that the “no” side won’t change.

    It’s a matter of risk, versus the cost to mitigate that risk…

    Same with Microsoft and kernel-mode drivers…do they shut that down for 3rd party apps and risk massive fines from the EU, or hope that 3rd party providers will do the right thing and test before deploying?

  2. Bill Toulas posted the article “CrowdStrike: ‘Content Validator’ bug let faulty update pass checks” earlier today on Bleeping Computer.

    Kind of shocking what CrowdStrike said they were adding to their QA process…which means steps they weren’t doing before.

    1. As in your earlier comment, you make some important and unsettling points. That said, I’m not sure what SOHO and small business users do really qualifies as “risk management” in any kind of formal sense. It is, has been, and will most likely remain a “fire drill” environment. Sigh. =Ed=
