CrowdStrike

It's a complex area.
Yes, and, frankly, things are still not clear enough and will probably never become clearer.
It is not even clear (at least to me) whether any Microsoft subsidiaries are among CrowdStrike's shareholders.
Microsoft puts all the blame on having to share those APIs, but I can't believe it.
And it's not known for sure whether Microsoft actually shared those APIs with third parties.
As far as I can tell, Microsoft might even file a lawsuit that could drag on for maybe 30 years, who knows?
However, given the huge predominance of Windows PCs in the world, it is at least clear that when something like this happens to Microsoft, it makes much more media "noise".
 
the onus is on the vendor to make sure said code is robust enough
Not arguing. On one hand there's the drive to get something that's efficient and lean on resources; on the other, you want it robust. The two aren't, generally, complementary. In a kernel, performance is crucial.

Some years back I worked on the development of a programming API that, in effect, let users write C++ code to run in our servers. It wasn't easy to maintain the performance we needed and still prevent bad user code from killing our server. OK, it was impossible: we eventually had to settle for simply being able to report something before the process died. But at least we had the benefit of designing it from the ground up, rather than inserting it into something never intended to be exposed to the outside world.

What I found was that it exposed differences in attitude between developers and management/sales/marketing. The latter three were all pretty much technically ignorant and treated everything as a negotiation. There were no absolute facts...

Me: "actually, that's just not possible"
Them: "well, OK so it might take a little longer"
Me: "No, it's not technically possible"
Them: "So if we added a few people to the dev group?"
Me: "No, really, it's not technically possible"
Them: "What would it take to rewrite that part of the code to achieve this"
Me: "It won't make any difference, it's just not possible"

I'm pretty sure many here will recognise that sort of dialog from the electronics/engineering field.
 
Antivirus software is a kludge. It works by detecting signatures and activity patterns of known viruses. Antivirus software cannot detect unknown viruses and is susceptible to false positives. It can never be completely reliable.
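To make the "known viruses only" point concrete, here is a minimal sketch of the crudest form of signature matching: an exact hash lookup. The signature set and sample bytes are made up, and real products layer heuristic and behavioural rules on top, which this does not model.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of samples already known to be bad.
KNOWN_BAD = {hashlib.sha256(b"evil payload v1").hexdigest()}

def is_flagged(sample: bytes) -> bool:
    """Flag a sample only if its exact digest is already in the database."""
    return hashlib.sha256(sample).hexdigest() in KNOWN_BAD

print(is_flagged(b"evil payload v1"))  # True: exact match with a known signature
print(is_flagged(b"evil payload v2"))  # False: a one-byte variant is not recognised
```

Changing a single byte is enough to get past an exact-match signature, which is why unknown or modified samples are the weak spot.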

I have never used antivirus software on my personal machines. They don't need it.

I have written a lot of software, both professionally and as a hobby. The thing that programmers tend to overlook is validation of the program's input. Programs are written and tested with legal inputs, but illegal inputs don't receive as much attention. This is a source of security holes. I am very careful about validating the input.
Ed
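A small example of treating illegal input as a first-class case: the sketch below parses a port number and rejects anything malformed or out of range with an explicit error, instead of letting it flow further into the program. The function name and the port example are illustrative assumptions, not taken from the post.

```python
def parse_port(text: str) -> int:
    """Parse a TCP port number, rejecting anything that is not 1..65535.

    Illegal input gets an explicit error path instead of flowing on
    into the rest of the program.
    """
    text = text.strip()
    # isascii() guards against non-ASCII digit characters that isdigit() accepts.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"port must be a decimal number, got {text!r}")
    port = int(text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port

for raw in ["8080", "", "-1", "0", "65536", "80a"]:
    try:
        print(raw, "->", parse_port(raw))
    except ValueError as err:
        print(raw, "-> rejected:", err)
```

Testing with the illegal inputs, not just the legal one, is the point: every value after "8080" in that list should be refused.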
 
The thing that programmers tend to overlook is validation of the program's input
Only the badduns.

Many, many years ago a colleague of mine was asked to write an input validation program for local library data that had been collected at remote sites, written onto forms, and typed in the "punch room" for input. The people making the request handed over a loooong list of validation rules that took weeks to code and test. Eventually all the live data was available for a test, and 15k input records were put through the program. None made it through validation, and the error report was a marvel of fanfold paper consumption; the machine room (yes, THAT long ago) phoned the developer to ask when they could have the printer back...

The rules were whittled down, the code changed/tested multiple times until there was little except a few missing field checks.

Note: the libraries elected to remove the validation rather than submit data that met their own criteria...
 
Isn't part of the difficulty under Windows that the display engine is in the kernel? Again, history plays a role. *nix went with X, which is just a user process. So even if you crash the window system (or just the window manager, another user process), the machine stays up. Windows went with integrating the window system into the kernel, so a crash of the display engine takes out the box and forces a reboot. Another side effect of this is a much larger exposure of system APIs that can take out the machine. Another benefit of the *nix approach has been that I can choose which window manager I want to run and vary the look and feel of the window system without asking the global HQ for a change. It also means I can run a much slimmer manager on something like a Pi, which doesn't have as much horsepower for eye candy. Heck, you can even switch window managers by killing the current one and starting a new one. They are just user processes. X was so ahead of its time in the '80s. Kudos to MIT.
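The crash-containment argument can be shown at small scale with ordinary processes. This Python sketch is only an analogy for the X server/window manager split (it is not X code): a deliberately aborting child process is contained by the OS, and the parent survives to report it or restart the component.

```python
import subprocess
import sys

# Stand-in for a misbehaving user-space component: the child deliberately aborts.
CRASHING_CHILD = "import os; os.abort()"

def run_isolated(code: str) -> int:
    """Run Python code in a separate process and return its exit status."""
    result = subprocess.run([sys.executable, "-c", code])
    return result.returncode

status = run_isolated(CRASHING_CHILD)
# The parent is still alive and could, for example, restart the component.
print(f"child exited with status {status}; parent still running")
```

On POSIX systems the status is negative (minus the signal number); on Windows it is a nonzero exit code. Either way the parent keeps running.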
 
Still worth as much as in January.


[Screenshot: Google search for "crowdstrike shares", 2024-07-23]
 
Isn't part of the difficulty under Windows that the display engine is in the kernel?
The original Dave Cutler design of Windows NT was a clean-room design and, I believe, actually owed a lot to VMS internally. This Microsoft diagram explains the difference between user-mode drivers and kernel-mode drivers. Display drivers USED to be kernel mode (BSODs were oh so common) and were moved to user mode.
 
mikeAtx - That was true when the framebuffer was dumb. Now, all computers have accelerated graphics. The GPU being programmable opens up a large attack surface.
Ed
Aware of this, but I still believe that it is more resilient, as X itself is still just a process. My primary reason for the belief is that I actually write X client code and have not had an X crash (or a system crash) in almost 20 years, even when I make mistakes in the protocol. Early Linux was not great and I would see an occasional crash from X. I've got machines that have run for years without crashes. For years, the only source of "crashes" for me has been power failures. Even the laptop I'm writing this from has been up almost a year since the last reboot. I don't recall why I rebooted it, but it was definitely not a crash. I just don't hear of Windows boxes achieving the same level of reliability.
 
The thing that programmers tend to overlook is validation of the program's input. Programs are written and tested with legal inputs, but illegal inputs don't receive as much attention. This is a source of security holes.
Ed

That's the point of using static analysis and requirements based analysis.

Those corner cases will always kill you.

Some serious companies, like Cisco, require stringent testing before they grant "interoperability" status. But they don't allow just anybody's code into their hardware.

IMHO, the onus is always on the software vendor to verify that they have tested their product against an "acceptance test" issued by the purchaser. In the case of "important" software such as CrowdStrike's, Microsoft should work with such vendors to ensure proper interoperability, and add a simple piece of defensive code that looks for a key when the software is installed. If the software you're trying to install uses deep parts of the OS, then it MUST have that key or the OS will not allow it to be installed.

Think.... does Microsoft test its own software? Don't joke... be serious, THEY DO.

BTW, restricting access to certain "deep" APIs with a simple key checked before installation minimizes the amount of defensive code required (speed, simplicity) and still allows other software that doesn't use "deep" APIs to be installed. If your browser, for example, has a bug, it won't take the whole computer down.
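One way to picture the "key before installing" idea is sketched below. It assumes a hypothetical certification step where the OS vendor issues a key (here an HMAC over the package bytes) and the installer checks it only for packages that touch "deep" APIs. A real scheme would use asymmetric code signing rather than a shared secret; for what it's worth, 64-bit Windows already requires kernel-mode drivers to be signed, and a key on the package does not by itself cover data files the driver loads later.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical secret held by the OS vendor's certification service.
# A real scheme would use asymmetric code signing, not a shared key.
VENDOR_KEY = b"example-vendor-key"

def issue_key(package: bytes) -> str:
    """Vendor side: issue a key for a package that passed certification."""
    return hmac.new(VENDOR_KEY, package, hashlib.sha256).hexdigest()

def may_install(package: bytes, uses_deep_apis: bool, key: Optional[str]) -> bool:
    """OS side: only packages that touch 'deep' APIs need a valid key."""
    if not uses_deep_apis:
        return True  # ordinary applications install as before
    if key is None:
        return False
    # Constant-time comparison, so the check does not leak how much of the key matched.
    return hmac.compare_digest(issue_key(package), key)

driver = b"kernel sensor build 1"
key = issue_key(driver)
print(may_install(driver, True, key))               # True: certified package
print(may_install(driver + b"patched", True, key))  # False: changed after certification
print(may_install(b"web browser", False, None))     # True: no deep APIs, no key needed
```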

I think the time has come to re-examine how some code has become critical to parts of our global economy. Such code (and systems) has to be treated as a utility; failing that, some means of breaking it up while ensuring interoperability is the next step.


Think... the Internet didn't go down, huh? Only an application on a small percentage of ecommerce machines that, by bad (lazy) design, were not robust and had reached critical mass. If there were a significant number of other applications doing the same job, and if the code had to pass more stringent testing and validation, this would not have happened.

Also, it seems like stuff like CrowdStrike has become too much of an octopus... it does too many things... that's, IMHO, the fundamental problem.... This has come about through the pure laziness and cheapness of the Internet ecommerce model. If those people were held responsible in proportion to the seriousness of their application, these things would not happen.
 
mikeAtx - That was true when the framebuffer was dumb. Now, all computers have accelerated graphics. The GPU being programmable opens up a large attack surface.
Ed

But to reach the GPU you have to run in the kernel (access to physical memory).

I suppose an MMU could be designed to segregate access to physical memory, so that applications trying to reach certain parts of it had to run with a key of sorts. The task manager would write the key of the task being swapped in to a register in the MMU (call this the Current Task Access Key, or CTAK). That would be very easy to do, and the MMU would compare the CTAK with the key that unlocks the desired memory access.

If they match, the transaction is allowed; if not, an error is returned and possibly an interrupt raised. With the interrupt you could easily create some "cop software" that keeps track of bad actors trying to access memory locations they have no right to access.
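The CTAK idea can be modelled in a few lines. This is a software simulation with hypothetical region names and key values, not hardware; the mechanism is similar in spirit to the storage protection keys on IBM System/360 mainframes and the memory protection keys found on recent x86 CPUs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

class AccessFault(Exception):
    """Raised when the current task's key does not unlock a memory region."""

@dataclass
class KeyedMMU:
    region_keys: Dict[str, int]  # hypothetical: region name -> key that unlocks it
    ctak: int = 0                # Current Task Access Key register
    violations: List[Tuple[int, str]] = field(default_factory=list)  # log for the "cop software"

    def switch_task(self, task_key: int) -> None:
        """The task manager writes the incoming task's key into the CTAK register."""
        self.ctak = task_key

    def access(self, region: str) -> None:
        """Allow the access only if the CTAK matches the region's key."""
        if self.region_keys[region] != self.ctak:
            self.violations.append((self.ctak, region))  # what the interrupt handler would record
            raise AccessFault(f"key {self.ctak} may not access {region!r}")

mmu = KeyedMMU(region_keys={"gpu_registers": 7, "user_heap": 1})
mmu.switch_task(7)
mmu.access("gpu_registers")      # allowed: keys match
mmu.switch_task(1)
try:
    mmu.access("gpu_registers")  # refused: wrong key
except AccessFault as fault:
    print("fault:", fault)
print(mmu.violations)            # [(1, 'gpu_registers')]
```

The `violations` list stands in for the interrupt handler feeding the "cop software".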

You know, I really HATE bad actors. They've screwed up what used to be a very nice World.
 
You know, I really HATE bad actors. They've screwed up what used to be a very nice World.
Yeah, they just keep chipping and chipping, emptying the dust down their pant leg into the yard, until they get a hole big enough to crawl through. Many aspects of life...

Excuse my ignorance, but I always thought a real computer OS did no processing that the sysadmin had not installed or enabled. I thought it was supposed to be locked down tight already, so that you just can't come in over the net and "get" something to execute somehow. With a toy OS like Windows, fit for everything from some kid's $100 throw-away tablet to servers, how is it still possible for this OS to get hijacked from afar to run nefarious operations?

I used to think it was all exploits of M$ trying to make a computer "easy to use" by anyone who doesn't know a THING ABOUT THEM. But it can't still be that, right?
 
It began with HTML... wherein we allowed the remote party to install some "code" that would be interpreted by the "browser".

However, it was supposed to run in its own little virtual machine (a process) and not have access to much that would affect the machine proper. It was Java.

But the application clowns could not control themselves... the old "Look Ma! Look what I can do!"... and they blew past that... HTML became a port for other applications that were never meant to be run that way... and then access to the underlying computer, beyond the virtual machines, came to be.

"Woopsie!" the clowns said... when they realized what they had done...

And now we're back to the beginning, with "apps" where they want us to install native code on our machines just to access our bank accounts or shop at Target... WTF?
 
In my opinion this second article is even "worse" than the first one.
Every authoritative technical person involved who was asked concludes, after a series of conjectures, that the real causes are unknown, even though plenty of more or less technical hypotheses are put forward.

On the other hand, how can we expect a company that, within a few hours, effectively disabled millions of Windows PCs all over the world, and which could therefore be sued for damages in many countries, probably amounting to billions of dollars, to tell us the truth?

And in fact CrowdStrike and Microsoft have been passing the ball back and forth for a while now.

Maybe politics (in general, no particular side) will take care of everything soon to "solve" everything... 😛

And there is no guarantee that it won't happen again, on the contrary.
 
In my opinion this second article is even "worse" than the first one.
Well, OK. But to suggest that it's nothing but idle speculation seems, to me, to be taking a pretty one sided view.

The people quoted in the article are all technology literate, have an understanding of operating systems and AV software, and have had access to the diagnostic information provided by Windows on the BSOD screen. Crowdstrike have said the problem file was a configuration file that had bad information in it, but not NUL (there's history with NUL in config files causing crashes in other unrelated software, some years ago).

While these people might be dissembling it seems to me that, at this stage, they're not stepping outside the bounds of expert conjecture.

Programs are written and tested with legal inputs, but illegal inputs don't receive as much attention
This would be my thought. The configuration file format and content logically forms part of an API, though all too often in reading the file and processing the data there's the assumption that the file "must be good", so no validation.

Most of the time it is good. Often the config files themselves have been generated by a program that's responding to user input. So it can't be wrong? Right? Until the day when someone needs to make a change and modifies the generator, can't be bothered with process, and hand edits the generator output. Because it's a simple change, what could possibly go wrong????
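A sketch of treating the config file as part of the API: every record is validated on load, and one malformed line rejects the whole file rather than being assumed good because a generator "usually" wrote it. The four-field record layout and the allowed actions are hypothetical, chosen only for illustration; they are not CrowdStrike's format.

```python
from typing import List

EXPECTED_FIELDS = 4                 # hypothetical layout: id, kind, pattern, action
ALLOWED_ACTIONS = {"log", "block"}  # hypothetical set of valid actions

def load_rules(text: str) -> List[List[str]]:
    """Parse a rule file, validating every record; any bad line rejects the whole file."""
    rules = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank lines and comments are allowed
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != EXPECTED_FIELDS:
            raise ValueError(f"line {lineno}: expected {EXPECTED_FIELDS} fields, got {len(fields)}")
        rule_id, kind, pattern, action = fields
        if not (rule_id.isascii() and rule_id.isdigit()):
            raise ValueError(f"line {lineno}: rule id must be a number, got {rule_id!r}")
        if not kind or not pattern:
            raise ValueError(f"line {lineno}: empty kind or pattern")
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"line {lineno}: unknown action {action!r}")
        rules.append(fields)
    return rules

good = "# generated file\n1, file, *.exe, log\n2, net, 10.0.0.0/8, block\n"
print(len(load_rules(good)))  # 2
```

A hand edit that adds a stray field, leaves a pattern empty, or mistypes an action now fails loudly at load time instead of being trusted.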
 
So this could possibly be down to a bad pointer?
Apparently so. The config file controls the operation of the CrowdStrike kernel driver. The config was wrong and meant that the driver tried to read memory that was not accessible; the kernel detects this and crashes. It's not clear (yet) what CrowdStrike were trying to achieve with the update, so we don't know whether the address was directly in the config or not. Probably not; I'd guess it was calculated from dodgy information in the configuration.
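If the address really was derived from data in the configuration, the defensive version is to check every index against the table before using it. A minimal sketch, with a hypothetical table and indices: in C the unchecked version would read whatever memory the bad index points at, while here out-of-range entries are refused and reported as `None`.

```python
from typing import List, Optional, Sequence

def read_entries(table: Sequence[int], indices: Sequence[int]) -> List[Optional[int]]:
    """Read the table entries selected by (possibly bad) configuration data.

    Each index is checked against the table bounds before use; out-of-range
    indices (including negative ones, which Python would otherwise wrap
    around) are refused and reported as None.
    """
    entries: List[Optional[int]] = []
    for i in indices:
        if 0 <= i < len(table):
            entries.append(table[i])
        else:
            entries.append(None)  # refuse instead of reading past the table
    return entries

print(read_entries([10, 20, 30], [0, 2, 3, -1]))  # [10, 30, None, None]
```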

Kernel drivers can be flagged boot-start, so that the driver is started when the kernel starts even if it has caused an error. It's used for critical drivers (like discs), and the CrowdStrike driver had that enabled, so it was never possible to boot past it...

It's a nightmare, because of course if the config file is capable of causing a driver crash, you should really go through a full system test with every configuration update. Which could take days if not weeks. Not ideal if you're dealing with a zero-day threat. But that being the case, you need a cast-iron process in place to ensure config releases are correct...
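The "cast-iron process" can at least be automated as a release gate: run every candidate config through a set of checks and refuse to ship if any fails, or if a check itself blows up. The checks below are placeholders for illustration; a real gate would load the file with the production parser on test machines.

```python
from typing import Callable, List, Tuple

Check = Tuple[str, Callable[[bytes], bool]]

def release_gate(config: bytes, checks: List[Check]) -> List[str]:
    """Return the names of the checks that failed; an empty list means OK to ship."""
    failures = []
    for name, check in checks:
        try:
            passed = check(config)
        except Exception:
            passed = False  # a check that blows up counts as a failure, never a pass
        if not passed:
            failures.append(name)
    return failures

def decodes_as_utf8(config: bytes) -> bool:
    config.decode("utf-8")  # raises UnicodeDecodeError on bad input
    return True

# Placeholder checks, for illustration only.
CHECKS: List[Check] = [
    ("non-empty", lambda c: len(c) > 0),
    ("not all zero bytes", lambda c: any(c)),
    ("valid UTF-8", decodes_as_utf8),
]

print(release_gate(b"rule=1\n", CHECKS))  # []
print(release_gate(b"\xff\xfe", CHECKS))  # ['valid UTF-8']
```

Treating an exception inside a check as a failure matters: a gate that crashes and is then skipped is no gate at all.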
 
Well, OK. But to suggest that it's nothing but idle speculation seems, to me, to be taking a pretty one sided view.

While these people might be dissembling it seems to me that, at this stage, they're not stepping outside the bounds of expert conjecture.
Steven, as usual, you speak competently, wisely and kindly, which I really appreciate.
So the last thing I would like to do is be argumentative with you. 🙂

But, in my opinion, there is a "but".
Below are the sentences from that same article that inspired my previous comment.
And, of course, I'll not comment on them further. 😉

"The cause, to the extent so far revealed by CrowdStrike, was "a logic error resulting in a system crash and blue screen (BSOD) on impacted systems."

That crash stemmed from quite possibly mangled data

While there has been speculation that the error was the result of null bytes in the Channel File, CrowdStrike insists that's not the case.

the cybersecurity outfit said, promising further root cause analysis to determine how the logic flaw occurred.

Specific details about the root cause of the error have yet to be formally disclosed

security experts . . . have argued convincingly that the offending Channel File in some way caused Falcon to access

It appears Falcon reads entries from a table in memory in a loop

the crash dump and disassembly make it clear that the crash arose from trying to use uninitialized data as a pointer – a wild pointer – but further specifics remain unknown.

"We still don’t have the exact reason, though, why the channel file triggered that," he said.

Arasaratnam* said the exact cause remains a matter of speculation because he doesn't have access to the CrowdStrike source code or the Windows kernel.

"We don't know what the channel configuration file actually entails.

"It seems obvious that something slipped past QA given the frequency with which the crash occurred," he said. "It seems like even a trivial amount of QA would have caught this.

Arasaratnam said there are several best practices that should have been observed.

"One of the techniques employed by Google, which we used when I was there, is to do what's called Canary releases (gradual or slow rollouts) and observe what's occurring rather than crashing what Microsoft estimated were 8.5 million machines.""

*Omkhar Arasaratnam is a cybersecurity veteran and general manager of OpenSSF.
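The canary approach Arasaratnam describes can be sketched as a staged rollout that stops as soon as telemetry shows trouble. The stage percentages, the crash-rate threshold and the `crash_rate_for` telemetry hook are all hypothetical; the 8.5 million figure is the Microsoft estimate quoted above.

```python
from typing import Callable, List

def staged_rollout(
    fleet_size: int,
    stage_percents: List[int],
    crash_rate_for: Callable[[int], float],
    max_crash_rate: float = 0.001,
) -> int:
    """Roll an update out in increasing stages, halting if the crash rate is too high.

    stage_percents must be increasing (e.g. [1, 10, 50, 100]).
    Returns the number of machines that received the update.
    """
    deployed = 0
    for percent in stage_percents:
        deployed = fleet_size * percent // 100
        # Hypothetical telemetry hook: observed crash rate once `deployed` machines have it.
        if crash_rate_for(deployed) > max_crash_rate:
            break  # later stages never receive the update
    return deployed

def bad_update(n: int) -> float:
    return 1.0  # every updated machine crashes

def good_update(n: int) -> float:
    return 0.0  # no crashes observed

print(staged_rollout(8_500_000, [1, 10, 50, 100], bad_update))   # 85000
print(staged_rollout(8_500_000, [1, 10, 50, 100], good_update))  # 8500000
```

With an update that crashes every machine, the rollout halts after the first stage: 85,000 machines (1% of 8.5 million) rather than the whole fleet. A smaller first stage would contain it further.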


Edit: just to point out that it has been confirmed that the tests on the patch, if there ever were any, were extremely poor and superficial.
As usual. 🙄
This won't change the usual practice of testing updates poorly.
 