8.5 Million Machines Went Down in Hours. Most Businesses Still Haven’t Changed Anything.
In July 2024, a single faulty software update — pushed automatically by a widely used cybersecurity vendor — took down an estimated 8.5 million Windows machines in hours. Airlines couldn’t board passengers. Hospitals delayed surgeries. Emergency dispatch systems went dark across multiple states. The financial damage ran into the billions. The lesson on IT vendor risk was immediate and severe: when one deeply embedded vendor fails, the blast radius is everyone’s problem.
It wasn’t a cyberattack. No threat actor was involved. It was a routine configuration update that skipped adequate testing, deployed globally, and triggered a catastrophic boot loop on every affected machine.
Eighteen months later, the question isn’t whether it was bad. Everyone knows it was bad. The question is: what, specifically, did your business change because of it?
For most small and mid-sized businesses, the honest answer is: not much.
What Actually Happened — Without the Technical Theater
CrowdStrike’s endpoint protection software runs as a kernel-level driver, the deepest layer of Windows, and it loads during startup. When it received a malformed configuration update on July 19th, the driver crashed while parsing the file, before Windows could finish booting. Every affected device displayed the now-infamous blue screen and rebooted endlessly.
The fix required a technician to manually intervene on each machine — boot into safe mode, locate and delete a specific file, restart. At scale, across thousands of endpoints, that’s not a fix. That’s a crisis.
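For the technically curious, the published workaround amounted to very little. The sketch below is illustrative only, not a tool to run today; the file pattern comes from CrowdStrike's public remediation guidance at the time, and in practice technicians applied it by hand in Safe Mode or from recovery boot media.

```python
# Illustrative only: the shape of the manual fix, per CrowdStrike's
# published remediation guidance. Requires admin rights and Safe Mode.
import glob
import os

# The faulty update shipped as "channel files" matching this pattern.
CHANNEL_DIR = r"C:\Windows\System32\drivers\CrowdStrike"

for path in glob.glob(os.path.join(CHANNEL_DIR, "C-00000291*.sys")):
    os.remove(path)
    print(f"Removed {path}")
# A normal restart then lets Windows boot cleanly.
```

Simple on one machine. Multiplied across a fleet with encrypted drives and remote workers, it consumed entire weekends.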
What made it worse wasn’t just the technical failure. It was the architecture behind it: one vendor, deeply embedded, automatically trusted, pushing updates to millions of machines simultaneously — with no meaningful gate between a test environment and global production.
That architecture exists in some form at nearly every small business in the country.
The Real IT Vendor Risk Isn’t One Company — It’s How You’ve Built Your Stack

Blaming CrowdStrike misses the point. They made a serious error in their update process, and they’ve since made public commitments to improve their testing pipeline. That’s worth noting. But the underlying lesson isn’t about one company’s failure.
The lesson is about concentration risk — and most businesses still aren’t thinking about it clearly.
Concentration risk means this: when a single vendor’s software, service, or infrastructure failure has the power to bring your entire operation to a halt, you have a structural problem — regardless of how good that vendor normally is.
This applies to your endpoint protection. It applies to your cloud provider. It applies to your phone system, your email platform, your backup solution, and yes — your IT firm itself. If any one of those threads gets pulled, how bad does it get?
For most NJ small businesses, the honest answer is: bad enough to matter. CISA’s guidance on vendor-related incidents is clear: third-party software risk is now one of the primary threats to business continuity across every sector.
Three Things a Well-Run IT Environment Would Have Had in Place
No configuration makes a business immune to vendor failures. But there is a significant difference between environments built with resilience in mind and those that aren’t. Here’s what that difference looks like in practice.
1. Staged Update Deployment
Automatic updates are a double-edged sword. They're essential for patching security vulnerabilities quickly, but pushed without any delay or testing gate, they carry the same risk CrowdStrike exposed: a bad update reaches every machine simultaneously.
A well-managed environment doesn’t push every update to every machine the moment it’s available. Updates are staged — a small group of machines receives them first, behavior is monitored, and broader deployment happens only after a quiet period confirms stability. This doesn’t eliminate risk. It limits blast radius.
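To make "staged" concrete, here is a minimal sketch of ring-based rollout logic. The functions `apply_update` and `machine_is_healthy` are hypothetical stand-ins for whatever RMM or MDM tooling your IT firm uses; the structure, not the specific calls, is the point.

```python
# Minimal sketch of ring-based update staging. apply_update() and
# machine_is_healthy() are hypothetical stand-ins for real RMM/MDM calls.
import time

RINGS = [
    ("pilot", ["pc-01", "pc-02"]),                   # a handful of low-risk machines first
    ("early", ["pc-03", "pc-04", "pc-05"]),          # a broader test group next
    ("broad", ["pc-06", "pc-07", "pc-08", "pc-09"]), # everyone else, last
]
SOAK_SECONDS = 24 * 3600  # 24-hour quiet period in production; shorten to test

def apply_update(machine: str, update_id: str) -> None:
    print(f"pushing {update_id} to {machine}")  # stand-in for a real management call

def machine_is_healthy(machine: str) -> bool:
    return True  # stand-in: did it reboot cleanly? is the agent responding?

def deploy(update_id: str) -> None:
    for ring_name, machines in RINGS:
        for m in machines:
            apply_update(m, update_id)
        time.sleep(SOAK_SECONDS)  # watch before widening the blast radius
        if not all(machine_is_healthy(m) for m in machines):
            raise RuntimeError(f"{update_id} failed in ring '{ring_name}'; rollout halted")

deploy("2026-01-patch-cycle")
```

The pilot ring absorbs a bad update so the broad ring never sees it. That is the whole trick.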
If your current IT firm can’t tell you how updates are staged across your environment, that’s a gap worth examining. The NIST Cybersecurity Framework addresses patch and update management as a core component of organizational resilience — staged deployment aligns directly with those recommendations.
2. Documented Recovery Procedures — Not Just Backups
Most businesses now understand the importance of data backups. Fewer have thought carefully about recovery procedures — what happens, step by step, when a key system can’t start.
The CrowdStrike fix wasn’t complicated. But it required a person with the right knowledge, access, and instructions at every affected machine. Companies with documented recovery runbooks and trained internal contacts recovered in hours. Companies whose IT firm had to figure it out in real time recovered in days.
A recovery plan that exists only in your IT firm’s head isn’t a plan. It’s a dependency.
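What "documented" looks like doesn't need to be elaborate. The entry below is a hypothetical illustration of a single runbook item, not a template for any particular vendor; the point is that it lives outside anyone's head and names an owner for each step.

```python
# Hypothetical runbook entry. The point is that it exists outside
# anyone's head, names an owner, and has actually been rehearsed.
RUNBOOK_ENTRY = {
    "scenario": "Security agent update causes machines to boot-loop",
    "first_responder": "Office manager",
    "escalation": "IT firm emergency line (see contacts page)",
    "steps": [
        "Confirm scope: count machines showing the blue screen",
        "Check the vendor's status page for a published bulletin",
        "Boot one affected machine into Safe Mode (hold Shift, click Restart)",
        "Apply the vendor's remediation steps and verify a clean reboot",
        "Repeat per machine; escalate if more than ten are affected",
    ],
    "last_rehearsed": "2025-06-01",  # a runbook that has never been rehearsed is a guess
}

for step in RUNBOOK_ENTRY["steps"]:
    print(f"[ ] {step}")
```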
3. A Vendor Risk Conversation That Actually Happens
This conversation almost never happens. Most small businesses have no process for evaluating the downside risk of the vendors embedded in their technology stack. Tools get adopted because they work well, not because someone asked: “If this vendor has a major incident, how long are we down, and who calls whom?”
That conversation should happen at least once a year. It doesn’t need to be a formal audit. It needs to be an honest look at which vendors have deep access to your systems, what their update and incident history looks like, and whether your current setup would survive their worst day.
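One way to keep that conversation honest is a simple vendor risk register, reviewed on a schedule. The entries below are illustrative examples, not recommendations; the two columns that matter most are how deep the vendor's access goes and how long you could tolerate their worst day.

```python
# Illustrative vendor risk register. Names, access descriptions,
# tolerances, and dates are examples, not recommendations.
from datetime import date

REVIEW_INTERVAL_DAYS = 365

VENDORS = [
    # (vendor role,       access depth,                  tolerable downtime, last reviewed)
    ("Endpoint security", "kernel-level, every device",  "4 hours",          None),
    ("Email platform",    "all mail and identity",       "1 business day",   date(2025, 3, 1)),
    ("Backup provider",   "copy of all company data",    "3 days",           None),
    ("IT firm",           "admin access to everything",  "1 business day",   date(2024, 7, 30)),
]

for role, access, tolerance, reviewed in VENDORS:
    overdue = reviewed is None or (date.today() - reviewed).days > REVIEW_INTERVAL_DAYS
    flag = "  <-- review overdue" if overdue else ""
    print(f"{role:18} | {access:28} | can tolerate: {tolerance:15}{flag}")
```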
The Businesses Nobody Wrote About on July 19th
Here’s what didn’t make the news: the businesses that stayed up.
Some organizations went through that day without interruption — not because they were lucky, but because their environments were built differently. Staged updates. Layered systems. Recovery procedures that didn’t require calling someone in a panic. Vendors that weren’t all trusted at the same depth simultaneously.
That’s the goal: quiet. No drama. No board-level conversations about why the company was down for three days. No client calls explaining why deliverables were missed.
Quiet is engineered. It doesn’t happen by accident, and it doesn’t come from buying the right product. It comes from how an environment is designed and maintained over time.
Questions Worth Putting to Your IT Firm Today
If you haven’t revisited your technology stack since July 2024, these questions deserve a direct answer from whoever manages your IT:
- How are software updates staged across our environment — and what’s the process before something reaches all our machines?
- If a key security or infrastructure vendor has a major incident on a Tuesday morning, what are the first three things that happen on our side?
- Do we have documented recovery procedures, or does everything depend on you being available?
- Which vendors in our stack have the deepest access — and have we ever evaluated what happens if they go down?
These aren’t trick questions. A good IT firm answers them without hesitation. Hesitation is information.
Eighteen Months Is Long Enough
The CrowdStrike incident was one of the most widely covered technology failures in recent memory. It generated weeks of analysis, congressional scrutiny, and earnest commitments from vendors across the industry to do better.
And then, for most businesses, life returned to normal — and the structural questions it raised went unaddressed.
Eighteen months is long enough to have made changes. The businesses that will handle the next incident quietly — and there will be a next incident, from some vendor, on some platform — are the ones that used the last eighteen months to build differently.
If you’re not sure where your environment stands, that’s exactly what a Business Technology Growth & Risk Assessment is designed to surface. Not an audit for its own sake — an honest look at whether your technology is built to handle the business risk it’s sitting under.
Reserve Your Business Technology Growth & Risk Assessment and find out where the gaps are — before the next vendor has a bad Tuesday.