@[email protected] to

[email protected]English • 7 months ago

Major IT outage affecting banks, airlines, media outlets across the world

cross-posted to:
[email protected]

1.2K

Live: Major IT outage affecting banks, airlines, media outlets across the world

There are reports of IT outages affecting major institutions in Australia and internationally.

All our servers and company laptops went down at pretty much the same time. Laptops have been bootlooping to blue screen of death. It’s all very exciting, personally, as someone not responsible for fixing it.

Apparently caused by a bad CrowdStrike update.

Edit: now being told we (who almost all generally work from home) need to come into the office Monday as they can only apply the fix in-person. We’ll see if that changes over the weekend…

Chat

YTG123
link
fedilink
English
167•7 months ago
>Make a kernel-level antivirus
>Make it proprietary
>Don’t test updates… for some reason??
- @[email protected]
  link
  fedilink
  English
  55•7 months ago
  I mean I know it’s easy to be critical but this was my exact thought, how the hell didn’t they catch this in testing?
  - @[email protected]
    link
    fedilink
    English
    50•7 months ago
    I have had numerous managers tell me there was no time for QA in my storied career. Or documentation. Or backups. Or redundancy. And so on.
    - u/lukmly013 💾 (lemmy.sdf.org)
      link
      fedilink
      English
      19•7 months ago
      Just always make sure you have some evidence of them telling you to skip these.
      - @[email protected]
        link
        fedilink
        English
        11•7 months ago
        There’s a reason I still use lots of email in the age of IM. Permanent records, please. I will email a record of in-person convos or chats on stuff like this. I do it politely and professionally, but I do it.
        
        @[email protected]
        link
        fedilink
        English
        6•7 months ago
        A lot of people really need to get into the habit of doing this.
        
        “Per our phone conversation earlier, my understanding is that you would like me to deploy the new update without any QA testing. As this may potentially create significant risks for our customers, I just want to confirm that I have correctly understood your instructions before proceeding.”
        
        If they try to call you back and give the instruction over the phone, then just be polite and request that they reply to your email with their confirmation. If they refuse, say “Respectfully, if you don’t feel comfortable giving me this direction in writing, then I don’t feel comfortable doing it,” and then resend your email but this time loop in HR and legal (if you’ve ever actually reached this point, it’s basically down to either them getting rightfully dismissed, or you getting wrongfully dismissed, with receipts).
        
        @[email protected]
        link
        fedilink
        English
        3•7 months ago
        Engineering prof in uni was big on journals/log books for CYA and it’s stuck with me. I write down everything I do during the day (research, findings, etc.). Easily the best bit of advice I ever had.
        
        @[email protected]
        link
        fedilink
        English
        1•7 months ago
        
        > Permanent records, please.
        
        The issue with this is that a lot of companies have a retention policy that only keeps emails for a set period, after which they’re deleted unless there’s a critical reason why they can’t be (e.g. to comply with a legal hold). It’s common to see 2, 3 or 5 year retention policies.
      - @[email protected]
        link
        fedilink
        English
        2•
        edit-2
        7 months ago
        Unless their manager works at Boeing.
        
        Midnight Wolf
        link
        fedilink
        English
        4•7 months ago
        There’s some holes in our production units
        
        Software holes, right?
        
        …
        
        …software holes, right?
    - @[email protected]
      link
      fedilink
      English
      19•7 months ago
      Move fast and break things! We need things NOW NOW NOW!
    - The Quuuuuill
      link
      fedilink
      English
      11•7 months ago
      Push that into the technical debt. Then afterwards never pay off the technical debt
  - @[email protected]
    link
    fedilink
    English
    44•7 months ago
    Completely justified reaction. A lot of the time tech companies and IT staff get shit for stuff that, in practice, can be really hard to detect before it happens. There are all kinds of issues that can arise in production that you just can’t test for.
    
    But this… This has no justification. An issue this immediate, this widespread, would have instantly been caught with even the most basic of testing. The fact that it wasn’t raises massive questions about the safety and security of CrowdStrike’s internal processes.
    - Midnight Wolf
      link
      fedilink
      English
      7•7 months ago
      
      > most basic of testing
      
      “I ran the update and now shit’s proper fucked”
      - @[email protected]
        link
        fedilink
        English
        6•7 months ago
        That would have been sufficient to notice this update’s borked
    - @[email protected]
      link
      fedilink
      English
      6•7 months ago
      I think when you are this big you need to roll out any updates slowly, checking along the way that all is good.
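A minimal sketch of what "roll out slowly and check along the way" could look like. Everything here is an assumption for illustration: `deploy` and `is_healthy` are hypothetical callbacks, the stage fractions are arbitrary, and none of this reflects how CrowdStrike's actual pipeline works, which this thread doesn't describe.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], None],      # hypothetical: pushes the update to one host
    is_healthy: Callable[[str], bool],  # hypothetical: e.g. "host still checks in after update"
    stage_fractions: Sequence[float] = (0.01, 0.10, 0.50, 1.0),
) -> list[str]:
    """Deploy to growing slices of the fleet, halting after the first unhealthy stage.

    Returns the hosts that received the update before the rollout finished or halted.
    """
    updated: list[str] = []
    done = 0
    for fraction in stage_fractions:
        # Always target at least one host so the canary stage is never empty.
        target = max(1, int(len(hosts) * fraction))
        batch = hosts[done:target]
        for host in batch:
            deploy(host)
            updated.append(host)
        done = max(done, target)
        # Gate: every host updated in this stage must be healthy before widening.
        if not all(is_healthy(h) for h in batch):
            return updated  # halt; the remaining hosts never get the update
    return updated
```

With 100 hosts and an update that breaks every machine it touches, this would stop after the first 1% stage, so one host goes down instead of the whole fleet.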
      - @[email protected]
        link
        fedilink
        English
        21•7 months ago
        The failure here is much more fundamental than that. This isn’t a “no way we could have found this before we went to prod” issue, this is a “five minutes in the lab would have picked it up” issue. We’re not talking about some kind of “Doesn’t print on Tuesdays” kind of problem that’s hard to reproduce or depends on conditions that are hard to replicate in internal testing, which is normally how this sort of thing escapes containment. In this case the entire repro is “Step 1: Push update to any Windows machine. Step 2: THERE IS NO STEP 2”
        
        There’s absolutely no reason this should ever have affected even one single computer outside of CrowdStrike’s test environment, with or without a staged rollout.
        
        @[email protected]
        link
        fedilink
        English
        7•7 months ago
        God damn, this is worse than I thought… This raises further questions… Was there NO testing at all??
        
        @[email protected]
        link
        fedilink
        English
        1•7 months ago
        Tested on Windows 10S
        
        @[email protected]
        link
        fedilink
        English
        5•7 months ago
        My guess is they did testing but the build they tested was not the build released to customers. That could have been because of poor deployment and testing practices, or it could have been malicious.
        
        Such software would be a juicy target for bad actors.
        
        @[email protected]
        link
        fedilink
        English
        1•7 months ago
        Agreed, this is the most likely sequence of events. I doubt it was malicious, but definitely could have occurred by accident if proper procedures weren’t being followed.
    - @[email protected]
      link
      fedilink
      English
      2•7 months ago
      Yes. And Microsoft’s
      - @[email protected]
        link
        fedilink
        English
        4•7 months ago
        How exactly is Microsoft responsible for this? It’s a kernel-level driver that intercepts system calls, and the software updated itself.
        
        This software was crashing Linux distros last month too, but that didn’t make headlines because it affected fewer machines.
        
        @[email protected]
        link
        fedilink
        English
        2•7 months ago
        From what I’ve heard, didn’t the issue happen not solely because of the CS driver, but because of an MS update that was rolled out at the same time, and the changes that update made caused the CS driver to go haywire? If that’s the case, there’s not much MS or CS could have done to test it beforehand, especially if both updates rolled out at around the same time.
        
        @[email protected]
        link
        fedilink
        English
        3•7 months ago
        I’ve seen zero suggestion of this in any reporting about the issue. Not saying you’re wrong, but you’re definitely going to need to find some sources.
        
        @[email protected]
        link
        fedilink
        English
        2•7 months ago
        Are there any links to this?
        
        @[email protected]
        link
        fedilink
        English
        1•7 months ago
        My apologies, I thought this went out with an MS update
  - @[email protected]
    link
    fedilink
    English
    1•7 months ago
    From what I’ve heard, and to play devil’s advocate, it coincided with Microsoft pushing out a security update at basically the same time, and that caused the issue. So it’s possible that they didn’t have a way to test it properly, because they didn’t have the update at hand before it rolled out. So the fault wasn’t only a bug in the CS driver, but the driver’s interaction with the new Windows update, which they didn’t have.
    - @[email protected]
      link
      fedilink
      English
      3•7 months ago
      How sure are you about that? Microsoft very dependably releases updates on the second Tuesday of the month, and their release notes show if updates are pushed out of schedule. Their last update was on schedule, July 9th.
      - @[email protected]
        link
        fedilink
        English
        4•7 months ago
        I’m not. I vaguely remember seeing it in some posts and comments, and it would explain it pretty well, so I kind of took it as a likely outcome. In hindsight, you are right, I shouldn’t have been spreading hearsay. Thanks for the wakeup call, honestly!
- @[email protected]
  link
  fedilink
  English
  30•
  edit-2
  7 months ago
  You left out
  
  >Pushed a new release on a Friday
- @[email protected]
  link
  fedilink
  English
  12•7 months ago
  You left out > Profit
  
  Oh… Wait…Hang on a sec.
- @[email protected]
  link
  fedilink
  English
  3•7 months ago
  Lots of security systems are kernel level (at least partially); this includes SELinux and AppArmor, by the way. It’s a necessity for these things to actually be effective.
  - @[email protected]
    link
    fedilink
    English
    3•7 months ago
    You missed the most important line:
    
    >Make it proprietary
