• Delta Air Lines CEO Ed Bastian said the massive IT outage earlier this month that stranded thousands of customers will cost it $500 million.
  • The airline canceled more than 4,000 flights in the wake of the outage, which was caused by a botched CrowdStrike software update and took thousands of Microsoft systems around the world offline.
  • Bastian, speaking from Paris, told CNBC’s “Squawk Box” on Wednesday that the carrier would seek damages from the disruptions, adding, “We have no choice.”
  • Th4tGuyII@fedia.io
    3 months ago

    I think what @riskable@programming.dev was saying is you shouldn’t have multiple mission critical systems all using the same 3rd party services. Have a mix of at least two, so if one 3rd party service goes down not everything goes down with it

    • partial_accumen@lemmy.world
      3 months ago

      That sounds easy to say, but in execution it would be massively complicated. Modern enterprises are littered with 3rd party services all over the place. The alternative is writing and maintaining your own solution in house, which is an incredibly heavy lift to cover the entirety of all services needed in the enterprise. Most large enterprises are resource-starved as is, and this suggestion of having redundancy for any 3rd party service that touches mission critical workloads would probably increase burden and costs by at least 50%. I don’t see that happening in commercial companies.

      • Th4tGuyII@fedia.io
        3 months ago

        As far as the companies go, their lack of resources is an entirely self-inflicted problem, because they won’t invest in increasing those resources, like more IT infrastructure and staff. It’s the same as many companies that keep terrible backups of their data (if any) when they’re not bound to by the law, because they simply don’t want to pay for it, even though it could very well save them from ruin.

        The CrowdStrike incident was as bad as it was exactly because loads of companies had their eggs in one basket. Those that didn’t recovered much quicker. Redundancy is the lesson to take from this that none of them will learn.

        • partial_accumen@lemmy.world
          3 months ago

          As far as the companies go, their lack of resources is an entirely self-inflicted problem, because they won’t invest in increasing those resources, like more IT infrastructure and staff.

          Play that out to its logical conclusion.

          • Our example airline suddenly doubles or triples its IT budget.
          • The increased costs don’t actually increase profit; they merely increase resiliency
          • Other airlines don’t do this.
          • Our example airline has to increase ticket prices or fees to cover the increased IT spending.
          • Other airlines don’t do this.
          • Customers start predominantly flying the other airlines with their cheaper fares.
          • Our example airline goes out of business, or gets acquired by one of the other airlines

          The end result is all operating airlines are back to the prior stance.

          • cm0002@lemmy.world
            3 months ago

            Our example airline has to increase ticket prices or fees to cover the increased IT spending.

            Or they could just cut already excessive executive bonuses…

            • partial_accumen@lemmy.world
              3 months ago

              You know they’re not going to do that, so how useful is it to suggest that? If we just want to talk about pie-in-the-sky fixes then sure, but at the end of that we’ll likely have nationalized airlines, which isn’t happening either.

              So are we talking about fantasy or things that can actually happen?

              • cm0002@lemmy.world
                3 months ago

                No, we’re talking about things that should happen and things that should be called out every time.

                Not just throwing up our hands and going “welp, they won’t willingly do it so there’s nothing we can do” like you seem to be doing.

          • bomibantai@lemmy.world
            3 months ago

            customers start predominantly flying the other airlines with cheaper fares

            I was with you till this part. With the way flying is set up in this country, there’s very little competition between airlines. They’ve essentially set themselves up with airports/hubs, so if an airline is down for a day, that’s kinda it unless you want to switch to a different airport.

          • brianary@startrek.website
            3 months ago

            Two big assumptions here.

            First, multiple business systems are already being supported, and the OS only incidentally. Assuming double or triple IT costs is very unlikely, but feel free to post evidence to the contrary.

            Second, a tight coupling between costs and prices. Anyone that’s been paying attention to gouging and shrinkflation of the past few years of record profits, or the doomsaying virtually anywhere the minimum wage has increased and businesses haven’t been annihilated, would know this is nonsense.

            • partial_accumen@lemmy.world
              3 months ago

              First, multiple business systems are already being supported, and the OS only incidentally. Assuming double or triple IT costs is very unlikely, but feel free to post evidence to the contrary.

              The suggestion the poster made was that ALL 3rd party services need to have an additional counterpart for redundancy. So we’re not just talking about a second AV vendor. We have to duplicate ALL 3rd party services running on or supporting critical workloads to meet what that poster is suggesting.

              • inventory agents
              • OS patching
              • security vulnerability scanning
              • file and DB level backup
              • monitoring and alerting
              • remote access management
              • PAM management
              • secrets management
              • config management

              …the list goes on.

              Anyone that’s been paying attention to gouging and shrinkflation of the past few years of record profits, or the doomsaying virtually anywhere the minimum wage has increased and businesses haven’t been annihilated, would know this is nonsense.

              You’re suggesting the companies simply take less profits? Those companies’ boards of directors will get annihilated by shareholders. The boards would be voted out along with their IT improvement plans, and replaced with ones that would return to profitability.

              • brianary@startrek.website
                3 months ago

                Even load-balancing multiple servers in a homogeneous network, where patches are only deployed in phases, is better (and a best practice) than what, to outside observers, appears to have been everything going down due to a mass update everywhere, all at once.

    • ricecake@sh.itjust.works
      3 months ago

      In this case, it’s a local third party tool and they thought they could control the cadence of updates. There was no reason to think there was anything particularly unstable about the situation.

      This is closer to saying that half of your servers should be Linux and half should be windows in case one has a bug.

      Crowdstrike bypassed user controls on updates.
      The normal responsible course of action is to deploy an update to a small test environment, test to make sure it doesn’t break anything, and then slowly deploy it to more places while watching for unexpected errors.
      Crowdstrike shotgunned it to every system at once without monitoring, with grossly inadequate testing, and entirely bypassed any user configurable setting to avoid or opt out of the update.
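
      For anyone who hasn’t seen one, the “normal responsible course” above can be sketched roughly like this. It’s only an illustration, not how CrowdStrike or any particular vendor does it: `staged_rollout`, `deploy`, `is_healthy`, and the stage fractions are all made-up names and numbers.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], None],       # hypothetical: pushes the update to one host
    is_healthy: Callable[[str], bool],   # hypothetical: post-update health check
    stage_fractions: Sequence[float] = (0.01, 0.10, 0.50, 1.0),  # assumed stage sizes
) -> list[str]:
    """Deploy to growing slices of the fleet; halt at the first unhealthy stage.

    Returns the hosts that received the update.
    """
    updated: list[str] = []
    for fraction in stage_fractions:
        # Always advance by at least one host so tiny fleets still get a canary.
        target = max(len(updated) + 1, int(len(hosts) * fraction))
        target = min(target, len(hosts))
        batch = hosts[len(updated):target]
        for host in batch:
            deploy(host)
            updated.append(host)
        # Watch the hosts just updated before widening the blast radius.
        if not all(is_healthy(host) for host in batch):
            break
        if len(updated) == len(hosts):
            break
    return updated


if __name__ == "__main__":
    fleet = [f"host{i}" for i in range(100)]
    # Simulate a bad update: every host fails its health check after deploy.
    touched = staged_rollout(fleet, deploy=lambda h: None, is_healthy=lambda h: False)
    print(f"bad update reached {len(touched)} of {len(fleet)} hosts")
```

      With a bad update, only the first small slice is affected before the rollout halts, which is the whole point of doing it in stages.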

      I was much more willing to put the blame on the organizations that had the outages for failing to follow best practices before I learned that the way the update was pushed would have entirely bypassed any of those safeguards.

      It’s unreasonable to say that an organization needs to run multiple copies of every service with different fundamental infrastructure choices for each in case one magics itself broken.

      • kbin_space_program@kbin.run
        3 months ago

        Crowdstrike also bypassed Microsoft’s driver signing as part of their update process, just to make the updates release faster.

        That MS is getting any flak for this is just shit journalism.