Fess up. You know it was you.

  • spaghetti_carbanana@krabb.org
    1 year ago

    Worked for an MSP, we had a large storage array which was our cloud backup repository for all of our clients. It locked up and was doing this semi-regularly, so we decided to run an “OS reinstall”. Basically these things install the OS across all of the disks, on a separate partition to where the data lives. “OS Reinstall” clones the OS from the flash drive plugged into the mainboard back to all the disks and retains all configuration and data. “Factory default”, however, does not.

    This array was particularly… special… in that you booted it up, held a paperclip in the reset pin, and the LEDs would flash a pattern to let you know you were in the boot menu. You click the pin to move through the boot menu options, and each time you click it the lights flash a different pattern to tell you which option is selected. The first option was normal boot, the second or third was OS reinstall, and the very next option was factory default.

    I head into the data centre. I had the manual, I watched those lights like a hawk and verified the “OS reinstall” LED flash pattern matched up, then I held the pin in for a few seconds to select the option.

    All the disks lit up, away we go. 10 minutes pass. Nothing. Not responding on its interface. 15 minutes. 20 minutes, I start sweating. I plug directly into the NIC and head to the default IP filled with dread. It loads. I enter the default password, it works.

    There staring back at me: “0B of 45TB used”.

    Fuck.

    This was in the days when 50M fibre was rare and most clients had 1-20M ADSL. Yes, asymmetric. We had to send guys out on trips of up to 3 hours with portable hard disks to re-seed the backups, over a painful 30ish days of re-ingesting them into the NAS.

    The worst part? Years later I discovered that, completely undocumented, you can plug a VGA cable in and you get a text menu on the screen that shows you which option you have selected.

    I (somehow) did not get fired.

  • GolfNovemberUniform@lemmy.ml
    1 year ago

    Installed a flatpak app (can’t remember which one, but it wasn’t obscure or shady) and smh it broke the file system on one of my main machines :) (at least I think that’s what happened, because the machine started lagging, apps refused to launch, and after a reboot I got an fsck error or something like that)

  • Quazatron@lemmy.world
    1 year ago

    Did you know that “Terminate” is not an appropriate way to stop an AWS EC2 instance? I sure as hell didn’t.

      • Quazatron@lemmy.world
        1 year ago

        Noob was told to change some parameters on an AWS EC2 instance, requiring a stop/start. Selected terminate instead, killing the instance.

        Crappy company, running production infrastructure in AWS without giving proper training and securing a suitable backup process.

        • tslnox@reddthat.com
          1 year ago

          Maybe there should be some warning message… Maybe a question requiring you to manually type “yes I want it” or something.

          • synae[he/him]@lemmy.sdf.org
            1 year ago

            Maybe an entire feature that disables it so you can’t do it accidentally, call it “termination protection” or something

      • ilinamorato@lemmy.world
        1 year ago

        “Stop” is the AWS EC2 verb for shutting down a box, but leaving the configuration and storage alone. You do it for load balancing, or when you’re done testing or developing something for the day but you’ll need to go back to it tomorrow. To undo a Stop, you just do a Start, and it’s just like power cycling a computer.

        “Terminate” is the AWS EC2 verb for shutting down a box, deleting the configuration and (usually) deleting the storage as well. It’s the “nuke it from orbit” option. You do it for temporary instances or instances with sensitive information that needs to go away. To undo a Terminate, you weep profusely and then manually rebuild everything; or, if you’re very, very lucky, you restore from backups (or an AMI).
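
To make the Stop/Terminate distinction and the "termination protection" suggestion from this thread concrete, here is a minimal Python sketch. It assumes a boto3 EC2 client is passed in as `ec2`; `stop_instances`, `terminate_instances`, and `modify_instance_attribute` (with `DisableApiTermination`) are boto3 EC2 client methods. The helper names `protect`, `stop`, and `terminate_with_confirmation`, and the typed-confirmation check, are hypothetical and only illustrate the idea raised above.

```python
def protect(ec2, instance_id):
    """Turn on termination protection so terminate calls are rejected."""
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        DisableApiTermination={"Value": True},
    )


def stop(ec2, instance_id):
    """Stop the instance; it can be started again later."""
    return ec2.stop_instances(InstanceIds=[instance_id])


def terminate_with_confirmation(ec2, instance_id, typed):
    """Terminate only if the operator typed the exact instance ID.

    The typed-confirmation step is a hypothetical local guard, not an AWS feature.
    """
    if typed != instance_id:
        raise ValueError("confirmation did not match; nothing was terminated")
    return ec2.terminate_instances(InstanceIds=[instance_id])
```

In real use the client would come from something like `boto3.client("ec2")`. With protection enabled, the API itself refuses a terminate request until the flag is cleared, so the local confirmation is only a second layer.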

  • Accidentally deleted an entire column in a police department’s evidence database 😬

    Thankfully, it only contained filepaths that could be reconstructed via a script. But I was sweating 12+1 bullets.

  • BestBouclettes@jlai.lu
    1 year ago

    I was still a wee IT technician, and I was supposed to remove some cables from a patch panel. I pulled at least two cables that were carrying iSCSI from the hypervisors to the storage bays. During production hours. Not my proudest memory.

  • shyguyblue@lemmy.world
    1 year ago

    Updated WordPress…

    Previous Web Dev had a whole mess of code inside the theme that was deprecated between WP versions.

    Fuck WordPress for static sites…

  • -RJ-@lemmy.world
    1 year ago

    Plugged a server in after it had been repaired, because the person whose responsibility it was insisted it would be fine. They hadn’t released the FSMO roles from it and its clock was an hour out, so it changed the time EVERYWHERE and broke ALL THE THINGS. Not technically my fault, but I should have pushed harder for them to demote it before I turned it back on.

  • Futs@lemmy.world
    1 year ago

    Advertised an OS deployment to the ‘All Workstations’ collection by mistake. I only realized after 30 minutes, when people’s workstations started rebooting. Worked right through the night recovering and restoring about 200 machines.

  • TheMadIrishman@sh.itjust.works
    1 year ago

    Was troubleshooting a failed drive in a raid array on a small business DC/File Serv/Print/Everything else box. Replaced drive still showed failed. Moved to another bay thinking it was the slot not the drive. Accidentally hit yes when asked to initialize the array. Blew the whole thing away. It was an OLD server the customer was working on replacing, so I told them it finally gave up the ghost and I was taking it back to the office to keep working on it. I had been on the job for about 4 months and thought for SURE I was fired. Turns out we were already working on moving them to the cloud, so it ended up not being a big deal.

  • Monkey With A Shell@lemmy.socdojo.com
    1 year ago

    Found out the hard way to triple check your work when adding a new line to the proxy policy. Or, more accurately, 2 lines when you only planned one: the second one defaulted to ‘deny all’ and ended up dropping all web traffic for the company…

    That made for a REAL tense meeting the next day after it got deployed and people started asking WTF happened…
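
A stray "deny all" rule like the one described above is the kind of thing a quick pre-deploy check can catch. The following is a rough Python sketch under stated assumptions: the comment does not name the proxy product or its rule syntax, so the first-match prefix model, the rule tuples, and the names `first_match` and `blocked_samples` are all hypothetical.

```python
def first_match(rules, url):
    """Return the action of the first rule whose prefix matches the URL."""
    for prefix, action in rules:
        if url.startswith(prefix):
            return action
    return "deny"  # assumed implicit default when nothing matches


def blocked_samples(rules, samples):
    """List the sample URLs a policy would deny, for review before deploy."""
    return [u for u in samples if first_match(rules, u) == "deny"]


old_policy = [("https://", "allow"), ("http://", "allow")]
# Intended change: block one site. Accidental second rule: the empty prefix
# matches every URL, so it behaves as "deny all".
new_policy = [("https://bad.example/", "deny"), ("", "deny")] + old_policy

samples = [
    "https://intranet.example/",
    "http://news.example/",
    "https://bad.example/x",
]
newly_blocked = set(blocked_samples(new_policy, samples)) - set(
    blocked_samples(old_policy, samples)
)
print(sorted(newly_blocked))
```

If the set of newly blocked samples is much larger than the one site the change was meant to block, that is a cue to look at the policy again before it goes live.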

  • FaceDeer@kbin.social
    1 year ago

    It wasn’t “worst” in terms of how much time it wasted, but the worst in terms of how tricky it was to figure out. I submitted a change list that worked on my machine as well as 90% of the build farm and most other dev and QA machines, but threw a baffling linker error on the remaining 10%. It turned out that the change worked fine on any machine that used to have a particular old version of Visual Studio installed on it, even though we no longer used that version and had phased it out for a newer one. The code I had written depended on a library that was no longer in current VS installs but got left behind when uninstalling the old one. So only very new computers were hitting that, mostly belonging to newer hires who were least equipped to figure out what was going on.

    • tslnox@reddthat.com
      1 year ago

      That reminds me of when some of my former colleagues and I were at a training session on programming an industrial camera system that judges the quality of produced parts. I’m not really a programmer, just a guy who can troubleshoot and google stuff and occasionally hack together some simple code, also with heavy help from Google.

      The guy was a German (we are Czech and we communicated in English) programmer who coded the whole thing in Omron software but he also wrote his own plugin for it. All was well when he was showing us on the big screen, but when he sent us the program file so we could experiment on it (changing parameters, adding steps to the flow…) the app would crash. I finally delved into the app logs and with the help of Google I found it was because he compiled his plugin with debug flags and it worked for him because he had the VS debug DLLs installed but we didn’t.

  • slazer2au@lemmy.world
    1 year ago

    I took down an ISP for a couple of hours because I forgot the ‘add’ keyword at the end of a Cisco configuration line.

    • sloppy_diffuser@sh.itjust.works
      1 year ago

      That’s a rite of passage for anyone working on Cisco’s shit TUI. At least it’s gotten better with some of the newer stuff. IOS-XR supports commits and diffing.
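
The comments above do not say which command was involved. A well-known case is `switchport trunk allowed vlan`, where leaving off `add` replaces the allowed list instead of extending it; treating that as the command here is an assumption. The Python sketch below only models that replace-versus-add behaviour and previews the change with `difflib`, in the spirit of diffing before a commit. It does not parse or emit real device configuration, and `apply_allowed_vlan` and `preview` are hypothetical names.

```python
import difflib


def apply_allowed_vlan(current, vlans, add=False):
    """Model the allowed-VLAN list: with add, extend it; without, replace it."""
    return sorted(set(current) | set(vlans)) if add else sorted(set(vlans))


def preview(before, after):
    """Return a unified diff of two VLAN lists, one VLAN per line."""
    a = [f"vlan {v}" for v in before]
    b = [f"vlan {v}" for v in after]
    return "\n".join(
        difflib.unified_diff(a, b, "running", "candidate", lineterm="")
    )


running = [10, 20, 30]
# Without add: the diff shows 10, 20 and 30 being removed.
print(preview(running, apply_allowed_vlan(running, [40])))
# With add: the diff shows only 40 being added.
print(preview(running, apply_allowed_vlan(running, [40], add=True)))
```

Seeing three removals in the preview for what was meant to be a one-VLAN addition is the sort of signal a commit-and-diff workflow gives you before the change takes effect.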