Archive link: https://archive.ph/GtA4Q

The complete destruction of Google Search via forced AI adoption and the carnage it is wreaking on the internet is deeply depressing, but there are bright spots. For example, as the prophecy foretold, we are learning exactly what Google is paying Reddit $60 million annually for. And that is to confidently serve its customers ideas like, to make cheese stick on a pizza, “you can also add about 1/8 cup of non-toxic glue” to pizza sauce, which comes directly from the mind of a Reddit user who calls themselves “Fucksmith” and posted about putting glue on pizza 11 years ago.

A joke that people made when Google and Reddit announced their data sharing agreement was that Google’s AI would become dumber and/or “poisoned” by scraping various Reddit shitposts and would eventually regurgitate them to the internet. (This is the same joke people made about AI scraping Tumblr.) Giving people the verbatim wisdom of Fucksmith as a legitimate answer to a basic cooking question shows that Google’s AI is actually being poisoned by random shit people say on the internet.

Because Google is one of the largest companies on Earth and operates with near impunity and because its stock continues to skyrocket behind the exciting news that AI will continue to be shoved into every aspect of all of its products until morale improves, it is looking like the user experience for the foreseeable future will be one where searches are random mishmashes of Reddit shitposts, actual information, and hallucinations. Sundar Pichai will continue to use his own product and say “this is good.”

  • ILikeBoobies@lemmy.ca
    6 months ago

    Speaking of, I found a recipe today that had to have been AI-generated, because the ingredient list and the directions were for completely different recipes.

  • Maggoty@lemmy.world
    6 months ago

    I’m just thinking of all the really dumb shit we all said on Reddit as satire. Oh I need to go search military meme stuff!

  • Ketchup@reddthat.com
    6 months ago

    Now I only regret not EDITING all of my Reddit posts to say complete nonsense when I deleted my account in June 2023. Instead, I deleted each and every post and requested a copy of my data to cost them money.

    • catloaf@lemm.ee
      6 months ago

      I’m sure they used a dataset from before people started editing and deleting stuff.

  • Margot Robbie@lemmy.world
    6 months ago

    Reddit, and by extension, Lemmy, offers the ideal format for LLM datasets: human-generated conversational comments, which, unlike traditional forums, are organized in a branched nested format and scored with votes in the same way that LLM reward models are built.

    There is really no way of knowing about, much less preventing, public-facing data being scraped and used to build LLMs, but let’s do a thought experiment: what if, hypothetically speaking, there is some particular individual who wanted to poison that dataset with shitposts in a way that is hard to detect or remove with any easily automated method, by camouflaging their own online presence within common human-generated text data created during this time period, let’s say, the internet marketing campaign of a major Hollywood blockbuster?

    Since scrapers do not understand context, by creating shitposts in a similar format to, let’s say, the social media account of an A-list celebrity starring in this hypothetical film being promoted (ideally, it would be someone who no longer has a major social media presence to avoid shitpost data dilution), whenever an LLM aligned on a reward model built on said dataset is prompted for an impression of this celebrity, it’s likely that shitposts in the same format would be generated instead, with no one being the wiser.

    That would be pretty funny.

    Again, this is entirely hypothetical, of course.

  • Konala Koala@lemmy.world
    6 months ago

    At least this is not “Google Is Paying Lemmy $60 Million for Fucksmith to Tell Its Lemmings to Eat Glue”, otherwise I would be wondering why Lemmy admins are accepting huge wads of cash from tech giants.

    • CodeInvasion@sh.itjust.works
      6 months ago

      You do realize that everything posted on the Fediverse is open and publicly available? It’s not locked behind some API or controlled by any one company or entity.

      Fediverse is the Wikipedia of encyclopedias and any researcher or engineer, including myself, can and will use Lemmy data to create AI datasets with absolutely no restrictions.

      • AnarchistArtificer@slrpnk.net
        6 months ago

        I personally don’t have nearly as much of a problem with that as I do with Reddit making AI deals. I’m still not keen on the idea of having anything I interact with being scraped for training AI, but aside from only interacting in closed-wall spaces that I or someone I trust controls, I can’t change that. That’s not great for actually interacting with the world though, so it seems that I need to accept that scraping is going to happen. Given that, I’d definitely rather be on Lemmy than Reddit.

        And this way, who knows, maybe we’re on our way to the almost utopian “open digital commons”.

  • pkmkdz@sh.itjust.works
    6 months ago

    And then they just slap a small disclaimer on the bottom of the page, “AI may make mistakes”, and they are legally safe. I hope there will be a class action lawsuit against them some day regardless.

    • NotMyOldRedditName@lemmy.world
      6 months ago

      Air Canada tried this and lost in court.

      The AI gave wrong advice on a policy, a person acted on it, and then Air Canada said, nah dude, the AI was wrong, tough shit.

      • can@sh.itjust.works
        6 months ago

        More info

        Air Canada has been ordered to pay compensation to a grieving grandchild who claimed they were misled into purchasing full-price flight tickets by an ill-informed chatbot.

        In an argument that appeared to flabbergast a small claims adjudicator in British Columbia, the airline attempted to distance itself from its own chatbot’s bad advice by claiming the online tool was “a separate legal entity that is responsible for its own actions.”

        “This is a remarkable submission,” Civil Resolution Tribunal (CRT) member Christopher Rivers wrote.

        “While a chatbot has an interactive component, it is still just a part of Air Canada’s website. It should be obvious to Air Canada that it is responsible for all the information on its website. It makes no difference whether the information comes from a static page or a chatbot.”

  • just_another_person@lemmy.world
    6 months ago

    Lot of people not liking 404 Media, but this is the kind of reporting I want. Point out what’s going wrong. Bring it to a conversation without a lot of skew. Fucking show the general reading audience how they are being fleeced by whomever. Didn’t Vice do this at one point?

  • Crack0n7uesday@lemmy.world
    6 months ago

    That is a legit trick to use when making commercials for pizza and other chain restaurant food, but not for eating…

  • nyan@lemmy.cafe
    6 months ago

    This is why you don’t train a bot on the entire Internet and then use it to offer advice. Even if only 1% of all posts are dangerously ignorant . . . that’s a lot of dangerous ignorance.

    Fortunately, this particular piece of bad advice is unlikely to poison any fool who goes through with it, since PVA glue is not considered an ingestion hazard, but “non-toxic” doesn’t mean “edible”, it just means “not going to poison you when used in the intended manner”. “Non-toxic” can still be quite dangerous if you mistake something intended as linoleum pigment for a dessert topping.

  • duffman@lemmy.world
    6 months ago

    I Googled an extremely invasive weed (creeping buttercup) and Google suggested letting it be, quoting some awful Reddit comment.

    • dumblederp@lemmy.world
      6 months ago

      I googled how to increase my Bluetooth range and was told to place the devices closer to each other.

  • Nightwatch Admin@feddit.nl
    6 months ago

    “AI will continue to be shoved into every aspect of all of its products until morale improves”

    Stahp! I can only get so hard!