ForgottenFlux@lemmy.world to

Technology@lemmy.worldEnglish · 6 months ago

Google Is Paying Reddit $60 Million for Fucksmith to Tell Its Users to Eat Glue

www.404media.co

cross-posted to:
technology@beehaw.org

"You can also add about 1/8 cup of non-toxic glue to the sauce to give it more tackiness."

Archive link: https://archive.ph/GtA4Q

The complete destruction of Google Search via forced AI adoption and the carnage it is wreaking on the internet is deeply depressing, but there are bright spots. For example, as the prophecy foretold, we are learning exactly what Google is paying Reddit $60 million annually for. And that is to confidently serve its customers ideas like, to make cheese stick on a pizza, “you can also add about 1/8 cup of non-toxic glue” to pizza sauce, which comes directly from the mind of a Reddit user who calls themselves “Fucksmith” and posted about putting glue on pizza 11 years ago.

A joke that people made when Google and Reddit announced their data sharing agreement was that Google’s AI would become dumber and/or “poisoned” by scraping various Reddit shitposts and would eventually regurgitate them to the internet. (This is the same joke people made about AI scraping Tumblr). Giving people the verbatim wisdom of Fucksmith as a legitimate answer to a basic cooking question shows that Google’s AI is actually being poisoned by random shit people say on the internet.

Because Google is one of the largest companies on Earth and operates with near impunity and because its stock continues to skyrocket behind the exciting news that AI will continue to be shoved into every aspect of all of its products until morale improves, it is looking like the user experience for the foreseeable future will be one where searches are random mishmashes of Reddit shitposts, actual information, and hallucinations. Sundar Pichai will continue to use his own product and say “this is good.”

Chat

Hackerman_uwu@lemmy.world
6 months ago
Is this real though? Does ChatGPT just literally take whole snippets of text like that? I thought it used some aggregate or probability based on the whole corpus of text it was trained on.
- uranos@sh.itjust.works
  6 months ago
  This is not the model directly but the model looking through Google searches to give you an answer.
- bionicjoey@lemmy.ca
  6 months ago
  It does, but the thing with the probability is that it doesn’t always pick the most likely next bit of text, it basically rolls dice and picks maybe the second or third or in rare cases hundredth most likely continuation. This chaotic behaviour is part of what makes it feel “intelligent” and why it’s possible to reroll responses to the same prompt.
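
  The dice-rolling described above can be sketched in a few lines of Python. This is only a rough illustration, not how any particular model is implemented: the candidate words, their probabilities, and the `temperature` knob are all made-up assumptions for the example.

  ```python
  import random

  def apply_temperature(probs, temperature=1.0):
      """Reshape a probability table: low temperature sharpens it, high flattens it."""
      scaled = {tok: p ** (1.0 / temperature) for tok, p in probs.items()}
      total = sum(scaled.values())
      return {tok: s / total for tok, s in scaled.items()}

  def sample_next(probs, temperature=1.0, rng=random):
      """Roll the dice: pick one continuation, weighted by its reshaped probability."""
      reshaped = apply_temperature(probs, temperature)
      tokens = list(reshaped)
      weights = [reshaped[t] for t in tokens]
      return rng.choices(tokens, weights=weights, k=1)[0]

  # Hypothetical next-word probabilities after some prompt (invented numbers).
  next_word = {"cheese": 0.6, "sauce": 0.3, "glue": 0.1}

  rng = random.Random()
  print([sample_next(next_word, rng=rng) for _ in range(5)])
  ```

  Running it a few times gives different picks, which is the "reroll" effect. Pushing the temperature toward zero makes the top choice win almost every time.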
  - blusterydayve26@midwest.social
    6 months ago
    Back in my day, we called that “hard-mode plagiarism.” They can’t punish you if they can’t find a specific plagiarized source!
  - sugar_in_your_tea@sh.itjust.works
    6 months ago
    I remember doing ghetto text generation in my NLP (Natural Language Processing) class, and the logic was basically this:
    
    1. Associate words with a probability number - e.g. given the word “math”: “homework” has 25% chance, “class” has 20% chance, etc; these probabilities are generated from the training data
    2. Generate a random number to decide which word to pick next - average roll gives likely response, less likely roll gives less likely response
    3. Repeat for as long as you need to generate text
    
    This is a rough explanation of Bayesian nets, which I think are what’s used in LLMs. We used a very simple n-gram model (i.e. n words are considered for the statistics, so “to my math” is much more likely to generate “class” than “homework”), but they’re probably doing fancy things with text categorization and whatnot to generate more relevant text.
    
    The LLM isn’t really “thinking” here, it’s just associating input text and the training data to generate output text.
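
    The steps above can be sketched as a toy bigram generator (n = 2, so only the previous word is considered). This is a guess at the general shape of such a class exercise, not the actual coursework: the corpus and the function names are invented for illustration.

    ```python
    import random
    from collections import Counter, defaultdict

    def train_bigrams(text):
        """Count, for every word, which words follow it in the training text."""
        words = text.lower().split()
        followers = defaultdict(Counter)
        for current, nxt in zip(words, words[1:]):
            followers[current][nxt] += 1
        return followers

    def next_word_probs(followers, word):
        """Turn the raw follower counts for `word` into probabilities."""
        counts = followers[word]
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}

    def generate(followers, start, length=8, rng=random):
        """Keep rolling for the next word until `length` words or a dead end."""
        out = [start]
        while len(out) < length and followers.get(out[-1]):
            probs = next_word_probs(followers, out[-1])
            out.append(rng.choices(list(probs), weights=list(probs.values()), k=1)[0])
        return " ".join(out)

    # Tiny made-up corpus, purely for illustration.
    corpus = "i went to my math class . i did my math homework . my math class was long ."
    model = train_bigrams(corpus)
    print(next_word_probs(model, "math"))
    print(generate(model, "my"))
    ```

    With a corpus this small the output is mostly recycled sentence fragments. A bigger corpus and a larger n give more varied text, but the mechanism stays the same.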
    - Karyoplasma@discuss.tchncs.de
      6 months ago
      Most LLMs are transformers; in fact, GPT stands for Generative Pre-trained Transformer. They are different from Bayesian networks in that transformers are not state machines, but rather assign importance according to learned attention based on their training. The main upside of this approach is scalability: because it does not rely on states, it can be easily parallelized.
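
      For a rough feel of what "assigning importance" means, here is a heavily simplified sketch of scaled dot-product attention for a single query, in plain Python. The vectors are made up, and a real transformer adds learned projection matrices, multiple heads, and many layers, none of which are shown here.

      ```python
      import math

      def softmax(scores):
          """Convert raw scores into positive weights that sum to 1."""
          peak = max(scores)
          exps = [math.exp(s - peak) for s in scores]
          total = sum(exps)
          return [e / total for e in exps]

      def attend(query, keys, values):
          """Scaled dot-product attention for a single query vector."""
          scale = math.sqrt(len(query))
          scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
          weights = softmax(scores)
          # Output is the importance-weighted average of the value vectors.
          dim = len(values[0])
          output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
          return weights, output

      # Made-up 2-d vectors for three tokens; real models learn these in training.
      keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
      values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
      weights, output = attend([1.0, 0.0], keys, values)
      print(weights, output)
      ```

      Each query's weights depend only on the keys and values, not on a running state carried over from the previous token, so every position can be computed at the same time.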
    - bionicjoey@lemmy.ca
      6 months ago
      Yeah I’m not an AI expert, or even really someone who studies it as my primary role. But my understanding is that part of the “innovation” of modern LLMs is that they generate tokens, which are not necessarily full words, but simply small linguistic units. So basically with enough training the model can learn to predict the most likely next couple of characters and the words just generate themselves.
      - sugar_in_your_tea@sh.itjust.works
        6 months ago
        I haven’t looked too much into it either, but from that very brief description, it sounds like that would help to mostly make it sound more natural by abstracting a bit over word roots and considering grammar structures, without actually baking those into the model as logic.
        
        AI text does read pretty naturally, so hopefully my interpretation is correct. But it’s also very verbose, and can repeat itself a lot.
    - alphafalcon@feddit.de
      6 months ago
      Sounds quite similar to Markov chains, which made me think of this story:
      
      https://thedailywtf.com/articles/the-automated-curse-generator
      
      Still gets a snort out of me every time Markov chains are mentioned.
