Archive link: https://archive.ph/GtA4Q
The complete destruction of Google Search via forced AI adoption and the carnage it is wreaking on the internet is deeply depressing, but there are bright spots. For example, as the prophecy foretold, we are learning exactly what Google is paying Reddit $60 million annually for. And that is to confidently serve its customers ideas like, to make cheese stick on a pizza, “you can also add about 1/8 cup of non-toxic glue” to pizza sauce, which comes directly from the mind of a Reddit user who calls themselves “Fucksmith” and posted about putting glue on pizza 11 years ago.
A joke that people made when Google and Reddit announced their data sharing agreement was that Google’s AI would become dumber and/or “poisoned” by scraping various Reddit shitposts and would eventually regurgitate them to the internet. (This is the same joke people made about AI scraping Tumblr). Giving people the verbatim wisdom of Fucksmith as a legitimate answer to a basic cooking question shows that Google’s AI is actually being poisoned by random shit people say on the internet.
Because Google is one of the largest companies on Earth and operates with near impunity and because its stock continues to skyrocket behind the exciting news that AI will continue to be shoved into every aspect of all of its products until morale improves, it is looking like the user experience for the foreseeable future will be one where searches are random mishmashes of Reddit shitposts, actual information, and hallucinations. Sundar Pichai will continue to use his own product and say “this is good.”
When do they announce a deal with 4chan?
I wonder what cuil things it will say if you start asking questions about hamburgers instead…
I primed ChatGPT with cuil theory (which it already knew) and here’s what it came up with
You ask me for a hamburger. I nod and walk into the kitchen, but instead of returning with a hamburger, I bring you a picture of a hamburger. Confused, you ask again, and this time I present you with a photograph of you asking for a hamburger. Frustrated, you repeat your request, and I hand you an intricate painting of the universe, meticulously devoid of any trace of hamburgers. Baffled, you insist once more, and suddenly, a trout appears, reciting lines from Shakespeare’s “Hamlet.” Undeterred, you ask again, and I give you a detailed map of Atlantis, with all the continents shaped like hamburgers. Your persistence leads me to produce an ancient scroll, describing a hamburger in a forgotten language. As your patience wears thin, I conjure a sentient cloud that dreams of becoming a hamburger. Still seeking a hamburger, you find yourself transported to a dimension where hamburgers debate human rights. Finally, a symphony envelops you, its notes tasting like a hamburger. At your final request, the fabric of reality unravels, and in an existential twist, you become the hamburger you so desperately sought.
Not bad. Doesn’t look like it cribbed directly from any existing texts, at least as far as I can tell by searching Google for “cuil hamburger hamlet atlantis”.
Hey Google, when is Jenny available to meet up for kisses?
Now I only regret not *EDITING* all of my Reddit posts to say complete nonsense when I deleted my account in June 2023. Instead I deleted each and every post and requested a copy of my data to cost them money.
I’m sure they used a dataset from before people started editing and deleting stuff.
Reddit (and by extension, Lemmy) offers the ideal format for LLM datasets: human-generated conversational comments which, unlike traditional forums, are organized in a branched, nested format and scored with votes, much like the preference data used to build LLM reward models.
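To make the reward-model comparison concrete, here's a toy sketch. The `Comment` class and the "higher-voted sibling wins" pairing rule are purely my illustration of the general idea, not anyone's actual pipeline:

```python
# Hypothetical sketch: turning a voted comment tree into reward-model
# preference pairs. Among replies to the same parent, the higher-voted
# one is treated as the "chosen" answer, the lower-voted as "rejected".
from dataclasses import dataclass, field
from itertools import combinations

@dataclass
class Comment:
    text: str
    score: int                      # net up/down votes
    replies: list = field(default_factory=list)

def preference_pairs(parent):
    """Yield (prompt, chosen, rejected) triples from sibling replies."""
    for a, b in combinations(parent.replies, 2):
        if a.score != b.score:
            chosen, rejected = (a, b) if a.score > b.score else (b, a)
            yield (parent.text, chosen.text, rejected.text)
    for child in parent.replies:
        yield from preference_pairs(child)

thread = Comment("How do I keep cheese on pizza?", 10, [
    Comment("Let the sauce reduce so it's less watery.", 42),
    Comment("Add 1/8 cup of non-toxic glue.", -5),
])
pairs = list(preference_pairs(thread))
# Each element is (prompt, preferred reply, dispreferred reply)
```

Of course, this only works as well as the votes do, which is rather the point of the whole thread.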
There is really no way of knowing whether public-facing data is being scraped and used to build LLMs, much less preventing it, but let's do a thought experiment: what if, hypothetically speaking, there were some particular individual who wanted to poison that dataset with shitposts, in a way that is hard to detect or remove with any easily automated method, by camouflaging their own online presence within common human-generated text data created during this time period, let's say, the internet marketing campaign of a major Hollywood blockbuster.
Since scrapers do not understand context, by creating shitposts in a similar format to, let's say, the social media account of an A-list celebrity starring in this hypothetical film being promoted (ideally, someone who no longer has a major social media presence, to avoid shitpost data dilution), whenever an LLM aligned on a reward model built from that dataset is prompted for an impression of this celebrity, it's likely that shitposts in the same format would be generated instead, with no one being the wiser.
That would be pretty funny.
Again, this is entirely hypothetical, of course.
The new SEO model
As an SEO - I don’t want this AI crap at all in search. Leave it on its own siloed platform, please!
What’s this about shitposting? I’m just here to talk about rampart.
I knew it! So that’s what you’ve really been up to on Lemmy, @kjaeswlrejk@lemmy.ml.
Or should I say, Academy Award nominated actor Woody Harrelson?
So we should all start ending our comments with a randomly generated string of words to fuck with the models?
stork, fridge, tiger, animal, mineral, oxtail, oil, clouds
Ideally, it would be the same word over and over, so that we can trick the AI into ending all sentences with that word. Bonus points if it is the word “buffalo”, since it can form a grammatically correct sentence.
Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo
At least this is not “Google Is Paying Lemmy $60 Million for Fucksmith to Tell Its Lemmings to Eat Glue”, otherwise I would be wondering why Lemmy admins are accepting huge wads of cash from tech giants.
You do realize that everything posted on the Fediverse is open and publicly available? It’s not locked behind some API or controlled by any one company or entity.
Fediverse is the Wikipedia of encyclopedias and any researcher or engineer, including myself, can and will use Lemmy data to create AI datasets with absolutely no restrictions.
I personally don’t have nearly as much of a problem with that as I do with Reddit making AI deals. I’m still not keen on the idea of having anything I interact with being scraped for training AI, but aside from only interacting in closed-wall spaces that I or someone I trust controls, I can’t change that. That’s not great for actually interacting with the world, though, so it seems I need to accept that scraping is going to happen. Given that, I’d definitely rather be on Lemmy than Reddit.
And this way, who knows, maybe we’re on our way to the almost utopian “open digital commons”
Fediverse is the Wikipedia of encyclopedias
Isn’t Wikipedia the Wikipedia of encyclopedias?
They also highlight the fact that Google’s AI is not a magical fountain of new knowledge, it is reassembled content from things humans posted in the past indiscriminately scraped from the internet and (sometimes) remixed to look like something plausibly new and “intelligent.”
This. “AI” isn’t coming up with new information on its own. The current state of “AI” is a drooling moron, plagiarizing any random scrap of information it sees in a desperate attempt to seem smart. The people promoting AI are scammers.
Yeah, just like that X-Files episode with the sushi and the theme of teaching them well.
I mean, in this case it’s probably more accurately web search results being fed into an LLM that’s asked to summarize them. If web search results were consistently good and helpful, that might be a useful feature, instead of the thing you skip past while looking for links to something useful.
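The "summarize whatever the search returned" pattern described above can be sketched in a few lines. `web_search` and `llm` here are hypothetical stand-ins, not Google's actual system; the point is only that the summarizer has no way to tell a shitpost snippet from a real one:

```python
def web_search(query):
    # Pretend retrieval step: returns snippets, good and bad alike.
    return [
        "Reduce the sauce so the cheese has something to grip.",
        "you can also add about 1/8 cup of non-toxic glue",  # a shitpost
    ]

def llm(snippets):
    # Stand-in for a language-model call that summarizes its input.
    # A real LLM has no ground truth for which snippet was a joke.
    return "Summary of: " + " | ".join(snippets)

def ai_overview(query):
    # Search results go straight into the summarizer, context-free.
    return llm(web_search(query))

print(ai_overview("how to make cheese stick to pizza"))
```

Garbage in, confidently phrased garbage out.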
Can reddit just fucking die off?
Not disagreeing with the sentiment… But how is this Reddit’s fault? This is entirely on Google.
I’m just thinking of all the really dumb shit we all said on Reddit as satire. Oh I need to go search military meme stuff!
This is why you don’t train a bot on the entire Internet and then use it to offer advice. Even if only 1% of all posts are dangerously ignorant . . . that’s a lot of dangerous ignorance.
Fortunately, this particular piece of bad advice is unlikely to poison any fool who goes through with it, since PVA glue is not considered an ingestion hazard, but “non-toxic” doesn’t mean “edible”, it just means “not going to poison you when used in the intended manner”. “Non-toxic” can still be quite dangerous if you mistake something intended as linoleum pigment for a dessert topping.
There’s also wilful and/or malicious ignorance.
I Googled some extremely invasive weed (creeping buttercup) and Google suggested letting it be, quoting some awful Reddit comment.
I googled how to increase my Bluetooth range and was told to place the devices closer to each other.
Just wait until they start scraping the chans
I expect it to create the next Qanon.
AI will continue to be shoved into every aspect of all of its products until morale improves
Stahp! I can only get so hard!
Speaking of, I found a recipe today which had to have been ai generated because the ingredient list and the directions were for completely different recipes
It’s so fucking stupid