- cross-posted to:
- technology@beehaw.org
Archive link: https://archive.ph/GtA4Q
The complete destruction of Google Search via forced AI adoption and the carnage it is wreaking on the internet is deeply depressing, but there are bright spots. For example, as the prophecy foretold, we are learning exactly what Google is paying Reddit $60 million annually for. And that is to confidently serve its customers ideas like, to make cheese stick on a pizza, “you can also add about 1/8 cup of non-toxic glue” to pizza sauce, which comes directly from the mind of a Reddit user who calls themselves “Fucksmith” and posted about putting glue on pizza 11 years ago.
A joke that people made when Google and Reddit announced their data sharing agreement was that Google’s AI would become dumber and/or “poisoned” by scraping various Reddit shitposts and would eventually regurgitate them to the internet. (This is the same joke people made about AI scraping Tumblr). Giving people the verbatim wisdom of Fucksmith as a legitimate answer to a basic cooking question shows that Google’s AI is actually being poisoned by random shit people say on the internet.
Because Google is one of the largest companies on Earth and operates with near impunity and because its stock continues to skyrocket behind the exciting news that AI will continue to be shoved into every aspect of all of its products until morale improves, it is looking like the user experience for the foreseeable future will be one where searches are random mishmashes of Reddit shitposts, actual information, and hallucinations. Sundar Pichai will continue to use his own product and say “this is good.”
Speaking of, I found a recipe today that had to have been AI-generated, because the ingredient list and the directions were for completely different recipes.
I’m just thinking of all the really dumb shit we all said on Reddit as satire. Oh I need to go search military meme stuff!
Now I only regret not EDITING all of my Reddit posts to say complete nonsense when I deleted my account in June 2023. Instead I deleted each and every post and requested a copy of my data to cost them money.
I’m sure they used a dataset from before people started editing and deleting stuff.
Reddit, and by extension, Lemmy, offers the ideal format for LLM datasets: human generated conversational comments, which, unlike traditional forums, are organized in a branched nested format and scored with votes in the same way that LLM reward models are built.
There is really no way to know whether public-facing data is being scraped and used to build LLMs, much less prevent it, but let’s do a thought experiment: what if, hypothetically speaking, there were some particular individual who wanted to poison that dataset with shitposts in a way that is hard to detect or remove with any easily automated method, by camouflaging their own online presence within common human-generated text data created during this time period, let’s say, the internet marketing campaign of a major Hollywood blockbuster?
Since scrapers do not understand context, by creating shitposts in a similar format to, let’s say, the social media account of an A-list celebrity starring in this hypothetical film being promoted (ideally, it would be someone who no longer has a major social media presence, to avoid shitpost data dilution), whenever an LLM aligned on a reward model built on said dataset is prompted for an impression of this celebrity, it’s likely that shitposts in the same format would be generated instead, with no one being the wiser.
That would be pretty funny.
Again, this is entirely hypothetical, of course.
So we should all start ending our comments with a randomly generated string of words to fuck with the models?
stork, fridge, tiger, animal, mineral, oxtail, oil, clouds
Ideally, it would be the same word over and over, so that we can trick the AI into ending all sentences with the word. Bonus points if it is the word “buffalo”, since it can form a grammatically correct sentence.
Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo
What’s this about shitposting? I’m just here to talk about rampart.
I knew it! So that’s what you’ve really been up to on Lemmy, @kjaeswlrejk@lemmy.ml.
Or should I say, Academy Award nominated actor Woody Harrelson?
The new SEO model
As an SEO - I don’t want this AI crap at all in search. Leave it on its own siloed platform, please!
At least this is not “Google Is Paying Lemmy $60 Million for Fucksmith to Tell Its Lemmings to Eat Glue”, otherwise I would be wondering why Lemmy admins are accepting huge wads of cash from tech giants.
You do realize that everything posted on the Fediverse is open and publicly available? It’s not locked behind some API or controlled by any one company or entity.
Fediverse is the Wikipedia of encyclopedias and any researcher or engineer, including myself, can and will use Lemmy data to create AI datasets with absolutely no restrictions.
Fediverse is the Wikipedia of encyclopedias
Isn’t Wikipedia the Wikipedia of encyclopedias?
I personally don’t have nearly as much of a problem with that as I do with Reddit making AI deals. I’m still not keen on the idea of having anything I interact with being scraped for training AI, but aside from only interacting in closed wall spaces that I or someone I trust controls, I can’t change that. That’s not great for actually interacting with the world though, so it seems that I need to accept that scraping is going to happen. Given that, I’d definitely rather be on Lemmy than Reddit.
And this way, who knows, maybe we’re on our way to the almost utopian “open digital commons”
Money well spent.
And then they just slap a small disclaimer on the bottom of the page, “AI may make mistakes”, and they are safe legally. I hope there will be a class action lawsuit against them some day regardless.
Air Canada tried this and lost in court.
The AI gave wrong advice on a policy, person acted on it, and then Air Canada said, nah dude, the AI was wrong, tough shit.
Air Canada has been ordered to pay compensation to a grieving grandchild who claimed they were misled into purchasing full-price flight tickets by an ill-informed chatbot.
In an argument that appeared to flabbergast a small claims adjudicator in British Columbia, the airline attempted to distance itself from its own chatbot’s bad advice by claiming the online tool was “a separate legal entity that is responsible for its own actions.”
“This is a remarkable submission,” Civil Resolution Tribunal (CRT) member Christopher Rivers wrote.
“While a chatbot has an interactive component, it is still just a part of Air Canada’s website. It should be obvious to Air Canada that it is responsible for all the information on its website. It makes no difference whether the information comes from a static page or a chatbot.”
shitpost is praxis
Lot of people not liking 404 Media, but this is the kind of reporting I want. Point out what’s going wrong. Bring it to a conversation without a lot of skew. Fucking show the general reading audience how they are being fleeced by whomever. Didn’t Vice do this at one point?
I saw this exact same “reporting” on the Verge and several other sites yesterday and earlier in the week, and without the paywall 404 has halfway down the article.
I recall Vice doing that at one time also.
They were always hit-or-miss, but we’re all worse off for them getting eaten by a hedge fund.
Isn’t 404 Media the guys from Vice who left before it imploded?
https://www.nytimes.com/2023/08/22/business/media/404-media-vice-motherboard.html
Apparently so! I dunno how to remove the paywall for others I just use reader mode.
Just create an account, it’s free.
And give them my data? Nahhh
The article’s author was the Editor-in-chief of Vice’s Motherboard as stated in his bio.
Maybe. All I know Vice for is articles like “What’s the sexiest sex in the sexroom among sexy sexers” or something like that. So the average r/askreddit post.
So if they were basically regurgitating Reddit already, does that mean they were using AI before it was cool? They might have just used the Amazon approach to AI (i.e., why use technology when we can throw a bunch of minimum-wage workers at the problem).
That is a legit trick to use when making commercials for pizza and other chain restaurant food, but not for eating…
I don’t think you read it closely. It says “non-toxic” glue. Sounds legit for eating.
Elmer’s is non-toxic.
You sound like an ad…
Lead paint was non toxic until it wasn’t…
Glad I’m old enough to have eaten it back when it wasn’t!
This is why you don’t train a bot on the entire Internet and then use it to offer advice. Even if only 1% of all posts are dangerously ignorant . . . that’s a lot of dangerous ignorance.
Fortunately, this particular piece of bad advice is unlikely to poison any fool who goes through with it, since PVA glue is not considered an ingestion hazard, but “non-toxic” doesn’t mean “edible”, it just means “not going to poison you when used in the intended manner”. “Non-toxic” can still be quite dangerous if you mistake something intended as linoleum pigment for a dessert topping.
There’s also wilful and/or malicious ignorance.
I Googled an extremely invasive weed (creeping buttercup) and Google suggested letting it be, quoting some awful Reddit comment.
I Googled how to increase my Bluetooth range and was told to place the devices closer to each other.
Just wait until they start scraping the chans
I expect it to create the next Qanon.
AI will continue to be shoved into every aspect of all of its products until morale improves
Stahp! I can only get so hard!
Hey Google, when is Jenny available to meet up for kisses?