Wikipedia is under assault: rogue users keep posting AI generated nonsense

ForgottenFlux@lemmy.world · edited 1 month ago

drunkpostdisaster@lemmy.world · 1 month ago

It’s over. We lost.

🇰 🌀 🇱 🇦 🇳 🇦 🇰 ℹ️@yiffit.net · 1 month ago

fights back by posting human-generated nonsense

OhmsLawn@lemmy.world · 1 month ago

Oh God. This is horrible news. So incredibly frustrating to hear.

randon31415@lemmy.world · 1 month ago

If anyone can survive the AI text apocalypse, it is Wikipedia. They have been fending off and regulating article-writing bots since someone coded up a US town article writer from the 2000 census (not the 2010 or 2020 census, the 2000 census; this bot was writing Wikipedia articles in 2003).

jagged_circle@feddit.nl · 1 month ago

Well, for everything except fictional articles. That’s the hardest for them, historically.

T156@lemmy.world · 1 month ago

Hopefully they tightened things up after the Scots incident.

rottingleaf@lemmy.world · 1 month ago

What Scots incident?

T156@lemmy.world · 1 month ago

A considerable number of the articles written in Scots weren’t written in Scots. The most prolific writer of the Scots articles was an American teen with no knowledge of Scots, who was more or less just writing them in a Scottish accent.

rottingleaf@lemmy.world · 1 month ago

Yep, I recalled that.

A certain number of Russians with Cossack roots would do this with Ukrainian on the web, causing a bit less butthurt because TBH a lot of Ukrainians don’t speak anything like proper Ukrainian, but a mix of Ukrainian and Russian, and a lot of the rest speak dialects that still differ from the standard.

rottingleaf@lemmy.world · 1 month ago

Ah. That.

kibiz0r@midwest.social · 1 month ago

Unleashing generative AI on the world was basically the information equivalent of jumping headfirst into Kessler Syndrome.

khannie@lemmy.world · 1 month ago

For the uninitiated like me:

The Kessler syndrome (also called the Kessler effect,[1][2] collisional cascading, or ablation cascade), proposed by NASA scientists Donald J. Kessler and Burton G. Cour-Palais in 1978, is a scenario in which the density of objects in low Earth orbit (LEO) due to space pollution is numerous enough that collisions between objects could cause a cascade in which each collision generates space debris that increases the likelihood of further collisions.

Wikipedia link.

kibiz0r@midwest.social · 1 month ago

Good call, thank you.

Also: Referencing Wikipedia in this context is kinda funny.

khannie@lemmy.world · 1 month ago

I did think that. :) It’s just… So good. I hope it never enshittifies. God help us.

narc0tic_bird@lemm.ee · 1 month ago

Best case is that the model used to generate this content was originally trained on data from Wikipedia, so it “just” generates a worse, hallucinated “variant” of the original information. Goes to show how stupid this idea is.

Imagine this in a loop: AI trained on Wikipedia that then alters content on Wikipedia, which in turn gets picked up by the next model trained. It would just get worse and worse, similar to how re-encoding the same video over and over again yields continuously worse results.
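
To make the loop concrete, here is a toy sketch in Python. It is only an illustration under loud assumptions, not how any real model is trained: the "content" is a list of numbers, "training" means fitting a mean and a spread, and the hypothetical `keep_fraction` parameter stands in for a model that under-represents rare outputs. The function name `simulate_feedback_loop` and all defaults are made up for this example.

```python
import random
import statistics


def simulate_feedback_loop(generations=10, sample_size=1000,
                           keep_fraction=0.9, seed=0):
    """Return the spread (standard deviation) of the data after each generation."""
    rng = random.Random(seed)
    # Generation 0: "human-written" data, modeled as draws from a standard normal.
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    spreads = [statistics.pstdev(data)]
    for _ in range(generations):
        # "Train" a model: fit a mean and a spread to the current data.
        mean = statistics.fmean(data)
        spread = statistics.pstdev(data)
        # "Generate" new content by sampling from the fitted model.
        generated = [rng.gauss(mean, spread) for _ in range(sample_size)]
        # Assumption: rare outputs are under-represented, so the samples
        # farthest from the mean never make it into the next training set.
        generated.sort(key=lambda x: abs(x - mean))
        data = generated[: int(sample_size * keep_fraction)]
        spreads.append(statistics.pstdev(data))
    return spreads


if __name__ == "__main__":
    for generation, spread in enumerate(simulate_feedback_loop()):
        print(f"generation {generation}: spread = {spread:.3f}")
```

In this toy setup the spread shrinks generation after generation, which is the numeric analogue of every article drifting toward the same bland text. Setting `keep_fraction=1.0` removes the assumed bias, and the shrinkage then becomes much slower and noisier.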

8uurg@lemmy.world · 1 month ago

A very similar situation to that analysed in this paper that was recently published. The quality of what is generated degrades significantly.

Although they mostly investigate replacing the data with AI-generated data in each step, so I doubt the effect will be as pronounced in practice. Human writing will still be included, and even curation of AI-generated text by people can skew the distribution of the training data (as the review process by these editors inevitably would, since reasonable-looking text can slip through the cracks).

Blaster M@lemmy.world · 1 month ago

AI model makers are very well aware of this, and there is a move from ingesting everything to curating datasets more aggressively. Data prep is something many upstarts have no idea is critical, but everyone is learning about it, sometimes the hard way.

Zorque@lemmy.world · 1 month ago

Every article would end up being the philosophy page.

huginn@feddit.it · 1 month ago

See also: model collapse

(Which is more or less just regression towards the mean with more steps)

Captain Aggravated@sh.itjust.works · 1 month ago

Eventually every article just reads “Delve delve delve delve delve delve delve.”

Wrench@lemmy.world · 1 month ago

Yes, this is what many of us worry the internet in general will become: AI content generated by AI trained on AI garbage.

AI bots can trivially outpace humans.

kboy101222@sh.itjust.works · 1 month ago

I was just discussing with a friend of mine how we’re rapidly approaching the dead internet. At some point, many websites will likely just be chat bots talking to other chat bots, which then gets used to train further chat bots. Human-made content is already becoming harder and harder to find on algorithm-heavy websites like Reddit and Facebook’s suite of sites. The bots can easily outpace any algorithmic changes the sites might make to deter them; my Facebook-using family members all constantly block those weird Jesus accounts, and they still show up constantly.

vext01@lemmy.sdf.org · 1 month ago

Slop!

RubberDuck@lemmy.world · 1 month ago

Require someone that wants to add stuff to pay a small amount to the Wikimedia Foundation for activating their account and refund it if they moderate a certain amount.

aubertlone@lemmy.world · 1 month ago

Yeah, I mean, I’ve had minor edits reversed because I didn’t source the fact properly.

And that was like 10 years ago. I’m surprised these edits are getting through in the first place.

Shdwdrgn@mander.xyz · 1 month ago

Seems like that would be an easy problem to solve… require all edits to have a peer review by someone with a minimum credibility before they go live. I can understand when Wikipedia was new, allowing anyone to post edits or new content helped them get going. But now? Why do they still allow any random person to post edits without a minimal amount of verification? Sure it self-corrects given enough time, but meanwhile what happens to all the people looking for factual information and finding trash?

sugar_in_your_tea@sh.itjust.works · 1 month ago

Or at least give it a certain amount of time before it goes live. So if nobody comes around to approve it in 24 hours, it goes live.

Usually bad edits are corrected within hours, if not minutes, so that should catch the lion’s share w/o bogging down the approval queue too much.
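
A rough sketch of that rule in Python, purely as an illustration: the names `PendingEdit`, `is_live` and `HOLD_PERIOD` are made up here and do not correspond to any actual MediaWiki feature or setting. The only number taken from the comment above is the 24-hour window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical hold period, taken from the 24 hours suggested above.
HOLD_PERIOD = timedelta(hours=24)


@dataclass
class PendingEdit:
    text: str
    submitted_at: datetime
    approved: bool = False   # a reviewer explicitly accepted the edit
    reverted: bool = False   # a reviewer rejected the edit


def is_live(edit: PendingEdit, now: datetime) -> bool:
    """Decide whether readers should see this edit at time `now`."""
    if edit.reverted:
        return False  # rejected edits never go live
    if edit.approved:
        return True   # approved edits go live immediately
    # Nobody looked at it: it goes live once the hold period has passed.
    return now - edit.submitted_at >= HOLD_PERIOD
```

An edit that nobody touches stays hidden for the first day, then shows up on its own, so reviewers only need to act on the bad ones.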

RubberDuck@lemmy.world · 1 month ago

Crowdsourcing is the strength that led to this vast resource, and also the weakness, as displayed here. So there will probably be a need for some form of barrier. Hence my suggestion.

oldfart@lemm.ee · 1 month ago

Link it to your real identity, brilliant idea 👌

Daemon Silverstein@thelemmy.club · 1 month ago

Dead Internet Theory all over again.

Kalkaline@leminal.space · 1 month ago

Lock it down. Make people get confirmed editing access face to face with trusted users.

e$tGyr#J2pqM8v@feddit.nl · edited 1 month ago

Sabotage Wikipedia, DDoS the Internet Archive. Makes you wonder if in the future we’re going to forget our past. Will actual history be obscured in a sea of alternative histories, unrecognizably presented as the same thing? Maybe we need to keep some books lying around in archives just to be sure.

TachyonTele@lemm.ee · edited 1 month ago

The digital dark age will be a real thing, absolutely.

Interesting idea on a sea of alternative histories. That might be a possible threat.
Someone else here called it “AI text apocalypse”. I like that term.

endofline@lemmy.ca · 1 month ago

We still have Anna’s Archive, Sci-Hub, LibGen and old-fashioned traditional libraries (including the national ones). National libraries won’t disappear in the coming years; maybe they will rot due to defunding, but they will still exist.

lolola@lemmy.blahaj.zone · 1 month ago

I hate to post because I have loved and trusted Wikipedia for years, but the fact that there are folks out there who equally trust what AI tools generate just baffles me.

Dragonstaff@leminal.space · 1 month ago

The signal to noise ratio is so low these days. There’s so much information out there but everyone wants to profit from you before you can get it. Even worse, the people with good information generally can’t buy as big a megaphone as the people who profit from lying to you.

Honestly, I think humans have been more likely to believe an easy lie than a hard truth all along, but it’s easier than ever these days.

YeetPics@mander.xyz · 1 month ago

Damn Putin, you’re retarded.

foenkyfjutschah@programming.dev · 1 month ago

the name is Altman.

Aatube@kbin.melroy.org · 1 month ago

Don’t worry, it’s not as bad as the title suggests. The attack on Internet Archive is far, far worse. It’s obviously a bit of a problem, though.We

CALIGVLA@lemmy.dbzer0.com · 1 month ago

We what? WE WHAT?!

Aatube@kbin.melroy.org · 1 month ago

oops

(I accidentally added an extra “we” to the end at first)

WhatsHerBucket@lemmy.world · 1 month ago

This is why we can’t have nice things

e$tGyr#J2pqM8v@feddit.nl · 1 month ago

We still have it, and it’s quite nice.