ElCanut@jlai.lu · 2 years ago

Data contamination expert 👌

Adalast@lemmy.world · 2 years ago

OpenAI team after including the data: why is the model suddenly even more horny, abusive, and discriminatory?

crackajack@reddthat.com · 2 years ago

Why does Spez want to sell data? To buy a new yacht?

I will delete my data from Reddit then.

Uncle Roach@lemmy.world · 2 years ago

Wait how do i do that?

TwanHE@lemmy.world · 2 years ago

There were some scripts for it. But I can still find my comments and posts through Google after deleting them.

Don't think Reddit will let you take “their” (your) content away.

apemint@lemmy.world · 2 years ago

Oh, shit you’re right.

I wiped my whole profile years ago (with a script that overwrites your comments before deleting) but they’re still visible everywhere except in my profile.

Isn’t this bullshit illegal?

Beefalo@midwest.social · 2 years ago

Try out Shreddit, it’s a web app for exactly this. It even lets you filter by post karma so you can keep your hits. I’ve never used it but that’s the name that came up over on Reddit from everyone talking about the announcement.

MBM@lemmings.world · 2 years ago

Don't think Reddit will let you take “their” (your) content away.

There should be a way to do it under GDPR

TwanHE@lemmy.world · 2 years ago

I filed a right to erasure request as well. Never got a response/nothing happened, but currently not in the position to take them to court over it.

benignintervention@lemmy.world · 2 years ago

I wonder how much these models are now learning from spam they were used to generate

THE MASTERMIND@feddit.ch · 2 years ago

All of them

Kbin_space_program@kbin.social · 2 years ago

Time to make a lot of wandering dwarf bots on Reddit that post variations of various game phrases all over, so the LLM-based bots just spout “Rock and Stone” and “This is my favourite store on the Citadel”?

Ilovethebomb@lemm.ee · 2 years ago

Thing is, you could use a bot to do nothing but post pop culture references, and it would be indistinguishable from a garden variety Redditor. Reddit is one of the worst places to train an AI.

LordOfTheChia@lemmy.world · edit-2 2 years ago

Johnson! Why the hell is your report the most unintelligible thing I’ve read since nineteen ninety-eight, when the Undertaker threw Mankind off Hell in a Cell, and plummeted sixteen feet through an announcer’s table.

SatansMaggotyCumFart@lemmy.world · 2 years ago

Haha poop knife jolly rancher broken arms blowfly girl.

theneverfox@pawb.social · 2 years ago

I don’t recognize one of those… I’m not going to say which one, because I’m sure I don’t need to know

TropicalDingdong@lemmy.world · 2 years ago

I used some tools to corrupt about 10 years of comments and posts of mine.

Ragnarok314159@sopuli.xyz · 2 years ago

I think Reddit caught on to this. I tried destroying my comment history (~7 years with 600k karma) with a few of the available tools on GitHub.

Found my account permabanned next time trying to login. People should attempt to eliminate/poison as much as possible, but Reddit has all the comments and modifications in a database somewhere to sell it all to whatever AI is the highest bidder.

They have to do something to make money after taking away awards. The advertising is absolute shit and not worth the $100 entry fee.

ElCanut@jlai.lu · 2 years ago

Can’t post a genius idea like this one without posting the links of the tools

Sabin10@lemmy.world · 2 years ago

A tool like that would almost definitely require API access to function. If that was still possible, most of us wouldn’t be here having this conversation.

prettybunnys@sh.itjust.works · 2 years ago

Most of them just do webpage stuff via browser extensions.

They just automate it

TropicalDingdong@lemmy.world · 2 years ago

A tool like that would almost definitely require api access to function. If that was still possible, most of us wouldn’t be here having this conversation.

No it didn’t use the API. You had to run it in browser and be logged in to reddit.

dependencyinjection@discuss.tchncs.de · 2 years ago

The tool I used had an extension for Firefox. You then used that Reddit extension so you could get more scrolling on your post history. Then you pressed a button and it would insert gibberish for all comments and posts. Then you’d go next page and do it again.

TropicalDingdong@lemmy.world · 2 years ago

It's not my idea, but I could probably dig up the tool I used. Dollars to donuts, it doesn’t work any more.

This might have been the tool I used. I don't think so, because I overwrote everything with one message, but Google around and you’ll find similar ones.

https://github.com/adriantache/YARCO

RecallMadness@lemmy.nz · edit-2 2 years ago

This would be better if it fed the parent comment into ChatGPT prefixed with “create a plausible but factually incorrect aggressive response to <comment>”

Feed the machine to the machine!

TropicalDingdong@lemmy.world · 2 years ago

be the change you want to see

Maalus@lemmy.world · 2 years ago

If you overwrote with a single message, then your messages are back to what they were.

KnightontheSun@lemmy.world · 2 years ago

Not necessarily true. I overwrote several thousand comments with a different tool and used three different quotes on greed. I have periodically checked and about two dozen came back. I just manually changed them at that point.

VaultBoyNewVegas@lemmy.world · 2 years ago

I edited mine via a tool to say fuck Reddit and Steve Huffman is a greedy pig boy.

Octopus1348@lemy.lol · 2 years ago

What do you mean by corrupt?

PlasmaDistortion@lemm.ee · 2 years ago

I used a tool that edited my comments to replace them with gibberish. Supposedly Reddit still retains deleted comments, but if you edit them, it only keeps the latest version. So by editing them you make the comments worthless.

Octopus1348@lemy.lol · 2 years ago

I also edited my comments to be basically a Lemmy ad and completely deleted the posts except in a few communities where it could be helpful in the future.

citrusface@lemmy.world · 2 years ago

What tool? I’d like to use it as well.

bobs_monkey@lemm.ee · 2 years ago

I used redact.dev

citrusface@lemmy.world · 2 years ago

Thank you

Jo Miran@lemmy.ml · 2 years ago

It replaces them with gibberish. I did the same for my 12+ years' worth.

TropicalDingdong@lemmy.world · 2 years ago

I ran a script over all of my comments (through my browser) to edit them into something about how spez had backstabbed the community. I had tens? hundreds of thousands? of comments.

It took several hours to run, but I did a forward pass (newest to oldest) and a backwards pass (oldest to newest). It bugged out because it had to run so long but I think I got it all.

I’m not sure this will really do anything, because you could pretty easily statistically isolate anyone who did what I did and roll their account history back to a prior state in the training data.

Regardless, it was the least I could do on the way out the door.

teamevil@lemmy.world · 2 years ago

I simply got permabanned and my account disappeared.

mp04610@lemm.ee · 2 years ago

While that’s the correct thing to do in my opinion, it would be a mistake to assume that Reddit didn’t store your original comments.

By corrupting their dataset, you may actually be helping them recognize maliciously edited comments.

flambonkscious@sh.itjust.works · 2 years ago

Mass edits made rapidly are obviously suspect, too… If the same user edits more than a dozen comments in, say, a minute, you have to ask what’s going on.
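The heuristic in the comment above (more than a dozen edits in about a minute) can be sketched as a sliding-window check. This is a minimal illustration only, assuming edit timestamps in seconds sorted in ascending order; the function name and thresholds are hypothetical and say nothing about how Reddit actually flags mass edits.

```python
from collections import deque

# Hypothetical thresholds taken from the comment: "more than a dozen" edits "in, say, a minute".
WINDOW_SECONDS = 60
MAX_EDITS = 12


def flag_mass_edits(edit_times, window=WINDOW_SECONDS, max_edits=MAX_EDITS):
    """Return True if more than max_edits edits fall inside any span shorter than window seconds.

    edit_times: timestamps in seconds, assumed sorted ascending (an assumption of this sketch).
    """
    recent = deque()
    for t in edit_times:
        recent.append(t)
        # Evict edits that are window seconds or more older than the current one.
        while recent and t - recent[0] >= window:
            recent.popleft()
        if len(recent) > max_edits:
            return True
    return False


# Usage: 13 edits two seconds apart trip the check; 13 edits ten seconds apart do not.
print(flag_mass_edits([i * 2 for i in range(13)]))
print(flag_mass_edits([i * 10 for i in range(13)]))
```

A deque keeps each check linear in the number of edits, since every timestamp is appended and evicted at most once.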

TropicalDingdong@lemmy.world · 2 years ago

Yeah, I mean I knew that when I was doing it.

Sometimes all you can do is make a symbolic gesture that really does nothing, and even if it does nothing, you should still do it.

Leaving and supporting Lemmy by paying some developer fees (I’m on the Patreon), posting, and commenting is probably 100x more damaging to Reddit.

FeelThePower@lemmy.dbzer0.com · 2 years ago

FWIW, I requested an old Reddit account’s data the other day under CCPA and all the contamination was in there. My guess is their backend updates every so often. I guess I made a good call to edit my comments and leave them there to simmer before I deleted them along with the account. Perhaps this is the way?

khannie@lemmy.world · 2 years ago

it would be a mistake to assume that Reddit didn’t store your original comments.

They were fairly specific about not doing that (I’d imagine largely because of GDPR).

I deleted 10 years of “content” before I left and checked their policies. They apparently actually do properly delete from their servers.

ItsAFake@lemmus.org · 2 years ago

But the GDPR only covers European users tho.

khannie@lemmy.world · 2 years ago

That’s true but it’s far easier to globally implement rather than trying to segment. Very difficult to accurately prove a user isn’t EU resident across an entire userbase.

ItsAFake@lemmus.org · 2 years ago

That’s probably why they don’t let you access Reddit with a VPN, so they can have some idea of location.

Frozengyro@lemmy.world · 2 years ago

I’ve got a bridge in the desert I’d like to sell you.

joenforcer@midwest.social · edit-2 2 years ago

GDPR is no joke. Storing a handful of comments is not worth the penalty if they get caught.

Note that I speak from experience as part of a company that needs to comply with the regulations. We do it because the risk of violation is 10000000% not worth it no matter how annoying and arduous it is to comply.

Ensign_Crab@lemmy.world · 2 years ago

“So much for your fucking canoe!”

magnetosphere@kbin.social · 2 years ago

This is the ideal meme format. Pedro’s smile is perfect.

nickwitha_k (he/him)@lemmy.sdf.org · 2 years ago

I really ought to have done that.

Norgur@kbin.social · 2 years ago

We need a bot that deletes comments and replaces them with some faulty-grammar Yoda-speak.

Valmond@lemmy.mindoki.com · 2 years ago

They’ll just find the signal in what you’re doing. Sorry but checkmate, mate.

Flumpkin@slrpnk.net · 2 years ago

I’m pissed at reddit but I still hate searching for something and finding a post on reddit discussing it, only to find some of the posts being deleted or overwritten.

mods_are_assholes@lemmy.world · 2 years ago

Good, then the protest at least worked somewhat.

FIST_FILLET@lemmy.ml · 2 years ago

If you’re lucky, some posts have been archived on the Internet Archive’s Wayback Machine. I highly recommend pinning the extension to your toolbar; it’ll show a number badge of how many times the current site has been archived :) https://addons.mozilla.org/en-US/firefox/addon/wayback-machine_new

EmperorHenry@discuss.tchncs.de · edit-2 9 months ago

deleted by creator

FIST_FILLET@lemmy.ml · 2 years ago

exactly, ”ai” right now is just a computer parrot. why settle for blurry generic versions of the art that it is digesting and shitting back out?

mods_are_assholes@lemmy.world · edit-2 2 years ago

Nailed it. The whole essence of AI is that it can make images with a variety of colors and styles, but it’s not creative or artistic by definition. At the end of the day, it’s just a bunch of numbers and equations being translated into pixels on a screen.

(This comment pasted from NovelAI with this prompt:

Please write a reply to this interrnet comment: exactly, ”ai” right now is just a computer parrot. why settle for blurry generic versions of the art that it is digesting and shitting back out?)

Fades@lemmy.world · edit-2 2 years ago

That is not the “whole essence” of it all… You are summarizing the whole piece of tech off a single use case (image generation).

AI is MUCH more than just a picture generator. As a software engineer I use AI for things like debugging or quickly automating some tasks.

Beefalo@midwest.social · 2 years ago

This announcement is just “oh by the way, the horse is now out of the barn. He left like 10 years ago but this is the announcement.”

Shout out to whoever dismissed the first AI writings with “It’s like a perfect Redditor. Totally confident and completely full of shit, doesn’t even know that it’s lying.”

That doesn’t happen by accident. That happens when everyone was already scraping the shit out of the site, at the very least.

Poem_for_your_sprog@lemmy.world · 2 years ago

Set up a bot that just constantly posts blatantly wrong information, like “the earth is flat according to encyclopedia Britannica”, “the sky is green because it’s full of chlorophyll according to the UK foundation of science”

Vilian@lemmy.ca · 2 years ago

we need to make a repository just for that and spam reddit with it, everyone is welcome to contribute, open-source fake news

jkrtn@lemmy.ml · 2 years ago

You won’t poison the data if the bot is on there just doing the same things as the redditors.

Zink@programming.dev · 2 years ago

Or in line with current events, “we are sorry about your experience and will refund you triple.”

ILikeBoobies@lemmy.ca · 2 years ago

Why care?

It just seems they were correct in changing API prices.

Sylvartas@lemmy.world · 2 years ago

I mean, yeah, but because they fully expected to sell their userbase as training data for LLMs, not because they actually care about people using bots to post wrong information. Wouldn’t that require them to care about actual people posting wrong info in the first place?

byroon@lemmy.world · 2 years ago

So you’ve contaminated the training data for an LLM by spamming a public forum? Seems like everyone loses

wildginger@lemmy.myserv.one · 2 years ago

I don't lose, I get a good laugh out of watching idiots feed unreliable data to their LLMs because it was cheap.

byroon@lemmy.world · 2 years ago

I mean the people using the forum who have to navigate around your spam

wildginger@lemmy.myserv.one · 2 years ago

They're on Reddit, the spam site. I think they're okay with a little more spam on their spam.