Microsoft’s AI boss thinks it’s perfectly OK to steal content if it’s on the open web

some_guy@lemmy.sdf.org · 1 year ago

Subverb@lemmy.world · 1 year ago

It’s okay to plagiarize books if they’re in a library.

grrgyle@slrpnk.net · 1 year ago

No, you have to run them through an elaborate model first; then it’s totally legit to use someone else’s literal words as if they were your own.

bitwaba@lemmy.world · 1 year ago

I mean, that’s how I got through high school. So sure.

masterspace@lemmy.ca · 1 year ago

You’re describing how human beings learn and create.

grrgyle@slrpnk.net · 1 year ago

I was actually describing a piece of software, which is not considered a human being and can, in fact, be treated differently without any legal or philosophical confusion.

anon_8675309@lemmy.world · 1 year ago

Hey ChatGPT, download “The Boys” for me.

orcrist@lemm.ee · 1 year ago

He spoke carelessly, but he didn’t exactly say what the author said he said. You can in fact do many things with it. Copyright doesn’t care what you do if you aren’t copying. That’s the definition of the word.

Todd Bonzalez@lemm.ee · 9 months ago

deleted by creator

interdimensionalmeme@lemmy.ml · 1 year ago

Yes, it’s the streamer making a copy. You’re fine. Sharing is caring. Copyright is a mental illness.

Todd Bonzalez@lemm.ee · 9 months ago

deleted by creator

interdimensionalmeme@lemmy.ml · 1 year ago

I’m sorry but that’s what illness means. It is not an identity.

Evotech@lemmy.world · 1 year ago

We’ll see what the results of the music industry vs. suno.ai will be.

red_pigeon@lemm.ee · 1 year ago

Isn’t web scraping copying?

orcrist@lemm.ee · 1 year ago

I think that depends on how you write your web scraper. Of course the web scraper is going to load the page, just like your web browser does, which by all accounts is not an issue. What happens after the page is loaded depends on how the software is written.

masterspace@lemmy.ca · 1 year ago

No. It’s only illegal if you republish what you scrape. Absolutely nothing prevents any company from scraping the web and using that information internally.

TheReturnOfPEB@reddthat.com · 1 year ago

He gets paid a lot not to speak carelessly.

interdimensionalmeme@lemmy.ml · 1 year ago

Copyright infringement is not theft, and training models is not copyright infringement either. We need a law equivalent to an artist saying “he’s inspired by someone else”, one that makes it specifically illegal to do that without permission if you use a machine. That will force big tech to pay a pittance for it, and it will instakill all the small players.

Elias Griffin@lemmy.world · 1 year ago

Copyright infringement is a strawman argument. When considering AI, we are not talking about legal copyright infringement in the relationship between humans and AI. Humans are mostly concerned with being obsoleted by Big Tech, so the real issue is Intellectual Property Theft.

artificial INTELLIGENCE stole our Intellectual Property

Do you see it now?

interdimensionalmeme@lemmy.ml · 1 year ago

It’s only theft as long as you cling to the failed “copyright” model.

Big tech couldn’t steal anything if we don’t respect their property rights in the first place.

By reifying copyright under the AI paradigm, we maintain big tech’s power over us.

The truth is, ChatGPT belongs to us. ClosedAI is just the compiler of the data.

If we finally end the failed experiment of copyright, we destroy their moat.

afraid_of_zombies@lemmy.world · 1 year ago

What I see is a system of laws that came about during the Middle Ages and have been manipulated by the powers that be to kill off any good parts of them.

We all knew copyright was broken. It was broken before my grandparents were born. It didn’t encourage artists or promise them proper income, it didn’t allow creations to gradually move into public domain. It punished all forms of innovation from player pianos to fanfiction on Tumblr.

bitchkat@lemmy.world · 1 year ago

Creating a derivative work without a license to do so would be copyright infringement.

blindbunny@lemmy.ml · 1 year ago

You’re always morally justified to steal from Microsoft

AutoTL;DR@lemmings.world · 1 year ago

This is the best summary I could come up with:

Microsoft AI boss Mustafa Suleyman incorrectly believes that the moment you publish anything on the open web, it becomes “freeware” that anyone can freely copy and use.

When CNBC’s Andrew Ross Sorkin asked him whether “AI companies have effectively stolen the world’s IP,” he said:

That certainly hasn’t kept many AI companies from claiming that training on copyrighted content is “fair use,” but most haven’t been as brazen as Suleyman when talking about it.

Speaking of brazen, he’s got a choice quote about the purpose of humanity shortly after his “fair use” remark:

Suleyman does seem to think there’s something to the robots.txt idea — that specifying which bots can’t scrape a particular website within a text file might keep people from taking its content.

Disclosure: Vox Media, The Verge’s parent company, has a technology and content deal with OpenAI.

The original article contains 351 words, the summary contains 139 words. Saved 60%. I’m a bot and I’m open source!
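The robots.txt idea mentioned in the summary can be sketched with Python’s standard-library `urllib.robotparser`. This is a minimal illustration, not anything from the article: the bot name `ExampleAIBot`, the rules, and the `example.com` URLs are hypothetical. Note that robots.txt is advisory; it only keeps out crawlers that choose to check it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block a crawler calling itself "ExampleAIBot"
# from the whole site, and block every other agent only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from a string instead of fetching a URL


def allowed(agent: str, url: str) -> bool:
    """Ask whether a (well-behaved) crawler named `agent` may fetch `url`."""
    return parser.can_fetch(agent, url)


print(allowed("ExampleAIBot", "https://example.com/article"))   # False
print(allowed("SomeBrowser", "https://example.com/article"))    # True
print(allowed("SomeBrowser", "https://example.com/private/x"))  # False
```

Nothing here stops a scraper that simply never calls `can_fetch`, which is why the file is a request rather than an access control.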

A_Very_Big_Fan@lemmy.world · 1 year ago

Look, guys! The TLDR bot is stealing!

afraid_of_zombies@lemmy.world · 1 year ago

Yep. The same way my coworkers do when they talk about sports ball without the express permission of multiple corporations.

A_Very_Big_Fan@lemmy.world · 1 year ago

Impressive that your coworkers discuss the events exclusively by recalling 60% of the announcer’s words and then quoting them verbatim.

afraid_of_zombies@lemmy.world · 1 year ago

I am almost afraid to go down this rabbit hole but I have no idea what you are talking about.

A_Very_Big_Fan@lemmy.world · 1 year ago

I got the math the wrong way around, but read the bottom of the bot’s post. The bot’s job is to cut the fluff out of articles, and it copy/pastes the remaining text for us to read here.

So my comment should have said 40%, but the point was if we’re comparing what the bot did with your coworkers talking about a game, it’d be more akin to them reciting the commentator verbatim.

afraid_of_zombies@lemmy.world · 1 year ago

I thought that even discussing the game without the express permission of the media company you watched it on and the sports league was a violation. Not sure why you are bringing commentary on commentary into it. Again, I’m not a sports ball guy, but when I do hear people talk about sports, they are talking about the sports, not the person talking about the sports.

Vanth@reddthat.com · 1 year ago

Fair use once it’s posted on the web? Thank you very much for the framework to pirate anything and everything.

catloaf@lemm.ee · 1 year ago

fun fact, windows is posted on the web: https://www.microsoft.com/en-us/software-download/windows10

Refurbished Refurbisher@lemmy.sdf.org · 1 year ago

Microsoft would prefer that you pirate Windows rather than use Linux, as it further entrenches their dominance in the market.

They mainly make their money off of business licenses anyway, similar to Adobe and Autodesk.

There’s a reason massgravel’s scripts are hosted on Microsoft’s GitHub platform and haven’t been taken down.

Grandwolf319@sh.itjust.works · 1 year ago

If that’s the case, then why not release a free home version??

Refurbished Refurbisher@lemmy.sdf.org · 1 year ago

They already have a free version of Windows. Just don’t activate it.

WalnutLum@lemmy.ml · 1 year ago

My one dark hope is that AI will be enough of an impetus for somebody to update the DMCA.

ZILtoid1991@lemmy.world · 1 year ago

If that gets updated, then it will favor big corporations.

crusa187@lemmy.ml · 1 year ago

Only because our “representatives” let them write the law entirely. Imagine if Congress wasn’t filled to the brim with 80 year old fundraisers…

afraid_of_zombies@lemmy.world · 1 year ago

When is the last time a crisis resulted in a better solution for the general public?

schizo@forum.uncomfortable.business · 1 year ago

And this is why I don’t have ANY moral qualms about pirating shit: they’d do it to us in a heartbeat if there was a buck to be made.

Kairos@lemmy.today · 1 year ago

They would?? They are**

rottingleaf@lemmy.zip · 1 year ago

I had some, but not anymore.

catloaf@lemm.ee · 1 year ago

*have done

afraid_of_zombies@lemmy.world · 1 year ago

Everyone in this thread is creating derivative works, and you should not be reading it without the written permission of verge.com’s parent company.

Paragone@lemmy.world · 1 year ago

Is his personal-information on the dark-web?

Is he saying that if his personal-information is on the dark-web, then it’s perfectly-OK for everybody & their robot to be using it??

XOR is he saying that there are 2 kinds of law:

1 for protecting his entitlement,

the other for disallowing rights from the lives he consumes, through his beloved herd/corporation/pseudo-person?

( obviously, he’s already answered the latter )

Buffalox@lemmy.world · 1 year ago

copying is not theft

Womble@lemmy.world · 1 year ago

Didn’t you hear? We stan draconian IP laws now because AI bad.

Snot Flickerman@lemmy.blahaj.zone · 1 year ago

Is it that or is it that the laws are selectively applied on little guys and ignored once you make enough money? It certainly looks that way. Once you’ve achieved a level of “fuck you money” it doesn’t matter how unscrupulously you got there. I’m not sure letting the big guys get away with it while little guys still get fucked over is as big of a win as you think it is?

Examples:

The Pirate Bay: Only made enough money to run the site and keep the admins living a middle class lifestyle.

VERDICT: Bad, wrong, and evil. Must be put in jail.

OpenAI: Claims to be non-profit, then spins off for-profit wing. Makes a mint in a deal with Microsoft.

VERDICT: Only the goodest of good people and we must allow them to continue doing so.

The IP laws are stupid but letting fucking rich twats get away with it while regular people will still get fucked by the same rules is kind of a fucking stupid ass hill to die on.

But sure, if we allow the giant companies to do it, SOMEHOW the same rules will “trickle down” to regular people. I think I’ve heard that story before… No, they only make exceptions for people who can basically print money. They’ll still fuck you and me six ways to Sunday for the same.

I mean, the guys who ran Jetflicks, a pirate streaming site, are being hit with potential 48-year sentences. That’s longer than for a lot of way more serious fucking crimes. I’ve literally seen murderers get half that.

But yeah, somehow, the same rules will end up being applied to us? My ass. They’re literally jailing people for it right now. If that wasn’t the case, maybe this argument would have legs.

But AI companies? Totes okay, bro.

Grimy@lemmy.world · 1 year ago

The laws are currently the same for everyone when it comes to what you can use to train an AI with. I, as an individual, can use whatever public facing data I wish to build or fine tune AI models, same as Microsoft.

If we make copyright laws even stronger, the only ones getting locked out of the game are the little guys. Microsoft, Google and company can afford to pay ridiculous prices for datasets. What they don’t own mainly comes from aggregators like Reddit, Getty, Instagram and Stack.

Boosting copyright laws would essentially kill all legal forms of open source AI. It would force the open source scene to go underground as a pirate network and lead to the scenario you mentioned.

Womble@lemmy.world · 1 year ago

Yes, it is a travesty that people are being hounded for sharing information, but the solution to that isn’t to lock up information tighter by restricting access to the open web and saying if you download something we put up to be freely accessed and then use it in a way we don’t like you owe us.

The solution to bad laws being applied unevenly isn’t to apply the bad laws to everyone equally; it’s to get rid of the bad laws.

cmhe@lemmy.world · 1 year ago

“Copying is theft” has been the argument of corporations for ages, but if they want our data and information to integrate into their business, suddenly they have the rights to it.

If copying is not theft, then we have the rights to copy their software and AI models, as well, since it is available on the open web.

They got themselves into quite a contradiction.

masterspace@lemmy.ca · 1 year ago

You realize that half of Lemmy is tying themselves in inconsistent logical knots trying to escape the reverse conundrum?

Copying isn’t stealing and never was. Our IP system that artificially restricts information has never made sense in the digital age, and yet now everyone is on here cheering copyright on.

BoxOfFeet@lemmy.world · 1 year ago

You wouldn’t download a car!

Buffalox@lemmy.world · 1 year ago

If copying is not theft, then we have the rights to copy their software

Nope, false dichotomy. Copying copyrighted material is copyright infringement, which is illegal.
Oversimplifying the issue makes for an uninformed debate.

cactusupyourbutt@lemmy.world · 1 year ago

any content you produce is automatically copyrighted

ZILtoid1991@lemmy.world · 1 year ago

Issue is power imbalance.

There’s a clear difference between a guy in his basement on his personal computer sampling music the original musicians have almost never seen a single penny from, and a megacorp trying to drive creative professionals out of the industry in the hopes that it can then hike up the prices to use its generative AI software.

GamingChairModel@lemmy.world · 1 year ago

Yeah, I’m not a fan of AI, but I’m generally of the view that anything posted on the internet, visible without a login, is fair game for search engine indexing, snapshotting for a backup (like the Internet Archive’s Wayback Machine), or running user extensions on (including ad blockers). Is training an AI model all that different?

Evotech@lemmy.world · 1 year ago

You can’t be for piracy but against LLMs for the same reason.

And I think most of the people on Lemmy are for piracy.

sugar_in_your_tea@sh.itjust.works · 1 year ago

I’m not in favor of piracy or LLMs. I’m also not a fan of copyright as it exists today (I think we should go back to the 1790 US definition of copyright).

I think a lot of people here on lemmy who are “in favor of piracy” just hate our current copyright system, and that’s quite understandable and I totally agree with them. Having a work protected for your entire lifetime sucks.

masterspace@lemmy.ca · 1 year ago

The problem with copyright has nothing to do with term limits. Those exacerbate the problem, but the fundamental problem with copyright and IP law is that it is a system of artificial scarcity where there is no need for one.

Rather than reward creators when their information is used, we hamfistedly try and prevent others from using that information so that people have to pay them to use it sometimes.

Capitalism is flat out the wrong system for distributing digital information, because as soon as information is digitized it is effectively infinitely abundant which sends its value to $0.

sugar_in_your_tea@sh.itjust.works · 1 year ago

Copyright is not a capitalist idea, it’s collectivist. See copyright in the Soviet Union, the initial bill of which was passed in 1925, right near the start of the USSR.

A pure capitalist system would have no copyright, and works would instead be protected through exclusivity (i.e., paywalls) and DRM. Copyright is intended to promote sharing by providing a period of exclusivity (temporary monopoly on a work). Whether it achieves those goals is certainly up for debate.

Long terms go against any benefit to society that copyright might have. I think it does have a benefit, but that benefit is pretty limited and should probably only last 10-15 years. I think eliminating copyright entirely would leave most people worse off and probably mostly benefit large orgs that can afford expensive DRM schemes in much the same way that our current copyright duration disproportionately benefits large orgs.

sugar_in_your_tea@sh.itjust.works · 1 year ago

Yes, it kind of is. A search engine just looks for keywords and links, and that’s all it retains after crawling a site. It’s not producing any derivative works, it’s merely looking up an index of keywords to find matches.

An LLM can essentially reproduce a work, and the whole point is to generate derivative works. So by its very nature, it runs into copyright issues. Whether a particular generated result violates copyright depends on the license of the works it’s based on and how much of those works it uses. So it’s complicated, but there’s very much a copyright argument there.
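To make the “index of keywords” model from the comment above concrete, here is a toy inverted index in Python. It is a simplified sketch under that comment’s assumptions, with made-up page URLs and text, and not how any particular search engine is implemented: the index maps each word to the set of pages containing it, and a query is answered by intersecting those sets.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every word of the query (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical crawled pages.
pages = {
    "https://example.com/a": "Copyright law and fair use",
    "https://example.com/b": "Fair use in music sampling",
    "https://example.com/c": "Ducks in the park",
}
index = build_index(pages)
print(sorted(search(index, "fair use")))   # ['https://example.com/a', 'https://example.com/b']
print(sorted(search(index, "copyright")))  # ['https://example.com/a']
```

In this toy version only the word-to-URL mapping survives after `build_index` runs; whether real crawlers keep more than that is a separate question the sketch does not answer.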

TheRealKuni@lemmy.world · 1 year ago

An LLM can essentially reproduce a work, and the whole point is to generate derivative works. So by its very nature, it runs into copyright issues.

Derivative works are not copyright infringement. If LLMs are spitting out exact copies, or near-enough-to-exact copies, that’s one thing. But as you said, the whole point is to generate derivative works.

sugar_in_your_tea@sh.itjust.works · 1 year ago

Derivative works are not copyright infringement

They absolutely are, unless it’s covered by “fair use.” A “derivative work” doesn’t mean you created something that’s inspired by a work, but that you’ve modified the work and then distributed the modified version.

Halosheep@lemm.ee · 1 year ago

My brain also takes information and creates derivative works from it.

Shit, am I also a data thief?

sugar_in_your_tea@sh.itjust.works · 1 year ago

That depends, do you copy verbatim? Or do you process and understand concepts, and then create new works based on that understanding? If you copy verbatim, that’s plagiarism and you’re a thief. If you create your own answer, it’s not.

Current AI doesn’t actually “understand” anything, and “learning” is just grabbing input data. If you ask it a question, it’s not understanding anything, it just matches search terms to the part of the training data that matches, and regurgitates a mix of it, and usually omits the sources. That’s it.

It’s a tricky line in journalism, since so much of it is borrowed, and it’s likewise tricky w/ AI, but the main difference IMO is attribution: good journalists cite sources, and AI rarely does.

petrol_sniff_king@lemmy.blahaj.zone · 1 year ago

None of those things replace that content, though.

Look, I dunno if this is legally a copyrights issue, but as a society, I think a lot of people have decided they’re willing to yield to social media and search engine indexers, but not to AI training, you know? The same way I might consent to eating a mango but not a banana.

pewgar_seemsimandroid@lemmy.blahaj.zone · 1 year ago

so we can steal Microsoft’s products?

Resol van Lemmy@lemmy.world · 1 year ago

That explains why my friend’s Xbox got stolen. It was an original Xbox, too. Holds eggs perfectly.

pewgar_seemsimandroid@lemmy.blahaj.zone · 1 year ago

I meant stealing it all, not just some random person’s copy.

boatsnhos931@lemmy.world · 1 year ago

Let’s be real, they let most of it be stolen

Zacryon@lemmy.wtf · 1 year ago

Yes. Exactly. Although there isn’t much left worth stealing from Microsoft.

(This was a low-key “Microsoft bad, Linux supreme” comment.)

(And now it’s no longer low-key.)

(I’m using a touch-screen keyboard for writing this. And yet I can’t open my doors using the keyboard. Ever wondered why that is?)

(Correct, because I forgot my keys at home and didn’t put them on my keyboard.)

(Now it’s just a –board.)

(Oral diarrhea over. Go get some guhd Linux!)

glitchdx@lemmy.world · 1 year ago

This is the year of the Linux desktop!

By our powers combined, we’ll exceed 2% market share!

(No, actually, please support Linux. I just switched like a month ago, and while it’s so much better than Windows, there are so many petty annoyances that will never get resolved unless more people bitch about them, and that kind of support needs more users.)

jabjoe@feddit.uk · 1 year ago

I can see a lot of comments against copyright here, but has anyone considered the implications of changes to copyright on copyleft?

I argue copyleft is demonstrably socially useful in locking things open. I do wonder if we’ll end up with the two being treated differently legally…

profdc9@lemmy.world · 1 year ago

In other news: we have lawyers to protect our copyrights, you don’t. Suck it.

Lumisal@lemmy.world · 1 year ago

Apparently he thinks data is like the ducks you find in the park

masterspace@lemmy.ca · 1 year ago

In that I can take a picture of them and you wouldn’t notice or be impacted by it?

Lumisal@lemmy.world · 1 year ago

You can take the ducks, you know.

Capitao_Duarte@lemmy.eco.br · 1 year ago

Wait, you can steal from those ducks?

Lumisal@lemmy.world · 1 year ago

No one ever tells you this, but you can just take the ducks. Just like with the city pigeons. Just make sure you don’t take a government drone by accident.

Fungah@lemmy.world · 1 year ago

They’re the same thing.

afraid_of_zombies@lemmy.world · 1 year ago

Clearly ducks you find in the park are just free meat. Why else would they be there?

Lumisal@lemmy.world · 1 year ago

To ask if you have any grapes