OpenAI Says It's Fine to Vacuum Up Everyone's Content and Charge for It Without Paying Them

Yuritopiaposadism [none/use name] · 10 months ago

DragonBallZinn [he/him] · 10 months ago

Everyone's in support of intellectual property rights until the poors don't want to share with the rich.

Infamousblt [any] · 10 months ago

So it's fine if I use OpenAI's content for free without attribution, right? That's the same thing? Glad they gave us permission.

JohnBrownNote [comrade/them, des/pair] · 10 months ago

~~only if you run it through your own LLM~~

"ai" (and none of this shit is AI, they should have to change their name) "works" aren't copyrightable so go nuts

Awoo [she/her] · edit-2 10 months ago

If ai will regurgitate its training data then you can perform copyright-laundering via this one neat loophole.

We can move literally the entire internet (which is basically all in their training data) into the public domain.

JohnBrownNote [comrade/them, des/pair] · 10 months ago

unfortunately i think these things don't keep the training set, just the set of associations and relations it made by analyzing it

Awoo [she/her] · 10 months ago

Not true, they will completely and totally replicate their training data. The companies try to prevent this so the method to get it to happen regularly changes, but they do it.

ChatGPT: https://not-just-memorization.github.io/extracting-training-data-from-chatgpt.html

Image AIs: https://techcrunch.com/2022/12/13/image-generating-ai-can-copy-and-paste-from-training-data-raising-ip-concerns/?guccounter=1

I'm not saying this would work and you won't get in trouble for doing it. But it would fuck the system just a little bit.

JohnBrownNote [comrade/them, des/pair] · 10 months ago

oh wow that's great lol

BeamBrain [he/him] · 10 months ago

Their argument basically boils down to "Our business can't make money without it." Unusually honest of capitalists to say "Who cares what the law is, all that matters is that it's profitable."

jonne@infosec.pub · edit-2 10 months ago

I mean, isn't it basically an automated way of doing the thing that was always legal within current copyright laws? You read a few books/articles on a subject, and you write your own content based on the information you ingested as part of that research? You've never needed to pay anyone to do this in the past.

The points about attribution depend mostly on the context: in academia or when writing a book, you attribute; in other contexts (like talks), you generally don't. And either way those aren't legal requirements except for certain Creative Commons licences.

I'm not necessarily against updating laws to deal with this issue, but we have to be careful about not undermining fair use provisions, which are already under attack by automated systems that can't tell whether something's being used under a fair use provision or not.

WithoutFurtherBelay · 10 months ago

> You read a few books/articles on a subject, and you write your own content based on the information you ingested as part of that research? You've never needed to pay anyone to do this in the past.

Maybe this is the same as an LLM if you have no lived, internal experience.

KobaCumTribute [she/her] · edit-2 10 months ago

The problem is that increasing the overreach of copyright further like that is bad for everyone and does nothing to curtail the actual problems of AI generation.

The problem of AI content generation is that it's an infinite vapid slop factory that spouts gibberish forever and without any sort of sanity checking, and whether it correctly paid the corporate owners of the various platforms the generators' creators scraped or not is irrelevant. All that achieving "you must acquire an explicit license to authorize AI training on a given body of data" means is that a bunch of oligarch ghouls get a payday and the AI is only trained on corporate owned data by big AI corporations that can work out deals with them (or you get a bunch of in-house shit like how they all want their own streaming services).

The only solution is to make AI generated content a poison pill for copyright in general, regardless of whether that "makes sense" or "has precedent": any work that contains generative AI content becomes uncopyrightable and all licenses attached to it become public domain. Protect silly hobbyist work while harshly punishing corporate use.

To put it bluntly, if something like DisneyExtraGeneratorAI makes a crowd scene, it doesn't matter if it made them from scraping stock photos or by paying Disney adults in ride fastpass priority and a coupon for a slightly discounted $1000 cocktail at their resort for some full body scans. The problem isn't the licensing, it's the fact that they're replacing actors and getting an engine they can further profit from by licensing. It's a new sort of enclosure whether they're paying to do it or not, and the only way to stop it is to make it impossible to profit from using.

comrade_pibb [comrade/them] · 10 months ago

> make it impossible to profit from using

which is why this will never ever happen

jonne@infosec.pub · 10 months ago

Thanks for engaging with the content of the post instead of doing a lazy ad hominem that doesn't even attempt to refute anything I said.

ShimmeringKoi [comrade/them] · edit-2 10 months ago

Pointing out that what you said demonstrates a complete ignorance of how LLMs are used does not constitute an ad hominem. And even if it had, who gives a shit, this isn't high school debate club. If they want to call you a dork, no hall monitor is going to pull them up on it.

Maoo [none/use name] · 10 months ago

> I mean, isn't it basically an automated way of doing the thing that was always legal within current copyright laws?

Nope. You are smarter and more creative than an LLM. LLMs don't understand their material; they just copy patterns and do substitutions.

It's more like doing a good job at plagiarism. Take 3 sources about the idea and copy them, then switch up the words. Also just add some random sentences of dubious quality that sound right but who knows they're probably lies. But they sure do look like things humans wrote in the neighborhood of this topic.

jonne@infosec.pub · edit-2 10 months ago

That's my point, if you do exactly that as a human, you're fine from a copyright standpoint. The LLM created a new work, it's quoting existing work under fair use, etc.

It's completely fraudulent in an academic setting, and it's a way to get around anyone getting paid for their original labour, but that's not what we're talking about.