Article gives the AI game away - generative models require human creativity to train on, and produce inferior approximations of their own.

AbbysMuscles [she/her] · 1 year ago

Article gives the AI game away - generative models require human creativity to train on, and produce inferior approximations of their own.

VernetheJules [they/them] · 1 year ago

Calling it now, NFTs are gonna make a comeback because they'll be used to assign provenance to human-produced work

AbbysMuscles [she/her] · 1 year ago

Oh that is deeply fucking cursed

VernetheJules [they/them] · 1 year ago

The researchers conclude that in a future filled with gen AI tools and their content, human-created content will be even more valuable than it is today — if only as a source of pristine training data for AI.

Nonsense, we have a bright future ahead of us! As

:soypoint-1: CONTENT CREATORS

LeninsBeard [he/him] · 1 year ago

Hooking up every living human to the matrix so I can AI generate a New Yorker article about how my trip to Colombia was subpar because the locals didn't bow to me.

UlyssesT · 11 days ago

deleted by creator

Owl [he/him] · 1 year ago

You can just make an NFT of the AI generated work though. NFTs can't actually prove any real-world concept.

This isn't to say that NFT grifters won't try.

Flyberius [comrade/them] · 1 year ago

God damn, you're lathing that into existence

ssjmarx [he/him] · 1 year ago

:lathe-of-heaven: :stalin-gun-1::stalin-gun-2:

Anyway what's actually going to be used to assign provenance to human-produced work is freely-accessible video of that work being created. No-one can accuse you of using generative tools if you have video proof of your brush touching canvas.

oh shit "touch canvas" is gonna be an internet meme

Frank [he/him, he/him] · 1 year ago

Yeah but they'll just make a plagiarism bot that fakes video. They've already got plagiarism bots that fake the steps of drawing an image in reverse.

ssjmarx [he/him] · 1 year ago

I think there will be an "arms race" between the generators and the verification methods, but the speedrun community for example has been dealing with this exact problem for a while and the methods of spotting fake runs are really sophisticated for the most popular games. At the very least you can ask an artist technical questions and 90% of cheaters will get weeded out because they won't be able to talk about their process.

mayo_cider [he/him] · 1 year ago

I've been fighting against AI all my life by producing only trash content.

AbbysMuscles [she/her] · 1 year ago

:rat-salute:

Tommasi [she/her, pup/pup's] · 1 year ago

:data-laughing:

LLMs being fed more and more generated garbage and producing increasingly worse results would be the funniest way for the AI hype to collapse.

AbbysMuscles [she/her] · 1 year ago

It's so funny that the capitalist overbosses were so excited to remove labor from the equation, only to realize that their precious new toy depends on human labor just as much as anything else

AOCapitulator [they/them, she/her] · 1 year ago

They are discovering jpeg compression

wopazoo [he/him] · 1 year ago

No shit? LLMs imitate (imperfectly) human writing. An LLM trained on LLM output is going to imperfectly imitate the imperfect imitation. This is called generation loss.
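
Toy sketch of what that looks like, with heavy caveats: this is not from the article, and a one-dimensional Gaussian is standing in for an LLM, which is a huge simplifying assumption. Each "generation" only ever sees samples from the previous generation's fit. The function name `refit_chain` and all its parameters are made up for illustration.

```python
import random
import statistics


def refit_chain(generations=300, sample_size=20, seed=0):
    """Refit a Gaussian to its own samples, generation after generation.

    Hypothetical toy model: generation 0 plays the role of the original
    human-written data, every later generation trains only on the output
    of the one before it.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "original" distribution
    history = [(mu, sigma)]
    for _ in range(generations):
        # "Generate content" from the current model...
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # ...then "train" the next model on nothing but that content.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)  # maximum-likelihood estimate, biased low
        history.append((mu, sigma))
    return history


if __name__ == "__main__":
    history = refit_chain()
    for gen in (0, 50, 100, 200, 300):
        mu, sigma = history[gen]
        print(f"generation {gen:3d}: mean={mu:+.4f}  spread={sigma:.6f}")
```

With small samples the estimated spread tends to shrink a bit each round and the errors compound, so the later generations end up describing a much narrower distribution than the one you started with. Same shape as the imperfect-copy-of-an-imperfect-copy argument, just with numbers instead of prose.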

motherofmonsters [she/her] · 1 year ago

Eugenics but for computers?

wopazoo [he/him] · 1 year ago

https://en.wikipedia.org/wiki/Generation_loss

It's a lossy compression thing

motherofmonsters [she/her] · 1 year ago

nods knowingly like how people in genesis lived 1000 years and now we die at 67

IceWallowCum [he/him] · 1 year ago

Is your username a reference to Maupassant?

motherofmonsters [she/her] · 1 year ago

I wish

heartheartbreak [fae/faer] · 1 year ago

Real reason why China forces AI output to be watermarked? :thonk:

FloridaBoi [he/him] · 1 year ago

Common China W

UlyssesT · 11 days ago

deleted by creator

invalidusernamelol [he/him] · 1 year ago

That's specifically for "copying" the training data from projects like Open AI. Or filtering it down to only parts of the data that you want.

It's different when you're compiling a new training dataset and now have to worry about a large portion of your input data being totally garbage and ruining the model.

And that "copying" process introduces tons of errors and bugs anyways so it's only good if you are building a hyper specific LLM, like one that is meant to emulate the style of one specific artist.

iridaniotter [she/her] · 1 year ago

The article’s main solution is to keep some kind of master backup of work labelled as existing before the rise of LLMs

Low-background steel but for plagiarism machines lmao

Flyberius [comrade/them] · 1 year ago

If only there was some way of generating original content...

SuperZutsuki [they/them] · 1 year ago

Can't wait for people to start asking, "What did you prompt your AI with to get that idea?" when someone says something novel that just popped into their head.

Infamousblt [any] · 1 year ago

Unironically it's praxis to poison this data somehow

blobjim [he/him] · 1 year ago

LLM, more like LTV

thisonethatone [he/him] · 1 year ago

I've been using AI as a fun little writing tool since 2019 and anyone who actually uses it knows it is only as good as the writer using it.

Garbage in, garbage out.

AbbysMuscles [she/her] · 1 year ago

Oh don't get me wrong, it's a fun toy and it has its uses. If I'm struggling to remember a specific word for example, I can describe the word and it'll usually realize what I mean. I can also paste difficult to understand text and it can help me figure out what's going on. It's a good tool for aiding language processing. Creating its own shit? Absolutely not.

thisonethatone [he/him] · 1 year ago

Totally agreed on that point. I love using AI to break writer's block because it pulls interesting ideas out that I never would have come up with on my own. Sort of like an electronic writers' room.

AI under capitalism sucks though. It feels like the next blockchain hype.

IceWallowCum [he/him] · 1 year ago

How exactly do you use it for that? Is it specific for each situation you're facing, or do you have go-to directions for it?

thisonethatone [he/him] · 1 year ago

It depends on the model you're using. I use novel ai and it has a lore book feature that keeps track of world building. Each character, location, historical event has its own entry with tags.

You can also mess with generation parameters to limit certain words, increase/decrease generation randomness, and use modules if you want to stick with a specific author style.

This really helps me avoid writer's block because the AI can refer to lore I wrote. Sometimes it brings things up that I forgot about from previous chapters. Or I can ask it for an alternative way to write something when I find that my writing is getting too repetitive.

It still needs a lot of guidance to make an interesting novel, but as an assistant it's great.

ssjmarx [he/him] · 1 year ago

I like it to prompt me. I spend a lot of time staring at blank documents, but if a bot can throw a couple sentences at me I can get the creativity going much easier.

But the final product always ends up being 95%+ my own writing anyway. I like what the AI gives me but its form is normally pretty bland. Still useful enough that I'm subbed to Novel AI for their biggest LLM.

maya [she/her, they/them] · 1 year ago

ChatGPT currently produces unworkable garbage if you try to get any kind of high level writing out of it. Seems a bit unnecessary to worry about "model collapse" when the models aren't actually useful yet.

SuperZutsuki [they/them] · 1 year ago

https://www.youtube.com/watch?v=jmaUIyvy8E8 (warning: loud)

Basically like saving a JPEG, opening it, and saving it as a JPEG again x1000. "You can never match the original from a lossy source" applies to ideas too, apparently.

ChaosMaterialist [he/him] · 1 year ago

Funny you mention JPEG, because a New Yorker article (original, archive) makes the general case that LLMs are lossy for language in the same way JPEGs are lossy for images.

CarmineCatboy [he/him] · 1 year ago

the singularity but disappointing

frankfurt_schoolgirl [she/her] · 1 year ago

I wonder how close gpt4 and company are to having used every bit of writing in the English language as training data. Like assuming you downloaded the entire content of social media sites, used every e-book you could find, pulled in all wikis, forums, news sites, and blogs that a web crawler could reach, the only real volume of writing that's left is private communications. At some point, Google or MS or another company with lots of communications will use every message ever sent in their systems as training data. But once you do that, you've run out.

More training data probably brings diminishing returns, so if gpt4 has already used like 10% of all available writing then maybe even with the other 90% it won't be good enough to do what people want. Maybe in the future companies will hire vast numbers of writers just to make good content that can be used for the llms.

The AI feedback loop: Researchers warn of ‘model collapse’ as AI trains on AI-generated content