Basically every time AI tries to create its own thing, it's incrementally shittier than whatever it trained on. As more and more AI-produced content floods the internet, it's increasingly training on AI-generated material. The effect is analogous to scanning and printing the same document over and over again until it ultimately becomes a blurry mess. AI cannot create on its own; it can only modify pre-existing human work.
The article's main solution is to keep some kind of master backup of work labelled as existing before the rise of LLMs, but it isn't optimistic about this actually happening. I'm wondering: if in a few years the "write TV script" button on ChatGPT generates completely unworkable garbage, will studios stop trying to pretend it's a viable replacement for writing staff?
Calling it now, NFTs are gonna make a comeback because they'll be used to assign provenance to human-produced work
The researchers conclude that in a future filled with gen AI tools and their content, human-created content will be even more valuable than it is today — if only as a source of pristine training data for AI.
Nonsense, we have a bright future ahead of us! As
:soypoint-1: CONTENT CREATORS
Hooking up every living human to the matrix so I can AI generate a New Yorker article about how my trip to Colombia was subpar because the locals didn't bow to me.
You can just make an NFT of the AI-generated work, though. NFTs can't actually prove any real-world concept.
This isn't to say that NFT grifters won't try.
:lathe-of-heaven: :stalin-gun-1::stalin-gun-2:
Anyway, what's actually going to be used to assign provenance to human-produced work is freely accessible video of that work being created. No one can accuse you of using generative tools if you have video proof of your brush touching canvas.
oh shit "touch canvas" is gonna be an internet meme
Yeah, but they'll just make a plagiarism bot that fakes the video. They've already got plagiarism bots that fake the steps of drawing an image in reverse.
I think there will be an "arms race" between the generators and the verification methods, but the speedrun community, for example, has been dealing with this exact problem for a while, and the methods of spotting fake runs are really sophisticated for the most popular games. At the very least you can ask an artist technical questions, and 90% of cheaters will get weeded out because they won't be able to talk about their process.
I've been fighting against AI all my life by producing only trash content.
:data-laughing:
LLMs being fed more and more generated garbage and producing increasingly worse results would be the funniest way for the AI hype to collapse.
It's so funny that the capitalist overbosses were so excited to remove labor from the equation, only to realize that their precious new toy depends on human labor just as much as anything else
No shit? LLMs imitate (imperfectly) human writing. An LLM trained on LLM output is going to imperfectly imitate the imperfect imitation. This is called generation loss.
https://en.wikipedia.org/wiki/Generation_loss
It's a lossy compression thing
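You can actually watch generation loss happen with a toy model: fit token frequencies to some data, sample from the fit, refit on the samples, and repeat. Once a rare token draws zero samples, its estimated probability hits zero and it can never come back. A pure-Python sketch (illustrative only, nothing like real LLM training):

```python
import random
from collections import Counter

def fit(samples, vocab):
    """'Train': estimate each token's probability from observed frequency."""
    counts = Counter(samples)
    total = len(samples)
    return {tok: counts[tok] / total for tok in vocab}

def generate(probs, n):
    """'Sample' n tokens from the fitted model."""
    toks, weights = zip(*probs.items())
    return random.choices(toks, weights=weights, k=n)

random.seed(42)
vocab = list(range(100))
# "Human" data: token i appears with probability proportional to 1/(i+1),
# so the high-numbered tokens form a rarely-seen long tail.
data = random.choices(vocab, weights=[1 / (i + 1) for i in vocab], k=2000)

for gen in range(20):
    probs = fit(data, vocab)          # train on whatever is out there
    data = generate(probs, 2000)      # next model trains on model output
    alive = sum(1 for p in probs.values() if p > 0)
    if gen % 5 == 0 or gen == 19:
        print(f"gen {gen:2d}: {alive} of {len(vocab)} tokens still alive")
```

The tail dies first and the loss is irreversible: that's the "blurry mess" from upthread in miniature.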
nods knowingly like how people in genesis lived 1000 years and now we die at 67
Real reason why China forces AI output to be watermarked? :thonk:
According to so many local bazingas around here, all the chatbots have to do is chatbot off of chatbotting to "achieve liftoff" and become a time-traveling divine being. :soypoint-1: :no-mouth-must-scream: :soypoint-2:
That's specifically for "copying" the training data from projects like OpenAI, or filtering it down to only the parts of the data that you want.
It's different when you're compiling a new training dataset and now have to worry about a large portion of your input data being totally garbage and ruining the model.
And that "copying" process introduces tons of errors and bugs anyways so it's only good if you are building a hyper specific LLM, like one that is meant to emulate the style of one specific artist.
The article’s main solution is to keep some kind of master backup of work labelled as existing before the rise of LLMs
Low-background steel but for plagiarism machines lmao
Can't wait for people to start asking, "What did you prompt your AI with to get that idea?" when someone says something novel that just popped into their head.
I've been using AI as a fun little writing tool since 2019 and anyone who actually uses it knows it is only as good as the writer using it.
Garbage in, garbage out.
Oh, don't get me wrong, it's a fun toy and it has its uses. If I'm struggling to remember a specific word, for example, I can describe the word and it'll usually realize what I mean. I can also paste difficult-to-understand text and it can help me figure out what's going on. It's a good tool for aiding language processing. Creating its own shit? Absolutely not.
Totally agreed on that point. I love using AI to break writer's block because it pulls out interesting ideas that I never would have come up with on my own. Sort of like an electronic writers' room.
AI under capitalism sucks, though. It feels like the next blockchain hype.
How exactly do you use it for that? Is it specific for each situation you're facing, or do you have go-to directions for it?
It depends on the model you're using. I use NovelAI, and it has a lorebook feature that keeps track of worldbuilding. Each character, location, and historical event has its own entry with tags.
You can also mess with generation parameters to limit certain words, increase/decrease generation randomness, and use modules if you want to stick with a specific author style.
This really helps me avoid writer's block because the AI can refer to lore I wrote. Sometimes it brings up things that I forgot about from previous chapters. Or I can ask it for an alternative way to write something when I find that my writing is getting too repetitive.
It still needs a lot of guidance to make an interesting novel, but as an assistant it's great.
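For anyone curious how the banned-words and randomness knobs work: under the hood they're just arithmetic on the model's next-token scores before sampling. A generic sketch with made-up toy scores (this is the standard temperature/ban mechanism, not NovelAI's actual API):

```python
import math
import random

def sample_next(logits, temperature=1.0, banned=()):
    """Pick the next token: drop banned tokens entirely, divide scores by
    temperature (low = predictable, high = random), then draw from the
    resulting softmax distribution."""
    scaled = {tok: v / temperature for tok, v in logits.items()
              if tok not in banned}  # banned tokens can never be sampled
    m = max(scaled.values())
    weights = {tok: math.exp(v - m) for tok, v in scaled.items()}
    r = random.random() * sum(weights.values())
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # fallback for floating-point edge cases

random.seed(1)
# Hypothetical scores for the next word in a story:
logits = {"suddenly": 2.0, "meanwhile": 1.5, "however": 1.0}
# Ban an overused word and sample at low temperature:
print(sample_next(logits, temperature=0.5, banned={"suddenly"}))
```

Author-style modules are a different mechanism, but word bans and randomness settings really are this simple at sampling time.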
I like having it prompt me. I spend a lot of time staring at blank documents, but if a bot can throw a couple of sentences at me, I can get the creativity going much more easily.
But the final product always ends up being 95%+ my own writing anyway. I like what the AI gives me, but its form is normally pretty bland. Still useful enough that I'm subbed to NovelAI for their biggest LLM.
ChatGPT currently produces unworkable garbage if you try to get any kind of high level writing out of it. Seems a bit unnecessary to worry about "model collapse" when the models aren't actually useful yet.
https://www.youtube.com/watch?v=jmaUIyvy8E8 (warning: loud)
Basically like saving a JPEG, opening it, and saving it as a JPEG again x1000. "You can never match the original from a lossy source" applies to ideas, too, apparently.
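Same idea in a dozen lines: stand in for the JPEG round-trip with a blur that throws away a little fine detail per pass, and the contrast drains out generation by generation (illustrative only, not actual JPEG math):

```python
def lossy_resave(signal):
    """Crude stand-in for one save/open cycle of a lossy codec:
    blur each sample with its neighbors, smearing fine detail."""
    out = []
    for i, v in enumerate(signal):
        left = signal[i - 1] if i > 0 else v
        right = signal[i + 1] if i < len(signal) - 1 else v
        out.append((left + v + right) / 3)
    return out

original = [9.0 if i % 2 else 0.0 for i in range(32)]  # high-detail "image"
img = list(original)
for gen in range(1, 11):
    img = lossy_resave(img)
    if gen in (1, 5, 10):
        print(f"after {gen} re-saves, contrast = {max(img) - min(img):.2f}")
```

Each pass alone looks harmless; it's the compounding across generations that flattens everything, which is exactly the model-collapse worry.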
I wonder how close GPT-4 and company are to having used every bit of writing in the English language as training data. Like, assuming you downloaded the entire content of social media sites, used every ebook you could find, and pulled in all the wikis, forums, news sites, and blogs that a web crawler could produce, the only real volume of writing that's left is private communications. At some point, Google or MS or another company with lots of communications will use every message ever sent in their systems as training data. But once you do that, you've run out.
More training data probably brings diminishing returns, so if GPT-4 has already used, like, 10% of all available writing, then maybe even with the other 90% it won't be good enough to do what people want. Maybe in the future companies will hire vast numbers of writers just to make good content that can be used for the LLMs.
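Back-of-the-envelope on the diminishing-returns point, assuming model loss follows a power law in dataset size (all the constants here are made up; only the exponent is in the ballpark of published LLM scaling estimates):

```python
# Toy scaling-law arithmetic: if loss = A * N**(-alpha) for N training
# tokens, then 10x more data only shaves off a modest slice of the loss.
A = 1.0        # arbitrary scale constant (made up)
alpha = 0.095  # data exponent, roughly the order reported in scaling studies

def loss(n_tokens):
    return A * n_tokens ** -alpha

l_10pct = loss(1e12)    # pretend 10% of all text is ~1T tokens (made up)
l_100pct = loss(1e13)   # all of it
improvement = 1 - l_100pct / l_10pct
print(f"relative improvement from 10x more data: {improvement:.1%}")
```

With an exponent that small, using the remaining 90% of the world's text buys you maybe a fifth off the loss, which is why "just scrape more" can't run forever.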