Basically every time AI tries to create its own thing, it's incrementally shittier than whatever it trained on. As more and more AI-produced content floods the internet, it's increasingly training on AI-generated material. The effect is analogous to scanning and printing the same document over and over again, until it ultimately becomes a blurry mess. AI cannot create on its own; it can only modify pre-existing human work.
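You can see the "photocopy of a photocopy" effect in a toy simulation (my own sketch, not from the article): fit a Gaussian to some data, sample from the fit, refit to those samples, and repeat. Every generation adds estimation error, and the fitted distribution steadily degenerates.

```python
# Toy demo of model collapse: each "generation" is trained only on
# samples produced by the previous generation's model.
import random
import statistics

def generations(mean, stdev, n=10, rounds=300, seed=42):
    """Repeatedly refit a Gaussian to its own samples; return (mean, stdev) history."""
    rng = random.Random(seed)
    history = [(mean, stdev)]
    for _ in range(rounds):
        # "Publish" n samples from the current model...
        samples = [rng.gauss(mean, stdev) for _ in range(n)]
        # ...then "train" the next model on nothing but those samples.
        mean = statistics.fmean(samples)
        stdev = statistics.stdev(samples)  # each refit compounds sampling error
        history.append((mean, stdev))
    return history

hist = generations(mean=0.0, stdev=1.0)
print("gen   0: stdev = %.4f" % hist[0][1])
print("gen 300: stdev = %.4f" % hist[-1][1])
```

The spread collapses toward zero over the generations: later "models" can only reproduce an ever-narrower slice of what the original distribution contained, which is the statistical version of the blurry photocopy.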
The article's main solution is to keep some kind of master backup of work labelled as existing before the rise of LLMs, but it isn't optimistic about this actually happening. I'm wondering: if in a few years the "write TV script" button on ChatGPT generates completely unworkable garbage, will studios stop trying to pretend it's a viable replacement for writing staff?
According to so many local bazingas around here, all the chatbots have to do is chatbot off of chatbotting to "achieve liftoff" and become a time-traveling divine being. :soypoint-1: :no-mouth-must-scream: :soypoint-2:
That's specifically for "copying" the training data from projects like OpenAI, or filtering it down to only the parts of the data that you want.
It's different when you're compiling a new training dataset and now have to worry about a large portion of your input data being total garbage and ruining the model.
And that "copying" process introduces tons of errors and bugs anyway, so it's only good if you are building a hyper-specific LLM, like one that is meant to emulate the style of one specific artist.