Basically every time AI tries to create its own thing, it's incrementally shittier than whatever it trained on. As more and more AI-produced content floods the internet, AI is increasingly training on AI-generated material. The effect is analogous to scanning and printing the same document over and over again until it ultimately becomes a blurry mess. AI cannot create on its own; it can only modify pre-existing human work.

The article's main solution is to keep some kind of master backup of work labelled as existing before the rise of LLMs, but the author isn't optimistic about this actually happening. I'm wondering: if in a few years the "write TV script" button on ChatGPT generates completely unworkable garbage, will studios stop pretending it's a viable replacement for writing staff?

  • frankfurt_schoolgirl [she/her] · 1 year ago

    I wonder how close GPT-4 and company are to having used every bit of writing in the English language as training data. Like, assuming you downloaded the entire content of social media sites, used every e-book you could find, and pulled in all the wikis, forums, news sites, and blogs that a web crawler could reach, the only real volume of writing that's left is private communications. At some point, Google or MS or another company that handles lots of communications will use every message ever sent in their systems as training data. But once you do that, you've run out.

    More training data probably brings diminishing returns, so if GPT-4 has already used like 10% of all available writing, then maybe even with the other 90% it won't be good enough to do what people want. Maybe in the future companies will hire vast numbers of writers just to make good content that can be used to train the LLMs.