What is it about AI art that makes it so recognisable? (Beyond obvious artifacts like fucked-up hands or writing)

CliffordBigRedDog [he/him] · 4 days ago

drhead [he/him] · 3 days ago

There's usually going to be a hegemonic style for AI art, since most people making this stuff just put in a few vague keywords to set the direction of the style, then stuff the rest of the prompt with quality keywords. Hosted inference services will often do the quality-keyword stuffing for you, or train in a house style. Whatever you don't specify gets filled in with essentially the model's average (which, of course, isn't a representative average image; it's the average of the "preferred" set used in their preference optimization training). Practically nobody asks for mediocre images (because why would you), and the people making models, especially on hosted services, often effectively won't let you.
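To make the "stuffing" concrete, here's a rough sketch of what a service silently appending quality keywords to your prompt might look like. The tag list and the `stuff_prompt` function are hypothetical illustrations I made up, not any real service's list or code.

```python
# Hypothetical quality tags; these exact strings are an illustrative assumption,
# not taken from any particular hosted service.
DEFAULT_QUALITY_TAGS = ("masterpiece", "best quality", "highly detailed", "sharp focus")


def stuff_prompt(user_prompt: str, quality_tags=DEFAULT_QUALITY_TAGS) -> str:
    """Append any quality tags the user didn't already write (case-insensitive),
    roughly the way a service might before running inference (hypothetical)."""
    # Split the comma-separated prompt into trimmed, non-empty parts.
    parts = [p.strip() for p in user_prompt.split(",") if p.strip()]
    seen = {p.lower() for p in parts}
    for tag in quality_tags:
        if tag.lower() not in seen:
            parts.append(tag)
            seen.add(tag.lower())
    return ", ".join(parts)


print(stuff_prompt("watercolor, a lighthouse"))
print(stuff_prompt("oil painting, a fox"))
```

Two people asking for quite different things still end up with prompts whose tails are identical ("..., masterpiece, best quality, highly detailed, sharp focus"), which is one way everyone gets pulled toward the same look.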

Think of what you'd expect to get from requesting an image of "a beautiful woman". There are certainly a lot of different ideas about which women are beautiful and what traits make a woman beautiful, across different individuals and especially across different cultures and time periods. But if you take every picture that someone thought of as having a beautiful woman in it and look at the mode of that distribution, it's going to settle on conventionally attractive by the standards of whatever group is labeling the images. The same thing happens with an AI model: training on images labeled "a beautiful woman" shifts its output towards conventionally attractive women. If you think of it as a set of traits contributing to conventional attractiveness, then it's also fairly likely that every "a beautiful woman" image will end up looking like a flawless supermodel, since the mode will be a woman with all of the most common traits in the "a beautiful woman" dataset. That often won't look natural, because we're not used to seeing flawless supermodels all of the time.
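Here's a toy version of the "mode has every common trait" argument. The trait names and frequencies are entirely made up, and the traits are assumed independent just to keep the arithmetic simple; under that assumption the most likely combination is simply "every trait that's more common than not".

```python
from math import prod

# Made-up fraction of "a beautiful woman" images showing each trait.
# Independence between traits is an assumption for illustration only.
TRAIT_FREQ = {
    "clear skin": 0.8,
    "symmetrical face": 0.7,
    "studio lighting": 0.6,
    "smiling": 0.75,
    "styled hair": 0.65,
    "makeup": 0.7,
    "posed for camera": 0.6,
    "eye contact": 0.7,
}


def modal_image(freqs):
    """Per-trait mode: include a trait iff more than half the images have it.
    With independent traits this is also the single most likely combination."""
    return {trait: p > 0.5 for trait, p in freqs.items()}


def share_matching_mode(freqs):
    """Fraction of real images that match the modal image on every trait."""
    mode = modal_image(freqs)
    return prod(p if mode[t] else 1 - p for t, p in freqs.items())


def expected_trait_count(freqs):
    """Average number of traits a real image in the set actually has."""
    return sum(freqs.values())


print(f"modal image has {sum(modal_image(TRAIT_FREQ).values())} of {len(TRAIT_FREQ)} traits")
print(f"real images matching it: {share_matching_mode(TRAIT_FREQ):.1%}")
print(f"typical real image has {expected_trait_count(TRAIT_FREQ):.1f} traits")
```

With these numbers the modal image has all 8 traits, a typical real image has about 5.5, and only about 4.8% of the real images look like the mode. The "most typical" output ends up being something that's actually rare in the data.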

That's more or less what is happening when people make these AI images, but with the whole image and its style. The images labeled as "high quality" (or whatever the quality keyword is), or that are in the preference optimization set, have attributes that are more common in them than in other images. Those attributes end up becoming dominant, and a lot of them will show up together in a generated image that's stuffed with quality keywords or comes from a heavily DPO-tuned model. That can look unnatural, since a typical good-looking natural image may have only a few of those traits. The problem is exacerbated by each model having its own default flavor and by people heavily reusing the same sets of quality keywords. I'd also honestly fully expect that part of it could be pinned on how some text encoders work: it's hard to separate distinct concepts in CLIP's embeddings, and this does manifest in how images are generated, though a lot of recent popular models don't use CLIP, so this doesn't necessarily always apply.
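A crude way to picture the "attributes more common in the preferred set" point: compare each attribute's frequency in the whole training set with its frequency in the "high quality"/preferred set, and treat anything over-represented as what quality keywords or heavy preference tuning push toward. All attribute names and numbers here are invented, and switching on every enriched attribute is a caricature of the effect, not how DPO or any sampler actually works.

```python
# Invented numbers: (share of all training images, share of the preferred set).
ATTRS = {
    "dramatic rim lighting": (0.10, 0.30),
    "shallow depth of field": (0.20, 0.45),
    "saturated colors": (0.25, 0.50),
    "smooth skin and textures": (0.30, 0.55),
    "centered subject": (0.40, 0.60),
    "film grain": (0.15, 0.10),
}


def lift(overall, preferred):
    """How over-represented an attribute is in the preferred set (>1 = enriched)."""
    return preferred / overall


def enriched(attrs):
    """Attributes that are more common in the preferred set than overall."""
    return sorted(a for a, (o, p) in attrs.items() if lift(o, p) > 1.0)


def stacked_output(attrs):
    """Caricature of keyword stuffing / heavy preference tuning:
    every enriched attribute shows up at once."""
    return set(enriched(attrs))


def typical_preferred_count(attrs):
    """Expected number of enriched attributes in one actual preferred image
    (linearity of expectation: just sum their frequencies)."""
    return sum(p for o, p in attrs.values() if p > o)


print(f"stacked image: {len(stacked_output(ATTRS))} enriched attributes")
print(f"typical good image: {typical_preferred_count(ATTRS):.1f} on average")
```

Here the "stacked" image carries all 5 enriched attributes while an actual image from the preferred set averages about 2.4 of them, which is the gap between "good-looking" and "looks like AI" the post is describing.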