Just going off open-source models, most of them are at least somewhat better than that out of the box, and certain strains of model have a bunch of tools built around forcing them into a more reasonable state. Modern descendants of SD1.5 (a model from two years ago) fail hard on their own nine times out of ten, but there's a mature ecosystem of tools to control post-NAI-leak SD1.5 and cover for its flaws. The most recent big model, Flux, is terrifyingly accurate on its own, although it still tends to get proportions subtly wrong in very off-putting ways.
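For a concrete sense of what that tooling layer looks like, here's a minimal sketch using Hugging Face's diffusers library with a ControlNet, one of the standard ways people force an SD1.5-lineage model to respect a composition instead of leaving anatomy to chance. The checkpoint names are just common public repos and the "reference.png" input is assumed; this is illustrative, not anything the original comment specifies:

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler

# Extract a Canny edge map from a reference image (assumed to exist);
# the ControlNet will steer generation to follow these edges.
ref = np.array(Image.open("reference.png").convert("RGB"))
edges = cv2.Canny(ref, 100, 200)
edge_map = Image.fromarray(np.stack([edges] * 3, axis=-1))

# A ControlNet trained on Canny edges -- one of the "force the base
# model into a reasonable state" tools from the SD1.5 ecosystem.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)

# An SD1.5-family base checkpoint; community fine-tunes drop in the same way.
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# The edge map constrains composition; the prompt only fills in the rest.
image = pipe(
    "a portrait photo, natural lighting",
    image=edge_map,
    num_inference_steps=20,
).images[0]
image.save("controlled_output.png")
```

The point of the pattern is that the base model's weakness (unreliable structure on its own) gets patched externally: the conditioning signal does the heavy lifting the model can't be trusted to do unprompted.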
Closed-source models are impenetrable; no one knows what they're doing under the hood, since the public only interacts with them through prompt boxes and every company is being super secretive about its pipeline.
That said, I do agree that it feels like there's something fundamentally wrong with the way "AI" is currently being focused on. What's being trained now are essentially eyes (and backwards eyes that reverse the process of seeing, producing an image from how that image might be parsed) and speech centers. All the hype rests on the idea that the speech-center bits can just be made big enough to start being smart, instead of remaining a potentially useful piece of language processing. I really can't help but feel it's a flawed overfixation on one novel bit of tech, kind of like how rockets became the standard for space launches because primitive rocket tech looked neat and had already been developed.