You're right that phones are more efficient than I gave them credit for, but power costs are absolutely a consideration for the tech companies that are training large models.
Besides, how much more power-efficient would a phone have to be to make up for only doing 1 query at a time? A GPU runs several at once and benefits from cache locality, since it's just using the same data over and over for different queries. I highly doubt that mobile hardware's power usage edge could outweigh that efficiency of scale.
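To put rough shape on that, here's a back-of-the-envelope sketch. Every number in it is a made-up placeholder, not a measurement, and the function names are just ones I picked for illustration. It also assumes the "same data" being reused is the model weights, read once per batch.

```python
def joules_per_query(power_watts: float, queries_per_second: float) -> float:
    """Energy per query: watts are joules per second, so divide by throughput."""
    return power_watts / queries_per_second


def weight_bytes_per_query(model_bytes: float, batch_size: int) -> float:
    """Assumes weights are read once per batch, so each query pays 1/batch_size of that read."""
    return model_bytes / batch_size


# Hypothetical placeholder figures, chosen only to show the arithmetic.
phone = joules_per_query(power_watts=5.0, queries_per_second=1.0)
gpu = joules_per_query(power_watts=400.0, queries_per_second=200.0)
print(f"phone: {phone:.1f} J/query, gpu: {gpu:.1f} J/query")
```

The point is just that a chip drawing far more power can still come out ahead per query if batching pushes its throughput up by an even larger factor. Whether that actually holds depends on real numbers I don't have.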
The key is that Apple's model isn't all that large, and that's how they're aiming to run it efficiently on a phone. It also sucks, so IDK.