Not even close. A phone is light-years more efficient than a server because it has to run on a battery. A server just needs to not outpace the air conditioning unit positioned right in front of it. Servers do a lot more per watt than, say, a desktop or maybe even a laptop. But phones do so much with almost no power; otherwise you'd get an hour of battery life.
You're right that phones are more efficient than I gave them credit for, but power costs are absolutely a consideration for the tech companies that are training large models.
Besides, how much more power-efficient would a phone have to be to make up for only doing one query at a time, compared to a GPU running several at once and benefiting from cache locality, since it's reusing the same data (the model weights) over and over for different queries? I highly doubt that mobile hardware's power-usage edge could outweigh that efficiency of scale.
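To put that trade-off in concrete terms, here's a rough back-of-the-envelope sketch. Every wattage and throughput figure in it is a made-up placeholder, not a measurement, and the function names are just for illustration. The only point is the shape of the comparison: energy per query is power divided by queries per second, so batching wins whenever the throughput gain outpaces the extra power draw.

```python
def joules_per_query(power_watts: float, queries_per_second: float) -> float:
    """Energy spent per query. Watts are joules/second, so divide by queries/second."""
    if queries_per_second <= 0:
        raise ValueError("queries_per_second must be positive")
    return power_watts / queries_per_second


def breakeven_throughput(server_watts: float, phone_watts: float, phone_qps: float) -> float:
    """Queries/second a server must sustain to match the phone's energy per query."""
    return server_watts * phone_qps / phone_watts


# Hypothetical placeholder numbers, NOT measurements of any real device.
PHONE_WATTS = 5.0     # assumed phone power draw while running the model
PHONE_QPS = 0.5       # assumed: one query at a time, two seconds each
SERVER_WATTS = 500.0  # assumed GPU server power draw under load

phone_cost = joules_per_query(PHONE_WATTS, PHONE_QPS)
needed_qps = breakeven_throughput(SERVER_WATTS, PHONE_WATTS, PHONE_QPS)

print(f"phone: {phone_cost} J/query")
print(f"server breaks even at {needed_qps} queries/second")
```

With these placeholders the server draws 100x the power, so it has to push 100x the phone's throughput to break even. Whether batching actually gets it there is the thing we're arguing about.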
The key is that Apple's model isn't all that large, and that's how they're aiming to run it efficiently on a phone. It also sucks, so IDK.