It's the same thing though, no? Whatever power it takes to run a query in dedicated hardware in a data center is the same or lower than the power to do it on a cell phone. On a cell phone it's even worse because charging the battery, then using battery power to run AI queries is less efficient than just powering a GPU to run several queries in parallel. That's without getting into other efficiencies of scale and the fact that a data center is designed to keep power usage low compared to an iPhone which is designed to be the worst consumer product someone will pay $1000 for.
Not even close. A phone is lightyears more efficient than a server because it has to run on a battery. A server just needs to not outpace the air conditioning unit positioned right in front of it. Servers do a lot more per watt than, say, a desktop or maybe even a laptop. But phones do so much with almost no power; otherwise you'd get an hour of battery life.
You're right that phones are more efficient than I gave them credit for, but power costs are absolutely a consideration for the tech companies that are training large models.
Besides, how much more power efficiency does a phone have that it can make up for only doing 1 query at a time compared to a GPU running several at a time, benefiting from cache locality since it's just using the same data over and over for different queries, etc? I highly doubt that the efficiency of scale could be outweighed by mobile hardware's power usage edge.
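For what it's worth, the batching question can be put in back-of-envelope form. This is only a rough sketch: every number below is a made-up placeholder, not a measurement of any real phone or GPU, and it assumes (simplistically) that a batch takes the same wall-clock time regardless of how many queries are in it. Energy per query is power times time divided by the number of queries sharing that time, with an overhead multiplier standing in for battery charging loss on the phone and cooling on the server.

```python
import math

def energy_per_query_joules(power_watts, seconds_per_batch, batch_size, overhead=1.0):
    """Energy attributed to one query: power * time * overhead / queries in the batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return power_watts * seconds_per_batch * overhead / batch_size

def breakeven_batch_size(phone_joules, server_watts, server_seconds, server_overhead=1.0):
    """Smallest batch size at which the server's energy per query is <= the phone's."""
    return math.ceil(server_watts * server_seconds * server_overhead / phone_joules)

# Hypothetical placeholders only: a 5 W phone taking 10 s per query, with a 1.25x
# charging-loss multiplier, versus a 500 W accelerator taking 2 s per batch with
# a 1.5x cooling multiplier.
phone = energy_per_query_joules(5, 10, batch_size=1, overhead=1.25)
server = energy_per_query_joules(500, 2, batch_size=16, overhead=1.5)
needed = breakeven_batch_size(phone, 500, 2, server_overhead=1.5)

print(f"phone: {phone} J/query, server at batch 16: {server} J/query")
print(f"server needs a batch of at least {needed} to match the phone")
```

With these particular placeholders the server loses at a batch of 16 and breaks even at 24, so the answer hinges entirely on the real power draw, latency, and achievable batch size, none of which are known here.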
The key is that Apple's model isn't all that large, and that's how they're targeting being able to run it efficiently on a phone. It also sucks, so IDK.