• KnilAdlez [none/use name]
    2 months ago

    Exactly! PCs today are powerful enough to run them in decent time even without acceleration; having it would just be more efficient, ultimately saving time and energy. I would be interested in seeing how much processing power is wasted calculating what are effectively edge cases in a model's real workload. What percentage of GPT-4 queries could not be answered accurately by GPT-3 or a local LLaMA model? I'm willing to bet it's less than 10%. That's terawatt-hours of energy and hundreds of gallons of water spent running a model that, for 90% of users, could be run locally.