SirGolan@lemmy.sdf.org · 1 year ago

    GPT-4 with Reflexion prompting gets 90% correct on the HumanEval coding benchmark. The paper this is based on is misleading at best.