SirGolan@lemmy.sdf.org · 1 year ago

GPT-4 with Reflexion prompting gets 90% correct on the HumanEval coding benchmark. The paper this is based on is misleading at best.