Those experts said some of the invented text — known in the industry as hallucinations — can include racial commentary, violent rhetoric and even imagined medical treatments.

  • TankieTanuki [he/him]
    ·
    edit-2
    26 days ago

    I've been using Whisper with TankieTube and I'm curious whether these errors were made with the Large-v2 or the Large-v3 model. I suspect it was the latter, because its dataset includes output from the other.

    The Whisper large-v3 model was trained on 1 million hours of weakly labeled audio and 4 million hours of pseudo-labeled audio collected using Whisper large-v2.

    Snake eating its own tail, etc.