Those experts said some of the invented text — known in the industry as hallucinations — can include racial commentary, violent rhetoric and even imagined medical treatments.
Those experts said some of the invented text — known in the industry as hallucinations — can include racial commentary, violent rhetoric and even imagined medical treatments.
This is fucked, you don't use a black box approach in anything high risk without human supervision. Whisper probably could be used to help accelerate a transcriptions done by an expert, maybe some sort of "first pass" that needs to be validated, but even then it might not help speed things up and might impact quality (see coding with copilot). Maybe also use the timestamp information for some filtering of the most egregious hallucinations, or a bespoke fine-tuning setup (assuming it was fine-tuned it the first place)? Just spitballing here, I should probably read the paper to see what the common error cases are.
It's funny, because this is the openAI model I had the least cynicism towards, did they bazinga it up when I wasn't looking?