Very cool tool. I tried out the medium-size model on a Russian video, and the English subtitles it generated were much more accurate than YouTube's auto-translated captions.

  • Kumikommunism [they/them]
    2 years ago

    This is the first transcription system that has actually surprised me with how good it is. I used it on a very noisy, unscripted, rambling 20-minute video of a Japanese guy talking, and the transcript was actually perfect. The weirdest thing is that it completely removed the "thinking words", i.e. the Japanese equivalents of "um", "uh", "and so, like", "what was it?", "how do I say this?", and so on. It's actually scary how good its contextual understanding is.

      • Kumikommunism [they/them]
        2 years ago

        It took about as long as the video itself on a GTX 1070. I've seen people say it runs at roughly 2x real-time speed on newer-generation graphics cards.

          • Kumikommunism [they/them]
            2 years ago

            Oh, I should say that I used the medium setting, which is much slower, because I wanted to see how accurate it could get. With the default setting it's actually much faster than that.