Very cool tool. I tried out the medium-size model on a Russian video, and the English subtitles that it generated were much more accurate than YouTube's autotranslated captions.
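
For anyone who wants to try the same thing, the command looks something like this (the file name is just a placeholder; the --model, --language, and --task flags come straight from the project README):

  whisper some-russian-video.mp4 --model medium --language Russian --task translate  # Russian speech in, English text out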

  • Kumikommunism [they/them]
    ·
    edit-2
    2 years ago

    The transcription system is the first one to actually surprise me with how good it is. I used it on a very noisy, unscripted, rambling, 20-minute video of a Japanese guy talking, and it was actually perfect. The weirdest thing is that it completely removed "thinking words", like the Japanese equivalents of "um", "uh", "and so, like", "what was it?", "how do I say this?", and so on. It's actually scary how good its contextual understanding is.

      • Kumikommunism [they/them]
        ·
        2 years ago

        It took about as long as the video itself on a 1070. I've been seeing people say it runs at ~2x speed on newer-gen graphics cards.

          • Kumikommunism [they/them]
            ·
            2 years ago

            Oh, I should say that I used the medium model, which is much slower, because I wanted to see how accurate it can get. So by default it's actually much faster than that.
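
            For reference, the model size is picked with the --model flag, so switching between the fast and the accurate runs is just (file name is a placeholder; flag names are from the README):

              whisper talk.mp3 --model base --language Japanese    # smaller model, noticeably faster
              whisper talk.mp3 --model medium --language Japanese  # what I used: slower but more accurate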

    • RION [she/her]
      ·
      edit-2
      2 years ago

      Surely as jobs are automated away, those gains in productivity will be distributed equitably so that no one is made destitute by something meant to make life easier?

      ... Haha, right? Right??

  • crime [she/her, any]
    ·
    2 years ago

    Oh, this is dope. Looks like someone already put up a PR (and set up a fork of the project) that adds SRT support too.
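
    I haven't dug into the fork itself, but if you just want SRT out of the Python API it's only a few lines. A rough sketch (file names are placeholders; the start/end/text fields come from the segments that transcribe() returns):

      import whisper

      def srt_time(t: float) -> str:
          # SRT timestamps look like 00:01:23,456
          h, rem = divmod(int(t * 1000), 3_600_000)
          m, rem = divmod(rem, 60_000)
          s, ms = divmod(rem, 1000)
          return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

      model = whisper.load_model("medium")
      result = model.transcribe("episode.mkv", language="Korean", task="translate")

      # write one numbered SRT block per segment
      with open("episode.srt", "w", encoding="utf-8") as f:
          for i, seg in enumerate(result["segments"], start=1):
              f.write(f"{i}\n")
              f.write(f"{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n")
              f.write(seg["text"].strip() + "\n\n")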

    I'm trying it on an episode of a Korean reality TV show that doesn't have full English subs yet; will give an update on the quality once it's done!

    • crime [she/her, any]
      ·
      2 years ago

      Have you used Python packages before? It's distributed as one of those (instructions in the project README under Setup). Looks like it's primarily command-line use only right now (instructions), unless you're writing a Python program that imports it.
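
      If you do end up calling it from a Python script instead, the basic usage from the README is just (the audio file name is a placeholder):

        import whisper

        model = whisper.load_model("base")
        result = model.transcribe("audio.mp3")
        print(result["text"])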

      If you haven't used the command line before, what OS do you use? I might be able to walk you through it.

        • crime [she/her, any]
          ·
          edit-2
          2 years ago

          Gotcha gotcha. I haven't used Windows since the mid-2000s, so I won't be the most helpful, but it looks like you'll need to do the following:

          1. If you aren't using a package manager, install Chocolatey (or maybe Scoop? I'm not familiar with that one - maybe some Windows comrades can chime in on which would be better for you)

          2. Install Python 3 and Pip if you don't have them installed

          3. Run the commands in the Setup part of that doc:

            pip install git+https://github.com/openai/whisper.git  # installs whisper and its Python dependencies straight from the GitHub repo
            
            choco install ffmpeg # assuming you are using Chocolatey and not Scoop
            
          4. Assuming everything installs properly, you can use the examples from the Command-line usage section as a starting point. To translate an audio file from Korean to English, I'm running:

            whisper my-audio-file.mp3 --language Korean --task translate
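
          One more note: whisper uses ffmpeg under the hood to read the audio (that's why step 3 installs it), so video files generally work as input directly. If a particular container gives it trouble, you can pull the audio track out first with something like (file names are placeholders):

            ffmpeg -i episode.mkv -vn -q:a 2 episode.mp3   # drop the video stream, keep the audio as mp3
            whisper episode.mp3 --language Korean --task translate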

  • LurkerJee [none/use name]
    ·
    2 years ago

    I almost dropped out of school when it turned out that I was really good at transcribing captions for TV shows. Good thing I got a useless degree instead lol.

  • Shoegazer [he/him]
    ·
    edit-2
    2 years ago

    Can this transcribe and translate songs? One of my hobbies is transcribing songs, and sometimes the words are very difficult to make out.

    Edit: it seems like it can: https://youtu.be/Ph6K_0ttsSc (cw: Chan culture)