- cross-posted to:
- opensource@lemmy.ml
Very cool tool. I tried out the medium-size model on a Russian video, and the English subtitles that it generated were much more accurate than YouTube's auto-translated captions.
This is the first transcription system to actually surprise me with how good it is. I used it on a very noisy, unscripted, rambling, 20-minute video of a Japanese guy talking, and it was actually perfect. The weirdest thing is that it completely removed "thinking words": the Japanese equivalents of um, uh, "and so, like", "what was it?", "how do I say this?" and so on. It's actually scary how good its contextual understanding is.
It took about as long as the video itself on a 1070. I've been seeing people say it runs at ~2x speed on newer-gen graphics cards.
Oh, I should say that I used the medium model, which is much slower, because I wanted to see how accurate it could get. So by default it's actually much faster than that.
Surely as jobs are automated away, those gains in productivity will be distributed equitably so that no one is made destitute by something meant to make life easier?
... Haha, right? Right??
Oh, this is dope. Looks like someone already put up a PR (and set up a fork of the project) that adds SRT support too.
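If you don't want to wait on that getting merged, the Python API already hands you timestamped segments, so rolling your own .srt is pretty short. Rough sketch (the filenames are placeholders, and I'm assuming each entry in `result["segments"]` carries `start`/`end` in seconds plus `text`, which is what it returned when I poked at it):

```python
import whisper

def fmt_time(seconds: float) -> str:
    # SRT timestamps look like 00:01:02,500
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

model = whisper.load_model("medium")  # pick whatever size fits your GPU
result = model.transcribe("episode.mp3", task="translate")  # placeholder filename

with open("episode.srt", "w", encoding="utf-8") as f:
    for i, seg in enumerate(result["segments"], start=1):
        f.write(f"{i}\n")
        f.write(f"{fmt_time(seg['start'])} --> {fmt_time(seg['end'])}\n")
        f.write(seg["text"].strip() + "\n\n")
```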
I'm trying it on an episode of a Korean reality tv show that doesn't have full English subs yet, will give an update on the quality once it's done!
Is there a good tutorial on how to download and use this? Sounds awesome
Have you used Python packages before? It's distributed as one of those (instructions are in the project readme under Setup). Looks like it's primarily command-line use only right now (see the Command-line usage instructions), unless you're writing a Python program that imports it.
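For the Python route, a minimal sketch (model size and filename are placeholders) looks something like this:

```python
import whisper

model = whisper.load_model("medium")            # "tiny" through "large" also work
result = model.transcribe("my-audio-file.mp3")  # placeholder filename
print(result["text"])                           # full transcript as one string
```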
If you haven't used the command line before, what OS do you use? Might be able to walk you through it
I've used the command line some, but it's been a while. I'm using Windows 10 rn.
Gotcha gotcha, I haven't used Windows since the mid-00s so I won't be the most helpful, but it looks like you'll need to do the following:

- If you aren't using a package manager, install Chocolatey (or maybe Scoop? I'm not familiar with that one - maybe some Windows comrades can chime in on which would be better for you)
- Install Python 3 and Pip if you don't have them installed
- Run the commands in the Setup part of that doc:

  ```
  pip install git+https://github.com/openai/whisper.git
  choco install ffmpeg  # assuming you are using Chocolatey and not Scoop
  ```

- Assuming everything installs properly, you can use the examples from the Command-line usage section as a starting point. I'm running

  ```
  whisper my-audio-file.mp3 --language Korean --task translate
  ```

  to translate an audio file from Korean to English.
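And if you want to copy what people upthread did with the slower-but-more-accurate medium model, it's just another flag (filename is a placeholder):

```
whisper noisy-video.mp3 --language Japanese --model medium
```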
I almost dropped out of school when it turned out that I was really good at transcribing captions for tv shows. Good thing I got a useless degree instead lol.
I got an entire chapo episode transcribed in like 3-4 minutes with this
Follow along with episode 583. For some reason the transcription doesn't start until 0:58. Also, I'm not going to format this, and it's gonna look ugly because it doesn't differentiate between who's speaking, but this is super impressive because it only took like 3-4 minutes to produce.
Anyone know if it supports speaker diarization? Or maybe some fork of it does?
Based on the help text, it doesn't seem to yet. There's a discussion thread on GitHub requesting it, though.
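In the meantime, you could probably bolt a separate diarization model onto Whisper's timestamped segments yourself. A rough sketch using pyannote.audio (that choice is my assumption, not anything Whisper ships with): label each Whisper segment with whichever diarized speaker turn overlaps it the most.

```python
import whisper
from pyannote.audio import Pipeline  # assumption: pyannote.audio is installed

model = whisper.load_model("medium")
result = model.transcribe("episode.wav")  # placeholder filename

# may require a Hugging Face auth token depending on your pyannote version
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")
diarization = pipeline("episode.wav")
turns = [(turn.start, turn.end, speaker)
         for turn, _, speaker in diarization.itertracks(yield_label=True)]

def best_speaker(start: float, end: float) -> str:
    # pick the speaker whose turn overlaps [start, end] the most
    overlap = lambda t: max(0.0, min(end, t[1]) - max(start, t[0]))
    best = max(turns, key=overlap, default=None)
    return best[2] if best and overlap(best) > 0 else "unknown"

for seg in result["segments"]:
    print(f"[{best_speaker(seg['start'], seg['end'])}] {seg['text'].strip()}")
```

Crude (it ignores overlapping speech), but it gets you speaker labels until something official lands.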
Can this transcribe and translate songs? One of my hobbies is transcribing songs and sometimes it’s very difficult to hear
Edit: it seems like it does: https://youtu.be/Ph6K_0ttsSc (cw: Chan culture)