- cross-posted to:
- opensource@lemmy.ml
Very cool tool. I tried out the medium-size model on a Russian video, and the English subtitles that it generated were much more accurate than YouTube's auto-translated captions.
This is the first transcription system to actually surprise me with how good it is. I used it on a very noisy, unscripted, rambling, 20-minute video of a Japanese guy talking, and it was actually perfect. The weirdest thing is that it completely removed "thinking words": the Japanese equivalents of um, uh, "and so, like", "what was it?", "how do I say this?" and so on. It's actually scary how good its contextual understanding is.
It took about as long as the video itself on a 1070. I've been seeing people say it runs at ~2x speed on newer-gen graphics cards.
Oh, I should say that I used the medium model, which is much slower, because I wanted to see how accurate it could get. So by default it's actually much faster than that.
Surely as jobs are automated away, those gains in productivity will be distributed equitably so that no one is made destitute by something meant to make life easier?
... Haha, right? Right??
Oh, this is dope. Looks like someone already put up a PR (and set up a fork of the project) that adds SRT support too.
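If you don't want to wait on that getting merged, the Python API already hands you timestamped segments, so rolling your own .srt is pretty short. Rough sketch (the filenames are placeholders, and I'm assuming each entry in `result["segments"]` carries `start`/`end` in seconds plus `text`, which is what it returned when I poked at it):

```python
import whisper

def fmt_time(seconds: float) -> str:
    # SRT timestamps look like 00:01:02,500
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

model = whisper.load_model("medium")  # pick whatever size fits your GPU
result = model.transcribe("episode.mp3", task="translate")  # placeholder filename

with open("episode.srt", "w", encoding="utf-8") as f:
    for i, seg in enumerate(result["segments"], start=1):
        f.write(f"{i}\n")
        f.write(f"{fmt_time(seg['start'])} --> {fmt_time(seg['end'])}\n")
        f.write(seg["text"].strip() + "\n\n")
```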
I'm trying it on an episode of a Korean reality tv show that doesn't have full English subs yet, will give an update on the quality once it's done!
Is there a good tutorial on how to download and use this? Sounds awesome
Have you used Python packages before? It's distributed as one of those (instructions are in the project readme under Setup). Looks like it's primarily command-line use only right now (see the Command-line usage instructions), unless you're writing a Python program that imports it.
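For the Python route, a minimal sketch (model size and filename are placeholders) looks something like this:

```python
import whisper

model = whisper.load_model("medium")            # "tiny" through "large" also work
result = model.transcribe("my-audio-file.mp3")  # placeholder filename
print(result["text"])                           # full transcript as one string
```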
If you haven't used the command line before, what OS do you use? Might be able to walk you through it
I've used the command line some, but it's been a while. I'm using Windows 10 rn.
Gotcha gotcha, I haven't used Windows since the mid-00s so I won't be the most helpful, but it looks like you'll need to do the following:

- If you aren't using a package manager, install Chocolatey (or maybe Scoop? I'm not familiar with that one - maybe some Windows comrades can chime in on which would be better for you)
- Install Python 3 and Pip if you don't have them installed
- Run the commands in the Setup part of that doc:

  ```
  pip install git+https://github.com/openai/whisper.git
  choco install ffmpeg  # assuming you are using Chocolatey and not Scoop
  ```

- Assuming everything installs properly, you can use the examples from the Command-line usage section as a starting point. I'm running

  ```
  whisper my-audio-file.mp3 --language Korean --task translate
  ```

  to translate an audio file from Korean to English.
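And if you want to copy what people upthread did with the slower-but-more-accurate medium model, it's just another flag (filename is a placeholder):

```
whisper noisy-video.mp3 --language Japanese --model medium
```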
I almost dropped out of school when it turned out that I was really good at transcribing captions for tv shows. Good thing I got a useless degree instead lol.
I got an entire chapo episode transcribed in like 3-4 minutes with this
Follow along with episode 583. For some reason the transcription doesn't start until 0:58. Also, I'm not going to format this, and it's gonna look ugly because it doesn't differentiate between who's speaking, but this is super impressive because it only took like 3-4 minutes to produce.
Anyone know if it supports speaker diarization? Or maybe some fork of it does?
Based on the help text, it doesn't seem to yet. There's a discussion thread on GitHub requesting it, though.
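In the meantime, you could probably bolt a separate diarization model onto Whisper's timestamped segments yourself. A rough sketch using pyannote.audio (that choice is my assumption, not anything Whisper ships with): label each Whisper segment with whichever diarized speaker turn overlaps it the most.

```python
import whisper
from pyannote.audio import Pipeline  # assumption: pyannote.audio is installed

model = whisper.load_model("medium")
result = model.transcribe("episode.wav")  # placeholder filename

# may require a Hugging Face auth token depending on your pyannote version
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")
diarization = pipeline("episode.wav")
turns = [(turn.start, turn.end, speaker)
         for turn, _, speaker in diarization.itertracks(yield_label=True)]

def best_speaker(start: float, end: float) -> str:
    # pick the speaker whose turn overlaps [start, end] the most
    overlap = lambda t: max(0.0, min(end, t[1]) - max(start, t[0]))
    best = max(turns, key=overlap, default=None)
    return best[2] if best and overlap(best) > 0 else "unknown"

for seg in result["segments"]:
    print(f"[{best_speaker(seg['start'], seg['end'])}] {seg['text'].strip()}")
```

Crude (it ignores overlapping speech), but it gets you speaker labels until something official lands.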
Can this transcribe and translate songs? One of my hobbies is transcribing songs and sometimes it’s very difficult to hear
Edit: it seems like it does: https://youtu.be/Ph6K_0ttsSc (cw: Chan culture)