Late last year, OpenAI announced Whisper, a new speech recognition model that is extremely accurate at transcribing speech in many languages into text. The whisper repository contains instructions for installation and use.
    # Install whisper and its dependencies.
    pip3 install git+https://github.com/openai/whisper.git

    # (When needed) Update whisper.
    pip3 install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git

    # Make sure ffmpeg is installed.
    brew install ffmpeg

    # Translate speech into text.
    whisper my_audio_file.mp3 --language English
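For repeated use, the last command can be wrapped in a small shell function. The following is only a sketch: the function name `transcribe`, the `WHISPER_CMD` override, the `transcripts` output directory, and the choice of `srt` output are all assumptions made for illustration, and the `--model`, `--output_format`, and `--output_dir` flags should be checked against `whisper --help` for the installed version.

```shell
# Hypothetical helper: transcribe one audio file with an explicit model.
#   $1 = audio file, $2 = model name (assumed default here: small)
# WHISPER_CMD is an assumed override, not a whisper feature; setting it
# to "echo" prints the arguments instead of running a transcription.
transcribe() {
  "${WHISPER_CMD:-whisper}" "$1" \
    --language English \
    --model "${2:-small}" \
    --output_format srt \
    --output_dir transcripts
}
```

Calling `transcribe my_audio_file.mp3 medium` would then run whisper with the medium model, while `WHISPER_CMD=echo` gives a dry run that only shows the arguments that would be passed.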
One thing I do quite regularly for my YouTube channel is extract a video's audio track, convert it to text using an online tool (I used to use Welder until they were bought out by Veed), and then hand-edit the resulting file to fix references to product names, people, etc.