Transcribing recorded audio and video to text using Whisper AI on a Mac

Late last year, OpenAI announced Whisper, a new speech-to-text model that is extremely accurate at transcribing many spoken languages into text. The Whisper repository contains instructions for installation and use.


# Install whisper and its dependencies.
pip3 install git+ 

# (When needed) Update whisper.
pip3 install --upgrade --no-deps --force-reinstall git+

# Make sure ffmpeg is installed.
brew install ffmpeg

# Transcribe speech into text.
whisper my_audio_file.mp3 --language English
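
If you have a whole folder of recordings, the command above can be wrapped in a loop. This is only a minimal sketch: the `recordings` and `transcripts` folder names and the `transcript_name` helper are hypothetical, and the skip check assumes your Whisper version names each transcript after the audio file's base name (e.g. `talk.txt` for `talk.mp3`). The `--output_format` and `--output_dir` options are listed by `whisper --help`.

```shell
#!/bin/sh
# Hypothetical folders; pass your own as the first and second argument.
in_dir="${1:-recordings}"
out_dir="${2:-transcripts}"

# Illustrative helper: map "path/to/talk.mp3" to "talk.txt".
# Assumes Whisper names its text output after the audio file's base name.
transcript_name() {
  base=$(basename "$1")
  printf '%s.txt\n' "${base%.*}"
}

for f in "$in_dir"/*.mp3; do
  # Skip when the glob matched nothing.
  [ -e "$f" ] || continue

  # Skip recordings that already have a transcript.
  if [ -e "$out_dir/$(transcript_name "$f")" ]; then
    continue
  fi

  mkdir -p "$out_dir"
  whisper "$f" --language English --output_format txt --output_dir "$out_dir"
done
```

Saved as, say, `transcribe_all.sh`, it could be run with `sh transcribe_all.sh recordings transcripts`. Skipping files that already have a transcript means an interrupted run can be restarted without redoing finished work.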

One thing I do quite regularly for my YouTube channel is extract the audio track, convert it to text using an online tool (I used to use Welder until they were bought out by Veed), and then hand-edit the file to fix references to product names, people, etc.
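
The audio-extraction step can be done locally with ffmpeg. The sketch below is one possible approach, not necessarily the workflow described above: the `audio_name` and `extract_audio` helpers are hypothetical names, and it assumes a 16 kHz mono WAV file is acceptable as input for the transcription step.

```shell
#!/bin/sh
# Illustrative helper: derive "episode.wav" from "episode.mp4" (same folder).
# Assumes the file name has an extension.
audio_name() {
  printf '%s.wav\n' "${1%.*}"
}

# Extract the audio track from a video as a 16 kHz mono WAV file.
extract_audio() {
  # -vn drops the video stream, -ac 1 downmixes to mono,
  # -ar 16000 resamples to 16 kHz.
  ffmpeg -i "$1" -vn -ac 1 -ar 16000 "$(audio_name "$1")"
}

# Run only when a video file is given on the command line.
if [ -n "${1:-}" ]; then
  extract_audio "$1"
fi
```

Saved as, say, `extract_audio.sh`, running `sh extract_audio.sh my_video.mp4` would write `my_video.wav` next to the video, ready to be transcribed.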

How to transcribe audio to text using Dictation on a Mac

You can use the Dictation feature built into your Mac to transcribe audio files. In my experience, it's been about 98-99% accurate, so it saves a lot of time if you want to index your audio files or need a transcript for some other purpose.

These instructions were last updated for macOS Monterey 12.4.

First, open up System Preferences, go to Keyboard, then the 'Dictation' tab:

[Screenshot: Apple Dictation System Preferences]

Turn on Dictation, and when prompted, accept the terms for Apple's Dictation service. Also take note of the 'Shortcut' (e.g. 'press dictation key' or 'press control twice'). You'll use that to activate dictation later.