ai

A PCIe Coral TPU FINALLY works on Raspberry Pi 5

Coral.ai TPUs are AI accelerators used for tasks like machine vision and audio processing. Raspberry Pis are often integrated into small robotics and IoT products—or used to analyze live video feeds with Frigate.

Until today, nobody I know of has been able to get a PCI Express Coral TPU working on the Raspberry Pi. The Compute Module 4, unfortunately, had some quirks in its PCIe implementation, preventing the use of the Coral over PCIe.

Google Coral TPU running over PCIe on Raspberry Pi 5

The Raspberry Pi 5 has a much improved PCIe bus—capable of reaching Gen 3 speeds even!—and I've already tested the first PCIe NVMe HATs for Pi 5.

So can the Pi 5 handle the Coral TPU natively over PCIe?

Yes. Though currently, you need to tweak a few things to get it working.

Testing the Coral TPU Accelerator (M.2 or PCIe) in Docker

Google Coral TPU in PCIe carrier

I recently tried setting up an M.2 Coral TPU on a machine running Debian 12 'Bookworm', which ships with Python 3.11, making the installation of the pyCoral library very difficult (maybe impossible for now?).

Some of the devs responded 'just install an older Ubuntu or Debian release' in the GitHub issues, as that would give me a compatible Python version (3.9 or earlier)... but in this case I didn't want to do that.

Transcribing recorded audio and video to text using Whisper AI on a Mac

Late last year, OpenAI announced Whisper, a new speech-to-text language model that is extremely accurate in translating many spoken languages into text. The whisper repository contains instructions for installation and use.

tl;dr:

# Install whisper and its dependencies.
pip3 install git+https://github.com/openai/whisper.git 

# (When needed) Update whisper.
pip3 install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git

# Make sure ffmpeg is installed.
brew install ffmpeg

# Translate speech into text.
whisper my_audio_file.mp3 --language English

One thing I do quite regularly for my YouTube channel is extract the audio track, convert it to text using an online tool (I used to use Welder until they were bought out by Veed), and then hand-edit the file to fix references to product names, people, etc.