2026-04-11
annotate-video-cli: turning mp4 files into Markdown
I built a CLI tool that transcribes video files and YouTube videos into clean Markdown, using either the OpenAI Whisper API or a fully offline WhisperKit backend on Apple Silicon.
During a Dropsolid webinar I recorded an mp4. I needed the text. I wanted to use Whisper for the transcription, but I also wanted the option to run it locally on my Mac without sending anything to the cloud. I couldn't find a tool that did both. So I built one.
What it does
annotate-video-cli is a command-line tool that takes a video file (or a YouTube URL) and outputs a clean Markdown document with the transcription.
annotate transcribe --file ~/Downloads/webinar.mp4
# → ~/Downloads/2026-04-10-ANNOTATION-webinar.md
Or for YouTube:
annotate transcribe-yt "https://youtu.be/abc123"
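The output name in the first example follows a date-prefixed pattern. A minimal sketch of how such a name could be derived is below; `annotation_path` is a hypothetical helper, not a function from the tool, and it assumes the date prefix is the day of the run and that the file is written next to the source video.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def annotation_path(video: Path, day: Optional[date] = None) -> Path:
    """Build YYYY-MM-DD-ANNOTATION-<stem>.md in the same folder as the video."""
    # Assumption: the prefix is the date of the run unless one is passed in.
    day = day or date.today()
    # with_name() swaps only the final path component, so the folder is kept.
    return video.with_name(f"{day.isoformat()}-ANNOTATION-{video.stem}.md")


source = Path("~/Downloads/webinar.mp4").expanduser()
print(annotation_path(source, date(2026, 4, 10)).name)
# prints: 2026-04-10-ANNOTATION-webinar.md
```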
Two backends
This was the core reason I built it. You pick your backend:
- API mode - sends audio to the OpenAI Whisper API. Fast, easy, billed per minute.
- Local mode - runs WhisperKit CoreML models on Apple Silicon. Fully offline, free, no API key needed.
You can set a default in .env or override it per run with --local.
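The precedence described here (a per-run flag wins over the default from .env) can be sketched roughly as follows. This is an illustration, not the tool's code: the `ANNOTATE_BACKEND` variable name and the fallback to API mode are assumptions, and the sketch expects the .env values to have been loaded into a mapping already.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def resolve_backend(argv: Sequence[str],
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Return "local" or "api": the --local flag overrides the configured default."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="annotate", allow_abbrev=False)
    parser.add_argument("--local", action="store_true")
    # parse_known_args ignores the subcommand and any other options.
    args, _unknown = parser.parse_known_args(list(argv))
    if args.local:
        return "local"
    # ANNOTATE_BACKEND is a hypothetical variable name used for illustration.
    default = env.get("ANNOTATE_BACKEND", "api").strip().lower()
    # Assumption: unknown values fall back to API mode.
    return default if default in {"api", "local"} else "api"


print(resolve_backend(["transcribe", "--file", "webinar.mp4", "--local"], {}))
# prints: local
```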
Other useful things it does
- Supports mp4, mov, mkv, avi, webm, m4v and anything else ffmpeg can read
- Optional timestamps with --timestamps
- Multilingual: pass --language nl or let Whisper auto-detect
- Long video support: files over 25 MB are split into chunks and stitched back together in API mode
- Auto-named output files: YYYY-MM-DD-ANNOTATION-filename.md
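To make the long-video splitting concrete, here is a rough sketch of how chunk boundaries could be planned. It is not the tool's actual code: `plan_chunks` is a hypothetical helper, it assumes a roughly constant bitrate so that size scales with duration, it reads 25 MB as 25 * 1024 * 1024 bytes, and the 10% headroom is an arbitrary safety margin.

```python
import math
from typing import List, Tuple

LIMIT_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" read as binary megabytes


def plan_chunks(size_bytes: int, duration_s: float,
                limit_bytes: int = LIMIT_BYTES,
                headroom: float = 0.9) -> List[Tuple[float, float]]:
    """Return (start, end) ranges in seconds, one per chunk to transcribe."""
    if size_bytes <= limit_bytes:
        return [(0.0, duration_s)]  # small enough to send in one piece
    # Aim below the limit, since real bitrates vary over the file.
    count = math.ceil(size_bytes / (limit_bytes * headroom))
    step = duration_s / count
    return [(i * step, min((i + 1) * step, duration_s)) for i in range(count)]


# A 100 MB, one-hour recording is planned as five 12-minute chunks.
print(plan_chunks(100 * 1024 * 1024, 3600.0))
# prints: [(0.0, 720.0), (720.0, 1440.0), (1440.0, 2160.0), (2160.0, 2880.0), (2880.0, 3600.0)]
```

Each range could then be cut with ffmpeg and transcribed separately, with the resulting texts joined in order.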
Why Markdown?
Because that's what I actually need. Not a raw transcript, not a subtitle file, but something I can paste into a document, summarise, or feed to another tool. Markdown is the right format for that.
Requirements
macOS on Apple Silicon (M1/M2/M3/M4), Homebrew, Python 3.9+. The install script handles ffmpeg, yt-dlp and whisperkit-cli automatically.
Get it
github.com/woutersf/annotate-video-cli-local
Pull requests welcome.