---
name: whisper-srt
description: Generate .srt / .vtt / .json subtitles from any audio or video using OpenAI Whisper (local, no API). Auto-detects language, supports translation to English, outputs word-level timestamps when needed. Optionally burns the subtitles directly into the video as a styled hardcoded track. Distinct from lyric-engine (music videos) — whisper-srt targets spoken content: interviews, podcasts, vlogs, lectures, voice memos. Use when the user asks for subtitles, captions, transcribe to SRT, "add captions", "burn subs", or wants a transcript file.
---


# whisper-srt

Spoken audio/video → subtitle files. Local Whisper, no API key.

## Basic .srt

```bash
python3 ~/.claude/skills/whisper-srt/scripts/srt.py --input interview.mp4
# → interview.srt (sidecar)
```
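
For reference, an `.srt` file is a list of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range followed by the cue text. The sketch below shows one way Whisper-style segments (dicts with `start`, `end`, `text`) could be rendered into that layout. The function names are illustrative and are not taken from `srt.py`.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = max(0, round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments (start, end, text) as SRT cues numbered from 1."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        # Whisper segment text usually starts with a space, so strip it.
        cues.append(f"{index}\n{time_range}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [  # made-up segments, for illustration only
        {"start": 0.0, "end": 2.5, "text": " Hello there."},
        {"start": 2.5, "end": 5.0, "text": " Welcome to the show."},
    ]
    print(segments_to_srt(demo))
```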

## All formats at once

```bash
python3 ~/.claude/skills/whisper-srt/scripts/srt.py --input clip.mp3 --formats srt,vtt,json,txt
# → clip.srt, clip.vtt, clip.json, clip.txt
```
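
The sidecar naming shown above (same stem, one file per requested format) can be sketched as follows. This is an assumption about how `--formats` is interpreted, not the script's actual code; `SUPPORTED` and `sidecar_paths` are hypothetical names.

```python
from pathlib import Path

# Formats listed in this document; the script may accept others.
SUPPORTED = ("srt", "vtt", "json", "txt")


def sidecar_paths(media: str, formats: str) -> list[Path]:
    """Map a comma-separated format list to sidecar paths next to the input."""
    requested = [f.strip().lower() for f in formats.split(",") if f.strip()]
    unknown = [f for f in requested if f not in SUPPORTED]
    if unknown:
        raise ValueError(f"unsupported format(s): {', '.join(unknown)}")
    source = Path(media)
    # Replace only the final extension: clip.mp3 -> clip.srt, clip.vtt, ...
    return [source.with_suffix(f".{fmt}") for fmt in requested]


if __name__ == "__main__":
    for path in sidecar_paths("clip.mp3", "srt,vtt,json,txt"):
        print(path)
```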

## Translate to English

```bash
python3 ~/.claude/skills/whisper-srt/scripts/srt.py --input spanish.mp4 --task translate
# always outputs English regardless of source language
```

## Burn captions into the video

```bash
python3 ~/.claude/skills/whisper-srt/scripts/srt.py --input vlog.mp4 --burn
# → vlog_subbed.mp4 (subtitles rendered into the video frames: white text on a black box)
```
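
Burning captions typically means re-encoding the video with ffmpeg's `subtitles` filter, which needs an ffmpeg build with libass. The sketch below builds such a command, assuming that is roughly what `--burn` does. The `_subbed` suffix comes from the example above; `burn_command` is a hypothetical helper.

```python
from pathlib import Path


def burn_command(video: str, srt: str) -> list[str]:
    """Build an ffmpeg argv that renders `srt` onto `video`."""
    source = Path(video)
    output = source.with_name(f"{source.stem}_subbed{source.suffix}")
    return [
        "ffmpeg",
        "-i", str(source),
        # Paths containing ':' or quotes need extra escaping for the filter
        # parser; this sketch assumes a plain filename.
        "-vf", f"subtitles={srt}",
        "-c:a", "copy",  # audio is untouched, only video is re-encoded
        str(output),
    ]


if __name__ == "__main__":
    print(" ".join(burn_command("vlog.mp4", "vlog.srt")))
```

Passing the list to `subprocess.run(..., check=True)` would execute it.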

## Word-level timestamps

```bash
python3 ~/.claude/skills/whisper-srt/scripts/srt.py --input clip.mp4 --word-timestamps --formats json
# JSON output includes per-word start/end times
```
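
With word timestamps enabled, `openai-whisper` attaches a `words` list to each segment, where every entry carries `word`, `start`, and `end`. Assuming the JSON sidecar mirrors that structure (not confirmed here), the words could be flattened like this; `iter_words` is an illustrative name.

```python
import json
from typing import Iterator


def iter_words(result: dict) -> Iterator[tuple[str, float, float]]:
    """Yield (word, start, end) for every word in a Whisper-style result."""
    for segment in result.get("segments", []):
        # Segments have no "words" key when word timestamps were not requested.
        for word in segment.get("words", []):
            yield word["word"].strip(), word["start"], word["end"]


if __name__ == "__main__":
    # Made-up sample in the assumed layout, for illustration only.
    sample = json.loads(
        '{"segments": [{"start": 0.0, "end": 0.9, "text": " Hello world",'
        ' "words": [{"word": " Hello", "start": 0.0, "end": 0.4},'
        ' {"word": " world", "start": 0.4, "end": 0.9}]}]}'
    )
    for text, start, end in iter_words(sample):
        print(f"{start:6.2f} {end:6.2f} {text}")
```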

## Pick model size

```bash
python3 ~/.claude/skills/whisper-srt/scripts/srt.py --input clip.mp4 --model small
# tiny | base | small (default) | medium | large-v3 | turbo
```

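`turbo` is the speed/quality sweet spot for most languages when transcribing. It is not trained for translation, so pick another multilingual model (for example `small` or `medium`) when using `--task translate`.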

## Force language

```bash
python3 ~/.claude/skills/whisper-srt/scripts/srt.py --input clip.mp4 --language es
# skip auto-detect, faster
```
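
The `--task`, `--word-timestamps`, `--model`, and `--language` flags plausibly map onto the `openai-whisper` Python API, where `task`, `language`, and `word_timestamps` are keyword arguments of `model.transcribe`. A minimal sketch under that assumption follows. The helper names are hypothetical, and `whisper` is imported lazily so the option builder works without it installed.

```python
from typing import Optional


def transcribe_options(task: str = "transcribe",
                       language: Optional[str] = None,
                       word_timestamps: bool = False) -> dict:
    """Build keyword arguments for whisper's model.transcribe()."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task!r}")
    options = {"task": task, "word_timestamps": word_timestamps}
    if language:
        # Supplying a language skips Whisper's auto-detection pass.
        options["language"] = language
    return options


def transcribe(path: str, model_name: str = "small", **flags) -> dict:
    """Run Whisper on `path`; returns a dict with text, segments, language."""
    import whisper  # openai-whisper, imported here so it stays optional

    model = whisper.load_model(model_name)
    return model.transcribe(path, **transcribe_options(**flags))


if __name__ == "__main__":
    print(transcribe_options(task="translate"))
    print(transcribe_options(language="es", word_timestamps=True))
```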

## Folder mode

```bash
python3 ~/.claude/skills/whisper-srt/scripts/srt.py --input ~/podcasts --recursive --formats srt
```
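
Folder mode amounts to collecting media files under a directory and processing each one. The sketch below shows one way to do the discovery step. The extension list is an assumption (the script's actual list is not documented here), and `find_media` is a hypothetical name.

```python
from pathlib import Path

# Assumed set of media extensions; adjust to whatever the script accepts.
MEDIA_SUFFIXES = {".mp3", ".mp4", ".m4a", ".wav", ".flac", ".mkv", ".mov", ".webm"}


def find_media(root: str, recursive: bool = False) -> list[Path]:
    """Return media files under `root`, descending only if `recursive`."""
    base = Path(root).expanduser()
    if base.is_file():
        return [base]
    pattern = "**/*" if recursive else "*"
    return sorted(
        p for p in base.glob(pattern)
        if p.is_file() and p.suffix.lower() in MEDIA_SUFFIXES
    )


if __name__ == "__main__":
    for media in find_media("~/podcasts", recursive=True):
        print(media)
```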

## Burn-in style

```bash
python3 ~/.claude/skills/whisper-srt/scripts/srt.py --input vlog.mp4 --burn \
  --font-size 32 --font-color "FFFFFF" --bg-color "000000" --bg-alpha 0.7
```
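
If the burn step uses ffmpeg's `subtitles` filter, these flags would have to be translated into a libass `force_style` string. ASS colours are written `&HAABBGGRR`: the channels are in blue-green-red order, and alpha `00` means fully opaque. The sketch below assumes `--bg-alpha` is an opacity between 0 and 1; both helper names are hypothetical.

```python
def ass_colour(hex_rgb: str, opacity: float = 1.0) -> str:
    """Convert RRGGBB plus opacity (0..1) to an ASS &HAABBGGRR colour."""
    value = hex_rgb.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected RRGGBB, got {hex_rgb!r}")
    int(value, 16)  # raises ValueError on non-hex input
    rr, gg, bb = value[0:2], value[2:4], value[4:6]
    clamped = min(max(opacity, 0.0), 1.0)
    alpha = round((1.0 - clamped) * 255)  # ASS alpha is inverted: 00 = opaque
    return f"&H{alpha:02X}{bb}{gg}{rr}".upper()


def force_style(font_size: int, font_color: str,
                bg_color: str, bg_alpha: float) -> str:
    """Build a force_style value that draws text on a filled box."""
    parts = {
        "FontSize": str(font_size),
        "PrimaryColour": ass_colour(font_color),
        # Which colour fills the box depends on the border style and the
        # renderer, so this sketch sets both to the background colour.
        "OutlineColour": ass_colour(bg_color, bg_alpha),
        "BackColour": ass_colour(bg_color, bg_alpha),
        "BorderStyle": "3",  # 3 = opaque box instead of an outline
    }
    return ",".join(f"{key}={val}" for key, val in parts.items())


if __name__ == "__main__":
    style = force_style(32, "FFFFFF", "000000", 0.7)
    print(f"subtitles=vlog.srt:force_style='{style}'")
```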

## First-run install

On first run the script auto-installs `openai-whisper` and `torch`, then downloads the chosen model (`small` ≈ 466 MB, `turbo` ≈ 1.5 GB). Downloaded models are cached locally and reused on later runs.
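
An auto-install step of this kind is commonly implemented by checking whether each module is importable and calling `pip` for whatever is missing. The sketch below illustrates that pattern only; it is not taken from `srt.py`, and `REQUIRED`, `missing_packages`, and `ensure_installed` are hypothetical names.

```python
import importlib.util
import subprocess
import sys

# Import name -> pip package name (openai-whisper is imported as `whisper`).
REQUIRED = {"whisper": "openai-whisper", "torch": "torch"}


def missing_packages(required: dict = REQUIRED) -> list[str]:
    """Return pip names of the packages whose modules cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


def ensure_installed(required: dict = REQUIRED) -> None:
    """Install missing packages into the running interpreter's environment."""
    missing = missing_packages(required)
    if missing:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *missing])


if __name__ == "__main__":
    print("missing:", missing_packages())
```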

## Pairs well with

- `vid-rip` → rip a video → `whisper-srt --burn` → uploadable captioned clip
- `screenshot-text` (visual text) + `whisper-srt` (spoken text) → full transcript
- `lyric-engine` for music video lyrics; `whisper-srt` for everything else
