---
name: clip-search
description: Semantic search inside any video file. Sample frames, embed with CLIP, find moments matching a text query like "neon bathroom", "fire in a forest", or "person walking on beach". Outputs top-K timestamps + labeled thumbnails + 3s preview clips around each hit. Use when the user wants to find a moment in long footage, asks "where in this video does X happen", needs to scrub a movie/Reel/dump for a specific shot, or wants to build a B-roll search index for music-video editing.
---


# clip-search

Find moments in a video by *describing* them. CLIP under the hood.

## Basic search

```bash
python3 ~/.claude/skills/clip-search/scripts/search.py \
  --video movie.mp4 \
  --query "neon bathroom" \
  --top 5
# → movie_search/  (1_00-02-14_0.312.jpg + 1_00-02-14.mp4, 2_..., ...)
```

## Tune sampling

```bash
python3 ~/.claude/skills/clip-search/scripts/search.py \
  --video set.mp4 --query "fire" --every 0.5 --top 10
# samples a frame every 0.5s — denser, slower, more accurate
```

## Skip preview clips

```bash
python3 ~/.claude/skills/clip-search/scripts/search.py \
  --video clip.mp4 --query "skull" --no-preview
```

## Flags

- `--video` — input file
- `--query` — free-text description
- `--top` — number of matches (default 5)
- `--every` — sample interval in seconds (default 1.0)
- `--output` — output folder (default `<video>_search`)
- `--no-preview` — skip the 3s preview MP4s, only thumbnails
- `--model` — CLIP variant (default `ViT-B-32`)

## How it works

1. ffmpeg samples a frame every `--every` seconds
2. open_clip embeds each frame + the query text
3. cosine similarity ranks frames
4. Top-K → thumbnails + 3s preview clips centered on each timestamp

## First-run install

Install CLIP deps once:

```bash
pip install --break-system-packages open-clip-torch torch torchvision pillow
```

First search downloads the CLIP model weights (~340MB for `ViT-B-32`). Cached forever after that.

## Pairs well with

- `clip-hunter` — search the freshly-downloaded clip pack for the exact shot
- `music-video` — find every "rain" moment across hours of footage in seconds
- `gif-board` — board the hits to pick the best one
