---
name: ref-rip
description: Multi-source reference image scraper. Pulls 100+ images in parallel from Internet Archive, DuckDuckGo (Google/Bing proxy), Bing Images, Pinterest, Film-Grab, FrameSet, Tumblr, Flickr, eBay, and tshirtslayer.com for any text query — e.g. "mugshot pictures of dogs", "Breaking Bad still", "vintage band tee". Each scrape uses a fresh seed for new options every run. Auto-scores by resolution + size + source bias and keeps the top N. Optional contact-sheet PNG preview.
---


# ref-rip

Multi-source image reference scraper. Hand it a text query and it returns 100+ download options pulled in parallel from ten sources, deduped, quality-scored, and saved locally with a manifest.

## Source pools (all hit in parallel)

| Source | Best for | Method |
|---|---|---|
| **Internet Archive** | Public-domain archival photos, historical scans | Official advancedsearch API |
| **DuckDuckGo Images** | Broad coverage (proxies Google + Bing) | i.js endpoint, vqd token |
| **Bing Images** | Results DDG misses | Direct async-search HTML scrape |
| **Pinterest** | Moodboard / aesthetic refs | gallery-dl primary, DDG fallback |
| **Film-Grab** | Curated cinematography stills from feature films | film-grab.com search → gallery scrape |
| **FrameSet** | AI-curated film frames (frameset.app) | DDG site-restricted search |
| **Tumblr** | Niche subculture, fashion, internet aesthetics | gallery-dl primary, DDG fallback |
| **Flickr** | Photographer portfolios, travel, nature | gallery-dl primary, DDG fallback |
| **eBay** | Product listing photos (high-res `s-l1600` upgrade) | Search HTML scrape, listing imgs only |
| **tshirtslayer.com** | Band tees, music merch, rare shirt photos | Search HTML scrape, gallery imgs |

Every run picks a fresh time-based seed unless `--seed` is given, so re-running the same query gives new options. Hits are scored on `pixels + filesize + source-bias` and the top N are kept. Sources without dimension metadata get a neutral default so they aren't unfairly outranked.
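
The scoring and dedup step could look roughly like this. It is a minimal sketch, not the script's actual code: the weights, the `SOURCE_BIAS` values, the one-megapixel neutral default, and the names `score_hit` and `top_n` are all assumptions made for illustration.

```python
import math

# Hypothetical per-source bias; the real values in ref_rip.py may differ.
SOURCE_BIAS = {"filmgrab": 1.5, "ia": 1.2, "ebay": 1.0, "ddg": 0.8}
NEUTRAL_PIXELS = 1_000_000  # assumed stand-in when width/height are unknown


def score_hit(hit):
    """Combine pixel count, file size, and source bias into one number."""
    width, height = hit.get("width"), hit.get("height")
    pixels = width * height if width and height else NEUTRAL_PIXELS
    size = hit.get("bytes") or 0
    # Log scaling keeps one very large file from dominating the ranking.
    bias = SOURCE_BIAS.get(hit["source"], 1.0)
    return math.log10(pixels) + math.log10(size + 1) / 2 + bias


def top_n(hits, n):
    """Drop duplicate URLs, then keep the n highest-scoring hits."""
    unique = {h["url"]: h for h in hits}.values()
    return sorted(unique, key=score_hit, reverse=True)[:n]
```

A hit with no `width`/`height` is scored as if it were one megapixel, so it lands mid-pack instead of at the bottom.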

## Quick use

```bash
python3 ~/.claude/skills/ref-rip/scripts/ref_rip.py "Breaking Bad still"
```

Drops 100 images into `./refs/Breaking_Bad_still_<seed>/` plus `manifest.json`. All ten sources are queried by default.
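
For illustration, the output directory name and a seeded ordering could be derived as below. Only the `refs/<query>_<seed>/` layout and the time-based seed come from this document; the `slugify` and `run_dir` helpers and the exact slug rules are assumptions.

```python
import random
import re
import time
from pathlib import Path
from typing import Optional, Tuple


def slugify(query: str) -> str:
    """Replace runs of non-alphanumeric characters with underscores."""
    return re.sub(r"[^A-Za-z0-9]+", "_", query).strip("_")


def run_dir(query: str, seed: Optional[int] = None, root: str = "refs") -> Tuple[Path, int]:
    """Return (output directory, seed); a missing seed falls back to the clock."""
    if seed is None:
        seed = int(time.time())
    return Path(root) / f"{slugify(query)}_{seed}", seed


out, seed = run_dir("Breaking Bad still", seed=42)
print(out.as_posix())  # refs/Breaking_Bad_still_42

# The same seed always produces the same ordering of candidates.
candidates = list(range(10))
random.Random(seed).shuffle(candidates)
```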

## Common flags

```bash
# 200 results + contact-sheet preview
ref_rip.py "vintage gas station" -n 200 --contact-sheet

# Only Pinterest + Film-Grab + Tumblr (moodboard mode)
ref_rip.py "monochrome brutalist architecture" --sources pinterest,filmgrab,tumblr

# Only eBay + tshirtslayer (merch / product mode)
ref_rip.py "metallica vintage tour shirt" --sources ebay,tshirtslayer

# Reproducible run
ref_rip.py "found polaroids" --seed 42

# Manifest only (no downloads)
ref_rip.py "rusted machinery" --no-download --json

# Custom output dir
ref_rip.py "cathedral interior" -o ~/Refs/cathedral
```

## Output

```
refs/<query>_<seed>/
├── 000_filmgrab_There_Will_Be_Blood_a1b2c3d4.jpg
├── 001_ebay_metallica_kill_em_all_e5f6g7h8.jpg
├── 002_tshirtslayer_iron_maiden_77_9c10857f.jpg
├── 003_ia_kennel_records_1923_...
├── ...
├── manifest.json         # full metadata for every hit
└── contact_sheet.png     # if --contact-sheet
```

`manifest.json` schema per hit: `url, source, title, width, height, bytes, page_url, license, score, saved_path`.
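
A downstream script might filter the manifest like this. The sketch assumes `manifest.json` is a top-level JSON list of hit objects, which this document does not confirm; the sample entries and the `filter_hits` helper are made up for illustration.

```python
import json

# Two made-up entries using the fields listed above; values are illustrative.
SAMPLE = """[
  {"url": "https://example.com/a.jpg", "source": "ia", "title": "A",
   "width": 2000, "height": 1500, "bytes": 812345,
   "page_url": "https://example.com/a", "license": "public domain",
   "score": 9.1, "saved_path": "refs/demo_42/000_ia_A.jpg"},
  {"url": "https://example.com/b.jpg", "source": "ebay", "title": "B",
   "width": null, "height": null, "bytes": 40210,
   "page_url": "https://example.com/b", "license": null,
   "score": 6.4, "saved_path": "refs/demo_42/001_ebay_B.jpg"}
]"""


def filter_hits(hits, source=None, licensed_only=False):
    """Return hits matching the optional filters, best score first."""
    kept = [
        h for h in hits
        if (source is None or h["source"] == source)
        and (not licensed_only or h.get("license"))
    ]
    return sorted(kept, key=lambda h: h["score"], reverse=True)


hits = json.loads(SAMPLE)  # in practice: json.load(open("manifest.json"))
for hit in filter_hits(hits, licensed_only=True):
    print(hit["saved_path"], hit["license"])
```

Filtering on `license` is a quick way to separate hits that carry license info from those that need a manual check.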

## Source-specific notes

**Pinterest, Tumblr, Flickr** — install once for best results:
```bash
pip3 install --user gallery-dl
```
Without it, these three fall back to DDG `site:DOMAIN` queries — still works, just less direct.
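
The fallback decision can be sketched as follows. The `plan_fetch` helper and its return shape are hypothetical; the document only states that a missing gallery-dl leads to DDG `site:DOMAIN` queries.

```python
import shutil

DOMAINS = {"pinterest": "pinterest.com", "tumblr": "tumblr.com", "flickr": "flickr.com"}


def plan_fetch(source, query, have_gallery_dl=None):
    """Pick gallery-dl when it is on PATH, else a DDG site-restricted query."""
    if have_gallery_dl is None:
        have_gallery_dl = shutil.which("gallery-dl") is not None
    if have_gallery_dl:
        return ("gallery-dl", query)
    return ("ddg", f"site:{DOMAINS[source]} {query}")


print(plan_fetch("tumblr", "y2k fashion", have_gallery_dl=False))
# ('ddg', 'site:tumblr.com y2k fashion')
```

Note that `pip3 install --user` places the executable in a per-user bin directory, so a PATH-based check like this one only succeeds if that directory is on `PATH`.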

**Unblock Pinterest pinimg.com / login walls — `--cookies BROWSER`:**
```bash
ref_rip.py "y2k fashion" --cookies chrome
```
Pulls cookies from your logged-in browser (chrome, safari, firefox, edge, brave, chromium) and passes them to gallery-dl. This is the only way to bypass Pinterest's CDN block + Tumblr/Flickr login walls. **Quit the browser first** — Chrome/Safari lock the cookie DB while running.
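
How `ref_rip.py` invokes gallery-dl is not shown here, so the following is only a guess at the shape of the call. It builds an argument list using gallery-dl's `--cookies-from-browser` and `--get-urls` options; the `gallery_dl_argv` helper and the example URL are hypothetical.

```python
SUPPORTED_BROWSERS = ("chrome", "safari", "firefox", "edge", "brave", "chromium")


def gallery_dl_argv(url, browser=None):
    """Build a gallery-dl command that prints image URLs instead of downloading."""
    argv = ["gallery-dl", "--get-urls"]
    if browser is not None:
        if browser not in SUPPORTED_BROWSERS:
            raise ValueError(f"unsupported browser: {browser}")
        # Ask gallery-dl to read cookies from the named browser's profile.
        argv += ["--cookies-from-browser", browser]
    return argv + [url]


# Placeholder URL; pass the result to subprocess.run(...) to execute it.
print(gallery_dl_argv("https://example.com/gallery", browser="chrome"))
```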

**Film-Grab** — uses film-grab.com directly. Best query style is movie-aware: `"Drive 2011 Refn"`, `"Blade Runner 2049 cinematography"`. Generic queries find fewer matching post pages.

**FrameSet** — uses DDG site-restricted search since frameset.app is a JS SPA. May return fewer hits than other sources.

**eBay** — only listing photos (`i.ebayimg.com`). Auto-upgrades thumbnail size suffix from `s-l500` to `s-l1600` for highest non-watermarked res. Great for product/merch refs.
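
The suffix upgrade amounts to a small URL rewrite. A minimal sketch, assuming the size token is the last path segment before the file extension; `upgrade_ebay_url` is a hypothetical name and the sample URL is made up:

```python
import re

_SIZE_SUFFIX = re.compile(r"/s-l\d+(\.\w+)$")


def upgrade_ebay_url(url: str) -> str:
    """Swap any s-l<N> size suffix for s-l1600; leave other URLs untouched."""
    if "i.ebayimg.com" not in url:
        return url
    return _SIZE_SUFFIX.sub(r"/s-l1600\1", url)


print(upgrade_ebay_url("https://i.ebayimg.com/images/g/abc/s-l500.jpg"))
# https://i.ebayimg.com/images/g/abc/s-l1600.jpg
```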

**tshirtslayer.com** — search HTML → user-submitted shirt gallery photos. Strips `/styles/THUMB/public/` from URLs to get originals where possible.
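
Stripping the style segment can be sketched with one regex. The sample path below is invented, `original_url` is a hypothetical name, and the rewritten URL is not guaranteed to exist on the server, which is why the original is only recovered "where possible".

```python
import re

_STYLE_SEGMENT = re.compile(r"/styles/[^/]+/public/")


def original_url(url: str) -> str:
    """Remove the first /styles/<name>/public/ segment, if any."""
    return _STYLE_SEGMENT.sub("/", url, count=1)


print(original_url("https://tshirtslayer.com/files/styles/thumb/public/shirt.jpg"))
# https://tshirtslayer.com/files/shirt.jpg
```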

## Tuning

- `-n / --count` — total kept (default 100)
- `--per-source` — how many each source fetches before global scoring/dedup (default `count // 2`)
- `--workers` — parallel download threads (default 8)
- `--sources` — comma list, subset of: `ia, ddg, bing, pinterest, filmgrab, frameset, tumblr, flickr, ebay, tshirtslayer`
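
The flags and defaults above map naturally onto `argparse`. This is a sketch of how they could be wired, not the script's real parser; in particular, resolving `--per-source` after parsing and rejecting unknown source names are assumptions.

```python
import argparse

ALL_SOURCES = ["ia", "ddg", "bing", "pinterest", "filmgrab",
               "frameset", "tumblr", "flickr", "ebay", "tshirtslayer"]


def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="ref_rip.py")
    parser.add_argument("query")
    parser.add_argument("-n", "--count", type=int, default=100)
    parser.add_argument("--per-source", type=int, default=None)
    parser.add_argument("--workers", type=int, default=8)
    parser.add_argument("--sources", default=",".join(ALL_SOURCES))
    args = parser.parse_args(argv)

    # The per-source default depends on --count, so fill it in after parsing.
    if args.per_source is None:
        args.per_source = args.count // 2

    args.sources = [s.strip() for s in args.sources.split(",") if s.strip()]
    unknown = sorted(set(args.sources) - set(ALL_SOURCES))
    if unknown:
        parser.error("unknown sources: " + ", ".join(unknown))
    return args


args = parse_args(["vintage gas station", "-n", "200"])
print(args.per_source, args.workers, len(args.sources))  # 100 8 10
```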

## When to use

- Building moodboards / reference packs for an edit, photo shoot, design comp, or AI image prompt
- Hunting public-domain assets for `black-mirror`, `feverdream`, `sedition-haze`, or any video edit skill
- Source material for `mood-board`, `palette-rip`, `sticker-rip` downstream
- Hunting band-tee designs, vintage merch, product photos for design/branding

## Notes

- Internet Archive results carry license info in the manifest; other sources don't (check before redistributing)
- HTML-scrape sources (Bing, Film-Grab, eBay, tshirtslayer) can break when those sites change their markup; the other six sources tend to be more stable
- Pure stdlib + optional Pillow (for contact sheet) + optional gallery-dl (for Pinterest/Tumblr/Flickr)
