---
name: zine-rip
description: Rip any Internet Archive zine/magazine/book collection into searchable per-page JPEGs + auto-cropped photo regions + an OCR keyword index, optionally imported into Photos.app. Built originally for the Unsound industrial-zine archive (1983-1986, 10 issues, 579 pages, 576 cropped photos) but works on any IA item with PDF + _text.pdf companions. Use when the user sends an archive.org/details/... URL or identifier and wants the source material as image refs, not as PDFs.
---


# zine-rip

Turns any Internet Archive item containing PDFs into a clean, searchable, per-page image library — and pulls just the photographs out of each page if you want the visuals separated from text.

## What it does

Given an IA identifier like `unsound-zine` or a full URL like `https://archive.org/details/some-magazine`, the skill:

1. **Downloads** every original `*.pdf` in the item in parallel, skipping IA's derived `_text.pdf` (fetched separately in step 3) and `_djvu.txt` companions
2. **Converts** each PDF to per-page JPEGs at 200 DPI, JPEG quality 88, via `pdftoppm`
3. **Builds an OCR text index** by also pulling each `_text.pdf` (which IA generates automatically) and running `pdftotext -layout` on it
4. **Crops photo regions** out of each page using a PIL+numpy connected-components heuristic — dilates dark pixels into blobs, filters by area + aspect ratio, emits each photo block as its own JPEG. No OpenCV needed.
5. **Builds contact sheets** — one for the covers (page 01 of every issue) and one sampling N crops per issue
6. **Optionally imports to Photos.app** into a named album (regular album — Apple removed AppleScript write access to Shared Albums in 2018; user can promote manually if needed)
7. **Optional keyword search** across the OCR text index: maps a query → (issue, page #, JPEG path) and can copy the matching pages to a destination folder
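Steps 1 to 3 can be sketched as follows. The sketch assumes IA's public metadata endpoint (`https://archive.org/metadata/<identifier>`, whose JSON carries a `files` list with a `name` per file) and the `https://archive.org/download/<identifier>/<file>` download path. The helper names are hypothetical and not taken from `zine_rip.py`. The sketch only selects files and builds the `pdftoppm`/`pdftotext` command lines; the parallel download and the `subprocess` calls are left to the caller.

```python
import json
import urllib.parse
import urllib.request

IA = "https://archive.org"

def list_item_files(identifier: str) -> list[str]:
    """Fetch every file name in an IA item via the public metadata API (network call)."""
    url = f"{IA}/metadata/{urllib.parse.quote(identifier)}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        meta = json.load(resp)
    return [f["name"] for f in meta.get("files", [])]

def split_pdfs(names: list[str]) -> tuple[list[str], list[str]]:
    """Separate original PDFs from IA's derived *_text.pdf OCR companions."""
    texts = sorted(n for n in names if n.lower().endswith("_text.pdf"))
    originals = sorted(
        n for n in names
        if n.lower().endswith(".pdf") and not n.lower().endswith("_text.pdf")
    )
    return originals, texts

def download_url(identifier: str, name: str) -> str:
    """Direct download URL for one file inside an item (spaces etc. percent-encoded)."""
    return f"{IA}/download/{urllib.parse.quote(identifier)}/{urllib.parse.quote(name)}"

def page_jpeg_cmd(pdf: str, out_prefix: str) -> list[str]:
    """pdftoppm argv: JPEG output at 200 DPI, quality 88, one file per page."""
    return ["pdftoppm", "-jpeg", "-jpegopt", "quality=88", "-r", "200", pdf, out_prefix]

def ocr_text_cmd(text_pdf: str, out_txt: str) -> list[str]:
    """pdftotext argv that keeps the physical column layout of the OCR layer."""
    return ["pdftotext", "-layout", text_pdf, out_txt]
```

For example, `subprocess.run(page_jpeg_cmd("_pdfs/issue1.pdf", "issue1/page"), check=True)` writes `issue1/page-NN.jpg` files, since `pdftoppm` appends the page number to the output prefix.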

## Quick use

```bash
# Full pipeline — download, convert, crop, build sheets
python3 ~/.claude/skills/zine-rip/scripts/zine_rip.py rip unsound-zine

# Just download + convert (no cropping)
python3 ~/.claude/skills/zine-rip/scripts/zine_rip.py rip my-magazine --no-crop

# Search the text index
python3 ~/.claude/skills/zine-rip/scripts/zine_rip.py find unsound-zine "marclay"
python3 ~/.claude/skills/zine-rip/scripts/zine_rip.py find unsound-zine "borbetomagus" --copy ~/Desktop/borb_pages

# Import all crops into Photos.app under album name
python3 ~/.claude/skills/zine-rip/scripts/zine_rip.py to-photos unsound-zine --album rips

# Re-crop with looser thresholds (more crops, includes smaller blocks)
python3 ~/.claude/skills/zine-rip/scripts/zine_rip.py crop unsound-zine --min-area 0.025
```
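One way a `find`-style search could work is sketched below, under stated assumptions. It assumes each `_text/<pdf-basename>.txt` was written by `pdftotext`, which separates pages with a form-feed character (`\f`), so the Nth chunk maps to `<pdf-basename>/page-NN.jpg`. The function name, the two-digit page padding, and the case-insensitive substring match are illustrative choices, not necessarily what `zine_rip.py` does.

```python
import shutil
from pathlib import Path
from typing import List, Optional, Tuple

def find_pages(root: Path, query: str,
               copy_to: Optional[Path] = None) -> List[Tuple[str, int, Path]]:
    """Return (issue, page number, page JPEG path) for pages whose OCR text contains query."""
    needle = query.strip().lower()
    if not needle:
        return []
    hits = []
    for txt in sorted((root / "_text").glob("*.txt")):
        issue = txt.stem  # assumed to match the <pdf-basename>/ page directory
        # pdftotext separates pages with a form feed
        pages = txt.read_text(encoding="utf-8", errors="replace").split("\f")
        if pages and not pages[-1].strip():
            pages.pop()  # drop the empty chunk after the final form feed
        for number, page_text in enumerate(pages, start=1):
            if needle in page_text.lower():
                hits.append((issue, number, root / issue / f"page-{number:02d}.jpg"))
    if copy_to is not None:
        copy_to.mkdir(parents=True, exist_ok=True)
        for issue, _, jpg in hits:
            if jpg.exists():
                # Prefix with the issue name so same-numbered pages from different issues don't collide
                shutil.copy2(jpg, copy_to / f"{issue}_{jpg.name}")
    return hits
```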

## Output layout

Every IA item gets its own dir at `~/Desktop/<identifier>/`:

```
<identifier>/
├── _pdfs/                          ← original PDFs (each issue or chapter)
├── _pdfs_text/                     ← IA's OCR _text.pdf companions
├── _text/                          ← extracted .txt per PDF
├── _photos/                        ← auto-cropped photo regions per issue
│   └── <pdf-basename>/
│       └── p##_r#.jpg
├── <pdf-basename>/                 ← per-page full JPEGs
│   └── page-##.jpg
├── _covers_contact_sheet.jpg
└── _photos_contact_sheet.jpg
```
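The covers sheet can be approximated with Pillow alone: take `page-01.jpg` from every per-issue directory, shrink each to a fixed cell, and paste the thumbnails into a grid. The cell size, the column count, and the rule that per-issue directories are those whose names do not start with `_` are assumptions made for this sketch.

```python
from pathlib import Path
from PIL import Image

def covers_contact_sheet(root: Path, cell=(300, 400), cols=5, bg="white") -> Path:
    """Grid of each issue's first page, saved as <root>/_covers_contact_sheet.jpg."""
    covers = [
        d / "page-01.jpg" for d in sorted(root.iterdir())
        if d.is_dir() and not d.name.startswith("_") and (d / "page-01.jpg").exists()
    ]
    if not covers:
        raise FileNotFoundError(f"no page-01.jpg found under {root}")
    cols = min(cols, len(covers))
    rows = -(-len(covers) // cols)  # ceiling division
    sheet = Image.new("RGB", (cols * cell[0], rows * cell[1]), bg)
    for i, path in enumerate(covers):
        with Image.open(path) as im:
            thumb = im.convert("RGB")  # independent copy, safe after the file closes
        thumb.thumbnail(cell)  # shrink in place, keeping aspect ratio
        col, row = i % cols, i // cols
        # Centre the thumbnail inside its cell
        x = col * cell[0] + (cell[0] - thumb.width) // 2
        y = row * cell[1] + (cell[1] - thumb.height) // 2
        sheet.paste(thumb, (x, y))
    out = root / "_covers_contact_sheet.jpg"
    sheet.save(out, quality=88)
    return out
```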

## When to use

- User shares an archive.org link to a zine, magazine, art book, or comic
- User wants visual reference material lifted out of scanned print
- Building a local research archive of historical print media
- Feeding source material into editorial-style edit engines (sedition-haze, shadow-cast, mood-board)

## Dependencies

- `pdftoppm`, `pdftotext` (poppler — `brew install poppler`)
- `Pillow` + `numpy` (third-party Python packages: `pip install pillow numpy`)
- macOS `osascript` for the `to-photos` subcommand (no-op on other platforms)

All standard. No ML deps, no API keys.
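A preflight check for the dependencies above might look like this sketch: `shutil.which` looks up the poppler binaries on `PATH`, `importlib.util.find_spec` checks for the two Python packages without importing them, and `osascript` is only expected on macOS (`sys.platform == "darwin"`). The function name is hypothetical.

```python
import importlib.util
import shutil
import sys

def missing_dependencies() -> list[str]:
    """Return human-readable names for anything the pipeline can't find."""
    missing = []
    for tool in ("pdftoppm", "pdftotext"):
        if shutil.which(tool) is None:
            missing.append(f"{tool} (brew install poppler)")
    for module, pip_name in (("PIL", "Pillow"), ("numpy", "numpy")):
        if importlib.util.find_spec(module) is None:
            missing.append(f"{pip_name} (pip install {pip_name})")
    # Only the to-photos subcommand needs osascript, and only on macOS
    if sys.platform == "darwin" and shutil.which("osascript") is None:
        missing.append("osascript")
    return missing

if __name__ == "__main__":
    problems = missing_dependencies()
    print("all dependencies found" if not problems else "missing: " + ", ".join(problems))
```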

## Notes on the cropping heuristic

The cropper assumes 80s/90s zine layout: dark photographs sit in dense ink-block regions while text columns dilate into thin discontinuous strips. Tunable via `--min-area` (fraction of page area) and `--min-dim` (minimum width/height as fraction of page). Defaults work well for industrial-zine layouts with photo blocks ~10-30% of page area. For magazine layouts with smaller photos, lower `--min-area` to 0.02 or below.
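The heuristic can be sketched with Pillow and numpy only. Assumptions, none confirmed by the script: the page is downscaled 8x so a pure-Python flood fill stays fast, "dark" means grayscale below 100, dilation is Pillow's `ImageFilter.MaxFilter` applied to the ink mask, and area is measured on each blob's bounding box. `min_area` and `min_dim` mirror the `--min-area`/`--min-dim` flags, but the defaults here are illustrative. This sketch rejects thin text strips with a per-axis minimum size rather than an explicit aspect-ratio test.

```python
from collections import deque

import numpy as np
from PIL import Image, ImageFilter

def find_photo_boxes(page, min_area=0.05, min_dim=0.15, dark=100, scale=8, grow=5):
    """Return (left, top, right, bottom) boxes, in page pixels, for dense dark-ink blobs."""
    gray = page.convert("L")
    w, h = gray.size
    small = gray.resize((max(1, w // scale), max(1, h // scale)), Image.BILINEAR)
    sw, sh = small.size
    # Ink mask: dark pixels become 255, paper becomes 0
    mask = small.point(lambda v: 255 if v < dark else 0)
    # Dilate the ink so halftone dots and nearby marks merge into solid blobs (grow must be odd)
    ink = np.array(mask.filter(ImageFilter.MaxFilter(grow))) > 0
    seen = np.zeros_like(ink)
    boxes = []
    for y0 in range(sh):
        for x0 in range(sw):
            if not ink[y0, x0] or seen[y0, x0]:
                continue
            # Breadth-first flood fill over 4-connected neighbours, tracking the bounding box
            seen[y0, x0] = True
            queue = deque([(y0, x0)])
            top, bottom, left, right = y0, y0, x0, x0
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < sh and 0 <= nx < sw and ink[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            bw, bh = (right - left + 1) / sw, (bottom - top + 1) / sh
            # Keep large blobs; thin text strips fail the min_dim test on one axis
            if bw * bh >= min_area and bw >= min_dim and bh >= min_dim:
                sx, sy = w / sw, h / sh
                boxes.append((int(left * sx), int(top * sy),
                              min(w, round((right + 1) * sx)), min(h, round((bottom + 1) * sy))))
    return boxes
```

Each box can then be written out with `page.crop(box).save(path)`, one JPEG per region.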

For pages where text and photo are interlocked (e.g. text wrapping around a photo), the bounding box may include some surrounding text, which is often useful for retaining context. If you want text-free photos, run a second pass through `sticker-rip` or crop manually.

## Companion skills

- `ref-rip` — for sourcing reference images directly from the live web (Pinterest, IA, etc.) when you don't have a specific item to rip
- `mood-board` — collage the cropped photos into a moodboard
- `sedition-haze` / `shadow-cast` / `feverdream` — pump the crops back into video edits as overlay textures
