zine-rip

BY @GUALO

Rip any Internet Archive zine/magazine/book collection into searchable per-page JPEGs + auto-cropped photo regions + an OCR keyword index, optionally imported into Photos.app. Built originally for the Unsound industrial-zine archive (1983-1986, 10 issues, 579 pages, 576 cropped photos) but works on any IA item with PDF + _text.pdf companions. Use when the user sends an archive.org/details/... URL or identifier and wants the source material as image refs, not as PDFs.

CLI INSTALL

curl -sS https://dem0n.vip/s/gualo/zine-rip/SKILL.md -o ~/.claude/skills/zine-rip/SKILL.md --create-dirs

zine-rip

Turns any Internet Archive item containing PDFs into a clean, searchable, per-page image library — and pulls just the photographs out of each page if you want the visuals separated from text.

What it does

Given an IA identifier like unsound-zine or a full URL like https://archive.org/details/some-magazine, the skill:

Downloads every *.pdf in the item (parallel) — skips the _text.pdf and _djvu.txt companions when downloading the originals
Converts each PDF to per-page JPEGs at 200 DPI and JPEG quality 88 via pdftoppm
Builds an OCR text index by also pulling each _text.pdf (which IA generates automatically) and running pdftotext -layout on it
Crops photo regions out of each page using a PIL+numpy connected-components heuristic — dilates dark pixels into blobs, filters by area + aspect ratio, emits each photo block as its own JPEG. No OpenCV needed.
Builds contact sheets — one for the covers (page 01 of every issue) and one sampling N crops per issue
Optionally imports to Photos.app into a named album (regular album — Apple removed AppleScript write access to Shared Albums in 2018; user can promote manually if needed)
Optionally searches the OCR text index by keyword, mapping each query to (issue, page number, JPEG path) and, if requested, copying the matching pages to a destination folder
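The first three steps above can be sketched roughly as follows. This is an illustrative outline, not the skill's actual source: the function names are hypothetical, and it assumes the item's file names have already been listed (for example from the `files` array that `https://archive.org/metadata/<identifier>` returns). The commands are only built here, not executed.

```python
import re
from urllib.parse import quote

def identifier_from(ref):
    """Accept a bare identifier or an archive.org/details/... URL."""
    match = re.search(r"archive\.org/details/([^/?#]+)", ref)
    return match.group(1) if match else ref.strip()

def split_pdfs(file_names):
    """Separate original PDFs from IA's OCR `_text.pdf` companions."""
    pdfs = [n for n in file_names if n.lower().endswith(".pdf")]
    originals = [n for n in pdfs if not n.lower().endswith("_text.pdf")]
    companions = [n for n in pdfs if n.lower().endswith("_text.pdf")]
    return originals, companions

def download_url(identifier, name):
    """Direct download URL for one file of an IA item."""
    return f"https://archive.org/download/{identifier}/{quote(name)}"

def pdftoppm_cmd(pdf_path, out_prefix, dpi=200, quality=88):
    """Argument list for rendering every page of a PDF to JPEG."""
    return ["pdftoppm", "-r", str(dpi), "-jpeg",
            "-jpegopt", f"quality={quality}", pdf_path, out_prefix]

def pdftotext_cmd(text_pdf, txt_path):
    """Argument list for layout-preserving text extraction."""
    return ["pdftotext", "-layout", text_pdf, txt_path]
```

Each argument list can be passed to `subprocess.run(cmd, check=True)`. With an output prefix ending in `page`, pdftoppm writes files named like `page-01.jpg`, zero-padded to the width of the PDF's page count.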

Quick use

# Full pipeline — download, convert, crop, build sheets
python3 ~/.claude/skills/zine-rip/scripts/zine_rip.py rip unsound-zine

# Just download + convert (no cropping)
python3 ~/.claude/skills/zine-rip/scripts/zine_rip.py rip my-magazine --no-crop

# Search the text index
python3 ~/.claude/skills/zine-rip/scripts/zine_rip.py find unsound-zine "marclay"
python3 ~/.claude/skills/zine-rip/scripts/zine_rip.py find unsound-zine "borbetomagus" --copy ~/Desktop/borb_pages

# Import all crops into Photos.app under album name
python3 ~/.claude/skills/zine-rip/scripts/zine_rip.py to-photos unsound-zine --album rips

# Re-crop with looser thresholds (more crops, includes smaller blocks)
python3 ~/.claude/skills/zine-rip/scripts/zine_rip.py crop unsound-zine --min-area 0.025

Output layout

Every IA item gets its own dir at ~/Desktop/<identifier>/:

<identifier>/
├── _pdfs/                          ← original PDFs (each issue or chapter)
├── _pdfs_text/                     ← IA's OCR _text.pdf companions
├── _text/                          ← extracted .txt per PDF
├── _photos/                        ← auto-cropped photo regions per issue
│   └── <pdf-basename>/
│       └── p##_r#.jpg
├── <pdf-basename>/                 ← per-page full JPEGs
│   └── page-##.jpg
├── _covers_contact_sheet.jpg
└── _photos_contact_sheet.jpg
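Given this layout, a keyword hit in the text index can be mapped back to a page JPEG. The sketch below is one plausible way to do it, not the skill's confirmed implementation. It relies on pdftotext ending each page with a form feed character, and it assumes that the `_text.pdf` has the same page order as the original PDF and that page files use two-digit numbering. The helper names are hypothetical.

```python
from pathlib import Path

def find_pages(text, query):
    """Return 1-based page numbers whose text contains the query (case-insensitive)."""
    pages = text.split("\f")  # pdftotext ends each page with a form feed
    needle = query.lower()
    return [i for i, page in enumerate(pages, start=1) if needle in page.lower()]

def page_jpeg(root, pdf_basename, page):
    """Path of the full-page JPEG for a given PDF and page number."""
    return Path(root) / pdf_basename / f"page-{page:02d}.jpg"
```

For example, reading `_text/<pdf-basename>.txt`, calling `find_pages` on its contents, and passing each hit to `page_jpeg` gives the files that a `--copy` step would copy.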

When to use

User shares an archive.org link to a zine, magazine, art book, or comic
User wants visual reference material lifted out of scanned print
Building a local research archive of historical print media
Feeding source material into editorial-style edit engines (sedition-haze, shadow-cast, mood-board)

Dependencies

pdftoppm, pdftotext (poppler — brew install poppler)
Pillow + numpy (third-party Python packages, not part of the stdlib: pip install pillow numpy)
macOS osascript for the to-photos subcommand (no-op on other platforms)

All standard. No ML deps, no API keys.
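A preflight check for the tools and packages listed above might look like this. It is a minimal sketch with a hypothetical function name, and whether the skill performs such a check is not stated here.

```python
import importlib.util
import shutil

def missing_dependencies(tools=("pdftoppm", "pdftotext"), modules=("PIL", "numpy")):
    """Return the names of command-line tools and Python modules that cannot be found."""
    missing = [t for t in tools if shutil.which(t) is None]
    missing += [m for m in modules if importlib.util.find_spec(m) is None]
    return missing
```

An empty list means the conversion and cropping steps have what they need. `osascript` could be added to `tools` on macOS before running the to-photos subcommand.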

Notes on the cropping heuristic

The cropper assumes 80s/90s zine layout: dark photographs sit in dense ink-block regions while text columns dilate into thin discontinuous strips. Tunable via --min-area (fraction of page area) and --min-dim (minimum width/height as fraction of page). Defaults work well for industrial-zine layouts with photo blocks ~10-30% of page area. For magazine layouts with smaller photos, lower --min-area to 0.02 or below.
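The idea can be sketched with numpy alone. This is an illustrative reconstruction of the technique described above, not the skill's actual code. The `dark`, `radius`, and `max_aspect` parameters are hypothetical, and the default values for `min_area` and `min_dim` are assumptions rather than the skill's real defaults.

```python
from collections import deque
import numpy as np

def dilate(mask, radius):
    """Grow True pixels by `radius` steps in the four cardinal directions."""
    out = mask.copy()
    for _ in range(radius):
        grown = out.copy()
        grown[1:, :] |= out[:-1, :]
        grown[:-1, :] |= out[1:, :]
        grown[:, 1:] |= out[:, :-1]
        grown[:, :-1] |= out[:, 1:]
        out = grown
    return out

def blob_boxes(mask):
    """Bounding boxes (left, top, right, bottom) of 4-connected True regions."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for y0, x0 in np.argwhere(mask):
        if seen[y0, x0]:
            continue
        seen[y0, x0] = True
        queue = deque([(int(y0), int(x0))])
        top = bottom = int(y0)
        left = right = int(x0)
        while queue:  # breadth-first flood fill over one blob
            y, x = queue.popleft()
            top, bottom = min(top, y), max(bottom, y)
            left, right = min(left, x), max(right, x)
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        boxes.append((left, top, right + 1, bottom + 1))
    return boxes

def photo_boxes(gray, dark=110, radius=2, min_area=0.05, min_dim=0.15, max_aspect=4.0):
    """Keep blobs that are large and compact enough to be photo blocks."""
    h, w = gray.shape
    mask = dilate(gray < dark, radius)  # dark pixels merged into blobs
    keep = []
    for left, top, right, bottom in blob_boxes(mask):
        bw, bh = right - left, bottom - top
        if bw * bh < min_area * w * h:
            continue  # too small relative to the page
        if bw < min_dim * w or bh < min_dim * h:
            continue  # too narrow or too short
        if max(bw, bh) / min(bw, bh) > max_aspect:
            continue  # strip-shaped, most likely a text line or rule
        keep.append((left, top, right, bottom))
    return keep
```

With Pillow, a page can be loaded as `np.asarray(Image.open(path).convert("L"))` and each returned box passed to `Image.crop`. The pure-Python flood fill is slow on a full 200 DPI page, so a sketch like this would normally run on a downscaled copy and scale the boxes back up.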

For pages where text and photo are interlocked (e.g. text wrapping around a photo), the bounding box may include some surrounding text, which is useful for retaining context. For text-stripped photos, run a second pass through sticker-rip or crop manually.

Companion skills

ref-rip — for sourcing reference images directly from the live web (Pinterest, IA, etc) when you don't have a specific item to rip
mood-board — collage the cropped photos into a moodboard
sedition-haze / shadow-cast / feverdream — pump the crops back into video edits as overlay textures

VERSIONS

1.0.0 — 8.9 KB — d7b57f31a992
