zine-rip
BY @GUALO
Rip any Internet Archive zine/magazine/book collection into searchable per-page JPEGs + auto-cropped photo regions + an OCR keyword index, optionally imported into Photos.app. Built originally for the Unsound industrial-zine archive (1983-1986, 10 issues, 579 pages, 576 cropped photos) but works on any IA item with PDF + _text.pdf companions. Use when the user sends an archive.org/details/... URL or identifier and wants the source material as image refs, not as PDFs.
CLI INSTALL
curl -sS https://dem0n.vip/s/gualo/zine-rip/SKILL.md -o ~/.claude/skills/zine-rip/SKILL.md --create-dirs
DOWNLOAD ALL gives you a single .zip containing SKILL.md + the tar.gz — drag it into Claude Code in one go.
Turns any Internet Archive item containing PDFs into a clean, searchable, per-page image library — and pulls just the photographs out of each page if you want the visuals separated from text.
What it does
Given an IA identifier like unsound-zine or a full URL like https://archive.org/details/some-magazine, the skill:
- Downloads every *.pdf in the item (in parallel), skipping the _text.pdf and _djvu.txt companions when downloading the originals
- Converts each PDF to per-page JPEGs at 200 DPI, quality 88, via pdftoppm
- Builds an OCR text index by also pulling each _text.pdf (which IA generates automatically) and running pdftotext -layout on it
- Crops photo regions out of each page using a PIL+numpy connected-components heuristic: dilates dark pixels into blobs, filters by area + aspect ratio, emits each photo block as its own JPEG. No OpenCV needed.
- Builds contact sheets — one for the covers (page 01 of every issue) and one sampling N crops per issue
- Optionally imports to Photos.app into a named album (regular album — Apple removed AppleScript write access to Shared Albums in 2018; user can promote manually if needed)
- Optionally searches the OCR text index by keyword, mapping each query to (issue, page#, jpeg path), and can copy the matching pages to a destination folder
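The first steps of the list above can be sketched in a few small functions. This is a minimal illustration, not the skill's actual source: the function names (ia_identifier, original_pdfs, pdftoppm_cmd, pdftotext_cmd) are hypothetical, and the output prefix "page" is an assumption chosen so that pdftoppm writes files named like page-01.jpg. The pdftoppm and pdftotext flags shown are standard poppler options.

```python
from urllib.parse import urlparse


def ia_identifier(ref: str) -> str:
    """Accept a bare identifier or an archive.org/details/... URL."""
    ref = ref.strip()
    if "archive.org" not in ref:
        return ref  # already a bare identifier
    if "://" not in ref:
        ref = "https://" + ref
    parts = [p for p in urlparse(ref).path.split("/") if p]
    # The path is expected to look like /details/<identifier>[/...]
    if len(parts) >= 2 and parts[0] == "details":
        return parts[1]
    raise ValueError(f"not an archive.org details URL: {ref}")


def original_pdfs(filenames):
    """Keep original PDFs and drop the *_text.pdf OCR companions."""
    return sorted(
        name for name in filenames
        if name.lower().endswith(".pdf")
        and not name.lower().endswith("_text.pdf")
    )


def pdftoppm_cmd(pdf_path, out_dir, dpi=200, quality=88):
    """Command line for per-page JPEGs (hypothetical 'page' prefix)."""
    return [
        "pdftoppm", "-r", str(dpi),
        "-jpeg", "-jpegopt", f"quality={quality}",
        str(pdf_path), f"{out_dir}/page",
    ]


def pdftotext_cmd(text_pdf, txt_path):
    """Command line for layout-preserving text extraction."""
    return ["pdftotext", "-layout", str(text_pdf), str(txt_path)]
```

Each command list can be passed to subprocess.run(cmd, check=True). Keeping the command construction separate from execution makes the conversion settings easy to inspect before anything is run.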
Quick use
# Full pipeline — download, convert, crop, build sheets
python3 ~/.claude/skills/zine-rip/scripts/zine_rip.py rip unsound-zine
# Just download + convert (no cropping)
python3 ~/.claude/skills/zine-rip/scripts/zine_rip.py rip my-magazine --no-crop
# Search the text index
python3 ~/.claude/skills/zine-rip/scripts/zine_rip.py find unsound-zine "marclay"
python3 ~/.claude/skills/zine-rip/scripts/zine_rip.py find unsound-zine "borbetomagus" --copy ~/Desktop/borb_pages
# Import all crops into Photos.app under album name
python3 ~/.claude/skills/zine-rip/scripts/zine_rip.py to-photos unsound-zine --album rips
# Re-crop with looser thresholds (more crops, includes smaller blocks)
python3 ~/.claude/skills/zine-rip/scripts/zine_rip.py crop unsound-zine --min-area 0.025
Output layout
Every IA item gets its own dir at ~/Desktop/<identifier>/:
<identifier>/
├── _pdfs/                      ← original PDFs (each issue or chapter)
├── _pdfs_text/                 ← IA's OCR _text.pdf companions
├── _text/                      ← extracted .txt per PDF
├── _photos/                    ← auto-cropped photo regions per issue
│   └── <pdf-basename>/
│       └── p##_r#.jpg
├── <pdf-basename>/             ← per-page full JPEGs
│   └── page-##.jpg
├── _covers_contact_sheet.jpg
└── _photos_contact_sheet.jpg
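Given this layout, a keyword lookup can map a hit in _text/ back to the page JPEG. The sketch below is an illustration under stated assumptions, not the skill's own find implementation: it assumes each file in _text/ is named <pdf-basename>.txt, and it relies on pdftotext separating pages with a form feed character. The function name find_pages is hypothetical.

```python
from pathlib import Path


def find_pages(root, query):
    """Return (issue, page_number, jpeg_path) for pages containing query."""
    root = Path(root)
    needle = query.lower()
    hits = []
    for txt in sorted((root / "_text").glob("*.txt")):
        issue = txt.stem  # assumed to equal <pdf-basename>
        # Zero-padded names sort in page order.
        jpegs = sorted((root / issue).glob("page-*.jpg"))
        # pdftotext ends each page with a form feed.
        pages = txt.read_text(encoding="utf-8", errors="replace").split("\f")
        for number, text in enumerate(pages, start=1):
            if needle in text.lower():
                jpeg = jpegs[number - 1] if number <= len(jpegs) else None
                hits.append((issue, number, jpeg))
    return hits
```

Copying the matches to a folder, as the --copy option does, would then be a loop over the returned paths with shutil.copy2.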
When to use
- User shares an archive.org link to a zine, magazine, art book, or comic
- User wants visual reference material lifted out of scanned print
- Building a local research archive of historical print media
- Feeding source material into editorial-style edit engines (sedition-haze, shadow-cast, mood-board)
Dependencies
- pdftoppm, pdftotext (poppler: brew install poppler)
- Pillow + numpy (third-party Python packages, not part of the stdlib: pip install pillow numpy)
- macOS osascript for the to-photos subcommand (no-op on other platforms)
All standard. No ML deps, no API keys.
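A quick preflight check for these dependencies might look like the following. It is a sketch only: the function name missing_dependencies and the wording of its messages are hypothetical, and the skill itself may check differently or not at all.

```python
import importlib.util
import shutil
import sys


def missing_dependencies(need_photos=False):
    """Return a list of human-readable names for anything not found."""
    missing = []
    # poppler command-line tools must be on PATH
    for tool in ("pdftoppm", "pdftotext"):
        if shutil.which(tool) is None:
            missing.append(f"{tool} (poppler)")
    # Python packages: import name on the left, install name on the right
    for module, package in (("PIL", "Pillow"), ("numpy", "numpy")):
        if importlib.util.find_spec(module) is None:
            missing.append(package)
    # osascript only matters for the to-photos subcommand
    if need_photos and (sys.platform != "darwin"
                        or shutil.which("osascript") is None):
        missing.append("osascript (macOS only)")
    return missing
```

Calling missing_dependencies(need_photos=True) before a to-photos run gives a single list to report, rather than failing partway through the pipeline.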
Notes on the cropping heuristic
The cropper assumes 80s/90s zine layout: dark photographs sit in dense ink-block regions while text columns dilate into thin discontinuous strips. Tunable via --min-area (fraction of page area) and --min-dim (minimum width/height as fraction of page). Defaults work well for industrial-zine layouts with photo blocks ~10-30% of page area. For magazine layouts with smaller photos, lower --min-area to 0.02 or below.
For pages where text and photo are interlocked (e.g. text wrapping around a photo), the bounding box may include some surrounding text — that's actually useful for retaining context. If you want text-stripped pure photos, do a second pass through sticker-rip or manual crop.
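The heuristic described in this section can be sketched with numpy alone. This is an illustrative reconstruction, not the skill's actual cropper: the default values, the dark threshold, the downsampling step, and the names dilate, components, and photo_boxes are all assumptions. It expects a 2-D grayscale array, which Pillow can supply via np.asarray(Image.open(path).convert("L")).

```python
from collections import deque

import numpy as np


def dilate(mask, radius):
    """Grow True regions by `radius` pixels (4-neighbour, diamond-shaped)."""
    out = mask.copy()
    for _ in range(radius):
        grown = out.copy()
        grown[1:, :] |= out[:-1, :]
        grown[:-1, :] |= out[1:, :]
        grown[:, 1:] |= out[:, :-1]
        grown[:, :-1] |= out[:, 1:]
        out = grown
    return out


def components(mask):
    """Bounding boxes (x0, y0, x1, y1) of 4-connected True regions."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for y, x in zip(*np.nonzero(mask)):
        if seen[y, x]:
            continue
        seen[y, x] = True
        queue = deque([(y, x)])
        y0 = y1 = y
        x0 = x1 = x
        while queue:  # breadth-first flood fill
            cy, cx = queue.popleft()
            y0, y1 = min(y0, cy), max(y1, cy)
            x0, x1 = min(x0, cx), max(x1, cx)
            for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        boxes.append((int(x0), int(y0), int(x1) + 1, int(y1) + 1))
    return boxes


def photo_boxes(gray, min_area=0.05, min_dim=0.15, dark=96,
                step=8, grow=1, max_aspect=4.0):
    """Return full-resolution boxes for blobs that look like photo blocks."""
    small = gray[::step, ::step]         # cheap downsample
    mask = dilate(small < dark, grow)    # dark pixels merged into blobs
    h, w = mask.shape
    kept = []
    for x0, y0, x1, y1 in components(mask):
        bw, bh = x1 - x0, y1 - y0
        if bw * bh < min_area * w * h:               # too small overall
            continue
        if bw < min_dim * w or bh < min_dim * h:     # too thin (text strips)
            continue
        if max(bw, bh) / min(bw, bh) > max_aspect:   # extreme aspect ratio
            continue
        kept.append((x0 * step, y0 * step,
                     min(x1 * step, gray.shape[1]),
                     min(y1 * step, gray.shape[0])))
    return kept
```

Each returned box can be passed to Pillow's Image.crop to write the region out as its own JPEG. Lowering min_area and min_dim admits smaller blocks, at the cost of letting more text fragments through.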
Companion skills
- ref-rip: for sourcing reference images directly from the live web (Pinterest, IA, etc.) when you don't have a specific item to rip
- mood-board: collage the cropped photos into a moodboard
- sedition-haze / shadow-cast / feverdream: pump the crops back into video edits as overlay textures
VERSIONS
- 1.0.0 — 8.9 KB — d7b57f31a992