translate-book-parallel

Translate entire books (PDF/DOCX/EPUB) into any language using Claude Code parallel subagents and a resumable, chunked pipeline.

INSTALLATION
npx skills add https://github.com/aradotso/trending-skills --skill translate-book-parallel
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Translate Book (Parallel Subagents)

Skill by ara.so — Daily 2026 Skills collection.

A Claude Code skill that translates entire books (PDF/DOCX/EPUB) into any language using parallel subagents. Each chunk gets an isolated context window — preventing truncation and context accumulation that plague single-session translation.

Pipeline Overview

Input (PDF/DOCX/EPUB)
  │
  ▼
Calibre ebook-convert → HTMLZ → HTML → Markdown
  │
  ▼
Split into chunks (~6000 chars each)
  │  manifest.json tracks SHA-256 hashes
  ▼
Parallel subagents (8 concurrent by default)
  │  each: read chunk → translate → write output_chunk*.md
  ▼
Validate (manifest hash check, 1:1 source↔output match)
  │
  ▼
Merge → Pandoc → HTML (with TOC) → Calibre → DOCX / EPUB / PDF

Prerequisites

# 1. Calibre (provides ebook-convert)
# macOS
brew install --cask calibre
# Linux
sudo apt-get install calibre
# Or download from https://calibre-ebook.com/

# 2. Pandoc
brew install pandoc         # macOS
sudo apt-get install pandoc # Linux

# 3. Python dependencies
pip install pypandoc beautifulsoup4

Verify all tools are available:

ebook-convert --version
pandoc --version
python3 -c "import pypandoc; print('pypandoc ok')"
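The same checks can be scripted. The following is a minimal sketch that relies only on the executable names listed above; `missing_tools` is a hypothetical helper for illustration and is not part of the skill. It also checks `bs4`, the import name of the beautifulsoup4 package.

```python
import importlib.util
import shutil


def missing_tools(executables=("ebook-convert", "pandoc"), modules=("pypandoc", "bs4")):
    """Return the names of required executables and Python modules that are not available."""
    # shutil.which returns None when the executable is not on PATH.
    missing = [exe for exe in executables if shutil.which(exe) is None]
    # find_spec returns None when a top-level module cannot be imported.
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


if __name__ == "__main__":
    problems = missing_tools()
    print("all prerequisites found" if not problems else f"missing: {', '.join(problems)}")
```

An empty list means every prerequisite was found; otherwise the list names what still needs installing.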

Installation

Option A: npx (recommended)

npx skills add deusyu/translate-book -a claude-code -g

Option B: ClawHub

clawhub install translate-book

Option C: Git clone

git clone https://github.com/deusyu/translate-book.git ~/.claude/skills/translate-book

Usage in Claude Code

Once the skill is installed, use natural language inside Claude Code:

translate /path/to/book.pdf to Chinese
translate ~/Downloads/mybook.epub to Japanese
/translate-book translate /path/to/book.docx to French

The skill orchestrates the full pipeline automatically.

Supported Languages

| Code | Language |
| --- | --- |
| zh | Chinese |
| en | English |
| ja | Japanese |
| ko | Korean |
| fr | French |
| de | German |
| es | Spanish |

Language codes are extensible — add new ones in the skill definition.

Running Pipeline Steps Manually

Step 1: Convert to Markdown Chunks

python3 scripts/convert.py /path/to/book.pdf --olang zh

This produces inside {book_name}_temp/:

  • chunk0001.md, chunk0002.md, ... (source chunks, ~6000 chars each)
  • manifest.json (SHA-256 hashes for validation)

# For EPUB input
python3 scripts/convert.py /path/to/book.epub --olang ja

# For DOCX input
python3 scripts/convert.py /path/to/book.docx --olang fr
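To illustrate the chunking idea, here is a minimal sketch that packs paragraphs into chunks of roughly 6000 characters. It is not the logic in convert.py, whose splitting rules are not described here; the function names and the choice to split only at blank-line paragraph boundaries are assumptions.

```python
def split_into_chunks(markdown: str, target: int = 6000) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of about `target` chars.

    A single paragraph longer than `target` becomes its own oversized chunk
    rather than being cut mid-sentence.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for para in markdown.split("\n\n"):
        added = len(para) + (2 if current else 0)  # +2 for the "\n\n" separator
        if current and size + added > target:
            # Adding this paragraph would exceed the target: close the chunk.
            chunks.append("\n\n".join(current))
            current, size = [], 0
            added = len(para)
        current.append(para)
        size += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def chunk_filename(index: int) -> str:
    """Name chunks chunk0001.md, chunk0002.md, ... (1-based, zero-padded)."""
    return f"chunk{index:04d}.md"
```

Splitting at paragraph boundaries keeps each chunk self-contained, which matters because every subagent sees only its own chunk.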

Step 2: Translate (Parallel Subagents)

The skill handles this step — it launches 8 concurrent subagents per batch, each translating one chunk independently:

# Each subagent receives exactly this task:

Read chunk0042.md → translate to target language → write output_chunk0042.md

Resumable: Already-translated chunks (valid output_chunk*.md files) are skipped on re-run.
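The skip-and-batch behaviour can be sketched in a few lines. This is an illustration, not the skill's orchestration code: `pending_chunks` and `batches` are hypothetical names, and treating any non-empty output file as "valid" is an assumption about what the skill checks.

```python
from pathlib import Path


def pending_chunks(temp_dir: str) -> list[str]:
    """List source chunks whose output_chunk*.md is missing or empty."""
    root = Path(temp_dir)
    pending = []
    # "chunk*.md" matches only names starting with "chunk", so outputs are excluded.
    for src in sorted(root.glob("chunk*.md")):
        out = root / f"output_{src.name}"
        if not out.is_file() or out.stat().st_size == 0:
            pending.append(src.name)
    return pending


def batches(items: list[str], size: int = 8) -> list[list[str]]:
    """Group items into consecutive batches of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

On a re-run, only the names returned by `pending_chunks` would need to be handed to subagents, one batch at a time.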

Step 3: Merge and Build All Formats

python3 scripts/merge_and_build.py \
  --temp-dir book_name_temp \
  --title "《Book Title in Target Language》"

Before merging, validation checks:

  • Every source chunk has a matching output file (1:1)
  • Source chunk hashes match manifest.json (no stale outputs)
  • No output files are empty

Outputs produced:

| File | Description |
| --- | --- |
| output.md | Merged translated Markdown |
| book.html | Web version with floating TOC |
| book.docx | Word document |
| book.epub | E-book format |
| book.pdf | Print-ready PDF |
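The first of these files, output.md, is the concatenation of the translated chunks in order. The sketch below shows only that merge step, under the assumption that chunks are joined with a blank line; `merge_outputs` is a hypothetical name, and the real merge_and_build.py goes on to call Pandoc and Calibre, which is not shown.

```python
from pathlib import Path


def merge_outputs(temp_dir: str, merged_name: str = "output.md") -> Path:
    """Concatenate output_chunk*.md files in chunk order into one Markdown file."""
    root = Path(temp_dir)
    # Zero-padded names (output_chunk0001.md, ...) sort lexicographically in chunk order.
    parts = sorted(root.glob("output_chunk*.md"))
    if not parts:
        raise FileNotFoundError(f"no output_chunk*.md files in {root}")
    merged = root / merged_name
    text = "\n\n".join(p.read_text(encoding="utf-8").strip() for p in parts)
    merged.write_text(text + "\n", encoding="utf-8")
    return merged
```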

Project Structure

translate-book/
├── SKILL.md                    # Claude Code skill definition (orchestrator)
├── scripts/
│   ├── convert.py              # PDF/DOCX/EPUB → Markdown chunks via Calibre HTMLZ
│   ├── manifest.py             # SHA-256 chunk tracking and merge validation
│   ├── merge_and_build.py      # Merge chunks → HTML → DOCX/EPUB/PDF
│   ├── calibre_html_publish.py # Calibre wrapper for format conversion
│   ├── template.html           # Web HTML template with floating TOC
│   └── template_ebook.html     # Ebook HTML template
└── README.md

How Manifest Validation Works

# scripts/manifest.py (conceptual usage)

# During convert.py: records source hashes
manifest = {
    "chunk0001.md": "sha256:abc123...",
    "chunk0002.md": "sha256:def456...",
    # ...
}

# During merge_and_build.py: validates before merging
# 1. Check every chunk has a corresponding output_chunk
# 2. Re-hash source chunks and compare against manifest
# 3. Reject if any hash mismatches (stale/corrupt output)
# 4. Reject if any output file is empty

If validation fails, the script auto-deletes stale output.md and re-merges from valid chunk outputs.
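The four checks can be made concrete with `hashlib`. This is a minimal sketch, not the contents of scripts/manifest.py: the flat name-to-hash layout of manifest.json is assumed from the conceptual example, and `sha256_of`, `write_manifest`, and `validate` are hypothetical names.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, prefixed the way the conceptual manifest shows."""
    return "sha256:" + hashlib.sha256(path.read_bytes()).hexdigest()


def write_manifest(temp_dir: str) -> dict[str, str]:
    """Record a hash for every source chunk in manifest.json."""
    root = Path(temp_dir)
    manifest = {p.name: sha256_of(p) for p in sorted(root.glob("chunk*.md"))}
    (root / "manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest


def validate(temp_dir: str) -> list[str]:
    """Return a list of problems; an empty list means the merge may proceed."""
    root = Path(temp_dir)
    manifest = json.loads((root / "manifest.json").read_text(encoding="utf-8"))
    problems = []
    for name, expected in manifest.items():
        src, out = root / name, root / f"output_{name}"
        if not src.is_file():
            problems.append(f"missing source chunk: {name}")
        elif sha256_of(src) != expected:
            problems.append(f"hash mismatch (stale output?): {name}")
        if not out.is_file():
            problems.append(f"missing output: {out.name}")
        elif out.stat().st_size == 0:
            problems.append(f"empty output: {out.name}")
    return problems
```

Hashing the source rather than the translation is what lets the check catch stale outputs: if a source chunk changes after it was translated, its recorded hash no longer matches.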

Real-World Example: Translate a Technical Book

# 1. Install the skill
npx skills add deusyu/translate-book -a claude-code -g

# 2. Open Claude Code in your working directory
cd ~/books

# 3. Say in Claude Code:
# "translate clean-code.pdf to Chinese"

# Claude Code will:
# - Run convert.py to split into chunks
# - Launch 8 parallel subagents per batch
# - Each subagent translates one chunk
# - Validate all outputs via manifest
# - Merge and build all formats

# 4. Outputs appear in:
ls clean-code_temp/
# chunk0001.md  chunk0002.md  ...  (source)
# output_chunk0001.md  ...         (translated)
# manifest.json
# output.md
# book.html
# book.docx
# book.epub
# book.pdf

Resuming an Interrupted Translation

# If translation is interrupted, just re-run the same command:
# "translate clean-code.pdf to Chinese"

# The skill detects existing output_chunk*.md files
# and skips already-translated chunks automatically.
# Only missing or failed chunks are retried.

Changing Output Metadata After Translation

If you need to update the title, author, template, or image assets without re-translating:

# Delete only the final artifacts (keeps translated chunks)
cd book_name_temp/
rm -f output.md book*.html book.docx book.epub book.pdf

# Re-run merge step
python3 ../scripts/merge_and_build.py \
  --temp-dir . \
  --title "《New Title》"

Do NOT delete chunk files — those are your translated content. Only delete final artifacts when changing metadata.

Troubleshooting

| Problem | Solution |
| --- | --- |
| Calibre ebook-convert not found | Install Calibre; ensure ebook-convert is in $PATH |
| Manifest validation failed | Source chunks changed; re-run convert.py |
| Missing source chunk | Source file deleted; re-run convert.py to regenerate |
| Incomplete translation | Re-run the skill; it resumes from the last valid chunk |
| Changed title/template but output unchanged | Delete output.md, book*.html, book.docx, book.epub, book.pdf, then re-run merge_and_build.py |
| output.md exists but manifest invalid | Script auto-deletes stale output and re-merges |
| PDF generation fails | Verify Calibre has PDF output support; try ebook-convert --help |
| Empty output chunks | Retry failed chunks; check API rate limits |

Diagnosing Chunk Issues

# Check which chunks are missing translation
ls book_temp/chunk*.md | wc -l          # total source chunks
ls book_temp/output_chunk*.md | wc -l   # translated chunks so far

# Find missing or empty output chunks
for f in book_temp/chunk*.md; do
  base=$(basename "$f" .md)
  out="book_temp/output_${base}.md"
  if [ ! -f "$out" ] || [ ! -s "$out" ]; then
    echo "Missing: $out"
  fi
done

# Check manifest
python3 -m json.tool book_temp/manifest.json | head -30

Configuration Tips

  • Chunk size: ~6000 chars per chunk is the default. Smaller chunks = more parallelism but more API calls.
  • Concurrency: Default is 8 parallel subagents per batch. Adjust in SKILL.md if hitting rate limits.
  • Languages: Add new language codes to the skill triggers and translation prompt in SKILL.md.
  • Templates: Customize scripts/template.html and scripts/template_ebook.html for different HTML/ebook styling.
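The trade-off between chunk size and number of calls is simple arithmetic. The sketch below works it through for a hypothetical 600,000-character book; `estimate_batches` is an illustrative helper, not part of the skill.

```python
import math


def estimate_batches(total_chars: int, chunk_chars: int = 6000, concurrency: int = 8) -> tuple[int, int]:
    """Return (chunk count, batch count) for a book of `total_chars` characters."""
    chunks = math.ceil(total_chars / chunk_chars)
    return chunks, math.ceil(chunks / concurrency)


print(estimate_batches(600_000))        # (100, 13)
print(estimate_batches(600_000, 3000))  # (200, 25)
```

Halving the chunk size doubles the number of subagent calls, so at the same concurrency it roughly doubles the number of batches.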

Key Design Principles

  • Isolated context per chunk — each subagent starts fresh, preventing context overflow on long books
  • Hash-based integrity — SHA-256 tracking catches stale or corrupt translated chunks before merging
  • Resumable at chunk granularity — never re-translate what's already done
  • Format-agnostic input — Calibre handles PDF/DOCX/EPUB normalization before the pipeline begins
  • Multiple output formats — single pipeline produces HTML, DOCX, EPUB, and PDF simultaneously