parallel-web-extract

Extract content from multiple URLs in parallel, token-efficiently.

  • Handles webpages, articles, PDFs, and JavaScript-heavy sites with a single command
  • Runs in a forked context to minimize token overhead compared to built-in WebFetch
  • Supports batch extraction of multiple URLs with optional focus objectives
  • Requires parallel-cli installation and authentication; outputs extracted content as markdown to a local file for follow-up queries

INSTALLATION
npx skills add https://github.com/parallel-web/parallel-agent-skills --skill parallel-web-extract
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Options if needed:

  • --objective "focus area" to focus extraction on a specific goal (also silences the "neither objective nor search_queries" warning that V1 emits when neither is set)
  • -q "keyword" (repeatable) to prioritize keywords in excerpts
  • --full-content to include the complete page body (for long articles, PDFs, or when excerpts may not capture what you need)
  • --full-content-max-chars N to cap full-content size per result
  • --no-excerpts to strip excerpts when you only want full content
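As a rough illustration of how the options above combine, the following sketch assembles them into an argument list. The flag names come from the list above; the function name `build_extract_options` and its parameters are hypothetical, and the base extract command is deliberately left out because it is not shown in this section.

```python
from typing import List, Optional, Sequence


def build_extract_options(
    objective: Optional[str] = None,
    keywords: Sequence[str] = (),
    full_content: bool = False,
    full_content_max_chars: Optional[int] = None,
    no_excerpts: bool = False,
) -> List[str]:
    """Return only the optional flags, as an argv fragment.

    Hypothetical helper: the caller is expected to prepend the actual
    parallel-cli command and URLs, which this sketch does not assume.
    """
    args: List[str] = []
    if objective:
        # Focuses extraction on a specific goal.
        args += ["--objective", objective]
    for keyword in keywords:
        # -q is repeatable, so emit one pair per keyword.
        args += ["-q", keyword]
    if full_content:
        args.append("--full-content")
    if full_content_max_chars is not None:
        args += ["--full-content-max-chars", str(full_content_max_chars)]
    if no_excerpts:
        args.append("--no-excerpts")
    return args
```

Passing the flags as a list rather than a single string avoids shell-quoting problems when an objective or keyword contains spaces.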

Handling failed extractions

If the response has an errors field, an empty results array, or a 404/timeout for the URL, do NOT fabricate content. Tell the user the extraction failed, surface the upstream status, and suggest:

  • Verifying the URL (the page may have moved)
  • Retrying with --full-content if excerpts came back empty but the page exists
  • Using parallel-cli search to locate the current URL if the page was renamed
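The failure checks described above might be sketched as follows. This assumes the response has already been parsed from JSON into a dictionary with top-level `errors` and `results` keys, as the text suggests; the function name `extraction_failure` and the exact shape of each entry are assumptions, not the documented schema.

```python
from typing import Any, Dict, Optional


def extraction_failure(response: Dict[str, Any]) -> Optional[str]:
    """Return a short failure reason, or None if the response looks usable.

    Assumes top-level "errors" and "results" keys; the real response
    schema may differ.
    """
    errors = response.get("errors")
    if errors:
        # Surface the upstream error details instead of inventing content.
        return f"extraction reported errors: {errors}"
    if not response.get("results"):
        # A missing or empty results array also counts as a failure.
        return "extraction returned no results"
    return None
```

When the function returns a reason, the agent would report that reason to the user and offer the suggestions listed above rather than producing any page content.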

Response format

Return content as:

Page Title

Then the extracted content verbatim, with these rules:

  • Keep content verbatim - do not paraphrase or summarize
  • Parse lists exhaustively - extract EVERY numbered/bulleted item
  • Strip only obvious noise: nav menus, footers, ads
  • Preserve all facts, names, numbers, dates, quotes

After the response, mention the output file path (/tmp/$FILENAME.json) so the user knows it's available for follow-up questions.

Setup

Requires parallel-cli (installed and authenticated). If parallel-cli --version fails, or if a later command fails with an authentication error, tell the user to see https://docs.parallel.ai/integrations/cli and stop.
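A minimal sketch of the installation check, assuming it is acceptable to shell out from Python. The command `parallel-cli --version` and the docs URL are taken from the text above; the helper name `cli_available` is hypothetical. It only covers the version check, so an authentication error from a later command would still need to be handled where that command runs.

```python
import shutil
import subprocess
from typing import Sequence

# Link given above for users who need to install or authenticate.
DOCS_URL = "https://docs.parallel.ai/integrations/cli"


def cli_available(command: Sequence[str] = ("parallel-cli", "--version")) -> bool:
    """Return True if the version command exists and exits successfully."""
    if shutil.which(command[0]) is None:
        # The executable is not on PATH at all.
        return False
    try:
        result = subprocess.run(
            list(command), capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if not cli_available():
        print(f"parallel-cli is not available. See {DOCS_URL}")
```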
