defuddle

Extract clean article content from web pages, removing ads and clutter to return readable Markdown with metadata.

  • Parses URLs or local HTML files and outputs clean Markdown with frontmatter (title, author, publication date, word count)
  • Supports JSON metadata extraction, including featured image, domain, favicon, and parse timing
  • Includes a guided workflow: extract content, preview a summary, save to a user-specified directory, and confirm the file location
  • Works best on article-style pages (blogs, news, documentation); not designed for JavaScript-heavy pages or single-page applications
  • Requires Node.js, npm, and jsdom (a peer dependency)

INSTALLATION
npx skills add https://github.com/joeseesun/defuddle-skill --skill defuddle
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md


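Step 1: Extract the content

Run the parser with Markdown (-m) and JSON (-j) output so the content and its metadata come back together:

defuddle parse "<url>" -m -j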
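In an agent that drives the CLI from Python, this step could be sketched as below. This is a minimal sketch, assuming `defuddle` is installed on the PATH and that `-j` prints one JSON object to stdout. The function name `extract_article` and its `run` parameter (a hook for swapping out `subprocess.run`, e.g. in tests) are hypothetical.

```python
import json
import subprocess
from typing import Any, Callable


def extract_article(
    url: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> dict[str, Any]:
    """Run `defuddle parse <url> -m -j` and return the decoded JSON metadata."""
    cmd = ["defuddle", "parse", url, "-m", "-j"]
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    result = run(cmd, capture_output=True, text=True, check=True)
    # Assumption: stdout holds exactly one JSON object.
    return json.loads(result.stdout)
```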

Step 2: Present a summary to the user

Show the user:

  • Title: from JSON title field
  • Author: from JSON author field
  • Source: from JSON domain field
  • Word count: from JSON wordCount field
  • A brief preview (first 2-3 sentences of the content)
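The summary can be assembled from the decoded JSON. The following is a sketch with hypothetical helper names. Its sentence splitter is a naive regex (abbreviations such as "e.g." will cut a sentence short). Falling back to "Unknown" for missing fields is a choice made here, not something the skill specifies.

```python
import re
from typing import Any


def preview(markdown: str, max_sentences: int = 3) -> str:
    """Return roughly the first few sentences of the Markdown body."""
    # Skip blank lines and Markdown headings, then join the rest into one string.
    body = " ".join(
        line.strip()
        for line in markdown.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )
    # Naive split: sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", body)
    return " ".join(sentences[:max_sentences])


def summarize(meta: dict[str, Any]) -> str:
    """Format the user-facing summary from defuddle's JSON fields."""
    lines = [
        f"Title: {meta.get('title') or 'Unknown'}",
        f"Author: {meta.get('author') or 'Unknown'}",
        f"Source: {meta.get('domain') or 'Unknown'}",
        f"Word count: {meta.get('wordCount') or 'Unknown'}",
        f"Preview: {preview(meta.get('content') or '')}",
    ]
    return "\n".join(lines)
```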

Step 3: Ask where to save

If this is the first time using defuddle in this conversation, ask the user:

"Save to which directory? (e.g. ~/Documents, ~/Desktop, or a custom path)"

Remember the user's chosen directory for subsequent uses in the same conversation.
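One way to honour "ask once, then remember" is a small per-conversation object. The class name `SaveLocation` and the `ask` callback (whatever actually prompts the user) are hypothetical. Expanding `~` with `Path.expanduser` is an added assumption so that answers like ~/Documents resolve to a real directory.

```python
from pathlib import Path
from typing import Callable, Optional


class SaveLocation:
    """Ask for the save directory once per conversation, then reuse the answer."""

    def __init__(self, ask: Callable[[str], str]) -> None:
        self._ask = ask  # hypothetical callback that prompts the user and returns their reply
        self._directory: Optional[Path] = None

    def get(self) -> Path:
        if self._directory is None:
            answer = self._ask(
                "Save to which directory? (e.g. ~/Documents, ~/Desktop, or a custom path)"
            )
            # Expand "~" so paths like ~/Documents point at the home directory.
            self._directory = Path(answer).expanduser()
        return self._directory
```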

Step 4: Save as Markdown file

Write the file with frontmatter + full content:

---
title: {title}
author: {author}
source: {url}
date: {published or "Unknown"}
clipped: {today's date YYYY-MM-DD}
wordCount: {wordCount}
---

# {title}

{markdown content}
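The template could be filled in roughly as follows. This sketch assumes `meta` is the decoded `-j` output; `build_markdown` is a hypothetical name. Unlike the template, it double-quotes string values (JSON string syntax is also valid YAML), so a title containing a colon does not break the frontmatter.

```python
import json
from datetime import date
from typing import Any, Optional


def _yaml_value(value: Any) -> str:
    """Write numbers as-is and everything else as a double-quoted string."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    return json.dumps(str(value), ensure_ascii=False)


def build_markdown(meta: dict[str, Any], url: str, today: Optional[date] = None) -> str:
    """Assemble frontmatter + heading + body in the layout shown in the template."""
    today = today or date.today()
    fields = {
        "title": meta.get("title", ""),
        "author": meta.get("author", ""),
        "source": url,
        "date": meta.get("published") or "Unknown",
        "clipped": today.isoformat(),  # YYYY-MM-DD
        "wordCount": meta.get("wordCount", ""),
    }
    lines = ["---"]
    lines += [f"{key}: {_yaml_value(value)}" for key, value in fields.items()]
    lines += ["---", "", f"# {fields['title']}", "", meta.get("content", "")]
    return "\n".join(lines) + "\n"
```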

File naming: Use the article title as the filename, sanitized for the filesystem:

  • Replace characters that are unsafe in filenames (e.g. / \ : * ? " < > |) with spaces
  • Trim leading and trailing whitespace
  • Example: The Shape of the Essay Field.md
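A minimal sketch of these naming rules. The skill says only "special characters", so the exact character set here (path separators, characters Windows forbids, and control characters) is an assumption. Collapsing repeated spaces and falling back to `Untitled` for titles that sanitize to nothing are also choices made in this sketch.

```python
import re

# Assumed "special characters": path separators, characters Windows forbids
# in filenames, and ASCII control characters.
_UNSAFE = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(title: str) -> str:
    """Turn an article title into a filesystem-safe Markdown filename."""
    name = _UNSAFE.sub(" ", title)
    # Collapse runs of whitespace left behind by replacements, then trim.
    name = re.sub(r"\s+", " ", name).strip()
    return f"{name or 'Untitled'}.md"
```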

Step 5: Confirm to user

Tell the user the file path where it was saved.
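Saving and confirming amount to writing the file and reporting its path. The sketch below assumes the Markdown text and filename are already prepared. `save_article` is a hypothetical name, and creating missing directories is an added choice. An existing file with the same name is silently overwritten.

```python
from pathlib import Path


def save_article(directory: Path, filename: str, markdown: str) -> Path:
    """Write the Markdown file and return its absolute path to show the user."""
    directory = Path(directory).expanduser()
    directory.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    path = directory / filename
    path.write_text(markdown, encoding="utf-8")  # overwrites any existing file
    return path.resolve()
```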

CLI Reference

defuddle parse <source> [options]

Arguments:

  • <source> — URL (https://...) or local HTML file path

Options:

  • -m, --markdown: Convert content to Markdown
  • -j, --json: Output as JSON with full metadata
  • -o, --output <file>: Write to a file instead of stdout
  • -p, --property <name>: Extract a single property (title, description, domain, author, published, wordCount, content)
  • --debug: Enable verbose logging
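If the CLI is driven programmatically, the documented options map onto an argument list like this. `defuddle_args` is a hypothetical helper that uses the long flag names. It does not check which option combinations the CLI actually accepts.

```python
from typing import Optional


def defuddle_args(
    source: str,
    *,
    markdown: bool = False,
    json_output: bool = False,
    output: Optional[str] = None,
    prop: Optional[str] = None,
    debug: bool = False,
) -> list[str]:
    """Build an argv list for `defuddle parse` from the documented options."""
    args = ["defuddle", "parse", source]  # source: URL or local HTML file path
    if markdown:
        args.append("--markdown")
    if json_output:
        args.append("--json")
    if output is not None:
        args += ["--output", output]
    if prop is not None:
        args += ["--property", prop]
    if debug:
        args.append("--debug")
    return args
```

For example, passing `defuddle_args(url, prop="title")` to `subprocess.run` would run the equivalent of `defuddle parse <url> -p title`.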

JSON Response Fields

When using -j, the response includes:

  • title — Article title
  • author — Author name
  • published — Publication date
  • description — Meta description
  • content — Extracted Markdown (when -m used)
  • domain — Source domain
  • favicon — Favicon URL
  • image — Featured image URL
  • site — Site name
  • wordCount — Word count
  • parseTime — Processing time in ms
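For type-checked code, the listed fields can be written down as a `TypedDict`. The value types are assumptions, since the reference only names the fields (for example `wordCount` as an integer and `parseTime` as a float in milliseconds). Dropping unlisted keys in the hypothetical `load_response` helper is a choice made in this sketch.

```python
import json
from typing import TypedDict, cast


class DefuddleResponse(TypedDict, total=False):
    """Fields listed for `defuddle parse -j`; the value types are assumed."""

    title: str
    author: str
    published: str
    description: str
    content: str  # Markdown when -m is used
    domain: str
    favicon: str
    image: str
    site: str
    wordCount: int
    parseTime: float  # milliseconds


def load_response(raw: str) -> DefuddleResponse:
    """Decode the JSON text and keep only the documented fields."""
    data = json.loads(raw)
    known = DefuddleResponse.__annotations__.keys()
    return cast(DefuddleResponse, {k: v for k, v in data.items() if k in known})
```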

Notes

  • Requires Node.js and npm
  • jsdom is required as a peer dependency
  • Works best with article-style pages (blogs, news, documentation)
  • Not designed for SPAs or JavaScript-heavy pages (e.g. WeChat articles, which need browser rendering)
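A quick preflight check for the command-line requirements might look like this sketch built on `shutil.which`. It only confirms that `node`, `npm`, and `defuddle` executables are on the PATH. It does not verify that jsdom is installed, since jsdom is an npm package rather than a command.

```python
import shutil


def missing_tools(tools: tuple[str, ...] = ("node", "npm", "defuddle")) -> list[str]:
    """Return the commands from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

A caller could stop early with something like `if missing := missing_tools(): print("Install first:", ", ".join(missing))`.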