SKILL.md
$2a
cd ~/.claude/skills/web-fetch && bun install
Default Workflow
Use this as the default flow for any URL:
URL="<url>"
CONTENT_TYPE="$(curl -sIL "$URL" | awk -F': ' 'tolower($1)=="content-type"{print tolower($2)}' | tr -d '\r' | tail -1)"
if echo "$CONTENT_TYPE" | grep -q "markdown"; then
curl -sL "$URL"
else
curl -sL "$URL" \
| html2markdown \
--include-selector "article,main,[role=main]" \
--exclude-selector "nav,header,footer,script,style"
fi
Known Site Selectors
Site
Include Selector
Exclude Selector
platform.claude.com
#content-container
-
docs.anthropic.com
#content-container
-
developer.mozilla.org
article
-
github.com (docs)
article
nav,.sidebar
Generic
article,main,[role=main]
nav,header,footer,script,style
Example:
curl -sL "<url>" \
| html2markdown \
--include-selector "#content-container" \
--exclude-selector "nav,header,footer"
Finding the Right Selector
When a site isn't in the patterns list:
# Check what content containers exist
curl -s "<url>" | grep -o '<article[^>]*>\|<main[^>]*>\|id="[^"]*content[^"]*"' | head -10
# Test a selector
curl -sL "<url>" | html2markdown --include-selector "<selector>" | head -30
# Check line count
curl -sL "<url>" | html2markdown --include-selector "<selector>" | wc -l
Universal Fallback Script
When selectors produce poor output, run the bundled parser:
bun ~/.claude/skills/web-fetch/fetch.ts "<url>"
If already in the skill directory:
bun fetch.ts "<url>"
Options Reference
--include-selector "CSS" # Keep only matching elements
--exclude-selector "CSS" # Remove matching elements
--domain "https://..." # Convert relative links to absolute
Troubleshooting
Empty output with selectors: The page might be markdown-native. Check headers first:
curl -sIL "<url>" | grep -i '^content-type:'
Wrong content selected: The site may have multiple article/main regions:
curl -s "<url>" | grep -o '<article[^>]*>'
**html2markdown not found**: Install it, then retry selector-based extraction.
**bun or script deps missing**: Run cd ~/.claude/skills/web-fetch && bun install.
Missing code blocks: Check if the site uses non-standard code formatting.
Client-rendered content: If HTML only has "Loading..." placeholders, the content is JS-rendered. Neither curl nor the Bun script can extract it; use browser-based tools.