Identify:
- Site structure: Flat vs. deep hierarchy
- Framework: Next.js, static, SPA, etc.
- Key paths: Sitemap, robots.txt, API, static assets
Best Practices
Redirect Chains & Loops
- Fix multi-hop redirects; point directly to final URL
- Loops: URLs that redirect back to themselves, directly or through other URLs; break the cycle
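Once redirects are exported from a crawl, both rules can be checked mechanically. The sketch below assumes a hypothetical `redirects` dict that maps each redirecting URL to its Location target, with URLs already normalized. It follows each redirect until it reaches a non-redirecting URL or revisits one, then reports multi-hop chains (with the final URL to point to) and cycles.

```python
from typing import Dict, List, Tuple


def resolve_redirect(start: str, redirects: Dict[str, str]) -> Tuple[List[str], bool]:
    """Follow redirects from `start`; return (URLs visited in order, is_loop)."""
    path = [start]
    seen = {start}
    current = start
    while current in redirects:
        current = redirects[current]
        path.append(current)
        if current in seen:  # revisiting a URL means the redirects form a cycle
            return path, True
        seen.add(current)
    return path, False


def audit_redirects(redirects: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify each redirecting URL (hypothetical crawl export) as part of a chain or a loop."""
    report: Dict[str, List[str]] = {"chains": [], "loops": []}
    for source in redirects:
        path, is_loop = resolve_redirect(source, redirects)
        if is_loop:
            report["loops"].append(" -> ".join(path))
        elif len(path) > 2:  # source -> intermediate -> final is more than one hop
            report["chains"].append(f"{source}: point directly to {path[-1]}")
    return report
```

Because every visited URL is tracked, the walk always terminates, even on a self-redirect such as /a to /a.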
Broken Links (4xx)
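- Fix broken internal and external links: update the link to a working URL, 301 the dead URL to its closest replacement, or remove the link
- Audit regularly; new breaks appear as pages are moved or deleted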
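A broken-link audit is most useful when it is grouped by the page that contains the link, because that is the page someone has to edit. This sketch assumes two hypothetical inputs already collected by a crawler: `links` (source page to outgoing link targets) and `status` (target URL to HTTP status code).

```python
from collections import defaultdict
from typing import Dict, List


def find_broken_links(links: Dict[str, List[str]], status: Dict[str, int]) -> Dict[str, List[str]]:
    """Return {source page: [link targets that answered 4xx]}."""
    broken: Dict[str, List[str]] = defaultdict(list)
    for source, targets in links.items():
        for target in targets:
            code = status.get(target)  # None: target not fetched yet, so skip it
            if code is not None and 400 <= code < 500:
                broken[source].append(target)
    return dict(broken)
```

For example, a page linking to /old (404) and to an external URL answering 410 would list both targets under that page.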
Site Architecture
| Principle | Guideline |
|---|---|
| Depth | Important pages within 3–4 clicks of the homepage |
| Orphan pages | Add internal links to pages with no incoming links; see internal-links for link strategy |
| Hierarchy | Logical structure; hub pages link to content |
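Click depth is the length of the shortest link path from the homepage, so a breadth-first search over the internal link graph computes it directly. The sketch assumes a hypothetical `links` dict (page to internal links found on it) and a list of all known pages, for example from the sitemap. It flags pages deeper than `max_depth`, orphans (no incoming links from other pages), and pages that cannot be reached from the homepage at all.

```python
from collections import deque
from typing import Dict, Iterable, List


def click_depths(links: Dict[str, List[str]], home: str = "/") -> Dict[str, int]:
    """Breadth-first search from the homepage; depth = fewest clicks to reach a page."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # first visit in BFS is the shortest path
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


def audit_structure(
    links: Dict[str, List[str]], all_pages: Iterable[str], home: str = "/", max_depth: int = 4
) -> Dict[str, List[str]]:
    pages = set(all_pages)
    depth = click_depths(links, home)
    # Ignore self-links: a page that only links to itself is still an orphan.
    linked_to = {t for source, targets in links.items() for t in targets if t != source}
    return {
        "too_deep": sorted(p for p, d in depth.items() if d > max_depth),
        "orphans": sorted(p for p in pages if p != home and p not in linked_to),
        "unreachable": sorted(pages - set(depth)),
    }
```

The 4-click threshold mirrors the upper end of the guideline above; adjust `max_depth` if the site's own target differs.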
Pagination vs Infinite Scroll
Problem: With infinite scroll, crawlers do not emulate user behavior (scrolling, clicking "Load more"), so content that loads only after those actions is not discoverable. The same applies to masonry + infinite scroll, lazy-loaded lists, and similar patterns.
Solution: Prefer pagination for key content. If keeping infinite scroll, make it search-friendly per Google's recommendations:
| Requirement | Practice |
|---|---|
| Component pages | Chunk content into paginated pages accessible without JavaScript |
| Full URLs | Each page has a unique URL (e.g. ?page=1, ?lastid=567); avoid fragment-only URLs such as #1 |
| No overlap | Each item appears once in the series; no duplication across pages |
| Direct access | URL works in a new tab; no cookie or history dependency |
| pushState/replaceState | Update the URL as the user scrolls; enables back/forward navigation and shareable links |
| 404 for out-of-bounds | ?page=999 returns 404 when only 998 pages exist |
Reference: Infinite scroll search-friendly recommendations (Google Search Central, 2014)
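On the server side, the component-page requirements reduce to a small function: a fixed slice per page number, no overlap between consecutive pages, and a 404 for page numbers outside the series. This framework-agnostic sketch assumes the `page` value has already been parsed from ?page=N, that `items` is in a stable order (for example sorted by ID), and that the caller sends the returned status code; `PAGE_SIZE` is an illustrative value.

```python
import math
from typing import List, Sequence, Tuple

PAGE_SIZE = 20  # assumption: items per component page


def component_page(items: Sequence[str], page: int, page_size: int = PAGE_SIZE) -> Tuple[int, List[str]]:
    """Return (HTTP status, items) for ?page=N, where pages are 1-indexed."""
    total_pages = max(1, math.ceil(len(items) / page_size))
    if page < 1 or page > total_pages:
        return 404, []  # out-of-bounds pages must not answer 200
    start = (page - 1) * page_size
    # Consecutive slices never overlap, so each item appears on exactly one page.
    return 200, list(items[start:start + page_size])
```

The infinite-scroll front end can then fetch the same ?page=N URLs and update the address bar with pushState as each chunk comes into view.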
Pagination (Traditional)
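- Link to next/previous pages with crawlable anchor links (a elements with an href); rel="prev"/rel="next" can still be added, but Google no longer uses them as an indexing signal
- Avoid dynamic-only loading; make sure pagination links are present in the initial HTML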
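One way to keep pagination links in the HTML is to render them on the server as ordinary anchors. This sketch assumes a hypothetical `pagination_links` helper and the ?page=N URL scheme; it emits plain a elements that crawlers can follow without JavaScript, and adds rel="prev"/rel="next", which HTML allows on anchors.

```python
from html import escape
from typing import List


def pagination_links(base_path: str, page: int, total_pages: int) -> str:
    """Render previous/next anchors for page `page` of `total_pages`."""
    parts: List[str] = []
    path = escape(base_path)  # escape the path before putting it in an attribute
    if page > 1:
        parts.append(f'<a href="{path}?page={page - 1}" rel="prev">Previous</a>')
    if page < total_pages:
        parts.append(f'<a href="{path}?page={page + 1}" rel="next">Next</a>')
    return "\n".join(parts)
```

On the first page only a Next link is rendered and on the last page only a Previous link, so no link points outside the series.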
Crawl Budget
Crawl budget is the number of URLs Googlebot will crawl on your site in a given period. Large sites (10,000+ pages) may waste up to 30% of crawl budget on duplicates, redirects, and low-value URLs.
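| Waste source | Fix |
|---|---|
| Duplicate URLs | Canonical; consolidate; 301 to the preferred URL |
| Redirect chains | Point directly to final URL |
| Parameter proliferation | Use rel="canonical"; for Yandex, consider the Clean-param robots.txt directive |
| Low-value pages | Improve, consolidate, or noindex thin/duplicate pages (see indexing); noindexed pages are still fetched, so block URLs that should never be crawled in robots.txt |
| Crawl traps | Avoid infinite URL generation (e.g. faceted filter combinations) |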
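To size duplicate-URL and parameter waste, crawled URLs can be collapsed onto a normalized form and grouped. This sketch assumes a hypothetical, site-specific list of parameters that do not change page content (utm_* plus the names in `IGNORED_PARAMS`); it lowercases scheme and host, drops those parameters and the fragment, and sorts the remaining query parameters. Paths keep their case because URL paths are case-sensitive.

```python
from collections import defaultdict
from typing import Dict, Iterable, List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters assumed NOT to change page content on this site.
IGNORED_PARAMS = {"sessionid", "gclid", "fbclid"}


def _ignored(key: str) -> bool:
    key = key.lower()
    return key.startswith("utm_") or key in IGNORED_PARAMS


def normalize_url(url: str) -> str:
    """Lowercase scheme/host, drop ignored params and fragment, sort the rest."""
    parts = urlsplit(url)
    query = sorted((k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if not _ignored(k))
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", urlencode(query), ""))


def duplicate_groups(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Map each normalized URL to the crawled variants that collapse onto it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        groups[normalize_url(url)].append(url)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}
```

Each group key is a candidate canonical or 301 target for its variants; whether a parameter is truly safe to ignore has to be confirmed per site.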
Sitemap: Include only indexable, canonical URLs. See xml-sitemap, canonical-tag.
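The sitemap rule can be applied as a filter over crawl results. This sketch assumes a hypothetical `Page` record holding the final status code, the page's rel="canonical" URL, and whether it carries noindex; only 200 responses that are indexable and self-canonical are written, using the sitemaps.org 0.9 namespace and XML-escaping each URL.

```python
from dataclasses import dataclass
from typing import Iterable, List
from xml.sax.saxutils import escape


@dataclass
class Page:  # hypothetical crawl record
    url: str
    status: int
    canonical: str  # URL from the page's rel="canonical"
    noindex: bool


def sitemap_xml(pages: Iterable[Page]) -> str:
    lines: List[str] = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for p in pages:
        # Keep only 200 responses that are indexable and point to themselves as canonical.
        if p.status == 200 and not p.noindex and p.canonical == p.url:
            lines.append(f"  <url><loc>{escape(p.url)}</loc></url>")  # & becomes &amp;
    lines.append("</urlset>")
    return "\n".join(lines)
```

Redirecting URLs, noindexed pages, and pages canonicalized elsewhere are left out, which keeps the sitemap from pointing crawlers at URLs that waste budget.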
AI Crawler Optimization
AI crawlers (GPTBot, ClaudeBot, PerplexityBot, etc.) now generate substantial traffic: in the Vercel/MERJ study (Dec 2024), GPTBot and Claude together made roughly 28% as many requests as Googlebot. Their behavior differs from search engine crawlers, so optimizing for both improves GEO (AI search visibility). See generative-engine-optimization for GEO strategy. Findings from the study:
| Factor | AI crawlers (GPTBot, Claude) | Googlebot |
|---|---|---|
| JavaScript | Do not execute JS; cannot read client-side rendered content | Full JS rendering |
| 404 rate | ~34% of fetches hit 404s | ~8% |
| Redirects | ~14% of fetches follow redirects | ~1.5% |
| Content in initial HTML | JSON, RSC in initial response can be indexed | Same |
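To see whether a site matches these patterns, the same rates can be computed from its own access logs. This sketch assumes hypothetical (user_agent, status) pairs already parsed from the log and identifies bots by a plain substring match on the User-Agent; since that header can be spoofed, verified traffic should additionally be checked against each operator's published IP ranges.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

# Assumption: simple token match against the User-Agent header.
BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Googlebot")


def rates_by_bot(log: Iterable[Tuple[str, int]]) -> Dict[str, Dict[str, float]]:
    """Return the share of 404 and 3xx responses per crawler."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for user_agent, status in log:
        bot = next((b for b in BOTS if b in user_agent), None)
        if bot is None:
            continue  # not one of the crawlers being tracked
        counts[bot]["total"] += 1
        if status == 404:
            counts[bot]["404"] += 1
        elif 300 <= status < 400:
            counts[bot]["redirect"] += 1
    return {
        bot: {"404": c["404"] / c["total"], "redirect": c["redirect"] / c["total"]}
        for bot, c in counts.items()
    }
```

A 404 share far above Googlebot's for the same site usually points to stale URLs that AI crawlers keep requesting.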
Recommendations for AI crawlability:
| Practice | Action |
|---|---|
| Server-side rendering | Critical content in initial HTML. Use SSR, ISR, or SSG. See rendering-strategies for the full guide. |
| URL management | Keep sitemaps updated; use consistent URL patterns; avoid outdated /static/ assets that cause 404s. AI crawlers frequently hit outdated URLs. |
| Redirects | Fix redirect chains; point directly to final URL. AI crawlers waste ~14% of fetches on redirects. |
| 404 handling | Fix broken links; remove or redirect outdated URLs. High 404 rates suggest AI crawlers may use stale URL lists. |
Reference: The rise of the AI crawler (Vercel, 2024)
Common Issues
| Issue | Fix |
|---|---|
| Redirect chains | Update links to point directly to final URL |
| Broken links | 301 or remove; audit internal and external links |
| Orphan pages | Add internal links from hub or navigation; see internal-links for strategy |
| Infinite scroll | Provide paginated component pages, or replace with pagination for key content; see Pagination vs Infinite Scroll |
| AI crawlers missing content | Ensure critical content is in initial HTML; see rendering-strategies |
Output Format
- Redirect audit: Chains and loops to fix
- Broken link audit: 4xx links to fix
- Site structure: Orphan pages, hierarchy
- Pagination: Implementation for crawlable content
- AI crawler: SSR/URL/redirect checks if GEO or AI visibility is a goal
Related Skills
- seo-strategy: SEO workflow; crawlability is Technical phase (P0)
- website-structure: Plan which pages to build, page priority, structure planning; use before or alongside crawlability audit
- robots-txt: robots.txt configuration; AI crawler allow/block (GPTBot, ClaudeBot)
- xml-sitemap: URL discovery; keep updated to reduce AI crawler 404s
- google-search-console: Index status, Coverage report
- indexing: Fix indexing issues
- internal-links: Internal linking best practices
- masonry: Masonry + infinite scroll has same crawl issue; layout skill references this for SEO
- generative-engine-optimization: GEO strategy; AI search visibility; crawlability enables AI citation
- canonical-tag: Canonical reduces crawl budget waste on duplicates
- rendering-strategies: SSR, SSG, CSR; content in initial HTML; crawler visibility