Database Selection Guide
Match the user's intent to the right database(s).
By Use Case
| User is asking about... | Primary database(s) | Also consider |
|---|---|---|
| Papers on a biomedical topic | PubMed | Semantic Scholar, OpenAlex |
| Full text of a biomedical article | PMC | CORE |
| Biology preprints | bioRxiv | Semantic Scholar, OpenAlex |
| Health/medical preprints | medRxiv | Semantic Scholar, OpenAlex |
| Physics, math, or CS preprints | arXiv | Semantic Scholar, OpenAlex |
| Papers across all fields | OpenAlex | Semantic Scholar, Crossref |
| A specific paper by DOI | Crossref | Unpaywall, Semantic Scholar |
| Open access PDF for a paper | Unpaywall | CORE, PMC |
| Citation graph (who cites whom) | Semantic Scholar | OpenAlex |
| Author's publications | Semantic Scholar | OpenAlex |
| Paper recommendations | Semantic Scholar | -- |
| Full text (any field) | CORE | PMC (biomedical only) |
| Journal/publisher metadata | Crossref | OpenAlex |
| Funder information | Crossref | OpenAlex |
| Convert between PMID/PMCID/DOI | PMC (ID Converter) | Crossref |
| Recent preprints by date | bioRxiv, medRxiv | arXiv |
Cross-Database Queries
| User is asking about... | Databases to query |
|---|---|
| Everything about a paper (metadata + citations + OA) | Crossref + Semantic Scholar + Unpaywall |
| Comprehensive literature search | PubMed + OpenAlex + Semantic Scholar |
| Find and read a paper | PubMed (find) + Unpaywall (OA link) + PMC or CORE (full text) |
| Preprint and its published version | bioRxiv/medRxiv + Crossref |
| Author overview with citation metrics | Semantic Scholar + OpenAlex |
When a query spans multiple needs (e.g., "find papers about CRISPR and get me the PDFs"), query the relevant databases in parallel.
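As a rough illustration of querying in parallel, the sketch below fans out one callable per database on a thread pool and collects each outcome separately, so a failure in one database does not hide results from the others. The function name `query_in_parallel` and the stand-in fetchers are hypothetical; real fetchers would go through the HTTP fetch tool or curl.

```python
from concurrent.futures import ThreadPoolExecutor


def query_in_parallel(tasks):
    """Run each zero-argument callable in `tasks` on its own thread.

    `tasks` maps a database name to a callable that performs the request.
    Returns {name: ("ok", result)} or {name: ("error", message)}.
    """
    if not tasks:
        return {}
    results = {}
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        for name, future in futures.items():
            try:
                results[name] = ("ok", future.result())
            except Exception as exc:  # record the failure instead of raising
                results[name] = ("error", str(exc))
    return results


# Stand-in fetchers (hypothetical); they return canned data, not API output.
def fake_pubmed():
    return {"ids": ["34567890"]}


def fake_unpaywall():
    raise RuntimeError("HTTP 404")


outcome = query_in_parallel({"PubMed": fake_pubmed, "Unpaywall": fake_unpaywall})
print(outcome)
```

Databases with strict rate limits (NCBI, arXiv) are probably better kept out of the pool and called sequentially.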
Common Identifier Formats
Different databases use different identifier systems. If a query fails, the identifier format may be wrong.
| Identifier | Format | Example | Used by |
|---|---|---|---|
| DOI | 10.xxxx/xxxxx | 10.1038/nature12373 | All databases |
| PMID | Integer | 34567890 | PubMed, PMC, Semantic Scholar |
| PMCID | PMC + digits | PMC7029759 | PMC, Europe PMC |
| arXiv ID | YYMM.NNNNN | 2103.15348 | arXiv, Semantic Scholar |
| OpenAlex ID | W + digits | W2741809807 | OpenAlex |
| Semantic Scholar ID | 40-char hex | 649def34f8be... | Semantic Scholar |
| ORCID | 0000-XXXX-XXXX-XXXX | 0000-0001-6187-6610 | OpenAlex, Crossref |
| ISSN | XXXX-XXXX | 0028-0836 | Crossref, OpenAlex |
Cross-referencing IDs: Semantic Scholar accepts DOI, PMID, PMCID, and arXiv ID via prefixes (e.g., `DOI:10.1038/nature12373`, `PMID:34567890`, `ARXIV:2103.15348`). OpenAlex accepts DOI and PMID via prefixes (`doi:10.1038/...`, `pmid:34567890`). Use the PMC ID Converter to translate between PMID, PMCID, and DOI.
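One way to catch a wrong identifier format before sending a request is to classify the string first and only then add a prefix. The sketch below is illustrative: the regular expressions are my reading of the "Format" column, not official validators, and the helper names `classify` and `with_prefix` are hypothetical. Only the prefixes shown in the examples above are included.

```python
import re

# Illustrative patterns based on the identifier formats listed above.
PATTERNS = [
    ("DOI", re.compile(r"^10\.\d{4,9}/\S+$")),
    ("PMCID", re.compile(r"^PMC\d+$")),
    # Older arXiv IDs have four digits after the dot, newer ones five.
    ("arXiv ID", re.compile(r"^\d{4}\.\d{4,5}$")),
    ("OpenAlex ID", re.compile(r"^W\d+$")),
    ("Semantic Scholar ID", re.compile(r"^[0-9a-f]{40}$")),
    ("ORCID", re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")),
    ("ISSN", re.compile(r"^\d{4}-\d{3}[\dX]$")),
    ("PMID", re.compile(r"^\d+$")),
]

# Prefixes taken from the examples in the text above.
S2_PREFIX = {"DOI": "DOI:", "PMID": "PMID:", "arXiv ID": "ARXIV:"}
OPENALEX_PREFIX = {"DOI": "doi:", "PMID": "pmid:"}


def classify(identifier):
    """Return the identifier type name, or None if nothing matches."""
    value = identifier.strip()
    for name, pattern in PATTERNS:
        if pattern.match(value):
            return name
    return None


def with_prefix(identifier, prefixes):
    """Prefix the identifier for a target database, or None if unsupported."""
    kind = classify(identifier)
    if kind not in prefixes:
        return None
    return prefixes[kind] + identifier.strip()


print(classify("PMC7029759"))
print(with_prefix("2103.15348", S2_PREFIX))
print(with_prefix("34567890", OPENALEX_PREFIX))
```

A `None` from `with_prefix` is a hint to convert the ID first (for example with the PMC ID Converter) or to pick a different database.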
API Keys and Access
Most of these databases are fully open. A few benefit from API keys for higher rate limits.
Databases requiring or benefiting from API keys
| Database | Env Variable | Required? | Registration |
|---|---|---|---|
| NCBI (PubMed, PMC) | NCBI_API_KEY | No (3 req/s without, 10 with) | https://www.ncbi.nlm.nih.gov/account/settings/ |
| CORE | CORE_API_KEY | Yes for full text | https://core.ac.uk/services/api |
| Semantic Scholar | S2_API_KEY | No (shared pool without) | https://www.semanticscholar.org/product/api#api-key-form |
| OpenAlex | OPENALEX_API_KEY | Recommended | https://openalex.org/settings/api |
Fully open databases (no key needed)
| Database | Notes |
|---|---|
| bioRxiv / medRxiv | No auth, no documented rate limits |
| arXiv | No auth, max 1 request per 3 seconds |
| Crossref | No auth; add mailto param for polite pool (2x rate limit) |
| Unpaywall | No auth; requires email parameter |
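To make the `mailto` and email notes concrete, here is a small sketch of how the two request URLs might be assembled. The base URLs and paths are assumptions on my part (they are not stated in this file); treat the reference files as authoritative. The function names are hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumed endpoints; confirm against references/crossref.md and
# references/unpaywall.md before relying on them.
CROSSREF_WORKS = "https://api.crossref.org/works/"
UNPAYWALL_V2 = "https://api.unpaywall.org/v2/"


def crossref_work_url(doi, mailto=None):
    """Build a Crossref DOI lookup URL; `mailto` opts into the polite pool."""
    url = CROSSREF_WORKS + quote(doi, safe="/")
    if mailto:
        url += "?" + urlencode({"mailto": mailto})
    return url


def unpaywall_url(doi, email):
    """Build an Unpaywall lookup URL; the email parameter is mandatory."""
    if not email:
        raise ValueError("Unpaywall requires an email parameter")
    return UNPAYWALL_V2 + quote(doi, safe="/") + "?" + urlencode({"email": email})


print(crossref_work_url("10.1038/nature12373", mailto="me@example.org"))
print(unpaywall_url("10.1038/nature12373", email="me@example.org"))
```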
Loading API keys
- **Check the environment first** -- the key may already be exported (e.g., `$NCBI_API_KEY`).
- **Fall back to `.env`** -- check `.env` in the current working directory.
- **Proceed without** -- most APIs still work at lower rate limits. Tell the user which key is missing and how to get one.
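The lookup order above could be sketched as follows. This is a minimal illustration, not a full dotenv parser: it assumes simple `KEY=value` lines (optionally prefixed with `export` and optionally quoted), and the function name `load_api_key` is hypothetical.

```python
import os


def load_api_key(name, environ=None, env_path=".env"):
    """Return the key from the environment, then from a .env file, else None."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if value:
        return value
    try:
        with open(env_path, encoding="utf-8") as handle:
            for raw in handle:
                line = raw.strip()
                # Skip blanks, comments, and lines without an assignment.
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, val = line.partition("=")
                key = key.strip()
                if key.startswith("export "):
                    key = key[len("export "):].strip()
                if key == name:
                    return val.strip().strip("'\"") or None
    except FileNotFoundError:
        pass
    return None


key = load_api_key("NCBI_API_KEY")
if key is None:
    # Proceed without the key, but tell the user what is missing.
    print("NCBI_API_KEY not set; continuing at the lower rate limit.")
```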
Making API Calls
Use your environment's HTTP fetch tool to call REST endpoints:
| Platform | HTTP Fetch Tool | Fallback |
|---|---|---|
| Claude Code | WebFetch | curl via Bash |
| Gemini CLI | web_fetch | curl via shell |
| Windsurf | read_url_content | curl via terminal |
| Cursor | No dedicated fetch tool | curl via run_terminal_cmd |
| Codex CLI | No dedicated fetch tool | curl via shell |
| Cline | No dedicated fetch tool | curl via execute_command |
If the fetch tool fails, fall back to curl via whatever shell tool is available.
Special cases
- arXiv returns Atom XML, not JSON. Parse it or use `curl` and extract the relevant fields. Consider piping through a simple parser if available.
- PMC eFetch returns JATS XML for full text. This is expected -- full text articles are in XML format.
- Crossref benefits from a `mailto` parameter, which selects the polite (faster) pool. Unpaywall requires an email parameter on every request.
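For the arXiv case, a simple parser can be as small as the sketch below, which uses the standard-library `xml.etree.ElementTree` with the Atom namespace. The sample feed is a hand-written stand-in with placeholder metadata, not a real API response, and `parse_arxiv_atom` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hand-written stand-in feed; titles and authors are placeholders.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/2103.15348v1</id>
    <title>An  example
      title</title>
    <summary>Placeholder abstract.</summary>
    <author><name>A. Author</name></author>
    <author><name>B. Author</name></author>
  </entry>
</feed>"""


def _text(node, path):
    """Return the element text with runs of whitespace collapsed."""
    return " ".join(node.findtext(path, default="", namespaces=ATOM).split())


def parse_arxiv_atom(xml_text):
    """Extract id, title, summary, and author names from an Atom feed."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        papers.append({
            "id": _text(entry, "atom:id"),
            "title": _text(entry, "atom:title"),
            "summary": _text(entry, "atom:summary"),
            "authors": [_text(a, "atom:name")
                        for a in entry.findall("atom:author", ATOM)],
        })
    return papers


papers = parse_arxiv_atom(SAMPLE_FEED)
print(papers[0]["title"], papers[0]["authors"])
```

The same approach should carry over to JATS XML from PMC, although JATS uses different element names and is not covered here.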
Request guidelines
- For NCBI APIs (PubMed, PMC): max 3 req/sec without key, 10 with key. Make requests sequentially.
- For arXiv: max 1 request every 3 seconds. Be patient.
- For Crossref: 5 req/sec (public), 10 req/sec (polite pool with `mailto`).
- For other APIs with no strict limits, you can query multiple databases in parallel.
- If you get HTTP 429 (rate limit), wait briefly and retry once.
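These limits can be enforced with a small throttle plus a single retry on HTTP 429, sketched below. The class and function names are hypothetical, `send` stands for any zero-argument callable returning `(status, body)`, and the 2-second default retry delay is an arbitrary assumption for "wait briefly".

```python
import time


class Throttle:
    """Enforce a minimum gap, in seconds, between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def send_with_one_retry(send, throttle, retry_delay=2.0, sleep=time.sleep):
    """Call `send()`; on HTTP 429, wait and retry exactly once."""
    throttle.wait()
    status, body = send()
    if status == 429:
        sleep(retry_delay)
        throttle.wait()
        status, body = send()
    return status, body


# Demo with canned responses instead of a real HTTP call.
responses = iter([(429, ""), (200, '{"ok": true}')])
print(send_with_one_retry(lambda: next(responses), Throttle(0.0), retry_delay=0.0))
```

Under these assumptions, `Throttle(3.0)` would match the arXiv limit and `Throttle(1 / 3)` the NCBI limit without a key.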
Error recovery
- Check the identifier format -- use the Common Identifier Formats table. A PMID won't work in arXiv, and an arXiv ID won't work directly in PubMed.
- Try alternative identifiers -- if a DOI fails in one database, try the title or PMID instead.
- Try a different database -- if PubMed returns nothing for a CS paper, try Semantic Scholar or OpenAlex.
- Report the failure -- tell the user which database failed, the error, and what you tried instead.
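The recovery steps amount to trying alternatives in order while keeping a record of what failed. A minimal sketch, with a hypothetical helper `first_successful` and stand-in lookups that return canned values rather than real API output:

```python
def first_successful(attempts):
    """Try (label, callable) pairs in order.

    Returns (label, result, failures), where `failures` lists every
    attempt that raised or came back empty before the first success.
    """
    failures = []
    for label, attempt in attempts:
        try:
            result = attempt()
        except Exception as exc:
            failures.append((label, str(exc)))
            continue
        if result:
            return label, result, failures
        failures.append((label, "no results"))
    return None, None, failures


# Stand-in lookups (hypothetical).
def pubmed_lookup():
    return []  # nothing found


def s2_lookup():
    raise TimeoutError("request timed out")


def openalex_lookup():
    return [{"title": "stand-in record"}]


source, records, failures = first_successful([
    ("PubMed", pubmed_lookup),
    ("Semantic Scholar", s2_lookup),
    ("OpenAlex", openalex_lookup),
])
print(source, failures)
```

The `failures` list holds what the last step asks for: which database failed, the error, and what was tried next.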
Output Format
Structure your response like this:
## Databases Queried
- **PubMed** -- esearch + esummary for "CRISPR gene therapy"
- **Unpaywall** -- DOI lookup for 10.1038/...
## Results
### PubMed
[raw JSON response or formatted results]
### Unpaywall
[raw JSON response]
If results are very large, present the most relevant portion and note that more data is available. But default to showing the full raw JSON -- the user asked for it.
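If the response is assembled programmatically, the layout above could be produced by something like the following sketch. The function name `render_report` and its argument shapes are hypothetical, and it prints each payload as indented JSON in full, in line with the default described above.

```python
import json


def render_report(queries, results):
    """Render the report layout from (name, description) and (name, payload) pairs."""
    lines = ["## Databases Queried"]
    for name, description in queries:
        lines.append(f"- **{name}** -- {description}")
    lines.append("## Results")
    for name, payload in results:
        lines.append(f"### {name}")
        lines.append(json.dumps(payload, indent=2))
    return "\n".join(lines)


report = render_report(
    queries=[("PubMed", 'esearch + esummary for "CRISPR gene therapy"')],
    results=[("PubMed", {"count": 1})],  # stand-in payload, not real output
)
print(report)
```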
Available Databases
Read the relevant reference file before making any API call.
Biomedical Literature
| Database | Reference File | What it covers |
|---|---|---|
| PubMed | references/pubmed.md | 37M+ biomedical citations, abstracts, MeSH terms |
| PMC | references/pmc.md | 10M+ full-text biomedical articles (JATS XML), ID conversion |
Preprint Servers
| Database | Reference File | What it covers |
|---|---|---|
| bioRxiv | references/biorxiv.md | Biology preprints (browse by date/DOI, no keyword search) |
| medRxiv | references/medrxiv.md | Health sciences preprints (browse by date/DOI, no keyword search) |
| arXiv | references/arxiv.md | Physics, math, CS, biology, economics preprints (keyword search, Atom XML) |
Multidisciplinary Indexes
| Database | Reference File | What it covers |
|---|---|---|
| OpenAlex | references/openalex.md | 250M+ works, authors, institutions, topics, citation data |
| Crossref | references/crossref.md | 150M+ DOI metadata, journals, funders, references |
| Semantic Scholar | references/semantic-scholar.md | 200M+ papers, citation graphs, AI-generated TLDRs, recommendations |
Open Access & Full Text
| Database | Reference File | What it covers |
|---|---|---|
| CORE | references/core.md | 37M+ full texts from OA repositories worldwide |
| Unpaywall | references/unpaywall.md | OA status and PDF links for any DOI |