SKILL.md


Database Selection Guide

Match the user's intent to the right database(s).

By Use Case

| User is asking about... | Primary database(s) | Also consider |
| --- | --- | --- |
| Papers on a biomedical topic | PubMed | Semantic Scholar, OpenAlex |
| Full text of a biomedical article | PMC | CORE |
| Biology preprints | bioRxiv | Semantic Scholar, OpenAlex |
| Health/medical preprints | medRxiv | Semantic Scholar, OpenAlex |
| Physics, math, or CS preprints | arXiv | Semantic Scholar, OpenAlex |
| Papers across all fields | OpenAlex | Semantic Scholar, Crossref |
| A specific paper by DOI | Crossref | Unpaywall, Semantic Scholar |
| Open access PDF for a paper | Unpaywall | CORE, PMC |
| Citation graph (who cites whom) | Semantic Scholar | OpenAlex |
| Author's publications | Semantic Scholar | OpenAlex |
| Paper recommendations | Semantic Scholar | |
| Full text (any field) | CORE | PMC (biomedical only) |
| Journal/publisher metadata | Crossref | OpenAlex |
| Funder information | Crossref | OpenAlex |
| Convert between PMID/PMCID/DOI | PMC (ID Converter) | Crossref |
| Recent preprints by date | bioRxiv, medRxiv | arXiv |

Cross-Database Queries

| User is asking about... | Databases to query |
| --- | --- |
| Everything about a paper (metadata + citations + OA) | Crossref + Semantic Scholar + Unpaywall |
| Comprehensive literature search | PubMed + OpenAlex + Semantic Scholar |
| Find and read a paper | PubMed (find) + Unpaywall (OA link) + PMC or CORE (full text) |
| Preprint and its published version | bioRxiv/medRxiv + Crossref |
| Author overview with citation metrics | Semantic Scholar + OpenAlex |

When a query spans multiple needs (e.g., "find papers about CRISPR and get me the PDFs"), query the relevant databases in parallel.
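If the agent drives the calls from a script rather than from separate tool invocations, the parallel dispatch could be sketched as below. This is a minimal illustration, not part of the skill itself: `query_in_parallel` and the zero-argument fetcher callables are hypothetical names, and each fetcher is assumed to wrap whatever HTTP call the relevant reference file describes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def query_in_parallel(fetchers: Dict[str, Callable[[], object]]) -> Dict[str, dict]:
    """Run one zero-argument fetcher per database concurrently.

    Returns {database: {"ok": True, "data": ...}} on success, or
    {database: {"ok": False, "error": "..."}} when a fetcher raises, so one
    failing database does not discard the results from the others.
    """
    results: Dict[str, dict] = {}
    if not fetchers:
        return results
    with ThreadPoolExecutor(max_workers=len(fetchers)) as pool:
        # Submit everything first so the calls overlap, then collect in order.
        futures = {name: pool.submit(fn) for name, fn in fetchers.items()}
        for name, future in futures.items():
            try:
                results[name] = {"ok": True, "data": future.result()}
            except Exception as exc:  # keep going and report the failure
                results[name] = {"ok": False, "error": str(exc)}
    return results
```

Rate-limited services (NCBI, arXiv) still need their own pacing inside the fetcher; running databases side by side does not lift per-database limits.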

Common Identifier Formats

Different databases use different identifier systems. If a query fails, the identifier format may be wrong.

| Identifier | Format | Example | Used by |
| --- | --- | --- | --- |
| DOI | 10.xxxx/xxxxx | 10.1038/nature12373 | All databases |
| PMID | Integer | 34567890 | PubMed, PMC, Semantic Scholar |
| PMCID | PMC + digits | PMC7029759 | PMC, Europe PMC |
| arXiv ID | YYMM.NNNNN | 2103.15348 | arXiv, Semantic Scholar |
| OpenAlex ID | W + digits | W2741809807 | OpenAlex |
| Semantic Scholar ID | 40-char hex | 649def34f8be... | Semantic Scholar |
| ORCID | 0000-XXXX-XXXX-XXXX | 0000-0001-6187-6610 | OpenAlex, Crossref |
| ISSN | XXXX-XXXX | 0028-0836 | Crossref, OpenAlex |

Cross-referencing IDs: Semantic Scholar accepts DOI, PMID, PMCID, and arXiv ID via prefixes (e.g., DOI:10.1038/nature12373, PMID:34567890, ARXIV:2103.15348). OpenAlex accepts DOI and PMID via prefixes (doi:10.1038/..., pmid:34567890). Use the PMC ID Converter to translate between PMID, PMCID, and DOI.
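One way to apply the prefix rules is to classify the identifier first and add the prefix second. The sketch below is illustrative only: the function names are hypothetical, the regular expressions are loose approximations of the formats listed in the table, and only the prefixes shown in the examples above are emitted (the PMCID prefix is left out because its exact value format is not shown here).

```python
import re
from typing import Optional

# Loose patterns for the formats in the identifier table (assumptions, not
# official validators).
_PATTERNS = [
    ("PMCID", re.compile(r"^PMC\d+$", re.IGNORECASE)),
    ("DOI", re.compile(r"^10\.\d{4,9}/\S+$")),
    ("ARXIV", re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")),
    ("PMID", re.compile(r"^\d+$")),
]

# Only the prefixes that appear in the examples above.
_S2_PREFIX = {"DOI": "DOI:", "PMID": "PMID:", "ARXIV": "ARXIV:"}
_OPENALEX_PREFIX = {"DOI": "doi:", "PMID": "pmid:"}


def classify_identifier(raw: str) -> Optional[str]:
    """Return "DOI", "PMID", "PMCID", "ARXIV", or None if nothing matches."""
    ident = raw.strip()
    for kind, pattern in _PATTERNS:
        if pattern.match(ident):
            return kind
    return None


def _prefixed(raw: str, table: dict, service: str) -> str:
    kind = classify_identifier(raw)
    if kind not in table:
        raise ValueError(f"{raw!r} has no known {service} prefix")
    return table[kind] + raw.strip()


def semantic_scholar_id(raw: str) -> str:
    """Prefix an identifier for a Semantic Scholar lookup."""
    return _prefixed(raw, _S2_PREFIX, "Semantic Scholar")


def openalex_id(raw: str) -> str:
    """Prefix an identifier for an OpenAlex lookup."""
    return _prefixed(raw, _OPENALEX_PREFIX, "OpenAlex")
```

A `ValueError` here is a cue to convert the identifier (for example with the PMC ID Converter) before retrying.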

API Keys and Access

Most of these databases are fully open. A few benefit from API keys for higher rate limits, and CORE requires one for full text.

Databases requiring or benefiting from API keys

| Database | Env Variable | Required? | Registration |
| --- | --- | --- | --- |
| NCBI (PubMed, PMC) | NCBI_API_KEY | No (3 req/s without, 10 with) | https://www.ncbi.nlm.nih.gov/account/settings/ |
| CORE | CORE_API_KEY | Yes for full text | https://core.ac.uk/services/api |
| Semantic Scholar | S2_API_KEY | No (shared pool without) | https://www.semanticscholar.org/product/api#api-key-form |
| OpenAlex | OPENALEX_API_KEY | Recommended | https://openalex.org/settings/api |

Fully open databases (no key needed)

| Database | Notes |
| --- | --- |
| bioRxiv / medRxiv | No auth, no documented rate limits |
| arXiv | No auth, max 1 request per 3 seconds |
| Crossref | No auth; add mailto param for polite pool (2x rate limit) |
| Unpaywall | No auth; requires email parameter |
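For the two services that take a contact address, the request URLs might be assembled as follows. This is a sketch under stated assumptions: the helper names are hypothetical, and the endpoint paths (`/works/{doi}` on api.crossref.org and `/v2/{doi}` on api.unpaywall.org) should be confirmed against references/crossref.md and references/unpaywall.md before use.

```python
from urllib.parse import quote, urlencode


def crossref_work_url(doi: str, mailto: str) -> str:
    """Crossref DOI lookup with a mailto parameter for the polite pool."""
    # Keep "/" unescaped so the DOI stays readable in the path.
    return f"https://api.crossref.org/works/{quote(doi, safe='/')}?{urlencode({'mailto': mailto})}"


def unpaywall_url(doi: str, email: str) -> str:
    """Unpaywall DOI lookup; the email parameter is mandatory."""
    return f"https://api.unpaywall.org/v2/{quote(doi, safe='/')}?{urlencode({'email': email})}"
```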

Loading API keys

1. **Check the environment first** -- the key may already be exported (e.g., $NCBI_API_KEY).
2. **Fall back to .env** -- check .env in the current working directory.
3. **Proceed without** -- most APIs still work at lower rate limits. Tell the user which key is missing and how to get one.
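The lookup order above could be sketched like this. The function name `load_api_key` is hypothetical, and the `.env` parsing is a deliberately small assumption (plain `KEY=value` lines, optional `export` prefix and quotes) rather than a full dotenv implementation.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(name: str, env_file: str = ".env") -> Optional[str]:
    """Return the key from the environment, then from env_file, else None."""
    value = os.environ.get(name)
    if value:
        return value
    path = Path(env_file)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blanks, comments, and anything that is not KEY=value.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, val = line.partition("=")
            key = key.strip()
            if key.startswith("export "):
                key = key[len("export "):].strip()
            if key == name:
                return val.strip().strip("'\"") or None
    # Caller proceeds without a key and tells the user which one is missing.
    return None
```

A `None` result is not an error: the caller continues at the lower rate limit and reports the missing variable together with its registration URL.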

Making API Calls

Use your environment's HTTP fetch tool to call REST endpoints:

| Platform | HTTP Fetch Tool | Fallback |
| --- | --- | --- |
| Claude Code | WebFetch | curl via Bash |
| Gemini CLI | web_fetch | curl via shell |
| Windsurf | read_url_content | curl via terminal |
| Cursor | No dedicated fetch tool | curl via run_terminal_cmd |
| Codex CLI | No dedicated fetch tool | curl via shell |
| Cline | No dedicated fetch tool | curl via execute_command |

If the fetch tool fails, fall back to curl via whatever shell tool is available.

Special cases

- arXiv returns Atom XML, not JSON. Parse it with an XML parser if one is available; otherwise fetch it with curl and extract the relevant fields.
- PMC eFetch returns JATS XML for full text. This is expected -- full-text articles are in XML format.
- Crossref and Unpaywall expect a contact address: a mailto parameter puts Crossref requests in the polite pool, and Unpaywall requires an email parameter.
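For the arXiv case, a small parser over the Atom feed is usually enough. The sketch below uses only the Python standard library; `parse_arxiv_feed` is a hypothetical helper, and the fields it pulls (id, title, summary, published, author names) are standard Atom elements rather than a complete mapping of what arXiv returns.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Standard Atom namespace used by feed and entry elements.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def _text(entry: ET.Element, tag: str) -> str:
    """Return the element text with internal whitespace collapsed."""
    return " ".join(entry.findtext(tag, default="", namespaces=ATOM).split())


def parse_arxiv_feed(xml_text: str) -> List[Dict[str, object]]:
    """Extract basic fields from each <entry> in an Atom feed."""
    root = ET.fromstring(xml_text)
    papers: List[Dict[str, object]] = []
    for entry in root.findall("atom:entry", ATOM):
        papers.append({
            "id": _text(entry, "atom:id"),
            "title": _text(entry, "atom:title"),
            "summary": _text(entry, "atom:summary"),
            "published": _text(entry, "atom:published"),
            "authors": [_text(a, "atom:name") for a in entry.findall("atom:author", ATOM)],
        })
    return papers
```

Titles and abstracts in the feed are often wrapped across lines, which is why whitespace is collapsed before returning them.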

Request guidelines

- For NCBI APIs (PubMed, PMC): max 3 req/sec without key, 10 with key. Make requests sequentially.
- For arXiv: max 1 request every 3 seconds. Be patient.
- For Crossref: 5 req/sec (public), 10 req/sec (polite pool with mailto).
- For other APIs with no strict limits, you can query multiple databases in parallel.
- If you get HTTP 429 (rate limit), wait briefly and retry once.
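When requests are scripted, the pacing and the single retry can be kept in one place. The following is a sketch only: `Throttle`, `call_with_retry`, and the two-second default delay are hypothetical choices, and `send` stands in for whatever function performs the HTTP request and returns a `(status_code, body)` pair.

```python
import time
from typing import Callable, Optional, Tuple

# Minimum spacing between requests, derived from the limits listed above.
MIN_INTERVAL_SECONDS = {
    "ncbi": 1 / 3,             # 3 req/sec without a key
    "ncbi_with_key": 1 / 10,   # 10 req/sec with a key
    "arxiv": 3.0,              # 1 request every 3 seconds
    "crossref": 1 / 5,         # public pool
    "crossref_polite": 1 / 10, # polite pool with mailto
}


class Throttle:
    """Blocks until at least min_interval seconds have passed since the last call."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


def call_with_retry(send: Callable[[], Tuple[int, object]], throttle: Throttle,
                    retry_delay: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, object]:
    """Send one throttled request; on HTTP 429, wait and retry exactly once."""
    throttle.wait()
    status, body = send()
    if status == 429:
        sleep(retry_delay)
        throttle.wait()
        status, body = send()
    return status, body
```

The clock and sleep functions are injectable so the pacing logic can be exercised without real delays.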

Error recovery

1. **Check the identifier format** -- use the Common Identifier Formats table. A PMID won't work in arXiv, an arXiv ID won't work in PubMed directly.
2. **Try alternative identifiers** -- if a DOI fails in one database, try the title or PMID instead.
3. **Try a different database** -- if PubMed returns nothing for a CS paper, try Semantic Scholar or OpenAlex.
4. **Report the failure** -- tell the user which database failed, the error, and what you tried instead.
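The recovery steps amount to an ordered list of attempts with a running log of what failed. A minimal sketch, assuming each attempt is a hypothetical `(label, lookup)` pair where `lookup` is a zero-argument callable that returns results or raises:

```python
from typing import Callable, List, Optional, Tuple


def first_successful(
    attempts: List[Tuple[str, Callable[[], object]]],
) -> Tuple[Optional[str], object, List[str]]:
    """Try each (label, lookup) in order and stop at the first non-empty result.

    Returns (label, result, failures). failures records every attempt that
    raised or came back empty, ready to be reported to the user.
    """
    failures: List[str] = []
    for label, lookup in attempts:
        try:
            result = lookup()
        except Exception as exc:
            failures.append(f"{label}: {exc}")
            continue
        if result:
            return label, result, failures
        failures.append(f"{label}: no results")
    return None, None, failures
```

Ordering the attempts as "same database, alternative identifier" before "different database" mirrors the steps above.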

Output Format

Structure your response like this:

## Databases Queried

- **PubMed** -- esearch + esummary for "CRISPR gene therapy"

- **Unpaywall** -- DOI lookup for 10.1038/...

## Results

### PubMed

[raw JSON response or formatted results]

### Unpaywall

[raw JSON response]

If results are very large, present the most relevant portion and note that more data is available. But default to showing the full raw JSON -- the user asked for it.

Available Databases

Read the relevant reference file before making any API call.

Biomedical Literature

| Database | Reference File | What it covers |
| --- | --- | --- |
| PubMed | references/pubmed.md | 37M+ biomedical citations, abstracts, MeSH terms |
| PMC | references/pmc.md | 10M+ full-text biomedical articles (JATS XML), ID conversion |

Preprint Servers

| Database | Reference File | What it covers |
| --- | --- | --- |
| bioRxiv | references/biorxiv.md | Biology preprints (browse by date/DOI, no keyword search) |
| medRxiv | references/medrxiv.md | Health sciences preprints (browse by date/DOI, no keyword search) |
| arXiv | references/arxiv.md | Physics, math, CS, biology, economics preprints (keyword search, Atom XML) |

Multidisciplinary Indexes

| Database | Reference File | What it covers |
| --- | --- | --- |
| OpenAlex | references/openalex.md | 250M+ works, authors, institutions, topics, citation data |
| Crossref | references/crossref.md | 150M+ DOI metadata, journals, funders, references |
| Semantic Scholar | references/semantic-scholar.md | 200M+ papers, citation graphs, AI-generated TLDRs, recommendations |

Open Access & Full Text

| Database | Reference File | What it covers |
| --- | --- | --- |
| CORE | references/core.md | 37M+ full texts from OA repositories worldwide |
| Unpaywall | references/unpaywall.md | OA status and PDF links for any DOI |
