openalex-database

Query and analyze scholarly literature using the OpenAlex database. This skill should be used when searching for academic papers, analyzing research trends,…

INSTALLATION
npx skills add https://github.com/davila7/claude-code-templates --skill openalex-database
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Install the required package using uv:

uv pip install requests

No API key required - OpenAlex is completely open.

Core Capabilities

1. Search for Papers

Use for: Finding papers by title, abstract, or topic

# Simple search
results = client.search_works(
    search="machine learning",
    per_page=100
)

# Search with filters
results = client.search_works(
    search="CRISPR gene editing",
    filter_params={
        "publication_year": ">2020",
        "is_oa": "true"
    },
    sort="cited_by_count:desc"
)

2. Find Works by Author

Use for: Getting all publications by a specific researcher

Use the two-step pattern (entity name → ID → works):

from scripts.query_helpers import find_author_works

works = find_author_works(
    author_name="Jennifer Doudna",
    client=client,
    limit=100
)

Manual two-step approach:

# Step 1: Get author ID
author_response = client._make_request(
    '/authors',
    params={'search': 'Jennifer Doudna', 'per-page': 1}
)
author_id = author_response['results'][0]['id'].split('/')[-1]

# Step 2: Get works
works = client.search_works(
    filter_params={"authorships.author.id": author_id}
)
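The manual approach above indexes results[0] directly, which raises an IndexError when the search matches nothing. A minimal sketch of a safer ID extraction follows. The name extract_openalex_id and the sample response are hypothetical; only the results and id fields used in the snippet above are assumed.

```python
from typing import Optional


def extract_openalex_id(search_response: dict) -> Optional[str]:
    """Return the short ID of the top search hit, or None if nothing matched."""
    results = search_response.get("results") or []
    if not results:
        return None  # no entity matched the search term
    # IDs come back as URLs such as https://openalex.org/A0000000001;
    # the last path segment is the short form used in filters.
    return results[0]["id"].rstrip("/").split("/")[-1]


# Hypothetical response, trimmed to the fields used here
sample = {
    "results": [
        {"id": "https://openalex.org/A0000000001", "display_name": "Example Author"}
    ]
}

print(extract_openalex_id(sample))
print(extract_openalex_id({"results": []}))
```

Checking for None before Step 2 avoids sending a filter with a missing ID.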

3. Find Works from Institution

Use for: Analyzing research output from universities or organizations

from scripts.query_helpers import find_institution_works

works = find_institution_works(
    institution_name="Stanford University",
    client=client,
    limit=200
)

4. Highly Cited Papers

Use for: Finding influential papers in a field

from scripts.query_helpers import find_highly_cited_recent_papers

papers = find_highly_cited_recent_papers(
    topic="quantum computing",
    years=">2020",
    client=client,
    limit=100
)

5. Open Access Papers

Use for: Finding freely available research

from scripts.query_helpers import get_open_access_papers

papers = get_open_access_papers(
    search_term="climate change",
    client=client,
    oa_status="any",  # or "gold", "green", "hybrid", "bronze"
    limit=200
)

6. Publication Trends Analysis

Use for: Tracking research output over time

from scripts.query_helpers import get_publication_trends

trends = get_publication_trends(
    search_term="artificial intelligence",
    filter_params={"is_oa": "true"},
    client=client
)

# Sort and display
for trend in sorted(trends, key=lambda x: x['key'])[-10:]:
    print(f"{trend['key']}: {trend['count']} publications")
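Raw yearly counts are easier to interpret alongside the year-over-year change. The sketch below assumes trends is a list of dicts with a key (a year) and a count, as in the loop above. The helper year_over_year and the sample numbers are invented for illustration and are not part of query_helpers.py.

```python
def year_over_year(trends):
    """Return (year, count, pct_change) tuples sorted by year.

    pct_change is None for the first year and whenever the previous count is 0.
    """
    ordered = sorted(trends, key=lambda t: int(t["key"]))
    rows = []
    previous = None
    for item in ordered:
        count = item["count"]
        if previous is None or previous == 0:
            change = None
        else:
            change = round(100.0 * (count - previous) / previous, 1)
        rows.append((int(item["key"]), count, change))
        previous = count
    return rows


# Made-up counts, deliberately out of order
sample_trends = [
    {"key": "2022", "count": 150},
    {"key": "2021", "count": 100},
    {"key": "2023", "count": 120},
]

for year, count, change in year_over_year(sample_trends):
    label = "n/a" if change is None else f"{change:+.1f}%"
    print(f"{year}: {count} publications ({label})")
```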

7. Research Output Analysis

Use for: Comprehensive analysis of author or institution research

from scripts.query_helpers import analyze_research_output

analysis = analyze_research_output(
    entity_type='institution',  # or 'author'
    entity_name='MIT',
    client=client,
    years='>2020'
)

print(f"Total works: {analysis['total_works']}")
print(f"Open access: {analysis['open_access_percentage']}%")
print(f"Top topics: {analysis['top_topics'][:5]}")

8. Batch Lookups

Use for: Getting information for multiple DOIs, ORCIDs, or IDs efficiently

dois = [
    "https://doi.org/10.1038/s41586-021-03819-2",
    "https://doi.org/10.1126/science.abc1234",
    # ... up to 50 DOIs
]

works = client.batch_lookup(
    entity_type='works',
    ids=dois,
    id_field='doi'
)
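A list with more than 50 DOIs has to be split before lookup. The sketch below shows one way to do that, and how a batch might map onto an OpenAlex filter value using the | (OR) separator. The helpers chunked and doi_filter are hypothetical, and whether batch_lookup() builds its request this way is an assumption.

```python
def chunked(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def doi_filter(dois):
    """Build a filter value that matches any of the given DOIs (| means OR)."""
    return "doi:" + "|".join(dois)


# 120 placeholder DOIs, not real records
many_dois = [f"https://doi.org/10.0000/example.{n}" for n in range(120)]

batches = list(chunked(many_dois, size=50))
print([len(batch) for batch in batches])
print(doi_filter(batches[0][:2]))
```

Each batch could then be passed to client.batch_lookup('works', batch, 'doi') and the results concatenated.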

9. Random Sampling

Use for: Getting representative samples for analysis

# Small sample
works = client.sample_works(
    sample_size=100,
    seed=42,  # For reproducibility
    filter_params={"publication_year": "2023"}
)

# Large sample (>10k) - automatically handles multiple requests
works = client.sample_works(
    sample_size=25000,
    seed=42,
    filter_params={"is_oa": "true"}
)
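How the client splits a large sample is not shown here. One plausible approach, sketched under that assumption, is to issue several requests of at most 10,000 works with different seeds and then drop duplicates, since independently drawn samples can overlap. The names plan_sample_batches and dedupe_by_id are hypothetical.

```python
def plan_sample_batches(total, max_per_request=10_000, base_seed=42):
    """Split a sample size into per-request sizes, each with its own seed."""
    batches = []
    remaining = total
    seed = base_seed
    while remaining > 0:
        size = min(remaining, max_per_request)
        batches.append({"sample": size, "seed": seed})
        remaining -= size
        seed += 1  # a different seed per request gives a different draw
    return batches


def dedupe_by_id(works):
    """Keep the first occurrence of each work, identified by its 'id' field."""
    seen = set()
    unique = []
    for work in works:
        if work["id"] not in seen:
            seen.add(work["id"])
            unique.append(work)
    return unique


print(plan_sample_batches(25_000))
print(dedupe_by_id([{"id": "W1"}, {"id": "W2"}, {"id": "W1"}]))
```

After deduplication the combined sample can be slightly smaller than requested, so it is worth checking the final count.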

10. Citation Analysis

Use for: Finding papers that cite a specific work

import requests

# Get the work
work = client.get_entity('works', 'https://doi.org/10.1038/s41586-021-03819-2')

# Get citing papers using cited_by_api_url (first page only, up to 200 results)
citing_response = requests.get(
    work['cited_by_api_url'],
    params={'mailto': client.email, 'per-page': 200},
    timeout=30
)
citing_response.raise_for_status()  # surface HTTP errors instead of parsing an error body
citing_works = citing_response.json()['results']
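The request above returns only the first page of citing works. OpenAlex supports cursor paging: send cursor=* on the first request and follow meta.next_cursor until it is empty. The sketch below keeps the HTTP call behind a hypothetical fetch_page callable so the loop can be exercised offline. In practice fetch_page could wrap requests.get(work['cited_by_api_url'], params={..., 'cursor': cursor}).json().

```python
def iter_all_results(fetch_page, max_results=None):
    """Yield results across pages by following meta.next_cursor.

    fetch_page(cursor) must return a parsed response dict containing
    'results' and 'meta' keys.
    """
    if max_results is not None and max_results <= 0:
        return
    cursor = "*"  # the starting cursor value for OpenAlex cursor paging
    yielded = 0
    while cursor:
        page = fetch_page(cursor)
        for item in page.get("results", []):
            yield item
            yielded += 1
            if max_results is not None and yielded >= max_results:
                return
        cursor = page.get("meta", {}).get("next_cursor")


# Fake two-page response used in place of real HTTP calls
fake_pages = {
    "*": {"meta": {"next_cursor": "abc"}, "results": [{"id": "W1"}, {"id": "W2"}]},
    "abc": {"meta": {"next_cursor": None}, "results": [{"id": "W3"}]},
}

all_citing = list(iter_all_results(fake_pages.__getitem__))
print([work["id"] for work in all_citing])
```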

11. Topic and Subject Analysis

Use for: Understanding research focus areas

# Get top topics for an institution
topics = client.group_by(
    entity_type='works',
    group_field='topics.id',
    filter_params={
        "authorships.institutions.id": "I136199984",  # MIT
        "publication_year": ">2020"
    }
)

for topic in topics[:10]:
    print(f"{topic['key_display_name']}: {topic['count']} works")

12. Large-Scale Data Extraction

Use for: Downloading large datasets for analysis

import csv

# Paginate through all results
all_papers = client.paginate_all(
    endpoint='/works',
    params={
        'search': 'synthetic biology',
        'filter': 'publication_year:2020-2024'
    },
    max_results=10000
)

# Export to CSV
with open('papers.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Title', 'Year', 'Citations', 'DOI', 'OA Status'])
    for paper in all_papers:
        writer.writerow([
            paper.get('title', 'N/A'),
            paper.get('publication_year', 'N/A'),
            paper.get('cited_by_count', 0),
            paper.get('doi', 'N/A'),
            paper.get('open_access', {}).get('oa_status', 'closed')
        ])

Critical Best Practices

Always Use Email for Polite Pool

Provide an email address to join the polite pool and get a 10x higher rate limit (1 req/sec → 10 req/sec):

client = OpenAlexClient(email="your-email@example.edu")

Use Two-Step Pattern for Entity Lookups

Never filter by entity names directly - always get ID first:

# ✅ Correct
# 1. Search for entity → get ID
# 2. Filter by ID

# ❌ Wrong
# filter=author_name:Einstein  # This doesn't work!

Use Maximum Page Size

Always use per-page=200 for efficient data retrieval:

results = client.search_works(search="topic", per_page=200)

Batch Multiple IDs

Use batch_lookup() for multiple IDs instead of individual requests:

# ✅ Correct - 1 request for 50 DOIs
works = client.batch_lookup('works', doi_list, 'doi')

# ❌ Wrong - 50 separate requests
for doi in doi_list:
    work = client.get_entity('works', doi)

Use Sample Parameter for Random Data

Use sample_works() with seed for reproducible random sampling:

# ✅ Correct
works = client.sample_works(sample_size=100, seed=42)

# ❌ Wrong - random page numbers bias results
# Using random page numbers doesn't give true random sample

Select Only Needed Fields

Reduce response size by selecting specific fields:

results = client.search_works(
    search="topic",
    select=['id', 'title', 'publication_year', 'cited_by_count']
)

Common Filter Patterns

Date Ranges

# Single year
filter_params={"publication_year": "2023"}

# After year
filter_params={"publication_year": ">2020"}

# Range
filter_params={"publication_year": "2020-2024"}

Multiple Filters (AND)

# All conditions must match
filter_params={
    "publication_year": ">2020",
    "is_oa": "true",
    "cited_by_count": ">100"
}
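A filter_params dict like the one above presumably ends up in the API's single filter query parameter, where field:value pairs are separated by commas and a comma means AND. A minimal sketch of that mapping follows, assuming this is what the client does. The name build_filter is hypothetical, not a documented method.

```python
from urllib.parse import urlencode


def build_filter(filter_params):
    """Join field/value pairs into one filter string (comma = AND)."""
    return ",".join(f"{field}:{value}" for field, value in filter_params.items())


filter_value = build_filter({
    "publication_year": ">2020",
    "is_oa": "true",
    "cited_by_count": ">100",
})

print(filter_value)
# The value is URL-encoded when sent as a query parameter
print(urlencode({"filter": filter_value, "per-page": 200}))
```

Values are passed through unchanged, so operators such as >, !, | and + keep their meaning.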

Multiple Values (OR)

# Any institution matches
filter_params={
    "authorships.institutions.id": "I136199984|I27837315"  # MIT or Harvard
}

Collaboration (AND within attribute)

# Papers with authors from BOTH institutions
filter_params={
    "authorships.institutions.id": "I136199984+I27837315"  # MIT AND Harvard
}

Negation

# Exclude type
filter_params={
    "type": "!paratext"
}

Entity Types

OpenAlex provides these entity types:

  • works - Scholarly documents (articles, books, datasets)
  • authors - Researchers with disambiguated identities
  • institutions - Universities and research organizations
  • sources - Journals, repositories, conferences
  • topics - Subject classifications
  • publishers - Publishing organizations
  • funders - Funding agencies

Access any entity type using consistent patterns:

client.search_works(...)
client.get_entity('authors', author_id)
client.group_by('works', 'topics.id', filter_params={...})

External IDs

Use external identifiers directly:

# DOI for works
work = client.get_entity('works', 'https://doi.org/10.7717/peerj.4375')

# ORCID for authors
author = client.get_entity('authors', 'https://orcid.org/0000-0003-1613-5981')

# ROR for institutions
institution = client.get_entity('institutions', 'https://ror.org/02y3ad647')

# ISSN for sources
source = client.get_entity('sources', 'issn:0028-0836')

Reference Documentation

Detailed API Reference

See references/api_guide.md for:

  • Complete filter syntax
  • All available endpoints
  • Response structures
  • Error handling
  • Performance optimization
  • Rate limiting details

Common Query Examples

See references/common_queries.md for:

  • Complete working examples
  • Real-world use cases
  • Complex query patterns
  • Data export workflows
  • Multi-step analysis procedures

Scripts

openalex_client.py

Main API client with:

  • Automatic rate limiting
  • Exponential backoff retry logic
  • Pagination support
  • Batch operations
  • Error handling

Use for direct API access with full control.

query_helpers.py

High-level helper functions for common operations:

  • find_author_works() - Get papers by author
  • find_institution_works() - Get papers from institution
  • find_highly_cited_recent_papers() - Get influential papers
  • get_open_access_papers() - Find OA publications
  • get_publication_trends() - Analyze trends over time
  • analyze_research_output() - Comprehensive analysis

Use for common research queries with simplified interfaces.

Troubleshooting

Rate Limiting

If you encounter 403 errors:

  • Ensure an email address is included with every request
  • Verify that you are not exceeding 10 req/sec
  • Note that the client already retries automatically with exponential backoff
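The retry logic inside openalex_client.py is not reproduced here. As an illustration of exponential backoff in general, the sketch below doubles the wait after each rate-limited response up to a cap. The names backoff_delays and call_with_backoff are hypothetical, and treating both 403 and 429 (the standard HTTP "Too Many Requests" status) as retryable is an assumption.

```python
import time

RETRYABLE_STATUSES = {403, 429}  # assumption: statuses treated as rate limiting


def backoff_delays(max_retries=5, base=1.0, cap=30.0):
    """Return the wait in seconds before each retry: base, 2*base, 4*base, ... capped."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


def call_with_backoff(request, max_retries=5, base=1.0, cap=30.0, sleep=time.sleep):
    """Call request() until it returns a non-retryable status.

    request() must return a (status_code, payload) tuple. The last attempt's
    result is returned even if it is still rate limited.
    """
    for delay in backoff_delays(max_retries, base, cap):
        status, payload = request()
        if status not in RETRYABLE_STATUSES:
            return status, payload
        sleep(delay)
    return request()


print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Passing sleep as a parameter keeps the function testable without real waiting.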

Empty Results

If searches return no results:

  • Check filter syntax (see references/api_guide.md)
  • Use two-step pattern for entity lookups (don't filter by names)
  • Verify entity IDs are correct format

Timeout Errors

For large queries:

  • Use pagination with per-page=200
  • Use select= to limit returned fields
  • Break into smaller queries if needed

Rate Limits

  • Default: 1 request/second, 100k requests/day
  • Polite pool (with email): 10 requests/second, 100k requests/day

Always use the polite pool for production workflows by providing an email address to the client.
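The client is described as handling rate limiting automatically. For code that calls the API directly, a minimal sketch of client-side throttling is shown below, spacing requests so they stay under a chosen requests-per-second limit. The Throttle class is hypothetical and is not taken from openalex_client.py.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, requests_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Polite pool limit from the list above: 10 requests/second
throttle = Throttle(requests_per_second=10)
for _ in range(3):
    throttle.wait()  # call this immediately before each API request
```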

Notes

  • No authentication required
  • All data is open and free
  • Rate limits apply globally, not per IP
  • Use LitLLM with OpenRouter if LLM-based analysis is needed (don't use Perplexity API directly)
  • Client handles pagination, retries, and rate limiting automatically