SKILL.md

$29

Visual Enhancement with Scientific Schematics

⚠️ MANDATORY: Every literature review MUST include at least 1-2 AI-generated figures using the scientific-schematics skill.

This is not optional. Literature reviews without visual elements are incomplete. Before finalizing any document:

Generate at minimum ONE schematic or diagram (e.g., PRISMA flow diagram for systematic reviews)

Prefer 2-3 figures for comprehensive reviews (search strategy flowchart, thematic synthesis diagram, conceptual framework)

How to generate figures:

Use the scientific-schematics skill to generate AI-powered publication-quality diagrams

Simply describe your desired diagram in natural language

Nano Banana Pro will automatically generate, review, and refine the schematic

How to generate schematics:

python scripts/generate_schematic.py "your diagram description" -o figures/output.png

The AI will automatically:

Create publication-quality images with proper formatting

Review and refine through multiple iterations

Ensure accessibility (colorblind-friendly, high contrast)

Save outputs in the figures/ directory

When to add schematics:

PRISMA flow diagrams for systematic reviews

Literature search strategy flowcharts

Thematic synthesis diagrams

Research gap visualization maps

Citation network diagrams

Conceptual framework illustrations

Any complex concept that benefits from visualization

For detailed guidance on creating schematics, refer to the scientific-schematics skill documentation.

Core Workflow

Literature reviews follow a structured, multi-phase workflow:

Phase 1: Planning and Scoping

Define Research Question: Use PICO framework (Population, Intervention, Comparison, Outcome) for clinical/biomedical reviews

Example: "What is the efficacy of CRISPR-Cas9 (I) for treating sickle cell disease (P) compared to standard care (C)?"

Establish Scope and Objectives:

Define clear, specific research questions

Determine review type (narrative, systematic, scoping, meta-analysis)

Set boundaries (time period, geographic scope, study types)

Develop Search Strategy:

Identify 2-4 main concepts from research question

List synonyms, abbreviations, and related terms for each concept

Plan Boolean operators (AND, OR, NOT) to combine terms

Select minimum 3 complementary databases

**Use the parallel-web skill (parallel-cli search) for initial scoping** to quickly gauge the landscape before formal database searches

Set Inclusion/Exclusion Criteria:

Date range (e.g., last 10 years: 2015-2024)

Language (typically English, or specify multilingual)

Publication types (peer-reviewed, preprints, reviews)

Study designs (RCTs, observational, in vitro, etc.)

Document all criteria clearly

Phase 2: Systematic Literature Search

Multi-Database Search:

Select databases appropriate for the domain. Always start with parallel-web for broad academic coverage, then supplement with domain-specific databases.

Web-Based Academic Search (parallel-web skill — START HERE):

Use parallel-cli search with academic domain filtering for broad scholarly coverage

Run two searches: academic-focused + general to catch all relevant sources

# Academic-focused search across scholarly sources

parallel-cli search "your research topic" -q "keyword1" -q "keyword2" \

  --json --max-results 10 --excerpt-max-chars-total 27000 \

  --include-domains "scholar.google.com,arxiv.org,pubmed.ncbi.nlm.nih.gov,semanticscholar.org,biorxiv.org,medrxiv.org,ncbi.nlm.nih.gov,nature.com,science.org,ieee.org,acm.org,springer.com,wiley.com,cell.com,pnas.org,nih.gov" \

  -o sources/litreview_<topic>-academic.json

# General search for supplementary sources

parallel-cli search "your research topic" -q "keyword1" -q "keyword2" \

  --json --max-results 10 --excerpt-max-chars-total 27000 \

  -o sources/litreview_<topic>-general.json

Use parallel-cli extract to fetch full content from specific paper URLs or PDFs found in search results

parallel-cli extract "https://arxiv.org/abs/XXXX.XXXXX" --json

Biomedical & Life Sciences:

Use gget skill: gget search pubmed "search terms" for PubMed/PMC

Use gget skill: gget search biorxiv "search terms" for preprints

Use bioservices skill for ChEMBL, KEGG, UniProt, etc.

General Scientific Literature:

Search arXiv via direct API (preprints in physics, math, CS, q-bio)

Search Semantic Scholar via API (200M+ papers, cross-disciplinary)

Use Google Scholar for comprehensive coverage (manual or careful scraping)

Specialized Databases:

Use gget alphafold for protein structures

Use gget cosmic for cancer genomics

Use datacommons-client for demographic/statistical data

Use specialized databases as appropriate for the domain

Document Search Parameters:

## Search Strategy

### Database: PubMed

- **Date searched**: 2024-10-25

- **Date range**: 2015-01-01 to 2024-10-25

- **Search string**:

("CRISPR"[Title] OR "Cas9"[Title])

AND ("sickle cell"[MeSH] OR "SCD"[Title/Abstract])

AND 2015:2024[Publication Date]

- **Results**: 247 articles

Repeat for each database searched.

Export and Aggregate Results:

Export results in JSON format from each database

Combine all results into a single file

Use scripts/search_databases.py for post-processing:

python search_databases.py combined_results.json \

  --deduplicate \

  --format markdown \

  --output aggregated_results.md

Phase 3: Screening and Selection

Deduplication:

python search_databases.py results.json --deduplicate --output unique_results.json

Removes duplicates by DOI (primary) or title (fallback)

Document number of duplicates removed

Title Screening:

Review all titles against inclusion/exclusion criteria

Exclude obviously irrelevant studies

Document number excluded at this stage

Abstract Screening:

Read abstracts of remaining studies

Apply inclusion/exclusion criteria rigorously

Document reasons for exclusion

Full-Text Screening:

Obtain full texts of remaining studies

Conduct detailed review against all criteria

Document specific reasons for exclusion

Record final number of included studies

Create PRISMA Flow Diagram:

Initial search: n = X

├─ After deduplication: n = Y

├─ After title screening: n = Z

├─ After abstract screening: n = A

└─ Included in review: n = B

Phase 4: Data Extraction and Quality Assessment

Extract Key Data from each included study:

Study metadata (authors, year, journal, DOI)

Study design and methods

Sample size and population characteristics

Key findings and results

Limitations noted by authors

Funding sources and conflicts of interest

Assess Study Quality:

For RCTs: Use Cochrane Risk of Bias tool

For observational studies: Use Newcastle-Ottawa Scale

For systematic reviews: Use AMSTAR 2

Rate each study: High, Moderate, Low, or Very Low quality

Consider excluding very low-quality studies

Organize by Themes:

Identify 3-5 major themes across studies

Group studies by theme (studies may appear in multiple themes)

Note patterns, consensus, and controversies

Phase 5: Synthesis and Analysis

Create Review Document from template:

cp assets/review_template.md my_literature_review.md

Write Thematic Synthesis (NOT study-by-study summaries):

Organize Results section by themes or research questions

Synthesize findings across multiple studies within each theme

Compare and contrast different approaches and results

Identify consensus areas and points of controversy

Highlight the strongest evidence

Example structure:

#### 3.3.1 Theme: CRISPR Delivery Methods

Multiple delivery approaches have been investigated for therapeutic

gene editing. Viral vectors (AAV) were used in 15 studies^1-15^ and

showed high transduction efficiency (65-85%) but raised immunogenicity

concerns^3,7,12^. In contrast, lipid nanoparticles demonstrated lower

efficiency (40-60%) but improved safety profiles^16-23^.

Critical Analysis:

Evaluate methodological strengths and limitations across studies

Assess quality and consistency of evidence

Identify knowledge gaps and methodological gaps

Note areas requiring future research

Write Discussion:

Interpret findings in broader context

Discuss clinical, practical, or research implications

Acknowledge limitations of the review itself

Compare with previous reviews if applicable

Propose specific future research directions

Phase 6: Citation Verification

CRITICAL: All citations must be verified for accuracy before final submission.

Verify All DOIs:

python scripts/verify_citations.py my_literature_review.md

This script:

Extracts all DOIs from the document

Verifies each DOI resolves correctly

Retrieves metadata from CrossRef

Generates verification report

Outputs properly formatted citations

Review Verification Report:

Check for any failed DOIs

Verify author names, titles, and publication details match

Correct any errors in the original document

Re-run verification until all citations pass

Format Citations Consistently:

Choose one citation style and use throughout (see references/citation_styles.md)

Common styles: APA, Nature, Vancouver, Chicago, IEEE

Use verification script output to format citations correctly

Ensure in-text citations match reference list format

Phase 7: Document Generation

Generate PDF:

python scripts/generate_pdf.py my_literature_review.md \

  --citation-style apa \

  --output my_review.pdf

Options:

--citation-style: apa, nature, chicago, vancouver, ieee

--no-toc: Disable table of contents

--no-numbers: Disable section numbering

--check-deps: Check if pandoc/xelatex are installed

Review Final Output:

Check PDF formatting and layout

Verify all sections are present

Ensure citations render correctly

Check that figures/tables appear properly

Verify table of contents is accurate

Quality Checklist:

All DOIs verified with verify_citations.py

Citations formatted consistently

PRISMA flow diagram included (for systematic reviews)

Search methodology fully documented

Inclusion/exclusion criteria clearly stated

Results organized thematically (not study-by-study)

Quality assessment completed

Limitations acknowledged

References complete and accurate

PDF generates without errors

Database-Specific Search Guidance

PubMed / PubMed Central

Access via gget skill:

# Search PubMed

gget search pubmed "CRISPR gene editing" -l 100

# Search with filters

# Use PubMed Advanced Search Builder to construct complex queries

# Then execute via gget or direct Entrez API

Search tips:

Use MeSH terms: "sickle cell disease"[MeSH]

Field tags: [Title], [Title/Abstract], [Author]

Date filters: 2020:2024[Publication Date]

Boolean operators: AND, OR, NOT

See MeSH browser: https://meshb.nlm.nih.gov/search

bioRxiv / medRxiv

Access via gget skill:

gget search biorxiv "CRISPR sickle cell" -l 50

Important considerations:

Preprints are not peer-reviewed

Verify findings with caution

Check if preprint has been published (CrossRef)

Note preprint version and date

arXiv

Access via direct API or WebFetch:

# Example search categories:

# q-bio.QM (Quantitative Methods)

# q-bio.GN (Genomics)

# q-bio.MN (Molecular Networks)

# cs.LG (Machine Learning)

# stat.ML (Machine Learning Statistics)

# Search format: category AND terms

search_query = "cat:q-bio.QM AND ti:\"single cell sequencing\""

Semantic Scholar

Access via direct API (requires API key, or use free tier):

200M+ papers across all fields

Excellent for cross-disciplinary searches

Provides citation graphs and paper recommendations

Use for finding highly influential papers

Specialized Biomedical Databases

Use appropriate skills:

ChEMBL: bioservices skill for chemical bioactivity

UniProt: gget or bioservices skill for protein information

KEGG: bioservices skill for pathways and genes

COSMIC: gget skill for cancer mutations

AlphaFold: gget alphafold for protein structures

PDB: gget or direct API for experimental structures

Citation Chaining

Expand search via citation networks:

Forward citations (papers citing key papers):

Use parallel-cli search to find papers citing a specific work:

parallel-cli search "papers citing [Author et al. Year] [paper title]" \

  -q "citing" -q "[key author]" \

  --json --max-results 10 --excerpt-max-chars-total 27000 \

  --include-domains "scholar.google.com,semanticscholar.org,arxiv.org,pubmed.ncbi.nlm.nih.gov" \

  -o sources/litreview_forward_citations.json

Use Google Scholar "Cited by"

Use Semantic Scholar or OpenAlex APIs

Identifies newer research building on seminal work

Backward citations (references from key papers):

Use parallel-cli extract to fetch full text of key papers and extract their reference lists:

parallel-cli extract "https://doi.org/10.xxxx/yyyy" --json

Extract references from included papers

Identify highly cited foundational work

Find papers cited by multiple included studies

Citation Style Guide

Detailed formatting guidelines are in references/citation_styles.md. Quick reference:

APA (7th Edition)

In-text: (Smith et al., 2023)

Reference: Smith, J. D., Johnson, M. L., & Williams, K. R. (2023). Title. Journal, 22(4), 301-318. https://doi.org/10.xxx/yyy

Nature

In-text: Superscript numbers^1,2^

Reference: Smith, J. D., Johnson, M. L. & Williams, K. R. Title. Nat. Rev. Drug Discov. 22, 301-318 (2023).

Vancouver

In-text: Superscript numbers^1,2^

Reference: Smith JD, Johnson ML, Williams KR. Title. Nat Rev Drug Discov. 2023;22(4):301-18.

Always verify citations with verify_citations.py before finalizing.

Prioritizing High-Impact Papers (CRITICAL)

Always prioritize influential, highly-cited papers from reputable authors and top venues. Quality matters more than quantity in literature reviews.

#### Citation Count Thresholds

Use citation counts to identify the most impactful papers:

Paper Age

Citation Threshold

Classification

0-3 years

20+ citations

Noteworthy

0-3 years

100+ citations

Highly Influential

3-7 years

100+ citations

Significant

3-7 years

500+ citations

Landmark Paper

7+ years

500+ citations

Seminal Work

7+ years

1000+ citations

Foundational

#### Journal and Venue Tiers

Prioritize papers from higher-tier venues:

Tier 1 (Always Prefer): Nature, Science, Cell, NEJM, Lancet, JAMA, PNAS, Nature Medicine, Nature Biotechnology

Tier 2 (Strong Preference): High-impact specialized journals (IF>10), top conferences (NeurIPS, ICML for ML/AI)

Tier 3 (Include When Relevant): Respected specialized journals (IF 5-10)

Tier 4 (Use Sparingly): Lower-impact peer-reviewed venues

#### Author Reputation Assessment

Prefer papers from:

Senior researchers with high h-index (>40 in established fields)

Leading research groups at recognized institutions (Harvard, Stanford, MIT, Oxford, etc.)

Authors with multiple Tier-1 publications in the relevant field

Researchers with recognized expertise (awards, editorial positions, society fellows)

#### Identifying Seminal Papers

For any topic, identify foundational work by:

High citation count (typically 500+ for papers 5+ years old)

Frequently cited by other included studies (appears in many reference lists)

Published in Tier-1 venues (Nature, Science, Cell family)

Written by field pioneers (often cited as establishing concepts)

Best Practices

Search Strategy

Start with parallel-web: Use parallel-cli search with academic domains for initial broad coverage before querying specialized databases

Use multiple databases (minimum 3): Ensures comprehensive coverage — parallel-web counts as one source

Include preprint servers: Captures latest unpublished findings

Document everything: Search strings, dates, result counts for reproducibility — save all parallel-cli output to sources/

Test and refine: Run pilot searches, review results, adjust search terms

Sort by citations: When available, sort search results by citation count to surface influential work first

Use parallel-cli extract: Fetch full content from promising URLs found during search to verify relevance before full-text screening

Screening and Selection

Use multiple databases (minimum 3): Ensures comprehensive coverage

Include preprint servers: Captures latest unpublished findings

Document everything: Search strings, dates, result counts for reproducibility

Test and refine: Run pilot searches, review results, adjust search terms

Screening and Selection

Use clear criteria: Document inclusion/exclusion criteria before screening

Screen systematically: Title → Abstract → Full text

Document exclusions: Record reasons for excluding studies

Consider dual screening: For systematic reviews, have two reviewers screen independently

Synthesis

Organize thematically: Group by themes, NOT by individual studies

Synthesize across studies: Compare, contrast, identify patterns

Be critical: Evaluate quality and consistency of evidence

Identify gaps: Note what's missing or understudied

Quality and Reproducibility

Assess study quality: Use appropriate quality assessment tools

Verify all citations: Run verify_citations.py script

Document methodology: Provide enough detail for others to reproduce

Follow guidelines: Use PRISMA for systematic reviews

Writing

Be objective: Present evidence fairly, acknowledge limitations

Be systematic: Follow structured template

Be specific: Include numbers, statistics, effect sizes where available

Be clear: Use clear headings, logical flow, thematic organization

Common Pitfalls to Avoid

Single database search: Misses relevant papers; always search multiple databases

No search documentation: Makes review irreproducible; document all searches

Study-by-study summary: Lacks synthesis; organize thematically instead

Unverified citations: Leads to errors; always run verify_citations.py

Too broad search: Yields thousands of irrelevant results; refine with specific terms

Too narrow search: Misses relevant papers; include synonyms and related terms

Ignoring preprints: Misses latest findings; include bioRxiv, medRxiv, arXiv

No quality assessment: Treats all evidence equally; assess and report quality

Publication bias: Only positive results published; note potential bias

Outdated search: Field evolves rapidly; clearly state search date

Example Workflow

Complete workflow for a biomedical literature review:

# 1. Create review document from template

cp assets/review_template.md crispr_sickle_cell_review.md

# 2. Start with parallel-web for broad academic search

parallel-cli search "CRISPR Cas9 sickle cell disease gene therapy efficacy" \

  -q "CRISPR" -q "sickle cell" -q "gene therapy" \

  --json --max-results 10 --excerpt-max-chars-total 27000 \

  --include-domains "scholar.google.com,arxiv.org,pubmed.ncbi.nlm.nih.gov,semanticscholar.org,biorxiv.org,nature.com,science.org,cell.com,pnas.org,nih.gov" \

  -o sources/litreview_crispr_scd-academic.json

parallel-cli search "CRISPR sickle cell disease clinical trials treatment" \

  -q "CRISPR" -q "sickle cell" \

  --json --max-results 10 --excerpt-max-chars-total 27000 \

  -o sources/litreview_crispr_scd-general.json

# 3. Search specialized databases using appropriate skills

# - Use gget skill for PubMed, bioRxiv

# - Use direct API access for arXiv, Semantic Scholar

# - Export results in JSON format

# 4. Aggregate and process results (combine parallel-cli + database results)

python scripts/search_databases.py combined_results.json \

  --deduplicate \

  --rank citations \

  --year-start 2015 \

  --year-end 2024 \

  --format markdown \

  --output search_results.md \

  --summary

# 5. Screen results and extract data

# - Use parallel-cli extract to fetch full content from promising URLs

# - Manually screen titles, abstracts, full texts

# - Extract key data into the review document

# - Organize by themes

# 6. Write the review following template structure

# - Introduction with clear objectives

# - Detailed methodology section

# - Results organized thematically

# - Critical discussion

# - Clear conclusions

# 7. Verify all citations

python scripts/verify_citations.py crispr_sickle_cell_review.md

# Review the citation report

cat crispr_sickle_cell_review_citation_report.json

# Fix any failed citations and re-verify

python scripts/verify_citations.py crispr_sickle_cell_review.md

# 8. Generate professional PDF

python scripts/generate_pdf.py crispr_sickle_cell_review.md \

  --citation-style nature \

  --output crispr_sickle_cell_review.pdf

# 9. Review final PDF and markdown outputs

Integration with Other Skills

This skill works seamlessly with other scientific skills:

Web Search & Extraction (parallel-web skill — PRIMARY)

parallel-cli search: Broad academic and general web search with domain filtering — use for initial scoping, finding papers, citation chaining, and supplementary searches

parallel-cli extract: Fetch full content from paper URLs, journal websites, and preprint servers — use for reading abstracts, extracting reference lists, and verifying paper details

parallel-cli search --include-domains: Academic-focused search across scholarly domains (arxiv.org, pubmed, nature.com, etc.)

Database Access Skills

gget: PubMed, bioRxiv, COSMIC, AlphaFold, Ensembl, UniProt

bioservices: ChEMBL, KEGG, Reactome, UniProt, PubChem

datacommons-client: Demographics, economics, health statistics

Analysis Skills

pydeseq2: RNA-seq differential expression (for methods sections)

scanpy: Single-cell analysis (for methods sections)

anndata: Single-cell data (for methods sections)

biopython: Sequence analysis (for background sections)

Visualization Skills

matplotlib: Generate figures and plots for review

seaborn: Statistical visualizations

Writing Skills

brand-guidelines: Apply institutional branding to PDF

internal-comms: Adapt review for different audiences

Resources

Bundled Resources

Scripts:

scripts/verify_citations.py: Verify DOIs and generate formatted citations

scripts/generate_pdf.py: Convert markdown to professional PDF

scripts/search_databases.py: Process, deduplicate, and format search results

References:

references/citation_styles.md: Detailed citation formatting guide (APA, Nature, Vancouver, Chicago, IEEE)

references/database_strategies.md: Comprehensive database search strategies

Assets:

assets/review_template.md: Complete literature review template with all sections

External Resources

Guidelines:

PRISMA (Systematic Reviews): http://www.prisma-statement.org/

Cochrane Handbook: https://training.cochrane.org/handbook

AMSTAR 2 (Review Quality): https://amstar.ca/

Tools:

MeSH Browser: https://meshb.nlm.nih.gov/search

PubMed Advanced Search: https://pubmed.ncbi.nlm.nih.gov/advanced/

Boolean Search Guide: https://www.ncbi.nlm.nih.gov/books/NBK3827/

Citation Styles:

APA Style: https://apastyle.apa.org/

Nature Portfolio: https://www.nature.com/nature-portfolio/editorial-policies/reporting-standards

NLM/Vancouver: https://www.nlm.nih.gov/bsd/uniform_requirements.html

Dependencies

Required CLI Tools

# parallel-cli (PRIMARY — for web search and URL extraction)

curl -fsSL https://parallel.ai/install.sh | bash

# Or: uv tool install "parallel-web-tools[cli]"

# Authenticate: parallel-cli auth

Required Python Packages

pip install requests  # For citation verification

Required System Tools

# For PDF generation

brew install pandoc  # macOS

apt-get install pandoc  # Linux

# For LaTeX (PDF generation)

brew install --cask mactex  # macOS

apt-get install texlive-xetex  # Linux

Check dependencies:

python scripts/generate_pdf.py --check-deps

Summary

This literature-review skill provides:

Systematic methodology following academic best practices

Parallel-web powered search using parallel-cli search for fast, broad academic literature discovery with scholarly domain filtering

Multi-database integration via existing scientific skills (gget, bioservices, datacommons-client)

Citation verification ensuring accuracy and credibility

Professional output in markdown and PDF formats

Comprehensive guidance covering the entire review process

Quality assurance with verification and validation tools

Reproducibility through detailed documentation requirements

Conduct thorough, rigorous literature reviews that meet academic standards and provide comprehensive synthesis of current knowledge in any domain.

literature-review

SKILL.md

Visual Enhancement with Scientific Schematics

Core Workflow

Phase 1: Planning and Scoping

Phase 2: Systematic Literature Search

Phase 3: Screening and Selection

Phase 4: Data Extraction and Quality Assessment

Phase 5: Synthesis and Analysis

Phase 6: Citation Verification

Phase 7: Document Generation

Database-Specific Search Guidance

PubMed / PubMed Central

bioRxiv / medRxiv

arXiv

Semantic Scholar

Specialized Biomedical Databases

Citation Chaining

Citation Style Guide

APA (7th Edition)

Nature

Vancouver

Prioritizing High-Impact Papers (CRITICAL)

Best Practices

Search Strategy

Screening and Selection

Screening and Selection

Synthesis

Quality and Reproducibility

Writing

Common Pitfalls to Avoid

Example Workflow

Integration with Other Skills

Web Search &#x26; Extraction (parallel-web skill — PRIMARY)

Database Access Skills

Analysis Skills

Visualization Skills

Writing Skills

Resources

Bundled Resources

External Resources

Dependencies

Required CLI Tools

Required Python Packages

Required System Tools

Summary

Let your agent run on any real-world website

Related skills

Stop writing automation&scrapers

Web Search & Extraction (parallel-web skill — PRIMARY)