tooluniverse-image-analysis

Production-ready microscopy image analysis and quantitative imaging data skill for colony morphometry, cell counting, fluorescence quantification, and…

INSTALLATION
npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-image-analysis
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Microscopy Image Analysis and Quantitative Imaging Data

Production-ready skill for analyzing microscopy-derived measurement data using pandas, numpy, scipy, statsmodels, and scikit-image.

LOOK UP, DON'T GUESS

When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory.

When to Use

  • Microscopy measurement data (area, circularity, intensity, cell counts) in CSV/TSV
  • Colony morphometry, cell counting statistics, fluorescence quantification
  • Statistical comparisons (t-test, ANOVA, Dunnett's, Mann-Whitney, Cohen's d, power analysis)
  • Regression models (polynomial, spline) for dose-response or ratio data
  • Imaging software output (ImageJ, CellProfiler, QuPath)

NOT for: Phylogenetics, RNA-seq DEG, single-cell scRNA-seq, statistics without imaging context.

Core Principles

  • Data-first - Load and inspect all CSV/TSV before analysis
  • Question-driven - Parse the exact statistic requested
  • Statistical rigor - Effect sizes, multiple comparison corrections, model selection
  • Imaging-aware - Understand ImageJ/CellProfiler columns (Area, Circularity, Round, Intensity)
  • Precision - Match expected answer format (integer, range, decimal places)

Required Packages

import numpy as np
import pandas as pd
from scipy import stats
from scipy.interpolate import BSpline, make_interp_spline
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.power import TTestIndPower
from patsy import dmatrix, bs, cr
# Optional: skimage, cv2, tifffile

Workflow Decision Tree

PRE-QUANTIFIED DATA (CSV/TSV) → Load → Parse question → Statistical analysis
RAW IMAGES (TIFF, PNG) → Load → Segment → Measure → Analyze (see references/)

Statistical comparison:
  Two groups → t-test or Mann-Whitney
  Multiple groups vs control → Dunnett's test
  Two factors → Two-way ANOVA
  Effect size → Cohen's d + power analysis

Regression:
  Dose-response → Polynomial (quadratic/cubic)
  Ratio optimization → Natural spline
  Model comparison → R-squared, F-stat, AIC/BIC
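The regression branch of the tree can be sketched as follows. This is a minimal illustration, not the skill's reference implementation: the data are synthetic, and the column names `ratio` and `freq` are assumptions made for the example. It fits quadratic, cubic, and natural-spline models with statsmodels formulas, compares them by R-squared, AIC, and BIC, and then grid-searches the spline fit for its peak.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic dose-response data (hypothetical): a peak near ratio = 5 plus noise
rng = np.random.default_rng(0)
ratio = np.repeat(np.arange(1.0, 11.0), 3)
freq = 30.0 - (ratio - 5.0) ** 2 + rng.normal(0.0, 1.0, ratio.size)
data = pd.DataFrame({"ratio": ratio, "freq": freq})

# Candidate models; cr() is patsy's natural cubic regression spline
formulas = {
    "quadratic": "freq ~ ratio + I(ratio**2)",
    "cubic": "freq ~ ratio + I(ratio**2) + I(ratio**3)",
    "natural_spline": "freq ~ cr(ratio, df=4)",
}
fits = {name: smf.ols(formula, data=data).fit() for name, formula in formulas.items()}

# Model comparison: higher R-squared is better, lower AIC/BIC is better
for name, fit in fits.items():
    print(f"{name}: R2={fit.rsquared:.3f} F={fit.fvalue:.1f} "
          f"AIC={fit.aic:.1f} BIC={fit.bic:.1f}")

# Grid search for the ratio that maximizes the spline prediction
grid = pd.DataFrame({"ratio": np.linspace(ratio.min(), ratio.max(), 901)})
pred = np.asarray(fits["natural_spline"].predict(grid))
peak_ratio = float(grid["ratio"].to_numpy()[pred.argmax()])
print(f"Spline peak at ratio ≈ {peak_ratio:.2f}")
```

The grid is restricted to the observed range because a natural spline extrapolates linearly beyond its boundary knots.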

Analysis Workflow

Phase 0: Question Parsing and Data Discovery

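import glob
import os
import pandas as pd

# Recursively collect every CSV under the working directory (sorted for a stable order)
csv_files = sorted(glob.glob(os.path.join(".", "**", "*.csv"), recursive=True))
if not csv_files:
    raise FileNotFoundError("No CSV files found under the current directory")
# Inspect the first file; repeat for the others before starting any analysis
df = pd.read_csv(csv_files[0])
print(f"Shape: {df.shape}, Columns: {list(df.columns)}")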

Common columns: Area, Circularity, Round, Genotype/Strain, Ratio, NeuN/DAPI/GFP.

Phase 1-3: Grouped Stats → Statistical Testing → Regression

See references/statistical_analysis.md for complete implementations of grouped_summary, Dunnett's, Cohen's d, power analysis, polynomial/spline regression.
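As a rough idea of what the grouped-statistics phase looks like, here is a small sketch. The `grouped_summary` below is a hypothetical stand-in written for this example (the real one lives in the referenced file and may differ), and the measurement values are invented.

```python
import pandas as pd

def grouped_summary(df, group_col, value_cols):
    """Count, mean, and sample SD of each value column per group (illustrative helper)."""
    return df.groupby(group_col)[value_cols].agg(["count", "mean", "std"])

# Invented ImageJ-style measurements
df = pd.DataFrame({
    "Genotype": ["WT", "WT", "mutA", "mutA", "mutB", "mutB"],
    "Area": [100.0, 120.0, 300.0, 340.0, 200.0, 220.0],
    "Circularity": [0.90, 0.88, 0.60, 0.64, 0.75, 0.79],
})

summary = grouped_summary(df, "Genotype", ["Area", "Circularity"])
print(summary)

# "Mean circularity of the genotype with the largest mean area"
top_genotype = summary[("Area", "mean")].idxmax()
top_circularity = summary.loc[top_genotype, ("Circularity", "mean")]
print(top_genotype, round(top_circularity, 3))
```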

Common BixBench Patterns

| Pattern | Example Question | Workflow |
| --- | --- | --- |
| Colony Morphometry (bix-18) | "Mean circularity of genotype with largest area?" | Group by Genotype → max mean Area → report Circularity |
| Cell Counting (bix-19) | "Cohen's d for NeuN counts?" | Filter → split by Condition → pooled SD → Cohen's d |
| Multi-Group (bix-41) | "How many ratios equivalent to control?" | Dunnett's for Area AND Circularity → count non-significant in BOTH |
| Regression (bix-54) | "Peak frequency from natural spline?" | Ratio → frequency → spline(df=4) → grid search peak → CI |
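The cell-counting pattern (pooled SD → Cohen's d) and the follow-up power analysis might look like the sketch below. The helper names `cohens_d` and `required_n_per_group` and the toy counts are assumptions for illustration; only `TTestIndPower.solve_power` is a statsmodels call.

```python
import math

import numpy as np
from statsmodels.stats.power import TTestIndPower

def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation (ddof=1)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / math.sqrt(pooled_var)

def required_n_per_group(d, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample t-test, rounded up."""
    n = TTestIndPower().solve_power(effect_size=abs(d), alpha=alpha, power=power)
    return math.ceil(n)

# Toy counts for two conditions (invented values)
d = cohens_d([2, 4, 6], [1, 2, 3])
print(f"Cohen's d = {d:.3f}")
print("n per group for d = 0.5:", required_n_per_group(0.5))
```

The sign of d follows the order of the arguments, so decide which condition is the reference before reporting it.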

Raw Image Processing

from scripts.segment_cells import count_cells_in_image
result = count_cells_in_image(image_path="cells.tif", channel=0, min_area=50)

Segmentation: Nuclei → Otsu+watershed; Colonies → Otsu; Phase contrast → adaptive threshold.

See references/segmentation.md, references/cell_counting.md, references/image_processing.md.
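For orientation, the Otsu+watershed route for nuclei can be sketched with scikit-image and SciPy as below. This is not the code in `scripts/segment_cells.py`; the function name `count_nuclei`, the `min_distance` parameter, and the synthetic test image are assumptions made for the example.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.filters import threshold_otsu
from skimage.measure import regionprops
from skimage.segmentation import watershed

def count_nuclei(image, min_area=50, min_distance=5):
    """Count bright objects via Otsu threshold plus distance-transform watershed."""
    mask = image > threshold_otsu(image)            # foreground/background split
    distance = ndi.distance_transform_edt(mask)     # distance to nearest background pixel
    peaks = peak_local_max(distance, min_distance=min_distance)  # one seed per object
    markers = np.zeros(mask.shape, dtype=np.int32)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    labels = watershed(-distance, markers, mask=mask)  # split touching objects
    # Discard fragments smaller than min_area pixels
    return int(sum(region.area >= min_area for region in regionprops(labels)))

# Synthetic image: two bright discs on a dark background
yy, xx = np.mgrid[0:80, 0:120]
image = np.zeros((80, 120), dtype=np.uint8)
for cy, cx in [(40, 30), (40, 90)]:
    image[(yy - cy) ** 2 + (xx - cx) ** 2 <= 15 ** 2] = 200

n_objects = count_nuclei(image, min_area=50)
print("Objects found:", n_objects)
```

Real micrographs usually need smoothing before thresholding, and `min_distance` should be tuned to the expected nucleus radius.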

R-to-Python Equivalents

  • R Dunnett (multcomp::glht) → scipy.stats.dunnett() (scipy >= 1.11)
  • R natural spline (ns(x, df=4)) → patsy.cr(x, knots=...) with explicit quantile knots
  • R t.test() → scipy.stats.ttest_ind()
  • R aov() → statsmodels.formula.api.ols() + sm.stats.anova_lm()

Answer Formatting

  • "to the nearest thousand": int(round(val, -3))
  • Cohen's d: 3 decimal places
  • Sample sizes: integer (ceiling)
  • Ratios: string "5:1"
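These formatting rules translate into a few one-line helpers. The function names are hypothetical, chosen only for this sketch.

```python
import math

def to_nearest_thousand(val):
    """Round to the nearest thousand and return an integer."""
    return int(round(val, -3))

def format_cohens_d(d):
    """Cohen's d as a string with three decimal places."""
    return f"{d:.3f}"

def sample_size(n):
    """Sample sizes are whole subjects, so always round up."""
    return math.ceil(n)

def format_ratio(a, b):
    """Ratios are reported as strings such as '5:1'."""
    return f"{a}:{b}"

print(to_nearest_thousand(12499.6), format_cohens_d(1.26491),
      sample_size(63.77), format_ratio(5, 1))
```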

Evidence Grading

| Grade | Criteria |
| --- | --- |
| Strong | p < 0.001, d > 0.8, N >= 30/group |
| Moderate | p < 0.05, 0.5 <= d < 0.8 |
| Weak | p < 0.05, d < 0.5 or low N |
| Insufficient | p >= 0.05 or N < 5/group |

Circularity near 1.0 = round/healthy; < 0.5 = irregular. Post-hoc power < 0.80 = underpowered.
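The grading criteria can be encoded as a small function. The criteria leave some cases open, so this sketch makes two labeled assumptions: "low N" is taken to mean fewer than `low_n` samples per group (default 10, an arbitrary choice), and a significant result with d >= 0.5 that misses any Strong criterion is graded Moderate.

```python
def grade_evidence(p, d, n_per_group, low_n=10):
    """Map a p-value, Cohen's d, and per-group N to an evidence grade.

    Assumptions (not stated in the criteria): low_n defines "low N", and
    large effects that fail a Strong criterion fall back to Moderate.
    """
    d = abs(d)  # the grade depends on effect magnitude, not direction
    if p >= 0.05 or n_per_group < 5:
        return "Insufficient"
    if p < 0.001 and d > 0.8 and n_per_group >= 30:
        return "Strong"
    if d < 0.5 or n_per_group < low_n:
        return "Weak"
    return "Moderate"

for args in [(0.0005, 1.0, 40), (0.01, 0.6, 40), (0.01, 0.3, 40), (0.2, 1.0, 40)]:
    print(args, "→", grade_evidence(*args))
```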

References

Scripts: segment_cells.py, measure_fluorescence.py, batch_process.py, colony_morphometry.py, statistical_comparison.py

Docs: statistical_analysis.md, cell_counting.md, segmentation.md, fluorescence_analysis.md, image_processing.md
