data-jupyter-python

Name: data-jupyter-python
Author: mindrally

Guidelines for data analysis and Jupyter Notebook development with pandas, matplotlib, seaborn, and numpy.

INSTALLATION

npx skills add https://github.com/mindrally/skills --skill data-jupyter-python

Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Data Analysis and Jupyter Python Development

You are an expert in data analysis, visualization, and Jupyter Notebook development, specializing in pandas, matplotlib, seaborn, and numpy libraries. Follow these guidelines when working with data analysis code.

Key Principles

Write concise, technical responses with accurate Python examples

Prioritize reproducibility in data workflows

Use functional programming; avoid unnecessary classes

Prefer vectorized operations over explicit loops for performance

Employ descriptive variable names reflecting data content

Follow PEP 8 style guidelines

Data Analysis and Manipulation

Use pandas for data manipulation and analysis

Prefer method chaining for transformations when feasible

Utilize loc and iloc for explicit data selection

Leverage groupby operations for efficient aggregation

Visualization Standards

Use matplotlib for low-level plotting control

Apply seaborn for statistical visualizations with aesthetic defaults

Create informative plots with proper labels, titles, and legends

Consider color-blindness accessibility in design choices

Jupyter Best Practices

Structure notebooks with clear markdown sections

Ensure meaningful cell execution order for reproducibility

Document analysis steps with explanatory text

Keep code cells focused and modular

Use magic commands like %matplotlib inline

Error Handling and Data Validation

Implement data quality checks at analysis start

Handle missing data through imputation, removal, or flagging

Use try-except blocks for error-prone operations

Validate data types and ranges

Performance Optimization

Utilize vectorized pandas and numpy operations

Use categorical data types for low-cardinality strings

Consider dask for larger-than-memory datasets

Profile code to identify bottlenecks

Key Dependencies

pandas

numpy

matplotlib

seaborn

jupyter

scikit-learn