ml-pipeline

Production-grade ML pipeline infrastructure with experiment tracking, orchestration, feature stores, and automated model lifecycle management.

  • Covers end-to-end pipeline design: data validation, feature engineering, distributed training orchestration, experiment tracking, and model evaluation gates
  • Supports multiple orchestration frameworks (Kubeflow, Airflow, Prefect) and experiment tracking systems (MLflow, Weights & Biases) with code templates and reference guides
  • Enforces reproducibility through versioning (DVC, Git tags, model registry), pinned dependencies, logged hyperparameters, and containerized environments
  • Includes data validation checkpoints, hyperparameter tuning configuration, A/B testing patterns, and deployment strategies with rollback support

INSTALLATION
npx skills add https://github.com/jeffallan/claude-skills --skill ml-pipeline
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

| Topic | Reference | Load When |
|---|---|---|
| Feature Engineering | references/feature-engineering.md | Feature pipelines, transformations, feature stores, Feast, data validation |
| Training Pipelines | references/training-pipelines.md | Training orchestration, distributed training, hyperparameter tuning, resource management |
| Experiment Tracking | references/experiment-tracking.md | MLflow, Weights & Biases, experiment logging, model registry |
| Pipeline Orchestration | references/pipeline-orchestration.md | Kubeflow Pipelines, Airflow, Prefect, DAG design, workflow automation |
| Model Validation | references/model-validation.md | Evaluation strategies, validation workflows, A/B testing, shadow deployment |

Code Templates

MLflow Experiment Logging (minimal reproducible example)

import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Pin random state for reproducibility
SEED = 42
np.random.seed(SEED)

# Placeholder synthetic data so the example runs end to end; swap in your own dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=SEED)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=SEED, stratify=y)

mlflow.set_experiment("my-classifier-experiment")

with mlflow.start_run():
    # Log all hyperparameters; never hardcode silently
    params = {"n_estimators": 100, "max_depth": 5, "random_state": SEED}
    mlflow.log_params(params)

    model = RandomForestClassifier(**params)
    model.fit(X_train, y_train)
    preds = model.predict(X_test)

    # Log metrics
    mlflow.log_metric("accuracy", accuracy_score(y_test, preds))
    mlflow.log_metric("f1", f1_score(y_test, preds, average="weighted"))

    # Log and register the model artifact
    mlflow.sklearn.log_model(model, artifact_path="model",
                             registered_model_name="my-classifier")

Kubeflow Pipeline Component (single-step template)

# On KFP SDK 2.x, import from `kfp` directly (`from kfp import dsl`); `kfp.v2` is the older namespace
from kfp.v2 import dsl
from kfp.v2.dsl import component, Input, Output, Dataset, Model, Metrics

# pandas is listed explicitly because the component imports it; pin versions for production use
@component(base_image="python:3.10",
           packages_to_install=["pandas", "scikit-learn", "mlflow"])
def train_model(
    train_data: Input[Dataset],
    model_output: Output[Model],
    metrics_output: Output[Metrics],
    n_estimators: int = 100,
    max_depth: int = 5,
):
    # Imports live inside the function because the component runs in its own container
    import pickle

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    df = pd.read_csv(train_data.path)
    X, y = df.drop("label", axis=1), df["label"]

    model = RandomForestClassifier(n_estimators=n_estimators,
                                   max_depth=max_depth, random_state=42)
    model.fit(X, y)

    with open(model_output.path, "wb") as f:
        pickle.dump(model, f)

    metrics_output.log_metric("train_samples", len(df))

@dsl.pipeline(name="training-pipeline")
def training_pipeline(data_path: str, n_estimators: int = 100):
    # Bring the CSV at data_path into the pipeline as a Dataset artifact
    import_step = dsl.importer(artifact_uri=data_path, artifact_class=Dataset)
    train_step = train_model(train_data=import_step.output,
                             n_estimators=n_estimators)
    # Chain additional steps (validate, register, deploy) here

Data Validation Checkpoint (Great Expectations style)

import great_expectations as ge

def validate_training_data(df):
    """Run schema and distribution checks. Raise on failure; never skip."""
    # Uses the legacy pandas-dataset API (ge.from_pandas), which newer
    # Great Expectations releases no longer provide
    gdf = ge.from_pandas(df)

    # Each expectation returns its own result object, so collect them in a list
    results = [
        gdf.expect_column_values_to_not_be_null("label"),
        gdf.expect_column_values_to_be_between("feature_1", 0, 1),
    ]

    failed = [r for r in results if not r.success]
    if failed:
        raise ValueError(f"Data validation failed: {failed}")

    return df  # safe to proceed to training

Constraints

Always:

  • Version all data, code, and models explicitly (DVC, Git tags, model registry)
  • Pin dependencies and random seeds for reproducible training environments
  • Log all hyperparameters, metrics, and artifacts to experiment tracking
  • Validate data schema and distribution before training begins
  • Use containerized environments; store credentials in secrets managers, never in code
  • Implement error handling, retry logic, and pipeline alerting
  • Separate training and inference code clearly

Never:

  • Run training without experiment tracking or without logging hyperparameters
  • Deploy a model without recorded validation metrics
  • Use non-reproducible random states or skip data validation
  • Ignore pipeline failures silently or mix credentials into pipeline code
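One way to avoid non-reproducible random states is to seed every generator in a single place and to prefer local generators over global state. The sketch below is an assumption-laden illustration: `pin_seeds` and `sample_indices` are hypothetical helper names, and frameworks such as PyTorch or TensorFlow need their own seeding calls, which are not shown.

```python
import os
import random


def pin_seeds(seed=42):
    """Seed Python's RNG, and NumPy's if it is installed. Returns the seed for logging."""
    random.seed(seed)
    # Only affects hash randomization in subprocesses started after this point
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing more to seed
    return seed


def sample_indices(n, k, seed):
    """Draw k of n indices reproducibly using a local RNG instead of global state."""
    rng = random.Random(seed)
    return rng.sample(range(n), k)
```

Returning the seed from `pin_seeds` makes it easy to log it alongside the other hyperparameters, so the tracked run records the value that was actually used.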

Output Format

When implementing a pipeline, provide:

  • Complete pipeline definition (Kubeflow DAG, Airflow DAG, or equivalent), using the templates above as a starting structure
  • Feature engineering code with inline data validation calls
  • Training script with MLflow (or equivalent) experiment logging
  • Model evaluation code with explicit pass/fail thresholds
  • Deployment configuration and rollback strategy
  • Brief explanation of architecture decisions and reproducibility measures
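The "explicit pass/fail thresholds" item in the list above could take a shape like the following. This is a sketch under stated assumptions: the function names and the threshold values are illustrative only, every metric is assumed to be "higher is better", and real thresholds depend on the use case.

```python
def check_evaluation_gate(metrics, thresholds):
    """Return a list of human-readable failures; an empty list means the gate passes."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # A missing metric counts as a failure rather than a silent pass
            failures.append(f"{name}: metric missing")
        elif value < minimum:
            failures.append(f"{name}: {value:.3f} < required {minimum:.3f}")
    return failures


def enforce_evaluation_gate(metrics, thresholds):
    """Raise if any threshold is missed so the pipeline step fails visibly."""
    failures = check_evaluation_gate(metrics, thresholds)
    if failures:
        raise RuntimeError("Evaluation gate failed: " + "; ".join(failures))
    return True


# Illustrative values only; not recommendations
THRESHOLDS = {"accuracy": 0.90, "f1": 0.85}
```

Calling `enforce_evaluation_gate` as the last step before registration or deployment means a model without recorded, passing metrics cannot move forward.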

Knowledge Reference

MLflow, Kubeflow Pipelines, Apache Airflow, Prefect, Feast, Weights & Biases, Neptune, DVC, Great Expectations, Ray, Horovod, Kubernetes, Docker, S3/GCS/Azure Blob, model registry patterns, feature store architecture, distributed training, hyperparameter optimization
