domain-ml

Name: domain-ml
Author: actionbook

Use when building ML/AI apps in Rust. Keywords: machine learning, ML, AI, tensor, model, inference, neural network, deep learning, training, prediction,…

INSTALLATION

npx skills add https://github.com/actionbook/rust-skills --skill domain-ml

Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Machine Learning Domain

Layer 3: Domain Constraints

Domain Constraints → Design Implications

| Domain Rule | Design Constraint | Rust Implication |
|---|---|---|
| Large data | Efficient memory | Zero-copy, streaming |
| GPU acceleration | CUDA/Metal support | candle, tch-rs |
| Model portability | Standard formats | ONNX |
| Batch processing | Throughput over latency | Batched inference |
| Numerical precision | Float handling | ndarray, careful f32/f64 |
| Reproducibility | Deterministic | Seeded random, versioning |
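To make the "careful f32/f64" entry concrete, here is a minimal sketch using only the standard library. The function names `sum_naive` and `sum_wide` are hypothetical and not part of any crate listed here. It contrasts an f32 accumulator with an f64 accumulator over the same f32 data.

```rust
/// Sums f32 values with an f32 accumulator.
/// Rounding error can grow with the number of elements.
fn sum_naive(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

/// Sums f32 values with an f64 accumulator and rounds once at the end.
/// Storage stays f32; only the running total is widened.
fn sum_wide(xs: &[f32]) -> f32 {
    xs.iter().map(|&x| f64::from(x)).sum::<f64>() as f32
}
```

Calling both on a long slice such as `vec![0.1f32; 1_000_000]` is one way to see how far the narrow accumulator drifts. Widening only the accumulator keeps tensor memory at f32 size, which is usually the trade-off wanted for reductions.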

Critical Constraints

Memory Efficiency

RULE: Avoid copying large tensors

WHY: Memory bandwidth is often the bottleneck

RUST: References, views, in-place ops
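As a rough illustration of "references, views, in-place ops", the sketch below uses plain slices over a row-major buffer. The helpers `row` and `normalize_rows` are hypothetical names chosen for this example; ndarray views follow the same borrowing idea but are not used here.

```rust
/// Borrows row `r` of a row-major matrix as a view; no data is copied.
fn row(data: &[f32], cols: usize, r: usize) -> &[f32] {
    &data[r * cols..(r + 1) * cols]
}

/// Scales every row to unit L2 norm in place, reusing the existing buffer.
/// Panics if `cols` is zero; all-zero rows are left unchanged.
fn normalize_rows(data: &mut [f32], cols: usize) {
    for row in data.chunks_exact_mut(cols) {
        let norm = row.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            row.iter_mut().for_each(|x| *x /= norm);
        }
    }
}
```

Because `row` returns a borrowed slice, the borrow checker prevents mutating the buffer while a view is still in use, so skipping the copy does not cost safety.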

GPU Utilization

RULE: Batch operations for GPU efficiency

WHY: Each GPU kernel launch carries fixed overhead, which larger batches amortize

RUST: Batch sizes, async data loading
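One way to overlap data loading with compute is a bounded queue between a loader thread and the consumer. The sketch below uses only `std::thread` and `std::sync::mpsc`. The function `run_pipelined` and its parameters are hypothetical, and the `compute` closure stands in for a GPU call. An async runtime would follow the same structure with tasks and an async channel.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Runs `compute` on each batch while a background thread prepares the next ones.
/// `prefetch` bounds how many ready batches may wait in the queue.
fn run_pipelined<L, C>(num_batches: usize, prefetch: usize, load: L, mut compute: C) -> Vec<f32>
where
    L: Fn(usize) -> Vec<f32> + Send + 'static,
    C: FnMut(&[f32]) -> f32,
{
    let (tx, rx) = sync_channel::<Vec<f32>>(prefetch);
    let loader = thread::spawn(move || {
        for i in 0..num_batches {
            // `send` blocks when the queue is full, so memory use stays bounded.
            if tx.send(load(i)).is_err() {
                break; // the consumer has gone away
            }
        }
    });
    // The receive loop ends once the loader finishes and drops its sender.
    let outputs: Vec<f32> = rx.iter().map(|batch| compute(batch.as_slice())).collect();
    loader.join().expect("loader thread panicked");
    outputs
}
```

For example, `run_pipelined(4, 2, |i| vec![i as f32; 3], |b| b.iter().sum())` loads four small batches in the background and reduces each one on the calling thread. The bounded channel is the design choice that matters: it lets the loader run ahead of compute without buffering the whole dataset.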

Model Portability

RULE: Use standard model formats

WHY: Train in Python, deploy in Rust

RUST: ONNX via tract or candle

Trace Down ↓

From constraints to design (Layer 2):

"Need efficient data pipelines"

    ↓ m10-performance: Streaming, batching

    ↓ polars: Lazy evaluation

"Need GPU inference"

    ↓ m07-concurrency: Async data loading

    ↓ candle/tch-rs: CUDA backend

"Need model loading"

    ↓ m12-lifecycle: Lazy init, caching

    ↓ tract: ONNX runtime

Use Case → Framework

| Use Case | Recommended | Why |
|---|---|---|
| Inference only | tract (ONNX) | Lightweight, portable |
| Training + inference | candle, burn | Pure Rust, GPU |
| PyTorch models | tch-rs | Direct bindings |
| Data pipelines | polars | Fast, lazy eval |

Key Crates

| Purpose | Crate |
|---|---|
| Tensors | ndarray |
| ONNX inference | tract |
| ML framework | candle, burn |
| PyTorch bindings | tch-rs |
| Data processing | polars |
| Embeddings | fastembed |

Design Patterns

| Pattern | Purpose | Implementation |
|---|---|---|
| Model loading | Once, reuse | OnceLock<Model> |
| Batching | Throughput | Collect then process |
| Streaming | Large data | Iterator-based |
| GPU async | Parallelism | Data loading parallel to compute |
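The batching and streaming patterns can be combined in a small iterator adaptor. This is a sketch under assumptions: `Batches` and `batches` are hypothetical names, and the source can be any iterator, for example lines read lazily from a file.

```rust
/// Iterator adaptor that groups items from any source into `Vec` batches.
/// Only one batch is materialized at a time, so the source can be larger than memory.
struct Batches<I: Iterator> {
    source: I,
    size: usize,
}

impl<I: Iterator> Iterator for Batches<I> {
    type Item = Vec<I::Item>;

    fn next(&mut self) -> Option<Self::Item> {
        // Pull at most `size` items; the final batch may be shorter.
        let batch: Vec<I::Item> = self.source.by_ref().take(self.size).collect();
        if batch.is_empty() {
            None
        } else {
            Some(batch)
        }
    }
}

/// Wraps `source` so that it yields batches of up to `size` items.
fn batches<I: Iterator>(source: I, size: usize) -> Batches<I> {
    assert!(size > 0, "batch size must be non-zero");
    Batches { source, size }
}
```

A caller would write something like `for batch in batches(records, 32) { ... }`. Unlike `slice::chunks`, this does not need the whole input in memory first, which is the point of the streaming row.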

Code Pattern: Inference Server

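use std::sync::OnceLock;
use tract_onnx::prelude::*;

// Concrete type of an optimized, runnable tract model.
type Model = SimplePlan<TypedFact, Box<dyn TypedOp>, Graph<TypedFact, Box<dyn TypedOp>>>;

static MODEL: OnceLock<Model> = OnceLock::new();

// Loads and optimizes the model on the first call; later calls reuse it.
fn get_model() -> &'static Model {
    MODEL.get_or_init(|| {
        tract_onnx::onnx()
            .model_for_path("model.onnx")
            .expect("failed to read model.onnx")
            .into_optimized()
            .expect("failed to optimize model")
            .into_runnable()
            .expect("failed to build runnable plan")
    })
}

async fn predict(input: Vec<f32>) -> anyhow::Result<Vec<f32>> {
    let model = get_model();
    // Reshape the flat input into a (1, features) batch of one.
    let input: Tensor = tract_ndarray::arr1(&input).into_shape((1, input.len()))?.into();
    let result = model.run(tvec!(input.into()))?;
    // Flatten the first output tensor into a Vec.
    Ok(result[0].to_array_view::<f32>()?.iter().copied().collect())
}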
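The load-once behavior of the pattern above can be checked in isolation, without an ONNX file. The sketch below swaps in a stand-in `Model` struct and a `LOADS` counter, both hypothetical, to show that `OnceLock::get_or_init` runs the loader a single time even when several threads ask for the model.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Stand-in for an expensive-to-load model (hypothetical type).
struct Model {
    weights: Vec<f32>,
}

/// Counts how many times the loader closure has actually run.
static LOADS: AtomicUsize = AtomicUsize::new(0);
static MODEL: OnceLock<Model> = OnceLock::new();

/// Returns the shared model, running the loader only on the first call.
fn get_model() -> &'static Model {
    MODEL.get_or_init(|| {
        LOADS.fetch_add(1, Ordering::SeqCst);
        Model { weights: vec![0.5; 4] }
    })
}

/// Dot product of the input with the stand-in weights.
fn predict(input: &[f32]) -> f32 {
    get_model().weights.iter().zip(input).map(|(w, x)| w * x).sum()
}
```

Callers racing on the first request block until initialization finishes and then share the same `&'static Model`, so no extra locking is needed on the read path.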

Code Pattern: Batched Inference

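// Assumes every input has the same length, batch_size > 0, and a model that
// accepts a variable batch dimension. Uses get_model() from the pattern above.
async fn batch_predict(inputs: Vec<Vec<f32>>, batch_size: usize) -> anyhow::Result<Vec<Vec<f32>>> {
    let model = get_model();
    let mut results = Vec::with_capacity(inputs.len());
    for batch in inputs.chunks(batch_size) {
        // Stack inputs into one (batch, features) tensor
        let features = batch[0].len();
        let flat: Vec<f32> = batch.iter().flatten().copied().collect();
        let tensor: Tensor =
            tract_ndarray::Array2::from_shape_vec((batch.len(), features), flat)?.into();
        // Run inference on the whole batch
        let output = model.run(tvec!(tensor.into()))?;
        // Unstack results: one output row per input
        let view = output[0].to_array_view::<f32>()?;
        results.extend(view.outer_iter().map(|row| row.iter().copied().collect::<Vec<f32>>()));
    }
    Ok(results)
}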
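The stack and unstack steps in the batched pattern can be sketched without any tensor crate. The helpers below, `stack_rows` and `unstack_rows`, are hypothetical names. They work on a flat row-major buffer, which is the layout most tensor constructors accept.

```rust
/// Flattens equal-length rows into one row-major buffer plus its (rows, cols) shape.
/// Returns None if the batch is empty or the rows differ in length.
fn stack_rows(batch: &[Vec<f32>]) -> Option<(Vec<f32>, (usize, usize))> {
    let cols = batch.first()?.len();
    if batch.iter().any(|row| row.len() != cols) {
        return None;
    }
    let flat = batch.iter().flatten().copied().collect();
    Some((flat, (batch.len(), cols)))
}

/// Splits a row-major buffer back into one Vec per row.
/// Panics if `cols` is zero.
fn unstack_rows(flat: &[f32], cols: usize) -> Vec<Vec<f32>> {
    assert!(cols > 0, "cols must be non-zero");
    flat.chunks(cols).map(|row| row.to_vec()).collect()
}
```

Checking row lengths before stacking turns a ragged batch into an explicit `None` instead of a shape error deep inside the inference call.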

Common Mistakes

| Mistake | Domain Violation | Fix |
|---|---|---|
| Clone tensors | Memory waste | Use views |
| Single inference | GPU underutilized | Batch processing |
| Load model per request | Slow | Singleton pattern |
| Sync data loading | GPU idle | Async pipeline |

Trace to Layer 1

| Constraint | Layer 2 Pattern | Layer 1 Implementation |
|---|---|---|
| Memory efficiency | Zero-copy | ndarray views |
| Model singleton | Lazy init | OnceLock |
| Batch processing | Chunked iteration | chunks() + parallel |
| GPU async | Concurrent loading | tokio::spawn + GPU |
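For the "chunks() + parallel" entry, a standard-library sketch might look like the following. The function `par_map_chunks` is a hypothetical name. It spawns one scoped thread per chunk, which is fine for a handful of large chunks. A thread pool such as the one rayon provides is the more usual choice when there are many.

```rust
use std::thread;

/// Applies `f` to every item, splitting the slice into chunks that run on
/// scoped threads. Output order matches input order.
/// Panics if `chunk_size` is zero.
fn par_map_chunks<T, R, F>(items: &[T], chunk_size: usize, f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let f = &f; // share one reference to the closure across all workers
    thread::scope(|s| {
        let handles: Vec<_> = items
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(f).collect::<Vec<R>>()))
            .collect();
        // Joining in spawn order keeps results aligned with the input.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Scoped threads let the workers borrow `items` directly, so the input is never cloned, which ties this pattern back to the memory-efficiency constraint.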

Related Skills

| When | See |
|---|---|
| Performance | m10-performance |
| Lazy initialization | m12-lifecycle |
| Async patterns | m07-concurrency |
| Memory efficiency | m01-ownership |
