m10-performance

Name: m10-performance
Author: actionbook

CRITICAL: Use for performance optimization. Triggers: performance, optimization, benchmark, profiling, flamegraph, criterion, slow, fast, allocation, cache,…

INSTALLATION

npx skills add https://github.com/actionbook/rust-skills --skill m10-performance

Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Performance Optimization

Layer 2: Design Choices

Core Question

What's the bottleneck, and is optimization worth it?

Before optimizing:

- Have you measured? (Don't guess.)
- What's the acceptable performance?
- Will optimization add complexity?

Performance Decision → Implementation

| Goal | Design Choice | Implementation |
|------|---------------|----------------|
| Reduce allocations | Pre-allocate, reuse | with_capacity, object pools |
| Improve cache | Contiguous data | Vec, SmallVec |
| Parallelize | Data parallelism | rayon, threads |
| Avoid copies | Zero-copy | References, Cow<T> |
| Reduce indirection | Inline data | smallvec, arrays |
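As an illustration of the "Avoid copies" row, here is a minimal sketch of the Cow pattern using only the standard library. The function name `normalize_tabs` and its behavior are assumptions made for this example, not part of the skill: it returns the input slice untouched when no change is needed and allocates only when it must.

```rust
use std::borrow::Cow;

/// Hypothetical helper: replaces tabs with spaces, allocating only when needed.
fn normalize_tabs(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        // Slow path: a tab is present, so build a new owned String.
        Cow::Owned(input.replace('\t', " "))
    } else {
        // Fast path: hand back the original slice without copying.
        Cow::Borrowed(input)
    }
}

fn main() {
    let clean = normalize_tabs("no tabs here");
    let fixed = normalize_tabs("a\tb");
    println!("{} / {}", clean, fixed);
    println!("first call borrowed: {}", matches!(clean, Cow::Borrowed(_)));
}
```

Callers can treat the result as a `&str` through deref, so the common no-change case costs no allocation.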

Thinking Prompt

Before optimizing:

1. Have you measured?
   - Profile first → flamegraph, perf
   - Benchmark → criterion, cargo bench
   - Identify actual hotspots
2. What's the priority?
   - Algorithm (10x-1000x improvement)
   - Data structure (2x-10x)
   - Allocation (2x-5x)
   - Cache (1.5x-3x)
3. What's the trade-off?
   - Complexity vs speed
   - Memory vs CPU
   - Latency vs throughput
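For a first rough measurement before reaching for criterion or a profiler, a standard-library timing loop can be enough. The sketch below is an assumption-laden stand-in, not a replacement for a statistical benchmark: `best_of` and `sum_of_squares` are hypothetical names, and the workload is chosen only for illustration.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `f` `iters` times and returns the fastest run.
fn best_of<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    let mut best = Duration::MAX;
    for _ in 0..iters {
        let start = Instant::now();
        f();
        best = best.min(start.elapsed());
    }
    best
}

/// Example workload (an assumption for illustration): sum of squares below `n`.
fn sum_of_squares(n: u64) -> u64 {
    (0..n).map(|x| x * x).sum()
}

fn main() {
    let elapsed = best_of(10, || {
        // black_box discourages the optimizer from deleting the timed work.
        black_box(sum_of_squares(black_box(1_000_000)));
    });
    println!("fastest of 10 runs: {:?}", elapsed);
}
```

Run it with `cargo run --release`; timings from a debug build say little about optimized code.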

Trace Up ↑

To domain constraints (Layer 3):

"How fast does this need to be?"

    ↑ Ask: What's the performance SLA?

    ↑ Check: domain-* (latency requirements)

    ↑ Check: Business requirements (acceptable response time)

| Question | Trace To | Ask |
|----------|----------|-----|
| Latency requirements | domain-* | What's acceptable response time? |
| Throughput needs | domain-* | How many requests per second? |
| Memory constraints | domain-* | What's the memory budget? |

Trace Down ↓

To implementation (Layer 1):

"Need to reduce allocations"

    ↓ m01-ownership: Use references, avoid clone

    ↓ m02-resource: Pre-allocate with_capacity

"Need to parallelize"

    ↓ m07-concurrency: Choose rayon or threads

    ↓ m07-concurrency: Consider async for I/O-bound

"Need cache efficiency"

    ↓ Data layout: Prefer Vec over HashMap when possible

    ↓ Access patterns: Sequential over random access
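The "Need to parallelize" path can be sketched with plain scoped threads from the standard library. This is one possible shape under stated assumptions: `parallel_sum` and its `workers` parameter are hypothetical, and a real project might prefer rayon's parallel iterators instead.

```rust
use std::thread;

/// Sums `data` by splitting it into chunks, one scoped thread per chunk.
/// `workers` is a hypothetical tuning parameter.
fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Ceiling division so every element lands in a chunk; at least 1 so
    // `chunks` never receives a zero length.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        // Collect the handles first so all threads start before any join.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}

fn main() {
    let data: Vec<u64> = (1..=1000).collect();
    println!("sum = {}", parallel_sum(&data, 4));
}
```

Each thread reads a contiguous chunk, which also matches the sequential-access advice above. Spawning threads has a fixed cost, so for small inputs the single-threaded sum is likely faster; measure before committing.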

Quick Reference

| Tool | Purpose |
|------|---------|
| cargo bench | Micro-benchmarks |
| criterion | Statistical benchmarks |
| perf / flamegraph | CPU profiling |
| heaptrack | Allocation tracking |
| valgrind / cachegrind | Cache analysis |

Optimization Priority

1. Algorithm choice     (10x - 1000x)

2. Data structure       (2x - 10x)

3. Allocation reduction (2x - 5x)

4. Cache optimization   (1.5x - 3x)

5. SIMD/Parallelism     (2x - 8x)
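To make the first priority concrete, the sketch below contrasts two ways to answer the same question, "does this slice contain a duplicate?". Both function names are hypothetical. The first compares every pair (quadratic); the second does one hash-set insertion per element (linear on average).

```rust
use std::collections::HashSet;

/// Quadratic: compares every pair of elements.
fn has_duplicate_quadratic(items: &[u32]) -> bool {
    for i in 0..items.len() {
        for j in (i + 1)..items.len() {
            if items[i] == items[j] {
                return true;
            }
        }
    }
    false
}

/// Linear on average: one hash-set insertion per element.
fn has_duplicate_hashed(items: &[u32]) -> bool {
    let mut seen = HashSet::with_capacity(items.len());
    // `insert` returns false when the value was already present.
    items.iter().any(|x| !seen.insert(*x))
}

fn main() {
    let items = [3, 1, 4, 1, 5];
    println!(
        "{} {}",
        has_duplicate_quadratic(&items),
        has_duplicate_hashed(&items)
    );
}
```

For very small slices the quadratic version may still win because it avoids hashing and allocation, which is why the lower priorities only matter once the algorithm is right for the input size.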

Common Techniques

| Technique | When | How |
|-----------|------|-----|
| Pre-allocation | Known size | Vec::with_capacity(n) |
| Avoid cloning | Hot paths | Use references or Cow<T> |
| Batch operations | Many small ops | Collect then process |
| SmallVec | Usually small | smallvec::SmallVec<[T; N]> |
| Inline buffers | Fixed-size data | Arrays over Vec |
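Two of these techniques, pre-allocation and buffer reuse, can be sketched with the standard library alone. The function names and workloads below are assumptions for illustration.

```rust
/// Builds the squares of 0..n with a single up-front allocation.
fn squares(n: usize) -> Vec<u64> {
    let mut out = Vec::with_capacity(n);
    for i in 0..n as u64 {
        // No reallocation happens here: capacity already covers n pushes.
        out.push(i * i);
    }
    out
}

/// Reuses one scratch buffer across inputs instead of allocating per line.
/// Returns the byte length of the longest uppercased line.
fn longest_uppercased(lines: &[&str]) -> usize {
    let mut scratch = String::new();
    let mut longest = 0;
    for line in lines {
        // `clear` drops the contents but keeps the allocated capacity.
        scratch.clear();
        scratch.extend(line.chars().flat_map(char::to_uppercase));
        longest = longest.max(scratch.len());
    }
    longest
}

fn main() {
    let v = squares(5);
    println!("{:?} (capacity {})", v, v.capacity());
    println!("{}", longest_uppercased(&["ab", "hello"]));
}
```

The scratch buffer grows to fit the largest line seen so far and is then reused, so later iterations typically allocate nothing.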

Common Mistakes

| Mistake | Why Wrong | Better |
|---------|-----------|--------|
| Optimize without profiling | Wrong target | Profile first |
| Benchmark in debug mode | Meaningless | Always --release |
| Use LinkedList | Cache unfriendly | Vec or VecDeque |
| Hidden .clone() | Unnecessary allocs | Use references |
| Premature optimization | Wasted effort | Make it work first |
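The "Hidden .clone()" row often comes down to a function signature. In this sketch (both function names are hypothetical), the owning version pushes a clone onto any caller that still needs its data, while the borrowing version does not.

```rust
/// Takes ownership: a caller that still needs `words` must clone first.
fn total_len_owned(words: Vec<String>) -> usize {
    words.iter().map(String::len).sum()
}

/// Borrows a slice: the caller keeps its data and nothing is copied.
fn total_len_borrowed(words: &[String]) -> usize {
    words.iter().map(String::len).sum()
}

fn main() {
    let words = vec!["alpha".to_string(), "beta".to_string()];
    // The clone allocates a new Vec plus a new String per element.
    let a = total_len_owned(words.clone());
    // The borrow passes only a pointer and a length.
    let b = total_len_borrowed(&words);
    println!("{} {}", a, b);
}
```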

Anti-Patterns

| Anti-Pattern | Why Bad | Better |
|--------------|---------|--------|
| Clone to avoid lifetimes | Performance cost | Proper ownership |
| Box everything | Indirection cost | Stack when possible |
| HashMap for small sets | Overhead | Vec with linear search |
| String concat in loop | Up to O(n^2) copying | String::with_capacity with push_str or write! |
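The last two rows can be sketched as follows. The names `join_numbers` and `is_keyword`, the size estimate, and the keyword list are all assumptions for illustration. The first function appends into one pre-sized String; the second uses a linear scan over a small fixed set instead of a hash map.

```rust
use std::fmt::Write;

/// Joins 0..n as "0,1,2,..." into one pre-sized String.
fn join_numbers(n: usize) -> String {
    // Rough size guess (an assumption): about 4 bytes per entry.
    let mut out = String::with_capacity(n * 4);
    for i in 0..n {
        if i > 0 {
            out.push(',');
        }
        // write! appends in place; no temporary String per iteration.
        write!(out, "{}", i).expect("writing to a String cannot fail");
    }
    out
}

/// Small fixed set stored in an array: linear scan instead of hashing.
fn is_keyword(word: &str) -> bool {
    const KEYWORDS: [&str; 4] = ["fn", "let", "mut", "use"];
    KEYWORDS.contains(&word)
}

fn main() {
    println!("{}", join_numbers(5));
    println!("{} {}", is_keyword("let"), is_keyword("loop"));
}
```

If the size guess is too low the String still grows as needed, so the estimate only affects how many reallocations occur, not correctness.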

Related Skills

| When | See |
|------|-----|
| Reducing clones | m01-ownership |
| Concurrency options | m07-concurrency |
| Smart pointer choice | m02-resource |
| Domain requirements | domain-* |
