m13-domain-error

Name: m13-domain-error
Author: zhanghandong

Design error handling by categorizing who handles each error and how they recover. Distinguish between user-facing errors (actionable messages), internal errors (debug details), system errors (monitoring), and transient vs. permanent failures to determine the recovery strategy. Use typed error enums with thiserror and implement is_retryable() checks to enable appropriate handling patterns. Apply recovery strategies: retry with exponential backoff for transient failures, fallback values for degraded modes, circuit breakers for cascading failures, and timeouts for slow operations. Include error context via .context() for debugging and structured logging with request IDs; avoid exposing internal details to end users.

INSTALLATION

npx skills add https://github.com/zhanghandong/rust-skills --skill m13-domain-error

Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Domain Error Strategy

Layer 2: Design Choices

Core Question

Who needs to handle this error, and how should they recover?

Before designing error types:

- Is this user-facing or internal?
- Is recovery possible?
- What context is needed for debugging?

Error Categorization

| Error Type | Audience | Recovery | Example |
|---|---|---|---|
| User-facing | End users | Guide action | InvalidEmail, NotFound |
| Internal | Developers | Debug info | DatabaseError, ParseError |
| System | Ops/SRE | Monitor/alert | ConnectionTimeout, RateLimited |
| Transient | Automation | Retry | NetworkError, ServiceUnavailable |
| Permanent | Human | Investigate | ConfigInvalid, DataCorrupted |
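One way to make this table enforceable is to decide each error's category in a single method, so handlers match on the category instead of guessing. The sketch below is std-only and illustrative: Category, DomainError, and its variants are hypothetical names borrowed from the Example column, not types defined by the skill or by thiserror.

```rust
/// Illustrative categories mirroring the table (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    UserFacing, // end users: guide their next action
    Internal,   // developers: keep debug info
    System,     // ops/SRE: monitor and alert
    Transient,  // automation: retry
    Permanent,  // humans: investigate
}

/// A hypothetical domain error using some of the example names above.
#[derive(Debug)]
pub enum DomainError {
    InvalidEmail(String),
    NotFound { id: u64 },
    Database(String),
    RateLimited,
    ServiceUnavailable,
    ConfigInvalid(String),
}

impl DomainError {
    /// The single place where an error is assigned to a category.
    pub fn category(&self) -> Category {
        match self {
            DomainError::InvalidEmail(_) | DomainError::NotFound { .. } => Category::UserFacing,
            DomainError::Database(_) => Category::Internal,
            DomainError::RateLimited => Category::System,
            DomainError::ServiceUnavailable => Category::Transient,
            DomainError::ConfigInvalid(_) => Category::Permanent,
        }
    }

    /// Only transient failures are handed to automatic retry.
    pub fn is_retryable(&self) -> bool {
        self.category() == Category::Transient
    }
}
```

The mapping itself is a design decision: a real system might, for example, also treat RateLimited as retryable after a server-provided delay.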

Thinking Prompt

Before designing error types:

1. Who sees this error?
   - End user → friendly, actionable message
   - Developer → detailed, debuggable
   - Ops → structured, alertable
2. Can we recover?
   - Transient → retry with backoff
   - Degradable → fallback value
   - Permanent → fail fast, alert
3. What context is needed?
   - Call chain → anyhow::Context
   - Request ID → structured logging
   - Input data → error payload
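These prompts can be answered by a single error value that carries its context (request ID, offending input) and renders a different view for each audience. A std-only sketch: ImportError, user_message, and log_line are hypothetical names, and the key=value string is only a stand-in for structured logging (with the tracing crate these would be event fields instead).

```rust
use std::fmt;

/// Hypothetical error that carries its own context payload.
#[derive(Debug)]
pub struct ImportError {
    pub request_id: String, // correlates logs across services
    pub line: usize,        // where the bad input was found
    pub input: String,      // the offending input, for developers
    pub reason: &'static str,
}

impl fmt::Display for ImportError {
    // Developer view: detailed, includes the raw input.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "line {}: {} (input: {:?})", self.line, self.reason, self.input)
    }
}

impl std::error::Error for ImportError {}

impl ImportError {
    /// End-user view: actionable, no internals.
    pub fn user_message(&self) -> String {
        format!("Line {} could not be imported: {}.", self.line, self.reason)
    }

    /// Ops view: machine-parseable key=value pairs keyed by request ID.
    pub fn log_line(&self) -> String {
        format!(
            "level=error request_id={} line={} reason={:?}",
            self.request_id, self.line, self.reason
        )
    }
}
```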

Trace Up ↑

To domain constraints (Layer 3):

"How should I handle payment failures?"

    ↑ Ask: What are the business rules for retries?

    ↑ Check: domain-fintech (transaction requirements)

    ↑ Check: SLA (availability requirements)

| Question | Trace To | Ask |
|---|---|---|
| Retry policy | domain-* | What's acceptable latency for retry? |
| User experience | domain-* | What message should users see? |
| Compliance | domain-* | What must be logged for audit? |
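The latency question can be turned into a concrete retry schedule: the domain supplies a total wait budget, and the number of retries falls out of it. A minimal std-only sketch; RetryPolicy, its fields, and the doubling rule are illustrative assumptions, not an API from the skill or from any crate.

```rust
use std::time::Duration;

/// Hypothetical retry policy derived from a domain's latency budget.
pub struct RetryPolicy {
    pub base: Duration,      // first backoff delay
    pub max_delay: Duration, // cap on any single delay
    pub budget: Duration,    // total time we are allowed to spend waiting
}

impl RetryPolicy {
    /// Delay before retry `n` (0-based): base * 2^n, capped at max_delay.
    pub fn delay(&self, n: u32) -> Duration {
        self.base
            .saturating_mul(2u32.saturating_pow(n))
            .min(self.max_delay)
    }

    /// Every backoff delay that still fits inside the total budget.
    pub fn schedule(&self) -> Vec<Duration> {
        let mut out = Vec::new();
        let mut total = Duration::ZERO;
        for n in 0.. {
            let d = self.delay(n);
            // Stop when the budget would be exceeded (or on a zero delay,
            // which would otherwise loop forever).
            if d.is_zero() || total.saturating_add(d) > self.budget {
                break;
            }
            total += d;
            out.push(d);
        }
        out
    }
}
```

With a 100 ms base, a 1 s cap, and a 2 s budget, this yields four retries (100, 200, 400, 800 ms); the next delay would push the total past the budget.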

Trace Down ↓

To implementation (Layer 1):

"Need typed errors"

    ↓ m06-error-handling: thiserror for library

    ↓ m04-zero-cost: Error enum design

"Need error context"

    ↓ m06-error-handling: anyhow::Context

    ↓ Logging: tracing with fields

"Need retry logic"

    ↓ m07-concurrency: async retry patterns

    ↓ Crates: tokio-retry, backoff
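anyhow::Context wraps an error with a message while keeping the original as its source(). The std-only sketch below mimics that idea to show how the chain is built and then walked for a log line; WithContext, the Context trait, and report are hypothetical names, not anyhow's actual types.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical wrapper: a context message plus the underlying cause.
#[derive(Debug)]
pub struct WithContext {
    context: String,
    source: Box<dyn Error + Send + Sync + 'static>,
}

impl fmt::Display for WithContext {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.context)
    }
}

impl Error for WithContext {
    // Expose the wrapped error so the full chain can be walked.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Extension trait so `result.context("...")` reads like the anyhow version.
pub trait Context<T> {
    fn context(self, msg: &str) -> Result<T, WithContext>;
}

impl<T, E: Error + Send + Sync + 'static> Context<T> for Result<T, E> {
    fn context(self, msg: &str) -> Result<T, WithContext> {
        self.map_err(|e| WithContext { context: msg.to_string(), source: Box::new(e) })
    }
}

/// Render the chain outermost first, e.g. "starting app: loading settings: ...".
pub fn report(err: &dyn Error) -> String {
    let mut parts = vec![err.to_string()];
    let mut cur = err.source();
    while let Some(e) = cur {
        parts.push(e.to_string());
        cur = e.source();
    }
    parts.join(": ")
}
```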

Quick Reference

| Recovery Pattern | When | Implementation |
|---|---|---|
| Retry | Transient failures | Exponential backoff |
| Fallback | Degraded mode | Cached/default value |
| Circuit Breaker | Cascading failures | failsafe-rs |
| Timeout | Slow operations | tokio::time::timeout |
| Bulkhead | Isolation | Separate thread pools |
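A circuit breaker stops calling a failing dependency for a while so that its failures do not cascade into every caller. The following is a minimal std-only sketch of the usual closed/open/half-open state machine, not the failsafe-rs API; CircuitBreaker, CallError, and the explicit `now` parameter (used to keep it deterministic) are assumptions of this example.

```rust
use std::time::{Duration, Instant};

/// Hypothetical breaker. Closed: calls pass through. After `threshold`
/// consecutive failures it opens and rejects calls until `cooldown` has
/// elapsed; then the next call is a trial (half-open). A successful trial
/// closes the breaker, a failed one re-opens it for another cooldown.
pub struct CircuitBreaker {
    threshold: u32,
    cooldown: Duration,
    failures: u32,
    opened_at: Option<Instant>,
}

#[derive(Debug, PartialEq)]
pub enum CallError<E> {
    Open,     // rejected without calling the dependency
    Inner(E), // the dependency was called and failed
}

impl CircuitBreaker {
    pub fn new(threshold: u32, cooldown: Duration) -> Self {
        Self { threshold, cooldown, failures: 0, opened_at: None }
    }

    pub fn call<T, E>(
        &mut self,
        now: Instant,
        f: impl FnOnce() -> Result<T, E>,
    ) -> Result<T, CallError<E>> {
        if let Some(opened) = self.opened_at {
            if now.duration_since(opened) < self.cooldown {
                return Err(CallError::Open);
            }
            // Cooldown elapsed: fall through and let this call be the trial.
        }
        match f() {
            Ok(v) => {
                self.failures = 0;
                self.opened_at = None;
                Ok(v)
            }
            Err(e) => {
                self.failures += 1;
                // Open on reaching the threshold, or re-open after a failed trial.
                if self.opened_at.is_some() || self.failures >= self.threshold {
                    self.opened_at = Some(now);
                }
                Err(CallError::Inner(e))
            }
        }
    }
}
```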

Error Hierarchy

```rust
#[derive(thiserror::Error, Debug)]
pub enum AppError {
    // User-facing
    #[error("Invalid input: {0}")]
    Validation(String),

    // Transient (retryable)
    #[error("Service temporarily unavailable")]
    ServiceUnavailable(#[source] reqwest::Error),

    // Internal (log details, show generic)
    #[error("Internal error")]
    Internal(#[source] anyhow::Error),
}

impl AppError {
    pub fn is_retryable(&self) -> bool {
        matches!(self, Self::ServiceUnavailable(_))
    }
}
```
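To see what "log details, show generic" means mechanically, here is the same hierarchy written out by hand with only the standard library, roughly what the thiserror derive generates. The reqwest::Error and anyhow::Error sources are replaced with a boxed error (an assumption so the example has no dependencies), and http_status is a hypothetical mapping added for illustration.

```rust
use std::error::Error;
use std::fmt;

type BoxError = Box<dyn Error + Send + Sync + 'static>;

#[derive(Debug)]
pub enum AppError {
    Validation(String),
    ServiceUnavailable(BoxError),
    Internal(BoxError),
}

impl fmt::Display for AppError {
    // Roughly what the #[error("...")] attributes generate. The source is
    // never formatted here, so user-visible text stays generic.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::Validation(msg) => write!(f, "Invalid input: {msg}"),
            AppError::ServiceUnavailable(_) => f.write_str("Service temporarily unavailable"),
            AppError::Internal(_) => f.write_str("Internal error"),
        }
    }
}

impl Error for AppError {
    // Roughly what #[source] generates: the cause is available for logs.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            AppError::Validation(_) => None,
            AppError::ServiceUnavailable(e) | AppError::Internal(e) => Some(&**e),
        }
    }
}

impl AppError {
    pub fn is_retryable(&self) -> bool {
        matches!(self, Self::ServiceUnavailable(_))
    }

    /// Hypothetical HTTP status mapping for an API boundary.
    pub fn http_status(&self) -> u16 {
        match self {
            AppError::Validation(_) => 400,
            AppError::ServiceUnavailable(_) => 503,
            AppError::Internal(_) => 500,
        }
    }
}
```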

Retry Pattern

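```rust
use std::future::Future;
use std::time::Duration;
use tokio_retry::{strategy::ExponentialBackoff, Retry};

// `Fn() -> impl Future` is not allowed in a trait bound, so the future
// type is a separate generic parameter `Fut`.
async fn with_retry<F, Fut, T, E>(f: F) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
{
    // tokio-retry computes each delay as factor * base^n, so base 2 with
    // factor 50 gives 100 ms, 200 ms, 400 ms, ... capped at 10 s, 5 retries max.
    let strategy = ExponentialBackoff::from_millis(2)
        .factor(50)
        .max_delay(Duration::from_secs(10))
        .take(5);

    // Note: this retries every error; tokio_retry::RetryIf accepts a
    // condition closure such as |e: &AppError| e.is_retryable().
    Retry::spawn(strategy, f).await
}
```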
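The key rule, retrying only transient errors and only a bounded number of times, does not depend on async. Below is a synchronous std-only sketch of it. The function name, its parameters, and the injected `sleep` (std::thread::sleep in production, a recorder in tests) are assumptions of this example, not how tokio-retry is structured.

```rust
use std::time::Duration;

/// Hypothetical helper: call `op` up to `max_attempts` times (at least once),
/// retrying only errors for which `is_retryable` returns true, and sleeping
/// base, 2*base, 4*base, ... between attempts.
pub fn retry<T, E>(
    max_attempts: u32,
    base: Duration,
    is_retryable: impl Fn(&E) -> bool,
    mut sleep: impl FnMut(Duration),
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt: u32 = 1;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            // Permanent error, or attempts exhausted: fail fast.
            Err(e) if attempt >= max_attempts || !is_retryable(&e) => return Err(e),
            // Transient error: back off exponentially, then try again.
            Err(_) => {
                sleep(base.saturating_mul(2u32.saturating_pow(attempt - 1)));
                attempt += 1;
            }
        }
    }
}
```

Passing `std::thread::sleep` as `sleep` gives real waiting; in an async service the same logic would use a non-blocking timer instead.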

Common Mistakes

| Mistake | Why Wrong | Better |
|---|---|---|
| Same error for all | No actionability | Categorize by audience |
| Retry everything | Wasted resources | Only transient errors |
| Infinite retry | Self-inflicted DoS | Max attempts + backoff |
| Expose internal errors | Security risk | User-friendly messages |
| No context | Hard to debug | .context() everywhere |

Anti-Patterns

| Anti-Pattern | Why Bad | Better |
|---|---|---|
| String errors | No structure | thiserror types |
| panic! for recoverable errors | Bad UX | Result with context |
| Ignore errors | Silent failures | Log or propagate |
| Box<dyn Error> everywhere | Lost type info | thiserror |
| Error in happy path | Performance | Early validation |
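Several of these anti-patterns (string errors, panic! for recoverable failures, errors surfacing deep in the happy path) share one remedy: validate once at the boundary into a typed value with a typed error. A minimal std-only sketch, assuming a hypothetical Email newtype and a deliberately naive '@' check that is not real email validation.

```rust
use std::fmt;

/// Hypothetical newtype: holding an Email means validation already passed,
/// so code further down the happy path never re-checks or fails on it.
#[derive(Debug, Clone, PartialEq)]
pub struct Email(String);

/// Structured error instead of a bare String, and instead of panic!.
#[derive(Debug, PartialEq)]
pub enum EmailError {
    Empty,
    MissingAt,
}

impl fmt::Display for EmailError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EmailError::Empty => f.write_str("email must not be empty"),
            EmailError::MissingAt => f.write_str("email must contain '@'"),
        }
    }
}

impl std::error::Error for EmailError {}

impl Email {
    /// Validate once, at the boundary (simplistic check for illustration).
    pub fn parse(raw: &str) -> Result<Self, EmailError> {
        let s = raw.trim();
        if s.is_empty() {
            return Err(EmailError::Empty);
        }
        if !s.contains('@') {
            return Err(EmailError::MissingAt);
        }
        Ok(Email(s.to_string()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```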

Related Skills

| When | See |
|---|---|
| Error handling basics | m06-error-handling |
| Retry implementation | m07-concurrency |
| Domain modeling | m09-domain |
| User-facing APIs | domain-* |
