m13-domain-error

Design error handling by categorizing who handles each error and how they recover. Distinguish between user-facing errors (actionable messages), internal errors (debug details), and system errors (monitoring), and between transient and permanent failures, to determine the recovery strategy. Use typed error enums with thiserror and implement is_retryable() checks to enable appropriate handling patterns. Apply recovery strategies: retry with exponential backoff for transient failures, fallback values for degraded modes, circuit breakers for cascading failures, and timeouts for slow operations. Include error context via .context() for debugging and structured logging with request IDs; avoid exposing internal details to end users.

INSTALLATION
npx skills add https://github.com/zhanghandong/rust-skills --skill m13-domain-error
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Domain Error Strategy

Layer 2: Design Choices

Core Question

Who needs to handle this error, and how should they recover?

Before designing error types:

  • Is this user-facing or internal?
  • Is recovery possible?
  • What context is needed for debugging?

Error Categorization

| Error Type | Audience | Recovery | Example |
|---|---|---|---|
| User-facing | End users | Guide action | InvalidEmail, NotFound |
| Internal | Developers | Debug info | DatabaseError, ParseError |
| System | Ops/SRE | Monitor/alert | ConnectionTimeout, RateLimited |
| Transient | Automation | Retry | NetworkError, ServiceUnavailable |
| Permanent | Human | Investigate | ConfigInvalid, DataCorrupted |
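
The categories above can be encoded so that callers query them instead of re-deriving them at every call site. The following is a minimal sketch using only the standard library. `Audience`, `ErrorKind`, and both methods are hypothetical names, and the mapping of each variant to an audience is an assumption made for illustration, not something the table prescribes.

```rust
/// Who is expected to act on an error (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Audience {
    EndUser,
    Developer,
    Ops,
}

/// A handful of the example errors from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    InvalidEmail,
    NotFound,
    DatabaseError,
    ConnectionTimeout,
    ServiceUnavailable,
    ConfigInvalid,
}

impl ErrorKind {
    /// First axis: who needs to see this error.
    /// Assumption: transient service failures are routed to Ops.
    pub fn audience(self) -> Audience {
        match self {
            Self::InvalidEmail | Self::NotFound => Audience::EndUser,
            Self::DatabaseError | Self::ConfigInvalid => Audience::Developer,
            Self::ConnectionTimeout | Self::ServiceUnavailable => Audience::Ops,
        }
    }

    /// Second axis: whether automation may retry without human input.
    pub fn is_transient(self) -> bool {
        matches!(self, Self::ConnectionTimeout | Self::ServiceUnavailable)
    }
}
```

Keeping audience and transience as two separate methods reflects that they are independent questions: an error can be Ops-facing and still permanent.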

Thinking Prompt

Before designing error types:

1. Who sees this error?
     • End user → friendly message, actionable
     • Developer → detailed, debuggable
     • Ops → structured, alertable

2. Can we recover?
     • Transient → retry with backoff
     • Degradable → fallback value
     • Permanent → fail fast, alert

3. What context is needed?
     • Call chain → anyhow::Context
     • Request ID → structured logging
     • Input data → error payload
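
To illustrate the "what context is needed" question, here is a rough std-only sketch of attaching a context message to an underlying error and emitting one log line that carries a request ID. `WithContext`, `error_chain`, and `log_line` are hypothetical helpers written for this example. They imitate the idea behind `anyhow::Context` but are not its API, and a real service would more likely emit fields through `tracing`.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical wrapper that adds a context message to an underlying error.
#[derive(Debug)]
pub struct WithContext {
    message: String,
    source: Box<dyn Error + Send + Sync + 'static>,
}

impl WithContext {
    pub fn new(message: impl Into<String>, source: impl Error + Send + Sync + 'static) -> Self {
        Self { message: message.into(), source: Box::new(source) }
    }
}

impl fmt::Display for WithContext {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.message)
    }
}

impl Error for WithContext {
    // Expose the wrapped error so callers can walk the chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Collects the message of `err` and of every error in its `source()` chain.
pub fn error_chain(err: &(dyn Error + 'static)) -> Vec<String> {
    let mut messages = Vec::new();
    let mut current = Some(err);
    while let Some(e) = current {
        messages.push(e.to_string());
        current = e.source();
    }
    messages
}

/// Formats one log line; the `request_id` key is an assumed field name.
pub fn log_line(request_id: &str, err: &(dyn Error + 'static)) -> String {
    format!("request_id={} error=\"{}\"", request_id, error_chain(err).join(": "))
}
```

The full chain belongs in logs for developers and Ops; end users should only ever see the friendly message chosen for their audience.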

Trace Up ↑

To domain constraints (Layer 3):

"How should I handle payment failures?"
    ↑ Ask: What are the business rules for retries?
    ↑ Check: domain-fintech (transaction requirements)
    ↑ Check: SLA (availability requirements)

| Question | Trace To | Ask |
|---|---|---|
| Retry policy | domain-* | What's acceptable latency for retry? |
| User experience | domain-* | What message should users see? |
| Compliance | domain-* | What must be logged for audit? |

Trace Down ↓

To implementation (Layer 1):

"Need typed errors"
    ↓ m06-error-handling: thiserror for library
    ↓ m04-zero-cost: Error enum design

"Need error context"
    ↓ m06-error-handling: anyhow::Context
    ↓ Logging: tracing with fields

"Need retry logic"
    ↓ m07-concurrency: async retry patterns
    ↓ Crates: tokio-retry, backoff

Quick Reference

| Recovery Pattern | When | Implementation |
|---|---|---|
| Retry | Transient failures | exponential backoff |
| Fallback | Degraded mode | cached/default value |
| Circuit Breaker | Cascading failures | failsafe-rs |
| Timeout | Slow operations | tokio::time::timeout |
| Bulkhead | Isolation | separate thread pools |
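
To make the circuit breaker and fallback rows more concrete, the sketch below hand-rolls a very small breaker with the standard library. It is not the failsafe-rs API. `CircuitBreaker`, `CallError`, and the threshold and cooldown parameters are all hypothetical, and a production breaker would also need thread safety and metrics.

```rust
use std::time::{Duration, Instant};

/// Outcome of a guarded call (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum CallError<E> {
    /// The breaker is open, so the operation was not attempted.
    Rejected,
    /// The operation ran and failed.
    Failed(E),
}

/// Opens after `failure_threshold` consecutive failures and stays open
/// for `cooldown`, after which a single trial call is let through.
pub struct CircuitBreaker {
    failure_threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self { failure_threshold, cooldown, consecutive_failures: 0, opened_at: None }
    }

    pub fn is_open(&self) -> bool {
        match self.opened_at {
            Some(opened) => opened.elapsed() < self.cooldown,
            None => false,
        }
    }

    pub fn call<T, E>(&mut self, op: impl FnOnce() -> Result<T, E>) -> Result<T, CallError<E>> {
        if self.is_open() {
            return Err(CallError::Rejected);
        }
        match op() {
            Ok(value) => {
                // Any success closes the breaker again.
                self.consecutive_failures = 0;
                self.opened_at = None;
                Ok(value)
            }
            Err(err) => {
                self.consecutive_failures += 1;
                if self.consecutive_failures >= self.failure_threshold {
                    self.opened_at = Some(Instant::now());
                }
                Err(CallError::Failed(err))
            }
        }
    }
}
```

A fallback then composes naturally at the call site, for example `breaker.call(fetch).unwrap_or(cached_value)`, where `fetch` and `cached_value` stand in for whatever degraded-mode value the domain allows.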

Error Hierarchy

#[derive(thiserror::Error, Debug)]
pub enum AppError {
    // User-facing
    #[error("Invalid input: {0}")]
    Validation(String),

    // Transient (retryable)
    #[error("Service temporarily unavailable")]
    ServiceUnavailable(#[source] reqwest::Error),

    // Internal (log details, show generic)
    #[error("Internal error")]
    Internal(#[source] anyhow::Error),
}

impl AppError {
    pub fn is_retryable(&self) -> bool {
        matches!(self, Self::ServiceUnavailable(_))
    }
}
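
The "log details, show generic" idea can be sketched as a mapping from the error to what an end user may see. To stay runnable without external crates, this example redeclares a simplified `AppError` whose variants carry `String` payloads in place of `reqwest::Error` and `anyhow::Error`. `UserResponse`, `to_user_response`, `log_detail`, and the status codes are assumptions chosen for illustration.

```rust
/// Simplified stand-in for the error enum, with String payloads.
#[derive(Debug)]
pub enum AppError {
    Validation(String),
    ServiceUnavailable(String),
    Internal(String),
}

/// What the end user is allowed to see (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct UserResponse {
    pub status: u16,
    pub message: String,
}

impl AppError {
    /// Maps each variant to a safe, actionable message.
    /// Internal payloads are deliberately dropped here.
    pub fn to_user_response(&self) -> UserResponse {
        match self {
            Self::Validation(reason) => UserResponse {
                status: 400,
                message: format!("Invalid input: {}", reason),
            },
            Self::ServiceUnavailable(_) => UserResponse {
                status: 503,
                message: "Service temporarily unavailable, please try again shortly".to_string(),
            },
            Self::Internal(_) => UserResponse {
                status: 500,
                message: "Internal error".to_string(),
            },
        }
    }

    /// Full detail, intended for logs only.
    pub fn log_detail(&self) -> String {
        format!("{:?}", self)
    }
}
```

Only the validation variant echoes its payload back, because that text is written for the user in the first place; the other variants keep hostnames, queries, and stack details out of the response.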

Retry Pattern

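use std::future::Future;
use std::time::Duration;
use tokio_retry::{strategy::ExponentialBackoff, Retry};

async fn with_retry<F, Fut, T, E>(f: F) -> Result<T, E>
where
    // A separate `Fut` parameter is needed: `impl Trait` is not allowed
    // in the return position of an `Fn` bound.
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
{
    // tokio-retry raises the base to the attempt number and multiplies by
    // the factor, so base 2 with factor 50 gives 100 ms, 200 ms, 400 ms, ...
    let strategy = ExponentialBackoff::from_millis(2)
        .factor(50)
        .max_delay(Duration::from_secs(10)) // cap each delay at 10 s
        .take(5); // at most 5 retries after the first attempt

    Retry::spawn(strategy, f).await
}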
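
The helper above retries every error. Combining it with an `is_retryable()` style check keeps permanent failures from being retried at all. The following is a synchronous, std-only sketch of that idea rather than tokio-retry usage. `backoff_delays` and `retry_if` are hypothetical names, and the sleep function is injected so the caller decides how to wait.

```rust
use std::time::Duration;

/// Delays for each retry: base, 2*base, 4*base, ... each capped at `max`.
pub fn backoff_delays(base: Duration, max: Duration, retries: u32) -> Vec<Duration> {
    (0..retries)
        .map(|n| {
            let factor = 2u32.saturating_pow(n);
            // On overflow fall back to the cap.
            base.checked_mul(factor).map_or(max, |d| d.min(max))
        })
        .collect()
}

/// Runs `op`, retrying only while `is_retryable` returns true and
/// unused delays remain. Non-retryable errors are returned immediately.
pub fn retry_if<T, E>(
    delays: &[Duration],
    mut op: impl FnMut() -> Result<T, E>,
    is_retryable: impl Fn(&E) -> bool,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut remaining = delays.iter();
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if is_retryable(&err) => match remaining.next() {
                Some(delay) => sleep(*delay),
                None => return Err(err), // retry budget exhausted
            },
            Err(err) => return Err(err), // permanent: fail fast
        }
    }
}
```

In blocking code the `sleep` argument could simply be `std::thread::sleep`; passing it in also makes the retry budget easy to test without waiting.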

Common Mistakes

| Mistake | Why Wrong | Better |
|---|---|---|
| Same error for all | No actionability | Categorize by audience |
| Retry everything | Wasted resources | Only transient errors |
| Infinite retry | DoS self | Max attempts + backoff |
| Expose internal errors | Security risk | User-friendly messages |
| No context | Hard to debug | .context() everywhere |

Anti-Patterns

| Anti-Pattern | Why Bad | Better |
|---|---|---|
| String errors | No structure | thiserror types |
| panic! for recoverable | Bad UX | Result with context |
| Ignore errors | Silent failures | Log or propagate |
| Box<dyn Error> everywhere | Lost type info | thiserror |
| Error in happy path | Performance | Early validation |
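
As a small illustration of replacing string errors with typed ones and validating early, the sketch below uses only the standard library in place of thiserror. `EmailError`, `validate_email`, and the 254-character limit are assumptions for the example, and the check is deliberately simplistic rather than a complete email validator.

```rust
use std::fmt;

/// Typed validation error: callers can match on the exact cause.
#[derive(Debug, PartialEq, Eq)]
pub enum EmailError {
    Empty,
    MissingAt,
    TooLong { len: usize, max: usize },
}

impl fmt::Display for EmailError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Empty => write!(f, "email address is required"),
            Self::MissingAt => write!(f, "email address must contain '@'"),
            Self::TooLong { len, max } => {
                write!(f, "email address is {} characters, maximum is {}", len, max)
            }
        }
    }
}

impl std::error::Error for EmailError {}

/// Validates at the boundary so later code can assume a plausible address.
pub fn validate_email(input: &str) -> Result<&str, EmailError> {
    const MAX_LEN: usize = 254; // assumed limit for this sketch
    let trimmed = input.trim();
    if trimmed.is_empty() {
        return Err(EmailError::Empty);
    }
    if trimmed.len() > MAX_LEN {
        return Err(EmailError::TooLong { len: trimmed.len(), max: MAX_LEN });
    }
    if !trimmed.contains('@') {
        return Err(EmailError::MissingAt);
    }
    Ok(trimmed)
}
```

Because each variant is distinct, a handler can show the `Display` text to the user while tests and metrics match on the variant itself, which a plain `String` error would not allow.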

Related Skills

| When | See |
|---|---|
| Error handling basics | m06-error-handling |
| Retry implementation | m07-concurrency |
| Domain modeling | m09-domain |
| User-facing APIs | domain-* |
