SKILL.md
Root Cause Analysis
Table of Contents
- [Overview](#overview)
- [When to Use](#when-to-use)
- [Quick Start](#quick-start)
- [Reference Guides](#reference-guides)
- [Best Practices](#best-practices)
Overview
Root cause analysis (RCA) identifies the underlying reasons for a failure, so that the fix removes the cause permanently rather than patching the symptom.
When to Use
- Production incidents
- Customer-impacting issues
- Repeated problems
- Unexpected failures
- Performance degradation
Quick Start
A minimal worked example using the 5 Whys technique:
Example: Website Down
Symptom: Website returned 503 Service Unavailable
Why 1: Why was the website down?
Answer: Database connection pool exhausted
Why 2: Why was the connection pool exhausted?
Answer: Queries taking too long, connections not released
Why 3: Why were the queries slow?
Answer: Missing index on frequently queried column
Why 4: Why was the index missing?
Answer: Performance testing didn't use production-like data volume
Why 5: Why wasn't production-like data used?
Answer: Load testing environment doesn't mirror production
Root Cause: Load testing environment does not mirror production data volume
Solution: Update load testing environment with production-like data
Prevention: Establish environment parity requirements
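
The chain above can also be captured as structured data, which makes it easier to store next to an incident record or render into a report. The following is a minimal Python sketch, not something taken from the reference guides: the `WhyStep` and `FiveWhys` names, and the rule used by `is_complete`, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WhyStep:
    """One question/answer pair in the chain (hypothetical helper type)."""
    question: str
    answer: str


@dataclass
class FiveWhys:
    """A 5 Whys analysis: a symptom, a chain of whys, and the conclusions."""
    symptom: str
    steps: list[WhyStep] = field(default_factory=list)
    root_cause: str = ""
    solution: str = ""
    prevention: str = ""

    def ask(self, question: str, answer: str) -> "FiveWhys":
        # Append the next why; returning self allows calls to be chained.
        self.steps.append(WhyStep(question, answer))
        return self

    def is_complete(self) -> bool:
        # Assumed rule for this sketch: at least one why was asked and the
        # root cause, solution, and prevention fields are all filled in.
        return bool(self.steps) and all(
            (self.root_cause, self.solution, self.prevention)
        )

    def render(self) -> str:
        # Produce the same plain-text layout used in the example above.
        lines = [f"Symptom: {self.symptom}"]
        for number, step in enumerate(self.steps, start=1):
            lines.append(f"Why {number}: {step.question}")
            lines.append(f"Answer: {step.answer}")
        lines.append(f"Root Cause: {self.root_cause}")
        lines.append(f"Solution: {self.solution}")
        lines.append(f"Prevention: {self.prevention}")
        return "\n".join(lines)


rca = FiveWhys(symptom="Website returned 503 Service Unavailable")
rca.ask("Why was the website down?",
        "Database connection pool exhausted")
rca.ask("Why was the connection pool exhausted?",
        "Queries taking too long, connections not released")
rca.ask("Why were the queries slow?",
        "Missing index on frequently queried column")
rca.ask("Why was the index missing?",
        "Performance testing didn't use production-like data volume")
rca.ask("Why wasn't production-like data used?",
        "Load testing environment doesn't mirror production")
rca.root_cause = "Load testing environment does not mirror production data volume"
rca.solution = "Update load testing environment with production-like data"
rca.prevention = "Establish environment parity requirements"

print(rca.render())
```

Keeping each why as a separate record, rather than free text, means a reviewer can check that every answer is the subject of the next question before the analysis is accepted.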
Reference Guides
Detailed implementations in the references/ directory:
- The 5 Whys Technique
- Systematic RCA Process
- RCA Report Template
- Root Cause Analysis Techniques
- Follow-Up & Prevention
Best Practices
✅ DO
- Follow established patterns and conventions
- Write clean, maintainable code
- Add appropriate documentation
- Test thoroughly before deploying
❌ DON'T
- Skip testing or validation
- Ignore error handling
- Hard-code configuration values