incident-response

Run an incident response workflow: triage, communicate, and write a postmortem. Trigger with "we have an incident", "production is down", an alert that needs…

INSTALLATION

npx skills add https://github.com/anthropics/knowledge-work-plugins --skill incident-response

Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

/incident-response

Name: incident-response
Author: anthropics

If you see unfamiliar placeholders or need to check which tools are connected, see CONNECTORS.md.

Manage an incident from detection through postmortem.

Usage

/incident-response $ARGUMENTS

Modes

/incident-response new [description]     # Start a new incident

/incident-response update [status]       # Post a status update

/incident-response postmortem            # Generate postmortem from incident data

If no mode is specified, ask what phase the incident is in.
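The mode dispatch described above can be sketched as follows. This is only an illustration, not the skill's actual implementation: `parse_invocation`, `Invocation`, and `KNOWN_MODES` are hypothetical names, and the input is assumed to be whatever text follows `/incident-response`.

```python
from dataclasses import dataclass
from typing import Optional

# Modes listed in the usage section; anything else counts as "no mode".
KNOWN_MODES = ("new", "update", "postmortem")


@dataclass
class Invocation:
    mode: Optional[str]  # None means: ask which phase the incident is in
    detail: str          # free-text description or status, may be empty


def parse_invocation(arguments: str) -> Invocation:
    """Split the raw argument string into a mode and the remaining text."""
    head, _, rest = arguments.strip().partition(" ")
    if head.lower() in KNOWN_MODES:
        return Invocation(mode=head.lower(), detail=rest.strip())
    # No recognized mode: keep the full text so nothing the user typed is lost.
    return Invocation(mode=None, detail=arguments.strip())
```

When `mode` comes back as `None`, the caller would ask which phase the incident is in before doing anything else.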

How It Works

┌─────────────────────────────────────────────────────────────────┐
│                    INCIDENT RESPONSE                            │
├─────────────────────────────────────────────────────────────────┤
│  Phase 1: TRIAGE                                                │
│  ✓ Assess severity (SEV1-4)                                     │
│  ✓ Identify affected systems and users                          │
│  ✓ Assign roles (IC, comms, responders)                         │
│                                                                 │
│  Phase 2: COMMUNICATE                                           │
│  ✓ Draft internal status update                                 │
│  ✓ Draft customer communication (if needed)                     │
│  ✓ Set up war room and cadence                                  │
│                                                                 │
│  Phase 3: MITIGATE                                              │
│  ✓ Document mitigation steps taken                              │
│  ✓ Track timeline of events                                     │
│  ✓ Confirm resolution                                           │
│                                                                 │
│  Phase 4: POSTMORTEM                                            │
│  ✓ Blameless postmortem document                                │
│  ✓ Timeline reconstruction                                      │
│  ✓ Root cause analysis (5 whys)                                 │
│  ✓ Action items with owners                                     │
└─────────────────────────────────────────────────────────────────┘

Severity Classification

| Level | Criteria | Response Time |
|-------|----------|---------------|
| SEV1 | Service down, all users affected | Immediate, all-hands |
| SEV2 | Major feature degraded, many users affected | Within 15 min |
| SEV3 | Minor feature issue, some users affected | Within 1 hour |
| SEV4 | Cosmetic or low-impact issue | Next business day |
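The severity table above can be expressed as a small lookup. The sketch below is an assumption about how one might encode it: the `Severity` type, the `classify` function, and its two inputs (`service_down`, `degraded_feature`) are hypothetical simplifications of the criteria column, which in practice needs human judgment.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Severity:
    level: str
    criteria: str
    response_time: str
    # Response window as a duration. None where the table gives no fixed
    # duration ("Next business day" depends on the team's calendar).
    respond_within: Optional[timedelta]


SEVERITIES = {
    "SEV1": Severity("SEV1", "Service down, all users affected",
                     "Immediate, all-hands", timedelta(0)),
    "SEV2": Severity("SEV2", "Major feature degraded, many users affected",
                     "Within 15 min", timedelta(minutes=15)),
    "SEV3": Severity("SEV3", "Minor feature issue, some users affected",
                     "Within 1 hour", timedelta(hours=1)),
    "SEV4": Severity("SEV4", "Cosmetic or low-impact issue",
                     "Next business day", None),
}


def classify(service_down: bool, degraded_feature: Optional[str] = None) -> Severity:
    """Pick a level from two coarse signals (a hypothetical simplification).

    degraded_feature is "major", "minor", or None for cosmetic/low impact.
    """
    if service_down:
        return SEVERITIES["SEV1"]
    if degraded_feature == "major":
        return SEVERITIES["SEV2"]
    if degraded_feature == "minor":
        return SEVERITIES["SEV3"]
    return SEVERITIES["SEV4"]
```

Keeping the table text alongside the durations means the same structure can drive both a human-readable status update and a response-deadline reminder.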

Communication Guidance

Provide clear, factual updates at a regular cadence. Each update should state what is happening, who is affected, what we are doing, and when the next update will arrive.

Output — Status Update

## Incident Update: [Title]

**Severity:** SEV[1-4] | **Status:** Investigating | Identified | Monitoring | Resolved

**Impact:** [Who/what is affected]

**Last Updated:** [Timestamp]

### Current Status

[What we know now]

### Actions Taken

- [Action 1]

- [Action 2]

### Next Steps

- [What's happening next and ETA]

### Timeline

| Time | Event |

|------|-------|

| [HH:MM] | [Event] |
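One way to fill in the status update template above programmatically is sketched below. The `StatusUpdate` type and `render_status_update` function are hypothetical names introduced for illustration; the field set simply mirrors the template's placeholders, and the timestamp is assumed to arrive already formatted.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Status values taken from the template's Status line.
STATUSES = ("Investigating", "Identified", "Monitoring", "Resolved")


@dataclass
class StatusUpdate:
    title: str
    severity: int          # 1-4
    status: str            # one of STATUSES
    impact: str
    last_updated: str      # timestamp, already formatted
    current_status: str
    actions_taken: List[str] = field(default_factory=list)
    next_steps: List[str] = field(default_factory=list)
    timeline: List[Tuple[str, str]] = field(default_factory=list)  # (HH:MM, event)


def render_status_update(u: StatusUpdate) -> str:
    """Render the update as markdown, rejecting values the template does not allow."""
    if u.severity not in (1, 2, 3, 4):
        raise ValueError(f"severity must be 1-4, got {u.severity}")
    if u.status not in STATUSES:
        raise ValueError(f"unknown status: {u.status!r}")
    lines = [
        f"## Incident Update: {u.title}",
        f"**Severity:** SEV{u.severity} | **Status:** {u.status}",
        f"**Impact:** {u.impact}",
        f"**Last Updated:** {u.last_updated}",
        "",
        "### Current Status",
        u.current_status,
        "",
        "### Actions Taken",
        *[f"- {a}" for a in u.actions_taken],
        "",
        "### Next Steps",
        *[f"- {s}" for s in u.next_steps],
        "",
        "### Timeline",
        "| Time | Event |",
        "|------|-------|",
        *[f"| {t} | {e} |" for t, e in u.timeline],
    ]
    return "\n".join(lines)
```

Validating severity and status up front keeps a malformed update from reaching the incident channel.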

Output — Postmortem

## Postmortem: [Incident Title]

**Date:** [Date] | **Duration:** [X hours] | **Severity:** SEV[X]

**Authors:** [Names] | **Status:** Draft

### Summary

[2-3 sentence plain-language summary]

### Impact

- [Users affected]

- [Duration of impact]

- [Business impact if quantifiable]

### Timeline

| Time (UTC) | Event |

|------------|-------|

| [HH:MM] | [Event] |

### Root Cause

[Detailed explanation of what caused the incident]

### 5 Whys

1. Why did [symptom]? → [Because...]

2. Why did [cause 1]? → [Because...]

3. Why did [cause 2]? → [Because...]

4. Why did [cause 3]? → [Because...]

5. Why did [cause 4]? → [Root cause]

### What Went Well

- [Things that worked]

### What Went Poorly

- [Things that didn't work]

### Action Items

| Action | Owner | Priority | Due Date |

|--------|-------|----------|----------|

| [Action] | [Person] | P0/P1/P2 | [Date] |

### Lessons Learned

[Key takeaways for the team]
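The Action Items table in the postmortem template above requires an owner and a priority for every row. A minimal sketch of enforcing that while rendering the table follows; `ActionItem` and `render_action_items` are hypothetical names, and the due date is treated as an opaque, pre-formatted string.

```python
from dataclasses import dataclass
from typing import List

# Priority values taken from the template's Action Items row.
PRIORITIES = ("P0", "P1", "P2")


@dataclass
class ActionItem:
    action: str
    owner: str
    priority: str
    due_date: str


def render_action_items(items: List[ActionItem]) -> str:
    """Render the Action Items table; reject rows with no owner or an unknown priority."""
    rows = [
        "| Action | Owner | Priority | Due Date |",
        "|--------|-------|----------|----------|",
    ]
    for item in items:
        if not item.owner.strip():
            raise ValueError(f"action item has no owner: {item.action!r}")
        if item.priority not in PRIORITIES:
            raise ValueError(
                f"priority must be one of {PRIORITIES}, got {item.priority!r}"
            )
        rows.append(
            f"| {item.action} | {item.owner} | {item.priority} | {item.due_date} |"
        )
    return "\n".join(rows)
```

Failing loudly on an unowned item fits the workflow's "Action items with owners" step: an item nobody owns tends not to get done.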

If Connectors Available

If ~~monitoring is connected:

- Pull alert details and metrics
- Show graphs of affected metrics

If ~~incident management is connected:

- Create or update incident in PagerDuty/Opsgenie
- Page on-call responders

If ~~chat is connected:

- Post status updates to incident channel
- Create war room channel
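The connector-dependent behavior above amounts to a lookup from connected categories to optional actions. The sketch below assumes the set of connected categories is already known; `CONNECTOR_ACTIONS` and `available_actions` are hypothetical names, and no real connector API is called.

```python
from typing import Dict, Iterable, List

# Optional actions per connector category, copied from the list above.
CONNECTOR_ACTIONS: Dict[str, List[str]] = {
    "monitoring": [
        "Pull alert details and metrics",
        "Show graphs of affected metrics",
    ],
    "incident management": [
        "Create or update incident in PagerDuty/Opsgenie",
        "Page on-call responders",
    ],
    "chat": [
        "Post status updates to incident channel",
        "Create war room channel",
    ],
}


def available_actions(connected: Iterable[str]) -> List[str]:
    """Return the optional actions enabled by the connected categories.

    Actions come back in the order the categories are listed above,
    regardless of the order in which `connected` names them.
    """
    connected_set = {c.strip().lower() for c in connected}
    actions: List[str] = []
    for category, category_actions in CONNECTOR_ACTIONS.items():
        if category in connected_set:
            actions.extend(category_actions)
    return actions
```

With nothing connected the function returns an empty list, which matches the skill's fallback of producing the drafts without any tool calls.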

Tips

- Start writing immediately: don't wait for complete information. Update as you learn more.
- Keep updates factual: what we know, what we've done, what's next. No speculation.
- Postmortems are blameless: focus on systems and processes, not individuals.