SKILL.md
AWS Observability
Overview
Domain expertise for AWS observability across metrics, logs, and traces. Covers CloudWatch platform capabilities (alarms, dashboards, Logs Insights, custom metrics, EMF), X-Ray trace analysis, CloudTrail operational auditing, and ADOT collector configuration.
Works best with the AWS MCP server, which enables running CLI commands, querying CloudWatch, and validating configurations directly. All guidance also works with standard AWS CLI access.
Note: Reference files contain specific runtime versions, quota values, and feature matrices that may change. When precision matters (e.g., deploying to production, choosing a runtime, or checking a quota), confirm values against current AWS documentation rather than relying solely on the values in these files.
Routing
| User need | Action |
|---|---|
| Writing Logs Insights queries | Read log-insights.md |
| Configuring alarms (metric, composite, anomaly) | Read alarms.md |
| Publishing custom metrics or using EMF | Read metrics.md |
| Setting up X-Ray tracing or ADOT | Read tracing.md |
| Building dashboards | Read dashboards.md |
| Debugging observability issues | Read troubleshooting.md (starts with the 5 most common fixes) |
| Debugging canary failures | Read synthetics.md (see the Common failures table) |
| CloudTrail operational auditing | Read cloudtrail.md |
| Setting up Lambda monitoring with CDK | Use alarm-template.ts as a starting point |
| Creating synthetic canaries | Read synthetics.md |
| Configuring ADOT collector | Use otel-config.yaml as a starting point |
| Spans multiple areas | Read the most specific reference first, then consult others as needed |
Files
| File | Content |
|---|---|
| alarms.md | Metric, composite, and anomaly detection alarms: configuration, constraints, recommended defaults |
| log-insights.md | Complete query syntax, commands, functions, known issues, reusable query library |
| metrics.md | Custom metrics, EMF spec, metric filters, high-resolution metrics, retention |
| tracing.md | X-Ray → ADOT migration, sampling rules, annotations vs metadata, collector config |
| dashboards.md | Widget types, cross-account/region, dynamic labels, sharing |
| troubleshooting.md | Error → cause → fix for all observability services |
| cloudtrail.md | Operational auditing, event types, S3+Athena queries |
| synthetics.md | Canary runtime/blueprint constraints, VPC networking, common failures |
| alarm-template.ts | Best-practice CDK Lambda monitoring (alarms + dashboard) |
| otel-config.yaml | ADOT collector config for X-Ray traces + CloudWatch EMF metrics |
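
For orientation before opening the reference files, here is a minimal sketch of the kind of log record that EMF (CloudWatch Embedded Metric Format) describes: a JSON object whose `_aws` block declares which top-level keys are metrics and which are dimensions. The helper name `emf_record`, the namespace `ExampleApp`, the `Service` dimension, and the `Latency` metric are all hypothetical, chosen only for illustration. Treat the EMF spec in the metrics reference file and current AWS documentation as authoritative for limits and optional fields.

```python
import json
import time


def emf_record(namespace, dimensions, metric_name, value, unit="Milliseconds"):
    """Build one EMF-style record as a dict (hypothetical helper, not an AWS API)."""
    return {
        "_aws": {
            # EMF timestamps are epoch milliseconds.
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # A list of dimension sets; here a single set holding every key.
                    "Dimensions": [list(dimensions)],
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
        },
        # Dimension values and the metric value sit at the top level of the record.
        **dimensions,
        metric_name: value,
    }


# Illustrative values only; none of these names come from the reference files.
record = emf_record("ExampleApp", {"Service": "checkout"}, "Latency", 42.5)
print(json.dumps(record))
```

Writing a record like this as a single line to a log stream that CloudWatch Logs ingests (for example, stdout in a Lambda function) is the usual way to publish the metric without a separate API call.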