monitoring-expert

Comprehensive monitoring, logging, metrics, tracing, and performance testing implementation for production systems. Covers structured logging (Pino/JSON), Prometheus metrics (counters, histograms, gauges), and OpenTelemetry distributed tracing with span instrumentation. Includes Prometheus alerting rule configuration, RED/USE dashboard design patterns, and health check endpoint setup. Provides load testing with k6 and Artillery, application profiling for CPU/memory bottlenecks, and capacity planning guidance. Enforces best practices: correlation IDs for request tracking, no sensitive data in logs, and alert thresholds limited to critical paths to prevent alert fatigue.

INSTALLATION
npx skills add https://github.com/jeffallan/claude-skills --skill monitoring-expert
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md


### Structured Logging (Node.js)

    // Good: structured fields, includes correlation ID
    logger.info({ requestId: req.id, userId: req.user.id, durationMs: elapsed }, 'order.created');

    // Bad: string interpolation, no correlation ID
    console.log(`Order created for user ${userId}`);
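The two logging practices named in the summary, correlation IDs and keeping sensitive data out of logs, can be sketched with Node.js built-ins alone. This is a minimal illustration, not the skill's own implementation: `withRequestId`, `redact`, `buildLogEntry`, `log`, and the `SENSITIVE_KEYS` list are hypothetical names, and in practice a logger such as Pino would replace `log`.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');
const { randomUUID } = require('node:crypto');

// Hypothetical deny-list; keys are matched exactly (case-sensitive, top level only).
const SENSITIVE_KEYS = new Set(['password', 'token', 'authorization', 'cardNumber']);

// Carries the correlation ID across async calls without passing it explicitly.
const requestContext = new AsyncLocalStorage();

// Run `fn` with a correlation ID bound to the current async context.
// A fresh UUID is generated when the caller does not supply one.
function withRequestId(requestId, fn) {
  return requestContext.run({ requestId: requestId ?? randomUUID() }, fn);
}

// Replace the values of sensitive top-level keys with a placeholder.
function redact(fields) {
  const out = {};
  for (const [key, value] of Object.entries(fields)) {
    out[key] = SENSITIVE_KEYS.has(key) ? '[REDACTED]' : value;
  }
  return out;
}

// Build one structured entry; requestId is read from the async context.
function buildLogEntry(level, fields, msg) {
  const store = requestContext.getStore();
  return {
    level,
    time: new Date().toISOString(),
    requestId: store?.requestId,
    ...redact(fields),
    msg,
  };
}

// Emit the entry as a single JSON line on stdout.
function log(level, fields, msg) {
  process.stdout.write(JSON.stringify(buildLogEntry(level, fields, msg)) + '\n');
}

// Usage: every log call inside the callback carries requestId "req-123".
withRequestId('req-123', () => {
  log('info', { userId: 42, password: 'hunter2' }, 'order.created');
});
```

In an HTTP server, `withRequestId` would typically wrap each request handler, taking the ID from an incoming header when one is present.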

### Prometheus Metrics (Node.js)

    import express from 'express';
    import { Counter, Histogram, register } from 'prom-client';

    const app = express();

    const httpRequests = new Counter({
      name: 'http_requests_total',
      help: 'Total HTTP requests',
      labelNames: ['method', 'route', 'status'],
    });

    const httpDuration = new Histogram({
      name: 'http_request_duration_seconds',
      help: 'HTTP request latency in seconds',
      labelNames: ['method', 'route'],
      buckets: [0.05, 0.1, 0.3, 0.5, 1, 2, 5],
    });

    // Instrument every request
    app.use((req, res, next) => {
      const end = httpDuration.startTimer();
      res.on('finish', () => {
        // Prefer the matched route pattern (e.g. /orders/:id) over the raw path
        // so that label cardinality stays bounded.
        const route = req.route?.path ?? req.path;
        httpRequests.inc({ method: req.method, route, status: res.statusCode });
        end({ method: req.method, route });
      });
      next();
    });

    // Expose scrape endpoint
    app.get('/metrics', async (req, res) => {
      res.set('Content-Type', register.contentType);
      res.end(await register.metrics());
    });
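To make the `buckets` setting above more concrete, the following sketch shows how cumulative histogram buckets lead to a percentile estimate. It is a simplified, standalone approximation of the linear interpolation that Prometheus's `histogram_quantile` applies to classic histograms, not prom-client or Prometheus code; `observeAll`, `estimateQuantile`, and the sample durations are hypothetical.

```javascript
// Same upper bounds (in seconds) as the histogram defined above.
const BUCKETS = [0.05, 0.1, 0.3, 0.5, 1, 2, 5];

// Returns cumulative counts per upper bound ("le"), plus a final +Inf bucket,
// mirroring how a Prometheus histogram is exposed.
function observeAll(durations, bounds = BUCKETS) {
  const upper = [...bounds, Infinity];
  const counts = upper.map(() => 0);
  for (const d of durations) {
    upper.forEach((le, i) => {
      if (d <= le) counts[i] += 1;
    });
  }
  return upper.map((le, i) => ({ le, count: counts[i] }));
}

// Estimates quantile q (0 < q <= 1) by linear interpolation inside the
// bucket that contains the q-th ranked observation.
function estimateQuantile(q, buckets) {
  if (!(q > 0 && q <= 1)) throw new RangeError('q must be in (0, 1]');
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  const i = buckets.findIndex((b) => b.count >= rank);
  // Observations beyond the last finite bound: report that bound.
  if (buckets[i].le === Infinity) return buckets[i - 1].le;
  const lower = i === 0 ? 0 : buckets[i - 1].le;
  const below = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - below;
  return lower + (buckets[i].le - lower) * ((rank - below) / inBucket);
}

// Ten hypothetical request durations in seconds.
const sampleDurations = [0.02, 0.04, 0.08, 0.2, 0.2, 0.4, 0.4, 0.7, 1.5, 3];
const sampleBuckets = observeAll(sampleDurations);
console.log(estimateQuantile(0.95, sampleBuckets)); // 3.5
```

The estimate of 3.5 s differs from the true slowest sample (3 s) because only bucket boundaries are stored. Placing bucket bounds close to the latency targets you care about keeps this error small.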


### OpenTelemetry Tracing (Node.js)

    import { NodeSDK } from '@opentelemetry/sdk-node';
    import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
    import { trace, SpanStatusCode } from '@opentelemetry/api';

    const sdk = new NodeSDK({
      traceExporter: new OTLPTraceExporter({ url: 'http://jaeger:4318/v1/traces' }),
    });
    sdk.start();

    // Manual span around a critical operation
    const tracer = trace.getTracer('order-service');

    // `db` stands in for the application's data-access layer.
    async function processOrder(orderId) {
      const span = tracer.startSpan('order.process');
      span.setAttribute('order.id', orderId);
      try {
        const result = await db.saveOrder(orderId);
        span.setStatus({ code: SpanStatusCode.OK });
        return result;
      } catch (err) {
        // Attach the exception to the span and mark the span as failed
        span.recordException(err);
        span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
        throw err;
      } finally {
        // Always end the span, on success and on failure
        span.end();
      }
    }
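The try/catch/finally pattern above tends to repeat around every traced operation, so one option is to factor it into a helper. The sketch below is an assumption-laden illustration: `withSpan` and `createRecordingTracer` are hypothetical names, and the recording tracer is only a stand-in that mimics the `startSpan`, `setAttribute`, `setStatus`, `recordException`, and `end` calls so the example runs without the OpenTelemetry SDK or a collector.

```javascript
// Numeric values mirror SpanStatusCode in @opentelemetry/api (UNSET = 0, OK = 1, ERROR = 2).
const STATUS = { UNSET: 0, OK: 1, ERROR: 2 };

// Runs `fn(span)` inside a span, records the outcome, and always ends the span.
async function withSpan(tracer, name, attributes, fn) {
  const span = tracer.startSpan(name);
  for (const [key, value] of Object.entries(attributes)) {
    span.setAttribute(key, value);
  }
  try {
    const result = await fn(span);
    span.setStatus({ code: STATUS.OK });
    return result;
  } catch (err) {
    span.recordException(err);
    span.setStatus({ code: STATUS.ERROR, message: err.message });
    throw err; // the caller still sees the original error
  } finally {
    span.end();
  }
}

// Hypothetical stand-in tracer that records what was called on each span.
function createRecordingTracer() {
  const spans = [];
  return {
    spans,
    startSpan(name) {
      const record = { name, attributes: {}, status: null, exceptions: [], ended: false };
      spans.push(record);
      return {
        setAttribute: (key, value) => { record.attributes[key] = value; },
        setStatus: (status) => { record.status = status; },
        recordException: (err) => { record.exceptions.push(err); },
        end: () => { record.ended = true; },
      };
    },
  };
}

// Usage with the stand-in; a real tracer from trace.getTracer() would be passed instead.
const demoTracer = createRecordingTracer();
withSpan(demoTracer, 'order.process', { 'order.id': 'o-1' }, async () => 'saved')
  .then(() => console.log(demoTracer.spans[0]));
```

With the real API, `tracer.startActiveSpan` is worth considering as well, since it also makes the span the parent of any spans created inside the callback.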


### Prometheus Alerting Rule

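    groups:
      - name: api.rules
        rules:
          - alert: HighErrorRate
            # Aggregate by route on both sides so the label sets match;
            # dividing the raw series would only pair 5xx series with themselves.
            expr: |
              sum by (route) (rate(http_requests_total{status=~"5.."}[5m]))
                / sum by (route) (rate(http_requests_total[5m])) > 0.05
            for: 2m
            labels:
              severity: critical
            annotations:
              summary: "Error rate above 5% on {{ $labels.route }}"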
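The effect of `for: 2m` is easiest to see in a small simulation: the alert is first pending when the expression becomes true, and fires only once it has stayed true for the full hold duration. The sketch below is a deliberate simplification and not Prometheus code: `alertStates` and the sample data are hypothetical, and the error ratio is taken between consecutive counter samples instead of a 5-minute `rate()` window.

```javascript
// Each sample: { t: seconds, errors: cumulative 5xx count, total: cumulative request count }.
// Returns one state per evaluation (starting from the second sample).
function alertStates(samples, { threshold = 0.05, forSeconds = 120 } = {}) {
  let activeSince = null; // time at which the condition first became true
  const states = [];
  for (let i = 1; i < samples.length; i++) {
    const prev = samples[i - 1];
    const cur = samples[i];
    // With no traffic this is 0/0 = NaN, and NaN > threshold is false.
    const ratio = (cur.errors - prev.errors) / (cur.total - prev.total);
    if (ratio > threshold) {
      if (activeSince === null) activeSince = cur.t;
      states.push(cur.t - activeSince >= forSeconds ? 'firing' : 'pending');
    } else {
      activeSince = null; // any false evaluation resets the hold timer
      states.push('inactive');
    }
  }
  return states;
}

// Hypothetical counters scraped every 60 seconds.
const exampleSamples = [
  { t: 0, errors: 0, total: 0 },
  { t: 60, errors: 1, total: 100 },   // 1% errors
  { t: 120, errors: 11, total: 200 }, // 10% errors, condition becomes true
  { t: 180, errors: 21, total: 300 }, // still 10%, held for 60 s
  { t: 240, errors: 31, total: 400 }, // held for 120 s
  { t: 300, errors: 32, total: 500 }, // back to 1%
];
console.log(alertStates(exampleSamples));
// [ 'inactive', 'pending', 'pending', 'firing', 'inactive' ]
```

A short spike that clears within two minutes never leaves the pending state, which is how the `for` clause helps limit alert fatigue.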


### k6 Load Test

    import http from 'k6/http';
    import { check, sleep } from 'k6';

    export const options = {
      stages: [
        { duration: '1m', target: 50 }, // ramp up
        { duration: '5m', target: 50 }, // sustained load
        { duration: '1m', target: 0 },  // ramp down
      ],
      thresholds: {
        http_req_duration: ['p(95)<500'], // 95th percentile < 500 ms
        http_req_failed: ['rate<0.01'],   // error rate < 1%
      },
    };

    export default function () {
      const res = http.get('https://api.example.com/orders');
      check(res, { 'status is 200': (r) => r.status === 200 });
      sleep(1);
    }
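For capacity planning, it helps to translate the virtual-user count in the script above into an approximate request rate. The sketch below applies Little's Law under the assumption that each virtual user loops through one request followed by the `sleep(1)` pause; `expectedRps`, `vusForTargetRps`, and the 250 ms response time are hypothetical values for illustration, not measurements.

```javascript
// Each virtual user completes one iteration every (response time + sleep) seconds,
// so steady-state throughput is roughly VUs / iteration time.
function expectedRps(vus, avgResponseSeconds, sleepSeconds) {
  return vus / (avgResponseSeconds + sleepSeconds);
}

// Inverse: how many virtual users are needed to reach a target request rate.
function vusForTargetRps(targetRps, avgResponseSeconds, sleepSeconds) {
  return Math.ceil(targetRps * (avgResponseSeconds + sleepSeconds));
}

// 50 VUs, assumed 250 ms average response time, 1 s sleep per iteration.
console.log(expectedRps(50, 0.25, 1));      // 40 requests per second
console.log(vusForTargetRps(200, 0.25, 1)); // 250 virtual users
```

Under these assumptions the sustained stage generates about 40 requests per second, so a test meant to validate a higher production rate needs either more virtual users or a shorter sleep.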
