server-management

Server management principles and decision-making. Process management, monitoring strategy, and scaling decisions. Teaches thinking, not commands.

INSTALLATION
npx skills add https://github.com/davila7/claude-code-templates --skill server-management
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Server Management

Server management principles for production operations.

Learn to THINK, not memorize commands.

1. Process Management Principles

Tool Selection

Scenario

Tool

Node.js app

PM2 (clustering, reload)

Any app

systemd (Linux native)

Containers

Docker/Podman

Orchestration

Kubernetes, Docker Swarm

Process Management Goals

Goal

What It Means

Restart on crash

Auto-recovery

Zero-downtime reload

No service interruption

Clustering

Use all CPU cores

Persistence

Survive server reboot

2. Monitoring Principles

What to Monitor

Category

Key Metrics

Availability

Uptime, health checks

Performance

Response time, throughput

Errors

Error rate, types

Resources

CPU, memory, disk

Alert Severity Strategy

Level

Response

Critical

Immediate action

Warning

Investigate soon

Info

Review daily

Monitoring Tool Selection

Need

Options

Simple/Free

PM2 metrics, htop

Full observability

Grafana, Datadog

Error tracking

Sentry

Uptime

UptimeRobot, Pingdom

3. Log Management Principles

Log Strategy

Log Type

Purpose

Application logs

Debug, audit

Access logs

Traffic analysis

Error logs

Issue detection

Log Principles

  • Rotate logs to prevent disk fill
  • Structured logging (JSON) for parsing
  • Appropriate levels (error/warn/info/debug)
  • No sensitive data in logs

4. Scaling Decisions

When to Scale

Symptom

Solution

High CPU

Add instances (horizontal)

High memory

Increase RAM or fix leak

Slow response

Profile first, then scale

Traffic spikes

Auto-scaling

Scaling Strategy

Type

When to Use

Vertical

Quick fix, single instance

Horizontal

Sustainable, distributed

Auto

Variable traffic

5. Health Check Principles

What Constitutes Healthy

Check

Meaning

HTTP 200

Service responding

Database connected

Data accessible

Dependencies OK

External services reachable

Resources OK

CPU/memory not exhausted

Health Check Implementation

  • Simple: Just return 200
  • Deep: Check all dependencies
  • Choose based on load balancer needs

6. Security Principles

Area

Principle

Access

SSH keys only, no passwords

Firewall

Only needed ports open

Updates

Regular security patches

Secrets

Environment vars, not files

Audit

Log access and changes

7. Troubleshooting Priority

When something's wrong:

  • Check if running (process status)
  • Check logs (error messages)
  • Check resources (disk, memory, CPU)
  • Check network (ports, DNS)
  • Check dependencies (database, APIs)

8. Anti-Patterns

❌ Don't

✅ Do

Run as root

Use non-root user

Ignore logs

Set up log rotation

Skip monitoring

Monitor from day one

Manual restarts

Auto-restart config

No backups

Regular backup schedule

Remember: A well-managed server is boring. That's the goal.

BrowserAct

Let your agent run on any real-world website

Bypass CAPTCHA & anti-bot for free. Start local, scale to cloud.

Explore BrowserAct Skills →

Stop writing automation&scrapers

Install the CLI. Run your first Skill in 30 seconds. Scale when you're ready.

Start free
free · no credit card