SKILL.md
Secrets Vault Manager
Tier: POWERFUL
Category: Engineering
Domain: Security / Infrastructure / DevOps
Overview
Production secret infrastructure management for teams running HashiCorp Vault, cloud-native secret stores, or hybrid architectures. This skill covers policy authoring, auth method configuration, automated rotation, dynamic secrets, audit logging, and incident response.
Distinct from env-secrets-manager, which handles local .env file hygiene and leak detection. This skill operates at the infrastructure layer: Vault clusters, cloud KMS, certificate authorities, and CI/CD secret injection.
When to Use
- Standing up a new Vault cluster or migrating to a managed secret store
- Designing auth methods for services, CI runners, and human operators
- Implementing automated credential rotation (database, API keys, certificates)
- Auditing secret access patterns for compliance (SOC 2, ISO 27001, HIPAA)
- Responding to a secret leak that requires mass revocation
- Integrating secrets into Kubernetes workloads or CI/CD pipelines
HashiCorp Vault Patterns
Architecture Decisions
| Decision | Recommendation | Rationale |
| --- | --- | --- |
| Deployment mode | HA with Raft storage | No external dependency, built-in leader election |
| Auto-unseal | Cloud KMS (AWS KMS / Azure Key Vault / GCP KMS) | Eliminates manual unseal, enables automated restarts |
| Namespaces | One per environment (dev/staging/prod) | Blast-radius isolation, independent policies |
| Audit devices | File + syslog (dual) | Vault refuses requests if all audit devices fail; a second device prevents outages |
Auth Methods
AppRole — Machine-to-machine authentication for services and batch jobs.
# Enable the AppRole auth method
vault auth enable approle

# Policy for operators who manage AppRole roles (HCL, not a CLI command)
path "auth/approle/*" {
  capabilities = ["create", "read", "update", "delete", "list"]
}

# Application-specific role
vault write auth/approle/role/payment-service \
    token_ttl=1h \
    token_max_ttl=4h \
    secret_id_num_uses=1 \
    secret_id_ttl=10m \
    token_policies="payment-service-read"
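To make the login step concrete, here is a minimal sketch of how a service might exchange its role ID and secret ID for a token over Vault's HTTP API. It assumes the auth method is mounted at the default approle path; the function names are illustrative, and the request is only built, not sent.

```python
import json
import urllib.request


def build_approle_login(vault_addr: str, role_id: str, secret_id: str) -> urllib.request.Request:
    """Build (but do not send) the AppRole login request.

    Assumes the auth method is mounted at the default "approle" path.
    """
    body = json.dumps({"role_id": role_id, "secret_id": secret_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/approle/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_token(login_response: dict) -> tuple[str, int]:
    """Return (client_token, lease_duration_seconds) from a parsed login response."""
    auth = login_response["auth"]
    return auth["client_token"], auth["lease_duration"]
```

Because the role above sets secret_id_num_uses=1, each secret ID works for a single login; the caller would need a fresh secret ID (for example, delivered by the deployment pipeline) for the next one.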
Kubernetes — Pod-native authentication via service account tokens.
vault write auth/kubernetes/role/api-server \
    bound_service_account_names=api-server \
    bound_service_account_namespaces=production \
    token_policies=api-server-secrets \
    token_ttl=1h
OIDC — Human operator access via SSO provider (Okta, Azure AD, Google Workspace).
vault write auth/oidc/role/engineering \
    bound_audiences="vault" \
    allowed_redirect_uris="https://vault.example.com/ui/vault/auth/oidc/oidc/callback" \
    user_claim="email" \
    oidc_scopes="openid,profile,email" \
    token_policies="engineering-read" \
    token_ttl=8h
Secret Engines
| Engine | Use Case | TTL Strategy |
| --- | --- | --- |
| KV v2 | Static secrets (API keys, config) | Versioned, manual rotation |
| Database | Dynamic DB credentials | 1h default, 24h max |
| PKI | TLS certificates | 90d leaf certs, 5y intermediate CA |
| Transit | Encryption-as-a-service | Key rotation every 90d |
| SSH | Signed SSH certificates | 30m for interactive, 8h for automation |
Policy Design
Follow least-privilege with path-based granularity:
# payment-service-read policy
path "secret/data/production/payment/*" {
  capabilities = ["read"]
}

path "database/creds/payment-readonly" {
  capabilities = ["read"]
}

# Deny access to admin paths explicitly
path "sys/*" {
  capabilities = ["deny"]
}
Policy naming convention: {service}-{access-level} (e.g., payment-service-read, api-gateway-admin).
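As an illustration of the naming convention and path-based granularity, the following sketch renders a policy document from a mapping of paths to capabilities. The helper names are hypothetical and are not the interface of vault_config_generator.py; the wildcard check is an assumption about what a generator might reasonably refuse.

```python
def policy_name(service: str, access_level: str) -> str:
    """Apply the {service}-{access-level} naming convention."""
    return f"{service}-{access_level}"


def render_policy(paths: dict[str, list[str]]) -> str:
    """Render Vault policy HCL from a {path: [capabilities]} mapping."""
    blocks = []
    for path, capabilities in paths.items():
        if path.strip() == "*":
            # A bare wildcard grants access to everything; refuse to emit it.
            raise ValueError("refusing to render a wildcard policy")
        cap_list = ", ".join(f'"{cap}"' for cap in capabilities)
        blocks.append(f'path "{path}" {{\n  capabilities = [{cap_list}]\n}}')
    return "\n\n".join(blocks) + "\n"
```

For example, render_policy({"secret/data/production/payment/*": ["read"]}) produces the first block of the payment-service-read policy shown above.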
Cloud Secret Store Integration
Comparison Matrix
| Feature | AWS Secrets Manager | Azure Key Vault | GCP Secret Manager |
| --- | --- | --- | --- |
| Rotation | Built-in Lambda | Custom logic via Functions | Cloud Functions |
| Versioning | Automatic | Manual or automatic | Automatic |
| Encryption | AWS KMS (default or CMK) | HSM-backed | Google-managed or CMEK |
| Access control | IAM policies + resource policy | RBAC + Access Policies | IAM bindings |
| Cross-region | Replication supported | Geo-redundant by default | Replication supported |
| Audit | CloudTrail | Azure Monitor + Diagnostic Logs | Cloud Audit Logs |
| Pricing model | Per-secret + per-API call | Per-operation + per-key | Per-secret version + per-access |
When to Use Which
- AWS Secrets Manager: RDS/Aurora credential rotation out of the box. Best when fully on AWS.
- Azure Key Vault: Strong certificate management. Natural fit for workloads that authenticate through Azure AD (Microsoft Entra ID).
- GCP Secret Manager: Simplest API surface. Best for GKE-native workloads with Workload Identity.
- HashiCorp Vault: Multi-cloud, dynamic secrets, PKI, transit encryption. Best for complex or hybrid environments.
SDK Access Patterns
Principle: Always fetch secrets at startup or via sidecar — never bake into images or config files.
# AWS Secrets Manager pattern
import json

import boto3


def get_secret(secret_name, region="us-east-1"):
    client = boto3.client("secretsmanager", region_name=region)
    response = client.get_secret_value(SecretId=secret_name)
    # Assumes the secret value is stored as a JSON document
    return json.loads(response["SecretString"])


# GCP Secret Manager pattern
from google.cloud import secretmanager


def get_secret(project_id, secret_id, version="latest"):
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_id}/versions/{version}"
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("UTF-8")


# Azure Key Vault pattern
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient


def get_secret(vault_url, secret_name):
    credential = DefaultAzureCredential()
    client = SecretClient(vault_url=vault_url, credential=credential)
    return client.get_secret(secret_name).value
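Fetching at startup usually goes together with a bounded cache, so that rotated values are picked up without calling the secret store on every request. The following is a minimal sketch of that idea; CachedSecret is a hypothetical helper, and the fetch argument can be any of the get_secret functions above wrapped in a lambda.

```python
import time
from typing import Callable


class CachedSecret:
    """Cache a secret in memory and re-fetch it once the TTL has elapsed."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._value = None
        self._expires_at = float("-inf")

    def get(self) -> str:
        now = self._clock()
        if now >= self._expires_at:
            # First call or cache expired: go back to the secret store.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

A typical use would be db_password = CachedSecret(lambda: get_secret("prod/db")["password"], ttl_seconds=300), where the secret name and TTL are placeholders. Keeping the TTL shorter than the rotation grace period lets running instances pick up a new value before the old one is invalidated.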
Secret Rotation Workflows
Rotation Strategy by Secret Type
| Secret Type | Rotation Frequency | Method | Downtime Risk |
| --- | --- | --- | --- |
| Database passwords | 30 days | Dual-account swap | Zero (A/B rotation) |
| API keys | 90 days | Generate new, deprecate old | Zero (overlap window) |
| TLS certificates | 60 days before expiry | ACME or Vault PKI | Zero (graceful reload) |
| SSH keys | 90 days | Vault-signed certificates | Zero (CA-based) |
| Service tokens | 24 hours | Dynamic generation | Zero (short-lived) |
| Encryption keys | 90 days | Key versioning (rewrap) | Zero (version coexistence) |
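The fixed-interval rows of this table translate directly into a due-date check. The sketch below is an assumption about how such a check could look, not the actual logic of rotation_planner.py; TLS certificates are left out because their schedule is relative to expiry rather than to the last rotation.

```python
from datetime import date, timedelta

# Rotation intervals in days, taken from the table above.
ROTATION_DAYS = {
    "database_password": 30,
    "api_key": 90,
    "ssh_key": 90,
    "service_token": 1,
    "encryption_key": 90,
}


def next_rotation(secret_type: str, last_rotated: date) -> date:
    """Return the date by which the secret should be rotated again."""
    return last_rotated + timedelta(days=ROTATION_DAYS[secret_type])


def is_overdue(secret_type: str, last_rotated: date, today: date) -> bool:
    """True once the rotation deadline has passed."""
    return today > next_rotation(secret_type, last_rotated)
```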
Database Credential Rotation (Dual-Account)
- Two database accounts exist: app_user_a and app_user_b
- The application currently uses app_user_a
- The rotation job changes the app_user_b password and updates the secret store
- The application switches to app_user_b on its next credential fetch
- After a grace period, the app_user_a password is rotated
- The cycle repeats
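The swap can be sketched as a small state machine. This is an in-memory illustration under stated assumptions: the passwords dict stands in for the database, store stands in for the secret store, and DualAccountRotator is a hypothetical name. The grace period is modeled by leaving the previously active account untouched until the next rotate() call.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class DualAccountRotator:
    passwords: dict[str, str]                 # account -> password (stand-in for the DB)
    active: str = "app_user_a"                # account the application currently uses
    store: dict[str, str] = field(default_factory=dict)  # stand-in for the secret store

    def standby(self) -> str:
        return "app_user_b" if self.active == "app_user_a" else "app_user_a"

    def rotate(self) -> str:
        """Rotate the standby account and publish it as the new active one."""
        target = self.standby()
        new_password = secrets.token_urlsafe(24)
        self.passwords[target] = new_password  # ALTER ROLE ... PASSWORD in a real database
        self.store = {"username": target, "password": new_password}
        self.active = target
        return target
```

Instances still holding the old account's credentials keep working between rotations, which is what gives the pattern its zero-downtime property.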
API Key Rotation (Overlap Window)
- Generate a new API key with the provider
- Store the new key in the secret store as current and move the old key to previous
- Deploy applications; they read current
- After all instances have restarted (or the TTL has expired), confirm through monitoring that the old key has zero usage
- Revoke previous
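A minimal sketch of the overlap window, using a plain dict with current and previous slots in place of a real secret store. The function names and the usage counter argument are illustrative assumptions.

```python
from typing import Optional


def promote_key(store: dict, new_key: str) -> dict:
    """Make new_key current and keep the old current key as previous."""
    return {"current": new_key, "previous": store.get("current")}


def is_accepted(store: dict, presented_key: Optional[str]) -> bool:
    """During the overlap window both current and previous are valid."""
    if presented_key is None:
        return False
    return presented_key in (store.get("current"), store.get("previous"))


def revoke_previous(store: dict, old_key_requests: int) -> dict:
    """Drop the previous key, but only once monitoring shows it is unused."""
    if old_key_requests > 0:
        raise RuntimeError("old key still in use; extend the overlap window")
    return {"current": store["current"], "previous": None}
```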
Dynamic Secrets
Dynamic secrets are generated on-demand with automatic expiration. Prefer dynamic secrets over static credentials wherever possible.
Database Dynamic Credentials (Vault)
# Configure database engine
vault write database/config/postgres \
    plugin_name=postgresql-database-plugin \
    connection_url="postgresql://{{username}}:{{password}}@db.example.com:5432/app" \
    allowed_roles="app-readonly,app-readwrite" \
    username="vault_admin" \
    password="<admin-password>"

# Create role with TTL
vault write database/roles/app-readonly \
    db_name=postgres \
    creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
    default_ttl=1h \
    max_ttl=24h
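On the consuming side, a read of database/creds/app-readonly returns a lease with a generated username and password. The sketch below shows one way an application might turn that response into a connection string and decide when to fetch fresh credentials. The two-thirds refresh point is an assumption, not a Vault default, and the function names are hypothetical.

```python
from urllib.parse import quote


def build_dsn(creds_response: dict, host: str, dbname: str, port: int = 5432) -> str:
    """Build a PostgreSQL DSN from a parsed database/creds/<role> response."""
    data = creds_response["data"]
    # Percent-encode so that generated passwords with special characters stay valid.
    user = quote(data["username"], safe="")
    password = quote(data["password"], safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{dbname}"


def refresh_after_seconds(creds_response: dict) -> int:
    """Refresh at two thirds of the lease so credentials never expire in use."""
    return creds_response["lease_duration"] * 2 // 3
```

With the role above (default_ttl=1h), lease_duration is 3600 seconds, so this sketch would fetch new credentials after 2400 seconds.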
Cloud IAM Dynamic Credentials
Vault can generate short-lived AWS IAM credentials, Azure service principal passwords, or GCP service account keys, so applications no longer need long-lived cloud credentials in their configuration.
SSH Certificate Authority
Replace SSH key distribution with a Vault-signed certificate model:
- Vault acts as the SSH CA
- Users and machines request signed certificates with a short TTL (30 min)
- SSH servers trust the CA public key, so no authorized_keys management is needed
- Certificates expire automatically, so no revocation is needed for normal operations
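The request in the second step can be sketched as a payload for the SSH secrets engine's sign endpoint. The mount name ssh-client-signer and the helper names are assumptions for illustration; the payload is only constructed here, not sent.

```python
def ssh_sign_path(role: str, mount: str = "ssh-client-signer") -> str:
    """API path for signing a key; the mount name is an assumed example."""
    return f"/v1/{mount}/sign/{role}"


def ssh_sign_request(public_key: str, principal: str, ttl: str = "30m") -> dict:
    """Payload asking Vault to sign a user's public key with a short TTL."""
    return {
        "public_key": public_key.strip(),
        "valid_principals": principal,   # login name the certificate is valid for
        "cert_type": "user",
        "ttl": ttl,
    }
```

The signed certificate comes back in the response's data.signed_key field and is presented to the SSH server alongside the private key.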
Audit Logging
What to Log
| Event | Priority | Retention |
| --- | --- | --- |
| Secret read access | HIGH | 1 year minimum |
| Secret creation/update | HIGH | 1 year minimum |
| Auth method login | MEDIUM | 90 days |
| Policy changes | CRITICAL | 2 years (compliance) |
| Failed access attempts | CRITICAL | 1 year |
| Token creation/revocation | MEDIUM | 90 days |
| Seal/unseal operations | CRITICAL | Indefinite |
Anomaly Detection Signals
- Secret accessed from new IP/CIDR range
- Access volume spike (>3x baseline for a path)
- Off-hours access for human auth methods
- Service accessing secrets outside its policy scope (denied requests)
- Multiple failed auth attempts from single source
- Token created with unusually long TTL
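Two of these signals are simple enough to sketch directly. The example assumes audit events have already been flattened into dicts with path, remote_address, and error keys, which is a simplification of the real Vault audit record; the failure threshold of 5 is an arbitrary illustrative default.

```python
from collections import Counter


def volume_spikes(baseline: dict[str, float], observed: dict[str, int],
                  factor: float = 3.0) -> list[str]:
    """Paths whose access count exceeds factor x their baseline."""
    return sorted(
        path for path, count in observed.items()
        if path in baseline and count > factor * baseline[path]
    )


def repeated_auth_failures(events: list[dict], threshold: int = 5) -> list[str]:
    """Source addresses with at least `threshold` failed auth requests."""
    failures = Counter(
        e["remote_address"] for e in events
        if e.get("path", "").startswith("auth/") and e.get("error")
    )
    return sorted(addr for addr, count in failures.items() if count >= threshold)
```

Paths with no baseline are skipped by volume_spikes; a real analyzer would probably report first-time access as its own signal.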
Compliance Reporting
Generate periodic reports covering:
- Access inventory — Which identities accessed which secrets, when
- Rotation compliance — Secrets overdue for rotation
- Policy drift — Policies modified since last review
- Orphaned secrets — Secrets with no recent access (>90 days)
Use audit_log_analyzer.py to parse Vault or cloud audit logs for these signals.
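As a rough illustration of the orphaned-secrets report, the sketch below reduces a stream of (path, timestamp) access events to the last access per path and flags anything idle for more than 90 days. It is not the implementation of audit_log_analyzer.py; the event shape and function names are assumptions.

```python
from datetime import datetime, timedelta


def last_access(events: list[tuple[str, datetime]]) -> dict[str, datetime]:
    """Most recent access time per secret path."""
    latest: dict[str, datetime] = {}
    for path, seen_at in events:
        if path not in latest or seen_at > latest[path]:
            latest[path] = seen_at
    return latest


def orphaned_secrets(latest: dict[str, datetime], now: datetime,
                     max_idle_days: int = 90) -> list[str]:
    """Paths with no access in the last max_idle_days days."""
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(path for path, seen_at in latest.items() if seen_at < cutoff)
```

Secrets that never appear in the audit log at all need to come from a separate inventory, since this approach only sees paths that were accessed at least once.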
Emergency Procedures
Secret Leak Response (Immediate)
Time target: Contain within 15 minutes of detection.
- Identify scope — Which secret(s) leaked, where (repo, log, error message, third party)
- Revoke immediately — Rotate the compromised credential at the source (provider API, Vault, cloud SM)
- Invalidate tokens — Revoke all Vault tokens that accessed the leaked secret
- Audit blast radius — Query audit logs for usage of the compromised secret in the exposure window
- Notify stakeholders — Security team, affected service owners, compliance (if PII/regulated data)
- Post-mortem — Document root cause, update controls to prevent recurrence
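The blast-radius step amounts to filtering audit events by path and time window and grouping them by identity. A minimal sketch, assuming events are dicts with path, identity, and time keys (a simplified, hypothetical shape rather than the raw audit format):

```python
from collections import Counter
from datetime import datetime


def blast_radius(events: list[dict], leaked_path: str,
                 window_start: datetime, window_end: datetime) -> dict[str, int]:
    """Count accesses to the leaked path per identity inside the exposure window."""
    hits: Counter = Counter()
    for event in events:
        in_window = window_start <= event["time"] <= window_end
        if event["path"] == leaked_path and in_window:
            hits[event["identity"]] += 1
    return dict(hits)
```

The resulting identities are the candidates for token revocation and for follow-up with service owners.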
Vault Seal Operations
When to seal: Active security incident affecting Vault infrastructure, suspected key compromise.
Sealing stops all Vault operations. Use only as last resort.
Unseal procedure:
- Gather quorum of unseal key holders (Shamir threshold)
- Or confirm auto-unseal KMS key is accessible
- Unseal via vault operator unseal or restart with auto-unseal
- Verify audit devices reconnected
- Check active leases and token validity
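During a Shamir unseal it helps to know how many key shares are still missing. The sketch below summarizes a parsed response from the sys/seal-status endpoint, using its sealed, t (threshold), n (total shares), and progress fields; the function name and message format are illustrative.

```python
def unseal_progress(seal_status: dict) -> str:
    """Summarize how far an unseal has progressed."""
    if not seal_status["sealed"]:
        return "unsealed"
    threshold = seal_status["t"]
    remaining = threshold - seal_status["progress"]
    return (
        f"sealed: {remaining} more key share(s) needed "
        f"(threshold {threshold} of {seal_status['n']})"
    )
```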
See references/emergency_procedures.md for complete playbooks.
CI/CD Integration
Vault Agent Sidecar (Kubernetes)
Vault Agent runs alongside application pods, handles authentication and secret rendering:
# Pod annotation for Vault Agent Injector
annotations:
  vault.hashicorp.com/agent-inject: "true"
  vault.hashicorp.com/role: "api-server"
  vault.hashicorp.com/agent-inject-secret-db: "database/creds/app-readonly"
  vault.hashicorp.com/agent-inject-template-db: |
    {{- with secret "database/creds/app-readonly" -}}
    postgresql://{{ .Data.username }}:{{ .Data.password }}@db:5432/app
    {{- end }}
External Secrets Operator (Kubernetes)
For teams preferring declarative GitOps over agent sidecars:
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: api-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: api-credentials
  data:
    - secretKey: api-key
      remoteRef:
        key: secret/data/production/api
        property: key
GitHub Actions OIDC
Eliminate long-lived secrets in CI by using OIDC federation (the workflow job needs permissions: id-token: write so that GitHub issues the OIDC token):
- name: Authenticate to Vault
  uses: hashicorp/vault-action@v2
  with:
    url: https://vault.example.com
    method: jwt
    role: github-ci
    jwtGithubAudience: https://vault.example.com
    secrets: |
      secret/data/ci/deploy api_key | DEPLOY_API_KEY ;
      secret/data/ci/deploy db_password | DB_PASSWORD
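The workflow step relies on a matching JWT auth configuration and role on the Vault side. As a hedged sketch, the payloads for the JWT auth mount's config and role endpoints might look like the following. Binding on the sub claim, the 15-minute TTL, and the helper name are assumptions; the org, repo, and policy names in the usage are placeholders.

```python
# Payload for the JWT auth mount's config endpoint (GitHub's OIDC issuer).
JWT_AUTH_CONFIG = {
    "oidc_discovery_url": "https://token.actions.githubusercontent.com",
    "bound_issuer": "https://token.actions.githubusercontent.com",
}


def github_jwt_role(org: str, repo: str, branch: str,
                    policies: list[str], audience: str) -> dict:
    """Role payload that only accepts tokens from one repo and branch."""
    return {
        "role_type": "jwt",
        "user_claim": "sub",
        "bound_audiences": [audience],
        # GitHub's sub claim for branch-triggered runs: repo:<org>/<repo>:ref:refs/heads/<branch>
        "bound_claims": {"sub": f"repo:{org}/{repo}:ref:refs/heads/{branch}"},
        "token_policies": policies,
        "token_ttl": "15m",   # assumed value; keep CI tokens short-lived
    }
```

The audience passed here should match jwtGithubAudience in the workflow step. Binding to a specific repository and branch keeps other repositories in the same GitHub organization from assuming the role.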
Anti-Patterns
| Anti-Pattern | Risk | Correct Approach |
| --- | --- | --- |
| Hardcoded secrets in source code | Leak via repo, logs, error output | Fetch from secret store at runtime |
| Long-lived static tokens (>30 days) | Stale credentials, no accountability | Dynamic secrets or short TTL + rotation |
| Shared service accounts | No audit trail per consumer | Per-service identity with unique credentials |
| No rotation policy | Compromised creds persist indefinitely | Automated rotation on schedule |
| Secrets in environment variables on CI | Visible in build logs, process table | Vault Agent or OIDC-based injection |
| Single unseal key holder | Bus factor of 1, recovery blocked | Shamir split (3-of-5) or auto-unseal |
| No audit device configured | Zero visibility into access | Dual audit devices (file + syslog) |
| Wildcard policies (path "*") | Over-permissioned, violates least privilege | Explicit path-based policies per service |
Tools
| Script | Purpose |
| --- | --- |
| vault_config_generator.py | Generate Vault policy and auth config from application requirements |
| rotation_planner.py | Create rotation schedule from a secret inventory file |
| audit_log_analyzer.py | Analyze audit logs for anomalies and compliance gaps |
Cross-References
- env-secrets-manager: Local .env file hygiene, leak detection, drift awareness
- senior-secops: Security operations, incident response, threat modeling
- ci-cd-pipeline-builder: Pipeline design where secrets are consumed
- docker-development: Container secret injection patterns
- helm-chart-builder: Kubernetes secret management in Helm charts