verification-&-quality-assurance

verification-&-quality-assurance — an installable skill for AI agents, published by ruvnet/ruflo.

INSTALLATION
npx skills add https://github.com/ruvnet/ruflo --skill "verification-&-quality-assurance"
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

| Layer | What | CI job(s) in .github/workflows/v3-ci.yml | ADR |
| --- | --- | --- | --- |
| 1: install/behavioral smoke | Exercise user-visible failure modes against a real build | smoke-install-no-bsqlite (npm install on platforms without prebuilds), plugin-hooks-smoke (#1859/#1862: hook flag parsing), mcp-protocol-smoke (#1874: HTTP MCP wire format), memory-import-smoke (#1883/#1884: WSL path + key sanitization), mcp-roundtrip-smoke (#1889 paired-tool round-trip + #1863 cli-no-crash + ADR-095 G2 consensus-transport) | ADR-102 |
| 1: discoverability gate | Every MCP tool description must answer "use this over native when?" | tool-descriptions-audit: scripts/audit-tool-descriptions.mjs, baseline at verification/mcp-tool-baseline.json (monotone-decreasing: noGuidance / tooShort / duplicates) | ADR-112 |
| 2: cryptographic witness | Every documented fix's load-bearing marker must still be present in dist; Ed25519-signed, per-OS bundles | witness-verify (ubuntu/macos/windows): plugins/ruflo-core/scripts/witness/verify.mjs against verification/<os>/manifest.md.json | ADR-103 |
| 3: temporal history | When was a regression introduced | verification/<os>/history.jsonl + history.mjs (summary / regressions / timeline) | ADR-103 |
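
To make the layer-2 idea concrete, here is a minimal sketch of the marker-presence part of a witness check, leaving out the Ed25519 signing step entirely. The `{ id, desc, file, marker }` shape follows the witness-fixes.json entries described under "Adding a new guard"; the function name `findMissingMarkers`, the in-memory `files` map, and the sample markers are assumptions made for illustration, not the actual verify.mjs logic.

```javascript
// Hypothetical helper: report which documented fixes lost their marker.
// `fixes` entries use the { id, desc, file, marker } shape;
// `files` maps a dist path to its text content (held in memory for this sketch).
function findMissingMarkers(fixes, files) {
  const missing = [];
  for (const fix of fixes) {
    const content = files[fix.file];
    // A missing file and a missing substring both count as a regression.
    if (typeof content !== "string" || !content.includes(fix.marker)) {
      missing.push(fix.id);
    }
  }
  return missing;
}

// Sample data is invented for the example.
const fixes = [
  { id: "fix-a", desc: "example fix", file: "dist/a.js", marker: "sanitizeKey(" },
  { id: "fix-b", desc: "example fix", file: "dist/b.js", marker: "parseHookFlags(" },
];
const files = {
  "dist/a.js": "export function sanitizeKey(k) { return k; }",
  "dist/b.js": "export function other() {}",
};

console.log(findMissingMarkers(fixes, files)); // [ 'fix-b' ]
```

A real verifier would read the files from the built dist tree and check the manifest signature before trusting the marker list.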

Run the guards locally

# Tool-description discoverability audit (ADR-112)

node scripts/audit-tool-descriptions.mjs                       # fails if any baseline count rises

node scripts/audit-tool-descriptions.mjs --update-baseline     # lock the new floor after a fix lands

# Behavioral smokes (each builds what it needs; safe to run individually)

node plugins/ruflo-core/scripts/test-hooks.mjs "node $PWD/v3/@claude-flow/cli/bin/cli.js"

node plugins/ruflo-core/scripts/test-mcp-protocol.mjs

node plugins/ruflo-core/scripts/test-memory-import.mjs

node plugins/ruflo-core/scripts/test-mcp-roundtrips.mjs        # #1889 paired-tool round-trip

node plugins/ruflo-core/scripts/test-cli-no-crash.mjs          # #1863 unhandled-exception class

node plugins/ruflo-core/scripts/test-consensus-transport.mjs   # ADR-095 G2 consensus transport

# Witness manifest — regenerate + verify

node scripts/regen-witness.mjs

node plugins/ruflo-core/scripts/witness/verify.mjs --manifest verification/macos/manifest.md.json

# Temporal history

node plugins/ruflo-core/scripts/witness/history.mjs --history verification/macos/history.jsonl summary

node plugins/ruflo-core/scripts/witness/history.mjs --history verification/macos/history.jsonl regressions

Adding a new guard

  • Behavioral smoke → write plugins/ruflo-core/scripts/test-<name>.mjs. Pattern: static dist-scan first (fast, always completes), behavioral probe second with an internal timeout + a process-level watchdog so CI never hangs. Add a step to the relevant job in v3-ci.yml.
  • Static gate with a baseline → write scripts/audit-<name>.mjs that scans, counts violations, and fails if the count exceeds a monotone-decreasing baseline in verification/<name>-baseline.json. Support --update-baseline. Add a CI job; wire it into witness-verify needs[] if it should gate publish.
  • Documented-fix marker → append { id, desc, file, marker } to verification/witness-fixes.json, run node scripts/regen-witness.mjs. The marker must be a substring the fix specifically creates (not a generic pattern like 'function').
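
The static-gate pattern can be sketched as two small functions: one that fails when any counter rises above its baseline, and one that lowers the floor after a fix. The counter names come from the tool-description audit; the function names, the sample numbers, and the assumption that `--update-baseline` only ever moves the floor down are illustrative and may not match the real scripts.

```javascript
// Hypothetical sketch of a monotone-decreasing baseline gate.
// Returns one entry per counter whose current count exceeds the baseline.
function findRegressions(current, baseline) {
  const regressions = [];
  for (const [name, count] of Object.entries(current)) {
    const floor = baseline[name] ?? 0; // unknown counters start at zero
    if (count > floor) regressions.push({ name, baseline: floor, current: count });
  }
  return regressions;
}

// Assumed --update-baseline semantics: the floor can drop but never rise.
function tightenBaseline(current, baseline) {
  const next = { ...baseline };
  for (const [name, count] of Object.entries(current)) {
    next[name] = Math.min(count, baseline[name] ?? count);
  }
  return next;
}

// Sample counts are invented for the example.
const baseline = { noGuidance: 12, tooShort: 4, duplicates: 0 };
const current = { noGuidance: 10, tooShort: 5, duplicates: 0 };

console.log(findRegressions(current, baseline));
// [ { name: 'tooShort', baseline: 4, current: 5 } ]
console.log(tightenBaseline(current, baseline));
// { noGuidance: 10, tooShort: 4, duplicates: 0 }
```

A gate script would exit non-zero whenever `findRegressions` returns a non-empty array.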

Prerequisites

  • Ruflo installed (npx ruflo@alpha)
  • Git repository (for rollback features)
  • Node.js 18+ (for dashboard features)
  • @noble/ed25519 (for the witness verifier — a single runtime dep, npm i @noble/ed25519)

Quick Start

# View current truth scores

npx ruflo@alpha truth

# Run verification check

npx ruflo@alpha verify check

# Verify specific file with custom threshold

npx ruflo@alpha verify check --file src/app.js --threshold 0.98

# Rollback last failed verification

npx ruflo@alpha verify rollback --last-good

Complete Guide

Truth Scoring System

#### View Truth Metrics

Display comprehensive quality and reliability metrics for your codebase and agent tasks.

Basic Usage:

# View current truth scores (default: table format)

npx ruflo@alpha truth

# View scores for specific time period

npx ruflo@alpha truth --period 7d

# View scores for specific agent

npx ruflo@alpha truth --agent coder --period 24h

# Find files/tasks below threshold

npx ruflo@alpha truth --threshold 0.8

Output Formats:

# Table format (default)

npx ruflo@alpha truth --format table

# JSON for programmatic access

npx ruflo@alpha truth --format json

# CSV for spreadsheet analysis

npx ruflo@alpha truth --format csv

# HTML report with visualizations

npx ruflo@alpha truth --format html --export report.html

Real-time Monitoring:

# Watch mode with live updates

npx ruflo@alpha truth --watch

# Export metrics automatically

npx ruflo@alpha truth --export .claude-flow/metrics/truth-$(date +%Y%m%d).json

#### Truth Score Dashboard

Example dashboard output:

📊 Truth Metrics Dashboard

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Overall Truth Score: 0.947 ✅

Trend: ↗️ +2.3% (7d)

Top Performers:

  verification-agent   0.982 ⭐

  code-analyzer       0.971 ⭐

  test-generator      0.958 ✅

Needs Attention:

  refactor-agent      0.821 ⚠️

  docs-generator      0.794 ⚠️

Recent Tasks:

  task-456  0.991 ✅  "Implement auth"

  task-455  0.967 ✅  "Add tests"

  task-454  0.743 ❌  "Refactor API"

#### Metrics Explained

Truth Scores (0.0-1.0):

  • 0.95-1.0: Excellent ⭐ (production-ready)
  • 0.85 to <0.95: Good ✅ (acceptable quality)
  • 0.75 to <0.85: Warning ⚠️ (needs attention)
  • <0.75: Critical ❌ (requires immediate action)
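
As a small illustration of the bands above, the sketch below maps a score to its label. Treating each lower bound as inclusive is an assumption, as is the `classifyScore` name; the CLI's exact boundary handling is not documented here.

```javascript
// Illustrative mapping from a truth score to the bands listed above.
// Lower bounds are treated as inclusive (an assumption).
function classifyScore(score) {
  if (typeof score !== "number" || Number.isNaN(score) || score < 0 || score > 1) {
    throw new RangeError(`score must be between 0.0 and 1.0, got ${score}`);
  }
  if (score >= 0.95) return "excellent";
  if (score >= 0.85) return "good";
  if (score >= 0.75) return "warning";
  return "critical";
}

for (const s of [0.982, 0.947, 0.794, 0.743]) {
  console.log(s, classifyScore(s));
}
// 0.982 excellent
// 0.947 good
// 0.794 warning
// 0.743 critical
```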

Trend Indicators:

  • ↗️ Improving (positive trend)
  • → Stable (consistent performance)
  • ↘️ Declining (quality regression detected)

Statistics:

  • Mean Score: Average truth score across all measurements
  • Median Score: Middle value (less affected by outliers)
  • Standard Deviation: Consistency of scores (lower = more consistent)
  • Confidence Interval: Statistical reliability of measurements
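
For readers who want to reproduce these statistics from exported scores, here is one possible calculation. It uses the sample standard deviation and a normal-approximation 95% interval (mean ± 1.96 · sd / √n); both choices are assumptions, since the method the tool uses is not stated here.

```javascript
// Sketch of the summary statistics named above, for an array of scores.
// The 95% interval uses a normal approximation, which is an assumption.
function summarize(scores) {
  if (scores.length < 2) throw new Error("need at least two scores");
  const n = scores.length;
  const mean = scores.reduce((a, b) => a + b, 0) / n;
  const sorted = [...scores].sort((a, b) => a - b);
  const mid = Math.floor(n / 2);
  const median = n % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
  // Sample variance (divides by n - 1).
  const variance = scores.reduce((acc, x) => acc + (x - mean) ** 2, 0) / (n - 1);
  const stdDev = Math.sqrt(variance);
  const margin = (1.96 * stdDev) / Math.sqrt(n);
  return { mean, median, stdDev, ci95: [mean - margin, mean + margin] };
}

const stats = summarize([0.991, 0.967, 0.743]);
console.log(stats.median); // 0.967
```

With very few samples a t-distribution interval would be wider and arguably more honest than the 1.96 multiplier used here.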

Verification Checks

#### Run Verification

Execute comprehensive verification checks on code, tasks, or agent outputs.

File Verification:

# Verify single file

npx ruflo@alpha verify check --file src/app.js

# Verify directory recursively

npx ruflo@alpha verify check --directory src/

# Verify with auto-fix enabled

npx ruflo@alpha verify check --file src/utils.js --auto-fix

# Verify current working directory

npx ruflo@alpha verify check

Task Verification:

# Verify specific task output

npx ruflo@alpha verify check --task task-123

# Verify with custom threshold

npx ruflo@alpha verify check --task task-456 --threshold 0.99

# Verbose output for debugging

npx ruflo@alpha verify check --task task-789 --verbose

Batch Verification:

# Verify multiple files in parallel

npx ruflo@alpha verify batch --files "*.js" --parallel

# Verify with pattern matching

npx ruflo@alpha verify batch --pattern "src/**/*.ts"

# Integration test suite

npx ruflo@alpha verify integration --test-suite full

#### Verification Criteria

The verification system evaluates:

Code Correctness:

  • Syntax validation
  • Type checking (TypeScript)
  • Logic flow analysis
  • Error handling completeness

Best Practices:

  • Code style adherence
  • SOLID principles
  • Design patterns usage
  • Modularity and reusability

Security:

  • Vulnerability scanning
  • Secret detection
  • Input validation
  • Authentication/authorization checks

Performance:

  • Algorithmic complexity
  • Memory usage patterns
  • Database query optimization
  • Bundle size impact

Documentation:

  • JSDoc/TypeDoc completeness
  • README accuracy
  • API documentation
  • Code comments quality

#### JSON Output for CI/CD

# Get structured JSON output

npx ruflo@alpha verify check --json > verification.json

# Example JSON structure:

{

  "overallScore": 0.947,

  "passed": false,

  "threshold": 0.95,

  "checks": [

    {

      "name": "code-correctness",

      "score": 0.98,

      "passed": true

    },

    {

      "name": "security",

      "score": 0.91,

      "passed": false,

      "issues": [...]

    }

  ]

}
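
A CI step will usually want the names of the failing checks rather than the whole document. The sketch below pulls them out of the structure shown above; the `failingChecks` name is hypothetical, and the `issues` array is left empty because the example above elides its contents.

```javascript
// Illustrative reader for the report shape shown above
// (overallScore, threshold, checks[].name / score / passed).
function failingChecks(report) {
  const checks = Array.isArray(report.checks) ? report.checks : [];
  return checks
    .filter((check) => check.passed === false)
    .map((check) => ({ name: check.name, score: check.score }));
}

const report = JSON.parse(`{
  "overallScore": 0.947,
  "threshold": 0.95,
  "checks": [
    { "name": "code-correctness", "score": 0.98, "passed": true },
    { "name": "security", "score": 0.91, "passed": false, "issues": [] }
  ]
}`);

console.log(failingChecks(report)); // [ { name: 'security', score: 0.91 } ]
```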

Automatic Rollback

#### Rollback Failed Changes

Automatically revert changes that fail verification checks.

Basic Rollback:

# Rollback to last known good state

npx ruflo@alpha verify rollback --last-good

# Rollback to specific commit

npx ruflo@alpha verify rollback --to-commit abc123

# Interactive rollback with preview

npx ruflo@alpha verify rollback --interactive

Smart Rollback:

# Rollback only failed files (preserve good changes)

npx ruflo@alpha verify rollback --selective

# Rollback with automatic backup

npx ruflo@alpha verify rollback --backup-first

# Dry-run mode (preview without executing)

npx ruflo@alpha verify rollback --dry-run

Rollback Performance:

  • Git-based rollback: <1 second
  • Selective file rollback: <500ms
  • Backup creation: Automatic before rollback

Verification Reports

#### Generate Reports

Create detailed verification reports with metrics and visualizations.

Report Formats:

# JSON report

npx ruflo@alpha verify report --format json

# HTML report with charts

npx ruflo@alpha verify report --export metrics.html --format html

# CSV for data analysis

npx ruflo@alpha verify report --format csv --export metrics.csv

# Markdown summary

npx ruflo@alpha verify report --format markdown

Time-based Reports:

# Last 24 hours

npx ruflo@alpha verify report --period 24h

# Last 7 days

npx ruflo@alpha verify report --period 7d

# Last 30 days with trends

npx ruflo@alpha verify report --period 30d --include-trends

# Custom date range

npx ruflo@alpha verify report --from 2025-01-01 --to 2025-01-31

Report Content:

  • Overall truth scores
  • Per-agent performance metrics
  • Task completion quality
  • Verification pass/fail rates
  • Rollback frequency
  • Quality improvement trends
  • Statistical confidence intervals

Interactive Dashboard

#### Launch Dashboard

Run interactive web-based verification dashboard with real-time updates.

# Launch dashboard on default port (3000)

npx ruflo@alpha verify dashboard

# Custom port

npx ruflo@alpha verify dashboard --port 8080

# Export dashboard data

npx ruflo@alpha verify dashboard --export

# Dashboard with auto-refresh

npx ruflo@alpha verify dashboard --refresh 5s

Dashboard Features:

  • Real-time truth score updates (WebSocket)
  • Interactive charts and graphs
  • Agent performance comparison
  • Task history timeline
  • Rollback history viewer
  • Export to PDF/HTML
  • Filter by time period/agent/score

Configuration

#### Default Configuration

Set verification preferences in .claude-flow/config.json:

{

  "verification": {

    "threshold": 0.95,

    "autoRollback": true,

    "gitIntegration": true,

    "hooks": {

      "preCommit": true,

      "preTask": true,

      "postEdit": true

    },

    "checks": {

      "codeCorrectness": true,

      "security": true,

      "performance": true,

      "documentation": true,

      "bestPractices": true

    }

  },

  "truth": {

    "defaultFormat": "table",

    "defaultPeriod": "24h",

    "warningThreshold": 0.85,

    "criticalThreshold": 0.75,

    "autoExport": {

      "enabled": true,

      "path": ".claude-flow/metrics/truth-daily.json"

    }

  }

}

#### Threshold Configuration

Adjust verification strictness:

# Strict mode (99% accuracy required)

npx ruflo@alpha verify check --threshold 0.99

# Lenient mode (90% acceptable)

npx ruflo@alpha verify check --threshold 0.90

# Set default threshold

npx ruflo@alpha config set verification.threshold 0.98

Per-environment thresholds:

{

  "verification": {

    "thresholds": {

      "production": 0.99,

      "staging": 0.95,

      "development": 0.90

    }

  }

}
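
How the CLI selects the active environment is not described here, so the following is only a sketch of one plausible resolution order: the per-environment value first, then the global `verification.threshold`, then 0.95 as shown in the default configuration. The `resolveThreshold` function and the `"qa"` environment name are hypothetical.

```javascript
// Hypothetical resolution order: per-environment value, then the global
// verification.threshold, then 0.95 (the default shown in the config above).
function resolveThreshold(config, environment) {
  const verification = config.verification ?? {};
  const perEnv = verification.thresholds ?? {};
  const value = perEnv[environment] ?? verification.threshold ?? 0.95;
  if (typeof value !== "number" || value < 0 || value > 1) {
    throw new RangeError(`invalid threshold for "${environment}": ${value}`);
  }
  return value;
}

const config = {
  verification: {
    threshold: 0.95,
    thresholds: { production: 0.99, staging: 0.95, development: 0.9 },
  },
};

console.log(resolveThreshold(config, "production")); // 0.99
console.log(resolveThreshold(config, "qa")); // 0.95 (falls back to the global value)
```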

Integration Examples

#### CI/CD Integration

GitHub Actions:

name: Quality Verification

on: [push, pull_request]

jobs:

  verify:

    runs-on: ubuntu-latest

    steps:

      - uses: actions/checkout@v3

      - name: Install Dependencies

        run: npm install

      - name: Run Verification

        run: |

          npx ruflo@alpha verify check --json > verification.json

      - name: Check Truth Score

        run: |

          score=$(jq '.overallScore' verification.json)

          if (( $(echo "$score < 0.95" | bc -l) )); then

            echo "Truth score too low: $score"

            exit 1

          fi

      - name: Upload Report

        uses: actions/upload-artifact@v4

        with:

          name: verification-report

          path: verification.json

GitLab CI:

verify:

  stage: test

  script:

    - npx ruflo@alpha verify check --threshold 0.95 --json > verification.json

    - |

      score=$(jq '.overallScore' verification.json)

      if [ "$(echo "$score < 0.95" | bc -l)" -eq 1 ]; then

        echo "Verification failed with score: $score"

        exit 1

      fi

  artifacts:

    paths:

      - verification.json

    # reports:junit expects JUnit XML; verification.json is JSON, so it is published under paths only

#### Swarm Integration

Run verification automatically during swarm operations:

# Swarm with verification enabled

npx ruflo@alpha swarm --verify --threshold 0.98

# Hive Mind with auto-rollback

npx ruflo@alpha hive-mind --verify --rollback-on-fail

# Training pipeline with verification

npx ruflo@alpha train --verify --threshold 0.99

#### Pair Programming Integration

Enable real-time verification during collaborative development:

# Pair with verification

npx ruflo@alpha pair --verify --real-time

# Pair with custom threshold

npx ruflo@alpha pair --verify --threshold 0.97 --auto-fix

Advanced Workflows

#### Continuous Verification

Monitor codebase continuously during development:

# Watch directory for changes

npx ruflo@alpha verify watch --directory src/

# Watch with auto-fix

npx ruflo@alpha verify watch --directory src/ --auto-fix

# Watch with notifications

npx ruflo@alpha verify watch --notify --threshold 0.95

#### Monitoring Integration

Send metrics to external monitoring systems:

# Export to Prometheus

npx ruflo@alpha truth --format json | \

  curl -X POST https://pushgateway.example.com/metrics/job/claude-flow \

  -d @-

# Send to DataDog

npx ruflo@alpha verify report --format json | \

  curl -X POST "https://api.datadoghq.com/api/v1/series?api_key=${DD_API_KEY}" \

  -H "Content-Type: application/json" \

  -d @-

# Custom webhook

npx ruflo@alpha truth --format json | \

  curl -X POST https://metrics.example.com/api/truth \

  -H "Content-Type: application/json" \

  -d @-
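
The Prometheus Pushgateway ingests the Prometheus text exposition format, so JSON output would likely need converting before it is pushed. The sketch below shows one way to do that for the report shape used in the CI examples. The metric names `truth_overall_score` and `truth_check_score` are invented for this example and are not emitted by the CLI.

```javascript
// Hypothetical converter from the JSON report shape to Prometheus text
// exposition format. Metric names are invented for this sketch.
function escapeLabel(value) {
  // Label values must escape backslash, double quote, and newline.
  return String(value)
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n");
}

function toPrometheusText(report) {
  const lines = [
    "# TYPE truth_overall_score gauge",
    `truth_overall_score ${report.overallScore}`,
    "# TYPE truth_check_score gauge",
  ];
  for (const check of report.checks ?? []) {
    lines.push(`truth_check_score{check="${escapeLabel(check.name)}"} ${check.score}`);
  }
  // The exposition format requires a trailing newline.
  return lines.join("\n") + "\n";
}

const text = toPrometheusText({
  overallScore: 0.947,
  checks: [
    { name: "code-correctness", score: 0.98 },
    { name: "security", score: 0.91 },
  ],
});
process.stdout.write(text);
// # TYPE truth_overall_score gauge
// truth_overall_score 0.947
// # TYPE truth_check_score gauge
// truth_check_score{check="code-correctness"} 0.98
// truth_check_score{check="security"} 0.91
```

When piping this output to curl, `--data-binary @-` keeps the line breaks intact, whereas `-d @-` strips them.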

#### Pre-commit Hooks

Automatically verify before commits:

# Install pre-commit hook

npx ruflo@alpha verify install-hook --pre-commit

# .git/hooks/pre-commit example:

#!/bin/bash

npx ruflo@alpha verify check --threshold 0.95 --json > /tmp/verify.json

score=$(jq '.overallScore' /tmp/verify.json)

if (( $(echo "$score < 0.95" | bc -l) )); then

  echo "❌ Verification failed with score: $score"

  echo "Run 'npx ruflo@alpha verify check --verbose' for details"

  exit 1

fi

echo "✅ Verification passed with score: $score"

Performance Metrics

Verification Speed:

  • Single file check: <100ms
  • Directory scan: <500ms (per 100 files)
  • Full codebase analysis: <5s (typical project)
  • Truth score calculation: <50ms

Rollback Speed:

  • Git-based rollback: <1s
  • Selective file rollback: <500ms
  • Backup creation: <2s

Dashboard Performance:

  • Initial load: <1s
  • Real-time updates: <100ms latency (WebSocket)
  • Chart rendering: 60 FPS

Troubleshooting

#### Common Issues

Low Truth Scores:

# Get detailed breakdown

npx ruflo@alpha truth --verbose --threshold 0.0

# Check specific criteria

npx ruflo@alpha verify check --verbose

# View agent-specific issues

npx ruflo@alpha truth --agent <agent-name> --format json

Rollback Failures:

# Check git status

git status

# View rollback history

npx ruflo@alpha verify rollback --history

# Manual rollback (destructive: discards the last commit and any uncommitted changes)

git reset --hard HEAD~1

Verification Timeouts:

# Increase timeout

npx ruflo@alpha verify check --timeout 60s

# Verify in batches

npx ruflo@alpha verify batch --batch-size 10

Exit Codes

Verification commands return standard exit codes:

  • 0: Verification passed (score ≥ threshold)
  • 1: Verification failed (score < threshold)
  • 2: Error during verification (invalid input, system error)

Related Commands

  • npx ruflo@alpha pair - Collaborative development with verification
  • npx ruflo@alpha train - Training with verification feedback
  • npx ruflo@alpha swarm - Multi-agent coordination with quality checks
  • npx ruflo@alpha report - Generate comprehensive project reports

Best Practices

  • Set Appropriate Thresholds: Use 0.99 for critical code, 0.95 for standard, 0.90 for experimental
  • Enable Auto-rollback: Prevent bad code from persisting
  • Monitor Trends: Track improvement over time, not just current scores
  • Integrate with CI/CD: Make verification part of your pipeline
  • Use Watch Mode: Get immediate feedback during development
  • Export Metrics: Track quality metrics in your monitoring system
  • Review Rollbacks: Understand why changes were rejected
  • Train Agents: Use verification feedback to improve agent performance

Additional Resources

  • Truth Scoring Algorithm: See /docs/truth-scoring.md
  • Verification Criteria: See /docs/verification-criteria.md
  • Integration Examples: See /examples/verification/
  • API Reference: See /docs/api/verification.md