reflexion:critique

reflexion:critique — an installable skill for AI agents, published by neolabhq/context-engineering-kit.

INSTALLATION
npx skills add https://github.com/neolabhq/context-engineering-kit --skill reflexion:critique
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Work Critique Command

The review is report-only: findings are presented for the user to consider, and no fixes are applied automatically.

Your Workflow

Phase 1: Context Gathering

Before starting the review, understand what was done:

- Identify the scope of work to review:

  • If arguments provided: Use them to identify specific files, commits, or conversation context
  • If no arguments: Review the recent conversation history and file changes
  • Ask user if scope is unclear: "What work should I review? (recent changes, specific feature, entire conversation, etc.)"

- Capture relevant context:

  • Original requirements or user request
  • Files that were modified or created
  • Decisions made during implementation
  • Any constraints or assumptions

- Summarize scope for confirmation:

📋 Review Scope:

- Original request: [summary]

- Files changed: [list]

- Approach taken: [brief description]

Proceeding with multi-agent review...
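The scope summary above can be assembled from a small record of the gathered context. The following is a minimal sketch, not part of the skill itself: the `ReviewScope` class, its field names, and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewScope:
    """Hypothetical container for the context gathered in Phase 1."""

    original_request: str
    files_changed: list[str] = field(default_factory=list)
    approach: str = ""

    def summary(self) -> str:
        # Render the same three fields that the scope summary lists.
        files = ", ".join(self.files_changed) or "(none identified)"
        return (
            "📋 Review Scope:\n"
            f"- Original request: {self.original_request}\n"
            f"- Files changed: {files}\n"
            f"- Approach taken: {self.approach}"
        )


# Illustrative values only; a real run would fill these from the conversation.
scope = ReviewScope(
    original_request="Add input validation to the signup form",
    files_changed=["src/feature.ts", "src/feature.test.ts"],
    approach="Schema-based validation in the form handler",
)
print(scope.summary())
```

Keeping the scope in one structure makes it easy to pass the same context to every judge in Phase 2.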

Phase 2: Independent Judge Reviews (Parallel)

Use the Task tool to spawn three specialized judge agents in parallel. Each judge operates independently without seeing others' reviews.

#### Judge 1: Requirements Validator

Prompt for Agent:

You are a Requirements Validator conducting a thorough review of completed work.

## Your Task

Review the following work and assess alignment with original requirements:

[CONTEXT]

Original Requirements: {requirements}

Work Completed: {summary of changes}

Files Modified: {file list}

[/CONTEXT]

## Your Process (Chain-of-Verification)

1. **Initial Analysis**:

   - List all requirements from the original request

   - Check each requirement against the implementation

   - Identify gaps, over-delivery, or misalignments

2. **Self-Verification**:

   - Generate 3-5 verification questions about your analysis

   - Example: "Did I check for edge cases mentioned in requirements?"

   - Answer each question honestly

   - Refine your analysis based on answers

3. **Final Critique**:

   Provide structured output:

   ### Requirements Alignment Score: X/10

   ### Requirements Coverage:

   ✅ [Met requirement 1]

   ✅ [Met requirement 2]

   ⚠️ [Partially met requirement 3] - [explanation]

   ❌ [Missed requirement 4] - [explanation]

   ### Gaps Identified:

   - [gap 1 with severity: Critical/High/Medium/Low]

   - [gap 2 with severity]

   ### Over-Delivery/Scope Creep:

   - [item 1] - [is this good or problematic?]

   ### Verification Questions & Answers:

   Q1: [question]

   A1: [answer that influenced your critique]

   ...

Be specific, objective, and cite examples from the code.

#### Judge 2: Solution Architect

Prompt for Agent:

You are a Solution Architect evaluating the technical approach and design decisions.

## Your Task

Review the implementation approach and assess if it's optimal:

[CONTEXT]

Problem to Solve: {problem description}

Solution Implemented: {summary of approach}

Files Modified: {file list with brief description of changes}

[/CONTEXT]

## Your Process (Chain-of-Verification)

1. **Initial Evaluation**:

   - Analyze the chosen approach

   - Consider alternative approaches

   - Evaluate trade-offs and design decisions

   - Check for architectural patterns and best practices

2. **Self-Verification**:

   - Generate 3-5 verification questions about your evaluation

   - Example: "Am I being biased toward a particular pattern?"

   - Example: "Did I consider the project's existing architecture?"

   - Answer each question honestly

   - Adjust your evaluation based on answers

3. **Final Critique**:

   Provide structured output:

   ### Solution Optimality Score: X/10

   ### Approach Assessment:

   **Chosen Approach**: [brief description]

   **Strengths**:

   - [strength 1 with explanation]

   - [strength 2]

   **Weaknesses**:

   - [weakness 1 with explanation]

   - [weakness 2]

   ### Alternative Approaches Considered:

   1. **[Alternative 1]**

      - Pros: [list]

      - Cons: [list]

      - Recommendation: [Better/Worse/Equivalent to current approach]

   2. **[Alternative 2]**

      - Pros: [list]

      - Cons: [list]

      - Recommendation: [Better/Worse/Equivalent]

   ### Design Pattern Assessment:

   - Patterns used correctly: [list]

   - Patterns missing: [list with explanation why they'd help]

   - Anti-patterns detected: [list with severity]

   ### Scalability & Maintainability:

   - [assessment of how solution scales]

   - [assessment of maintainability]

   ### Verification Questions & Answers:

   Q1: [question]

   A1: [answer that influenced your critique]

   ...

Be objective and consider the context of the project (size, team, constraints).

#### Judge 3: Code Quality Reviewer

Prompt for Agent:

You are a Code Quality Reviewer assessing implementation quality and suggesting refactorings.

## Your Task

Review the code quality and identify refactoring opportunities:

[CONTEXT]

Files Changed: {file list}

Implementation Details: {code snippets or file contents as needed}

Project Conventions: {any known conventions from codebase}

[/CONTEXT]

## Your Process (Chain-of-Verification)

1. **Initial Review**:

   - Assess code readability and clarity

   - Check for code smells and complexity

   - Evaluate naming, structure, and organization

   - Look for duplication and coupling issues

   - Verify error handling and edge cases

2. **Self-Verification**:

   - Generate 3-5 verification questions about your review

   - Example: "Am I applying personal preferences vs. objective quality criteria?"

   - Example: "Did I consider the existing codebase style?"

   - Answer each question honestly

   - Refine your review based on answers

3. **Final Critique**:

   Provide structured output:

   ### Code Quality Score: X/10

   ### Quality Assessment:

   **Strengths**:

   - [strength 1 with specific example]

   - [strength 2]

   **Issues Found**:

   - [issue 1] - Severity: [Critical/High/Medium/Low]

     - Location: [file:line]

     - Example: [code snippet]

   ### Refactoring Opportunities:

   1. **[Refactoring 1 Name]** - Priority: [High/Medium/Low]

      - Current code:

        ```

        [code snippet]

        ```

      - Suggested refactoring:

        ```

        [improved code]

        ```

      - Benefits: [explanation]

      - Effort: [Small/Medium/Large]

   2. **[Refactoring 2]**

      - [same structure]

   ### Code Smells Detected:

   - [smell 1] at [location] - [explanation and impact]

   - [smell 2]

   ### Complexity Analysis:

   - High complexity areas: [list with locations]

   - Suggested simplifications: [list]

   ### Verification Questions & Answers:

   Q1: [question]

   A1: [answer that influenced your critique]

   ...

Provide specific, actionable feedback with code examples.

Implementation Note: Use the Task tool with subagent_type="general-purpose" to spawn these three agents in parallel, each with their respective prompt and context.
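The Task tool is provided by the agent runtime, so there is no Python API to call here. Purely to illustrate the shape of "three independent judges in parallel", the sketch below uses `asyncio.gather` with a hypothetical `run_judge` coroutine standing in for one sub-agent call. The function names and the returned dictionary layout are assumptions.

```python
import asyncio

JUDGES = ["Requirements Validator", "Solution Architect", "Code Quality Reviewer"]


async def run_judge(name: str, context: str) -> dict:
    """Hypothetical stand-in for one Task-tool sub-agent call."""
    await asyncio.sleep(0)  # placeholder for the real agent round trip
    return {"judge": name, "report": f"[{name} report on: {context}]"}


async def run_all_judges(context: str) -> list[dict]:
    # Every judge receives only the shared context, never another judge's
    # report, which is what keeps the three reviews independent.
    return await asyncio.gather(*(run_judge(name, context) for name in JUDGES))


reports = asyncio.run(run_all_judges("summary of changes"))
for report in reports:
    print(report["judge"])
```

`asyncio.gather` returns results in the order the coroutines were passed, so the reports line up with `JUDGES` regardless of which judge finishes first.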

Phase 3: Cross-Review & Debate

After receiving all three judge reports:

- Synthesize the findings:

  • Identify areas of agreement
  • Identify contradictions or disagreements
  • Note gaps in any review

- Conduct debate session (if significant disagreements exist):

  • Present conflicting viewpoints to judges
  • Ask each judge to review the other judges' findings
  • Example: "Requirements Validator says approach is overengineered, but Solution Architect says it's appropriate for scale. Please both review this disagreement and provide reasoning."
  • Use Task tool to spawn follow-up agents that have context of previous reviews

- Reach consensus:

  • Synthesize the debate outcomes
  • Identify which viewpoints are better supported
  • Document any unresolved disagreements with "reasonable people may disagree" notation
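The skill does not define what counts as a "significant" disagreement. Most disagreements will show up in the judges' written findings rather than their numbers, but a gap between scores can serve as a crude first signal. The sketch below flags judge pairs whose scores differ by at least a threshold. The function name, the sample scores, and the default threshold of 3 points are assumptions, not values from the skill.

```python
def find_score_disagreements(
    scores: dict[str, float], threshold: float = 3.0
) -> list[tuple[str, str]]:
    """Return pairs of judges whose scores differ by at least `threshold`.

    The threshold is an assumed value; tune it to taste.
    """
    names = sorted(scores)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if abs(scores[first] - scores[second]) >= threshold:
                pairs.append((first, second))
    return pairs


# Illustrative scores only.
scores = {
    "Requirements Validator": 5,
    "Solution Architect": 8,
    "Code Quality Reviewer": 7,
}
print(find_score_disagreements(scores))
# → [('Requirements Validator', 'Solution Architect')]
```

A flagged pair is only a prompt to compare the two reports. Whether a debate session is warranted remains a judgment call.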

Phase 4: Generate Consensus Report

Compile all findings into a comprehensive, actionable report:

# 🔍 Work Critique Report

## Executive Summary

[2-3 sentences summarizing overall assessment]

**Overall Quality Score**: X/10 (average of the three judge scores)

---

## 📊 Judge Scores

| Judge | Score | Key Finding |

|-------|-------|-------------|

| Requirements Validator | X/10 | [one-line summary] |

| Solution Architect | X/10 | [one-line summary] |

| Code Quality Reviewer | X/10 | [one-line summary] |

---

## ✅ Strengths

[Synthesized list of what was done well, with specific examples]

1. **[Strength 1]**

   - Source: [which judge(s) noted this]

   - Evidence: [specific example]

---

## ⚠️ Issues & Gaps

### Critical Issues

[Issues that need immediate attention]

- **[Issue 1]**

  - Identified by: [judge name]

  - Location: [file:line if applicable]

  - Impact: [explanation]

  - Recommendation: [what to do]

### High Priority

[Important but not blocking]

### Medium Priority

[Nice to have improvements]

### Low Priority

[Minor polish items]

---

## 🎯 Requirements Alignment

[Detailed breakdown from Requirements Validator]

**Requirements Met**: X/Y

**Coverage**: Z%

[Specific requirements table with status]

---

## 🏗️ Solution Architecture

[Key insights from Solution Architect]

**Chosen Approach**: [brief description]

**Alternative Approaches Considered**:

1. [Alternative 1] - [Why chosen approach is better/worse]

2. [Alternative 2] - [Why chosen approach is better/worse]

**Recommendation**: [Stick with current / Consider alternative X because...]

---

## 🔨 Refactoring Recommendations

[Prioritized list from Code Quality Reviewer]

### High Priority Refactorings

1. **[Refactoring Name]**

   - Benefit: [explanation]

   - Effort: [estimate]

   - Before/After: [code examples]

### Medium Priority Refactorings

[similar structure]

---

## 🤝 Areas of Consensus

[List where all judges agreed]

- [Agreement 1]

- [Agreement 2]

---

## 💬 Areas of Debate

[If applicable - where judges disagreed]

**Debate 1: [Topic]**

- Requirements Validator position: [summary]

- Solution Architect position: [summary]

- Resolution: [consensus reached or "reasonable disagreement"]

---

## 📋 Action Items (Prioritized)

Based on the critique, here are recommended next steps:

**Must Do**:

- [ ] [Critical action 1]

- [ ] [Critical action 2]

**Should Do**:

- [ ] [High priority action 1]

- [ ] [High priority action 2]

**Could Do**:

- [ ] [Medium priority action 1]

- [ ] [Nice to have action 2]

---

## 🎓 Learning Opportunities

[Lessons that could improve future work]

- [Learning 1]

- [Learning 2]

---

## 📝 Conclusion

[Final assessment paragraph summarizing whether the work meets quality standards and key takeaways]

**Verdict**: ✅ Ready to ship | ⚠️ Needs improvements before shipping | ❌ Requires significant rework

---

*Generated using Multi-Agent Debate + LLM-as-a-Judge pattern*

*Review Date: [timestamp]*
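Two numbers in the report template are simple arithmetic: the Overall Quality Score is the average of the three judge scores, and Coverage is Requirements Met expressed as a percentage. A minimal sketch follows. The function names, the rounding to one decimal place, and returning 0.0 when there are no requirements are assumptions.

```python
from statistics import mean


def overall_score(judge_scores: dict[str, float]) -> float:
    """Average the judge scores (each out of 10), rounded to one decimal."""
    return round(mean(judge_scores.values()), 1)


def coverage_percent(met: int, total: int) -> float:
    """Requirements met as a percentage; 0.0 when no requirements exist (assumed)."""
    if total == 0:
        return 0.0
    return round(100 * met / total, 1)


# Illustrative values only.
judge_scores = {
    "Requirements Validator": 7.0,
    "Solution Architect": 8.0,
    "Code Quality Reviewer": 6.0,
}
print(overall_score(judge_scores))  # → 7.0
print(coverage_percent(3, 4))       # → 75.0
```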

Important Guidelines

  • Be Objective: Base assessments on evidence, not preferences
  • Be Specific: Always cite file locations, line numbers, and code examples
  • Be Constructive: Frame criticism as opportunities for improvement
  • Be Balanced: Acknowledge both strengths and weaknesses
  • Be Actionable: Provide concrete recommendations with examples
  • Consider Context: Account for project constraints, team size, timelines
  • Avoid Bias: Don't favor certain patterns/styles without justification

Usage Examples

# Review recent work from conversation

/critique

# Review specific files

/critique src/feature.ts src/feature.test.ts

# Review with specific focus

/critique --focus=security

# Review a git commit

/critique HEAD~1..HEAD
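The skill leaves argument interpretation to the agent, so no parser is specified. Purely as an illustration of how the invocations above differ, the sketch below sorts arguments into files, commit ranges, and a focus area. The `classify_args` function and its rules are hypothetical heuristics, not the command's actual behavior.

```python
def classify_args(args: list[str]) -> dict:
    """Roughly sort /critique arguments by kind (hypothetical heuristic)."""
    scope = {"files": [], "commit_ranges": [], "focus": None}
    for arg in args:
        if arg.startswith("--focus="):
            scope["focus"] = arg.split("=", 1)[1]
        elif ".." in arg and not arg.startswith((".", "/")):
            # Treat "A..B" as a git range. Relative paths such as "../x" are
            # excluded, but a path like "src/../x.ts" would be misclassified.
            scope["commit_ranges"].append(arg)
        else:
            scope["files"].append(arg)
    return scope


print(classify_args(["src/feature.ts", "src/feature.test.ts"]))
print(classify_args(["--focus=security"]))
print(classify_args(["HEAD~1..HEAD"]))
```

When no arguments are given, all three fields stay empty and the review falls back to the recent conversation history, as described in Phase 1.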

Notes

  • This is a report-only command - it does not make changes
  • The review may take 2-5 minutes due to multi-agent coordination
  • Scores are relative to professional development standards
  • Disagreements between judges are valuable insights, not failures
  • Use findings to inform future development decisions