SKILL.md

$28

Ideal scenarios:

Quick security scans (minutes, not hours)

Pattern-based bug and vulnerability detection

Enforcing coding standards and best practices

Finding known vulnerability patterns (OWASP, CWE)

Creating custom detection rules for your codebase

Data flow analysis with taint mode

Installation (CLI)

# pip (recommended)

python3 -m pip install semgrep

# Homebrew

brew install semgrep

# Docker

docker run --rm -v "${PWD}:/src" semgrep/semgrep semgrep --config auto /src

Part 1: Running Scans

Name: semgrep
Author: semgrep

Quick Scan

semgrep --config auto .                    # Auto-detect rules

Using Rulesets

semgrep --config p/<RULESET> .             # Single ruleset

semgrep --config p/security-audit --config p/trailofbits .  # Multiple

Ruleset

Description

p/default

General security and code quality

p/security-audit

Comprehensive security rules

p/owasp-top-ten

OWASP Top 10 vulnerabilities

p/cwe-top-25

CWE Top 25 vulnerabilities

p/trailofbits

Trail of Bits security rules

p/python

Python-specific

p/javascript

JavaScript-specific

p/golang

Go-specific

Output Formats

semgrep --config p/security-audit --sarif -o results.sarif .   # SARIF

semgrep --config p/security-audit --json -o results.json .     # JSON

Scan Specific Paths

semgrep --config p/python app.py           # Single file

semgrep --config p/javascript src/         # Directory

semgrep --config auto --include='**/test/**' .  # Include tests

Configuration

.semgrepignore

tests/fixtures/

**/testdata/

generated/

vendor/

node_modules/

Suppress False Positives

password = get_from_vault()  # nosemgrep: hardcoded-password

dangerous_but_safe()  # nosemgrep

Part 2: Creating Custom Rules

When to Create Custom Rules

Detecting project-specific vulnerability patterns

Enforcing internal coding standards

Building security checks for custom frameworks

Creating taint-mode rules for data flow analysis

Approach Selection

Approach

Use When

Taint mode

Data flows from untrusted source to dangerous sink (injection vulnerabilities)

Pattern matching

Syntactic patterns without data flow requirements (deprecated APIs, hardcoded values)

Prioritize taint mode for injection vulnerabilities. Pattern matching alone can't distinguish between eval(user_input) (vulnerable) and eval("safe_literal") (safe).

Quick Start: Pattern Matching

rules:

  - id: hardcoded-password

    languages: [python]

    message: "Hardcoded password detected: $PASSWORD"

    severity: ERROR

    pattern: password = "$PASSWORD"

Quick Start: Taint Mode

rules:

  - id: command-injection

    languages: [python]

    message: User input flows to command execution

    severity: ERROR

    mode: taint

    pattern-sources:

      - pattern: request.args.get(...)

      - pattern: request.form[...]

    pattern-sinks:

      - pattern: os.system(...)

      - pattern: subprocess.call($CMD, shell=True, ...)

    pattern-sanitizers:

      - pattern: shlex.quote(...)

Pattern Syntax Quick Reference

Syntax

Description

Example

...

Match anything

func(...)

$VAR

Capture metavariable

$FUNC($INPUT)

<... ...>

Deep expression match

<... user_input ...>

Operator

Description

pattern

Match exact pattern

patterns

All must match (AND)

pattern-either

Any matches (OR)

pattern-not

Exclude matches

pattern-inside

Match only inside context

pattern-not-inside

Match only outside context

metavariable-regex

Regex on captured value

Testing Rules

Test-first is mandatory. Create test files with annotations:

# test_rule.py

def test_vulnerable():

    user_input = request.args.get("id")

    # ruleid: my-rule-id

    cursor.execute("SELECT * FROM users WHERE id = " + user_input)

def test_safe():

    user_input = request.args.get("id")

    # ok: my-rule-id

    cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))

Run tests:

semgrep --test --config rule.yaml test-file

Command Reference

Task

Command

Run tests

semgrep --test --config rule.yaml test-file

Validate YAML

semgrep --validate --config rule.yaml

Dump AST

semgrep --dump-ast -l <lang> <file>

Debug taint flow

semgrep --dataflow-traces -f rule.yaml file

Rule Creation Workflow

Analyze the problem - Understand the bug pattern, determine taint vs pattern approach

Create test cases first - Write ruleid: and ok: annotations before the rule

Analyze AST - Run semgrep --dump-ast to understand code structure

Write the rule - Start simple, iterate

Test until 100% pass - No "missed lines" or "incorrect lines"

Optimize patterns - Remove redundancies only after tests pass

Output structure:

<rule-id>/

├── <rule-id>.yaml     # Semgrep rule

└── <rule-id>.<ext>    # Test file

Detailed References

Official Semgrep Documentation:

Rule Syntax - Complete YAML structure, operators, and options

Rule Schema - Full JSON schema specification

Local References:

Workflow Guide - Complete step-by-step rule creation process

Quick Reference - Pattern operators and taint components

Anti-Patterns to Avoid

Too broad:

# BAD: Matches any function call

pattern: $FUNC(...)

# GOOD: Specific dangerous function

pattern: eval(...)

Missing safe cases:

# BAD: Only tests vulnerable case

# ruleid: my-rule

dangerous(user_input)

# GOOD: Include safe cases

# ruleid: my-rule

dangerous(user_input)

# ok: my-rule

dangerous(sanitize(user_input))

Rationalizations to Reject

Shortcut

Why It's Wrong

"Semgrep found nothing, code is clean"

Semgrep is pattern-based; can't track complex cross-function data flow

"The pattern looks complete"

Untested rules have hidden false positives/negatives

"It matches the vulnerable case"

Matching vulnerabilities is half the job; verify safe cases don't match

"Taint mode is overkill"

For injection vulnerabilities, taint mode gives better precision

"One test case is enough"

Include edge cases: different coding styles, sanitized inputs, safe alternatives

CI/CD Integration

GitHub Actions

name: Semgrep

on:

  push:

    branches: [main]

  pull_request:

  schedule:

    - cron: '0 0 1 * *'

jobs:

  semgrep:

    runs-on: ubuntu-latest

    container:

      image: returntocorp/semgrep

    steps:

      - uses: actions/checkout@v4

        with:

          fetch-depth: 0

      - name: Run Semgrep

        run: |

          if [ "${{ github.event_name }}" = "pull_request" ]; then

            semgrep ci --baseline-commit ${{ github.event.pull_request.base.sha }}

          else

            semgrep ci

          fi

        env:

          SEMGREP_RULES: >-

            p/security-audit

            p/owasp-top-ten

            p/trailofbits