semgrep

Run Semgrep static analysis scans and create custom detection rules. Use when asked to scan code with Semgrep, find security vulnerabilities, write custom YAML…

INSTALLATION
npx skills add https://github.com/semgrep/skills --skill semgrep
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

$28

Ideal scenarios:

  • Quick security scans (minutes, not hours)
  • Pattern-based bug and vulnerability detection
  • Enforcing coding standards and best practices
  • Finding known vulnerability patterns (OWASP, CWE)
  • Creating custom detection rules for your codebase
  • Data flow analysis with taint mode

Installation (CLI)

# pip (recommended)

python3 -m pip install semgrep

# Homebrew

brew install semgrep

# Docker

docker run --rm -v "${PWD}:/src" semgrep/semgrep semgrep --config auto /src

Part 1: Running Scans

Quick Scan

semgrep --config auto .                    # Auto-detect rules

Using Rulesets

semgrep --config p/<RULESET> .             # Single ruleset

semgrep --config p/security-audit --config p/trailofbits .  # Multiple

Ruleset

Description

p/default

General security and code quality

p/security-audit

Comprehensive security rules

p/owasp-top-ten

OWASP Top 10 vulnerabilities

p/cwe-top-25

CWE Top 25 vulnerabilities

p/trailofbits

Trail of Bits security rules

p/python

Python-specific

p/javascript

JavaScript-specific

p/golang

Go-specific

Output Formats

semgrep --config p/security-audit --sarif -o results.sarif .   # SARIF

semgrep --config p/security-audit --json -o results.json .     # JSON

Scan Specific Paths

semgrep --config p/python app.py           # Single file

semgrep --config p/javascript src/         # Directory

semgrep --config auto --include='**/test/**' .  # Include tests

Configuration

.semgrepignore

tests/fixtures/

**/testdata/

generated/

vendor/

node_modules/

Suppress False Positives

password = get_from_vault()  # nosemgrep: hardcoded-password

dangerous_but_safe()  # nosemgrep

Part 2: Creating Custom Rules

When to Create Custom Rules

  • Detecting project-specific vulnerability patterns
  • Enforcing internal coding standards
  • Building security checks for custom frameworks
  • Creating taint-mode rules for data flow analysis

Approach Selection

Approach

Use When

Taint mode

Data flows from untrusted source to dangerous sink (injection vulnerabilities)

Pattern matching

Syntactic patterns without data flow requirements (deprecated APIs, hardcoded values)

Prioritize taint mode for injection vulnerabilities. Pattern matching alone can't distinguish between eval(user_input) (vulnerable) and eval("safe_literal") (safe).

Quick Start: Pattern Matching

rules:

  - id: hardcoded-password

    languages: [python]

    message: "Hardcoded password detected: $PASSWORD"

    severity: ERROR

    pattern: password = "$PASSWORD"

Quick Start: Taint Mode

rules:

  - id: command-injection

    languages: [python]

    message: User input flows to command execution

    severity: ERROR

    mode: taint

    pattern-sources:

      - pattern: request.args.get(...)

      - pattern: request.form[...]

    pattern-sinks:

      - pattern: os.system(...)

      - pattern: subprocess.call($CMD, shell=True, ...)

    pattern-sanitizers:

      - pattern: shlex.quote(...)

Pattern Syntax Quick Reference

Syntax

Description

Example

...

Match anything

func(...)

$VAR

Capture metavariable

$FUNC($INPUT)

<... ...>

Deep expression match

<... user_input ...>

Operator

Description

pattern

Match exact pattern

patterns

All must match (AND)

pattern-either

Any matches (OR)

pattern-not

Exclude matches

pattern-inside

Match only inside context

pattern-not-inside

Match only outside context

metavariable-regex

Regex on captured value

Testing Rules

Test-first is mandatory. Create test files with annotations:

# test_rule.py

def test_vulnerable():

    user_input = request.args.get("id")

    # ruleid: my-rule-id

    cursor.execute("SELECT * FROM users WHERE id = " + user_input)

def test_safe():

    user_input = request.args.get("id")

    # ok: my-rule-id

    cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))

Run tests:

semgrep --test --config rule.yaml test-file

Command Reference

Task

Command

Run tests

semgrep --test --config rule.yaml test-file

Validate YAML

semgrep --validate --config rule.yaml

Dump AST

semgrep --dump-ast -l <lang> <file>

Debug taint flow

semgrep --dataflow-traces -f rule.yaml file

Rule Creation Workflow

  • Analyze the problem - Understand the bug pattern, determine taint vs pattern approach
  • Create test cases first - Write ruleid: and ok: annotations before the rule
  • Analyze AST - Run semgrep --dump-ast to understand code structure
  • Write the rule - Start simple, iterate
  • Test until 100% pass - No "missed lines" or "incorrect lines"
  • Optimize patterns - Remove redundancies only after tests pass

Output structure:

<rule-id>/

├── <rule-id>.yaml     # Semgrep rule

└── <rule-id>.<ext>    # Test file

Detailed References

Official Semgrep Documentation:

  • Rule Syntax - Complete YAML structure, operators, and options

Local References:

Anti-Patterns to Avoid

Too broad:

# BAD: Matches any function call

pattern: $FUNC(...)

# GOOD: Specific dangerous function

pattern: eval(...)

Missing safe cases:

# BAD: Only tests vulnerable case

# ruleid: my-rule

dangerous(user_input)

# GOOD: Include safe cases

# ruleid: my-rule

dangerous(user_input)

# ok: my-rule

dangerous(sanitize(user_input))

Rationalizations to Reject

Shortcut

Why It's Wrong

"Semgrep found nothing, code is clean"

Semgrep is pattern-based; can't track complex cross-function data flow

"The pattern looks complete"

Untested rules have hidden false positives/negatives

"It matches the vulnerable case"

Matching vulnerabilities is half the job; verify safe cases don't match

"Taint mode is overkill"

For injection vulnerabilities, taint mode gives better precision

"One test case is enough"

Include edge cases: different coding styles, sanitized inputs, safe alternatives

CI/CD Integration

GitHub Actions

name: Semgrep

on:

  push:

    branches: [main]

  pull_request:

  schedule:

    - cron: '0 0 1 * *'

jobs:

  semgrep:

    runs-on: ubuntu-latest

    container:

      image: returntocorp/semgrep

    steps:

      - uses: actions/checkout@v4

        with:

          fetch-depth: 0

      - name: Run Semgrep

        run: |

          if [ "${{ github.event_name }}" = "pull_request" ]; then

            semgrep ci --baseline-commit ${{ github.event.pull_request.base.sha }}

          else

            semgrep ci

          fi

        env:

          SEMGREP_RULES: >-

            p/security-audit

            p/owasp-top-ten

            p/trailofbits

Resources

Rule Writing:

General:

BrowserAct

Let your agent run on any real-world website

Bypass CAPTCHA & anti-bot for free. Start local, scale to cloud.

Explore BrowserAct Skills →

Stop writing automation&scrapers

Install the CLI. Run your first Skill in 30 seconds. Scale when you're ready.

Start free
free · no credit card