SKILL.md
Ideal scenarios:
- Quick security scans (minutes, not hours)
- Pattern-based bug and vulnerability detection
- Enforcing coding standards and best practices
- Finding known vulnerability patterns (OWASP, CWE)
- Creating custom detection rules for your codebase
- Data flow analysis with taint mode
Installation (CLI)
# pip (recommended)
python3 -m pip install semgrep
# Homebrew
brew install semgrep
# Docker
docker run --rm -v "${PWD}:/src" semgrep/semgrep semgrep --config auto /src
Part 1: Running Scans
Quick Scan
semgrep --config auto . # Auto-detect rules
Using Rulesets
semgrep --config p/<RULESET> . # Single ruleset
semgrep --config p/security-audit --config p/trailofbits . # Multiple
| Ruleset | Description |
| --- | --- |
| p/default | General security and code quality |
| p/security-audit | Comprehensive security rules |
| p/owasp-top-ten | OWASP Top 10 vulnerabilities |
| p/cwe-top-25 | CWE Top 25 vulnerabilities |
| p/trailofbits | Trail of Bits security rules |
| p/python | Python-specific |
| p/javascript | JavaScript-specific |
| p/golang | Go-specific |
Output Formats
semgrep --config p/security-audit --sarif -o results.sarif . # SARIF
semgrep --config p/security-audit --json -o results.json . # JSON
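The JSON report can be post-processed once it is on disk. The following is a minimal sketch, assuming the layout Semgrep typically emits: a top-level `results` list whose entries carry `check_id`, `path`, `start.line`, and `extra.severity`. The helper names `summarize_findings` and `load_report` are hypothetical, and the field names should be checked against the output of your Semgrep version.

```python
import json
from collections import Counter


def summarize_findings(report: dict) -> Counter:
    """Count findings per severity in a parsed Semgrep JSON report."""
    counts = Counter()
    for finding in report.get("results", []):
        # Assumed layout: severity is stored under the "extra" object.
        severity = finding.get("extra", {}).get("severity", "UNKNOWN")
        counts[severity] += 1
    return counts


def load_report(path: str) -> dict:
    """Read a report written by `semgrep --json -o <path>`."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Small inline sample standing in for results.json.
    sample = {
        "results": [
            {"check_id": "rules.command-injection", "path": "app.py",
             "start": {"line": 12}, "extra": {"severity": "ERROR"}},
            {"check_id": "rules.hardcoded-password", "path": "cfg.py",
             "start": {"line": 3}, "extra": {"severity": "WARNING"}},
        ]
    }
    for severity, count in sorted(summarize_findings(sample).items()):
        print(f"{severity}: {count}")
```

With a real report, `summarize_findings(load_report("results.json"))` would give the same per-severity counts.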
Scan Specific Paths
semgrep --config p/python app.py # Single file
semgrep --config p/javascript src/ # Directory
semgrep --config auto --include='**/test/**' . # Scan only files under test/ directories
Configuration
.semgrepignore
tests/fixtures/
**/testdata/
generated/
vendor/
node_modules/
Suppress False Positives
password = get_from_vault() # nosemgrep: hardcoded-password
dangerous_but_safe() # nosemgrep
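Suppressions accumulate over time, so it can help to audit them. The sketch below lists every `nosemgrep` comment in a piece of source text and separates blanket suppressions from those naming specific rules. It assumes the comment forms shown above, plus comma-separated rule IDs; `find_suppressions` is a hypothetical helper, not part of Semgrep.

```python
import re

# "nosemgrep", optionally followed by ": rule-id" or ": id-1, id-2".
NOSEMGREP = re.compile(r"nosemgrep(?::\s*([\w.\-]+(?:\s*,\s*[\w.\-]+)*))?")


def find_suppressions(text: str) -> list:
    """Return (line_number, rule_ids) for each nosemgrep comment.

    rule_ids is an empty list when the suppression names no rule.
    """
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = NOSEMGREP.search(line)
        if match is None:
            continue
        ids = match.group(1)
        rule_ids = [part.strip() for part in ids.split(",")] if ids else []
        found.append((number, rule_ids))
    return found


if __name__ == "__main__":
    source = (
        "password = get_from_vault()  # nosemgrep: hardcoded-password\n"
        "dangerous_but_safe()  # nosemgrep\n"
    )
    for line_number, rule_ids in find_suppressions(source):
        label = ", ".join(rule_ids) if rule_ids else "ALL RULES"
        print(f"line {line_number}: suppresses {label}")
```

Blanket suppressions are the ones worth reviewing first, since they hide findings from every rule on that line.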
Part 2: Creating Custom Rules
When to Create Custom Rules
- Detecting project-specific vulnerability patterns
- Enforcing internal coding standards
- Building security checks for custom frameworks
- Creating taint-mode rules for data flow analysis
Approach Selection
| Approach | Use When |
| --- | --- |
| Taint mode | Data flows from untrusted source to dangerous sink (injection vulnerabilities) |
| Pattern matching | Syntactic patterns without data flow requirements (deprecated APIs, hardcoded values) |
Prioritize taint mode for injection vulnerabilities. Pattern matching alone can't distinguish between eval(user_input) (vulnerable) and eval("safe_literal") (safe).
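To make the distinction concrete, here is a toy illustration built on Python's standard `ast` module. It is not how Semgrep is implemented; it only contrasts a purely syntactic check with a very small taint tracker that handles straight-line, top-level code. The function names and the choice of `input()` as the untrusted source are assumptions made for the example.

```python
import ast

SOURCE = '''
user_input = input()
eval(user_input)
eval("1 + 1")
'''


def _is_call_to(node: ast.AST, name: str) -> bool:
    """True when node is a call to a bare function with the given name."""
    return (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == name)


def pattern_match(tree: ast.AST) -> list:
    """Syntactic check: flag every eval(...) call, whatever its argument."""
    return sorted(node.lineno for node in ast.walk(tree)
                  if _is_call_to(node, "eval"))


def taint_match(tree: ast.Module) -> list:
    """Toy taint check: flag eval(x) only if x was assigned from input()."""
    tainted = set()
    hits = []
    for stmt in tree.body:  # top-level statements, in order
        if isinstance(stmt, ast.Assign) and _is_call_to(stmt.value, "input"):
            tainted.update(t.id for t in stmt.targets
                           if isinstance(t, ast.Name))
        elif isinstance(stmt, ast.Expr) and _is_call_to(stmt.value, "eval"):
            if any(isinstance(arg, ast.Name) and arg.id in tainted
                   for arg in stmt.value.args):
                hits.append(stmt.lineno)
    return hits


if __name__ == "__main__":
    tree = ast.parse(SOURCE)
    print("pattern match lines:", pattern_match(tree))
    print("taint match lines:  ", taint_match(tree))
```

The syntactic check reports both `eval` calls, while the taint check reports only the one fed by `input()`, which is the precision gain the paragraph above describes.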
Quick Start: Pattern Matching
rules:
  - id: hardcoded-password
    languages: [python]
    message: "Hardcoded password detected: $PASSWORD"
    severity: ERROR
    pattern: password = "$PASSWORD"
Quick Start: Taint Mode
rules:
  - id: command-injection
    languages: [python]
    message: User input flows to command execution
    severity: ERROR
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form[...]
    pattern-sinks:
      - pattern: os.system(...)
      - pattern: subprocess.call($CMD, shell=True, ...)
    pattern-sanitizers:
      - pattern: shlex.quote(...)
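A test fixture for the command-injection rule above might look like the following sketch. The annotations state the expected outcome for each case; running `semgrep --test` against the rule is what confirms them. The `request` object is a hypothetical stand-in so the file imports without Flask installed, and the function names are illustrative only.

```python
import os
import shlex
import subprocess
from types import SimpleNamespace

# Hypothetical stand-in for a Flask-style request object (an assumption for
# this fixture). The rule matches the expression request.args.get(...) by
# its syntax, so no web framework is needed here.
request = SimpleNamespace(args={}, form={})


def ping_unsafe():
    host = request.args.get("host")
    # ruleid: command-injection
    os.system("ping -c 1 " + host)


def run_unsafe():
    cmd = request.form["cmd"]
    # ruleid: command-injection
    subprocess.call(cmd, shell=True)


def ping_sanitized():
    host = request.args.get("host")
    # ok: command-injection
    os.system("ping -c 1 " + shlex.quote(host))


def uptime_constant():
    # ok: command-injection
    os.system("uptime")
```

The fixture covers both sources, both sinks, the sanitizer, and a constant command, so a regression in any part of the rule should surface as a missed or incorrect line.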
Pattern Syntax Quick Reference
| Syntax | Description | Example |
| --- | --- | --- |
| ... | Match anything | func(...) |
| $VAR | Capture metavariable | $FUNC($INPUT) |
| <... ...> | Deep expression match | <... user_input ...> |
| Operator | Description |
| --- | --- |
| pattern | Match exact pattern |
| patterns | All must match (AND) |
| pattern-either | Any matches (OR) |
| pattern-not | Exclude matches |
| pattern-inside | Match only inside context |
| pattern-not-inside | Match only outside context |
| metavariable-regex | Regex on captured value |
Testing Rules
Test-first is mandatory. Create test files with annotations:
# test_rule.py
def test_vulnerable():
    user_input = request.args.get("id")
    # ruleid: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = " + user_input)

def test_safe():
    user_input = request.args.get("id")
    # ok: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))
Run tests:
semgrep --test --config rule.yaml test-file
Command Reference
| Task | Command |
| --- | --- |
| Run tests | semgrep --test --config rule.yaml test-file |
| Validate YAML | semgrep --validate --config rule.yaml |
| Dump AST | semgrep --dump-ast -l <lang> <file> |
| Debug taint flow | semgrep --dataflow-traces -f rule.yaml file |
Rule Creation Workflow
- Analyze the problem - Understand the bug pattern, determine taint vs pattern approach
- Create test cases first - Write ruleid: and ok: annotations before the rule
- Analyze AST - Run semgrep --dump-ast to understand code structure
- Write the rule - Start simple, iterate
- Test until 100% pass - No "missed lines" or "incorrect lines"
- Optimize patterns - Remove redundancies only after tests pass
Output structure:
<rule-id>/
├── <rule-id>.yaml # Semgrep rule
└── <rule-id>.<ext> # Test file
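The output structure above can be scaffolded with a few lines of Python. This is a sketch only: `scaffold_rule` and the template contents are hypothetical, and the placeholder rule is meant to be replaced once the test cases are written. The test file is created first, in keeping with the test-first workflow.

```python
import tempfile
from pathlib import Path

# Placeholder rule body; every TODO is meant to be edited by hand.
RULE_TEMPLATE = """\
rules:
  - id: {rule_id}
    languages: [{language}]
    message: TODO describe the finding
    severity: ERROR
    pattern: TODO
"""


def scaffold_rule(base_dir, rule_id, language="python", ext="py"):
    """Create <rule-id>/<rule-id>.yaml and <rule-id>/<rule-id>.<ext>.

    Existing files are left untouched. Returns (rule_file, test_file).
    """
    rule_dir = Path(base_dir) / rule_id
    rule_dir.mkdir(parents=True, exist_ok=True)
    rule_file = rule_dir / f"{rule_id}.yaml"
    test_file = rule_dir / f"{rule_id}.{ext}"
    if not test_file.exists():
        # Annotation stubs; add the vulnerable and safe code beneath them.
        test_file.write_text(
            f"# ruleid: {rule_id}\n\n# ok: {rule_id}\n", encoding="utf-8")
    if not rule_file.exists():
        rule_file.write_text(
            RULE_TEMPLATE.format(rule_id=rule_id, language=language),
            encoding="utf-8")
    return rule_file, test_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        rule_path, test_path = scaffold_rule(tmp, "my-rule")
        print(rule_path.relative_to(tmp))
        print(test_path.relative_to(tmp))
```

The `#` comment prefix in the stub suits Python test files; other languages would need their own comment syntax.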
Detailed References
Official Semgrep Documentation:
- Rule Syntax - Complete YAML structure, operators, and options
- Rule Schema - Full JSON schema specification
Local References:
- Workflow Guide - Complete step-by-step rule creation process
- Quick Reference - Pattern operators and taint components
Anti-Patterns to Avoid
Too broad:
# BAD: Matches any function call
pattern: $FUNC(...)
# GOOD: Specific dangerous function
pattern: eval(...)
Missing safe cases:
# BAD: Only tests vulnerable case
# ruleid: my-rule
dangerous(user_input)
# GOOD: Include safe cases
# ruleid: my-rule
dangerous(user_input)
# ok: my-rule
dangerous(sanitize(user_input))
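The "missing safe cases" anti-pattern can be caught mechanically before a rule is accepted. A minimal sketch, assuming `#`-style comments as in the Python examples here; `annotation_counts` and `has_both_cases` are hypothetical helpers, not Semgrep features.

```python
import re

# Matches "# ruleid: some-id" and "# ok: some-id" annotation comments.
ANNOTATION = re.compile(r"#\s*(ruleid|ok):\s*([\w.\-]+)")


def annotation_counts(test_source: str, rule_id: str) -> dict:
    """Count ruleid and ok annotations that refer to rule_id."""
    counts = {"ruleid": 0, "ok": 0}
    for kind, found_id in ANNOTATION.findall(test_source):
        if found_id == rule_id:
            counts[kind] += 1
    return counts


def has_both_cases(test_source: str, rule_id: str) -> bool:
    """True when the test file has at least one vulnerable and one safe case."""
    counts = annotation_counts(test_source, rule_id)
    return counts["ruleid"] > 0 and counts["ok"] > 0


if __name__ == "__main__":
    only_vulnerable = "# ruleid: my-rule\ndangerous(user_input)\n"
    with_safe_case = only_vulnerable + (
        "# ok: my-rule\ndangerous(sanitize(user_input))\n")
    print(has_both_cases(only_vulnerable, "my-rule"))
    print(has_both_cases(with_safe_case, "my-rule"))
```

A check like this only verifies that safe cases exist; whether they are the right ones still needs review.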
Rationalizations to Reject
| Shortcut | Why It's Wrong |
| --- | --- |
| "Semgrep found nothing, code is clean" | Semgrep is pattern-based; can't track complex cross-function data flow |
| "The pattern looks complete" | Untested rules have hidden false positives/negatives |
| "It matches the vulnerable case" | Matching vulnerabilities is half the job; verify safe cases don't match |
| "Taint mode is overkill" | For injection vulnerabilities, taint mode gives better precision |
| "One test case is enough" | Include edge cases: different coding styles, sanitized inputs, safe alternatives |
CI/CD Integration
GitHub Actions
name: Semgrep
on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: '0 0 1 * *'
jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Run Semgrep
        run: |
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            semgrep ci --baseline-commit ${{ github.event.pull_request.base.sha }}
          else
            semgrep ci
          fi
        env:
          SEMGREP_RULES: >-
            p/security-audit
            p/owasp-top-ten
            p/trailofbits
Resources
Rule Writing:
- Rule Syntax: https://semgrep.dev/docs/writing-rules/rule-syntax
- Pattern Syntax: https://semgrep.dev/docs/writing-rules/pattern-syntax
General:
- Registry: https://semgrep.dev/explore
- Playground: https://semgrep.dev/playground
- Trail of Bits Rules: https://github.com/trailofbits/semgrep-rules