SKILL.md
$27
Binary Pattern
Likely VM Type
Start With
while(1) { switch(bytecode[pc]) }
Switch-based dispatcher
Map each case to an operation
Indirect jump via table jmp [table + opcode*8]
Table-based dispatcher
Dump jump table, analyze handlers
Nested if-else chain on byte value
If-chain dispatcher
Same as switch, just different syntax
Stack push/pop dominant operations
Stack-based VM
Identify push, pop, arithmetic ops
reg[X] = ... array operations
Register-based VM
Map register indices to operations
2D grid + direction input
Maze challenge
Extract grid, apply BFS/DFS
1. CUSTOM VM IDENTIFICATION
1.1 Structural Indicators
VM Architecture Components:
┌─────────────────────────────────┐
│ Bytecode Program (data section)│
├─────────────────────────────────┤
│ Program Counter (pc/ip) │
│ Register File / Stack │
│ Memory / Data Area │
├─────────────────────────────────┤
│ Dispatcher Loop │
│ ├─ Fetch: opcode = code[pc] │
│ ├─ Decode: lookup handler │
│ └─ Execute: run handler │
└─────────────────────────────────┘
1.2 IDA/Ghidra Signatures
Switch dispatcher (most common in CTF):
while (running) {
unsigned char op = bytecode[pc++];
switch (op) {
case 0x00: /* nop */ break;
case 0x01: /* push imm */ stack[sp++] = bytecode[pc++]; break;
case 0x02: /* add */ stack[sp-2] += stack[sp-1]; sp--; break;
// ...
case 0xFF: /* halt */ running = 0; break;
}
}
Table dispatcher (more optimized):
typedef void (*handler_t)(vm_ctx_t*);
handler_t handlers[256] = { handle_nop, handle_push, handle_add, ... };
while (running) {
handlers[bytecode[pc++]](&ctx);
}
2. ANALYSIS METHODOLOGY
Step 1: Find the Dispatcher
Look for:
- Large switch statement (many cases) in a loop
- Array of function pointers indexed by a byte from a data buffer
- Single function with high cyclomatic complexity
- Cross-references to a data buffer read byte-by-byte
Step 2: Map Opcodes to Operations
For each case/handler, determine:
Property
How to Identify
Opcode value
Case number or table index
Operation type
Register/stack modifications
Operand count
How many bytes consumed after opcode
Operand type
Immediate value, register index, or memory address
Side effects
Output, memory write, flag modification
Step 3: Extract Bytecode Program
# Typical extraction from binary
import struct
with open('challenge', 'rb') as f:
f.seek(bytecode_offset)
bytecode = f.read(bytecode_length)
# Or from IDA:
# bytecode = idc.get_bytes(bytecode_addr, bytecode_len)
Step 4: Write Custom Disassembler
OPCODES = {
0x00: ("nop", 0), # (mnemonic, operand_bytes)
0x01: ("push", 1), # push immediate byte
0x02: ("pop", 0),
0x03: ("add", 0),
0x04: ("sub", 0),
0x05: ("xor", 0),
0x06: ("cmp", 0),
0x07: ("jmp", 2), # jump to 16-bit address
0x08: ("je", 2),
0x09: ("jne", 2),
0x0A: ("mov", 2), # mov reg, imm
0x0B: ("load", 1), # load from memory[operand]
0x0C: ("store",1), # store to memory[operand]
0x0D: ("print",0),
0x0E: ("read", 0), # read input
0xFF: ("halt", 0),
}
def disassemble(bytecode):
pc = 0
while pc < len(bytecode):
op = bytecode[pc]
if op not in OPCODES:
print(f" {pc:04x}: UNKNOWN {op:#04x}")
pc += 1
continue
mnemonic, operand_size = OPCODES[op]
operands = bytecode[pc+1:pc+1+operand_size]
operand_str = ' '.join(f'{b:#04x}' for b in operands)
print(f" {pc:04x}: {mnemonic:8s} {operand_str}")
pc += 1 + operand_size
disassemble(bytecode)
Step 5: Analyze Disassembled Program
With the custom disassembly, apply standard reverse engineering:
- Identify input reading (read opcode)
- Trace data flow from input to comparison
- Determine success/failure conditions
- Extract the check logic (often XOR/ADD transformations of input compared against constants)
3. COMMON VM PATTERNS IN CTF
3.1 Stack-Based VM
Operations work on a stack (like JVM or Python bytecode).
Opcode
Operation
Stack Effect
PUSH imm
Push immediate value
[...] → [..., imm]
POP
Discard top
[..., a] → [...]
ADD
Add top two
[..., a, b] → [..., a+b]
SUB
Subtract
[..., a, b] → [..., a-b]
MUL
Multiply
[..., a, b] → [..., a*b]
XOR
Bitwise XOR
[..., a, b] → [..., a^b]
CMP
Compare
[..., a, b] → [..., (a==b)]
JMP addr
Unconditional jump
no change
JZ addr
Jump if top is zero
[..., a] → [...]
Output top as char
[..., a] → [...]
READ
Read char to stack
[...] → [..., input]
HALT
Stop execution
-
3.2 Register-Based VM
Operations use register indices (like x86, ARM).
Opcode
Format
Operation
MOV r, imm
0x01 RR II II
reg[R] = imm16
MOV r1, r2
0x02 R1 R2
reg[R1] = reg[R2]
ADD r1, r2
0x03 R1 R2
reg[R1] += reg[R2]
SUB r1, r2
0x04 R1 R2
reg[R1] -= reg[R2]
XOR r1, r2
0x05 R1 R2
reg[R1] ^= reg[R2]
CMP r1, r2
0x06 R1 R2
flags = compare(r1, r2)
JMP addr
0x07 AA AA
pc = addr
JE addr
0x08 AA AA
if equal: pc = addr
LOAD r, [addr]
0x09 RR AA
reg[R] = mem[addr]
STORE [addr], r
0x0A AA RR
mem[addr] = reg[R]
SYSCALL
0x0B
I/O operation based on reg[0]
HALT
0xFF
stop
3.3 Brainfuck-like / Esoteric VMs
BF Command
VM Equivalent
Description
>
INC ptr
Move data pointer right
<
DEC ptr
Move data pointer left
+
INC [ptr]
Increment byte at pointer
-
DEC [ptr]
Decrement byte at pointer
.
OUTPUT [ptr]
Output byte at pointer
,
INPUT [ptr]
Input byte to pointer
[
JZ forward
Jump past ] if byte is zero
]
JNZ back
Jump back to [ if byte is nonzero
4. MAZE CHALLENGES
4.1 Identification
- Binary reads directional input (WASD, arrow keys, UDLR)
- 2D array in data section (walls, paths, start, end)
- Position tracking with x,y coordinates
- Win condition at specific coordinates
4.2 Map Extraction
# Extract maze grid from binary data section
MAZE_ADDR = 0x601060
WIDTH = 20
HEIGHT = 15
# From binary dump:
maze = []
for row in range(HEIGHT):
line = ""
for col in range(WIDTH):
cell = bytecode[MAZE_ADDR + row * WIDTH + col - base_addr]
if cell == 0: line += "." # path
elif cell == 1: line += "#" # wall
elif cell == 2: line += "S" # start
elif cell == 3: line += "E" # end
else: line += "?"
maze.append(line)
print(line)
4.3 Automated Solving
from collections import deque
def solve_maze(maze, start, end):
"""BFS solver returns direction string."""
rows, cols = len(maze), len(maze[0])
directions = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}
queue = deque([(start, "")])
visited = {start}
while queue:
(r, c), path = queue.popleft()
if (r, c) == end:
return path
for name, (dr, dc) in directions.items():
nr, nc = r + dr, c + dc
if (0 <= nr < rows and 0 <= nc < cols and
maze[nr][nc] != '#' and (nr, nc) not in visited):
visited.add((nr, nc))
queue.append(((nr, nc), path + name))
return None
# Find start and end positions
for r, row in enumerate(maze):
for c, cell in enumerate(row):
if cell == 'S': start = (r, c)
if cell == 'E': end = (r, c)
solution = solve_maze(maze, start, end)
print(f"Path: {solution}")
4.4 Direction Encoding
Different challenges encode directions differently:
Encoding
Up
Down
Left
Right
WASD
W
S
A
D
UDLR
U
D
L
R
Arrow keys
↑ (0x48)
↓ (0x50)
← (0x4B)
→ (0x4D)
Numbers
1
2
3
4
Hex opcodes
0x01
0x02
0x03
0x04
5. REAL-WORLD VM PROTECTORS
5.1 VMProtect Analysis Approach
1. Find VM entry: search for pushad/pushfd sequence
2. Identify VM context structure (registers, flags, bytecode pointer)
3. Locate handler table (often obfuscated with opaque predicates)
4. For each handler:
a. Remove junk code / opaque predicates
b. Identify the core operation
c. Document handler semantics
5. Trace bytecode execution (instruction-level trace)
6. Reconstruct original code from trace
5.2 Tigress Obfuscator
Academic VM obfuscator with configurable protection layers.
Feature
Approach
Single-dispatch VM
Standard handler extraction
Split handlers
Handlers spread across multiple functions
Nested VMs
Outer VM handler invokes inner VM
Encrypted bytecode
Dynamic decryption before each fetch
Polymorphic handlers
Different code for same operation on each build
5.3 Common VM Protector Patterns
Protector
Dispatcher Style
Difficulty
VMProtect
Table + opaque predicates
High
Themida (Code Virtualizer)
CISC-like, large handler set
High
Tigress
Configurable, academic
Medium-High
Custom CTF VM
Simple switch
Low-Medium
Movfuscator
All-mov computation
Medium
6. TOOLS
Tool
Purpose
Usage
IDA Pro
Identify dispatcher, reverse handlers
F5 decompile, xref analysis
Ghidra
Free alternative with Sleigh processor modules
Write custom processor for VM ISA
angr
Symbolic execution through VM
Treat entire VM as constraint system
Pin / DynamoRIO
Dynamic instrumentation for tracing
Record opcode handler execution sequence
REVEN
Full-system trace recording
Replay and analyze VM execution
Unicorn
Emulate VM execution
Fast handler emulation
Miasm
IR-based analysis
Lift VM handlers to IR for analysis
Custom Python
Write disassembler/decompiler
Per-challenge custom tooling
Ghidra Sleigh Processor Module
For recurring VM architectures, write a Sleigh processor specification:
define space ram type=ram_space size=2 default;
define space register type=register_space size=1;
define register offset=0 size=1 [ R0 R1 R2 R3 FLAGS PC SP ];
define token opcode(8)
op = (0,7)
;
:NOP is op=0x00 { }
:PUSH imm is op=0x01; imm { SP = SP - 1; *[ram]:1 SP = imm; }
:POP is op=0x02 { SP = SP + 1; }
:ADD is op=0x03 { local a = *[ram]:1 (SP+1); *[ram]:1 (SP+1) = a + *[ram]:1 SP; SP = SP + 1; }
7. DECISION TREE
Binary contains custom bytecode interpreter?
│
├─ Can you identify the dispatcher?
│ ├─ Yes (switch/table/if-chain)
│ │ ├─ Few opcodes (< 20) → Simple CTF VM
│ │ │ ├─ Stack-based → map push/pop/arithmetic ops
│ │ │ ├─ Register-based → map mov/add/cmp ops
│ │ │ └─ Write disassembler → analyze program → solve
│ │ │
│ │ └─ Many opcodes (50+) → Commercial protector
│ │ ├─ Known protector → use specific deprotection tools
│ │ └─ Custom → trace execution, pattern-match handlers
│ │
│ └─ No clear dispatcher
│ ├─ All-mov instructions → movfuscator
│ ├─ Encrypted bytecode → find decryption, dump after decode
│ └─ Split/distributed handlers → trace execution to find them
│
├─ Is it a maze challenge?
│ ├─ Extract grid from data section
│ ├─ Identify direction encoding
│ ├─ BFS/DFS to find shortest path
│ └─ Convert path to expected input format
│
├─ Is there input validation in VM?
│ ├─ Small input space → brute-force via Unicorn emulation
│ ├─ Known format → constrained angr solve
│ └─ Complex check → write disassembler, analyze check logic
│
└─ Multiple VM layers (VM in VM)?
├─ Analyze outer VM first
├─ Extract inner bytecode
├─ Repeat analysis for inner VM
└─ Consider: symbolic execution may handle nested VMs directly
8. CTF SOLVING WORKFLOW
1. Run the binary — understand I/O behavior
└─ What input does it expect? What output on success/failure?
2. Open in IDA/Ghidra — find the main loop
└─ Look for while/for loop with switch or indirect jump
3. Identify VM components:
├─ Bytecode location (where is the program data?)
├─ PC/IP variable (how is current position tracked?)
├─ Registers/stack (where is VM state stored?)
└─ I/O handlers (which opcodes read input / write output?)
4. Map all opcodes (create the ISA specification)
└─ For each case/handler: opcode number, operation, operands
5. Write disassembler in Python
└─ Output readable assembly for the bytecode
6. Analyze the disassembled program:
├─ Find input reading
├─ Trace transformations applied to input
├─ Find comparison against expected values
└─ Reverse the transformation to find valid input
7. Solve:
├─ If simple transforms (XOR, ADD) → reverse manually
├─ If complex → feed to Z3 as constraints
└─ If maze → extract grid, run pathfinding