vm-and-bytecode-reverse

>-

INSTALLATION
npx skills add https://github.com/yaklang/hack-skills --skill vm-and-bytecode-reverse
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

$27

Binary Pattern

Likely VM Type

Start With

while(1) { switch(bytecode[pc]) }

Switch-based dispatcher

Map each case to an operation

Indirect jump via table jmp [table + opcode*8]

Table-based dispatcher

Dump jump table, analyze handlers

Nested if-else chain on byte value

If-chain dispatcher

Same as switch, just different syntax

Stack push/pop dominant operations

Stack-based VM

Identify push, pop, arithmetic ops

reg[X] = ... array operations

Register-based VM

Map register indices to operations

2D grid + direction input

Maze challenge

Extract grid, apply BFS/DFS

1. CUSTOM VM IDENTIFICATION

1.1 Structural Indicators

VM Architecture Components:

┌─────────────────────────────────┐

│  Bytecode Program (data section)│

├─────────────────────────────────┤

│  Program Counter (pc/ip)        │

│  Register File / Stack          │

│  Memory / Data Area             │

├─────────────────────────────────┤

│  Dispatcher Loop                │

│  ├─ Fetch: opcode = code[pc]    │

│  ├─ Decode: lookup handler      │

│  └─ Execute: run handler        │

└─────────────────────────────────┘

1.2 IDA/Ghidra Signatures

Switch dispatcher (most common in CTF):

while (running) {

    unsigned char op = bytecode[pc++];

    switch (op) {

        case 0x00: /* nop */       break;

        case 0x01: /* push imm */  stack[sp++] = bytecode[pc++]; break;

        case 0x02: /* add */       stack[sp-2] += stack[sp-1]; sp--; break;

        // ...

        case 0xFF: /* halt */      running = 0; break;

    }

}

Table dispatcher (more optimized):

typedef void (*handler_t)(vm_ctx_t*);

handler_t handlers[256] = { handle_nop, handle_push, handle_add, ... };

while (running) {

    handlers[bytecode[pc++]](&ctx);

}

2. ANALYSIS METHODOLOGY

Step 1: Find the Dispatcher

Look for:

  • Large switch statement (many cases) in a loop
  • Array of function pointers indexed by a byte from a data buffer
  • Single function with high cyclomatic complexity
  • Cross-references to a data buffer read byte-by-byte

Step 2: Map Opcodes to Operations

For each case/handler, determine:

Property

How to Identify

Opcode value

Case number or table index

Operation type

Register/stack modifications

Operand count

How many bytes consumed after opcode

Operand type

Immediate value, register index, or memory address

Side effects

Output, memory write, flag modification

Step 3: Extract Bytecode Program

# Typical extraction from binary

import struct

with open('challenge', 'rb') as f:

    f.seek(bytecode_offset)

    bytecode = f.read(bytecode_length)

# Or from IDA:

# bytecode = idc.get_bytes(bytecode_addr, bytecode_len)

Step 4: Write Custom Disassembler

OPCODES = {

    0x00: ("nop",  0),    # (mnemonic, operand_bytes)

    0x01: ("push", 1),    # push immediate byte

    0x02: ("pop",  0),

    0x03: ("add",  0),

    0x04: ("sub",  0),

    0x05: ("xor",  0),

    0x06: ("cmp",  0),

    0x07: ("jmp",  2),    # jump to 16-bit address

    0x08: ("je",   2),

    0x09: ("jne",  2),

    0x0A: ("mov",  2),    # mov reg, imm

    0x0B: ("load", 1),    # load from memory[operand]

    0x0C: ("store",1),    # store to memory[operand]

    0x0D: ("print",0),

    0x0E: ("read", 0),    # read input

    0xFF: ("halt", 0),

}

def disassemble(bytecode):

    pc = 0

    while pc < len(bytecode):

        op = bytecode[pc]

        if op not in OPCODES:

            print(f"  {pc:04x}: UNKNOWN {op:#04x}")

            pc += 1

            continue

        mnemonic, operand_size = OPCODES[op]

        operands = bytecode[pc+1:pc+1+operand_size]

        operand_str = ' '.join(f'{b:#04x}' for b in operands)

        print(f"  {pc:04x}: {mnemonic:8s} {operand_str}")

        pc += 1 + operand_size

disassemble(bytecode)

Step 5: Analyze Disassembled Program

With the custom disassembly, apply standard reverse engineering:

  • Identify input reading (read opcode)
  • Trace data flow from input to comparison
  • Determine success/failure conditions
  • Extract the check logic (often XOR/ADD transformations of input compared against constants)

3. COMMON VM PATTERNS IN CTF

3.1 Stack-Based VM

Operations work on a stack (like JVM or Python bytecode).

Opcode

Operation

Stack Effect

PUSH imm

Push immediate value

[...] → [..., imm]

POP

Discard top

[..., a] → [...]

ADD

Add top two

[..., a, b] → [..., a+b]

SUB

Subtract

[..., a, b] → [..., a-b]

MUL

Multiply

[..., a, b] → [..., a*b]

XOR

Bitwise XOR

[..., a, b] → [..., a^b]

CMP

Compare

[..., a, b] → [..., (a==b)]

JMP addr

Unconditional jump

no change

JZ addr

Jump if top is zero

[..., a] → [...]

PRINT

Output top as char

[..., a] → [...]

READ

Read char to stack

[...] → [..., input]

HALT

Stop execution

-

3.2 Register-Based VM

Operations use register indices (like x86, ARM).

Opcode

Format

Operation

MOV r, imm

0x01 RR II II

reg[R] = imm16

MOV r1, r2

0x02 R1 R2

reg[R1] = reg[R2]

ADD r1, r2

0x03 R1 R2

reg[R1] += reg[R2]

SUB r1, r2

0x04 R1 R2

reg[R1] -= reg[R2]

XOR r1, r2

0x05 R1 R2

reg[R1] ^= reg[R2]

CMP r1, r2

0x06 R1 R2

flags = compare(r1, r2)

JMP addr

0x07 AA AA

pc = addr

JE addr

0x08 AA AA

if equal: pc = addr

LOAD r, [addr]

0x09 RR AA

reg[R] = mem[addr]

STORE [addr], r

0x0A AA RR

mem[addr] = reg[R]

SYSCALL

0x0B

I/O operation based on reg[0]

HALT

0xFF

stop

3.3 Brainfuck-like / Esoteric VMs

BF Command

VM Equivalent

Description

>

INC ptr

Move data pointer right

<

DEC ptr

Move data pointer left

+

INC [ptr]

Increment byte at pointer

-

DEC [ptr]

Decrement byte at pointer

.

OUTPUT [ptr]

Output byte at pointer

,

INPUT [ptr]

Input byte to pointer

[

JZ forward

Jump past ] if byte is zero

]

JNZ back

Jump back to [ if byte is nonzero

4. MAZE CHALLENGES

4.1 Identification

  • Binary reads directional input (WASD, arrow keys, UDLR)
  • 2D array in data section (walls, paths, start, end)
  • Position tracking with x,y coordinates
  • Win condition at specific coordinates

4.2 Map Extraction

# Extract maze grid from binary data section

MAZE_ADDR = 0x601060

WIDTH = 20

HEIGHT = 15

# From binary dump:

maze = []

for row in range(HEIGHT):

    line = ""

    for col in range(WIDTH):

        cell = bytecode[MAZE_ADDR + row * WIDTH + col - base_addr]

        if cell == 0: line += "."    # path

        elif cell == 1: line += "#"  # wall

        elif cell == 2: line += "S"  # start

        elif cell == 3: line += "E"  # end

        else: line += "?"

    maze.append(line)

    print(line)

4.3 Automated Solving

from collections import deque

def solve_maze(maze, start, end):

    """BFS solver returns direction string."""

    rows, cols = len(maze), len(maze[0])

    directions = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}

    queue = deque([(start, "")])

    visited = {start}

    while queue:

        (r, c), path = queue.popleft()

        if (r, c) == end:

            return path

        for name, (dr, dc) in directions.items():

            nr, nc = r + dr, c + dc

            if (0 <= nr < rows and 0 <= nc < cols and

                maze[nr][nc] != '#' and (nr, nc) not in visited):

                visited.add((nr, nc))

                queue.append(((nr, nc), path + name))

    return None

# Find start and end positions

for r, row in enumerate(maze):

    for c, cell in enumerate(row):

        if cell == 'S': start = (r, c)

        if cell == 'E': end = (r, c)

solution = solve_maze(maze, start, end)

print(f"Path: {solution}")

4.4 Direction Encoding

Different challenges encode directions differently:

Encoding

Up

Down

Left

Right

WASD

W

S

A

D

UDLR

U

D

L

R

Arrow keys

↑ (0x48)

↓ (0x50)

← (0x4B)

→ (0x4D)

Numbers

1

2

3

4

Hex opcodes

0x01

0x02

0x03

0x04

5. REAL-WORLD VM PROTECTORS

5.1 VMProtect Analysis Approach

1. Find VM entry: search for pushad/pushfd sequence

2. Identify VM context structure (registers, flags, bytecode pointer)

3. Locate handler table (often obfuscated with opaque predicates)

4. For each handler:

   a. Remove junk code / opaque predicates

   b. Identify the core operation

   c. Document handler semantics

5. Trace bytecode execution (instruction-level trace)

6. Reconstruct original code from trace

5.2 Tigress Obfuscator

Academic VM obfuscator with configurable protection layers.

Feature

Approach

Single-dispatch VM

Standard handler extraction

Split handlers

Handlers spread across multiple functions

Nested VMs

Outer VM handler invokes inner VM

Encrypted bytecode

Dynamic decryption before each fetch

Polymorphic handlers

Different code for same operation on each build

5.3 Common VM Protector Patterns

Protector

Dispatcher Style

Difficulty

VMProtect

Table + opaque predicates

High

Themida (Code Virtualizer)

CISC-like, large handler set

High

Tigress

Configurable, academic

Medium-High

Custom CTF VM

Simple switch

Low-Medium

Movfuscator

All-mov computation

Medium

6. TOOLS

Tool

Purpose

Usage

IDA Pro

Identify dispatcher, reverse handlers

F5 decompile, xref analysis

Ghidra

Free alternative with Sleigh processor modules

Write custom processor for VM ISA

angr

Symbolic execution through VM

Treat entire VM as constraint system

Pin / DynamoRIO

Dynamic instrumentation for tracing

Record opcode handler execution sequence

REVEN

Full-system trace recording

Replay and analyze VM execution

Unicorn

Emulate VM execution

Fast handler emulation

Miasm

IR-based analysis

Lift VM handlers to IR for analysis

Custom Python

Write disassembler/decompiler

Per-challenge custom tooling

Ghidra Sleigh Processor Module

For recurring VM architectures, write a Sleigh processor specification:

define space ram      type=ram_space      size=2  default;

define space register type=register_space  size=1;

define register offset=0 size=1 [ R0 R1 R2 R3 FLAGS PC SP ];

define token opcode(8)

    op = (0,7)

;

:NOP    is op=0x00 { }

:PUSH   imm is op=0x01; imm { SP = SP - 1; *[ram]:1 SP = imm; }

:POP    is op=0x02 { SP = SP + 1; }

:ADD    is op=0x03 { local a = *[ram]:1 (SP+1); *[ram]:1 (SP+1) = a + *[ram]:1 SP; SP = SP + 1; }

7. DECISION TREE

Binary contains custom bytecode interpreter?

│

├─ Can you identify the dispatcher?

│  ├─ Yes (switch/table/if-chain)

│  │  ├─ Few opcodes (< 20) → Simple CTF VM

│  │  │  ├─ Stack-based → map push/pop/arithmetic ops

│  │  │  ├─ Register-based → map mov/add/cmp ops

│  │  │  └─ Write disassembler → analyze program → solve

│  │  │

│  │  └─ Many opcodes (50+) → Commercial protector

│  │     ├─ Known protector → use specific deprotection tools

│  │     └─ Custom → trace execution, pattern-match handlers

│  │

│  └─ No clear dispatcher

│     ├─ All-mov instructions → movfuscator

│     ├─ Encrypted bytecode → find decryption, dump after decode

│     └─ Split/distributed handlers → trace execution to find them

│

├─ Is it a maze challenge?

│  ├─ Extract grid from data section

│  ├─ Identify direction encoding

│  ├─ BFS/DFS to find shortest path

│  └─ Convert path to expected input format

│

├─ Is there input validation in VM?

│  ├─ Small input space → brute-force via Unicorn emulation

│  ├─ Known format → constrained angr solve

│  └─ Complex check → write disassembler, analyze check logic

│

└─ Multiple VM layers (VM in VM)?

   ├─ Analyze outer VM first

   ├─ Extract inner bytecode

   ├─ Repeat analysis for inner VM

   └─ Consider: symbolic execution may handle nested VMs directly

8. CTF SOLVING WORKFLOW

1. Run the binary — understand I/O behavior

   └─ What input does it expect? What output on success/failure?

2. Open in IDA/Ghidra — find the main loop

   └─ Look for while/for loop with switch or indirect jump

3. Identify VM components:

   ├─ Bytecode location (where is the program data?)

   ├─ PC/IP variable (how is current position tracked?)

   ├─ Registers/stack (where is VM state stored?)

   └─ I/O handlers (which opcodes read input / write output?)

4. Map all opcodes (create the ISA specification)

   └─ For each case/handler: opcode number, operation, operands

5. Write disassembler in Python

   └─ Output readable assembly for the bytecode

6. Analyze the disassembled program:

   ├─ Find input reading

   ├─ Trace transformations applied to input

   ├─ Find comparison against expected values

   └─ Reverse the transformation to find valid input

7. Solve:

   ├─ If simple transforms (XOR, ADD) → reverse manually

   ├─ If complex → feed to Z3 as constraints

   └─ If maze → extract grid, run pathfinding
BrowserAct

Let your agent run on any real-world website

Bypass CAPTCHA & anti-bot for free. Start local, scale to cloud.

Explore BrowserAct Skills →

Stop writing automation&scrapers

Install the CLI. Run your first Skill in 30 seconds. Scale when you're ready.

Start free
free · no credit card