code-obfuscation-deobfuscation


INSTALLATION
npx skills add https://github.com/yaklang/hack-skills --skill code-obfuscation-deobfuscation
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md


| Symptom in IDA/Ghidra | Likely Obfuscation | Start With |
|---|---|---|
| Flat CFG, single giant switch | Control flow flattening | Symbolic execution to recover CFG |
| Only mov instructions | movfuscator | demovfuscation / trace-based lifting |
| pushad/pushfd → VM entry | VM protector | Handler table extraction |
| XOR loop before code execution | SMC / string encryption | Dynamic analysis, breakpoint after decode |
| Impossible conditions (opaque predicates) | Junk code insertion | Pattern-based removal |
| All strings unreadable | String encryption | Hook decryption routine, or emulate |
| No imports in IAT | Import hiding | Trace GetProcAddress / hash resolution |

1. JUNK CODE & OPAQUE PREDICATES

1.1 Junk Code Insertion

Dead code that never affects program output, added to increase analysis time.

Identification:

  • Instructions that write to registers/memory never read afterward
  • Function calls whose return values are discarded and have no side effects
  • Loops with invariant bounds that compute unused results

Removal strategy:

  • Compute def-use chains (IDA/Ghidra data flow analysis)
  • Mark instructions with no downstream use as dead
  • Verify removal doesn't change program behavior (trace comparison)
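The def-use idea above can be sketched on a toy instruction list. This is a minimal illustration and not an IDA or Ghidra API: the `Insn` tuple, `remove_dead`, and the sample block are hypothetical, CPU flags and memory aliasing are ignored, and only one straight-line block is handled.

```python
from typing import NamedTuple

class Insn(NamedTuple):
    dst: str                    # register written by the instruction
    srcs: tuple                 # registers read by the instruction
    text: str                   # disassembly text, kept for display
    side_effect: bool = False   # stores, calls, branches must always be kept

def remove_dead(insns, live_out):
    """Backward liveness pass: drop writes whose result is never read."""
    live = set(live_out)
    kept = []
    for insn in reversed(insns):
        if insn.side_effect or insn.dst in live:
            live.discard(insn.dst)    # this write satisfies later reads
            live.update(insn.srcs)    # its inputs become live above it
            kept.append(insn)
        # otherwise the destination is dead here, so the instruction is junk
    kept.reverse()
    return kept

block = [
    Insn("eax", ("ebx",), "mov eax, ebx"),
    Insn("ecx", (), "mov ecx, 0x1337"),          # junk: overwritten before any read
    Insn("ecx", ("eax",), "lea ecx, [eax+4]"),
    Insn("edx", ("edx",), "xor edx, 0x55"),      # junk: edx is not live afterward
    Insn("eax", ("eax", "ecx"), "add eax, ecx"),
]
cleaned = remove_dead(block, live_out={"eax"})
for insn in cleaned:
    print(insn.text)
```

The `live_out` set is the key assumption: marking too few registers as live at the block's end removes real code, which is why the trace comparison step still matters.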

1.2 Opaque Predicates

Conditional branches where the condition is always true or always false, but this is non-obvious.

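| Type | Example | Always Evaluates To |
|---|---|---|
| Arithmetic | x² ≥ 0 | True (over the integers; a fixed-width signed x² can wrap negative) |
| Number theory | x*(x+1) % 2 == 0 | True (product of consecutive ints) |
| Pointer-based | ptr == ptr after aliasing | True |
| Hash-based | CRC32(constant) == known_value | True |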

Deobfuscation:

  • Abstract interpretation: prove the condition is constant
  • Symbolic execution: Z3 proves ∀x: predicate(x) = True
  • Pattern matching: recognize known opaque predicate families
  • Dynamic: trace and observe the branch is never taken / always taken
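
Z3 check for the number-theory predicate (bit-vector arithmetic, so wraparound is modelled):

import z3

x = z3.BitVec('x', 32)
s = z3.Solver()
s.add(x * (x + 1) % 2 != 0)   # ask for a counterexample
print(s.check())  # unsat → no counterexample, predicate is always true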
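When a solver is not at hand, a small bit width can be checked exhaustively as a sanity test. The sketch below uses only the standard library; `to_signed` and `always_true` are hypothetical helper names, and 16 bits is an arbitrary width chosen to keep the loop fast. It also shows why machine arithmetic matters: the squared-value predicate holds over the integers but not once the product wraps.

```python
def to_signed(value, bits):
    """Interpret an unsigned bit pattern as a two's-complement integer."""
    return value - (1 << bits) if value >> (bits - 1) else value

def always_true(predicate, bits=16):
    """Evaluate predicate(x, mask) for every bits-wide value of x."""
    mask = (1 << bits) - 1
    return all(predicate(x, mask) for x in range(1 << bits))

# x*(x+1) stays even after wraparound, so this branch is genuinely opaque.
consecutive_even = always_true(lambda x, m: ((x * (x + 1)) & m) % 2 == 0)

# x*x >= 0 fails for wrapped signed values (for example x = 182 at 16 bits).
square_nonneg = always_true(lambda x, m: to_signed((x * x) & m, 16) >= 0)

print(consecutive_even, square_nonneg)
```

A brute-force pass like this does not replace a proof at the real operand width, but it catches predicates that only look constant.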

2. SELF-MODIFYING CODE (SMC)

Runtime code patching: encrypted code is decrypted just before execution.

2.1 XOR Decryption Loop (Most Common)

lea esi, [encrypted_code]
mov ecx, code_length
mov al, xor_key
decrypt_loop:
    xor byte [esi], al
    inc esi
    loop decrypt_loop
    jmp encrypted_code  ; now decrypted

2.2 Analysis Strategy

1. Identify the decryption routine (look for XOR/ADD/SUB in loops writing to .text)

2. Set breakpoint AFTER the loop completes

3. At breakpoint: dump the decrypted memory region

4. Re-analyze the dumped code in IDA/Ghidra

5. For multi-layer: repeat for each decryption stage
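If the key and the encrypted range can be read straight out of the decryption routine, the same result can sometimes be reproduced offline instead of dumping from a debugger. A minimal sketch for the single-byte XOR case follows; `xor_decode` and the sample image are hypothetical, and offsets are file offsets rather than virtual addresses.

```python
def xor_decode(image: bytes, offset: int, length: int, key: int) -> bytes:
    """Return a copy of image with [offset, offset+length) XORed with a one-byte key."""
    if not 0 <= key <= 0xFF:
        raise ValueError("key must fit in one byte")
    if offset < 0 or length < 0 or offset + length > len(image):
        raise ValueError("region lies outside the image")
    patched = bytearray(image)
    for i in range(offset, offset + length):
        patched[i] ^= key
    return bytes(patched)

# Hypothetical sample: a 4-byte stub followed by 3 encrypted bytes (key 0x5A).
plain = bytes([0x90, 0x90, 0xC3])                  # nop; nop; ret
image = b"STUB" + bytes(b ^ 0x5A for b in plain)
decoded = xor_decode(image, offset=4, length=3, key=0x5A)
print(decoded[4:].hex())
```

For multi-layer samples the function can be applied once per stage, feeding each decoded image into the next call.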

2.3 Automated Unpacking via Emulation

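from unicorn import Uc, UC_ARCH_X86, UC_MODE_32
from unicorn.x86_const import UC_X86_REG_ESP

BASE = 0x400000                          # load address (assumed; take it from the binary's headers)
# binary_code, decrypt_entry, decrypt_end, code_start, code_length come from static analysis

mu = Uc(UC_ARCH_X86, UC_MODE_32)
mu.mem_map(BASE, 0x10000)                # code region, mapped RWX by default
mu.mem_write(BASE, binary_code)
mu.mem_map(0x7F0000, 0x10000)            # scratch stack (assumed address) in case the stub pushes/pops
mu.reg_write(UC_X86_REG_ESP, 0x7F8000)
mu.emu_start(decrypt_entry, decrypt_end) # stop once the decryption loop falls through
decrypted = bytes(mu.mem_read(code_start, code_length))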

3. CONTROL FLOW FLATTENING (CFF)

3.1 Structure

Original sequential blocks are transformed into a dispatcher loop:

Original:      A → B → C → D

Flattened:     ┌──────────────────┐
               │   dispatcher     │
               │   switch(state)  │◄─────┐
               ├──────────────────┤      │
               │ case 1: block A  │──────┤
               │ case 2: block B  │──────┤
               │ case 3: block C  │──────┤
               │ case 4: block D  │──────┘
               └──────────────────┘

Each block sets state = next_state before jumping back to the dispatcher.
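The structure can be imitated in a few lines of Python to show what recovery has to reconstruct. In this sketch the `flattened` function and its block bodies are made up for illustration; recording `(state, next_state)` pairs at the dispatcher is a trace-based approach, so it recovers only the edges that were actually executed.

```python
def flattened(x, trace=None):
    """A -> B -> C -> D expressed as a dispatcher loop over a state variable."""
    state = 1
    while state != 0:
        prev = state
        if state == 1:      # block A
            x += 3
            state = 2
        elif state == 2:    # block B
            x *= 2
            state = 3
        elif state == 3:    # block C
            x -= 1
            state = 4
        elif state == 4:    # block D
            x ^= 0xFF
            state = 0       # leave the dispatcher loop
        if trace is not None:
            trace.append((prev, state))   # one recovered CFG edge
    return x

edges = []
result = flattened(5, edges)
print(result, edges)
```

Sorting the recorded edges into a graph gives back the original A → B → C → D chain; blocks with data-dependent successors need several inputs or symbolic execution to cover every edge.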

3.2 Recovery Techniques

| Technique | Tool | Effectiveness |
|---|---|---|
| Symbolic execution | angr, Triton, miasm | High: traces all state transitions |
| Trace-based recovery | Pin/DynamoRIO trace → reconstruct CFG | Medium: covers executed paths only |
| Pattern matching | Custom IDA/Ghidra script | Medium: works for known flatteners |
| D-810 (IDA plugin) | IDA Pro | High: specifically designed for CFF |

3.3 Symbolic Deflattening (angr approach)

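import angr

proj = angr.Project('./obfuscated', auto_load_libs=False)
cfg = proj.analyses.CFGFast()

# Heuristic: the dispatcher is usually the basic block with the highest in-degree
# (restrict the search to the function of interest in real binaries)
dispatcher = max(cfg.graph.nodes(), key=lambda n: cfg.graph.in_degree(n))

# Simplification: treat the dispatcher's direct successors as the case blocks
case_blocks = list(cfg.graph.successors(dispatcher))

# For each case block, symbolically determine successor
for block in case_blocks:
    state = proj.factory.blank_state(addr=block.addr)
    # ... solve state variable to find real successor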

4. MOVFUSCATOR

4.1 Concept

All computation reduced to mov instructions only (Turing-complete via memory-mapped computation tables). Created by Christopher Domas.
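The table-driven idea can be illustrated with the store/load equality trick that is commonly used to explain mov-only computation. This is a hand-written sketch, not movfuscator's actual output: `mov_equal`, `mov_select`, and the 256-entry scratch table are hypothetical, and each Python statement stands in for one mov.

```python
def mov_equal(x: int, y: int) -> int:
    """Compare two byte values using only stores and loads (no cmp, no branch)."""
    scratch = [0] * 256      # stands in for a 256-entry table in the data section
    scratch[x] = 0           # mov [table + x], 0
    scratch[y] = 1           # mov [table + y], 1  (overwrites the 0 only if x == y)
    return scratch[x]        # mov result, [table + x]

def mov_select(cond: int, if_false: int, if_true: int) -> int:
    """Branch-free select: index a two-entry table with the comparison result."""
    choices = [if_false, if_true]   # two stores into adjacent slots
    return choices[cond]            # one indexed load

print(mov_equal(7, 7), mov_equal(7, 9), mov_select(mov_equal(3, 3), 10, 20))
```

Recognizing these store-then-load idioms is what lets a tool or analyst map long mov sequences back to comparisons and conditional assignments.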

4.2 Identification

  • Function contains only mov instructions (no add, sub, xor, jmp, call)
  • Large lookup tables in data section
  • Memory-mapped flag registers

4.3 Demovfuscation

| Approach | Description |
|---|---|
| demovfuscator (tool) | Static analysis, recovers original operations from mov patterns |
| Trace + taint analysis | Run with Pin/DynamoRIO, taint inputs, observe computation |
| Symbolic execution | Treat entire function as constraint system |

5. VM PROTECTION (VMProtect / Themida / Code Virtualizer)

5.1 VM Architecture

Protected code → bytecode compiler → custom bytecode

Runtime: VM entry (pushad/pushfd) → fetch → decode → execute → VM exit (popad/popfd)

5.2 VM Entry Point Identification

; Typical VMProtect entry
pushad                    ; save all registers
pushfd                    ; save flags
mov ebp, esp              ; VM stack frame
sub esp, VM_LOCALS_SIZE   ; allocate VM context
mov esi, bytecode_addr    ; bytecode instruction pointer
jmp vm_dispatcher         ; enter VM loop

5.3 Handler Table Extraction

1. Find dispatcher (large switch or indirect jump via table)

2. Each case/entry = one VM handler (implements one VM opcode)

3. Map handler addresses to operations by analyzing each handler:

   - Handler reads operand from bytecode stream (esi)

   - Performs operation on VM registers/stack

   - Advances bytecode pointer

   - Returns to dispatcher
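Once handlers are mapped to operations, the mapping can drive a small bytecode disassembler. The sketch below assumes a made-up ISA: the opcode values, mnemonics, and operand sizes in `HANDLERS` are hypothetical placeholders for whatever the handler analysis of a real sample yields.

```python
import struct

# Hypothetical ISA recovered from handler analysis: opcode -> (mnemonic, operand size)
HANDLERS = {
    0x01: ("vpush_imm32", 4),
    0x02: ("vadd", 0),
    0x03: ("vxor", 0),
    0x04: ("vpop_reg", 1),
    0xFF: ("vexit", 0),
}

def disassemble(bytecode: bytes):
    """Linear sweep over VM bytecode using the recovered handler map."""
    pc, listing = 0, []
    while pc < len(bytecode):
        opcode = bytecode[pc]
        if opcode not in HANDLERS:
            listing.append((pc, f"db 0x{opcode:02x}  ; unknown handler"))
            pc += 1
            continue
        mnemonic, size = HANDLERS[opcode]
        operand = bytecode[pc + 1 : pc + 1 + size]
        if len(operand) < size:
            listing.append((pc, f"{mnemonic} <truncated>"))
            break
        if size == 4:
            text = f"{mnemonic} 0x{struct.unpack('<I', operand)[0]:x}"
        elif size == 1:
            text = f"{mnemonic} r{operand[0]}"
        else:
            text = mnemonic
        listing.append((pc, text))
        pc += 1 + size
        if mnemonic == "vexit":
            break
    return listing

program = bytes.fromhex("0110000000" "0120000000" "02" "0400" "ff")
listing = disassemble(program)
for addr, text in listing:
    print(f"{addr:04x}: {text}")
```

Real protectors often encrypt operands or derive the next handler from a rolling key, so a linear sweep like this is only a starting point.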

5.4 Devirtualization Approaches

| Method | Description | Tool |
|---|---|---|
| Manual handler mapping | Reverse each handler, build ISA spec | IDA + scripting |
| Trace recording | Record all handler executions, reconstruct program | REVEN, Pin |
| Symbolic lifting | Symbolically execute handlers, lift to IR | Triton, miasm |
| Pattern matching | Match handler patterns to known VM families | Custom scripts |

5.5 VMProtect Specifics

  • Uses opaque predicates in dispatcher
  • Handler mutation: same opcode, different handler code per build
  • Multiple VM layers (VM inside VM)
  • Integrates anti-debug and integrity checks

6. STRING ENCRYPTION

6.1 Common Patterns

| Pattern | Example | Recovery |
|---|---|---|
| XOR loop | for (i=0; i<len; i++) s[i] ^= key; | Hook or emulate XOR function |
| Stack strings | mov [esp+0], 'H'; mov [esp+1], 'e'; ... | FLOSS, or an IDA/Ghidra script to reassemble |
| RC4 encrypted | Encrypted blob + RC4 key in binary | Extract key, decrypt offline |
| AES encrypted | Encrypted blob + AES key derived at runtime | Hook after decryption |
| Custom encoding | Base64 + XOR + reverse | Trace the decode function, replicate |
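The stack-string row can be illustrated with a small text-based reassembler. This sketch assumes NASM-like disassembly text with esp-relative byte stores; `MOV_IMM`, `parse_number`, and `rebuild_stack_string` are hypothetical names, and the code does not check that offsets are contiguous or that esp is unchanged between the stores.

```python
import re

# Matches e.g. "mov byte [esp+4], 0x6f" or "mov [esp+1], 'e'" (assumed text format)
MOV_IMM = re.compile(
    r"mov\s+(?:byte\s+(?:ptr\s+)?)?\[esp\s*\+\s*(0x[0-9a-f]+|\d+)\]\s*,\s*(0x[0-9a-f]+|\d+|'.')",
    re.IGNORECASE,
)

def parse_number(text: str) -> int:
    return int(text, 16) if text.lower().startswith("0x") else int(text)

def rebuild_stack_string(disasm_lines):
    """Collect byte stores by stack offset and join them up to the first NUL."""
    slots = {}
    for line in disasm_lines:
        match = MOV_IMM.search(line)
        if not match:
            continue
        offset = parse_number(match.group(1))
        raw = match.group(2)
        value = ord(raw[1]) if raw.startswith("'") else parse_number(raw)
        slots[offset] = value & 0xFF
    chars = []
    for offset in sorted(slots):
        if slots[offset] == 0:
            break
        chars.append(chr(slots[offset]))
    return "".join(chars)

lines = [
    "mov byte [esp+2], 0x6c",
    "mov byte [esp+0], 'H'",
    "mov byte [esp+1], 'e'",
    "xor eax, eax",
    "mov byte [esp+3], 0x6c",
    "mov byte [esp+4], 0x6f",
    "mov byte [esp+5], 0",
]
print(rebuild_stack_string(lines))
```

Sorting by offset rather than by instruction order matters because obfuscators often shuffle the stores.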

6.2 Automated String Decryption

# Ghidra script: find XOR decryption calls, emulate them
# "decrypt_string" is a placeholder for whatever name the routine has in your database
decrypt_func = getFunction("decrypt_string")
refs = getReferencesTo(decrypt_func.getEntryPoint())

for ref in refs:
    if not ref.getReferenceType().isCall():
        continue  # skip data references and jumps
    call_addr = ref.getFromAddress()
    # extract arguments (encrypted buffer ptr, key, length)
    # emulate decryption, add comment with plaintext

7. IMPORT HIDING

7.1 GetProcAddress + Hash Lookup

FARPROC resolve(DWORD hash) {
    // Walk PEB → LDR → InMemoryOrderModuleList
    // For each DLL, walk export table
    // Hash each export name, compare with target hash
    // Return matching function pointer
    return NULL;  // skeleton only: the steps above are left unimplemented
}

7.2 Recovery

  • Identify the hash algorithm (common: CRC32, djb2, ROR13+ADD)
  • Compute hashes for all known API names
  • Build hash → API name lookup table
  • Annotate resolved calls in IDA/Ghidra
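These steps can be sketched in plain Python for the ROR13+ADD case. The sketch is simplified and rests on assumptions: it hashes the bare function name only (some shellcode also mixes in the module name or a terminating NUL), and `KNOWN_APIS` is a tiny hypothetical list where a real table would be built from full export listings.

```python
def ror13_hash(name: str) -> int:
    """32-bit rotate-right-13 then add, applied per character."""
    h = 0
    for ch in name.encode("ascii"):
        h = ((h >> 13) | (h << 19)) & 0xFFFFFFFF
        h = (h + ch) & 0xFFFFFFFF
    return h

def build_lookup(api_names, hash_func=ror13_hash):
    """Map hash -> list of names, keeping any collisions visible."""
    table = {}
    for name in api_names:
        table.setdefault(hash_func(name), []).append(name)
    return table

KNOWN_APIS = ["LoadLibraryA", "GetProcAddress", "VirtualAlloc", "CreateFileA"]
LOOKUP = build_lookup(KNOWN_APIS)

def annotate(hash_value: int) -> str:
    """Return a label suitable for a disassembler comment."""
    names = LOOKUP.get(hash_value)
    return "/".join(names) if names else f"unknown_0x{hash_value:08x}"

print(annotate(ror13_hash("VirtualAlloc")))
```

If no constant in the sample matches the table, the usual causes are a different seed, a case-folded name, or an extra module-name component in the hash.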

7.3 Common Hash Algorithms

| Name | Algorithm | Used By |
|---|---|---|
| ROR13 | hash = (hash >> 13 \| hash << 19) + char | Metasploit shellcode |
| djb2 | hash = hash * 33 + char | Various malware |
| CRC32 | Standard CRC32 of function name | Sophisticated packers |
| FNV-1a | hash = (hash ^ char) * 0x01000193 | Modern malware |
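For reference, the non-rotating algorithms in the table can be written as short 32-bit Python functions. The initial values are the textbook ones (5381 for djb2, 0x811C9DC5 for 32-bit FNV-1a); a given sample may use a different seed or fold case first, so treat these as starting points. The `identify` helper is a hypothetical convenience for testing one observed constant against a guessed API name.

```python
import zlib

MASK32 = 0xFFFFFFFF

def djb2(name: bytes) -> int:
    h = 5381                                  # textbook seed
    for ch in name:
        h = (h * 33 + ch) & MASK32
    return h

def fnv1a_32(name: bytes) -> int:
    h = 0x811C9DC5                            # 32-bit FNV offset basis
    for ch in name:
        h = ((h ^ ch) * 0x01000193) & MASK32  # XOR first, then multiply by the FNV prime
    return h

def crc32(name: bytes) -> int:
    return zlib.crc32(name) & MASK32

def identify(observed: int, sample_name: bytes):
    """Return the candidate algorithms that map sample_name to the observed constant."""
    return [f.__name__ for f in (djb2, fnv1a_32, crc32) if f(sample_name) == observed]

print(identify(fnv1a_32(b"VirtualAlloc"), b"VirtualAlloc"))
```

Testing a constant seen next to an obvious call site against a few likely names is often the quickest way to pin down which algorithm is in use.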

8. ANTI-DISASSEMBLY TRICKS

8.1 Techniques

| Trick | Mechanism | Fix |
|---|---|---|
| Overlapping instructions | jmp $+3; db 0xE8 (junk byte that decodes as a call opcode) | Manual re-analysis from correct offset |
| Misaligned jumps | Jump into middle of multi-byte instruction | Force IDA to re-analyze at target |
| Conditional jump pair | jz $+5; jnz $+3 (always jumps, confuses linear disasm) | Convert to unconditional jmp |
| Return address manipulation | push addr; ret instead of jmp addr | Recognize push+ret as jump |
| Exception-based flow | Trigger exception, real code in handler | Analyze exception handler chain |
| Call + add [esp] | call $+5; add [esp], N; ret (computed jump) | Calculate actual target |
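The conditional-jump-pair row lends itself to automation. The following sketch scans raw bytes for adjacent short jz/jnz instructions that land on the same target and rewrites them as one short jmp. It handles only the two-byte encodings 0x74 and 0x75; `find_always_taken_pairs`, `patch_to_jmp`, and the sample bytes are hypothetical, and because it scans bytes without decoding instructions it can report false positives inside data.

```python
def rel8(byte: int) -> int:
    """Sign-extend an 8-bit relative displacement."""
    return byte - 0x100 if byte & 0x80 else byte

COMPLEMENTARY = {(0x74, 0x75), (0x75, 0x74)}   # jz/jnz short forms, either order

def find_always_taken_pairs(code: bytes, base: int = 0):
    """Yield (address, target) for jz/jnz pairs that share one target."""
    hits = []
    for i in range(len(code) - 3):
        if (code[i], code[i + 2]) not in COMPLEMENTARY:
            continue
        first_target = i + 2 + rel8(code[i + 1])
        second_target = i + 4 + rel8(code[i + 3])
        if first_target == second_target:
            hits.append((base + i, base + first_target))
    return hits

def patch_to_jmp(code: bytes, offset: int) -> bytes:
    """Replace the 4-byte pair at offset with jmp short plus two nops."""
    patched = bytearray(code)
    patched[offset:offset + 4] = bytes([0xEB, code[offset + 1], 0x90, 0x90])
    return bytes(patched)

# jz +3; jnz +1; junk 0xE8; mov eax, 1; ret
sample = bytes.fromhex("74037501e8b801000000c3")
hits = find_always_taken_pairs(sample, base=0x401000)
fixed = patch_to_jmp(sample, 0)
print([(hex(a), hex(t)) for a, t in hits], fixed.hex())
```

The jmp reuses the first displacement because both it and the original first branch are measured from the same following address.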

8.2 IDA Fixes

Right-click → Undefine (U)

Right-click → Code (C) at correct offset

Edit → Patch → Assemble (for permanent fix)

9. DECISION TREE

Obfuscated binary — how to approach?
│
├─ Can you run it?
│  ├─ Yes → Dynamic analysis first
│  │  ├─ Set BP on interesting APIs (file, network, crypto)
│  │  ├─ Trace execution to understand real behavior
│  │  └─ Dump decrypted code/strings at runtime
│  │
│  └─ No (embedded/firmware/exotic arch) → Static only
│     └─ Identify obfuscation type from patterns below
│
├─ What does the code look like?
│  │
│  ├─ Giant flat switch/dispatcher loop?
│  │  ├─ State variable drives control flow → CFF
│  │  │  └─ Use D-810 or symbolic deflattening
│  │  └─ Bytecode fetch-decode-execute → VM protection
│  │     └─ Extract handlers, build disassembler
│  │
│  ├─ Only mov instructions?
│  │  └─ movfuscator → demovfuscator tool
│  │
│  ├─ XOR/ADD loop writing to .text section?
│  │  └─ SMC → breakpoint after decode, dump
│  │
│  ├─ Impossible conditions in branches?
│  │  └─ Opaque predicates → Z3 proving or pattern removal
│  │
│  ├─ Disassembly looks wrong / functions overlap?
│  │  └─ Anti-disassembly → manual re-analysis at correct offsets
│  │
│  ├─ No readable strings?
│  │  └─ String encryption → hook decrypt function or emulate
│  │
│  ├─ No imports in IAT?
│  │  └─ Import hiding → identify hash, build lookup table
│  │
│  └─ pushad/pushfd → complex code → popad/popfd?
│     └─ VM protector entry/exit → full VM analysis
│
└─ What tool to use?
   ├─ Known protector (VMProtect/Themida) → specific deprotection guide
   ├─ Custom obfuscation → combine: IDA scripting + Triton + manual
   ├─ CTF challenge → angr symbolic execution often fastest
   └─ Malware analysis → dynamic (debugger + API monitor) first

10. TOOLBOX

| Tool | Purpose | Best For |
|---|---|---|
| IDA Pro + Hex-Rays | Disassembly, decompilation, scripting | All-around analysis |
| Ghidra | Free alternative with scripting (Java/Python) | Budget-friendly RE |
| D-810 (IDA plugin) | Automated CFF deflattening | OLLVM-style obfuscation |
| miasm | IR-based analysis framework | Symbolic deobfuscation |
| Triton | Dynamic symbolic execution | Opaque predicate solving, CFF |
| REVEN | Full-system trace recording and replay | VM protector analysis |
| demovfuscator | movfuscator reversal | mov-only binaries |
| x64dbg + plugins | Dynamic analysis with scripting | Windows RE |
| Unicorn Engine | CPU emulation | SMC unpacking, shellcode |
| Capstone | Disassembly library | Custom tooling |
| IDA FLIRT | Function signature matching | Identify library code in stripped binaries |
| Binary Ninja | Alternative disassembler with MLIL/HLIL | Automated analysis |
