How to decode .wasm files manually

Manual binary inspection of WebAssembly modules bypasses automated decompilers, giving engineers direct visibility into compiled output, hidden allocations, and runtime optimization bottlenecks. For performance engineers, systems programmers, and tooling builders, understanding the raw byte stream is essential for security validation, polyfill debugging, and verifying compiler toolchain correctness. This workflow aligns with foundational WebAssembly Core Concepts & Browser Runtime architecture, where precise binary layout directly dictates instantiation latency and linear memory footprint.

Prerequisites & Binary Inspection Setup

Required CLI & Hex Tools

Prepare a minimal, deterministic inspection environment. Avoid GUI-only tools that mask byte alignment.

# macOS
brew install wabt # xxd already ships with macOS

# Ubuntu/Debian
sudo apt install wabt xxd

# Hex editors (CLI/GUI)
sudo apt install bless # Linux
# Windows: HxD (portable)

Environment Configuration for Raw Byte Analysis

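A UTF-8 locale lets names embedded in the module (imports, exports, custom sections) display correctly, and a checksum confirms the file does not change between inspection passes.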

export LC_ALL=en_US.UTF-8
sha256sum module.wasm > module.wasm.sha256
sha256sum -c module.wasm.sha256

Dump the raw byte stream with strict column formatting:

xxd -g 1 -c 16 module.wasm | head -n 40

Manual decoding requires strict adherence to the Wasm Binary Format Deep Dive specification. Misaligned section boundaries are the most common source of false-positive corruption reports.

Debugging Workflow:

  • Install xxd, wasm-objdump, and a hex editor (e.g., Bless, HxD)
  • Configure terminal encoding to UTF-8 so embedded names display correctly (the hex dump itself is unaffected)
  • Verify file integrity using sha256sum before manual parsing

Step 1: Validating the Magic Number & Version Byte

Identifying 0x00 0x61 0x73 0x6D

The Wasm header (preamble) occupies exactly 8 bytes: a 4-byte magic number followed by a 4-byte version field. Offsets 0x00–0x03 must resolve to the ASCII sequence \0asm.

Offset  00  01  02  03
Hex     00  61  73  6D
ASCII   \0  a   s   m

If these bytes differ, the file is not a valid Wasm module.

Version Field (0x01 0x00 0x00 0x00) Verification

Offsets 0x04–0x07 hold the version as a little-endian 32-bit integer and must read 01 00 00 00, denoting WebAssembly version 1. Browsers throw a CompileError reporting a bad magic number or version mismatch if any of these bytes are altered.

Debugging Workflow:

  • Read first 4 bytes: confirm exact ASCII sequence for \0asm
  • Read bytes 5–8: verify 01 00 00 00 (spec version 1, little-endian)
  • Symptom: Invalid magic → Fix: Check for gzip/brotli compression wrapper (file module.wasm) or corrupted download. Decompress before parsing.
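
The header checks above can be sketched in a few lines of Python. This is a minimal sketch, not a full validator: the function name check_header and the gzip hint are illustrative assumptions, and the only facts relied on are the 4-byte magic and the 4-byte little-endian version field.

```python
import struct

WASM_MAGIC = b"\x00asm"

def check_header(data):
    """Return the module version, or raise ValueError if the preamble is invalid."""
    if len(data) < 8:
        raise ValueError("file shorter than the 8-byte preamble")
    if data[:4] != WASM_MAGIC:
        # A gzip stream starts with 1f 8b, which is worth reporting separately.
        hint = " (looks gzip-compressed)" if data[:2] == b"\x1f\x8b" else ""
        raise ValueError(f"bad magic {data[:4].hex()}{hint}")
    (version,) = struct.unpack_from("<I", data, 4)  # little-endian u32
    if version != 1:
        raise ValueError(f"unsupported version {version}")
    return version

# The smallest valid module is just the preamble with no sections.
print(check_header(bytes.fromhex("0061736d01000000")))
```

In practice you would pass the bytes read from module.wasm in binary mode instead of the inline hex string.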

Step 2: Parsing Section IDs & LEB128 Sizes

Unsigned LEB128 Decoding Algorithm

Wasm uses unsigned LEB128 for sizes, counts, and indices; signed LEB128 is used for i32.const and i64.const immediates. Decode the unsigned form iteratively:

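def decode_uleb128(data, offset):
    """Decode an unsigned LEB128 integer; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]  # raises IndexError on a truncated integer
        offset += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not (byte & 0x80):  # MSB clear: this was the last byte
            break
        shift += 7
    return result, offset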

Each byte’s MSB (0x80) indicates continuation. Mask with 0x7F and accumulate.
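
Constant immediates such as the operand of i32.const use the signed variant of LEB128, which needs one extra sign-extension step after the same mask-and-accumulate loop. The following is a sketch of that variant; the name decode_sleb128 is an assumption chosen to mirror the unsigned decoder.

```python
def decode_sleb128(data, offset=0):
    """Decode a signed LEB128 integer; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            break
    # Bit 6 of the final byte is the sign bit; extend it if set.
    if byte & 0x40:
        result -= 1 << shift
    return result, offset

print(decode_sleb128(bytes([0x2A])))              # (42, 1)
print(decode_sleb128(bytes([0x7F])))              # (-1, 1)
print(decode_sleb128(bytes([0xC0, 0xBB, 0x78])))  # (-123456, 3)
```

Decoding 0x7F with the unsigned routine would yield 127, so picking the wrong variant silently corrupts every negative constant.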

Section ID Mapping (0x00–0x0C)

After the header, the binary is a sequence of (section_id, section_size, payload) tuples.

ID    Section     Purpose
0x00  Custom      Names, source maps, metadata
0x01  Type        Function signatures
0x02  Import      External dependencies
0x03  Function    Type indices for local functions
0x04  Table       Function references
0x05  Memory      Linear memory limits
0x06  Global      Global variables
0x07  Export      Public API
0x08  Start       Entry point function
0x09  Element     Table initialization
0x0A  Code        Function bodies & locals
0x0B  Data        Memory initialization
0x0C  Data Count  Number of data segments (added with bulk memory operations)

Handling Custom Sections & Metadata

Custom sections (0x00) contain a name length, name string, and payload. They are ignored by the VM but critical for debugging. Skip them by decoding the size and advancing the cursor.
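
As a sketch of that layout, the helper below splits a custom section payload into its name and the remaining bytes. The function name read_custom_name is illustrative, and the sample payload is a hand-built "name" section that assigns the module name "mod".

```python
def decode_uleb128(data, offset):
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            break
        shift += 7
    return result, offset

def read_custom_name(payload):
    """Split a custom section payload into (name, remaining bytes)."""
    name_len, pos = decode_uleb128(payload, 0)
    name = payload[pos:pos + name_len].decode("utf-8")
    return name, payload[pos + name_len:]

# name_len=4, "name", then subsection 0 (module name), size 4, len 3, "mod"
payload = bytes([0x04]) + b"name" + bytes([0x00, 0x04, 0x03]) + b"mod"
print(read_custom_name(payload))
```

The payload passed in is everything after the section size, so the section ID and size must already have been consumed.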

Debugging Workflow:

  • Read section ID (1 byte)
  • Decode section size using unsigned LEB128 until MSB=0
  • Advance cursor by decoded size to locate next section
  • Symptom: Cursor overflow → Fix: Validate LEB128 continuation bits and check for truncated payloads.
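
The traversal above can be sketched as a small generator. The names iter_sections and SECTION_NAMES are assumptions for illustration, and the sample module is hand-assembled (one empty function type, one function, one empty body) rather than compiler output.

```python
SECTION_NAMES = {
    0: "custom", 1: "type", 2: "import", 3: "function", 4: "table",
    5: "memory", 6: "global", 7: "export", 8: "start", 9: "element",
    10: "code", 11: "data", 12: "datacount",
}

def decode_uleb128(data, offset):
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            break
        shift += 7
    return result, offset

def iter_sections(data):
    """Yield (section_id, payload_offset, size) after the 8-byte preamble."""
    offset = 8
    while offset < len(data):
        section_id = data[offset]
        size, payload_start = decode_uleb128(data, offset + 1)
        if payload_start + size > len(data):
            raise ValueError(f"section {section_id} at 0x{offset:x} overruns the file")
        yield section_id, payload_start, size
        offset = payload_start + size  # skip the payload without parsing it

module = bytes.fromhex(
    "0061736d01000000"  # preamble
    "010401600000"      # type section: one signature () -> ()
    "03020100"          # function section: function 0 uses type 0
    "0a040102000b"      # code section: one body, no locals, end
)
for section_id, start, size in iter_sections(module):
    print(f"0x{start:04x}  {SECTION_NAMES.get(section_id, '?'):<9} {size} bytes")
```

Printing the absolute payload offset for each section makes it easy to line the output up against the xxd dump.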

Step 3: Decoding Function Bodies & Opcodes

Local Declarations & Type Signatures

Each function body begins with:

  1. body_size (ULEB128)
  2. local_decl_count (ULEB128)
  3. Pairs of (count, valtype) for locals (e.g., 0x7F=i32, 0x7E=i64, 0x7D=f32, 0x7C=f64)
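
A minimal sketch of parsing that function body header follows. The name read_locals is hypothetical, only the four numeric value types listed above are named, and the expanded list of locals assumes a trusted module (a hostile count could be very large).

```python
VALTYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64"}

def decode_uleb128(data, offset):
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            break
        shift += 7
    return result, offset

def read_locals(code, offset):
    """Return (local types, offset of first instruction, offset past the body)."""
    body_size, offset = decode_uleb128(code, offset)
    body_end = offset + body_size
    decl_count, offset = decode_uleb128(code, offset)
    local_types = []
    for _ in range(decl_count):
        count, offset = decode_uleb128(code, offset)
        valtype = VALTYPES.get(code[offset], f"0x{code[offset]:02x}")
        offset += 1
        local_types.extend([valtype] * count)
    return local_types, offset, body_end

# body_size=6, two declarations (2 x i32, 1 x f64), then a lone end opcode
body = bytes([0x06, 0x02, 0x02, 0x7F, 0x01, 0x7C, 0x0B])
print(read_locals(body, 0))
```

Function parameters are not repeated here: they come from the type section and occupy the lowest local indices, ahead of the locals declared in the body.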

Stack Machine Instruction Mapping

Wasm is strictly stack-typed. Map raw bytes to the spec table:

Opcode  Mnemonic   Stack Effect
0x41    i32.const  [ ] → [i32]
0x42    i64.const  [ ] → [i64]
0x20    local.get  [ ] → [val]
0x21    local.set  [val] → [ ]
0x10    call       [args] → [rets]
0x0B    end        [block] → [rets]

Control Flow Block Nesting

Blocks (0x02), loops (0x03), and ifs (0x04) are followed by a block type (0x40=void, 0x7F=i32, etc., or a signed LEB128 type index for multi-value signatures) and push a control frame. Every frame must terminate with 0x0B.
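
Frame nesting can be checked with a simple counter. The sketch below is deliberately narrow: it assumes a body containing only block, loop, if, nop, and end, and it assumes every block type is a single byte, so it is not a general scanner (immediates of other instructions may contain the same byte values). The name nesting is illustrative.

```python
BLOCK_OPENERS = {0x02: "block", 0x03: "loop", 0x04: "if"}

def nesting(code):
    """Return [(offset, mnemonic, depth_after)] for a restricted opcode subset."""
    depth = 1  # the function body itself is the outermost frame
    offset = 0
    out = []
    while offset < len(code):
        op = code[offset]
        if depth == 0:
            raise ValueError(f"bytes after final end at 0x{offset:02x}")
        if op in BLOCK_OPENERS:
            depth += 1
            out.append((offset, BLOCK_OPENERS[op], depth))
            offset += 2  # opcode plus a one-byte block type (assumption)
        elif op == 0x0B:
            depth -= 1
            out.append((offset, "end", depth))
            offset += 1
        elif op == 0x01:
            out.append((offset, "nop", depth))
            offset += 1
        else:
            raise ValueError(f"opcode 0x{op:02x} is outside this sketch")
    if depth != 0:
        raise ValueError(f"{depth} frame(s) left open")
    return out

# block void; loop void; nop; end; end; end (the last end closes the function)
for offset, name, depth in nesting(bytes([0x02, 0x40, 0x03, 0x40, 0x01, 0x0B, 0x0B, 0x0B])):
    print(f"0x{offset:02x}  {'  ' * depth}{name}")
```

A depth that reaches zero before the last byte, or stays above zero at the end, points at a miscounted body_size or a missed immediate earlier in the body.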

Debugging Workflow:

  • Parse local count + type pairs at function start
  • Map opcodes to spec table (e.g., 0x41 i32.const, 0x20 local.get, 0x10 call, 0x0B end)
  • Track stack depth manually: push for constants/locals, pop for arithmetic/calls
  • Symptom: Type mismatch → Fix: Trace stack operations and verify block type signatures.
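
The manual stack bookkeeping can be sketched as a tiny tracer. This covers only a handful of opcodes (i32.add at 0x6A is added for the example; call is left out because its stack effect depends on the callee's signature), and the names trace and OPCODES are assumptions rather than part of any tool.

```python
def decode_leb128(data, offset, signed=False):
    result = shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            break
    if signed and (byte & 0x40):
        result -= 1 << shift  # sign-extend from bit 6 of the last byte
    return result, offset

# opcode -> (mnemonic, immediate kind, values popped, values pushed)
OPCODES = {
    0x41: ("i32.const", "s", 0, 1),
    0x20: ("local.get", "u", 0, 1),
    0x21: ("local.set", "u", 1, 0),
    0x6A: ("i32.add", None, 2, 1),
    0x0B: ("end", None, 0, 0),
}

def trace(code):
    """Return [(offset, text, stack depth after)] for the supported subset."""
    depth, offset, lines = 0, 0, []
    while offset < len(code):
        at = offset
        name, imm_kind, pops, pushes = OPCODES[code[offset]]  # KeyError: unsupported
        offset += 1
        text = name
        if imm_kind:
            imm, offset = decode_leb128(code, offset, signed=(imm_kind == "s"))
            text += f" {imm}"
        if depth < pops:
            raise ValueError(f"stack underflow at 0x{at:02x} ({name})")
        depth += pushes - pops
        lines.append((at, text, depth))
    return lines

# local.get 0; i32.const -1; i32.add; local.set 1; end
for at, text, depth in trace(bytes([0x20, 0x00, 0x41, 0x7F, 0x6A, 0x21, 0x01, 0x0B])):
    print(f"0x{at:02x}  {text:<14} depth={depth}")
```

This tracks only depth, not value types, so it catches underflow and leftover values but not an i64 consumed where an i32 was expected.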

Step 4: Reconstructing Imports, Exports & Memory Layout

Import/Export Descriptor Parsing

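Import entries follow: module_name_len (ULEB128) → module_name → field_name_len (ULEB128) → field_name → kind (1 byte: 0x00 func, 0x01 table, 0x02 memory, 0x03 global), followed by a kind-specific descriptor (for example, a type index for a function). Export entries have no module name: name_len (ULEB128) → name → kind (1 byte) → index (ULEB128).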
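
A sketch of reading the common prefix of one import entry is shown below. The helper names read_name and read_import_prefix are hypothetical, the "env"/"log" strings are sample data, and the kind-specific descriptor that follows the kind byte is left unparsed.

```python
KINDS = {0x00: "func", 0x01: "table", 0x02: "memory", 0x03: "global"}

def decode_uleb128(data, offset):
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            break
        shift += 7
    return result, offset

def read_name(data, offset):
    """Read a length-prefixed UTF-8 name; return (name, next_offset)."""
    length, offset = decode_uleb128(data, offset)
    return data[offset:offset + length].decode("utf-8"), offset + length

def read_import_prefix(data, offset):
    """Return (module, field, kind, offset of the kind-specific descriptor)."""
    module, offset = read_name(data, offset)
    field, offset = read_name(data, offset)
    kind = KINDS[data[offset]]
    return module, field, kind, offset + 1

# "env"."log", kind func, followed by type index 2
entry = bytes([0x03]) + b"env" + bytes([0x03]) + b"log" + bytes([0x00, 0x02])
print(read_import_prefix(entry, 0))
```

The returned offset points at the descriptor, which for a function import is a single ULEB128 type index.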

Memory Limits & Initial Pages

Each memory section entry is a limits record: a flag byte (0x00 = minimum only, 0x01 = minimum and maximum), then min_pages (ULEB128), then max_pages (ULEB128) when the flag is 0x01. 1 page = 64 KiB. The browser must provide min_pages at instantiation, directly impacting heap pressure.
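
The limits encoding and the page arithmetic can be sketched as follows. A limits record starts with a flag byte (0x00 = minimum only, 0x01 = minimum and maximum), then min_pages, then max_pages if present. The name read_limits is illustrative, and the sketch only looks at the "has maximum" bit of the flag.

```python
PAGE_SIZE = 64 * 1024  # 65,536 bytes per page

def decode_uleb128(data, offset):
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            break
        shift += 7
    return result, offset

def read_limits(data, offset):
    """Return (min_pages, max_pages or None, next_offset)."""
    flag = data[offset]
    offset += 1
    min_pages, offset = decode_uleb128(data, offset)
    max_pages = None
    if flag & 0x01:  # bit 0 set: a maximum follows
        max_pages, offset = decode_uleb128(data, offset)
    return min_pages, max_pages, offset

# flag=0x01, min=17 pages, max=256 pages (0x80 0x02 in LEB128)
min_pages, max_pages, _ = read_limits(bytes([0x01, 0x11, 0x80, 0x02]), 0)
print(min_pages * PAGE_SIZE, max_pages * PAGE_SIZE)
```

Here 17 pages come to 1,114,112 bytes (about 1.06 MiB) committed at instantiation, with growth capped at 16 MiB.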

Data Segment Offsets & Initialization

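Data segments define passive or active memory initialization. An active segment in the MVP encoding is laid out as: memory index (ULEB128, 0x00) → offset_expr (a constant expression terminated by 0x0B) → init_size (ULEB128) → init_bytes. With bulk memory operations the leading field acts as a flag: 0x01 marks a passive segment, which has no offset_expr, and 0x02 marks an active segment with an explicit memory index. Active segments write to linear memory at instantiation.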

Debugging Workflow:

  • Extract import kind (func, table, memory, global) and module/name strings
  • Decode memory limits (min/max pages) using LEB128
  • Cross-reference data segment offsets with linear memory initialization
  • Symptom: Out-of-bounds memory access → Fix: Verify data segment size vs declared memory limits.
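
The size-versus-limits check in the last bullet can be sketched as below. It assumes the simplest active segment form (leading 0x00, an offset expression that is a single i32.const, then the byte vector); the names read_active_segment and fits are hypothetical, and offsets computed from imported globals are not handled.

```python
PAGE_SIZE = 64 * 1024

def decode_leb128(data, offset, signed=False):
    result = shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            break
    if signed and (byte & 0x40):
        result -= 1 << shift
    return result, offset

def read_active_segment(data, offset):
    """Return (memory offset, init bytes) for a flag-0x00 active segment."""
    if data[offset] != 0x00:
        raise ValueError("only flag 0x00 (active, memory 0) is handled here")
    if data[offset + 1] != 0x41:
        raise ValueError("offset expression is not a plain i32.const")
    mem_offset, offset = decode_leb128(data, offset + 2, signed=True)
    if data[offset] != 0x0B:
        raise ValueError("offset expression is missing its end opcode")
    size, offset = decode_leb128(data, offset + 1)
    # The i32 constant is used as an unsigned address at instantiation.
    return mem_offset & 0xFFFFFFFF, data[offset:offset + size]

def fits(mem_offset, init_bytes, min_pages):
    """True if the segment lies entirely inside the initial memory."""
    return mem_offset + len(init_bytes) <= min_pages * PAGE_SIZE

# flag 0x00, i32.const 1024, end, 2 bytes: "hi"
segment = bytes([0x00, 0x41, 0x80, 0x08, 0x0B, 0x02]) + b"hi"
mem_offset, init_bytes = read_active_segment(segment, 0)
print(mem_offset, init_bytes, fits(mem_offset, init_bytes, min_pages=1))
```

A segment that fails this check makes instantiation fail, so it is worth running against every active segment before looking for subtler causes.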

Troubleshooting Manual Decoding Failures

Common Hex Alignment Errors

Misaligned cursors usually stem from incorrect LEB128 decoding or skipping custom section payloads. Always log absolute byte offsets during traversal.

Spec Version Mismatches

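Wasm v2 and later proposals (e.g., GC, SIMD, exception handling) introduce new opcodes and section types. If you encounter a 0xFD prefix (SIMD) or a 0xFC prefix (saturating conversions and bulk memory operations), you are reading extended instruction sets: the LEB128 value after the prefix selects the actual instruction.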

Browser DevTools Cross-Validation

Use runtime telemetry to verify your manual parse:

  1. Run wasm2wat module.wasm > module.wat and diff against your decoded structure.
  2. Open Chrome DevTools → Memory tab → Heap Snapshot. Confirm linear memory boundaries match your decoded min_pages.

Debugging Workflow:

  • Symptom: Unknown opcode → Fix: Check Wasm spec version alignment (v1 vs v2 proposals)
  • Symptom: Malformed section → Fix: Re-run hex dump with strict byte offset logging
  • Validate decoded structure against wasm2wat output for parity
  • Use Chrome DevTools Memory tab to confirm linear memory boundaries match decoded limits

Conclusion & Production Implementation Guidelines

When to Automate vs Manual Decode

Automated tooling (wasm2wat from WABT, wasm-opt from Binaryen) covers most build pipelines. Reserve manual decoding for:

  • Auditing proprietary/third-party .wasm for hidden network calls or crypto miners
  • Debugging custom polyfills where toolchain metadata is stripped
  • Validating security boundaries in sandboxed environments
  • Profiling compiler output bloat at the instruction level

Integrating Manual Audits into CI/CD

Embed deterministic binary validation into deployment gates:

# CI step: Verify the 8-byte preamble (magic + version)
HEADER=$(xxd -p -l 8 module.wasm)
[[ "$HEADER" == "0061736d01000000" ]] || exit 1

# CI step: Baseline diff against known-good WAT
wasm2wat module.wasm > current.wat
git diff --no-index --exit-code baseline.wat current.wat || echo "WARNING: Binary structure changed"

Manual decoding remains the definitive method for verifying compiler correctness, optimizing linear memory allocation, and enforcing strict security boundaries in production WebAssembly deployments.