How to decode .wasm files manually
Manual binary inspection of WebAssembly modules bypasses automated decompilers. It gives engineers direct visibility into compiled output, the memory a module declares up front, and instruction-level optimization problems. For performance engineers, systems programmers, and tooling builders, reading the raw byte stream is valuable for security validation, polyfill debugging, and verifying compiler toolchain correctness. This workflow builds on the WebAssembly Core Concepts & Browser Runtime architecture, where the binary's contents (code size, declared memory limits, data segments) directly affect instantiation latency and linear memory footprint.
Prerequisites & Binary Inspection Setup
Required CLI & Hex Tools
Prepare a minimal, deterministic inspection environment. Avoid GUI-only tools that mask byte alignment.
# macOS (xxd ships with the system)
brew install wabt
# Ubuntu/Debian
sudo apt install wabt xxd
# Hex editors (CLI/GUI)
sudo apt install bless # Linux
# Windows: HxD (portable)
Environment Configuration for Raw Byte Analysis
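Pin the terminal locale so hex-dump output renders the same way on every machine, and record a checksum so you can confirm you are still analyzing the same bytes.
export LC_ALL=en_US.UTF-8
sha256sum module.wasm > module.wasm.sha256 # macOS: shasum -a 256
sha256sum -c module.wasm.sha256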
Dump the raw byte stream with strict column formatting:
xxd -g 1 -c 16 module.wasm | head -n 40
Manual decoding requires strict adherence to the binary layout described in Wasm Binary Format Deep Dive. Misaligned section boundaries are a common source of false-positive corruption reports.
Debugging Workflow:
- Install `xxd`, `wasm-objdump` (part of wabt), and a hex editor (e.g., Bless, HxD)
- Set the terminal locale consistently (e.g., UTF-8) so the ASCII column of hex dumps renders predictably
- Verify file integrity using `sha256sum` before manual parsing
Step 1: Validating the Magic Number & Version Field
Identifying 0x00 0x61 0x73 0x6D
The Wasm header occupies exactly 8 bytes: a 4-byte magic number followed by a 4-byte version field. Offsets 0x00–0x03 must contain 00 61 73 6D, the ASCII sequence \0asm.
Offset 00 01 02 03
Hex 00 61 73 6D
ASCII \0 a s m
If these bytes differ, the file is not a valid Wasm module.
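Version Field (01 00 00 00) Verification
Offsets 0x04–0x07 hold the version as a little-endian u32 and must read 01 00 00 00. Version 1 is used by both MVP and WebAssembly 2.0 binaries, so this field does not tell you which features a module uses. If any header byte is altered, WebAssembly.compile() and new WebAssembly.Module() reject the module with a CompileError; the exact message varies by engine (V8, for example, reports the expected versus the found magic or version bytes).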
Debugging Workflow:
- Read the first 4 bytes: confirm the exact sequence 00 61 73 6D (\0asm)
- Read bytes 5–8: verify 01 00 00 00 (version 1, little-endian)
- Symptom: Invalid magic → Fix: Check for a gzip/Brotli compression wrapper (`file module.wasm` identifies gzip) or a corrupted download. Decompress before parsing.
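The Step 1 checks can be sketched in a few lines of Python, assuming the file's first 8 bytes are already in memory; `check_header` is a hypothetical helper name. The gzip test only looks for the 1F 8B magic bytes. Brotli streams have no magic number, so they surface here as a plain "bad magic" error.

```python
WASM_MAGIC = b"\x00asm"               # 00 61 73 6D
WASM_VERSION = b"\x01\x00\x00\x00"    # version 1 as a little-endian u32
GZIP_MAGIC = b"\x1f\x8b"


def check_header(data: bytes) -> None:
    """Raise ValueError unless data starts with a valid 8-byte Wasm header."""
    if len(data) < 8:
        raise ValueError(f"only {len(data)} bytes; a Wasm header needs 8")
    if data[:2] == GZIP_MAGIC:
        raise ValueError("gzip stream detected (1f 8b); decompress before parsing")
    if data[:4] != WASM_MAGIC:
        raise ValueError(f"bad magic: {data[:4].hex(' ')} (expected 00 61 73 6d)")
    if data[4:8] != WASM_VERSION:
        raise ValueError(f"bad version: {data[4:8].hex(' ')} (expected 01 00 00 00)")
```

Call it as `check_header(open("module.wasm", "rb").read(8))`. It returns `None` on success and raises on the first mismatch, so it slots directly into a script or a CI gate.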
Step 2: Parsing Section IDs & LEB128 Sizes
Unsigned LEB128 Decoding Algorithm
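Wasm encodes variable-length integers in LEB128. It uses unsigned LEB128 (ULEB128) for section sizes, counts, and indices, and signed LEB128 (SLEB128) for the immediates of i32.const and i64.const. Decode an unsigned value iteratively: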
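def decode_uleb128(data, offset):
    """Decode an unsigned LEB128 integer at data[offset]; return (value, new_offset)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError(f"truncated LEB128 at offset {offset:#x}")
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not (byte & 0x80):             # MSB clear: this was the last byte
            break
        shift += 7
    return result, offset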
Each byte’s MSB (0x80) indicates continuation. Mask with 0x7F and accumulate.
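The unsigned routine above covers sizes, counts, and indices. Below is a sketch of the standard signed variant for the i32.const and i64.const immediates. It differs only in sign-extending from bit 6 of the final byte. The name `decode_sleb128` is our own.

```python
def decode_sleb128(data: bytes, offset: int) -> tuple[int, int]:
    """Decode a signed LEB128 integer at data[offset]; return (value, new_offset)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError(f"truncated LEB128 at offset {offset:#x}")
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift   # accumulate 7 payload bits
        shift += 7
        if not byte & 0x80:                # MSB clear: this was the last byte
            break
    if byte & 0x40:                        # bit 6 of the last byte is the sign bit
        result -= 1 << shift               # sign-extend to a negative Python int
    return result, offset
```

For example, the single byte 0x7F decodes to -1 with `decode_sleb128`, but to 127 with the unsigned decoder. Using the wrong variant on a constant is a quick way to get silently wrong values.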
Section ID Mapping (0x00–0x0C)
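After the header, the binary is a sequence of (section_id, section_size, payload) tuples. Each non-custom section appears at most once and in a fixed order (DataCount, when present, comes before Code). Custom sections may appear anywhere.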
| ID | Section | Purpose |
|---|---|---|
| 0x00 | Custom | Names, source maps, metadata |
| 0x01 | Type | Function signatures |
| 0x02 | Import | External dependencies |
| 0x03 | Function | Type indices for local functions |
| 0x04 | Table | Table declarations (reference type + limits) |
| 0x05 | Memory | Linear memory limits |
| 0x06 | Global | Global variables |
| 0x07 | Export | Public API |
| 0x08 | Start | Function run automatically at instantiation |
| 0x09 | Element | Table initialization (element segments) |
| 0x0A | Code | Function bodies & locals |
| 0x0B | Data | Memory initialization |
| 0x0C | DataCount | Number of data segments (needed by memory.init / data.drop) |
Handling Custom Sections & Metadata
Custom sections (0x00) contain a name length, name string, and payload. They do not affect execution but are critical for debugging; the "name" section, for example, supplies function names for stack traces. Skip them by decoding the size and advancing the cursor.
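Below is a sketch of splitting a custom section payload (the bytes after the section size) into its name and contents. `read_uleb128` repeats the unsigned decoder so the snippet stands alone. Both helper names are hypothetical.

```python
def read_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Unsigned LEB128 decoder (same algorithm as decode_uleb128)."""
    result = shift = 0
    while True:
        if pos >= len(data):
            raise ValueError(f"truncated LEB128 at {pos:#x}")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def split_custom_section(payload: bytes) -> tuple[str, bytes]:
    """Return (name, contents) for a custom section payload."""
    name_len, pos = read_uleb128(payload, 0)
    if pos + name_len > len(payload):
        raise ValueError("custom section name runs past the payload")
    name = payload[pos:pos + name_len].decode("utf-8")  # names are UTF-8
    return name, payload[pos + name_len:]
```

A payload starting `04 6E 61 6D 65` yields the name "name". The remaining bytes are that section's own subsections, which you can ignore when you only need to skip it.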
Debugging Workflow:
- Read section ID (1 byte)
- Decode section size using unsigned LEB128 until MSB=0
- Advance cursor by decoded size to locate next section
- Symptom: Cursor overflow → Fix: Validate LEB128 continuation bits and check for truncated payloads.
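The workflow above amounts to a loop that reads an ID, decodes a size, and jumps. The sketch below assumes the whole module is in memory and the 8-byte header has already been validated. It logs absolute offsets and rejects any section whose declared size overruns the file. `walk_sections` and `SECTION_NAMES` are illustrative names.

```python
SECTION_NAMES = {
    0x00: "custom", 0x01: "type", 0x02: "import", 0x03: "function",
    0x04: "table", 0x05: "memory", 0x06: "global", 0x07: "export",
    0x08: "start", 0x09: "element", 0x0A: "code", 0x0B: "data",
    0x0C: "datacount",
}


def read_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 value at data[pos]; return (value, new_pos)."""
    result = shift = 0
    while True:
        if pos >= len(data):
            raise ValueError(f"truncated LEB128 at {pos:#x}")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def walk_sections(data: bytes):
    """Yield (offset, section_id, name, size) for each section after the header."""
    pos = 8  # skip the 4-byte magic and the 4-byte version
    while pos < len(data):
        start = pos
        section_id = data[pos]
        size, pos = read_uleb128(data, pos + 1)
        if pos + size > len(data):
            raise ValueError(
                f"section {section_id:#04x} at {start:#x} declares {size} bytes, "
                f"but only {len(data) - pos} remain (truncated?)")
        yield start, section_id, SECTION_NAMES.get(section_id, "unknown"), size
        pos += size  # jump over the payload to the next section ID
```

Iterating `walk_sections(open("module.wasm", "rb").read())` and printing each tuple gives a section map with absolute offsets. You can line that map up directly against the `xxd` dump.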
Step 3: Decoding Function Bodies & Opcodes
Local Declarations & Type Signatures
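Each function body (one entry in the Code section) begins with:
- body_size (ULEB128): byte length of the rest of the entry
- local_decl_count (ULEB128)
- local_decl_count pairs of (count, valtype) for locals (e.g., 0x7F = i32, 0x7E = i64, 0x7D = f32, 0x7C = f64)

The instruction sequence follows and ends with 0x0B (end).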
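Below is a sketch of reading those header fields from one Code-section entry. It assumes `code` holds the Code section payload and `offset` points at an entry's body_size. `parse_code_entry` and `read_uleb128` are illustrative names. The function returns (count, type) pairs rather than expanding them, because a hostile module can declare very large counts.

```python
VALTYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64",
            0x7B: "v128", 0x70: "funcref", 0x6F: "externref"}


def read_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 value at data[pos]; return (value, new_pos)."""
    result = shift = 0
    while True:
        if pos >= len(data):
            raise ValueError(f"truncated LEB128 at {pos:#x}")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_code_entry(code: bytes, offset: int) -> tuple[list[tuple[int, str]], int, int]:
    """Parse one function body header.

    Returns (local_groups, instr_start, body_end). local_groups holds
    (count, type name) pairs, instr_start is where instructions begin, and
    body_end is the offset just past this body.
    """
    body_size, pos = read_uleb128(code, offset)
    body_end = pos + body_size
    if body_end > len(code):
        raise ValueError(f"body at {offset:#x} runs past the section end")
    decl_count, pos = read_uleb128(code, pos)
    groups = []
    for _ in range(decl_count):
        count, pos = read_uleb128(code, pos)
        valtype = code[pos]
        pos += 1
        groups.append((count, VALTYPES.get(valtype, f"unknown {valtype:#04x}")))
    return groups, pos, body_end
```

For the entry `06 02 01 7F 02 7C 0B`, this yields one i32 local and two f64 locals. Instructions start at offset 6, which holds the final 0x0B.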
Stack Machine Instruction Mapping
Wasm is strictly stack-typed. Map raw bytes to the spec table:
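| Opcode | Mnemonic | Immediate | Stack Effect |
|---|---|---|---|
| 0x41 | i32.const | i32 (signed LEB128) | [ ] → [i32] |
| 0x42 | i64.const | i64 (signed LEB128) | [ ] → [i64] |
| 0x20 | local.get | local index (ULEB128) | [ ] → [t] |
| 0x21 | local.set | local index (ULEB128) | [t] → [ ] |
| 0x10 | call | function index (ULEB128) | [params] → [results] |
| 0x0B | end | none | closes the innermost block, loop, if, or the function body |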
Control Flow Block Nesting
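Blocks (0x02), loops (0x03), and ifs (0x04) are followed by a block type. It is either 0x40 (no result), a single value type such as 0x7F (i32), or, with multi-value, a type index encoded as a signed LEB128 (s33). Each one pushes a control frame. An if may contain 0x05 (else), and every frame must be closed by 0x0B.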
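Combining the opcode table with the block rules, here is a small disassembler sketch. It covers only the opcodes listed above plus block/loop/if/else, and raises on anything else. It assumes `body` is the instruction sequence that follows the local declarations. Nesting depth sets the indentation, and an unbalanced body is reported as an error. It does not validate where else appears. All names are hypothetical.

```python
VALTYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64",
            0x7B: "v128", 0x70: "funcref", 0x6F: "externref"}
BLOCK_OPS = {0x02: "block", 0x03: "loop", 0x04: "if"}
INDEX_OPS = {0x20: "local.get", 0x21: "local.set", 0x10: "call"}
CONST_OPS = {0x41: "i32.const", 0x42: "i64.const"}


def read_leb(data: bytes, pos: int, signed: bool) -> tuple[int, int]:
    """Decode one LEB128 value (signed or unsigned) at data[pos]."""
    result = shift = 0
    while True:
        if pos >= len(data):
            raise ValueError(f"truncated LEB128 at {pos:#x}")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    if signed and byte & 0x40:
        result -= 1 << shift
    return result, pos


def disassemble(body: bytes) -> list[str]:
    """Disassemble an instruction sequence into indented mnemonics."""
    out, pos, depth = [], 0, 1  # depth 1 = the implicit function-body frame

    def emit(text: str, level: int) -> None:
        out.append("  " * max(level, 0) + text)

    while pos < len(body):
        op = body[pos]
        pos += 1
        if op in BLOCK_OPS:
            bt = body[pos]
            if bt == 0x40:                      # empty block type
                pos, label = pos + 1, ""
            elif bt in VALTYPES:                # single result value
                pos, label = pos + 1, f" (result {VALTYPES[bt]})"
            else:                               # type index, encoded as s33
                idx, pos = read_leb(body, pos, signed=True)
                label = f" (type {idx})"
            emit(BLOCK_OPS[op] + label, depth - 1)
            depth += 1                          # push a control frame
        elif op == 0x05:                        # else sits at its if's level
            emit("else", depth - 2)
        elif op == 0x0B:
            depth -= 1                          # pop a control frame
            emit("end", depth - 1)
            if depth == 0:                      # closed the function body itself
                if pos != len(body):
                    raise ValueError(f"{len(body) - pos} bytes after final end")
                return out
        elif op in CONST_OPS:
            value, pos = read_leb(body, pos, signed=True)
            emit(f"{CONST_OPS[op]} {value}", depth - 1)
        elif op in INDEX_OPS:
            index, pos = read_leb(body, pos, signed=False)
            emit(f"{INDEX_OPS[op]} {index}", depth - 1)
        else:
            raise ValueError(f"opcode {op:#04x} at {pos - 1:#x} is outside this sketch's table")
    raise ValueError("body ended before its final end: unbalanced blocks")
```

Feeding it `41 7F 21 00 02 40 20 00 10 03 0B 0B` produces `i32.const -1`, `local.set 0`, a `block` containing `local.get 0` and `call 3`, and the two closing `end`s.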
Debugging Workflow:
- Parse local count + type pairs at function start
- Map opcodes to the spec table (e.g., 0x41 i32.const, 0x20 local.get, 0x10 call, 0x0B end)
- Track stack depth manually: push for constants/locals, pop for arithmetic/calls
- Symptom: Type mismatch → Fix: Trace stack operations and verify block type signatures.
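Tracing stack operations can be sketched as a tiny type checker for straight-line code (no blocks). It assumes instructions have already been decoded into (mnemonic, immediate) pairs and that local types and callee signatures are known from the Type, Function, and Code sections. `check_stack` is a hypothetical name. i32.add (0x6A) is included as a representative arithmetic op even though the table above does not list it.

```python
def check_stack(instrs, local_types, func_sigs):
    """Simulate the operand stack for straight-line code; return the final stack.

    instrs:      list of (mnemonic, immediate) pairs
    local_types: value type of each local, indexed by local index
    func_sigs:   {function index: (param types, result types)}
    """
    stack = []

    def pop(expected, where):
        if not stack:
            raise ValueError(f"{where}: stack underflow, expected {expected}")
        got = stack.pop()
        if got != expected:
            raise ValueError(f"{where}: type mismatch, expected {expected}, got {got}")

    for i, (mnemonic, imm) in enumerate(instrs):
        where = f"#{i} {mnemonic}"
        if mnemonic in ("i32.const", "i64.const"):
            stack.append(mnemonic[:3])            # "i32" or "i64"
        elif mnemonic == "local.get":
            stack.append(local_types[imm])
        elif mnemonic == "local.set":
            pop(local_types[imm], where)
        elif mnemonic == "i32.add":               # arithmetic: pop two i32, push one
            pop("i32", where)
            pop("i32", where)
            stack.append("i32")
        elif mnemonic == "call":
            params, results = func_sigs[imm]
            for t in reversed(params):            # the last argument is on top
                pop(t, where)
            stack.extend(results)
        else:
            raise ValueError(f"{where}: not modelled in this sketch")
    return stack
```

The error message names the instruction index, so you can go straight to the byte offset where the stack first disagrees with the declared types.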
Step 4: Reconstructing Imports, Exports & Memory Layout
Import/Export Descriptor Parsing
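Each import entry is laid out as module_name_len (ULEB128) → module_name → field_name_len (ULEB128) → field_name → kind (1 byte: 0x00 func, 0x01 table, 0x02 memory, 0x03 global) → a kind-specific descriptor (e.g., a type index for functions, limits for memories). Export entries carry a single name: name_len (ULEB128) → name → kind (same byte values) → index (ULEB128) into the corresponding index space.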
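Both entry layouts can be read with a pair of small functions. The sketch below stops after the kind byte for imports, because each kind's descriptor has its own format. It reads the index for exports. Function names are illustrative.

```python
EXTERN_KINDS = {0x00: "func", 0x01: "table", 0x02: "memory", 0x03: "global"}


def read_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 value at data[pos]; return (value, new_pos)."""
    result = shift = 0
    while True:
        if pos >= len(data):
            raise ValueError(f"truncated LEB128 at {pos:#x}")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_name(data: bytes, pos: int) -> tuple[str, int]:
    """Read a length-prefixed UTF-8 name."""
    length, pos = read_uleb128(data, pos)
    return data[pos:pos + length].decode("utf-8"), pos + length


def parse_import_head(data: bytes, pos: int) -> tuple[str, str, str, int]:
    """Return (module, field, kind, pos); pos points at the kind-specific descriptor."""
    module, pos = read_name(data, pos)
    field, pos = read_name(data, pos)
    kind = data[pos]
    return module, field, EXTERN_KINDS.get(kind, f"unknown {kind:#04x}"), pos + 1


def parse_export(data: bytes, pos: int) -> tuple[str, str, int, int]:
    """Return (name, kind, index, pos) for one export entry."""
    name, pos = read_name(data, pos)
    kind = data[pos]
    index, pos = read_uleb128(data, pos + 1)
    return name, EXTERN_KINDS.get(kind, f"unknown {kind:#04x}"), index, pos
```

For the import bytes `03 65 6E 76 03 6C 6F 67 00 02`, this reports module "env", field "log", kind func. The returned position points at the type index 0x02.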
Memory Limits & Initial Pages
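The Memory section is a vector of memory types (at most one before the multi-memory feature). Each memory type begins with a limits flag. 0x00 means only min_pages (ULEB128) follows; 0x01 means min_pages and then max_pages (both ULEB128) follow. 1 page = 64 KiB. The engine must provide min_pages × 64 KiB of linear memory at instantiation, so this value directly sets the module's baseline memory footprint.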
Data Segment Offsets & Initialization
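Data segments initialize linear memory. Each segment begins with a ULEB128 flag:
- 0x00 (active, memory 0): offset_expr → init_size (ULEB128) → init_bytes
- 0x01 (passive): init_size (ULEB128) → init_bytes
- 0x02 (active, explicit memory): memory_index (ULEB128) → offset_expr → init_size (ULEB128) → init_bytes

offset_expr is a constant expression terminated by 0x0B, typically 0x41 (i32.const) followed by a signed LEB128 offset. Active segments are written to linear memory at instantiation; passive segments are copied only when memory.init executes.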
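Below is a sketch of parsing one segment under the three flag values. It assumes the offset expression is a plain `i32.const <n> end`; other constant expressions such as global.get are rejected rather than decoded. `parse_data_segment` and `read_leb` are hypothetical names.

```python
def read_leb(data: bytes, pos: int, signed: bool = False) -> tuple[int, int]:
    """Decode one LEB128 value (unsigned by default) at data[pos]."""
    result = shift = 0
    while True:
        if pos >= len(data):
            raise ValueError(f"truncated LEB128 at {pos:#x}")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    if signed and byte & 0x40:
        result -= 1 << shift
    return result, pos


def parse_data_segment(data: bytes, pos: int) -> tuple[dict, int]:
    """Parse one data segment; return (segment info, offset after the segment)."""
    flag, pos = read_leb(data, pos)
    memory, offset = 0, None
    if flag == 2:                                   # active, explicit memory index
        memory, pos = read_leb(data, pos)
    if flag in (0, 2):                              # active: read the offset expression
        if data[pos] != 0x41:                       # assumption: only i32.const offsets
            raise ValueError(f"offset expr opcode {data[pos]:#04x} not handled here")
        offset, pos = read_leb(data, pos + 1, signed=True)
        if data[pos] != 0x0B:
            raise ValueError("offset expression not terminated by end (0x0B)")
        pos += 1
    elif flag != 1:                                 # 1 = passive: no memory or offset
        raise ValueError(f"unknown data segment flag {flag}")
    size, pos = read_leb(data, pos)
    if pos + size > len(data):
        raise ValueError("data segment runs past the end of the section")
    segment = {"mode": "passive" if flag == 1 else "active",
               "memory": memory, "offset": offset, "init": data[pos:pos + size]}
    return segment, pos + size
```

For `00 41 80 08 0B 03 61 62 63`, the result is an active segment that writes "abc" at offset 1024 in memory 0.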
Debugging Workflow:
- Extract import kind (func, table, memory, global) and module/name strings
- Decode memory limits (min/max pages) using LEB128
- Cross-reference data segment offsets with linear memory initialization
- Symptom: Out-of-bounds memory access → Fix: Verify data segment size vs declared memory limits.
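The cross-reference in the last two steps can be sketched as a simple bounds check. It assumes the memory is defined in the module (not imported) and has not grown before segments are applied, so its size is exactly min_pages × 64 KiB. For an imported memory, substitute that memory's current size. `find_out_of_bounds` is a hypothetical name.

```python
PAGE_SIZE = 64 * 1024


def find_out_of_bounds(segments: list[tuple[int, int]], min_pages: int) -> list[str]:
    """Report active segments that do not fit in the initial memory.

    segments: (offset, length) pairs for active segments targeting one memory,
    with offset as decoded from its i32.const expression.
    """
    limit = min_pages * PAGE_SIZE
    problems = []
    for i, (offset, length) in enumerate(segments):
        start = offset & 0xFFFFFFFF  # i32 offsets are used as unsigned addresses
        end = start + length
        if end > limit:
            problems.append(f"segment {i}: bytes [{start:#x}, {end:#x}) exceed "
                            f"{limit:#x} ({min_pages} pages)")
    return problems
```

Note the unsigned reinterpretation: an offset that decodes to -1 addresses 0xFFFFFFFF, not the end of memory. That is a frequent source of confusion when reading SLEB128 values by hand.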
Troubleshooting Manual Decoding Failures
Common Hex Alignment Errors
Misaligned cursors usually stem from incorrect LEB128 decoding or skipping custom section payloads. Always log absolute byte offsets during traversal.
Spec Version Mismatches
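Features standardized after the MVP add opcodes and, in some cases, sections. WebAssembly 2.0 includes SIMD, bulk memory, and reference types, while GC and exception handling arrived later (exception handling also adds a Tag section, ID 0x0D). Many newer instructions start with a prefix byte followed by a ULEB128 sub-opcode: 0xFC (saturating truncation, bulk memory), 0xFD (SIMD), 0xFB (GC), 0xFE (threads/atomics). If you encounter these prefixes, you are reading extended instruction sets.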
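A decoder can recognize prefixed instructions without knowing every sub-opcode. The sketch below reads the prefix and its ULEB128 sub-opcode and names a few well-known bulk-memory instructions. It returns the position just after the sub-opcode, so any immediates that follow (memory.copy and memory.fill carry memory indices, for instance) still need handling. All names are illustrative.

```python
PREFIX_FAMILIES = {
    0xFB: "GC",
    0xFC: "saturating truncation / bulk memory",
    0xFD: "SIMD",
    0xFE: "threads (atomics)",
}
# A few well-known 0xFC sub-opcodes from bulk memory
FC_NAMES = {0x08: "memory.init", 0x09: "data.drop", 0x0A: "memory.copy", 0x0B: "memory.fill"}


def read_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 value at data[pos]; return (value, new_pos)."""
    result = shift = 0
    while True:
        if pos >= len(data):
            raise ValueError(f"truncated LEB128 at {pos:#x}")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_prefixed_opcode(code: bytes, pos: int):
    """If code[pos] is a known prefix, return (family, sub_opcode, name, new_pos); else None."""
    prefix = code[pos]
    if prefix not in PREFIX_FAMILIES:
        return None
    sub, pos = read_uleb128(code, pos + 1)   # the sub-opcode is a ULEB128 u32
    name = FC_NAMES.get(sub) if prefix == 0xFC else None
    return PREFIX_FAMILIES[prefix], sub, name, pos
```

SIMD sub-opcodes above 127 take two bytes (e.g., `FD 80 01` is sub-opcode 128). Reading the sub-opcode as a single byte is a classic cause of "unknown opcode" errors.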
Browser DevTools Cross-Validation
Use runtime telemetry to verify your manual parse:
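- Run `wasm2wat module.wasm > module.wat` and diff it against your decoded structure; `wasm-objdump -x module.wasm` prints per-section details with offsets.
- Open Chrome DevTools → Memory tab → Heap Snapshot. Confirm that linear memory boundaries match your decoded min_pages × 64 KiB (expect a larger size if the module has already called memory.grow).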
Debugging Workflow:
- Symptom: Unknown opcode → Fix: Check which post-MVP features the module uses (e.g., 0xFC/0xFD prefixed opcodes) and whether your opcode table covers them
- Symptom: Malformed section → Fix: Re-run the hex dump with strict byte-offset logging
- Validate the decoded structure against `wasm2wat` output for parity
- Use the Chrome DevTools Memory tab to confirm that linear memory boundaries match the decoded limits
Conclusion & Production Implementation Guidelines
When to Automate vs Manual Decode
Automated tooling (wabt's wasm2wat and wasm-objdump, Binaryen's wasm-opt) covers most day-to-day build-pipeline work. Reserve manual decoding for:
- Auditing proprietary/third-party `.wasm` for hidden network calls (which must appear as imports) or crypto miners
- Debugging custom polyfills where toolchain metadata is stripped
- Validating security boundaries in sandboxed environments
- Profiling compiler output bloat at the instruction level
Integrating Manual Audits into CI/CD
Embed deterministic binary validation into deployment gates:
# CI step (bash): verify the 8-byte header (magic + version 1)
HEADER=$(xxd -p -l 8 module.wasm)
[[ "$HEADER" == "0061736d01000000" ]] || exit 1
# CI step: baseline diff against known-good WAT
wasm2wat module.wasm > current.wat
git diff --no-index --exit-code baseline.wat current.wat || echo "WARNING: Binary structure changed"
Manual decoding remains the most direct way to verify compiler output, audit linear memory allocation, and check security boundaries in production WebAssembly deployments.