How to decode .wasm files manually
This guide answers one task: take a raw .wasm file, dump its bytes with xxd, and parse the 8-byte header plus one section and one LEB128 value entirely by hand — no wasm2wat, no decompiler — so you can audit binaries whose tooling metadata has been stripped.
Prerequisites
- [ ] xxd (ships with vim) or hexdump for the raw dump
- [ ] wabt ≥ 1.0.34 for cross-checking with wasm-objdump -h at the end
- [ ] A small .wasm to decode; assemble one with wat2wasm add.wat -o add.wasm
- [ ] The section and LEB128 reference open for the section-ID table
Step-by-step procedure
- Assemble a known module so you can check your work. A two-parameter adder with one page of memory is small enough to decode in full, yet it has a non-trivial Type section.

  (module
    (memory 1)
    (func (export "add") (param $a i32) (param $b i32) (result i32)
      (i32.add (local.get $a) (local.get $b))))

  wat2wasm add.wat -o add.wasm

- Dump the bytes with fixed columns. One byte per group, sixteen per row, so the offsets in the left gutter map cleanly onto the spec.

  xxd -g 1 -c 16 add.wasm

- Validate the 8-byte header. Offsets 0x00–0x03 must be 00 61 73 6d (\0asm); offsets 0x04–0x07 must be 01 00 00 00 (version 1, little-endian). If either differs, stop: the file is not a valid module (or is compressed).

- Read the first section’s ID and LEB128 length. The byte at offset 0x08 is the section ID; the next byte (or bytes) is the LEB128-encoded payload length. For a small module the length fits in a single byte.

- Decode the LEB128 length by hand. If the byte’s high bit (0x80) is clear, it is the last byte of the value; mask it with 0x7F to get its seven payload bits. A length byte of 07 has its high bit clear, so the payload is exactly 7 bytes and the next section ID sits at offset 0x08 + 2 + 7 = 0x11.

- Parse one value type inside the section. Step into the Type section payload and read the function-type tag 60, the param count, and the value-type bytes: 7f is i32, 7e is i64, 7d is f32, 7c is f64.

- Cross-check against the toolchain. Confirm your hand-decoded offsets and lengths with wasm-objdump -h.

  wasm-objdump -h add.wasm
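The header check and the section walk from the steps above can be sketched in a few lines of Python. This is a minimal illustration, not a validator: the names ADD_WASM, read_uleb128 and walk_sections are made up for this example, and the bytes are copied from the xxd dump shown in this guide rather than read from disk.

```python
import struct

# The 46 bytes of add.wasm as they appear in this guide's xxd dump.
ADD_WASM = bytes.fromhex(
    "00 61 73 6d 01 00 00 00 01 07 01 60 02 7f 7f 01 "
    "7f 03 02 01 00 05 03 01 00 01 07 07 01 03 61 64 "
    "64 00 00 0a 09 01 07 00 20 00 20 01 6a 0b"
)


def read_uleb128(buf, pos):
    """Decode one unsigned LEB128 value; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:  # high bit clear: last byte of the value
            return value, pos
        shift += 7


def walk_sections(buf):
    """Check the 8-byte header, then yield (id, payload_start, size)."""
    if buf[:4] != b"\x00asm":
        raise ValueError("bad magic")
    (version,) = struct.unpack_from("<I", buf, 4)  # fixed-width u32, not LEB128
    if version != 1:
        raise ValueError(f"unsupported version {version}")
    pos = 8
    while pos < len(buf):
        section_id = buf[pos]  # one plain byte
        size, payload_start = read_uleb128(buf, pos + 1)
        yield section_id, payload_start, size
        pos = payload_start + size  # offset of the next section ID


for section_id, start, size in walk_sections(ADD_WASM):
    print(f"id={section_id:2d} start=0x{start:02x} end=0x{start + size:02x} size={size}")
```

The start and end values printed here are payload offsets, which is the convention to compare against the section table from wasm-objdump -h.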
Expected output
The xxd dump of add.wasm begins like this:
00000000: 00 61 73 6d 01 00 00 00 01 07 01 60 02 7f 7f 01
00000010: 7f 03 02 01 00 05 03 01 00 01 07 07 01 03 61 64
00000020: 64 00 00 0a 09 01 07 00 20 00 20 01 6a 0b
Decoding it by hand, byte by byte:
00 61 73 6d    magic "\0asm"
01 00 00 00    version 1 (little-endian u32)
01             section ID 1 -> Type
07             LEB128 length -> 7 payload bytes
01             vector count -> 1 function type
60             func-type tag
02             param count -> 2
7f 7f          params -> i32, i32
01             result count -> 1
7f             result -> i32
03             section ID 3 -> Function (starts at offset 0x11)
02             LEB128 length -> 2 bytes
01 00          1 function, type index 0
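The signature reads (param i32 i32) (result i32), and the Function section binds the one defined function to type index 0. The wasm-objdump -h cross-check should list the Type section with count: 1 and the Function section immediately after it. The tool’s start is the offset of the first payload byte, after the section ID and length, so the Type section whose ID sits at 0x08 is reported as starting at 0x0a, and end is start plus size. If your payload offsets match the tool’s, your LEB128 decode was correct. The first two rows of the section table should look like this:
Type start=0x0000000a end=0x00000011 (size=0x00000007) count: 1
Function start=0x00000013 end=0x00000015 (size=0x00000002) count: 1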
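The Type section decode above can be reproduced with a short parser. The following is a sketch under stated assumptions: parse_type_section and VALTYPES are names invented for this example, the table covers only the four numeric value types this guide lists, and the input is the 7-byte Type payload from the dump.

```python
# Only the four numeric value types named in this guide; other type bytes
# exist in the format and would raise KeyError in this sketch.
VALTYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64"}


def read_uleb128(buf, pos):
    """Decode one unsigned LEB128 value; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


def parse_type_section(payload):
    """Return a list of (params, results) for each function type."""
    count, pos = read_uleb128(payload, 0)
    types = []
    for _ in range(count):
        if payload[pos] != 0x60:
            raise ValueError(f"expected func-type tag 0x60 at payload offset {pos}")
        pos += 1
        n_params, pos = read_uleb128(payload, pos)
        params = [VALTYPES[b] for b in payload[pos:pos + n_params]]
        pos += n_params
        n_results, pos = read_uleb128(payload, pos)
        results = [VALTYPES[b] for b in payload[pos:pos + n_results]]
        pos += n_results
        types.append((params, results))
    return types


# Type section payload from the dump: offsets 0x0a through 0x10.
TYPE_PAYLOAD = bytes.fromhex("01 60 02 7f 7f 01 7f")
print(parse_type_section(TYPE_PAYLOAD))  # [(['i32', 'i32'], ['i32'])]
```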
Gotchas
- A leading 1f 8b is gzip, not Wasm. If the dump starts with 1f 8b instead of 00 61 73 6d, the file was served or saved compressed. Engines reject such a file with a CompileError about the magic number; the exact wording varies by engine (V8, for example, complains that it expected magic word 00 61 73 6d). Run file add.wasm and decompress before decoding.
- The version field is fixed-width; section lengths are not. It is tempting to LEB128-decode the version too, but 01 00 00 00 is a plain little-endian u32. Nothing in the header is LEB128: the section ID at offset 0x08 is a single plain byte, and the first LEB128 value is the section length at offset 0x09.
- Custom sections break a strict-ID walk. A name or .debug_info section carries ID 0 and can appear between numbered sections, so a parser that assumes IDs only increase will mis-step. Branch on ID 0, read its LEB128 length, and skip the payload without interpreting it.
- A mis-read continuation bit cascades. Reading a multi-byte LEB128 length but stopping one byte early lands your cursor inside the payload, and every subsequent offset is wrong. Log the absolute offset after each field and compare it against wasm-objdump -h so that you catch the first divergence.
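The gzip gotcha is easy to guard against in a script. The sketch below assumes you already hold the file contents as bytes; load_wasm_bytes is a hypothetical helper name, and only gzip is handled because that is the only compression this guide discusses.

```python
import gzip

WASM_MAGIC = b"\x00asm"
GZIP_MAGIC = b"\x1f\x8b"


def load_wasm_bytes(raw):
    """Return module bytes, un-gzipping first when the gzip magic is present."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    if raw[:4] != WASM_MAGIC:
        raise ValueError(f"not a Wasm module, first bytes are: {raw[:4].hex(' ')}")
    return raw


header = bytes.fromhex("00 61 73 6d 01 00 00 00")
print(load_wasm_bytes(gzip.compress(header)) == header)  # True
```

Sniffing the first two bytes before parsing keeps the later offsets meaningful, because every offset in this guide is relative to the uncompressed module.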
Reading the Export and Memory sections by hand
Two sections reward a second pass because they describe what JavaScript can actually touch. In the dump above, the Export section is 07 07 01 03 61 64 64 00 00: ID 07, LEB128 length 07, vector count 01 (one export), then the export entry. Each entry is a name (a LEB128 length 03 followed by the three bytes 61 64 64, ASCII add), a one-byte kind (00 = function, 01 = table, 02 = memory, 03 = global), and a LEB128 index into the corresponding index space (00 = function 0). So add is exported as function index 0, which is exactly what JavaScript calls as instance.exports.add.
The Memory section (05 03 01 00 01) reads: ID 05, length 03, count 01, a limits flag 00 meaning min-only, and 01 = one initial page of 64 KiB. Because the Export section lists only add, this memory stays internal to the module; JavaScript would see it as instance.exports.mem.buffer only if the Export section also carried an entry of kind 02 named mem.
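Both payloads are short enough to parse with a few lines of Python. This is an illustrative sketch: parse_export_section, parse_memory_section and EXPORT_KINDS are names chosen for this example, and the memory parser only understands the min-only and min-plus-max limits flags.

```python
EXPORT_KINDS = {0x00: "func", 0x01: "table", 0x02: "memory", 0x03: "global"}


def read_uleb128(buf, pos):
    """Decode one unsigned LEB128 value; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


def parse_export_section(payload):
    """Return a list of (name, kind, index) tuples."""
    count, pos = read_uleb128(payload, 0)
    exports = []
    for _ in range(count):
        name_len, pos = read_uleb128(payload, pos)
        name = payload[pos:pos + name_len].decode("utf-8")
        pos += name_len
        kind = EXPORT_KINDS[payload[pos]]
        index, pos = read_uleb128(payload, pos + 1)
        exports.append((name, kind, index))
    return exports


def parse_memory_section(payload):
    """Return a list of (min_pages, max_pages or None) tuples."""
    count, pos = read_uleb128(payload, 0)
    memories = []
    for _ in range(count):
        flags = payload[pos]
        minimum, pos = read_uleb128(payload, pos + 1)
        maximum = None
        if flags & 0x01:  # bit 0 set: a maximum follows the minimum
            maximum, pos = read_uleb128(payload, pos)
        memories.append((minimum, maximum))
    return memories


# Payloads from the dump, without the section ID and length bytes.
print(parse_export_section(bytes.fromhex("01 03 61 64 64 00 00")))  # [('add', 'func', 0)]
print(parse_memory_section(bytes.fromhex("01 00 01")))              # [(1, None)]
```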
Decoding these two by hand is the fastest audit you can run on an untrusted binary: the Import section (ID 02, parsed the same way as Export but with module and field name strings) tells you every capability the module demands from its host, and the Export section tells you exactly what surface it offers back. A module that imports a fetch-like function you did not expect, or exports far more than its documented API, is worth a closer look before you instantiate it.
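The import audit described above might look like the sketch below. Note the assumptions: add.wasm has no Import section, so SAMPLE_IMPORTS is a payload constructed by hand for illustration (a function env.fetch and a memory env.mem), and parse_import_section, unexpected_imports and the allow-list are hypothetical names, not part of any tool.

```python
IMPORT_KINDS = {0x00: "func", 0x01: "table", 0x02: "memory", 0x03: "global"}


def read_uleb128(buf, pos):
    """Decode one unsigned LEB128 value; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


def read_name(buf, pos):
    """Read a length-prefixed UTF-8 name; return (name, next_pos)."""
    length, pos = read_uleb128(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


def skip_limits(buf, pos):
    """Step over a limits record: flag byte, minimum, optional maximum."""
    flags = buf[pos]
    _, pos = read_uleb128(buf, pos + 1)
    if flags & 0x01:
        _, pos = read_uleb128(buf, pos)
    return pos


def parse_import_section(payload):
    """Return a list of (module, field, kind) tuples."""
    count, pos = read_uleb128(payload, 0)
    imports = []
    for _ in range(count):
        module, pos = read_name(payload, pos)
        field, pos = read_name(payload, pos)
        kind = payload[pos]
        pos += 1
        if kind == 0x00:    # func: type index
            _, pos = read_uleb128(payload, pos)
        elif kind == 0x01:  # table: element type byte, then limits
            pos = skip_limits(payload, pos + 1)
        elif kind == 0x02:  # memory: limits
            pos = skip_limits(payload, pos)
        elif kind == 0x03:  # global: value type byte, mutability byte
            pos += 2
        else:
            raise ValueError(f"unknown import kind {kind:#x}")
        imports.append((module, field, IMPORT_KINDS[kind]))
    return imports


def unexpected_imports(payload, allowed):
    """Return imports whose (module, field) pair is not in the allow-list."""
    return [imp for imp in parse_import_section(payload) if imp[:2] not in allowed]


# Hand-built example payload: env.fetch (func, type 0) and env.mem (memory, min 1).
SAMPLE_IMPORTS = bytes.fromhex(
    "02 03 65 6e 76 05 66 65 74 63 68 00 00 03 65 6e 76 03 6d 65 6d 02 00 01"
)
print(unexpected_imports(SAMPLE_IMPORTS, {("env", "mem")}))  # [('env', 'fetch', 'func')]
```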
When the section has more than 127 payload bytes
The single-byte length in the example is the easy case. A real module’s Code section routinely runs to thousands of bytes, so its length is a multi-byte LEB128. Suppose the bytes after the Code section ID are e5 8e 26: the first byte e5 has its high bit set (continuation), payload 0x65; 8e also continues, payload 0x0e; 26 has its high bit clear, payload 0x26. Combine low-to-high: 0x65 | (0x0e << 7) | (0x26 << 14) = 624485. The Code section is 624,485 bytes, and the next section ID sits that many bytes after the length field. This is the exact mechanism a hand parser must implement; getting the shift wrong by even one group lands the cursor deep inside the wrong section.
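The shift-and-combine arithmetic above can be checked with a small decoder. This is a minimal sketch; decode_uleb128 is a name chosen for this example, and it also reports how many bytes the value occupied so you know where the payload begins.

```python
def decode_uleb128(data):
    """Return (value, bytes_consumed) for the unsigned LEB128 at the start of data."""
    value = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * i)  # each byte contributes 7 bits, low group first
        if not (byte & 0x80):              # high bit clear: this was the last byte
            return value, i + 1
    raise ValueError("truncated LEB128: final byte still has its continuation bit set")


print(decode_uleb128(bytes.fromhex("e5 8e 26")))  # (624485, 3)
print(decode_uleb128(bytes.fromhex("07")))        # (7, 1)
```

Raising on a truncated value, instead of returning a partial result, turns the "stopped one byte early" mistake into an immediate error rather than a silently wrong offset.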
Performance note
Manual decoding is a triage and audit tool, not a runtime path — reserve it for verifying that a third-party .wasm makes no hidden imports, or confirming a build actually stripped its debug sections. For a typical module, wasm-objdump -h parses the full section table in single-digit milliseconds, so once you have learned the layout, automate the structural check in CI and decode by hand only when the tool’s output itself is suspect.
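For the "did the build really strip its debug sections" check, a script only needs the names of the custom sections. The sketch below is one possible shape for such a check, not an existing tool: custom_section_names is a hypothetical helper, ADD_WASM is the dump from this guide, and NAME_SECTION is a hand-built custom section appended purely for illustration.

```python
# The 46 bytes of add.wasm as they appear in this guide's xxd dump.
ADD_WASM = bytes.fromhex(
    "00 61 73 6d 01 00 00 00 01 07 01 60 02 7f 7f 01 "
    "7f 03 02 01 00 05 03 01 00 01 07 07 01 03 61 64 "
    "64 00 00 0a 09 01 07 00 20 00 20 01 6a 0b"
)
# Hypothetical extra bytes: section ID 0, length 5, name length 4, "name",
# and no further payload. Not part of the guide's dump.
NAME_SECTION = bytes.fromhex("00 05 04 6e 61 6d 65")


def read_uleb128(buf, pos):
    """Decode one unsigned LEB128 value; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


def custom_section_names(module):
    """Return the names of all custom (ID 0) sections, in file order."""
    names = []
    pos = 8  # skip the magic and version
    while pos < len(module):
        section_id = module[pos]
        size, payload_start = read_uleb128(module, pos + 1)
        if section_id == 0:
            name_len, name_start = read_uleb128(module, payload_start)
            names.append(module[name_start:name_start + name_len].decode("utf-8"))
        pos = payload_start + size  # skip the payload, interpreted or not
    return names


print(custom_section_names(ADD_WASM))                 # []
print(custom_section_names(ADD_WASM + NAME_SECTION))  # ['name']
```

A CI job could fail the build when this list is non-empty, or when it contains names outside an agreed allow-list.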
Frequently Asked Questions
How many bytes is the header — 5 or 8?
Eight. The four-byte magic (00 61 73 6d) plus the four-byte version (01 00 00 00). Some older notes count only the first five because the version’s three trailing zero bytes are easy to overlook, but the runtime checks all eight.
Why is the section length not just a single fixed byte?
Because section payloads can exceed 127 bytes, the length is LEB128-encoded so it grows to two or more bytes only when needed. Small sections stay one byte; a 300-byte section uses two. That is why you cannot seek to a fixed section offset.
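To see the one-byte versus two-byte boundary concretely, the sketch below encodes a few lengths in the shortest unsigned LEB128 form. encode_uleb128 is a name invented for this example; it always produces the minimal encoding, which is an assumption about the producer, since the format also permits longer, padded encodings of the same value.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as minimal unsigned LEB128 bytes."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F   # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more groups follow: set the continuation bit
        else:
            out.append(byte)
            return bytes(out)


for length in (7, 127, 128, 300):
    print(length, encode_uleb128(length).hex(" "))
# 7 07
# 127 7f
# 128 80 01
# 300 ac 02
```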
How do I know where the next section starts?
Add the current section’s LEB128-decoded length to the offset of its first payload byte. The result is the offset of the following section’s ID. In the example, the Type payload starts at 0x0a and is 7 bytes long, so the Function section’s ID sits at 0x0a + 7 = 0x11. Skipping the length decode, or decoding it wrong, is the most common reason a manual parse desynchronizes.
Related
- Decoding Wasm opcodes for debugging — once you can find a section, decode the instructions inside the Code section.
- WebAssembly text format (WAT) basics — the readable form your decoded bytes correspond to.
- Reducing Wasm bundle size with wasm-opt — what stripping does to the sections you just parsed.