How to decode .wasm files manually

This guide answers one task: take a raw .wasm file, dump its bytes with xxd, and parse the 8-byte header plus one section and one LEB128 value entirely by hand — no wasm2wat, no decompiler — so you can audit binaries whose tooling metadata has been stripped.

Prerequisites

  • [ ] xxd (ships with vim) or hexdump for the raw dump
  • [ ] wabt ≥ 1.0.34 for cross-checking with wasm-objdump -h at the end
  • [ ] A small .wasm to decode — assemble one with wat2wasm add.wat -o add.wasm
  • [ ] The section and LEB128 reference open for the section-ID table

Step-by-step procedure

  1. Assemble a known module so you can check your work. A two-parameter adder is one of the smallest modules with a non-trivial Type section.

    (module
      (memory 1)
      (func (export "add") (param $a i32) (param $b i32) (result i32)
        (i32.add (local.get $a) (local.get $b))))

     Save this as add.wat, then assemble it:

    wat2wasm add.wat -o add.wasm

  2. Dump the bytes with fixed columns. One byte per group, sixteen per row, so the offsets in the left gutter map cleanly onto the spec.

    xxd -g 1 -c 16 add.wasm
  3. Validate the 8-byte header. Offsets 0x00–0x03 must be 00 61 73 6d (\0asm); offsets 0x04–0x07 must be 01 00 00 00 (version 1, little-endian). If either differs, stop: the file is not a valid module (or is compressed).

  4. Read the first section’s ID and LEB128 length. The byte at offset 0x08 is the section ID; the next byte (or bytes) is the LEB128-encoded payload length. For a small module the length fits in a single byte.

  5. Decode the LEB128 length by hand. Mask the byte with 0x7F; if its high bit (0x80) is clear, that is the whole value. A length byte of 07 has its high bit clear, so the payload is exactly 7 bytes — the next section ID sits at offset 0x08 + 2 + 7 = 0x11.

  6. Parse one value type inside the section. Step into the Type section payload and read the function-type tag 60, the param count, and the value-type bytes — 7f is i32, 7e is i64, 7d is f32, 7c is f64.

  7. Cross-check against the toolchain. Confirm your hand-decoded offsets and lengths with wasm-objdump -h.

    wasm-objdump -h add.wasm
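
Steps 3 to 5 and the cross-check in step 7 can be sketched in a few lines of Python. This is a minimal illustration rather than a validating parser: the names ADD_WASM, read_uleb128 and walk_sections are made up for this example, and the bytes are copied from the xxd dump shown under Expected output.

```python
# Bytes copied from this guide's xxd dump of add.wasm (46 bytes total).
ADD_WASM = bytes.fromhex(
    "0061736d01000000"        # magic + version
    "01070160027f7f017f"      # Type section
    "03020100"                # Function section
    "0503010001"              # Memory section
    "070701036164640000"      # Export section
    "0a09010700200020016a0b"  # Code section
)


def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 at pos; return (value, position after it)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift   # keep the low 7 bits
        if (byte & 0x80) == 0:            # high bit clear: last byte
            return value, pos
        shift += 7


def walk_sections(buf):
    """Return (section_id, payload_start, payload_size) for each section."""
    if buf[0:4] != b"\x00asm":
        raise ValueError("bad magic: not a Wasm module")
    if int.from_bytes(buf[4:8], "little") != 1:
        raise ValueError("unexpected version field")
    sections = []
    pos = 8                               # first section ID follows the header
    while pos < len(buf):
        section_id = buf[pos]
        size, payload_start = read_uleb128(buf, pos + 1)
        sections.append((section_id, payload_start, size))
        pos = payload_start + size        # jump over the payload
    return sections


for section_id, start, size in walk_sections(ADD_WASM):
    print(f"id={section_id:2d} start=0x{start:02x} end=0x{start + size:02x} size={size}")
```

For these bytes the walk reports the Type payload at 0x0a to 0x11 and the Function payload at 0x13 to 0x15, the same start and end values wasm-objdump -h prints. To run it on a real file, replace ADD_WASM with the contents of open("add.wasm", "rb").read().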

Expected output

The xxd dump of add.wasm begins like this:

00000000: 00 61 73 6d 01 00 00 00 01 07 01 60 02 7f 7f 01
00000010: 7f 03 02 01 00 05 03 01 00 01 07 07 01 03 61 64
00000020: 64 00 00 0a 09 01 07 00 20 00 20 01 6a 0b

Decoding it by hand, byte by byte:

00 61 73 6d   magic  "\0asm"
01 00 00 00   version 1 (little-endian u32)
01            section ID 1  -> Type
07            LEB128 length -> 7 payload bytes
   01         vector count  -> 1 function type
   60         func-type tag
   02         param count   -> 2
   7f 7f      params        -> i32, i32
   01         result count  -> 1
   7f         result        -> i32
03            section ID 3  -> Function (starts at offset 0x11)
02            LEB128 length -> 2 bytes
   01 00      1 function, type index 0

The signature reads (param i32 i32) (result i32), and the Function section binds the one defined function to type index 0. The wasm-objdump -h cross-check should report the Type section at the same offset with the same count: 1 and the Function section immediately after — if your offsets match the tool’s, your LEB128 decode was correct.

   Type start=0x0000000a end=0x00000011 (size=0x07) count: 1
 Function start=0x00000013 end=0x00000015 (size=0x02) count: 1
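
The Type payload decode above can be reproduced with a short Python sketch. The helper names (VALUE_TYPES, decode_func_types) are invented for illustration, only the four numeric value types listed in step 6 are handled, and the payload is the seven bytes from the dump.

```python
# The four value-type bytes named in step 6 of this guide.
VALUE_TYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64"}

# Type section payload from the dump (the 7 bytes after ID 01 and length 07).
TYPE_PAYLOAD = bytes.fromhex("0160027f7f017f")


def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 at pos; return (value, position after it)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


def read_value_types(buf, pos):
    """Read a counted vector of value types; return (names, next position)."""
    count, pos = read_uleb128(buf, pos)
    names = [VALUE_TYPES[b] for b in buf[pos:pos + count]]
    return names, pos + count


def decode_func_types(payload):
    """Return a list of (params, results) pairs from a Type section payload."""
    count, pos = read_uleb128(payload, 0)
    signatures = []
    for _ in range(count):
        if payload[pos] != 0x60:          # every entry must start with the func tag
            raise ValueError(f"expected 0x60 at payload offset {pos}")
        params, pos = read_value_types(payload, pos + 1)
        results, pos = read_value_types(payload, pos)
        signatures.append((params, results))
    return signatures


print(decode_func_types(TYPE_PAYLOAD))
```

The single entry it returns corresponds to (param i32 i32) (result i32).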

Gotchas

  • A leading 1f 8b is gzip, not Wasm. If the dump starts with 1f 8b instead of 00 61 73 6d, the file was served or saved compressed. The browser rejects such a file with a CompileError about the magic number (the exact wording varies by engine). Run file add.wasm and decompress before decoding.

  • The version field is fixed-width; section lengths are not. It is tempting to LEB128-decode the version too, but 01 00 00 00 is a plain little-endian u32. Only the bytes after the header are length-prefixed, so start your LEB128 decoding at offset 0x08, never before.

  • Custom sections break a strict-ID walk. A name or .debug_info section carries ID 0 and can appear between numbered sections, so a parser that assumes IDs only increase will mis-step. Branch on ID 0, read its LEB128 length, and skip the payload without interpreting it.

  • A mis-read continuation bit cascades. Reading a multi-byte LEB128 length but stopping one byte early lands your cursor inside the payload, and every subsequent offset is wrong. Log the absolute offset after each field and compare against wasm-objdump -h the moment they diverge.
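
The gzip gotcha is easy to guard against before any decoding starts. The sketch below uses Python's standard gzip module; the function name unwrap is hypothetical, and it only handles the gzip case described above, not other compression formats.

```python
import gzip

WASM_MAGIC = b"\x00asm"   # 00 61 73 6d
GZIP_MAGIC = b"\x1f\x8b"  # leading bytes of a gzip stream


def unwrap(raw):
    """Return module bytes, decompressing first if the input is gzip-wrapped."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    if raw[:4] != WASM_MAGIC:
        raise ValueError(f"not a Wasm module, starts with: {raw[:4].hex(' ')}")
    return raw


# Round trip on the 8-byte header from this guide.
header = bytes.fromhex("0061736d01000000")
print(unwrap(gzip.compress(header)) == header)
```

Calling unwrap on the raw file contents first means every later offset refers to the uncompressed module, which is what the spec and wasm-objdump describe.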

Reading the Export and Memory sections by hand

Two sections reward a second pass because they describe what JavaScript can touch. In the dump above, the Export section is 07 07 01 03 61 64 64 00 00: ID 07, LEB128 length 07, vector count 01 (one export), then the export entry. Each entry is a name (a LEB128 length 03 followed by the three bytes 61 64 64, ASCII add), a one-byte kind (00 = function, 01 = table, 02 = memory, 03 = global), and a LEB128 index into the corresponding index space (00 = function 0). So add is exported as function index 0, which is exactly the name JavaScript calls as instance.exports.add.

The Memory section (05 03 01 00 01) reads: ID 05, length 03, count 01, a limits flag 00 meaning min-only, and 01 = one initial page of 64 KiB. JavaScript sees that page as instance.exports.mem.buffer only if the Export section also carries a kind-02 entry named mem; the Export section in this dump has just the one function entry, so the memory here stays internal to the module.
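
Both payloads can be decoded with the same small helpers. The following Python sketch is illustrative only: the function names are invented here, the payloads are the bytes from the dump, and the memory decoder accepts just the two basic limits flags (00 = min only, 01 = min and max).

```python
EXPORT_KINDS = {0x00: "func", 0x01: "table", 0x02: "memory", 0x03: "global"}

EXPORT_PAYLOAD = bytes.fromhex("01036164640000")  # after 07 07 in the dump
MEMORY_PAYLOAD = bytes.fromhex("010001")          # after 05 03 in the dump


def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 at pos; return (value, position after it)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


def decode_exports(payload):
    """Return (name, kind, index) for each export entry."""
    count, pos = read_uleb128(payload, 0)
    exports = []
    for _ in range(count):
        name_len, pos = read_uleb128(payload, pos)
        name = payload[pos:pos + name_len].decode("utf-8")
        pos += name_len
        kind = EXPORT_KINDS[payload[pos]]           # one-byte kind
        index, pos = read_uleb128(payload, pos + 1)  # index into that kind's space
        exports.append((name, kind, index))
    return exports


def decode_memories(payload):
    """Return (min_pages, max_pages or None) for each memory."""
    count, pos = read_uleb128(payload, 0)
    memories = []
    for _ in range(count):
        flag = payload[pos]
        if flag not in (0x00, 0x01):
            raise ValueError(f"limits flag {flag:#04x} not handled in this sketch")
        minimum, pos = read_uleb128(payload, pos + 1)
        maximum = None
        if flag == 0x01:
            maximum, pos = read_uleb128(payload, pos)
        memories.append((minimum, maximum))
    return memories


print(decode_exports(EXPORT_PAYLOAD))
print(decode_memories(MEMORY_PAYLOAD))
```

The export list contains the single entry for add as function 0, and the memory list contains one memory with a minimum of 1 page and no maximum.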

Decoding these two by hand is the fastest audit you can run on an untrusted binary: the Import section (ID 02, parsed the same way as Export but with module and field name strings) tells you every capability the module demands from its host, and the Export section tells you exactly what surface it offers back. A module that imports a fetch-like function you did not expect, or exports far more than its documented API, is worth a closer look before you instantiate it.
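
An import audit along these lines might look like the Python sketch below. Everything named here is an assumption for illustration: the sample payload, the module and field names ("env", "fetch_url", "log"), and the allow-list are hypothetical, and the decoder handles only function imports (kind 00), since table, memory, and global imports carry different descriptors.

```python
def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 at pos; return (value, position after it)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


def read_name(buf, pos):
    """Read a length-prefixed UTF-8 name; return (name, position after it)."""
    length, pos = read_uleb128(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


def decode_func_imports(payload):
    """Return (module, field, type_index) for each function import."""
    count, pos = read_uleb128(payload, 0)
    imports = []
    for _ in range(count):
        module, pos = read_name(payload, pos)
        field, pos = read_name(payload, pos)
        kind = payload[pos]
        if kind != 0x00:
            raise NotImplementedError("only function imports are handled here")
        type_index, pos = read_uleb128(payload, pos + 1)
        imports.append((module, field, type_index))
    return imports


# Hypothetical Import payload: one function import, env.fetch_url, type 0.
SAMPLE_IMPORTS = bytes([1, 3]) + b"env" + bytes([9]) + b"fetch_url" + bytes([0, 0])

# Hypothetical allow-list of imports the documented API is expected to need.
ALLOWED = {("env", "log")}

unexpected = [
    (module, field)
    for module, field, _ in decode_func_imports(SAMPLE_IMPORTS)
    if (module, field) not in ALLOWED
]
print(unexpected)
```

Anything left in unexpected is a capability the module requests that the allow-list did not anticipate.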

When the section has more than 127 payload bytes

The single-byte length in the example is the easy case. A real module’s Code section routinely runs to thousands of bytes, so its length is a multi-byte LEB128. Suppose the bytes after the Code section ID are e5 8e 26: the first byte e5 has its high bit set (continuation), payload 0x65; 8e also continues, payload 0x0e; 26 has its high bit clear, payload 0x26. Combine low-to-high: 0x65 | (0x0e << 7) | (0x26 << 14) = 624485. The Code section is 624,485 bytes, and the next section ID sits that many bytes after the length field. This is the exact mechanism a hand parser must implement; getting the shift wrong by even one group lands the cursor deep inside the wrong section.
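
The same arithmetic, written as a small Python sketch that also records each 7-bit group so the shifts can be checked one by one. The function name decode_uleb128_verbose is made up for this example.

```python
def decode_uleb128_verbose(data):
    """Return (value, bytes consumed, list of 7-bit groups, low group first)."""
    value = 0
    groups = []
    for i, byte in enumerate(data):
        group = byte & 0x7F            # strip the continuation bit
        groups.append(group)
        value |= group << (7 * i)      # group i is shifted left by 7 * i bits
        if (byte & 0x80) == 0:         # high bit clear: this was the last byte
            return value, i + 1, groups
    raise ValueError("ran out of bytes before the final LEB128 group")


value, consumed, groups = decode_uleb128_verbose(bytes.fromhex("e58e26"))
print([hex(g) for g in groups], value, consumed)
```

For e5 8e 26 the groups are 0x65, 0x0e and 0x26, the value is 624485, and three bytes are consumed, so the payload starts three bytes after the section ID.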

Performance note

Manual decoding is a triage and audit tool, not a runtime path — reserve it for verifying that a third-party .wasm makes no hidden imports, or confirming a build actually stripped its debug sections. For a typical module, wasm-objdump -h parses the full section table in single-digit milliseconds, so once you have learned the layout, automate the structural check in CI and decode by hand only when the tool’s output itself is suspect.

Frequently Asked Questions

How many bytes is the header — 5 or 8? Eight. The four-byte magic (00 61 73 6d) plus the four-byte version (01 00 00 00). Some older notes count only the first five because the version’s three trailing zero bytes are easy to overlook, but the runtime checks all eight.

Why is the section length not just a single fixed byte? Because section payloads can exceed 127 bytes, the length is LEB128-encoded so it grows to two or more bytes only when needed. Small sections stay one byte; a 300-byte section uses two. That is why you cannot seek to a fixed section offset.
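
To see the length field grow, here is a minimal unsigned LEB128 encoder in Python. It is an illustrative sketch (the name encode_uleb128 is invented here) and is the inverse of the decoding procedure used throughout this guide.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128 bytes."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = value & 0x7F        # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more groups follow: set the high bit
        else:
            out.append(byte)         # last group: high bit stays clear
            return bytes(out)


for size in (7, 127, 128, 300, 624485):
    print(size, encode_uleb128(size).hex(" "))
```

Sizes up to 127 encode as one byte, 128 and 300 take two (80 01 and ac 02), and 624485 takes the three bytes e5 8e 26.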

How do I know where the next section starts? Add the current section’s LEB128-decoded length to the offset of its first payload byte. The byte at the resulting offset is the following section’s ID; in the example, 0x0a + 7 = 0x11. Skipping the length decode, or decoding it wrong, is the most common reason a manual parse desynchronizes.
