Implementing a Bump Allocator in Wasm

This guide builds the smallest practical allocator for WebAssembly: a bump allocator that hands out memory by incrementing a single next pointer, rounds each request up for alignment, grows linear memory when it runs out, and “frees” everything at once by resetting the pointer.

A bump allocator is the right tool when allocations share a lifetime — a request, an animation frame, a parse pass — and you can throw them all away together. It cannot free individual objects, but in exchange it is a few instructions per allocation with zero per-block metadata, versus dlmalloc’s free-list bookkeeping.

Prerequisites

  • [ ] wabt installed (wat2wasm, wasm-objdump) — or Rust 1.78+ with wasm32-unknown-unknown
  • [ ] A runtime to instantiate the module (node 20+ or a browser)
  • [ ] Familiarity with the linear memory model: page size, memory.grow, and the stack/heap layout
  • [ ] Understanding that a Wasm pointer is just a byte offset into instance.exports.memory.buffer

How a bump allocator works

The entire allocator is one mutable global — the bump pointer — that marks the boundary between used and free memory. To allocate n bytes you round the pointer up to the required alignment, remember that aligned value as the result, add n, and check you have not run past the end of linear memory. If you have, grow it. To free everything, set the pointer back to the start of the arena. There is no per-allocation free; that is the deliberate trade.

The reason this works at all is that WebAssembly’s linear memory is a flat byte array with no required structure — the allocator is free to impose whatever bookkeeping it likes, and a bump allocator chooses the simplest possible scheme: a single high-water mark. Compare that to a free-list allocator like dlmalloc, which writes a size-and-status header before every block, threads freed blocks onto size-class lists, and coalesces neighbours on free. All of that machinery exists to support reclaiming individual allocations out of order. A bump allocator deletes the entire problem by declaring that allocations are freed in exactly one way — all at once — so it needs no headers, no lists, and no coalescing logic. The allocator state is literally one integer. This is the classic arena or region allocation strategy that game engines, compilers, and request handlers have used for decades, now expressed in the leanest possible Wasm form.
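
To make the "one integer" claim concrete, here is a minimal JavaScript model of the idea. It is a sketch, not the Wasm implementation built in the steps that follow: the class name BumpArena and its fields are hypothetical, the arena is a fixed-size ArrayBuffer that never grows, and align is assumed to be a power of two.

```javascript
// Hypothetical model: a bump allocator over a fixed-size ArrayBuffer.
// The only mutable state is `next`; there are no headers or free lists.
class BumpArena {
  constructor(capacity, base = 0) {
    this.buffer = new ArrayBuffer(capacity);
    this.base = base; // where the arena starts
    this.next = base; // the bump pointer (high-water mark)
  }

  // Returns the byte offset of the block, or -1 if the arena is full.
  // `align` is assumed to be a power of two.
  alloc(size, align = 1) {
    const ptr = (this.next + align - 1) & ~(align - 1);
    const end = ptr + size;
    if (end > this.buffer.byteLength) return -1;
    this.next = end;
    return ptr;
  }

  // Frees every allocation at once by rewinding the pointer.
  reset() {
    this.next = this.base;
  }
}

const arena = new BumpArena(64);
const a = arena.alloc(10);     // 0
const b = arena.alloc(4, 4);   // 12 (10 rounded up to a multiple of 4)
const full = arena.alloc(100); // -1, the request does not fit
arena.reset();
const c = arena.alloc(10);     // 0 again, the arena is reused
```

Returning -1 on exhaustion is a choice made for this model only; the Wasm version in this guide grows memory and traps instead.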

Step-by-step procedure

  1. Reserve an arena and a bump pointer. Pick a base offset above the data section and the shadow stack so you never collide with the toolchain’s own layout. A mutable global holds the current next offset.

    (module
      (memory (export "memory") 2 256)        ;; 128 KiB initial, 16 MiB cap
      (global $next (mut i32) (i32.const 1024))  ;; arena starts at byte 1024
      (global $base i32 (i32.const 1024))        ;; remember the start for reset
      ;; the functions from steps 2-4 go here, inside (module ...)
    )
  2. Round up for alignment. Allocations of i32/f32 want 4-byte alignment; f64 and 64-bit loads want 8. The standard branchless trick adds align - 1 then masks off the low bits: (p + (a-1)) & ~(a-1). Alignment must be a power of two for the mask to be correct.

    ;; align_up(p, a) = (p + a - 1) & ~(a - 1)
    (func $align_up (param $p i32) (param $a i32) (result i32)
      (i32.and
        (i32.add (local.get $p) (i32.sub (local.get $a) (i32.const 1)))
        (i32.xor (i32.sub (local.get $a) (i32.const 1)) (i32.const -1))))
  3. Implement alloc. Align the current pointer, compute the new end, grow memory if the end exceeds memory.size * 65536, then store the new next and return the aligned start.

    (func (export "alloc") (param $size i32) (param $align i32) (result i32)
      (local $ptr i32) (local $end i32) (local $need i32)
      ;; ptr = align_up(next, align)
      (local.set $ptr (call $align_up (global.get $next) (local.get $align)))
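      (local.set $end (i32.add (local.get $ptr) (local.get $size)))
      ;; trap if ptr + size wrapped past 2^32 (an oversized request)
      (if (i32.lt_u (local.get $end) (local.get $ptr))
        (then (unreachable)))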
      ;; grow if end > current capacity in bytes
      (if (i32.gt_u (local.get $end)
                    (i32.mul (memory.size) (i32.const 65536)))
        (then
          ;; pages needed = ceil((end - capacity) / 65536)
          (local.set $need
            (i32.div_u
              (i32.add
                (i32.sub (local.get $end) (i32.mul (memory.size) (i32.const 65536)))
                (i32.const 65535))
              (i32.const 65536)))
          ;; memory.grow returns -1 on failure
          (if (i32.eq (memory.grow (local.get $need)) (i32.const -1))
            (then (unreachable)))))      ;; OOM: trap
      (global.set $next (local.get $end))
      (local.get $ptr))
  4. Implement reset to free the whole arena at once. This is the only free a bump allocator offers.

    (func (export "reset")
      (global.set $next (global.get $base)))
  5. Drive it from JavaScript. Allocate, build a view after the alloc (it may have grown memory), use the block, then reset when the batch is done.

    const { instance } = await WebAssembly.instantiateStreaming(fetch("/bump.wasm"));
    const ex = instance.exports;
    
    const ptr = ex.alloc(64, 4);                          // 64 bytes, 4-byte aligned
    // Re-read .buffer AFTER alloc — a grow may have replaced it.
    const view = new Uint8Array(ex.memory.buffer, ptr, 64);
    view.set(new TextEncoder().encode("scratch"));
    // ... use it ...
    ex.reset();                                           // frees every allocation at once
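
The rounding trick from step 2 can be checked in isolation. The sketch below mirrors $align_up in JavaScript; the function name alignUp and the sample values are illustrative assumptions, and the trailing >>> 0 only reinterprets the result as an unsigned 32-bit value, the way a Wasm offset is read.

```javascript
// alignUp(p, a) = (p + a - 1) & ~(a - 1), valid only when a is a power of two.
// `>>> 0` reinterprets the result as unsigned, matching Wasm i32 offsets.
function alignUp(p, a) {
  return ((p + a - 1) & ~(a - 1)) >>> 0;
}

const cases = [
  [1024, 8], // already aligned: stays 1024
  [1025, 4], // rounds up to 1028
  [1034, 8], // rounds up to 1040
  [1034, 1], // align 1 never moves the pointer
].map(([p, a]) => alignUp(p, a));

console.log(cases); // [ 1024, 1028, 1040, 1034 ]
```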

A subtle point in step 3: the order of operations matters for correctness under growth. We compute end before deciding whether to grow, and we grow by exactly the number of whole pages needed to cover end. The page count uses the ceiling-division idiom (shortfall + 65535) / 65536, where shortfall = end - capacity in bytes, so a request that overshoots a page boundary by a single byte still rounds up to a full page rather than truncating to zero. After a successful grow we commit next = end and return the aligned ptr computed at the top; there is no need to re-read next, because nothing else touched it. If memory.grow returns -1 we hit unreachable, which raises a trap and aborts the call; next has not been written yet, so the arena stays consistent. That is the bump allocator’s honest answer to out-of-memory. You could instead return a sentinel like 0 and let the caller branch, which is friendlier to JavaScript but pushes the null check onto every call site.
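
The page arithmetic is easy to get off by one, so it is worth checking at the boundaries. The sketch below restates the calculation from alloc in JavaScript; pagesNeeded is a hypothetical helper name, and the sample values assume the 2-page (131072-byte) initial memory from step 1.

```javascript
const PAGE = 65536; // Wasm page size in bytes

// Pages to request from memory.grow so that `end` fits; 0 if it already fits.
function pagesNeeded(end, currentPages) {
  const capacity = currentPages * PAGE;
  if (end <= capacity) return 0;
  return Math.floor((end - capacity + PAGE - 1) / PAGE);
}

const results = [
  pagesNeeded(131072, 2), // exactly at capacity: 0
  pagesNeeded(131073, 2), // one byte over: 1
  pagesNeeded(196608, 2), // exactly one page over: 1
  pagesNeeded(196609, 2), // one page plus one byte: 2
];
```

Without the + PAGE - 1 term, the second case would divide 1 by 65536 and request zero pages, leaving the new block hanging past the end of memory.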

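The same allocator in Rust is a static mut cursor with identical logic, useful if you want the rest of your module in Rust but still want bump semantics for a hot path. The hard-coded 1024 only mirrors the WAT version; in a real Rust build that address may fall inside the toolchain's own layout (see the shadow stack gotcha below), so choose the base accordingly: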

static mut NEXT: usize = 1024;
const BASE: usize = 1024;

#[no_mangle]
pub extern "C" fn alloc(size: usize, align: usize) -> *mut u8 {
    unsafe {
        let ptr = (NEXT + align - 1) & !(align - 1);
        let end = ptr + size;
        let cap = core::arch::wasm32::memory_size::<0>() * 65536;
        if end > cap {
            let need = (end - cap + 65535) / 65536;
            if core::arch::wasm32::memory_grow::<0>(need) == usize::MAX {
                core::arch::wasm32::unreachable();
            }
        }
        NEXT = end;
        ptr as *mut u8
    }
}

#[no_mangle]
pub extern "C" fn reset() {
    unsafe { NEXT = BASE; }
}

Expected output

Compile and confirm the exports are present:

wat2wasm bump.wat -o bump.wasm
wasm-objdump -x bump.wasm | grep -E "func\[.*\] <(alloc|reset)>"
# - func[1] <alloc> -> "alloc"
# - func[2] <reset> -> "reset"

A sequence of allocations advances next monotonically; reset rewinds it. Logging the returned pointers makes the bump visible:

console.log(ex.alloc(10, 1)); // 1024
console.log(ex.alloc(10, 8)); // 1040  (1034 rounded up to 8-byte alignment)
console.log(ex.alloc(10, 1)); // 1050
ex.reset();
console.log(ex.alloc(10, 1)); // 1024  again — arena reused

Gotchas

  • No individual free. Calling alloc repeatedly without reset only ever grows the arena. If you alloc per frame but never reset, memory.buffer.byteLength climbs until growth fails — that is a leak with this design, not a bug in the allocator. Reset at a clear lifetime boundary.
  • Alignment must be a power of two. The (a - 1) mask is only correct for powers of two. Passing align = 3 silently misaligns; passing align = 0 underflows to 0xFFFFFFFF and the mask zeroes the pointer. Validate alignment, or hard-code it.
  • Growing memory mid-bump invalidates JS views. A memory.grow inside alloc can replace the backing ArrayBuffer, detaching any Uint8Array you already built. Always construct views after the alloc that produced the pointer — see why memory.grow invalidates pointers.
  • Colliding with the shadow stack. Starting the arena at too low an offset overwrites the linker’s data section or shadow stack. Base it high enough, or in Rust let the toolchain pick a static address rather than a hard-coded 1024.
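
The alignment gotcha can be reproduced and guarded against on the JavaScript side before the value ever reaches alloc. This is a sketch under assumptions: isPowerOfTwo and checkedAlignUp are hypothetical helper names, and alignUp repeats the same mask arithmetic as $align_up in unsigned 32-bit terms.

```javascript
// True only for 1, 2, 4, 8, ...: a positive integer with exactly one bit set.
function isPowerOfTwo(a) {
  return Number.isInteger(a) && a > 0 && (a & (a - 1)) === 0;
}

// The same mask arithmetic as the Wasm $align_up, in unsigned 32-bit terms.
function alignUp(p, a) {
  return ((p + a - 1) & ~(a - 1)) >>> 0;
}

// Hypothetical checked wrapper: reject bad alignments instead of misaligning.
function checkedAlignUp(p, a) {
  if (!isPowerOfTwo(a)) {
    throw new RangeError(`alignment ${a} is not a power of two`);
  }
  return alignUp(p, a);
}

const bad3 = alignUp(1025, 3); // 1025, which is not a multiple of 3
const bad0 = alignUp(1034, 0); // 0, because the mask clears every bit
const good = checkedAlignUp(1034, 8); // 1040
```

The same check could live inside the Wasm alloc instead, at the cost of a few extra instructions per call.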

When a bump allocator is the right choice

Reach for this design when your allocations share a lifetime you can name: everything allocated while parsing one document, decoding one frame, or handling one request, all discarded together. Parsers are the canonical fit — build the entire AST in an arena, walk it, then reset and move to the next input. Per-frame rendering scratch (vertex buffers, intermediate masks) is another, because the frame boundary is a natural reset point. The anti-pattern is any workload with overlapping, independently-ending lifetimes — a long-lived cache alongside short-lived temporaries — because the long-lived object pins the bump pointer and prevents reset from reclaiming the temporaries. In that case, split your memory: a dlmalloc heap for the cache and a separate bump arena for the temporaries, which is exactly the lifetime segregation the linear memory management overview recommends for fighting fragmentation.
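
One way to make the lifetime boundary explicit in calling code is a small scope helper that always resets when the batch ends. The sketch below is illustrative only: withArena and makeFakeExports are hypothetical names, and the fake exports object stands in for instance.exports so the example runs without a compiled module (it tracks offsets only, with no memory and no growth).

```javascript
// Hypothetical helper: run `fn` against the arena, then reset even if it throws.
function withArena(ex, fn) {
  try {
    return fn(ex);
  } finally {
    ex.reset();
  }
}

// Stand-in for instance.exports so the sketch runs without a .wasm file.
// It mimics alloc/reset with a plain counter (no growth, no backing memory).
function makeFakeExports(base = 1024) {
  let next = base;
  return {
    alloc(size, align) {
      const ptr = (next + align - 1) & ~(align - 1);
      next = ptr + size;
      return ptr;
    },
    reset() {
      next = base;
    },
  };
}

const ex = makeFakeExports();
const firstFrame = withArena(ex, (e) => [e.alloc(100, 8), e.alloc(50, 4)]);
const secondFrame = withArena(ex, (e) => [e.alloc(100, 8), e.alloc(50, 4)]);
// Both frames see the same offsets because each one starts from a reset arena.
```

Pointers returned from inside the callback must not be used after it returns, since the next batch will overwrite the same bytes.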

Performance note

Allocation here is a handful of instructions (an add, a mask, a compare, a store) with no free list to walk and no metadata header per block. In a tight loop that calls alloc millions of times, a bump allocator can beat dlmalloc by roughly an order of magnitude on allocation latency, and it adds roughly 0.1 KiB of code versus dlmalloc’s 6–10 KiB. Because there is no header before each block, the per-block memory overhead is zero apart from alignment padding: you store exactly the bytes you asked for, where dlmalloc rounds up to a size class and prepends metadata. The cost is entirely in the freeing model: you trade per-object reclamation for arena-at-once reset.

Frequently Asked Questions

Can I free a single allocation from a bump allocator? No — that is the defining limitation. A bump allocator only supports freeing the entire arena via reset. If you need per-object free, use dlmalloc (the Rust/Emscripten default) or a more complex design; the tradeoffs are compared in the linear memory management overview.

What alignment should I pass? Match the largest type you store in the block: 4 for i32/f32, 8 for f64 and 64-bit integers, 16 for SIMD v128. When in doubt, 8 is a safe default that satisfies every scalar type.

Why does my view come back empty after a few allocations? An alloc call grew linear memory and replaced the ArrayBuffer, detaching the view you built earlier. A detached view reports length 0 and every index reads as undefined. Re-create the view from instance.exports.memory.buffer after every alloc.
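
The detachment can be observed without any compiled module. The sketch below uses a standalone WebAssembly.Memory as a stand-in for instance.exports.memory and calls grow directly, which is what alloc does internally when the arena runs out; the variable names are illustrative.

```javascript
// Standalone WebAssembly.Memory, standing in for instance.exports.memory.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });

const stale = new Uint8Array(memory.buffer, 0, 16);
stale[0] = 42;
const before = stale.length; // 16

memory.grow(1); // what alloc does when the arena is full

const after = stale.length; // 0: the old ArrayBuffer was detached
const fresh = new Uint8Array(memory.buffer, 0, 16);
const preserved = fresh[0]; // 42: the bytes survive, only the old view died
```

The data itself is never lost; only the JavaScript handle to it goes stale, which is why rebuilding the view from memory.buffer is enough.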
