Implementing a Bump Allocator in Wasm
This guide builds the smallest practical allocator for WebAssembly: a bump allocator that hands out
memory by incrementing a single next pointer, rounds each request up for alignment, grows linear memory
when it runs out, and “frees” everything at once by resetting the pointer.
A bump allocator is the right tool when allocations share a lifetime (a request, an animation frame, a
parse pass) and you can throw them all away together. It cannot free individual objects, but in exchange
each allocation costs only a few instructions and carries no per-block metadata, unlike dlmalloc's
free-list bookkeeping.
Prerequisites
- [ ] wabt installed (wat2wasm, wasm-objdump), or Rust 1.78+ with the wasm32-unknown-unknown target
- [ ] A runtime to instantiate the module (node 20+ or a browser)
- [ ] Familiarity with the linear memory model: page size, memory.grow, and the stack/heap layout
- [ ] Understanding that a Wasm pointer is just a byte offset into instance.exports.memory.buffer
How a bump allocator works
The entire allocator is one mutable global — the bump pointer — that marks the boundary between used
and free memory. To allocate n bytes you round the pointer up to the required alignment, remember that
aligned value as the result, add n, and check you have not run past the end of linear memory. If you
have, grow it. To free everything, set the pointer back to the start of the arena. There is no
per-allocation free; that is the deliberate trade.
The reason this works at all is that WebAssembly’s linear memory is a flat byte array with no required
structure — the allocator is free to impose whatever bookkeeping it likes, and a bump allocator chooses
the simplest possible scheme: a single high-water mark. Compare that to a free-list allocator like
dlmalloc, which writes a size-and-status header before every block, threads freed blocks onto size-class
lists, and coalesces neighbours on free. All of that machinery exists to support reclaiming individual
allocations out of order. A bump allocator deletes the entire problem by declaring that allocations are
freed in exactly one way — all at once — so it needs no headers, no lists, and no coalescing logic. The
allocator state is literally one integer. This is the classic arena or region allocation strategy that
game engines, compilers, and request handlers have used for decades, now expressed in the leanest possible
Wasm form.
Step-by-step procedure
1. Reserve an arena and a bump pointer. Pick a base offset above the data section and the shadow stack so you never collide with the toolchain's own layout. A mutable global holds the current next offset.
(module
  (memory (export "memory") 2 256)           ;; 128 KiB initial, 16 MiB cap
  (global $next (mut i32) (i32.const 1024))  ;; arena starts at byte 1024
  (global $base i32 (i32.const 1024))        ;; remember the start for reset
  ;; the functions from steps 2 to 4 go here, inside the module
)
2. Round up for alignment. Allocations of i32/f32 want 4-byte alignment, and f64 and 64-bit loads want 8. Wasm does not trap on misaligned plain loads and stores, but they can be slower. The standard branchless trick adds align - 1, then masks off the low bits: (p + (a - 1)) & ~(a - 1). Alignment must be a power of two for the mask to be correct.
;; align_up(p, a) = (p + a - 1) & ~(a - 1)
;; i32 has no bitwise NOT, so ~x is computed as x XOR -1
(func $align_up (param $p i32) (param $a i32) (result i32)
  (i32.and
    (i32.add (local.get $p) (i32.sub (local.get $a) (i32.const 1)))
    (i32.xor (i32.sub (local.get $a) (i32.const 1)) (i32.const -1))))
3. Implement alloc. Align the current pointer, compute the new end, grow memory if the end exceeds memory.size * 65536, then store the new next and return the aligned start.
(func (export "alloc") (param $size i32) (param $align i32) (result i32)
  (local $ptr i32) (local $end i32) (local $need i32)
  ;; ptr = align_up(next, align)
  (local.set $ptr (call $align_up (global.get $next) (local.get $align)))
  (local.set $end (i32.add (local.get $ptr) (local.get $size)))
  ;; trap if ptr + size wrapped past 2^32 instead of returning a bogus block
  (if (i32.lt_u (local.get $end) (local.get $ptr))
    (then (unreachable)))
  ;; grow if end > current capacity in bytes
  (if (i32.gt_u (local.get $end) (i32.mul (memory.size) (i32.const 65536)))
    (then
      ;; pages needed = ceil((end - capacity) / 65536)
      (local.set $need
        (i32.div_u
          (i32.add
            (i32.sub (local.get $end) (i32.mul (memory.size) (i32.const 65536)))
            (i32.const 65535))
          (i32.const 65536)))
      ;; memory.grow returns -1 on failure; OOM: trap
      (if (i32.eq (memory.grow (local.get $need)) (i32.const -1))
        (then (unreachable)))))
  (global.set $next (local.get $end))
  (local.get $ptr))
4. Implement reset to free the whole arena at once. This is the only free a bump allocator offers.
(func (export "reset")
  (global.set $next (global.get $base)))
5. Drive it from JavaScript. Allocate, build a view after the alloc (it may have grown memory), use the block, then reset when the batch is done.
const { instance } = await WebAssembly.instantiateStreaming(fetch("/bump.wasm"));
const ex = instance.exports;

const ptr = ex.alloc(64, 4); // 64 bytes, 4-byte aligned
// Re-read .buffer AFTER alloc: a grow may have replaced it.
const view = new Uint8Array(ex.memory.buffer, ptr, 64);
view.set(new TextEncoder().encode("scratch"));
// ... use it ...
ex.reset(); // frees every allocation at once
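The growth arithmetic can be checked off-target by mirroring the module on the host. The following is a hypothetical Rust model, not part of the module. A Vec<u8> stands in for linear memory, and PAGE is the 64 KiB Wasm page. The illustrative grow method imitates memory.grow by returning the old page count, or None past the cap. A panic plays the role of unreachable. The initial 2 pages, 256-page cap, and base of 1024 in the test match step 1.

```rust
const PAGE: usize = 65_536; // Wasm page size in bytes

/// Host-side stand-in for the Wasm module; `memory` plays linear memory.
struct BumpModule {
    memory: Vec<u8>,
    max_pages: usize,
    base: usize,
    next: usize,
}

impl BumpModule {
    /// Mirrors `(memory 2 256)` plus the `$next` and `$base` globals.
    fn new(initial_pages: usize, max_pages: usize, base: usize) -> Self {
        BumpModule {
            memory: vec![0; initial_pages * PAGE],
            max_pages,
            base,
            next: base,
        }
    }

    /// Like memory.size: current size in pages.
    fn pages(&self) -> usize {
        self.memory.len() / PAGE
    }

    /// Like memory.grow: old page count, or None if the cap would be exceeded.
    fn grow(&mut self, delta: usize) -> Option<usize> {
        let old = self.pages();
        if old + delta > self.max_pages {
            return None;
        }
        self.memory.resize((old + delta) * PAGE, 0);
        Some(old)
    }

    fn alloc(&mut self, size: usize, align: usize) -> usize {
        let ptr = (self.next + align - 1) & !(align - 1); // align_up
        let end = ptr.checked_add(size).expect("size overflow");
        let capacity = self.memory.len();
        if end > capacity {
            // ceil((end - capacity) / PAGE) whole pages
            let need = (end - capacity + PAGE - 1) / PAGE;
            if self.grow(need).is_none() {
                panic!("out of memory"); // the Wasm module traps via unreachable
            }
        }
        self.next = end;
        ptr
    }

    fn reset(&mut self) {
        self.next = self.base;
    }
}
```

Note that reset rewinds next but leaves the grown pages in place. Wasm linear memory can grow but never shrink, so the arena keeps its high-water capacity.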
A subtle point in step 3: the order of operations matters for correctness under growth. We compute end
before deciding whether to grow, and we grow by exactly the number of whole pages needed to cover end,
using the ceiling-division idiom (needed + 65535) / 65536 so a request that overshoots a page boundary
by a single byte still rounds up to a full page rather than truncating. After a successful grow we commit
next = end and return the aligned ptr we computed at the top — never re-reading next, because nothing
else touched it. If memory.grow returns -1 we hit unreachable, which raises a trap and aborts the
call cleanly; that is the bump allocator’s honest answer to out-of-memory. You could instead return a
sentinel like 0 and let the caller branch, which is friendlier to JavaScript but pushes the null-check
onto every call site.
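The page arithmetic and the sentinel alternative can be checked in isolation. In this hypothetical Rust sketch, pages_needed is the ceiling division from step 3, and try_alloc is one possible shape for the return-0 variant. The sketch assumes the arena base is above 0, so 0 can never be a real block. It uses u64 so that the host arithmetic cannot overflow.

```rust
const PAGE: u64 = 65_536;

/// ceil((end - capacity) / PAGE): whole pages needed so that `end` fits.
fn pages_needed(end: u64, capacity: u64) -> u64 {
    if end <= capacity {
        0
    } else {
        (end - capacity + PAGE - 1) / PAGE
    }
}

/// Sentinel-returning variant: 0 means "out of memory" instead of a trap.
/// `pages` models memory.size; `max_pages` models the module's cap.
fn try_alloc(next: &mut u64, pages: &mut u64, max_pages: u64, size: u64, align: u64) -> u64 {
    let ptr = (*next + align - 1) & !(align - 1);
    let end = ptr + size;
    let need = pages_needed(end, *pages * PAGE);
    if *pages + need > max_pages {
        return 0; // every caller must null-check this
    }
    *pages += need;
    *next = end;
    ptr
}
```

A failed try_alloc leaves next and the page count untouched, so the caller can reset and retry.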
The same allocator in Rust is a static mut cursor with the identical logic, useful if you want the rest
of your module in Rust but still want bump semantics for a hot path:
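static mut NEXT: usize = 1024;
const BASE: usize = 1024;

#[no_mangle]
pub extern "C" fn alloc(size: usize, align: usize) -> *mut u8 {
    unsafe {
        // align_up(NEXT, align); align must be a power of two
        let ptr = (NEXT + align - 1) & !(align - 1);
        // trap instead of silently wrapping if ptr + size overflows usize
        let end = match ptr.checked_add(size) {
            Some(end) => end,
            None => core::arch::wasm32::unreachable(),
        };
        let cap = core::arch::wasm32::memory_size(0) * 65536; // capacity in bytes
        if end > cap {
            // ceil((end - cap) / 65536) whole pages
            let need = (end - cap + 65535) / 65536;
            // memory_grow returns usize::MAX on failure
            if core::arch::wasm32::memory_grow(0, need) == usize::MAX {
                core::arch::wasm32::unreachable(); // OOM: trap
            }
        }
        NEXT = end;
        ptr as *mut u8
    }
}

#[no_mangle]
pub extern "C" fn reset() {
    unsafe { NEXT = BASE; }
}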
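The shadow-stack gotcha below suggests letting the toolchain place the arena in Rust rather than hard-coding 1024. Here is a minimal sketch of that idea, under the following assumptions:

- The arena is a fixed-size static buffer whose address the linker picks.
- It returns null instead of calling memory.grow.
- Use is single-threaded.
- StaticArena, ARENA_SIZE, arena_alloc, and arena_reset are hypothetical names.

It compiles on any target, not only wasm32.

```rust
use core::cell::UnsafeCell;
use core::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical fixed capacity; this sketch never grows memory.
const ARENA_SIZE: usize = 64 * 1024;

/// Arena storage whose address the linker chooses, instead of a
/// hard-coded offset such as 1024.
struct StaticArena {
    buf: UnsafeCell<[u8; ARENA_SIZE]>,
    next: AtomicUsize, // interior-mutable offset of the first free byte
}

// SAFETY: assumes single-threaded use (as in a wasm32 module without
// threads), so unsynchronized access to `buf` cannot race.
unsafe impl Sync for StaticArena {}

static ARENA: StaticArena = StaticArena {
    buf: UnsafeCell::new([0; ARENA_SIZE]),
    next: AtomicUsize::new(0),
};

/// Bump-allocate `size` bytes aligned to `align` (a power of two).
/// Returns null instead of growing when the fixed buffer is exhausted.
fn arena_alloc(size: usize, align: usize) -> *mut u8 {
    let base = ARENA.buf.get() as *mut u8;
    let next = ARENA.next.load(Ordering::Relaxed);
    // Align the absolute address, because the linker picks `base`.
    let addr = base as usize + next;
    let start = ((addr + align - 1) & !(align - 1)) - base as usize;
    match start.checked_add(size) {
        Some(end) if end <= ARENA_SIZE => {
            ARENA.next.store(end, Ordering::Relaxed);
            // SAFETY: start + size <= ARENA_SIZE, so this stays in bounds.
            unsafe { base.add(start) }
        }
        _ => core::ptr::null_mut(),
    }
}

/// Free every allocation at once by rewinding the offset.
fn arena_reset() {
    ARENA.next.store(0, Ordering::Relaxed);
}
```

The trade-off versus the listing above is a fixed capacity. The buffer is sized at compile time, and nothing grows past ARENA_SIZE.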
Expected output
Compile and confirm the exports are present:
wat2wasm bump.wat -o bump.wasm
wasm-objdump -x bump.wasm | grep -E "func\[.*\] <(alloc|reset)>"
# - func[1] <alloc> -> "alloc"
# - func[2] <reset> -> "reset"
A sequence of allocations advances next monotonically; reset rewinds it. Logging the returned pointers
makes the bump visible:
console.log(ex.alloc(10, 1)); // 1024
console.log(ex.alloc(10, 8)); // 1040 (1034 rounded up to 8-byte alignment)
console.log(ex.alloc(10, 1)); // 1050
ex.reset();
console.log(ex.alloc(10, 1)); // 1024 again — arena reused
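The logged sequence follows from the align-then-advance arithmetic alone. This hypothetical Rust reproduction models only the bump pointer, with no memory and no growth. It returns the same four pointers as the JavaScript calls above.

```rust
/// Minimal model of `$next`: align up, return the start, advance by `size`.
fn bump(next: &mut u32, size: u32, align: u32) -> u32 {
    let ptr = (*next + align - 1) & !(align - 1);
    *next = ptr + size;
    ptr
}

/// Replays the logged calls and returns the pointers they produce.
fn trace() -> [u32; 4] {
    const BASE: u32 = 1024;
    let mut next = BASE;
    let a = bump(&mut next, 10, 1); // 1024, next = 1034
    let b = bump(&mut next, 10, 8); // 1034 rounds up to 1040, next = 1050
    let c = bump(&mut next, 10, 1); // 1050, next = 1060
    next = BASE;                    // reset
    let d = bump(&mut next, 10, 1); // 1024 again
    [a, b, c, d]
}
```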
Gotchas
- No individual free. Calling alloc repeatedly without reset only ever grows the arena. If you alloc per frame but never reset, memory.buffer.byteLength climbs until growth fails. With this design that is a leak, not a bug in the allocator. Reset at a clear lifetime boundary.
- Alignment must be a power of two. The (a - 1) mask is only correct for powers of two. Passing align = 3 silently misaligns; passing align = 0 underflows a - 1 to 0xFFFFFFFF, and the mask zeroes the pointer. Validate alignment, or hard-code it.
- Growing memory mid-bump invalidates JS views. A memory.grow inside alloc can replace the backing ArrayBuffer, detaching any Uint8Array you already built. Always construct views after the alloc that produced the pointer; see why memory.grow invalidates pointers.
- Colliding with the shadow stack. Starting the arena at too low an offset overwrites the linker's data section or shadow stack. Base it high enough, or in Rust let the toolchain pick a static address rather than a hard-coded 1024.
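One way to act on the alignment gotcha is to validate before masking. This hypothetical Rust sketch contrasts two helpers:

- A checked helper returns None for non-power-of-two alignments and for overflow. Returning None rather than trapping is an assumption.
- The unchecked formula from step 2 is written in wrapping 32-bit arithmetic to mirror i32.

```rust
/// Align `p` up to `a`, rejecting non-power-of-two alignments and overflow.
fn checked_align_up(p: u32, a: u32) -> Option<u32> {
    if !a.is_power_of_two() {
        return None; // rejects 0, 3, 6, ...
    }
    let mask = a - 1;
    p.checked_add(mask).map(|q| q & !mask)
}

/// The unchecked step-2 formula, wrapping like Wasm i32 arithmetic.
fn raw_align_up(p: u32, a: u32) -> u32 {
    p.wrapping_add(a.wrapping_sub(1)) & !a.wrapping_sub(1)
}
```

With the raw formula, align = 3 turns 1034 into 1036, which is not a multiple of 3. align = 0 collapses any pointer to 0, exactly as the gotcha describes.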
When a bump allocator is the right choice
Reach for this design when your allocations share a lifetime you can name: everything allocated while
parsing one document, decoding one frame, or handling one request, all discarded together. Parsers are the
canonical fit — build the entire AST in an arena, walk it, then reset and move to the next input.
Per-frame rendering scratch (vertex buffers, intermediate masks) is another, because the frame boundary is
a natural reset point. The anti-pattern is any workload with overlapping, independently-ending lifetimes —
a long-lived cache alongside short-lived temporaries — because the long-lived object pins the bump pointer
and prevents reset from reclaiming the temporaries. In that case, split your memory: a dlmalloc heap
for the cache and a separate bump arena for the temporaries, which is exactly the lifetime segregation the
linear memory management overview recommends for fighting fragmentation.
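The per-input pattern is to allocate freely while handling one input, then reset at the boundary. This hypothetical Rust sketch tracks only offsets. It compares the arena's high-water mark with and without that reset. The 24-byte node size and the Arena type are illustrative assumptions.

```rust
/// Offset-only arena: tracks `next` and the highest offset ever reached.
struct Arena {
    next: usize,
    high_water: usize,
}

impl Arena {
    fn new() -> Self {
        Arena { next: 0, high_water: 0 }
    }

    fn alloc(&mut self, size: usize, align: usize) -> usize {
        let ptr = (self.next + align - 1) & !(align - 1);
        self.next = ptr + size;
        self.high_water = self.high_water.max(self.next);
        ptr
    }

    fn reset(&mut self) {
        self.next = 0;
    }
}

/// Treats each input as one lifetime: allocate its nodes, then optionally
/// reset at the input boundary. Returns the arena's high-water mark.
fn process_inputs(inputs: &[usize], reset_between: bool) -> usize {
    let mut arena = Arena::new();
    for &nodes in inputs {
        for _ in 0..nodes {
            arena.alloc(24, 8); // e.g. one 24-byte AST node per element
        }
        if reset_between {
            arena.reset();
        }
    }
    arena.high_water
}
```

With a reset per input, the footprint is bounded by the largest single input. Without it, the footprint is the sum of all inputs.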
Performance note
Allocation here is a handful of instructions — an add, a mask, a compare, a store — with no free list to
walk and no metadata header per block. In a tight loop that calls alloc millions of times, a bump
allocator can beat dlmalloc on allocation latency by as much as an order of magnitude, and it adds roughly
0.1 KiB of code versus dlmalloc's 6–10 KiB. Because there is no header before each block, the only
per-block memory overhead is alignment padding: you consume the bytes you asked for plus whatever padding
each block's alignment requires, whereas dlmalloc rounds up to a size class and prepends metadata. The
cost is entirely in the freeing model: you trade per-object reclamation
for arena-at-once reset.
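To see that alignment padding is the only per-block cost, this hypothetical Rust helper totals the requested bytes against the bytes a bump pointer actually consumes for a list of (size, align) requests.

```rust
/// Returns (requested bytes, bytes consumed from the arena) for a
/// sequence of (size, align) requests starting at `base`.
fn footprint(base: usize, requests: &[(usize, usize)]) -> (usize, usize) {
    let mut next = base;
    let mut requested = 0;
    for &(size, align) in requests {
        next = (next + align - 1) & !(align - 1); // alignment padding, if any
        next += size;                             // the block itself
        requested += size;
    }
    (requested, next - base)
}
```

For the three calls in Expected output, 30 bytes are requested and 36 are consumed. The extra 6 bytes are the padding that moves 1034 up to 1040. Requests that are already aligned consume exactly what they ask for.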
Frequently Asked Questions
Can I free a single allocation from a bump allocator?
No — that is the defining limitation. A bump allocator only supports freeing the entire arena via reset.
If you need per-object free, use dlmalloc (the Rust/Emscripten default) or a more complex design; the
tradeoffs are compared in the linear memory management overview.
What alignment should I pass?
Match the largest type you store in the block: 4 for i32/f32, 8 for f64 and 64-bit integers, 16
for SIMD v128. When in doubt, 8 is a safe default that satisfies every scalar type.
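In Rust, the size and alignment can be derived from the type instead of chosen by hand. A struct's alignment is the largest alignment among its fields, which is the rule above. This sketch uses core::alloc::Layout. Sample is a hypothetical type, and the expected values assume a target where f64 is 8-byte aligned, as on wasm32 and common 64-bit hosts.

```rust
use core::alloc::Layout;

/// Hypothetical record mixing a 1-byte and an 8-byte field.
#[repr(C)]
struct Sample {
    tag: u8,
    value: f64,
}

/// (size, align) to pass to alloc for one value of `T`.
fn args_for<T>() -> (usize, usize) {
    let layout = Layout::new::<T>();
    (layout.size(), layout.align())
}

/// (size, align) for a contiguous array of `n` values of `T`,
/// or None if the total size would overflow.
fn args_for_array<T>(n: usize) -> Option<(usize, usize)> {
    Layout::array::<T>(n).ok().map(|l| (l.size(), l.align()))
}
```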
Why does my view come back empty after a few allocations?
An alloc call grew linear memory and replaced the ArrayBuffer, detaching the view you built earlier. A
view over a detached buffer reports a length of 0, and reading its elements yields undefined. Re-create
the view from instance.exports.memory.buffer after every alloc.
Related
- Linear memory management & allocators — dlmalloc, wee_alloc, and the full allocator tradeoff table.
- Why memory.grow invalidates pointers — the detached-view bug bump allocators trigger when they grow.
- Zero-copy data transfer patterns — using arena memory as a shared scratch buffer.