Avoiding Copies When Passing Image Buffers
This guide answers one task: feed a canvas’s pixels into a WebAssembly module, process them in place (here, grayscale), and read them back to the canvas without copying the buffer in or out on every frame.
Prerequisites
- [ ] A .wasm module exporting memory, an alloc/free pair, and an in-place pixel function (e.g. grayscale(ptr, len)).
- [ ] A browser with <canvas> 2D context; for the worker path, OffscreenCanvas (Chrome 69+, Firefox 105+).
- [ ] A loaded image drawn to the canvas so getImageData has pixels to return.
- [ ] performance.now() for confirming the copy elimination.
How in-place pixel processing works
A canvas 2D context hands you pixels via ctx.getImageData(x, y, w, h), whose .data field is a
Uint8ClampedArray of straight (non-premultiplied) RGBA bytes — four bytes per pixel, row-major. To
process those pixels in WebAssembly you place them in the module’s linear memory, run the function over
that region, and read the result through a view aliasing the same region. Clamping has a narrow scope: it
applies only to values JavaScript writes through the Uint8ClampedArray view. Bytes the module stores
directly are not clamped, so the module must saturate its own results to [0, 255] or they will wrap.
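The aliasing and clamping behavior can be sketched without a module. In this minimal example a bare WebAssembly.Memory stands in for the module's exported memory, the offset 16 is a hypothetical pointer rather than one returned by alloc, and a plain Uint8Array write imitates a byte store made by the module.

```javascript
// One 64 KiB page stands in for a module's exported linear memory (assumption).
const memory = new WebAssembly.Memory({ initial: 1 });
const ptr = 16; // hypothetical pointer; a real one would come from alloc()

// Two views over the same four bytes; constructing them copies nothing.
const clamped = new Uint8ClampedArray(memory.buffer, ptr, 4);
const raw = new Uint8Array(memory.buffer, ptr, 4);

clamped[0] = 300; // JS write through the clamped view saturates to 255
clamped[1] = -20; // ...and to 0 at the low end
raw[2] = 300;     // a plain byte store keeps the low 8 bits: 300 % 256 = 44

// Both views observe the same bytes because they alias one region.
const seenThroughClamped = Array.from(clamped);
const seenThroughRaw = Array.from(raw);
console.log(seenThroughClamped, seenThroughRaw);
```

Both arrays come out identical, which is the point: the view type changes how JavaScript writes are converted, not what is stored in linear memory.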
The key to zero per-frame overhead is allocating the frame-sized region once and reusing the pointer across frames, rather than re-allocating every tick. Re-allocating each frame is the bug that quietly reintroduces the work you were trying to remove.
It helps to count the copies in the naive version so you know what you are removing. A typical
unoptimized filter does four moves of the pixel data per frame: getImageData produces a fresh
Uint8ClampedArray (one copy out of the canvas’s internal buffer), you set it into linear memory
(a second copy), the module writes results to a separate output region (a third), and you read that back
into an ImageData and putImageData it (a fourth, plus the canvas’s own internal copy). Processing in
place collapses the module’s two regions into one, wrapping the processed region directly in an ImageData
removes the separate readback, and reusing the allocation removes the per-frame allocator work. What
remains is the unavoidable minimum: the getImageData readout and the set into linear memory on the way
in, plus the canvas’s own blit on putImageData. For an 8 MB frame at roughly 10 GB/s of memcpy bandwidth,
going from four data moves to two is the difference between ~3.3 ms and ~1.66 ms of pure copy time per
frame.
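The arithmetic behind those figures can be checked in a few lines. This is a back-of-the-envelope sketch that assumes the 8 MB frame is 1920x1080 RGBA and that memcpy bandwidth is about 10 GB/s; both numbers are illustrative and real throughput varies by machine.

```javascript
// Assumed figures: a 1920x1080 RGBA frame and ~10 GB/s of memcpy bandwidth.
const FRAME_BYTES = 1920 * 1080 * 4; // 8,294,400 bytes
const BYTES_PER_MS = 10e9 / 1000;    // 10 GB/s expressed per millisecond

// Time spent purely moving pixel data for a given number of full-frame copies.
function copyMs(moves, frameBytes = FRAME_BYTES) {
  return (moves * frameBytes) / BYTES_PER_MS;
}

const naiveMs = copyMs(4);   // four moves per frame
const inPlaceMs = copyMs(2); // two moves per frame
console.log(naiveMs.toFixed(2), inPlaceMs.toFixed(2)); // prints "3.32 1.66"
```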
Step-by-step procedure
1. Instantiate and capture the exports.
    const { instance } = await WebAssembly.instantiateStreaming(fetch("/img.wasm"));
    const { memory, alloc, free, grayscale } = instance.exports;
2. Read the frame from the canvas. .data is a Uint8ClampedArray of RGBA bytes.
    const { width, height } = canvas;
    const frame = ctx.getImageData(0, 0, width, height);
    const len = frame.data.length; // width * height * 4
3. Allocate the region once, outside any render loop. Hoist this so it runs a single time.
    const ptr = alloc(len); // reuse this pointer every frame
4. Alias the region with a clamped view and fill it. The view aliases linear memory; set fills it.
    const view = new Uint8ClampedArray(memory.buffer, ptr, len);
    view.set(frame.data); // copy source pixels in once per frame
5. Run the in-place transform. Only the pointer and length cross the boundary.
    grayscale(ptr, len); // module mutates the bytes directly
6. Write the result back to the canvas. The region already holds the processed pixels, so there is no copy out. putImageData requires an ImageData, so wrap a view of the bytes; build that view fresh from memory.buffer in case a grow occurred since step 4.
    const out = new Uint8ClampedArray(memory.buffer, ptr, len); // re-alias, detach-safe
    ctx.putImageData(new ImageData(out, width, height), 0, 0); // wraps the view; the blit copies before returning
7. Free only when you are done with the buffer for good, not every frame. For a streaming loop, free on teardown.
    // on stop():
    free(ptr, len);
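To try the allocate, alias, transform, and read sequence without a browser or a compiled module, the sketch below substitutes a stand-in for the exports. Everything in createStandInModule is an assumption made for illustration: the bump allocator, the JavaScript grayscale routine, and its integer luminance weights are not the real module's implementation, and the 2x1 frame replaces getImageData.

```javascript
// Stand-in for the module's exports (assumption): a bump allocator and a
// grayscale routine written in JS that touch linear memory as a module would.
function createStandInModule(pages = 4) {
  const memory = new WebAssembly.Memory({ initial: pages });
  let next = 8; // next free byte; the first bytes are left unused
  return {
    memory,
    alloc(len) { const ptr = next; next += len; return ptr; },
    free(_ptr, _len) { /* a bump allocator never reclaims */ },
    grayscale(ptr, len) {
      const bytes = new Uint8Array(memory.buffer, ptr, len);
      for (let i = 0; i < len; i += 4) {
        // Integer approximation of 0.299 R + 0.587 G + 0.114 B.
        const y = (77 * bytes[i] + 150 * bytes[i + 1] + 29 * bytes[i + 2]) >> 8;
        bytes[i] = bytes[i + 1] = bytes[i + 2] = y; // alpha at i + 3 is untouched
      }
    },
  };
}

const { memory, alloc, free, grayscale } = createStandInModule();

// A 2x1 "frame": one opaque red pixel, one half-transparent white pixel.
const framePixels = new Uint8ClampedArray([255, 0, 0, 255, 255, 255, 255, 128]);
const len = framePixels.length;

const ptr = alloc(len);                                      // allocate once
const view = new Uint8ClampedArray(memory.buffer, ptr, len); // alias the region
view.set(framePixels);                                       // copy pixels in
grayscale(ptr, len);                                         // transform in place
const result = Array.from(new Uint8ClampedArray(memory.buffer, ptr, len)); // re-alias and read
free(ptr, len);                                              // teardown only
console.log(result);
```

Only the pointer and length are passed to grayscale; the pixel bytes never leave the region between the set and the final read.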
Reusing the buffer across frames
For video or an animation loop, steps 1 and 3 run once; steps 2 and 4–6 run per frame against the same
ptr. That turns N allocations into one:
const ptr = alloc(len); // once
function renderFrame() {
const frame = ctx.getImageData(0, 0, width, height);
const view = new Uint8ClampedArray(memory.buffer, ptr, len);
view.set(frame.data);
grayscale(ptr, len);
ctx.putImageData(new ImageData(view, width, height), 0, 0); // wrap the live view; the blit copies before returning
requestAnimationFrame(renderFrame);
}
requestAnimationFrame(renderFrame);
OffscreenCanvas in a worker
Heavy per-frame processing on the main thread causes jank. Move it to a worker with OffscreenCanvas:
transfer the canvas to the worker once, and run the same allocate-once loop there so the main thread stays
free for input and layout.
// main thread
const offscreen = canvas.transferControlToOffscreen();
worker.postMessage({ canvas: offscreen }, [offscreen]); // transfer, not copy
// worker.js
onmessage = async ({ data }) => {
const ctx = data.canvas.getContext("2d");
const { instance } = await WebAssembly.instantiateStreaming(fetch("/img.wasm"));
const { memory, alloc, free, grayscale } = instance.exports;
const len = data.canvas.width * data.canvas.height * 4;
const ptr = alloc(len); // once
// ...same per-frame loop, off the main thread
};
Expected output
After step 6 the canvas shows the grayscale image. Verify a single pixel was converted in place:
const px = new Uint8ClampedArray(memory.buffer, ptr, 4);
console.log([...px]); // e.g. [128, 128, 128, 255] — R=G=B means grayscale, alpha preserved
Equal R, G, and B channels with the original alpha confirm the module wrote luminance back into the same bytes you handed it.
A correct in-place grayscale also tells you the layout assumptions held. The four bytes are R, G, B, A in
that order; the module computed a luminance value (commonly 0.299·R + 0.587·G + 0.114·B), wrote it to
all three color channels, and left the alpha byte at index 3 untouched. If the alpha came back as
something other than the original 255, the module either ran off the end of the region or mis-indexed the
stride — both of which point at a len that is not exactly width * height * 4. Checking a corner pixel
and a center pixel is usually enough to catch a row-stride bug, which manifests as a sheared or repeated
image rather than wrong colors.
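The corner-and-center check described above might look like the following sketch. The helper names isGrayPixel and spotCheck are hypothetical, and the synthetic 3x3 arrays stand in for a view over linear memory; in the browser you would pass new Uint8ClampedArray(memory.buffer, ptr, len) instead.

```javascript
// Hypothetical helper: true when the pixel at (x, y) has R = G = B and the expected alpha.
function isGrayPixel(rgba, width, x, y, expectedAlpha = 255) {
  const i = (y * width + x) * 4; // row-major, four bytes per pixel
  return rgba[i] === rgba[i + 1] &&
         rgba[i + 1] === rgba[i + 2] &&
         rgba[i + 3] === expectedAlpha;
}

// Spot-check the four corners and the center of a width x height frame.
function spotCheck(rgba, width, height, expectedAlpha = 255) {
  if (rgba.length !== width * height * 4) return false; // a len mismatch is a bug in itself
  const points = [[width >> 1, height >> 1]];
  for (const x of [0, width - 1]) {
    for (const y of [0, height - 1]) points.push([x, y]);
  }
  return points.every(([x, y]) => isGrayPixel(rgba, width, x, y, expectedAlpha));
}

// Usage with synthetic data standing in for a view over linear memory.
const good = new Uint8ClampedArray(3 * 3 * 4).fill(128);
for (let i = 3; i < good.length; i += 4) good[i] = 255; // opaque alpha
const bad = good.slice();
bad[0] = 10; // top-left pixel no longer has R = G = B
console.log(spotCheck(good, 3, 3), spotCheck(bad, 3, 3)); // prints "true false"
```

The check assumes the source image was fully opaque; pass a different expectedAlpha, or compare against the original alpha bytes, for images with transparency.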
Gotchas
- Re-allocating every frame is the bug. Calling alloc(len) inside the render loop leaks (without a matching free) or thrashes the allocator and may trigger memory.grow, detaching your views. Allocate once, reuse the pointer.
- ImageData is straight RGBA, not premultiplied. Canvas getImageData/putImageData use non-premultiplied (straight) alpha, so process channels as-is; do not divide or multiply by alpha. The array is a Uint8ClampedArray, so JavaScript writes outside [0, 255] saturate rather than wrap; bytes the module stores directly are not clamped.
- Detached buffer after grow. If the module grew memory (e.g. a large allocation elsewhere), a view built earlier is detached and reads zero-length, producing a blank or garbage frame. Rebuild the view from memory.buffer right before putImageData.
- Keep ImageData wrappers short-lived. ImageData wraps the array you pass without copying it, and putImageData copies the pixels into the canvas before it returns. Passing a live view over linear memory is therefore safe for the blit itself, but an ImageData you hold onto still aliases memory you may later free, overwrite, or detach with a grow. Build the wrapper right before the blit and discard it; pass view.slice() only when you need a snapshot that outlives the region.
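The detached-buffer gotcha is easy to reproduce in isolation. This sketch uses a standalone, non-shared WebAssembly.Memory and a hypothetical region at offset 0; in a real module the grow would be triggered by the allocator rather than called directly.

```javascript
// A small non-shared memory that is allowed to grow (sizes are illustrative).
const memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });
const ptr = 0, len = 16; // hypothetical region

const stale = new Uint8ClampedArray(memory.buffer, ptr, len);
stale[0] = 200;
const lengthBeforeGrow = stale.length; // 16

memory.grow(1); // memory.buffer is now a new ArrayBuffer; the old one is detached

const lengthAfterGrow = stale.length;                         // 0: the old view is unusable
const fresh = new Uint8ClampedArray(memory.buffer, ptr, len); // rebuild from memory.buffer
const preserved = fresh[0];                                   // contents survive the grow
console.log(lengthBeforeGrow, lengthAfterGrow, preserved);    // prints "16 0 200"
```

The pointer stays valid across the grow; only the JavaScript views need rebuilding, which is why constructing the view immediately before each use is cheap insurance.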
Verifying the speedup
Time the optimized loop against the naive copy-heavy version with performance.now() so the saving is a
number, not a hope:
let frames = 0, total = 0;
function timed() {
const t = performance.now();
const frame = ctx.getImageData(0, 0, width, height);
new Uint8ClampedArray(memory.buffer, ptr, len).set(frame.data);
grayscale(ptr, len);
ctx.putImageData(new ImageData(new Uint8ClampedArray(memory.buffer, ptr, len), width, height), 0, 0); // fresh view, no slice
total += performance.now() - t;
if (++frames === 120) console.log(`avg ${(total / frames).toFixed(2)} ms/frame`);
else requestAnimationFrame(timed);
}
requestAnimationFrame(timed);
Compare the average against a variant that re-allocs every frame and uses a separate output region; the
in-place, allocate-once version should shave the per-frame copy and allocator time, and the gap widens
with frame size.
Performance note
The win is concrete: eliminating two of the four per-frame data moves (the separate output write and the
readback copy) for an 8 MB 1080p frame removes ~1.66 ms per frame (two ~0.83 ms memcpys at ~10 GB/s). At
60 fps that is ~100 ms of pure copy work reclaimed every
second — often the difference between a smooth and a dropping frame rate. Reusing one allocation across
frames removes the allocator cost on top, and keeping the work in an OffscreenCanvas worker keeps the
main thread free so the saved time turns into responsiveness rather than just headroom.
Sizing memory for a frame buffer
A 1080p RGBA frame is ~8 MB, and a WebAssembly.Memory grows in 64 KiB pages. If your module’s default
memory is small, the first alloc(len) for a full frame will trigger one or more memory.grow calls to
make room — and that grow is exactly the event that detaches any view you built before it. Two practices
keep this from biting. First, size the memory’s initial pages to comfortably hold the frame buffer plus
the module’s own working set, so the frame allocation never forces a grow at runtime. Second, do the
allocation once during setup, before you build any long-lived views, so the single unavoidable grow
happens up front rather than mid-loop. After that, the pointer is stable and the views you build each
frame are safe for the duration of the frame. Reserving headroom up front trades a little memory for the
elimination of a whole class of intermittent blank-frame bugs.
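A rough sizing calculation for the first practice is sketched below. The helper pagesForFrame and its workingSetBytes parameter are hypothetical, and the 1 MiB of headroom is an arbitrary illustration. Creating the memory from JavaScript, as shown, applies only when the module imports its memory; a module that exports its own memory would need the equivalent initial size set in its build configuration.

```javascript
const PAGE_BYTES = 64 * 1024; // WebAssembly page size

// Pages needed for one RGBA frame plus `workingSetBytes` of headroom (hypothetical parameter).
function pagesForFrame(width, height, workingSetBytes = 0) {
  return Math.ceil((width * height * 4 + workingSetBytes) / PAGE_BYTES);
}

const framePages = pagesForFrame(1920, 1080);            // 127 pages for the frame alone
const withHeadroom = pagesForFrame(1920, 1080, 1 << 20); // 143 pages: frame plus 1 MiB

// Sized up front so the frame allocation does not have to grow memory later.
const memory = new WebAssembly.Memory({ initial: withHeadroom });
console.log(framePages, withHeadroom, memory.buffer.byteLength);
```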
Frequently Asked Questions
Why is my processed image blank or full of noise?
Most often a detached buffer: a memory.grow between allocating and reading swapped the backing
ArrayBuffer, so your view is zero-length. Rebuild the view from memory.buffer immediately before
putImageData. Less often, a wrong len (not width * height * 4) misaligns rows.
Can I avoid even the one set() into linear memory?
Only if the pixels originate inside linear memory to begin with — for example, the module decodes the
image itself. With canvas you must move the bytes from the canvas’s buffer into the module’s memory once;
the gain is removing the separate output region and the readback copy, and amortizing
the allocation. See zero-copy data transfer patterns.
Does OffscreenCanvas copy the canvas to the worker?
No. transferControlToOffscreen() plus a transfer-list postMessage transfers control with no pixel
copy — the original canvas can no longer be drawn to from the main thread, which is the signal that
ownership moved rather than duplicated.
Related
- Reading Wasm linear memory with typed arrays — build and decode views at an offset.
- Linear memory management & allocators — the allocate-once pattern behind frame reuse.
- Sharing memory between Wasm and Web Workers — coordinating shared buffers across threads.