Avoiding Copies When Passing Image Buffers

This guide answers one task: feed a canvas’s pixels into a WebAssembly module, process them in place (here, grayscale), and read them back to the canvas without copying the buffer in or out on every frame.

Prerequisites

  • [ ] A .wasm module exporting memory, an alloc/free pair, and an in-place pixel function (e.g. grayscale(ptr, len)).
  • [ ] A browser with <canvas> 2D context; for the worker path, OffscreenCanvas (Chrome 69+, Firefox 105+).
  • [ ] A loaded image drawn to the canvas so getImageData has pixels to return.
  • [ ] performance.now() for confirming the copy elimination.

How in-place pixel processing works

A canvas 2D context hands you pixels via ctx.getImageData(x, y, w, h), whose .data field is a Uint8ClampedArray of straight (non-premultiplied) RGBA bytes: four bytes per pixel, row-major. To process those pixels in WebAssembly you place them in the module’s linear memory, run the function over that region, and read the result through a view aliasing the same region. The clamped type matters only on the JavaScript side: values you write through the view saturate to [0, 255] instead of wrapping. Bytes the module stores directly into linear memory bypass the view, so the module must clamp its own arithmetic.
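To make the clamping behaviour concrete, here is a minimal sketch that needs no module at all. It creates a standalone WebAssembly.Memory (an assumption for illustration; in this guide the memory comes from instance.exports) and writes the same out-of-range value through a clamped view and through a plain byte view over the same bytes. The plain view stands in for a raw byte store, which keeps only the low 8 bits.

```javascript
// Standalone memory for illustration; the guide's memory comes from instance.exports.
const demoMemory = new WebAssembly.Memory({ initial: 1 });   // one 64 KiB page

// Two views aliasing the same four bytes of linear memory.
const clampedView = new Uint8ClampedArray(demoMemory.buffer, 0, 4);
const byteView = new Uint8Array(demoMemory.buffer, 0, 4);

clampedView[0] = 300;   // write through the clamped view saturates to 255
clampedView[1] = -20;   // negative values saturate to 0
byteView[2] = 300;      // raw byte store keeps the low 8 bits: 300 % 256 = 44

const clampDemo = Array.from(byteView);
console.log(clampDemo); // [255, 0, 44, 0]
```

Both views see the same memory, so the difference is entirely in how each write is converted before it lands.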

The key to avoiding per-frame allocator overhead is allocating the frame-sized region once and reusing the pointer across frames, rather than re-allocating every tick. Re-allocating each frame is the bug that quietly reintroduces the work you were trying to remove.

It helps to count the copies in the naive version so you know what you are removing. A typical unoptimized filter does four moves of the pixel data per frame: getImageData produces a fresh Uint8ClampedArray (one copy out of the canvas’s internal buffer), you set it into linear memory (a second copy), the module writes results to a separate output region (a third), and you read that back into an ImageData and putImageData it (a fourth, plus the canvas’s own internal copy). Processing in place collapses the module’s two regions into one, and reusing the allocation removes the per-frame allocator work. What remains is the unavoidable minimum: one copy in from the canvas and one blit back. For an 8 MB frame, assuming roughly 10 GB/s of copy throughput (~0.83 ms per move), going from four data moves to two is the difference between ~3.3 ms and ~1.66 ms of pure copy time per frame.
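The copy-time figures can be reproduced with a few lines of arithmetic. This is a rough estimate only: the function name copyTimeMs and the default throughput of 10 GB/s are assumptions for illustration, and real throughput depends on the machine and cache state.

```javascript
// Estimated time spent purely moving pixel bytes, in milliseconds per frame.
// bytesPerSecond is an assumed memcpy throughput, not a measured figure.
function copyTimeMs(width, height, moves, bytesPerSecond = 10e9) {
  const frameBytes = width * height * 4;            // RGBA, one byte per channel
  return (frameBytes * moves / bytesPerSecond) * 1000;
}

const naiveMs = copyTimeMs(1920, 1080, 4);     // four data moves
const inPlaceMs = copyTimeMs(1920, 1080, 2);   // two data moves
console.log(naiveMs.toFixed(2), inPlaceMs.toFixed(2));   // "3.32" "1.66"
```

Treat the result as a budget to compare against measurements, not as a prediction.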

Step-by-step procedure

  1. Instantiate and capture the exports.

    const { instance } = await WebAssembly.instantiateStreaming(fetch("/img.wasm"));
    const { memory, alloc, free, grayscale } = instance.exports;

  2. Read the frame from the canvas. .data is a Uint8ClampedArray of RGBA bytes.

    const { width, height } = canvas;
    const frame = ctx.getImageData(0, 0, width, height);
    const len = frame.data.length;       // width * height * 4

  3. Allocate the region once, outside any render loop. Hoist this so it runs a single time.

    const ptr = alloc(len);              // reuse this pointer every frame

  4. Alias the region with a clamped view and fill it. The view aliases linear memory; set fills it.

    const view = new Uint8ClampedArray(memory.buffer, ptr, len);
    view.set(frame.data);                // copy source pixels in once per frame

  5. Run the in-place transform. Only the pointer and length cross the boundary.

    grayscale(ptr, len);                 // module mutates the bytes directly

  6. Write the result back to the canvas. The view already reflects the processed pixels, so there is no separate output region to read back. putImageData requires an ImageData, so wrap the bytes; slice() takes a snapshot that stays valid even if the region is later freed or memory grows. Rebuild the view first if a grow may have occurred.

    const out = new Uint8ClampedArray(memory.buffer, ptr, len);   // re-alias, detach-safe
    ctx.putImageData(new ImageData(out.slice(), width, height), 0, 0);

  7. Free only when you are done with the buffer for good, not every frame. For a streaming loop, free on teardown.

    // on stop():
    free(ptr, len);
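Steps 3, 4, 6 and 7 can be bundled so the allocate-once and free-once rules are hard to break by accident. The helper below is a sketch, not part of any library: createFrameRegion is a hypothetical name, and it assumes an exports object shaped like the one captured in step 1 (memory, alloc(len), free(ptr, len)).

```javascript
// Hypothetical helper: owns one frame-sized region in the module's memory.
// `exports` is assumed to expose memory, alloc(len) and free(ptr, len) as in step 1.
function createFrameRegion(exports, len) {
  const ptr = exports.alloc(len);        // step 3: allocate once
  let disposed = false;
  return {
    ptr,
    len,
    // Steps 4 and 6: build the view on demand so it always aliases the
    // current memory.buffer, even if memory grew since the last call.
    view() {
      if (disposed) throw new Error("frame region already freed");
      return new Uint8ClampedArray(exports.memory.buffer, ptr, len);
    },
    // Step 7: free exactly once, on teardown.
    dispose() {
      if (!disposed) {
        exports.free(ptr, len);
        disposed = true;
      }
    },
  };
}
```

A caller would create the region during setup, call region.view().set(frame.data) and grayscale(region.ptr, region.len) each frame, and call region.dispose() when the loop stops.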

Reusing the buffer across frames

For video or an animation loop, steps 1 and 3 run once; steps 2 and 4–6 run per frame against the same ptr. That turns N allocations into one:

const ptr = alloc(len);                              // once
function renderFrame() {
  // Assumes the next source frame is already on the canvas (for video, draw it first).
  const frame = ctx.getImageData(0, 0, width, height);
  const view = new Uint8ClampedArray(memory.buffer, ptr, len);
  view.set(frame.data);
  grayscale(ptr, len);
  ctx.putImageData(new ImageData(view.slice(), width, height), 0, 0);
  requestAnimationFrame(renderFrame);
}
requestAnimationFrame(renderFrame);
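To see the one-allocation property without a canvas or a compiled module, the per-frame work can be pulled into a function and driven by stand-ins. Everything named fake below is a JavaScript placeholder written for this sketch (a trivial allocator and a JS grayscale using the common 0.299/0.587/0.114 weights); a real module's exports would take their place.

```javascript
// Per-frame work from steps 4 to 6, separated from canvas calls so it can be
// exercised with any byte source. `exports` is assumed to match step 1.
function processFrame(exports, ptr, pixels) {
  const view = new Uint8ClampedArray(exports.memory.buffer, ptr, pixels.length);
  view.set(pixels);                         // copy in
  exports.grayscale(ptr, pixels.length);    // in-place transform
  return view;                              // aliases linear memory
}

// Stand-in exports written in JavaScript, for illustration only.
const fakeMemory = new WebAssembly.Memory({ initial: 1 });
let allocCalls = 0;
const fakeExports = {
  memory: fakeMemory,
  alloc(len) { allocCalls += 1; return 0; },   // trivial allocator: always offset 0
  grayscale(ptr, len) {
    const px = new Uint8Array(fakeMemory.buffer, ptr, len);
    for (let i = 0; i < len; i += 4) {
      const y = Math.round(0.299 * px[i] + 0.587 * px[i + 1] + 0.114 * px[i + 2]);
      px[i] = px[i + 1] = px[i + 2] = y;       // alpha at i + 3 is left untouched
    }
  },
};

const onePixel = new Uint8ClampedArray([255, 0, 0, 255]);   // opaque red
const framePtr = fakeExports.alloc(onePixel.length);        // once, before the loop
let lastFrame;
for (let n = 0; n < 100; n++) {
  lastFrame = processFrame(fakeExports, framePtr, onePixel);
}
console.log(allocCalls, Array.from(lastFrame));   // 1 [76, 76, 76, 255]
```

One hundred frames pass through the same region while the allocator is called a single time.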

OffscreenCanvas in a worker

Heavy per-frame processing on the main thread causes jank. Move it to a worker with OffscreenCanvas: transfer the canvas to the worker once, and run the same allocate-once loop there so the main thread stays free for input and layout.

// main thread
const offscreen = canvas.transferControlToOffscreen();
worker.postMessage({ canvas: offscreen }, [offscreen]);   // transfer, not copy
// worker.js
onmessage = async ({ data }) => {
  const ctx = data.canvas.getContext("2d");
  const { instance } = await WebAssembly.instantiateStreaming(fetch("/img.wasm"));
  const { memory, alloc, free, grayscale } = instance.exports;
  const len = data.canvas.width * data.canvas.height * 4;
  const ptr = alloc(len);                                 // once
  // ...same per-frame loop, off the main thread
};
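Because OffscreenCanvas support depends on the browser version, it can help to detect it before committing to the worker path and fall back to the main-thread loop otherwise. The function below is a sketch; canUseOffscreenWorker is a hypothetical name, and the scope parameter exists only so the check can be exercised outside a browser.

```javascript
// Returns true when a canvas can be handed to a worker.
// `scope` defaults to the global object; passing it explicitly is an
// assumption made here so the check can run outside a browser.
function canUseOffscreenWorker(scope = globalThis) {
  const canvasCtor = scope.HTMLCanvasElement;
  return typeof scope.OffscreenCanvas === "function"
    && typeof scope.Worker === "function"
    && typeof canvasCtor === "function"
    && "transferControlToOffscreen" in canvasCtor.prototype;
}
```

On the main thread, call it once during setup and choose between canvas.transferControlToOffscreen() and the in-page render loop based on the result.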

Expected output

After step 6 the canvas shows the grayscale image. Verify a single pixel was converted in place:

const px = new Uint8ClampedArray(memory.buffer, ptr, 4);
console.log([...px]);   // e.g. [128, 128, 128, 255] — R=G=B means grayscale, alpha preserved

Equal R, G, and B channels with the original alpha confirm the module wrote luminance back into the same bytes you handed it.

A correct in-place grayscale also tells you the layout assumptions held. The four bytes are R, G, B, A in that order; the module computed a luminance value (commonly 0.299·R + 0.587·G + 0.114·B), wrote it to all three color channels, and left the alpha byte at index 3 untouched. If the alpha came back as something other than the original 255, the module either ran off the end of the region or mis-indexed the stride — both of which point at a len that is not exactly width * height * 4. Checking a corner pixel and a center pixel is usually enough to catch a row-stride bug, which manifests as a sheared or repeated image rather than wrong colors.
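The corner-and-center check described above is easy to automate. The two helpers below are illustrative names, not part of any API; they work on any array of RGBA bytes, including a view over linear memory.

```javascript
// Checks one RGBA pixel: equal colour channels and the expected alpha.
// `bytes` is any array-like of RGBA bytes; `pixelIndex` counts pixels, not bytes.
function isGrayPixel(bytes, pixelIndex, expectedAlpha) {
  const i = pixelIndex * 4;
  return bytes[i] === bytes[i + 1]
    && bytes[i + 1] === bytes[i + 2]
    && bytes[i + 3] === expectedAlpha;
}

// Pixel indices of the top-left corner and the centre of a width x height frame.
function samplePixels(width, height) {
  const centre = Math.floor(height / 2) * width + Math.floor(width / 2);
  return [0, centre];
}
```

For example, samplePixels(width, height).every(p => isGrayPixel(view, p, frame.data[p * 4 + 3])) compares each sampled pixel's alpha against the value read from the canvas before processing.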

Gotchas

  • Re-allocating every frame is the bug. Calling alloc(len) inside the render loop leaks (without a matching free) or thrashes the allocator and may trigger memory.grow, detaching your views. Allocate once, reuse the pointer.

  • ImageData is straight RGBA, not premultiplied. Canvas getImageData/putImageData use non-premultiplied (straight) alpha, so process channels as-is and do not divide or multiply by alpha. The array is a Uint8ClampedArray, so JavaScript writes through it outside [0, 255] saturate rather than wrap; bytes the module stores directly into linear memory are not clamped for you.

  • Detached buffer after grow. If the module grew memory (e.g. a large allocation elsewhere), a view built earlier is detached: its length reads 0 and methods such as set() and slice() throw, so the frame is skipped and the canvas shows a blank or stale image. Rebuild the view from memory.buffer right before putImageData.

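  • ImageData references your array; putImageData copies it. new ImageData(array, width, height) keeps a reference to the array you pass rather than copying it, and putImageData then copies those pixels into the canvas at call time. An ImageData wrapped around a live view is therefore only valid until the region is freed, overwritten, or detached by a grow. Pass view.slice() when the ImageData must outlive that window, or keep the region alive for as long as you hold it.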
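The detached-buffer gotcha can be reproduced in a few lines with a standalone memory, which is an assumption for illustration (in the guide, the memory belongs to the module). The liveView helper is a hypothetical name for one way to reuse a cached view only while it still aliases the current buffer.

```javascript
const growMemory = new WebAssembly.Memory({ initial: 1, maximum: 4 });
const staleView = new Uint8ClampedArray(growMemory.buffer, 0, 16);
const lengthBefore = staleView.length;    // 16

growMemory.grow(1);                       // replaces memory.buffer, detaching the old one
const lengthAfter = staleView.length;     // 0: the old view is now detached

// Hypothetical helper: reuse a cached view only while it still aliases the live buffer.
function liveView(memory, ptr, len, cached) {
  if (cached && cached.buffer === memory.buffer
      && cached.byteOffset === ptr && cached.length === len) {
    return cached;
  }
  return new Uint8ClampedArray(memory.buffer, ptr, len);
}

const rebuilt = liveView(growMemory, 0, 16, staleView);
console.log(lengthBefore, lengthAfter, rebuilt.length);   // 16 0 16
```

Comparing cached.buffer with memory.buffer is cheap, so the check can run every frame without measurable cost.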

Verifying the speedup

Time the optimized loop against the naive copy-heavy version with performance.now() so the saving is a number, not a hope:

let frames = 0, total = 0;
function timed() {
  const t = performance.now();
  const frame = ctx.getImageData(0, 0, width, height);
  new Uint8ClampedArray(memory.buffer, ptr, len).set(frame.data);
  grayscale(ptr, len);
  ctx.putImageData(new ImageData(new Uint8ClampedArray(memory.buffer, ptr, len).slice(), width, height), 0, 0);
  total += performance.now() - t;
  if (++frames === 120) console.log(`avg ${(total / frames).toFixed(2)} ms/frame`);
  else requestAnimationFrame(timed);
}
requestAnimationFrame(timed);

Compare the average against a variant that re-allocs every frame and uses a separate output region; the in-place, allocate-once version should shave the per-frame copy and allocator time, and the gap widens with frame size.
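To compare two variants on equal terms, a small synchronous harness is enough. The sketch below uses only performance.now(); averageMs is an illustrative name, and the two frame functions mentioned in the usage note are placeholders you would write yourself.

```javascript
// Runs `frameFn` `count` times and returns the mean duration in milliseconds.
// Synchronous on purpose: it measures the work itself, not requestAnimationFrame pacing.
function averageMs(frameFn, count = 120) {
  let total = 0;
  for (let i = 0; i < count; i++) {
    const start = performance.now();
    frameFn();
    total += performance.now() - start;
  }
  return total / count;
}
```

Calling averageMs(inPlaceFrame) and averageMs(naiveFrame), where each placeholder function performs one full frame of its variant, gives two numbers measured the same way. Run each a few times and discard the first result, since early iterations include warm-up effects.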

Performance note

The win is concrete: assuming ~10 GB/s of copy throughput, each move of an ~8 MB 1080p frame costs ~0.83 ms, so cutting four data moves to two removes ~1.66 ms per frame. At 60 fps that is ~100 ms of pure copy work reclaimed every second, often the difference between a smooth and a dropping frame rate. Reusing one allocation across frames removes the allocator cost on top, and keeping the work in an OffscreenCanvas worker keeps the main thread free so the saved time turns into responsiveness rather than just headroom.

Sizing memory for a frame buffer

A 1080p RGBA frame is ~8 MB, and a WebAssembly.Memory grows in 64 KiB pages. If your module’s default memory is small, the first alloc(len) for a full frame will trigger one or more memory.grow calls to make room — and that grow is exactly the event that detaches any view you built before it. Two practices keep this from biting. First, size the memory’s initial pages to comfortably hold the frame buffer plus the module’s own working set, so the frame allocation never forces a grow at runtime. Second, do the allocation once during setup, before you build any long-lived views, so the single unavoidable grow happens up front rather than mid-loop. After that, the pointer is stable and the views you build each frame are safe for the duration of the frame. Reserving headroom up front trades a little memory for the elimination of a whole class of intermittent blank-frame bugs.
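The page count for a given frame size is simple to compute. The sketch below is an estimate under stated assumptions: pagesForFrame is an illustrative name, the 1 MiB working set is a placeholder rather than a measured value, and creating the memory from JavaScript only applies when the module imports its memory. A module that exports its own memory, as in this guide, has its initial size fixed at build time by the toolchain.

```javascript
const WASM_PAGE_BYTES = 64 * 1024;   // WebAssembly page size

// Pages needed for one RGBA frame plus an assumed working set for the module.
// workingSetBytes is a placeholder; measure your own module's needs.
function pagesForFrame(width, height, workingSetBytes = 0) {
  const frameBytes = width * height * 4;
  return Math.ceil((frameBytes + workingSetBytes) / WASM_PAGE_BYTES);
}

const framePages = pagesForFrame(1920, 1080);                  // frame alone
const withHeadroom = pagesForFrame(1920, 1080, 1024 * 1024);   // frame + 1 MiB
const sizedMemory = new WebAssembly.Memory({ initial: withHeadroom });
console.log(framePages, withHeadroom, sizedMemory.buffer.byteLength);   // 127 143 9371648
```

With the initial size set this way, the frame allocation fits without a grow, so views built after setup stay attached.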

Frequently Asked Questions

Why is my processed image blank or full of noise? Most often a detached buffer: a memory.grow between allocating and reading swapped the backing ArrayBuffer, so your view is zero-length. Rebuild the view from memory.buffer immediately before putImageData. Less often, a wrong len (not width * height * 4) misaligns rows.

Can I avoid even the one set() into linear memory? Only if the pixels originate inside linear memory to begin with, for example when the module decodes the image itself. With canvas you must move the bytes from the canvas’s buffer into the module’s memory once; the gain is removing the module’s separate output region and the readback from it, and amortizing the allocation. See zero-copy data transfer patterns.

Does OffscreenCanvas copy the canvas to the worker? No. transferControlToOffscreen() plus a transfer-list postMessage transfers control with no pixel copy — the original canvas can no longer be drawn to from the main thread, which is the signal that ownership moved rather than duplicated.
