SharedArrayBuffer, Atomics & Threading

WebAssembly has no threads of its own. A .wasm module is just code; the host supplies the threads, and on the web that means Web Workers. To make several workers cooperate on the same data you give them one linear memory whose backing store is a SharedArrayBuffer instead of a private ArrayBuffer, instantiate the same module in every worker against that one memory, and coordinate access with the atomic instructions the threads proposal adds to the Wasm instruction set, which JavaScript reaches through the Atomics object. This guide covers the whole stack: why threads need shared memory, the cross-origin isolation that browsers require before they will even define SharedArrayBuffer, the atomic wait/notify protocol, building a worker pool, and how Rust (wasm-bindgen-rayon) and C/C++ (Emscripten -pthread) reach all of it from a normal source tree.

Prerequisites

[ ] A document served cross-origin isolated — Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp. Verify with self.crossOriginIsolated === true.
[ ] A modern browser (Chrome 92+, Firefox 95+, Safari 15.2+) where SharedArrayBuffer and Wasm threads ship.
[ ] Rust nightly (rustup toolchain install nightly) plus rust-src for building threaded Rust: rustup component add rust-src --toolchain nightly.
[ ] RUSTFLAGS='-C target-feature=+atomics,+bulk-memory,+mutable-globals' for atomic codegen.
[ ] Emscripten 3.x (emcc -v) if you compile C/C++ with -pthread.
[ ] A local server that can send custom headers (the COOP/COEP pair) over https or localhost.

How shared memory and atomics fit together

A non-shared WebAssembly.Memory is backed by a single ArrayBuffer owned by one instance. If you pass that buffer to a worker with postMessage, the structured-clone algorithm copies the bytes, so the two threads end up with two disconnected memories. A shared memory is different. It is created with { shared: true } and backed by a SharedArrayBuffer, and when you postMessage the memory (or its buffer), the structured clone shares the backing store rather than copying it. Every worker that instantiates the module with that memory as a linear memory import now reads and writes the exact same bytes.

Sharing bytes is necessary but not sufficient — two threads writing the same address is a data race. The threads proposal answers this with atomic instructions (i32.atomic.rmw.add, i32.atomic.rmw.cmpxchg, memory.atomic.wait32, memory.atomic.notify, and friends) and a sequentially-consistent memory ordering for them. JavaScript exposes the same primitives through the Atomics object: Atomics.add, Atomics.compareExchange, Atomics.wait, and Atomics.notify. A worker that needs to block until another thread is done calls Atomics.wait on an Int32Array index; the producer calls Atomics.notify on that same index to wake it. This wait/notify pair is the foundation of every higher-level construct — mutexes, condition variables, and the futex-style parking that pthreads and Rust’s std::sync build on.
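A quick way to learn these primitives is to watch their return values. The sketch below runs anywhere SharedArrayBuffer is defined (a cross-origin-isolated page, a worker, or Node.js), and the slot indices are arbitrary. Every read-modify-write op returns the value the slot held before the operation. That is what lets compareExchange double as a "did I win?" test.

```javascript
// Two Int32 slots over shared memory; Atomics requires an integer typed array.
const slots = new Int32Array(new SharedArrayBuffer(2 * Int32Array.BYTES_PER_ELEMENT));

const beforeAdd = Atomics.add(slots, 0, 5);                 // slot 0: 0 -> 5, returns old value 0
const swapSeen = Atomics.compareExchange(slots, 0, 5, 42);  // expected 5 matches: 5 -> 42, returns 5
const swapMissed = Atomics.compareExchange(slots, 0, 5, 7); // expected 5 no longer matches: returns 42, no write
const woken = Atomics.notify(slots, 1, 1);                  // nobody is waiting on slot 1, so 0 woken

console.log(beforeAdd, swapSeen, swapMissed, Atomics.load(slots, 0), woken); // 0 5 42 42 0
```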

The key mental shift from single-threaded Wasm is that the bytes in linear memory are no longer owned by one instance. They are a shared resource, and like any shared mutable resource they need a discipline: a rule for who may write which region when. Atomics give you the low-level enforcement primitives, but the protocol — which slot is a lock, which is a counter, which is a payload, and in what order they are read and written — is yours to design. Most threading bugs in Wasm are not exotic memory-ordering puzzles; they are protocol bugs where two threads disagree about whose turn it is to touch a region. Keep the protocol explicit and small, and the atomics fall into place around it.

Memory ordering matters here in a concrete way. Wasm atomics are sequentially consistent, which means an atomic store that another thread observes also publishes every ordinary (non-atomic) write that preceded it in program order. That is what lets you write a payload with plain stores and then “release” it with a single atomic flag store: a reader that sees the flag is guaranteed to see the payload. Break that pairing — read the payload without first checking the atomic flag — and you have a classic torn read, where one thread sees half-written data. The whole point of the flag is to be the synchronization edge.
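The payload-then-flag discipline can be sketched in plain JavaScript over a SharedArrayBuffer. This is a minimal single-handoff illustration, not a reusable queue. The slot layout (READY at index 0, a four-slot payload after it) and the helper names publish and tryConsume are hypothetical. Writing the layout down as named constants is what an explicit, small protocol looks like in practice. Both halves run on one thread here so the output is deterministic. In a real program the producer and the consumer would be different workers holding the same buffer.

```javascript
// Hypothetical slot layout for one producer/consumer handoff (indices into an Int32Array).
const READY = 0;       // atomic flag: 0 = empty, 1 = payload published
const PAYLOAD = 1;     // first of PAYLOAD_LEN plain (non-atomic) payload slots
const PAYLOAD_LEN = 4;

// Producer: write the payload with ordinary stores, then release it with one atomic store.
function publish(view, values) {
  for (let i = 0; i < PAYLOAD_LEN; i++) view[PAYLOAD + i] = values[i];
  Atomics.store(view, READY, 1); // synchronization edge: makes the writes above visible
}

// Consumer: acquire through the flag first; only then is reading the payload safe.
function tryConsume(view) {
  if (Atomics.load(view, READY) !== 1) return null; // not published yet: do not touch the payload
  const out = [];
  for (let i = 0; i < PAYLOAD_LEN; i++) out.push(view[PAYLOAD + i]);
  return out;
}

const shared = new Int32Array(new SharedArrayBuffer((1 + PAYLOAD_LEN) * Int32Array.BYTES_PER_ELEMENT));
console.log(tryConsume(shared)); // null
publish(shared, [10, 20, 30, 40]);
console.log(tryConsume(shared)); // [ 10, 20, 30, 40 ]
```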

Cross-origin isolation: the gate you hit first

Before any of this works, the browser must have decided your page is allowed to hold a SharedArrayBuffer. After the Spectre disclosures, vendors removed SharedArrayBuffer from non-isolated contexts because it can be turned into a high-resolution timer that powers side-channel attacks. To get it back, your top-level document must opt into cross-origin isolation by sending two response headers:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

COOP: same-origin severs the page from cross-origin windows that could share its browsing context group; COEP: require-corp forces every subresource (images, scripts, fonts) to explicitly opt into being embedded via Cross-Origin-Resource-Policy or CORS. When both hold, the browser sets self.crossOriginIsolated to true and re-enables SharedArrayBuffer plus high-resolution performance.now(). Getting the exact header values and the dev-server config right — including the Cross-Origin-Resource-Policy: cross-origin you need on third-party assets — is involved enough that it has its own walkthrough in configuring COOP/COEP headers for SharedArrayBuffer.

The single most common failure in threaded Wasm is forgetting this step: your code throws ReferenceError: SharedArrayBuffer is not defined (or WebAssembly.Memory rejects { shared: true }) and it is tempting to blame the toolchain. It is almost always a missing header.

Isolation is binary and document-wide. There is no “isolate just this worker”. The top-level document's own response headers decide whether the page qualifies. What catches people out is the enforcement side of COEP. With require-corp in force, any cross-origin subresource that does not opt in is blocked outright and fails to load, often with nothing more than a console error. Examples include an <img> from a CDN with no Cross-Origin-Resource-Policy or an analytics script loaded without CORS. In practice the most painful part of shipping threaded Wasm is not the threading code at all. It is auditing every third-party asset on the page so that require-corp does not reject it. When something breaks only after you enable COEP, suspect a lazily loaded ad, font, or image rather than your worker setup. The companion guide on headers covers two related options. The credentialless COEP variant relaxes the requirement for no-cors cross-origin resources by loading them without credentials. The report-only modes let you find offenders before enforcing.
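For local development, any server that can set response headers works. One option is the sketch below: a tiny Node.js static server (CommonJS, Node 18+ assumed) that attaches the COOP/COEP pair to every file it serves. The file name dev-server.js, the MIME table, and the function names headersFor and startDevServer are illustrative and not part of any tool. It is meant for localhost only. It does not add Cross-Origin-Resource-Policy, so any third-party assets still need their own opt-in.

```javascript
// dev-server.js: a minimal static file server for local testing (CommonJS, Node 18+).
// Every file it serves carries the COOP/COEP pair, so its pages are cross-origin isolated.
const http = require("node:http");
const fs = require("node:fs");
const path = require("node:path");

// Deliberately short MIME table; extend it for your assets.
const MIME = {
  ".html": "text/html; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".wasm": "application/wasm", // WebAssembly.instantiateStreaming requires this type
};

function headersFor(filePath) {
  return {
    "Content-Type": MIME[path.extname(filePath)] ?? "application/octet-stream",
    "Cross-Origin-Opener-Policy": "same-origin",
    "Cross-Origin-Embedder-Policy": "require-corp",
  };
}

function startDevServer(root, port = 8080) {
  const base = path.resolve(root);
  const server = http.createServer((req, res) => {
    let urlPath;
    try {
      urlPath = decodeURIComponent(new URL(req.url, "http://localhost").pathname);
    } catch {
      res.writeHead(400).end(); // malformed percent-encoding
      return;
    }
    const filePath = path.join(base, urlPath === "/" ? "index.html" : urlPath);
    if (!filePath.startsWith(base + path.sep)) { // refuse paths that escape root
      res.writeHead(403).end();
      return;
    }
    fs.readFile(filePath, (err, body) => {
      if (err) {
        res.writeHead(404).end();
        return;
      }
      res.writeHead(200, headersFor(filePath)).end(body);
    });
  });
  return server.listen(port);
}
```

Run it with startDevServer("./public", 8080) and open http://localhost:8080/. Because localhost counts as a secure context, self.crossOriginIsolated should report true on pages it serves.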

Step-by-step: from a non-shared module to a worker pool

The mechanics are the same whether you write WAT by hand, compile Rust, or compile C — only the build command changes. Here is the end-to-end flow.

Serve the page cross-origin isolated. Configure your server (or dev server) to emit the COOP/COEP pair, then confirm in the console:
```
console.log(self.crossOriginIsolated); // must be true before you continue
```

Build the module with atomics enabled. For Rust, target the nightly toolchain with the atomics and bulk-memory features and rebuild std so its synchronization primitives use atomic ops:

```
RUSTFLAGS='-C target-feature=+atomics,+bulk-memory,+mutable-globals' \
  rustup run nightly \
  wasm-pack build --target web -- -Z build-std=panic_abort,std
```

For C/C++ with Emscripten, pass -pthread and let it provision a worker pool:

```
emcc threaded.c -pthread -s PTHREAD_POOL_SIZE=4 -s PROXY_TO_PTHREAD \
  -o threaded.js
```

Create one shared memory on whichever thread owns the lifecycle (usually the main thread):

```
const memory = new WebAssembly.Memory({ initial: 256, maximum: 256, shared: true });
```

Hand the memory to each worker through postMessage. The structured clone shares the SharedArrayBuffer, so no bytes are copied:

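```
// compiledModule: a WebAssembly.Module compiled once on the main thread,
// e.g. with await WebAssembly.compileStreaming(fetch("./threaded.wasm")).
const worker = new Worker("./worker.js", { type: "module" });
worker.postMessage({ memory, module: compiledModule });
```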

Instantiate the same module in every worker with that memory as an import, so all instances point at one linear memory. (Detailed in sharing memory between Wasm and Web Workers.)
Coordinate with Atomics. Use Atomics.wait/Atomics.notify to park and wake worker threads and atomic read-modify-write ops to mutate shared counters and locks without races (the spinlock and futex handoff patterns live in using Atomics for Wasm thread synchronization).
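You can see the "same module, same memory, many instances" idea without a build toolchain. The sketch below hand-assembles a tiny module (its WAT is in the comment) that imports a shared memory and exports one function performing i32.atomic.rmw.add on address 0. Two instances stand in for two workers. Two assumptions apply. First, the import is named "env"."memory" here, but real toolchains choose their own names, which you can check with WebAssembly.Module.imports(wasmModule). Second, instantiation is synchronous only to keep the demo short. In a worker you would typically await WebAssembly.instantiate(wasmModule, imports) inside onmessage.

```javascript
// Hand-assembled module equivalent to this WAT:
//   (module
//     (import "env" "memory" (memory 1 1 shared))
//     (func (export "bump") (result i32)
//       (i32.atomic.rmw.add (i32.const 0) (i32.const 1))))
const BUMP_WASM = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // magic + version
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,                   // type section: () -> i32
  0x02, 0x10, 0x01, 0x03, 0x65, 0x6e, 0x76, 0x06,             // import section: "env"
  0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79, 0x02, 0x03, 0x01, 0x01, //   "memory", shared, min 1, max 1
  0x03, 0x02, 0x01, 0x00,                                     // function section: one func of type 0
  0x07, 0x08, 0x01, 0x04, 0x62, 0x75, 0x6d, 0x70, 0x00, 0x00, // export section: "bump" = func 0
  0x0a, 0x0c, 0x01, 0x0a, 0x00, 0x41, 0x00, 0x41, 0x01,       // code section: i32.const 0, i32.const 1
  0xfe, 0x1e, 0x02, 0x00, 0x0b,                               //   i32.atomic.rmw.add align=2, end
]);

// What each worker would do on receiving { module, memory }: bind the module to that memory.
function instantiateAgainst(wasmModule, memory) {
  return new WebAssembly.Instance(wasmModule, { env: { memory } });
}

const memory = new WebAssembly.Memory({ initial: 1, maximum: 1, shared: true });
const wasmModule = new WebAssembly.Module(BUMP_WASM);
const a = instantiateAgainst(wasmModule, memory); // stands in for worker A
const b = instantiateAgainst(wasmModule, memory); // stands in for worker B

const oldA = a.exports.bump(); // returns the previous value at address 0
const oldB = b.exports.bump();
console.log(oldA, oldB, new Int32Array(memory.buffer)[0]); // 0 1 2
```

Because both instances import the same WebAssembly.Memory, the second bump sees the first one's write. The old values returned are 0 and then 1, and address 0 ends at 2.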

A concrete shared-memory + Atomics example

This is the smallest end-to-end example that demonstrates two real threads observing one memory and synchronizing through an atomic flag. The main thread creates the shared memory, posts it, writes a value, and notifies; the worker waits on the flag and reads the value the moment it is woken.

```
// main.js: runs on the main thread of a cross-origin-isolated page
if (!self.crossOriginIsolated) {
  throw new Error("Not cross-origin isolated: SharedArrayBuffer is unavailable.");
}

// 256 pages = 16 MiB. A shared memory must declare a maximum; setting it equal
// to initial also makes this memory non-growable.
const memory = new WebAssembly.Memory({ initial: 256, maximum: 256, shared: true });
const flags = new Int32Array(memory.buffer); // Atomics.wait/notify require Int32Array (or BigInt64Array)

const worker = new Worker("./worker.js", { type: "module" });
worker.postMessage({ memory });

// Give the worker a moment to start waiting, then publish data and wake it.
setTimeout(() => {
  Atomics.store(flags, 1, 0xC0FFEE);   // payload at index 1
  Atomics.store(flags, 0, 1);          // mark "ready" at index 0
  Atomics.notify(flags, 0, 1);         // wake at most one waiter on index 0
}, 50);
```

```
// worker.js: runs on a Web Worker; Atomics.wait is allowed here (not on the main thread)
self.onmessage = ({ data: { memory } }) => {
  const flags = new Int32Array(memory.buffer);

  // Sleep only while index 0 still holds 0. Returns "ok", "not-equal", or "timed-out".
  const result = Atomics.wait(flags, 0, 0);
  const payload = Atomics.load(flags, 1);

  // We see the exact bytes the main thread wrote: same SharedArrayBuffer, no copy.
  console.log("worker woke:", result, "payload:", payload.toString(16));
  // usually logs: worker woke: ok payload: c0ffee
};
```

In a real Wasm program both sides would also instantiate the module against memory and call exported functions. The JavaScript above is the synchronization skeleton those functions run inside. Note the asymmetry: in browsers, Atomics.wait is only legal on a worker thread. Calling it on the main thread throws a TypeError, because blocking the UI thread is forbidden. The main thread uses Atomics.waitAsync instead, which hands back a promise rather than blocking, or simply never waits.

The setTimeout in main.js is a deliberate simplification for the example. It gives the worker time to reach its Atomics.wait before the main thread notifies. In production you would never rely on a timer for ordering, and you do not need to, because Atomics.wait only sleeps while the slot still holds the expected value. Suppose the main thread stores and notifies before the worker reaches wait. The worker's Atomics.wait(flags, 0, 0) sees that the slot is no longer 0, returns "not-equal" immediately, and the worker reads the payload anyway. That is the correct behavior. It is also why a well-written waiter always checks the return value and re-reads state rather than assuming “I was woken, therefore the data is ready.” The timer only makes it likely that the demo logs "ok" rather than "not-equal". Either way the payload is the same.
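A waiter that follows this advice runs in a loop. It checks the slot, sleeps only if the slot still holds the "empty" value, and re-checks after every wake-up, because a wake can also come from an unrelated notify on the same index. Here is a minimal sketch. The name waitForChange is hypothetical. Like any Atomics.wait caller, it must run on a worker in browsers (Node.js also permits it on its main thread).

```javascript
// Returns the slot's new value once slots[index] no longer equals emptyValue.
function waitForChange(slots, index, emptyValue) {
  let current = Atomics.load(slots, index);
  while (current === emptyValue) {
    // Sleeps only while the slot still holds emptyValue; returns "not-equal" at once otherwise.
    Atomics.wait(slots, index, emptyValue);
    current = Atomics.load(slots, index); // re-check: a wake-up is a hint, not a guarantee
  }
  return current;
}
```

In the worker above, you can replace the single Atomics.wait call with const ready = waitForChange(flags, 0, 0) followed by Atomics.load(flags, 1). The worker then no longer depends on the timer.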

How Rust and C/C++ reach Wasm threads

Rust code never calls the JavaScript Atomics object. You write ordinary std::sync and std::sync::atomic code and let the toolchain lower it to atomic Wasm instructions. std::thread::spawn is not supported on wasm32-unknown-unknown, so parallelism comes from a worker-backed thread pool instead. The practical path for browsers is wasm-bindgen-rayon. You build with the +atomics,+bulk-memory target features against a rebuilt std (-Z build-std), and the wasm-bindgen-rayon runtime spins up a pool of Web Workers, each instantiating your module against one shared memory. A rayon par_iter() then fans work across those workers transparently. The whole integration comes down to three pieces:

- the Rust build command above,
- a re-export of wasm_bindgen_rayon::init_thread_pool from your crate, and
- a one-time await initThreadPool(navigator.hardwareConcurrency) on the JavaScript side at startup.

This rides directly on the standard Rust-to-Wasm compilation toolchain. Threading is an additive set of flags, not a different compiler.

C/C++ via Emscripten is more turnkey because Emscripten ships a full pthreads implementation that maps pthread_create onto Web Workers and pthread_mutex_t onto atomic spinlocks plus futex waits. The two flags that matter:

-pthread enables shared memory and the pthreads runtime; combine with -s PTHREAD_POOL_SIZE=N to pre-spawn N workers so pthread_create does not pay worker-startup latency at runtime.
-s PROXY_TO_PTHREAD moves main() itself onto a worker so it can block (and call pthread_join) without freezing the browser’s main thread.

Both toolchains ultimately emit a module whose data and bss live in a shared linear memory, and both rely on the same COOP/COEP gate.

There is a subtlety worth flagging for both languages: per-thread state. When several instances share one linear memory, each thread still needs its own stack and its own TLS region. The separate stack keeps a function-local variable on thread A from colliding with the same variable on thread B. The separate TLS region keeps thread_local / _Thread_local statics per-thread. The toolchains handle this by giving every worker a distinct stack pointer and a distinct TLS base within the shared memory. Emscripten carves a per-thread stack out of the heap, and the Rust/wasm-bindgen-rayon runtime does the equivalent when it spins up the pool. You rarely touch this directly, but it explains why a threaded module's maximum memory must be sized for all threads' stacks at once, not just the heap. Under-provisioning maximum typically surfaces as allocation failures that only appear once the thread count goes up: thread creation fails, or the allocator runs out of memory. A per-thread stack that is itself too small causes stack overflows instead, which may trap or silently corrupt neighbouring data. Budget roughly your per-thread stack size times the worker count on top of your heap working set.
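The budgeting rule is simple arithmetic over 64 KiB pages. The helper below is a hypothetical sketch. The 1.25 headroom factor and the example inputs (a 64 MiB heap, 2 MiB stacks, 8 threads) are assumptions to replace with your own measurements, not toolchain defaults.

```javascript
const WASM_PAGE_BYTES = 64 * 1024; // WebAssembly page size is fixed at 64 KiB
const WASM32_MAX_PAGES = 65536;    // 4 GiB: the ceiling for a 32-bit linear memory

// Rough `maximum` (in pages) for a threaded module: heap working set plus one stack per thread.
// `headroom` is a hypothetical safety factor, not a toolchain default.
function maximumPagesFor({ heapBytes, stackBytesPerThread, threads, headroom = 1.25 }) {
  const bytes = (heapBytes + stackBytesPerThread * threads) * headroom;
  const pages = Math.ceil(bytes / WASM_PAGE_BYTES);
  if (pages > WASM32_MAX_PAGES) {
    throw new RangeError(`${pages} pages exceeds the 4 GiB wasm32 limit`);
  }
  return pages;
}

// Example inputs (assumed): 64 MiB heap, 2 MiB stacks, 8 threads.
// (64 MiB + 8 * 2 MiB) * 1.25 = 100 MiB = 1600 pages.
const maximum = maximumPagesFor({ heapBytes: 64 * 2 ** 20, stackBytesPerThread: 2 * 2 ** 20, threads: 8 });
console.log(maximum); // 1600
```

Pass the result to new WebAssembly.Memory({ initial, maximum, shared: true }), or to your toolchain's maximum-memory setting.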

Building a worker pool

Spawning a fresh Worker per task is wasteful — worker startup and module instantiation cost milliseconds. A pool creates the workers once, instantiates the module against the shared memory in each, and then feeds them tasks over the lifetime of the page. The pattern is a fixed array of workers plus a queue:

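```
// pool.js: assumes ./pool-worker.js instantiates on its first message and then
// replies exactly once per { task } message.
class WasmWorkerPool {
  #idle = [];
  #queue = [];
  #jobs = new Map(); // worker -> { resolve, reject } for the task it is running

  constructor(size, module, memory) {
    this.workers = Array.from({ length: size }, () => {
      const w = new Worker("./pool-worker.js", { type: "module" });
      w.postMessage({ module, memory });            // instantiate once, share one memory
      w.onmessage = (e) => this.#settle(w, (job) => job.resolve(e.data));
      w.onerror = (e) => this.#settle(w, (job) => job.reject(e));
      this.#idle.push(w);
      return w;
    });
  }

  run(task) {
    return new Promise((resolve, reject) => {
      this.#queue.push({ task, resolve, reject });
      this.#pump();
    });
  }

  #pump() {
    while (this.#idle.length && this.#queue.length) {
      const w = this.#idle.pop();
      const job = this.#queue.shift();
      this.#jobs.set(w, job);
      w.postMessage({ task: job.task });            // a descriptor only; data stays in shared memory
    }
  }

  #settle(w, finish) {
    const job = this.#jobs.get(w);
    if (!job) return;                               // ignore messages that answer no task
    this.#jobs.delete(w);
    this.#idle.push(w);
    finish(job);
    this.#pump();
  }
}
```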

Each worker holds a live instance bound to the shared memory, so dispatching a task is a single postMessage with the task descriptor — never the data, which already lives in the shared buffer. This is exactly the model wasm-bindgen-rayon and Emscripten’s PTHREAD_POOL_SIZE implement for you; rolling your own is worthwhile when you want explicit control over scheduling or backpressure. Size the pool to navigator.hardwareConcurrency and feed it coarse-grained tasks so the per-dispatch overhead stays negligible against the work each task does.
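The pool assumes a pool-worker.js that instantiates once and then answers each task with exactly one message. A minimal sketch of that worker side follows. Two conventions in it are hypothetical: the task shape { fn, args } (the name of an exported function plus its arguments) and the "env"."memory" import name. Match them to what your module actually exports and imports. The handler is written as a factory so it can be exercised outside a worker. An exception thrown by the export surfaces on the pool side as an error event on the Worker object.

```javascript
// pool-worker.js (sketch). Hypothetical conventions: the module imports its memory as
// "env"."memory", and each task is { fn, args } naming an exported function to call.
function createPoolWorkerHandler(reply) {
  let instance = null;
  return ({ data }) => {
    if (data.module) {
      // First message: instantiate once against the shared memory. No reply is sent,
      // because the pool expects exactly one message per task.
      instance = new WebAssembly.Instance(data.module, { env: { memory: data.memory } });
      return;
    }
    const { fn, args = [] } = data.task;
    // Only a small result travels back; bulk data stays in the shared memory.
    reply(instance.exports[fn](...args));
  };
}

// Wiring inside the worker (uses the Worker global `self`):
// self.onmessage = createPoolWorkerHandler((result) => self.postMessage(result));
```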

Tradeoffs: shared memory vs postMessage copies

Shared memory is not always the right answer. The decision is fundamentally about payload size and access pattern.

postMessage copy (structured clone or transfer). Simple, no headers required, no data-race risk — each side owns its bytes. But a structured clone is O(payload): cloning a 64 MiB buffer costs real memcpy time and a GC allocation. Transferables (ArrayBuffer via the transfer list) move ownership in O(1) but then the sender loses access, which is wrong when both sides must keep reading.
Shared memory. Posting a SharedArrayBuffer is O(1) regardless of size, and both sides retain live access. That makes it ideal for large, long-lived working sets such as image pipelines, simulation grids, and audio ring buffers. The cost is correctness. You now own synchronization, and every uncoordinated concurrent write is a data race. JavaScript and Wasm keep such races memory-safe, but the values other threads observe are unpredictable and can be torn. In C or Rust source, the same race is undefined behavior.
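structuredClone runs the same serialization algorithm postMessage uses, so it is a convenient way to see the copy and transfer cost models locally. Here is a minimal sketch, which runs in modern browsers and Node.js 17+:

```javascript
const original = new Uint8Array(16).fill(7);

// 1. Plain clone: an independent copy, O(payload). Writes to the copy never reach the original.
const copy = new Uint8Array(structuredClone(original.buffer));
copy[0] = 99;

// 2. Transfer: ownership moves in O(1) and the sender's buffer is detached (byteLength 0).
const moved = structuredClone(original.buffer, { transfer: [original.buffer] });

console.log(original.byteLength, copy[0], new Uint8Array(moved)[0]); // 0 99 7
```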

Contention is the other tradeoff. An uncontended atomic compare-exchange is tens of nanoseconds; a contended spinlock under high core counts can burn whole milliseconds spinning. Prefer wait/notify parking over busy-spinning for anything but the shortest critical sections, shard hot counters across cache lines, and keep critical sections tiny. The rule of thumb: reach for shared memory when the data is large and both threads need concurrent live access; otherwise a postMessage copy is simpler and fast enough.
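Sharding a hot counter means giving each worker its own slot, padded so that no two slots share a cache line, and summing the slots only when a total is needed. The sketch assumes a 64-byte cache line, which is common but not guaranteed. The helper names are hypothetical. Because the stride equals a full line, two shards can never land on the same line, regardless of where the buffer starts.

```javascript
// Assumption: 64-byte cache lines (common on x86-64 and many ARM cores; not guaranteed).
const CACHE_LINE_BYTES = 64;
const STRIDE = CACHE_LINE_BYTES / Int32Array.BYTES_PER_ELEMENT; // 16 int32 slots per line

// One counter per worker, each on its own cache line, so increments never contend.
function createShardedCounter(shards) {
  return new Int32Array(new SharedArrayBuffer(shards * CACHE_LINE_BYTES));
}

function increment(counter, shard, by = 1) {
  Atomics.add(counter, shard * STRIDE, by); // each worker touches only its own shard
}

function total(counter) {
  let sum = 0;
  for (let i = 0; i < counter.length; i += STRIDE) sum += Atomics.load(counter, i);
  return sum; // a snapshot: other threads may still be adding
}

const hits = createShardedCounter(4); // e.g. one shard per worker
increment(hits, 2);
increment(hits, 2);
console.log(total(hits)); // 2
```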

Gotchas & failure modes

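ReferenceError: SharedArrayBuffer is not defined. The page is not cross-origin isolated. Send Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp on the top-level document, reload, and confirm self.crossOriginIsolated === true. If the server config looks right but isolation is still off, inspect the document's actual response in the Network panel. A proxy or CDN that drops the headers on that one response is enough to lose isolation. A cross-origin subresource without Cross-Origin-Resource-Policy does not break isolation; COEP blocks that resource from loading instead.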

TypeError: Atomics.wait cannot be called in this context. You called Atomics.wait on the main thread, but it is only permitted on Web Workers. Either move the blocking logic into a worker, or use Atomics.waitAsync on the main thread. Atomics.waitAsync returns an object whose value is a promise when it actually has to wait.

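RangeError: WebAssembly.Memory(): could not allocate memory, or a shared memory that refuses to grow. A shared memory must declare a maximum, and the constructor throws a TypeError without one. Because a shared buffer can never move, engines typically reserve address space for the whole maximum up front. An oversized maximum can therefore fail allocation immediately, especially on memory-constrained devices. memory.grow on a shared memory only succeeds up to maximum and never detaches the existing buffer. However, views created before the grow do not see the new pages. Re-read memory.buffer, which returns a new SharedArrayBuffer object over the same backing store, and rebuild your views. Size maximum for your worst case up front without overshooting it.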
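Both halves of this gotcha are easy to observe directly. The sketch below works in a browser or in Node.js. It shows that a shared memory without a maximum is rejected. It also shows that after memory.grow, an old view still works but only spans the old length, while a view built from a fresh read of memory.buffer covers the new pages.

```javascript
// A shared memory must declare a maximum; without one the constructor throws a TypeError.
let missingMaximum = null;
try {
  new WebAssembly.Memory({ initial: 1, shared: true });
} catch (e) {
  missingMaximum = e;
}
console.log(missingMaximum instanceof TypeError); // true

const memory = new WebAssembly.Memory({ initial: 1, maximum: 2, shared: true });
const oldView = new Int32Array(memory.buffer); // 1 page = 16384 int32 slots
oldView[0] = 5;

memory.grow(1);                             // old buffer is not detached, but keeps its old length
const view = new Int32Array(memory.buffer); // re-read memory.buffer after every grow

console.log(oldView.length, view.length, view[0]); // 16384 32768 5
```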

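Data race / torn reads. Reading a multi-byte value with a plain i32[0] while another thread writes it is a race. Mixing atomic and non-atomic access to the same location without anything ordering them is a race too. In both cases the result is unpredictable and may be torn. Use Atomics.load/Atomics.store for any location that more than one thread can touch concurrently. Plain accesses are safe only when an atomic flag orders them, as in the payload-then-flag pattern described earlier.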

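Rust build or link fails with an error about atomics or shared memory. One example is rust-lld refusing --shared-memory because an object was not compiled with the atomics or bulk-memory features. The cause is one of three things: you are on stable, you forgot the target features, or you linked against the prebuilt std, which is compiled without atomics. Use nightly, pass RUSTFLAGS='-C target-feature=+atomics,+bulk-memory,+mutable-globals', and rebuild std with -Z build-std=panic_abort,std.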

Verification

Confirm isolation, shared memory, and the atomic opcodes are actually present:

```
// 1. Isolation is the precondition for everything else.
console.log("isolated:", self.crossOriginIsolated);

// 2. The memory really is shared.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 1, shared: true });
console.log("shared:", memory.buffer instanceof SharedArrayBuffer); // true
```

Then inspect the compiled module to prove the atomic instructions made it through codegen:

```
wasm-objdump -d threaded.wasm | grep -E 'atomic'
# expect lines like: i32.atomic.rmw.cmpxchg, memory.atomic.wait32, memory.atomic.notify
wasm-objdump -x threaded.wasm | grep -i 'shared'   # the memory (often imported) should be marked shared
```

If wasm-objdump shows no atomic ops, your target features did not take effect and the module is still single-threaded regardless of how many workers you start.

In this guide

Sharing memory between Wasm and Web Workers — create a shared memory, post it to a worker, and have both sides view it through one Int32Array.
Using Atomics for Wasm thread synchronization — build a spinlock and a wait/notify handoff, with the underlying i32.atomic.rmw.cmpxchg and memory.atomic.wait32 opcodes.

Frequently Asked Questions

Do I really need cross-origin isolation just to use WebAssembly? Only for threads. Single-threaded Wasm runs fine with an ordinary non-shared linear memory and needs no special headers. The COOP/COEP requirement applies specifically to SharedArrayBuffer, which threaded modules depend on.

Can I share a non-shared memory with a worker to save the headers? No. Posting the ArrayBuffer of a regular WebAssembly.Memory structured-clones (copies) its bytes. The worker gets a separate copy that never sees later writes. Only a memory created with { shared: true } is actually shared, because it is backed by a SharedArrayBuffer, and that requires isolation.

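Why must I use Int32Array with Atomics? Most Atomics operations (add, and, compareExchange, exchange, load, or, store, sub, xor) accept any integer typed array except Uint8ClampedArray. That means Int8Array, Uint8Array, Int16Array, Uint16Array, Int32Array, Uint32Array, BigInt64Array, and BigUint64Array. Atomics.wait and Atomics.notify are stricter: they accept only Int32Array or BigInt64Array, because wait/notify operate on 32-bit or 64-bit aligned slots. A Float32Array or Float64Array cannot be used with atomic ops at all.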
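The type rules are easy to check at a console. In this sketch (the helper name rejects is hypothetical), a read-modify-write op succeeds on a Uint32Array. By contrast, notify on the same Uint32Array and add on a Float64Array both throw a TypeError.

```javascript
const sab = new SharedArrayBuffer(8);

// True if calling fn throws a TypeError, which is how Atomics rejects unsupported array types.
function rejects(fn) {
  try {
    fn();
    return false;
  } catch (e) {
    return e instanceof TypeError;
  }
}

console.log(Atomics.add(new Uint32Array(sab), 0, 1));                   // 0: RMW ops accept Uint32Array
console.log(rejects(() => Atomics.notify(new Uint32Array(sab), 0, 1))); // true: notify needs Int32/BigInt64
console.log(rejects(() => Atomics.add(new Float64Array(sab), 0, 1)));   // true: no float atomics
console.log(Atomics.notify(new Int32Array(sab), 0, 1));                 // 0: valid call, no waiters
```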

Does growing a shared memory invalidate my views like non-shared memory does? Growing a shared linear memory does not detach the SharedArrayBuffer, so existing views are not detached. They still only span the old length, though, and will not see the new pages. After any memory.grow, re-read memory.buffer (which now returns a larger SharedArrayBuffer over the same backing store) and re-create your typed-array views to address the full size. See why memory.grow invalidates pointers.

How many workers should I spawn? Start with navigator.hardwareConcurrency and tune down. More workers than logical cores adds context switching and contention without adding throughput; for latency-sensitive work, leave one core free for the main thread.

JS/Wasm Interop & Memory Management — the boundary and memory model these threads share.
wasm-bindgen deep dive — how wasm-bindgen-rayon drives a Rust worker pool over shared memory.
Configuring COOP/COEP headers for SharedArrayBuffer — the exact headers that enable cross-origin isolation.
Linear memory management & allocators — how the shared heap is allocated and why grows invalidate views.
