SharedArrayBuffer, Atomics & Threading
WebAssembly has no threads of its own. A .wasm module is just code; the host supplies the threads,
and on the web that means Web Workers. To make several workers cooperate on the same data you give them
one linear memory whose backing store is a SharedArrayBuffer instead of a private ArrayBuffer,
instantiate the same module in every worker against that one memory, and coordinate access with the
Atomics operations the threads proposal adds to the instruction set. This guide covers the whole stack:
why threads need shared memory, the cross-origin isolation that browsers require before they will even
define SharedArrayBuffer, the atomic wait/notify protocol, building a worker pool, and how Rust
(wasm-bindgen-rayon) and C/C++ (Emscripten -pthread) reach all of it from a normal source tree.
Prerequisites
- [ ] A document served cross-origin isolated: Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp. Verify with self.crossOriginIsolated === true.
- [ ] A modern browser (Chrome 92+, Firefox 95+, Safari 15.2+) where SharedArrayBuffer and Wasm threads ship.
- [ ] Rust nightly (rustup toolchain install nightly) plus rust-src for building threaded Rust: rustup component add rust-src --toolchain nightly.
- [ ] RUSTFLAGS='-C target-feature=+atomics,+bulk-memory,+mutable-globals' for atomic codegen.
- [ ] Emscripten 3.x (emcc -v) if you compile C/C++ with -pthread.
- [ ] A local server that can send custom headers (the COOP/COEP pair) over https or localhost.
How shared memory and atomics fit together
A non-shared WebAssembly.Memory is backed by a single ArrayBuffer owned by one instance. You cannot
hand it to a worker: the Memory object itself is not cloneable, and if you postMessage its
ArrayBuffer the structured-clone algorithm copies the bytes, so two workers end up with two
disconnected memories. A shared memory is different: it is created with { shared: true },
backed by a SharedArrayBuffer, and when you postMessage it the structured clone shares the
backing store rather than copying it. Every worker that instantiates the module with that memory as a
linear memory import now reads and writes the exact same bytes.
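To make the difference concrete, here is a minimal sketch (variable names are illustrative, not from any toolchain). It assumes an environment where shared memories are available, such as a cross-origin-isolated page or Node.js, and uses two views on one thread to stand in for two workers.

```javascript
// A non-shared memory is backed by an ordinary ArrayBuffer.
const privateMemory = new WebAssembly.Memory({ initial: 1 });
const privateIsPlain = privateMemory.buffer instanceof ArrayBuffer;

// A shared memory must declare a maximum and is backed by a SharedArrayBuffer.
const sharedMemory = new WebAssembly.Memory({ initial: 1, maximum: 1, shared: true });
const sharedIsShared = sharedMemory.buffer instanceof SharedArrayBuffer;

// Two independent views over the shared buffer alias the same bytes,
// which is what two workers holding the same memory would observe.
const viewA = new Int32Array(sharedMemory.buffer);
const viewB = new Int32Array(sharedMemory.buffer);
viewA[0] = 42;
const seenThroughB = viewB[0];
```

In a real page, viewA and viewB would live in different workers, each built from the memory received over postMessage.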
Sharing bytes is necessary but not sufficient — two threads writing the same address is a data race.
The threads proposal answers this with atomic instructions (i32.atomic.rmw.add,
i32.atomic.rmw.cmpxchg, memory.atomic.wait32, memory.atomic.notify, and friends) and a
sequentially-consistent memory ordering for them. JavaScript exposes the same primitives through the
Atomics object: Atomics.add, Atomics.compareExchange, Atomics.wait, and Atomics.notify. A
worker that needs to block until another thread is done calls Atomics.wait on an Int32Array index;
the producer calls Atomics.notify on that same index to wake it. This wait/notify pair is the
foundation of every higher-level construct — mutexes, condition variables, and the futex-style
parking that pthreads and Rust’s std::sync build on.
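As a small illustration of the read-modify-write primitives, the sketch below uses a two-slot Int32Array and builds a trivial try-lock from Atomics.compareExchange. The helper names tryLock and unlock are hypothetical and not part of any library.

```javascript
// Two zero-initialized 32-bit slots in shared memory.
const cells = new Int32Array(new SharedArrayBuffer(8));

// Atomics.add returns the value that was in the slot before the addition.
const previous = Atomics.add(cells, 0, 5); // previous is 0, slot 0 now holds 5

// compareExchange stores the new value only if the slot holds the expected one,
// and always returns the value it found.
function tryLock(view, index) {
  return Atomics.compareExchange(view, index, 0, 1) === 0;
}

function unlock(view, index) {
  Atomics.store(view, index, 0);
  Atomics.notify(view, index, 1); // wake one thread parked on this slot, if any
}

const firstAttempt = tryLock(cells, 1);  // true: slot went 0 -> 1
const secondAttempt = tryLock(cells, 1); // false: already held
unlock(cells, 1);
const afterUnlock = tryLock(cells, 1);   // true again
```

A blocking lock would pair a failed tryLock with Atomics.wait on the same slot instead of retrying in a tight loop.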
The key mental shift from single-threaded Wasm is that the bytes in linear memory are no longer owned
by one instance. They are a shared resource, and like any shared mutable resource they need a discipline:
a rule for who may write which region when. Atomics give you the low-level enforcement primitives, but
the protocol — which slot is a lock, which is a counter, which is a payload, and in what order they are
read and written — is yours to design. Most threading bugs in Wasm are not exotic memory-ordering puzzles;
they are protocol bugs where two threads disagree about whose turn it is to touch a region. Keep the
protocol explicit and small, and the atomics fall into place around it.
Memory ordering matters here in a concrete way. Wasm atomics are sequentially consistent, which means an atomic store that another thread observes also publishes every ordinary (non-atomic) write that preceded it in program order. That is what lets you write a payload with plain stores and then “release” it with a single atomic flag store: a reader that sees the flag is guaranteed to see the payload. Break that pairing — read the payload without first checking the atomic flag — and you have a classic torn read, where one thread sees half-written data. The whole point of the flag is to be the synchronization edge.
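The publish pattern described above can be sketched as follows. The slot layout (FLAG at index 0, payload words after it) and the function names are assumptions made for this example only.

```javascript
const FLAG = 0;          // hypothetical layout: slot 0 is the "ready" flag
const PAYLOAD_START = 1; // payload words follow the flag

// Writer: plain stores for the payload, then one atomic store to publish it.
function publish(view, words) {
  for (let i = 0; i < words.length; i++) {
    view[PAYLOAD_START + i] = words[i];
  }
  Atomics.store(view, FLAG, 1); // the synchronization edge
  Atomics.notify(view, FLAG);   // wake any thread parked on the flag
}

// Reader: check the flag atomically first, and only then touch the payload.
function tryConsume(view, count) {
  if (Atomics.load(view, FLAG) !== 1) return null; // not published yet
  return Array.from(view.subarray(PAYLOAD_START, PAYLOAD_START + count));
}

const channel = new Int32Array(new SharedArrayBuffer(16));
const beforePublish = tryConsume(channel, 3); // null
publish(channel, [10, 20, 30]);
const afterPublish = tryConsume(channel, 3);  // [10, 20, 30]
```

The reader never looks at the payload slots unless the atomic load of the flag has already returned 1, which is the pairing the paragraph describes.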
Cross-origin isolation: the gate you hit first
Before any of this works, the browser must have decided your page is allowed to hold a
SharedArrayBuffer. After the Spectre disclosures, vendors removed SharedArrayBuffer from
non-isolated contexts because it can be turned into a high-resolution timer that powers side-channel
attacks. To get it back, your top-level document must opt into cross-origin isolation by sending
two response headers:
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
COOP: same-origin severs the page from cross-origin windows that could share its browsing context
group; COEP: require-corp forces every subresource (images, scripts, fonts) to explicitly opt into
being embedded via Cross-Origin-Resource-Policy or CORS. When both hold, the browser sets
self.crossOriginIsolated to true and re-enables SharedArrayBuffer plus high-resolution
performance.now(). Getting the exact header values and the dev-server config right — including the
Cross-Origin-Resource-Policy: cross-origin you need on third-party assets — is involved enough that it
has its own walkthrough in
configuring COOP/COEP headers for SharedArrayBuffer.
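For local development, one way to send the pair is to wrap whatever request handler you already have. This is a minimal sketch for a Node-style (req, res) handler; the wrapper name withIsolationHeaders and the port in the usage comment are hypothetical.

```javascript
// Wraps a Node-style (req, res) handler so every response carries the isolation headers.
function withIsolationHeaders(handler) {
  return (req, res) => {
    res.setHeader("Cross-Origin-Opener-Policy", "same-origin");
    res.setHeader("Cross-Origin-Embedder-Policy", "require-corp");
    return handler(req, res);
  };
}

// Usage with Node's built-in http module (port 8080 is an arbitrary choice):
//   const http = require("node:http");
//   http.createServer(withIsolationHeaders((req, res) => res.end("ok"))).listen(8080);
```

The headers must be on the top-level document response. Sending them only on the .wasm or worker script responses is not enough.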
The single most common failure in threaded Wasm is forgetting this step: your code throws
ReferenceError: SharedArrayBuffer is not defined (or WebAssembly.Memory rejects { shared: true })
and it is tempting to blame the toolchain. It is almost always a missing header.
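A startup check can separate "missing header" from "toolchain problem" before anything else runs. The sketch below is one possible shape. The function name is hypothetical, and the byte array is a hand-written module that declares nothing but a shared memory.

```javascript
function detectThreadSupport() {
  // Smallest module that declares a shared memory:
  // memory section (id 5), one memory, flags 0x03 (has maximum + shared), min 1, max 1.
  const sharedMemoryModule = new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm", version 1
    0x05, 0x04, 0x01, 0x03, 0x01, 0x01,
  ]);
  return {
    // False or undefined outside an isolated browsing context.
    isolated: globalThis.crossOriginIsolated === true,
    // The constructor is simply not defined on non-isolated pages.
    sharedArrayBuffer: typeof SharedArrayBuffer === "function",
    // Whether the engine accepts shared memories in Wasm at all.
    wasmSharedMemory: WebAssembly.validate(sharedMemoryModule),
  };
}

const support = detectThreadSupport();
```

If wasmSharedMemory is true but isolated is false, the engine is capable and the headers are the problem.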
Isolation is binary and document-wide. There is no “isolate just this worker”: the entire top-level
document either qualifies or it does not. A subresource that does not opt in (an
<img> from a CDN with no Cross-Origin-Resource-Policy, an analytics script without CORS) does not
switch isolation off; under require-corp the browser refuses to load it while crossOriginIsolated
stays true. In practice the most painful part of shipping
threaded Wasm is not the threading code at all; it is auditing every third-party asset on the page so that
COEP’s require-corp does not reject it. When assets go missing intermittently after you enable the
headers, suspect a lazily loaded ad, font, or image rather than your worker setup. The companion guide on headers covers the
credentialless COEP variant, which relaxes the requirement for some cross-origin resources, and the
report-only modes that let you find offenders before enforcing.
Step-by-step: from a non-shared module to a worker pool
The mechanics are the same whether you write WAT by hand, compile Rust, or compile C — only the build command changes. Here is the end-to-end flow.
1. Serve the page cross-origin isolated. Configure your server (or dev server) to emit the COOP/COEP pair, then confirm in the console:

   console.log(self.crossOriginIsolated); // must be true before you continue

2. Build the module with atomics enabled. For Rust, target the nightly toolchain with the atomics and bulk-memory features and rebuild std so its synchronization primitives use atomic ops:

   RUSTFLAGS='-C target-feature=+atomics,+bulk-memory,+mutable-globals' \
     rustup run nightly \
     wasm-pack build --target web -- -Z build-std=panic_abort,std

   For C/C++ with Emscripten, pass -pthread and let it provision a worker pool:

   emcc threaded.c -pthread -s PTHREAD_POOL_SIZE=4 -s PROXY_TO_PTHREAD \
     -o threaded.js

3. Create one shared memory on whichever thread owns the lifecycle (usually the main thread):

   const memory = new WebAssembly.Memory({ initial: 256, maximum: 256, shared: true });

4. Hand the memory to each worker through postMessage. The structured clone shares the SharedArrayBuffer, so no bytes are copied:

   const worker = new Worker("./worker.js", { type: "module" });
   worker.postMessage({ memory, module: compiledModule });

5. Instantiate the same module in every worker with that memory as an import, so all instances point at one linear memory. (Detailed in sharing memory between Wasm and Web Workers.)

6. Coordinate with Atomics. Use Atomics.wait/Atomics.notify to park and wake worker threads and atomic read-modify-write ops to mutate shared counters and locks without races (the spinlock and futex handoff patterns live in using Atomics for Wasm thread synchronization).
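The "same module, one memory" step can be tried without any build toolchain. The sketch below hand-assembles a tiny module (an assumption for illustration, not output from Rust or Emscripten) that imports env.memory as a shared memory and exports inc, which performs i32.atomic.rmw.add on address 0. Two instances on one thread stand in for two workers.

```javascript
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,             // type section: () -> i32
  0x02, 0x10, 0x01, 0x03, 0x65, 0x6e, 0x76,             // import section: module "env"
  0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79,             //   field "memory"
  0x02, 0x03, 0x01, 0x01,                               //   kind memory, shared, 1..1 pages
  0x03, 0x02, 0x01, 0x00,                               // function section: one func of type 0
  0x07, 0x07, 0x01, 0x03, 0x69, 0x6e, 0x63, 0x00, 0x00, // export section: "inc" = func 0
  0x0a, 0x0c, 0x01, 0x0a, 0x00,                         // code section: one body, no locals
  0x41, 0x00, 0x41, 0x01,                               //   i32.const 0; i32.const 1
  0xfe, 0x1e, 0x02, 0x00, 0x0b,                         //   i32.atomic.rmw.add align=4; end
]);

const memory = new WebAssembly.Memory({ initial: 1, maximum: 1, shared: true });
const wasmModule = new WebAssembly.Module(bytes);

// Each "worker" gets its own instance, but both import the same memory.
const instanceA = new WebAssembly.Instance(wasmModule, { env: { memory } });
const instanceB = new WebAssembly.Instance(wasmModule, { env: { memory } });

const oldValueA = instanceA.exports.inc(); // rmw.add returns the previous value
const oldValueB = instanceB.exports.inc();
const counter = Atomics.load(new Int32Array(memory.buffer), 0);
```

Both increments land in the same slot, so JavaScript reads 2 from the shared buffer. In a real application each Instance would be created inside its own worker from the posted module and memory.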
A concrete shared-memory + Atomics example
This is the smallest end-to-end example that demonstrates two real threads observing one memory and synchronizing through an atomic flag. The main thread creates the shared memory, posts it, writes a value, and notifies; the worker waits on the flag and reads the value the moment it is woken.
// main.js: runs on the main thread of a cross-origin-isolated page
if (!self.crossOriginIsolated) {
  throw new Error("Not cross-origin isolated: SharedArrayBuffer is unavailable.");
}

// 256 pages = 16 MiB. A shared memory must declare a maximum; setting it equal to
// initial makes the memory fixed-size.
const memory = new WebAssembly.Memory({ initial: 256, maximum: 256, shared: true });
const flags = new Int32Array(memory.buffer); // Atomics.wait/notify need an Int32Array view
const worker = new Worker("./worker.js", { type: "module" });
worker.postMessage({ memory });

// Give the worker a moment to start waiting, then publish data and wake it.
setTimeout(() => {
  Atomics.store(flags, 1, 0xC0FFEE); // payload at index 1
  Atomics.store(flags, 0, 1);        // mark "ready" at index 0
  Atomics.notify(flags, 0, 1);       // wake at most one waiter on index 0
}, 50);

// worker.js: runs on a Web Worker; Atomics.wait is allowed here (not on the main thread)
self.onmessage = ({ data: { memory } }) => {
  const flags = new Int32Array(memory.buffer);
  // Block while index 0 still holds 0. Returns "ok", "not-equal", or "timed-out".
  const result = Atomics.wait(flags, 0, 0);
  const payload = Atomics.load(flags, 1);
  // We see the exact bytes the main thread wrote (same SharedArrayBuffer, no copy).
  console.log("worker woke:", result, "payload:", payload.toString(16)); // -> ok c0ffee
};
In a real Wasm program both sides would also instantiate the module against memory and call exported
functions; the JavaScript above is the synchronization skeleton those functions run inside. Note the
asymmetry: Atomics.wait is only legal on a worker thread. Calling it on the main thread throws a
TypeError because blocking the UI thread is forbidden. The main thread uses
Atomics.waitAsync instead, which returns a promise rather than blocking, or simply never waits.
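A non-blocking wait for the main thread might look like the sketch below. The helper name whenNonZero is hypothetical, and Atomics.waitAsync is newer than Atomics.wait, so check that your target browsers ship it before relying on it.

```javascript
// Resolves with the slot's value once it is no longer 0, without blocking the thread.
async function whenNonZero(view, index) {
  while (Atomics.load(view, index) === 0) {
    const result = Atomics.waitAsync(view, index, 0);
    // result.async is false when the slot already changed ("not-equal") or timed out;
    // otherwise result.value is a promise that settles on notify.
    if (result.async) await result.value;
  }
  return Atomics.load(view, index);
}
```

Typical use is `const value = await whenNonZero(flags, 0);` on the main thread while a worker later runs Atomics.store followed by Atomics.notify on the same index. The loop re-reads the slot after every wake-up, so a stray notify does not produce a false "ready".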
The setTimeout in main.js is a deliberate simplification for the example — it gives the worker time to
reach its Atomics.wait before the main thread notifies. In production you would never rely on a timer
for ordering: the wait/notify protocol is robust precisely because Atomics.wait only sleeps while the
slot still holds the expected value. If the main thread stores and notifies before the worker reaches
wait, the worker’s Atomics.wait(flags, 0, 0) sees the slot is no longer 0, returns "not-equal"
immediately, and the worker reads the payload anyway. That is the correct behavior, and it is why a
well-written waiter always checks the return value and re-reads state rather than assuming “I was woken,
therefore the data is ready.” The timer is just to make the demo’s console output deterministic.
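A waiter that follows this advice re-checks the slot in a loop instead of trusting the wake-up. The sketch below is meant for a worker (Atomics.wait throws on a browser main thread); the function name and the optional timeout parameter are assumptions for illustration.

```javascript
// Blocks until view[index] is non-zero, or until timeoutMs elapses.
// Returns the observed value, or null on timeout.
function waitUntilNonZero(view, index, timeoutMs = Infinity) {
  const deadline = Date.now() + timeoutMs;
  let value;
  while ((value = Atomics.load(view, index)) === 0) {
    const remaining = deadline - Date.now();
    if (remaining <= 0) return null;
    // Sleeps only while the slot still holds 0; returns "not-equal" immediately
    // if the writer got there first, and the loop condition then sees the new value.
    Atomics.wait(view, index, 0, remaining);
  }
  return value;
}
```

Because the loop condition is the state itself, the result is the same whether the writer stores before or after the waiter arrives, and no timer is needed for ordering.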
How Rust and C/C++ reach Wasm threads
Rust does not expose Atomics directly; you write ordinary std::sync and std::thread-style code
and let the toolchain lower it to atomic Wasm ops. The practical path for browsers is
wasm-bindgen-rayon: you build with the
+atomics,+bulk-memory target features against a rebuilt std (-Z build-std), and the
wasm-bindgen-rayon runtime spins up a pool of Web Workers, each instantiating your module against one
shared memory. A rayon::par_iter() then fans work across those workers transparently. The build line
from step 2 plus a one-time initThreadPool(navigator.hardwareConcurrency) call at startup is the whole
integration. This rides directly on the standard
Rust-to-Wasm compilation toolchain;
threading is an additive set of flags, not a different compiler.
C/C++ via Emscripten is more turnkey because Emscripten ships a full pthreads implementation that maps
pthread_create onto Web Workers and pthread_mutex_t onto atomic spinlocks plus futex waits. The two
flags that matter:
- -pthread enables shared memory and the pthreads runtime; combine with -s PTHREAD_POOL_SIZE=N to pre-spawn N workers so pthread_create does not pay worker-startup latency at runtime.
- -s PROXY_TO_PTHREAD moves main() itself onto a worker so it can block (and call pthread_join) without freezing the browser’s main thread.
Both toolchains ultimately emit a module whose data and bss live in a shared linear memory, and
both rely on the same COOP/COEP gate.
There is a subtlety worth flagging for both languages: thread-local storage. When several instances share
one linear memory, each still needs its own stack and its own TLS region, because a function-local
variable on thread A must not collide with the same variable on thread B. The toolchains handle this by
giving every worker a distinct stack pointer and a distinct TLS base within the shared memory — Emscripten
carves a per-thread stack out of the heap, and the Rust/wasm-bindgen-rayon runtime does the equivalent
when it spins up the pool. You rarely touch this directly, but it explains why a threaded module’s
maximum memory must be sized for all threads’ stacks at once, not just the heap. Under-provisioning
maximum is a common cause of mysterious stack-overflow traps that only appear once the thread count goes
up. Budget roughly your per-thread stack size times the worker count on top of your heap working set.
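The budgeting rule is simple arithmetic. The helper below is a hypothetical sketch, and the 1 MiB per-thread stack in the example is an assumed figure; use whatever stack size your toolchain is actually configured with, and leave headroom for TLS and runtime bookkeeping.

```javascript
const WASM_PAGE_BYTES = 64 * 1024; // one WebAssembly page is 64 KiB

// Pages needed for the heap working set plus one stack per thread.
function pagesForThreads(heapBytes, stackBytesPerThread, threadCount) {
  const totalBytes = heapBytes + stackBytesPerThread * threadCount;
  return Math.ceil(totalBytes / WASM_PAGE_BYTES);
}

// Example: 32 MiB heap + 8 threads x 1 MiB stack = 40 MiB = 640 pages.
const maximumPages = pagesForThreads(32 * 1024 * 1024, 1024 * 1024, 8);
```

The result is a lower bound for the maximum you pass to WebAssembly.Memory or to your linker settings.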
Building a worker pool
Spawning a fresh Worker per task is wasteful — worker startup and module instantiation cost milliseconds.
A pool creates the workers once, instantiates the module against the shared memory in each, and then feeds
them tasks over the lifetime of the page. The pattern is a fixed array of workers plus a queue:
class WasmWorkerPool {
  constructor(size, module, memory) {
    this.idle = [];
    this.queue = [];
    this.workers = Array.from({ length: size }, () => {
      const w = new Worker("./pool-worker.js", { type: "module" });
      w.postMessage({ module, memory }); // instantiate once, share one memory
      w.onmessage = (e) => this.#onDone(w, e.data);
      this.idle.push(w);
      return w;
    });
  }

  // Queue a task; the promise resolves with whatever the worker posts back.
  run(task) {
    return new Promise((resolve) => {
      this.queue.push({ task, resolve });
      this.#pump();
    });
  }

  // Pair idle workers with queued jobs until one of the two runs out.
  #pump() {
    while (this.idle.length && this.queue.length) {
      const w = this.idle.pop();
      const job = this.queue.shift();
      w._resolve = job.resolve;
      w.postMessage({ task: job.task });
    }
  }

  // A worker finished: return it to the idle list and hand out the next job.
  #onDone(w, result) {
    const resolve = w._resolve;
    this.idle.push(w);
    resolve(result);
    this.#pump();
  }
}
Each worker holds a live instance bound to the shared memory, so dispatching a task is a single
postMessage with the task descriptor — never the data, which already lives in the shared buffer. This is
exactly the model wasm-bindgen-rayon and Emscripten’s PTHREAD_POOL_SIZE implement for you; rolling your
own is worthwhile when you want explicit control over scheduling or backpressure. Size the pool to
navigator.hardwareConcurrency and feed it coarse-grained tasks so the per-dispatch overhead stays
negligible against the work each task does.
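The worker side of such a pool is not shown above. One possible shape is sketched below. It assumes the memory import is named env.memory and that the module exports a function called run_task taking an offset and a length; both names are hypothetical and depend on your module.

```javascript
// Builds an onmessage handler. `instantiate` and `reply` are injected so the
// handler can be exercised outside a worker.
function createPoolWorkerHandler(instantiate, reply) {
  let ready = null; // promise for the instance, set by the first message

  return async ({ data }) => {
    if (data.module) {
      // First message: instantiate once against the shared memory.
      ready = instantiate(data.module, { env: { memory: data.memory } });
      return;
    }
    // Later messages: wait for the instance if needed, then run the task.
    const instance = await ready;
    reply(instance.exports.run_task(data.task.offset, data.task.length));
  };
}

// In pool-worker.js:
//   self.onmessage = createPoolWorkerHandler(
//     WebAssembly.instantiate,
//     (result) => self.postMessage(result),
//   );
```

The task message carries only an offset and a length into the shared buffer, matching the "never the data" rule above.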
Tradeoffs: shared memory vs postMessage copies
Shared memory is not always the right answer. The decision is fundamentally about payload size and access pattern.
- postMessage copy (structured clone or transfer). Simple, no headers required, no data-race risk: each side owns its bytes. But a structured clone is O(payload): cloning a 64 MiB buffer costs real memcpy time and a GC allocation. Transferables (ArrayBuffer via the transfer list) move ownership in O(1) but then the sender loses access, which is wrong when both sides must keep reading.
- Shared memory. Posting a SharedArrayBuffer is O(1) regardless of size and both sides retain live access, which is ideal for large, long-lived working sets (image pipelines, simulation grids, audio ring buffers). The cost is correctness: you now own synchronization, and every uncoordinated concurrent write is a data race whose readers may observe stale or torn values.
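The copy and transfer behaviours can be observed on one thread with structuredClone, which applies the same algorithm postMessage uses. This is a minimal sketch with illustrative variable names.

```javascript
const original = new ArrayBuffer(1024);
new Uint8Array(original)[0] = 7;

// Clone: the receiver gets its own bytes, so later writes do not propagate.
const copy = structuredClone(original);
new Uint8Array(copy)[0] = 9;
const originalAfterCopy = new Uint8Array(original)[0]; // still 7

// Transfer: ownership moves and the sender's buffer is detached.
const moved = structuredClone(original, { transfer: [original] });
const senderBytesLeft = original.byteLength;           // 0 after the transfer
const receiverFirstByte = new Uint8Array(moved)[0];    // 7
```

A SharedArrayBuffer behaves like neither case: nothing is copied and nothing is detached.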
Contention is the other tradeoff. An uncontended atomic compare-exchange is tens of nanoseconds; a
contended spinlock under high core counts can burn whole milliseconds spinning. Prefer wait/notify
parking over busy-spinning for anything but the shortest critical sections, shard hot counters across
cache lines, and keep critical sections tiny. The rule of thumb: reach for shared memory when the data
is large and both threads need concurrent live access; otherwise a postMessage copy is simpler and
fast enough.
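Sharding a hot counter across cache lines can be sketched as follows. The 64-byte line size is an assumption (it is common but not universal), and the function names are hypothetical.

```javascript
const INTS_PER_CACHE_LINE = 16; // 16 x 4 bytes = 64 bytes, an assumed cache-line size

// One Int32 slot per shard, each padded out to its own cache line.
function createShardedCounter(shardCount) {
  return new Int32Array(new SharedArrayBuffer(shardCount * INTS_PER_CACHE_LINE * 4));
}

// Each worker increments only its own shard, so writers do not contend on one line.
function increment(counter, shardId) {
  Atomics.add(counter, shardId * INTS_PER_CACHE_LINE, 1);
}

// Readers sum the shards; the total is exact once the writers have stopped.
function total(counter) {
  let sum = 0;
  for (let i = 0; i < counter.length; i += INTS_PER_CACHE_LINE) {
    sum += Atomics.load(counter, i);
  }
  return sum;
}
```

The trade is memory for throughput: four shards cost 256 bytes instead of 4, but increments from different workers no longer contend on a single cache line.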
Gotchas & failure modes
ReferenceError: SharedArrayBuffer is not defined.
The page is not cross-origin isolated. Send Cross-Origin-Opener-Policy: same-origin and
Cross-Origin-Embedder-Policy: require-corp on the top-level document, reload, and confirm
self.crossOriginIsolated === true. Once isolation is on, a non-CORP subresource (an <img> from a CDN
without Cross-Origin-Resource-Policy) does not break isolation; COEP blocks that resource instead,
so look for failed requests in the network panel.
TypeError: Atomics.wait cannot be called in this context.
You called Atomics.wait on the main thread. It is only permitted on Web Workers. Use
Atomics.waitAsync (which returns a promise) on the main thread, or move the blocking logic into a
worker.
RangeError: WebAssembly.Memory(): could not allocate memory or a shared memory that refuses to
grow. A shared memory must declare a maximum, and growing it (memory.grow) is only valid up to that
maximum. Growing never detaches the existing buffer, but a SharedArrayBuffer object has a fixed
length: views created before the grow keep their old length and do not see the new pages. Read
memory.buffer again after the grow to get a SharedArrayBuffer that spans the new size, and re-create
your views over it. Size maximum for your worst case up front.
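The grow behaviour is easy to observe directly. A minimal sketch, assuming an environment where shared memories are available:

```javascript
const memory = new WebAssembly.Memory({ initial: 1, maximum: 4, shared: true });

const oldView = new Int32Array(memory.buffer);
oldView[0] = 123;

memory.grow(1); // one extra 64 KiB page

// The old view is still usable but keeps its original length.
const oldLength = oldView.length;

// A view built from a fresh read of memory.buffer spans the grown memory
// and still aliases the same bytes.
const newView = new Int32Array(memory.buffer);
const newLength = newView.length;
const carriedOver = newView[0];
```

Here oldLength stays at 16384 elements (one page) while newLength is 32768, and carriedOver is 123 because both views address the same underlying memory.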
Data race / torn reads. Reading a multi-byte value with a plain i32[0] while another thread writes
it is a race, and mixing atomic and non-atomic access to the same location gives you no ordering
guarantee for the non-atomic side. Use Atomics.load/
Atomics.store for any location more than one thread touches.
Rust build fails with the wasm32-unknown-unknown target does not support atomics.
You are on stable or forgot the target features. Use nightly and pass
RUSTFLAGS='-C target-feature=+atomics,+bulk-memory,+mutable-globals' plus -Z build-std.
Verification
Confirm isolation, shared memory, and the atomic opcodes are actually present:
// 1. Isolation is the precondition for everything else.
console.log("isolated:", self.crossOriginIsolated);
// 2. The memory really is shared.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 1, shared: true });
console.log("shared:", memory.buffer instanceof SharedArrayBuffer); // true
Then inspect the compiled module to prove the atomic instructions made it through codegen:
wasm-objdump -d threaded.wasm | grep -E 'atomic|memory.atomic'
# expect lines like: i32.atomic.rmw.cmpxchg, memory.atomic.wait32, memory.atomic.notify
wasm-objdump -x threaded.wasm | grep -i 'shared' # memory marked shared in the Memory or Import section
If wasm-objdump shows no atomic ops, your target features did not take effect and the module is still
single-threaded regardless of how many workers you start.
In this guide
- Sharing memory between Wasm and Web Workers: create a shared memory, post it to a worker, and have both sides view it through one Int32Array.
- Using Atomics for Wasm thread synchronization: build a spinlock and a wait/notify handoff, with the underlying i32.atomic.rmw.cmpxchg and memory.atomic.wait32 opcodes.
Frequently Asked Questions
Do I really need cross-origin isolation just to use WebAssembly?
Only for threads. Single-threaded Wasm runs fine with an ordinary non-shared linear memory and needs no
special headers. The COOP/COEP requirement applies specifically to SharedArrayBuffer, which threaded
modules depend on.
Can I share a non-shared memory with a worker to save the headers?
No. A regular WebAssembly.Memory cannot be posted to a worker at all, and posting its ArrayBuffer
structured-clones (copies) the bytes, so the worker gets a separate memory. Only a memory created
with { shared: true }, backed by a SharedArrayBuffer, is actually shared, and that requires isolation.
Why must I use Int32Array with Atomics?
The Atomics operations are defined over integer typed arrays, Int8Array through BigUint64Array.
Atomics.wait and Atomics.notify specifically require an Int32Array or a BigInt64Array, because
wait/notify operate on aligned 32-bit or 64-bit slots. A Float64Array cannot be
used with atomic ops at all.
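These rules surface as TypeError at call time, which the following sketch probes. The helper acceptedBy is hypothetical and exists only to turn the exception into a boolean.

```javascript
// Runs the operation and reports whether it was accepted or rejected with a TypeError.
function acceptedBy(operation) {
  try {
    operation();
    return true;
  } catch (err) {
    if (err instanceof TypeError) return false;
    throw err;
  }
}

const sab = new SharedArrayBuffer(16);

// Read-modify-write ops work on any integer view...
const addOnUint8 = acceptedBy(() => Atomics.add(new Uint8Array(sab), 0, 1));
// ...but not on floating-point views.
const addOnFloat64 = acceptedBy(() => Atomics.add(new Float64Array(sab), 0, 1));
// notify (and wait) are stricter: Int32Array is fine, Uint8Array is not.
const notifyOnInt32 = acceptedBy(() => Atomics.notify(new Int32Array(sab), 0, 1));
const notifyOnUint8 = acceptedBy(() => Atomics.notify(new Uint8Array(sab), 0, 1));
```

In practice this means the flag or lock word of any protocol should live in an Int32Array view, even if the payload next to it is viewed through other typed arrays.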
Does growing a shared memory invalidate my views like non-shared memory does?
Growing a shared linear memory does not detach the SharedArrayBuffer, so existing views are not
detached, but they still only span the old length and will not see the new pages. After any
memory.grow, read memory.buffer again and re-create your typed-array views over it to address the
full size. See
why memory.grow invalidates pointers.
How many workers should I spawn?
Start with navigator.hardwareConcurrency and tune down. More workers than logical cores adds context
switching and contention without adding throughput; for latency-sensitive work, leave one core free for
the main thread.
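That starting point can be written down as a small helper. The function name and the reserveForMainThread option are hypothetical, and the result is an initial value to tune from, not a measured optimum.

```javascript
// Picks an initial pool size from the reported logical core count.
function poolSize(logicalCores, { reserveForMainThread = true } = {}) {
  // Fall back to a single worker if the count is missing or nonsensical.
  const cores = Number.isInteger(logicalCores) && logicalCores > 0 ? logicalCores : 1;
  return Math.max(1, reserveForMainThread ? cores - 1 : cores);
}

// In a browser: new WasmWorkerPool(poolSize(navigator.hardwareConcurrency), module, memory)
```

On an 8-core machine this yields 7 workers by default, or 8 with reserveForMainThread set to false.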
Related
- JS/Wasm Interop & Memory Management: the boundary and memory model these threads share.
- wasm-bindgen deep dive: how wasm-bindgen-rayon drives a Rust worker pool over shared memory.
- Configuring COOP/COEP headers for SharedArrayBuffer: the exact headers that enable cross-origin isolation.
- Linear memory management & allocators: how the shared heap is allocated and why grows invalidate views.