Using Atomics for Wasm Thread Synchronization

This guide shows how to synchronize WebAssembly threads with Atomics over a shared Int32Array — first a compare-exchange spinlock, then a wait/notify handoff that parks a thread instead of burning a core — and maps each JavaScript call onto the underlying Wasm atomic opcode it mirrors.

Once two threads share one linear memory, every uncoordinated write to a location both touch is a data race with unpredictable, JS-visible results: torn reads, lost updates, stale values. Atomics is the toolkit that makes shared access correct. It gives you indivisible read-modify-write operations and a blocking wait/notify protocol, and because Wasm atomics are sequentially consistent, an atomic store also publishes the ordinary writes that came before it. The two constructs below, a lock and a handoff, are the ones you reach for most, and everything heavier (queues, barriers, semaphores) is built from the same primitives.

Prerequisites

  • [ ] A page served cross-origin isolated (self.crossOriginIsolated === true); see configuring COOP/COEP headers.
  • [ ] A shared WebAssembly.Memory already posted to your workers; if not, start with sharing memory between Wasm and Web Workers.
  • [ ] An Int32Array view over the shared buffer: Atomics.wait accepts only Int32Array (or BigInt64Array) views, and this guide uses 32-bit slots throughout.
  • [ ] At least one Web Worker, because Atomics.wait is illegal on the main thread.
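
The checklist above can be exercised with a short setup sketch. The one-page memory size and the commented-out `worker` variable are assumptions for illustration only. Outside a browser (for example in Node.js) `crossOriginIsolated` is undefined, so the isolation check is simply skipped there.

```javascript
// In browsers, shared memory is only usable on a cross-origin isolated page.
if (globalThis.crossOriginIsolated === false) {
  throw new Error("Page is not cross-origin isolated; check the COOP/COEP headers.");
}

// One 64 KiB page of shared linear memory (the size is illustrative).
// A shared memory must declare a maximum.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 1, shared: true });

// The Int32Array view that the Atomics calls in this guide operate on.
const i32 = new Int32Array(memory.buffer);

// Hand the same memory to a worker (hypothetical `worker`, created elsewhere):
// worker.postMessage({ memory });
```

Every thread that receives the memory builds its own Int32Array over `memory.buffer`; the views are separate objects, but they alias the same bytes.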

The atomic toolbox and its Wasm opcodes

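Each Atomics method in JavaScript mirrors a single Wasm instruction from the threads proposal. The JavaScript call is not compiled to Wasm, but it performs the same operation on the same shared memory. The ones this guide uses: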

  • Atomics.compareExchange(arr, i, expected, replacement) → i32.atomic.rmw.cmpxchg: atomically swap if the current value equals expected, returning the prior value.
  • Atomics.add(arr, i, v) → i32.atomic.rmw.add: atomic fetch-and-add.
  • Atomics.wait(arr, i, expected) → memory.atomic.wait32: block while arr[i] === expected.
  • Atomics.notify(arr, i, count) → memory.atomic.notify: wake up to count waiters on i.
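
To make the return values in the list above concrete, here is a small single-threaded sketch. The buffer size and slot numbers are arbitrary choices for illustration. It avoids Atomics.wait so it can run on any thread.

```javascript
// Four 32-bit slots of shared memory, all initially 0.
const i32 = new Int32Array(new SharedArrayBuffer(4 * Int32Array.BYTES_PER_ELEMENT));

// Slot 0 holds 0 and we expect 0, so the swap happens; the prior value comes back.
const firstSwap = Atomics.compareExchange(i32, 0, 0, 1);

// Slot 0 now holds 1, not the expected 0, so nothing is written.
const secondSwap = Atomics.compareExchange(i32, 0, 0, 7);

// Fetch-and-add returns the value from before the addition.
const beforeAdd = Atomics.add(i32, 1, 5);

// Nobody is parked on slot 0, so no waiter is woken.
const woken = Atomics.notify(i32, 0, 1);

console.log({ firstSwap, secondSwap, beforeAdd, woken, slot0: i32[0], slot1: i32[1] });
```

Note that a failed compare-exchange is not an error: the call reports what it found, and the caller decides whether to retry.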

Step 1 — a compare-exchange spinlock

A spinlock is the simplest mutual-exclusion primitive: a single i32 slot that is 0 when free and 1 when held. Acquiring it is one atomic compare-exchange in a loop; releasing it is one atomic store. The compare-exchange is what makes it correct under contention — it reads the slot, checks it against the expected 0, and writes 1 as one indivisible operation, so two threads racing to acquire cannot both see 0 and both win. Exactly one compare-exchange returns the old value 0 (it got the lock); every other contender sees 1 and loops.

// lock state lives at index `slot` of a shared Int32Array
function lock(i32, slot) {
  // Spin until we flip the slot from 0 (free) to 1 (held).
  while (Atomics.compareExchange(i32, slot, 0, 1) !== 0) {
    // someone else holds it; keep retrying
  }
}

function unlock(i32, slot) {
  Atomics.store(i32, slot, 0); // release
}

The same logic in WAT is a loop around i32.atomic.rmw.cmpxchg:

(func $lock (param $addr i32)
  (block $acquired
    (loop $spin
      ;; cmpxchg(addr, expected=0, replacement=1) -> old value
      (br_if $acquired
        (i32.eqz
          (i32.atomic.rmw.cmpxchg (local.get $addr) (i32.const 0) (i32.const 1))))
      (br $spin))))
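
One way to use the spinlock without leaking it on an exception is to wrap the critical section in try/finally. The sketch below repeats lock and unlock so that it is self-contained; `withLock`, the slot layout (LOCK, COUNTER), and the iteration count are hypothetical names and values chosen for illustration.

```javascript
function lock(i32, slot) {
  while (Atomics.compareExchange(i32, slot, 0, 1) !== 0) {
    // held by another thread; retry
  }
}

function unlock(i32, slot) {
  Atomics.store(i32, slot, 0);
}

// Hypothetical helper: run `criticalSection` while holding the lock,
// and release the lock even if the callback throws.
function withLock(i32, slot, criticalSection) {
  lock(i32, slot);
  try {
    return criticalSection();
  } finally {
    unlock(i32, slot);
  }
}

const i32 = new Int32Array(new SharedArrayBuffer(8));
const LOCK = 0;
const COUNTER = 1;

for (let n = 0; n < 1000; n++) {
  // A plain read-modify-write is safe here because the lock serializes it.
  withLock(i32, LOCK, () => { i32[COUNTER] = i32[COUNTER] + 1; });
}
```

With several workers running the same loop against one shared buffer, the counter should still end at the sum of all iterations, because only one thread at a time can be between lock and unlock.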

Step 2 — stop busy-spinning with wait/notify

A pure spinlock wastes a CPU core while it waits. Under low contention that is fine — spinning for a few hundred nanoseconds is cheaper than parking and waking. But if the lock can be held for microseconds or longer, every waiting thread burns a full core doing nothing, starving the rest of the system. The fix is to park: replace the spin body with Atomics.wait, which puts the thread to sleep in the kernel until the holder calls Atomics.notify. This is the futex (fast userspace mutex) pattern pthreads and Rust’s std::sync::Mutex use under the hood — fast, uncontended path in userspace; slow, parking path only when there is real contention.

function lockParking(i32, slot) {
  // Fast path: try once.
  if (Atomics.compareExchange(i32, slot, 0, 1) === 0) return;
  // Slow path: mark contended (2) and sleep until woken.
  while (Atomics.exchange(i32, slot, 2) !== 0) {
    Atomics.wait(i32, slot, 2);          // memory.atomic.wait32 — sleeps while slot === 2
  }
}

function unlockParking(i32, slot) {
  if (Atomics.exchange(i32, slot, 0) === 2) {
    Atomics.notify(i32, slot, 1);        // memory.atomic.notify — wake one waiter
  }
}
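
The state transitions of the parking lock can be traced on a single thread. In the sketch below, `unlockParkingTraced` is a hypothetical variant of unlockParking that also reports whether it called notify, and the contended state is simulated by storing 2 directly rather than by a real second thread.

```javascript
function lockParking(i32, slot) {
  if (Atomics.compareExchange(i32, slot, 0, 1) === 0) return; // fast path
  while (Atomics.exchange(i32, slot, 2) !== 0) {
    Atomics.wait(i32, slot, 2);
  }
}

// Same logic as unlockParking, but returns true when it had to notify.
function unlockParkingTraced(i32, slot) {
  if (Atomics.exchange(i32, slot, 0) === 2) {
    Atomics.notify(i32, slot, 1);
    return true;
  }
  return false;
}

const i32 = new Int32Array(new SharedArrayBuffer(4));

// Uncontended acquire: 0 -> 1, and the release needs no notify.
lockParking(i32, 0);
const heldState = Atomics.load(i32, 0);
const notifiedWhenUncontended = unlockParkingTraced(i32, 0);

// Simulate a second thread that hit the slow path and marked the lock contended.
Atomics.store(i32, 0, 2);
const notifiedWhenContended = unlockParkingTraced(i32, 0);
const finalState = Atomics.load(i32, 0);

console.log({ heldState, notifiedWhenUncontended, notifiedWhenContended, finalState });
```

The slow path always writes 2, even when it ends up acquiring the lock, so a thread that has ever waited releases with a notify. That is conservative: it may issue a notify nobody needs, but it never skips one that a parked thread depends on.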

Step 3 — a one-shot handoff

Not every synchronization need is a lock. A very common shape is a one-shot handoff: a producer thread prepares data and a consumer thread must not look at it until the producer says it is ready. You do not need mutual exclusion for this — only an ordering edge. One flag slot plus wait/notify is enough, and it is strictly cheaper than a full mutex because there is no contention to arbitrate, just a single publish-then-signal.

// consumer (a worker): block until the flag at index 0 leaves its initial 0 state
const result = Atomics.wait(i32, 0, 0); // "ok" once woken, "not-equal" if already changed
const payload = Atomics.load(i32, 1);

// producer (any thread): publish, then wake the consumer
Atomics.store(i32, 1, 0xBEEF);
Atomics.store(i32, 0, 1);
Atomics.notify(i32, 0, 1);

Expected output

Running the handoff with the consumer in a worker and the producer on the main thread:

consumer woke: ok
payload: beef

Atomics.wait returns "ok" when it was actually parked and then notified, "not-equal" if the slot had already changed before it slept (the producer won the race), and "timed-out" if you passed a timeout that elapsed. All three are correct outcomes your loop should handle.

The "not-equal" case is the one people forget, and forgetting it is what produces lost wakeups. The check is built into Atomics.wait precisely to close the race window: wait atomically compares the slot to the expected value and only sleeps if they still match. So if the producer’s store landed in the gap between the consumer deciding to wait and actually sleeping, wait sees the mismatch and returns immediately instead of sleeping forever. This is why you must always re-load the real state after wait returns and never treat “wait returned” as a synonym for “I was notified.” Treat wait as “sleep if nothing has happened yet,” loop, and re-check.
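
The loop-and-re-check discipline described above might look like the following sketch. `waitForChange` and its timeout parameter are hypothetical, not part of the Atomics API. The demo calls Atomics.wait directly, so it assumes a worker, or a runtime such as Node.js where the main thread is allowed to block.

```javascript
// Hypothetical helper: block until i32[index] no longer holds `initial`.
// Returns true once the value has changed, false if `timeoutMs` elapsed first.
function waitForChange(i32, index, initial, timeoutMs = Infinity) {
  const deadline = Date.now() + timeoutMs;
  while (Atomics.load(i32, index) === initial) {
    const remaining = deadline - Date.now();
    if (remaining <= 0) return false;
    // "ok", "not-equal" and "timed-out" all fall through to the loop
    // condition, which re-reads the real state instead of trusting the wake-up.
    Atomics.wait(i32, index, initial, remaining);
  }
  return true;
}

// Case 1: the producer already published before the consumer arrived.
const i32 = new Int32Array(new SharedArrayBuffer(8));
Atomics.store(i32, 1, 0xBEEF); // payload
Atomics.store(i32, 0, 1);      // flag
const ready = waitForChange(i32, 0, 0, 50);
const payload = ready ? Atomics.load(i32, 1) : undefined;

// Case 2: nothing is ever published, so the bounded wait gives up.
const idle = new Int32Array(new SharedArrayBuffer(8));
const idleReady = waitForChange(idle, 0, 0, 20);
```

The helper never inspects the string that Atomics.wait returns. The flag value is the source of truth, and the return value is at most a hint about why the thread is running again.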

Gotchas

TypeError: Atomics.wait cannot be called in this context. You called Atomics.wait on the main thread, where blocking is forbidden. Use it only inside a Web Worker; on the main thread use the promise-based Atomics.waitAsync instead.
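
A minimal sketch of the Atomics.waitAsync alternative follows, assuming a runtime that implements it (it is newer than Atomics.wait and not available everywhere). For brevity the same thread plays both roles here; in practice the store and notify would come from a worker.

```javascript
const i32 = new Int32Array(new SharedArrayBuffer(4));

// Slot 0 holds 0 and we expect 0, so an asynchronous waiter is registered.
// The result has the shape { async: true, value: Promise }.
const pending = Atomics.waitAsync(i32, 0, 0);

pending.value.then((outcome) => {
  // Re-load the state rather than trusting the wake-up alone.
  console.log("woke with", outcome, "flag =", Atomics.load(i32, 0));
});

// Producer side: change the state first, then notify.
Atomics.store(i32, 0, 1);
const woken = Atomics.notify(i32, 0, 1);

// The value no longer matches, so this call completes synchronously
// with { async: false, value: "not-equal" } and creates no promise.
const immediate = Atomics.waitAsync(i32, 0, 0);
```

Because the result is only sometimes a promise, code that uses waitAsync has to branch on the `async` field before awaiting `value`.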

The waiter never wakes. Atomics.wait parks only if the slot still equals the expected value at the moment it is called. If the producer stored the new value before the consumer reached wait, the call returns "not-equal" immediately, which is correct, but your code must check for it and not assume it was woken. A waiter that really does hang usually points at the producer side: it notified without first changing the slot, or it notified a different index than the one being waited on. Always re-load the value after wait rather than trusting that being woken means the data is ready.

Busy-wait pins a core. A bare compareExchange spin loop consumes 100% of one core under contention. Spin only for sub-microsecond critical sections; for anything longer, park with Atomics.wait.

ABA hazard. A compare-exchange that sees the original value again cannot tell that it changed and changed back in between. For a simple held/free lock this is harmless, but for lock-free stacks or freelists, pair the pointer with a version counter (a tagged 64-bit slot via BigInt64Array) so a reused value still differs.
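
The version-counter idea can be sketched with a BigInt64Array slot whose high 32 bits hold a version and whose low 32 bits hold the index being protected. The helper names (`pack`, `indexPart`, `versionPart`, `replaceIndex`) and the bit layout are assumptions made for this illustration, not an established API.

```javascript
// Pack a 32-bit version and a 32-bit index into one signed 64-bit value.
const pack = (version, index) =>
  BigInt.asIntN(64, (BigInt(version >>> 0) << 32n) | BigInt(index >>> 0));
const indexPart = (tagged) => Number(tagged & 0xffffffffn);
const versionPart = (tagged) => Number((tagged >> 32n) & 0xffffffffn);

// Swap in `newIndex` only if the slot still holds exactly `seen`
// (same index AND same version). Every successful swap bumps the version.
function replaceIndex(cells, slot, seen, newIndex) {
  const next = pack(versionPart(seen) + 1, newIndex);
  return Atomics.compareExchange(cells, slot, seen, next) === seen;
}

const cells = new BigInt64Array(new SharedArrayBuffer(8));
Atomics.store(cells, 0, pack(0, 5));

// Thread A reads the slot and is then delayed...
const stale = Atomics.load(cells, 0);

// ...while other threads change the index 5 -> 7 -> 5.
replaceIndex(cells, 0, Atomics.load(cells, 0), 7);
replaceIndex(cells, 0, Atomics.load(cells, 0), 5);
const current = Atomics.load(cells, 0);

// The index is 5 again, but the version moved on, so A's stale swap is rejected.
const staleSucceeded = replaceIndex(cells, 0, stale, 9);
```

A 32-bit version can in principle wrap around, so this narrows the ABA window rather than eliminating it outright. For most workloads the wrap is far enough away to be ignored.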

Notify before wait is lost. Atomics.notify only wakes threads already parked on that index; a notify that arrives before any waiter is simply dropped. The value-check in Atomics.wait is what prevents the resulting lost-wakeup — keep the state change and the notify together, and always re-check state.

Spurious-looking wakeups from notify(count). Atomics.notify(arr, i, count) wakes up to count waiters, and several waiters can wake from one notify. Each woken thread must re-evaluate the shared state and decide independently whether it actually gets to proceed; never assume “I was notified, therefore the resource is mine.” This is the same loop-and-recheck discipline that defeats lost wakeups, applied to the wake side.
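
As an illustration of re-evaluating state on the wake side, here is a sketch of a counting semaphore in which every woken thread must still win a compare-exchange before it proceeds. The names `acquire` and `release` and the permit layout are hypothetical. The demo only acquires when permits are available, so it never actually parks.

```javascript
// i32[slot] holds the number of free permits.
function acquire(i32, slot) {
  for (;;) {
    const permits = Atomics.load(i32, slot);
    if (permits > 0) {
      // Claim one permit; if another thread got there first, re-read and retry.
      if (Atomics.compareExchange(i32, slot, permits, permits - 1) === permits) return;
      continue;
    }
    // Sleeps only while the count is still 0. Whatever wait returns,
    // control goes back to the top of the loop and the count is re-checked.
    Atomics.wait(i32, slot, 0);
  }
}

function release(i32, slot, count = 1) {
  Atomics.add(i32, slot, count);    // change the state first
  Atomics.notify(i32, slot, count); // then wake up to `count` waiters
}

const i32 = new Int32Array(new SharedArrayBuffer(4));
release(i32, 0, 2); // two permits become available
acquire(i32, 0);
acquire(i32, 0);
const permitsLeft = Atomics.load(i32, 0);
```

If release wakes three waiters but a late arrival has already taken one of the permits, one woken thread finds the count at 0 and simply goes back to sleep. Being notified grants nothing; only the successful compare-exchange does.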

Performance note

An uncontended atomic compare-exchange resolves in tens of nanoseconds — it is a single cache-line operation with no kernel involvement. A memory.atomic.wait32 that actually parks, by contrast, crosses into the OS futex machinery and costs on the order of a microsecond to sleep and wake. The lesson: take the lock-free fast path (compareExchange once) whenever possible and fall back to wait/notify only on real contention, exactly as lockParking above does.
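
A common middle ground, sketched below, is to spin for a bounded number of attempts before parking. `lockHybrid` and the spin limit of 100 are assumptions for illustration; a sensible limit depends on the workload and should be measured rather than copied. Release is the same three-state unlock used by the parking lock.

```javascript
const SPIN_LIMIT = 100; // illustrative value, not a tuned recommendation

function lockHybrid(i32, slot) {
  // Phase 1: optimistic spinning, entirely in userspace.
  for (let attempt = 0; attempt < SPIN_LIMIT; attempt++) {
    if (Atomics.compareExchange(i32, slot, 0, 1) === 0) return;
  }
  // Phase 2: mark the lock contended (2) and park until it is released.
  while (Atomics.exchange(i32, slot, 2) !== 0) {
    Atomics.wait(i32, slot, 2);
  }
}

function unlockHybrid(i32, slot) {
  // Notify only if some thread may be parked (state 2).
  if (Atomics.exchange(i32, slot, 0) === 2) {
    Atomics.notify(i32, slot, 1);
  }
}

const i32 = new Int32Array(new SharedArrayBuffer(4));
lockHybrid(i32, 0);
const stateWhileHeld = Atomics.load(i32, 0);
unlockHybrid(i32, 0);
const stateAfterRelease = Atomics.load(i32, 0);
```

Short critical sections are usually released within the spin phase, so waiters never pay the parking cost, while long ones fall through to Atomics.wait and stop consuming a core.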

Frequently Asked Questions

Why use the value 2 for “contended” in the parking lock? The three-state lock (0 = free, 1 = held-uncontended, 2 = held-contended) lets unlock skip the Atomics.notify call, and the trip into the OS it can entail, whenever the lock was uncontended. A notify is only needed if a waiter might be parked, which the 2 state records. This is the classic futex mutex optimization.

Can I call Atomics.notify from the main thread? Yes. Atomics.notify is permitted everywhere — only Atomics.wait is restricted to workers. The main thread can freely wake parked workers.

Do atomic operations guarantee memory ordering across threads? Yes. Wasm atomic accesses are sequentially consistent, so an atomic store made visible to another thread also publishes the non-atomic writes that preceded it in program order — which is why the consumer can safely read the payload after observing the flag.
