Migrating legacy C code to WebAssembly

This guide answers one concrete question: how do you take an existing C codebase that assumes a real operating system and get it compiling, linking, and running correctly as a WebAssembly module under Emscripten, without rewriting it from scratch? Legacy C code typically assumes unrestricted OS access, raw file descriptors, and a contiguous virtual address space. WebAssembly gives you none of those. What you get instead is a sandboxed runtime, a virtual filesystem, and a single bounds-checked linear memory buffer. The work is mechanical once you know which failure maps to which fix.

The migration breaks into three distinct problem classes, and it helps to keep them separate. Compile-time failures are about missing headers and unsupported architecture flags — the easiest to fix because the compiler points at the exact line. Link-time failures are about undefined symbols and archive ordering, where wasm-ld cannot resolve a reference to a syscall that has no Wasm implementation. Runtime failures are the subtle ones: a build that compiles and links cleanly but traps with memory access out of bounds, hangs the page on a blocking call, or silently corrupts its own heap. The procedure below walks the three classes in order, because fixing them out of order wastes time — there is no point chasing a runtime trap in code that has not linked yet.

Prerequisites

  • [ ] emsdk installed and activated: ./emsdk install latest && ./emsdk activate latest && source ./emsdk_env.sh, with a specific version (emcc --version ≥ 3.1.50) pinned in CI so behavior is reproducible.
  • [ ] The code builds natively first — confirm a clean gcc -Wall build before porting, so you are debugging Wasm-specific issues, not pre-existing bugs.
  • [ ] A list of the project’s syscalls — run nm or grep for fork, socket, mmap, pthread_create, and raw open/read so you know your surface area up front.
  • [ ] Node ≥ 16 to run the output headlessly while iterating.

Procedure

  1. Establish a strict baseline build. Compile the entry translation unit with -sSTRICT=1 so deprecated, non-standard extensions fail loudly instead of compiling to something subtly wrong.

    emcc legacy.c -sSTRICT=1 -sENVIRONMENT=web,worker -o baseline.mjs -sEXPORT_ES6=1

  2. Surface every undefined symbol at once. Force the linker to report all unresolved references so you can triage them as a batch rather than one rebuild at a time.

    emcc legacy.c -sERROR_ON_UNDEFINED_SYMBOLS=1 -o test.mjs 2>&1 | grep "undefined symbol"

  3. Isolate unsupported POSIX calls behind a compile guard. Wrap process, socket, and raw-fd usage so the Wasm build takes a stubbed or async-JS path while the native build is untouched.

    #ifdef __EMSCRIPTEN__
      #include <emscripten.h>
      /* route to an async JS shim or a no-op stub */
    #else
      #include <unistd.h>
    #endif

  4. Map file access onto the virtual filesystem. Legacy code that opens files needs those files present in Emscripten’s in-memory FS. Preload them at build time, or mount a persistent backend.

    emcc legacy.c -o app.mjs --preload-file assets/ -sEXPORT_ES6=1
    # or, at runtime, persist to IndexedDB (link with -lidbfs.js, create /data first,
    # then call FS.syncfs to load and save the contents):
    # Module.FS.mount(Module.IDBFS, {}, '/data');

  5. Replace any custom heap manager with Emscripten’s allocator. Allocators that call brk/sbrk directly or assume a fixed address space corrupt linear memory. Defer to the built-in dlmalloc (or emmalloc for size).

    emcc legacy.c -sMALLOC=dlmalloc -sINITIAL_MEMORY=67108864 -o app.mjs

    The reason this matters is that legacy allocators frequently assume sbrk hands back an ever-growing flat address space. Under WebAssembly the heap is a bounded region of one linear memory buffer, and a custom allocator that walks past it does not segfault into a clean crash — it scribbles over the stack or static data living in the same buffer. Routing allocation through Emscripten’s dlmalloc keeps every malloc/free inside the runtime’s managed region.

  6. Fix link-time ordering and emit a modern module. Place objects before the archives they consume, then produce ESM output for the front end. wasm-ld, like the GNU linker, resolves symbols left-to-right, so an archive listed before the object that needs it contributes nothing.

    emcc main.o -L./libs -llegacy -sMODULARIZE=1 -sEXPORT_ES6=1 -o app.mjs
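
The guard from step 3 can also wrap a whole function, so callers never see the platform difference. The sketch below is one possible shape, assuming a hypothetical helper named port_run_command that the legacy code calls instead of spawning a shell directly. The name and the choice to fail with ENOSYS are illustrative, not an Emscripten convention.

```c
#include <errno.h>
#include <stdlib.h>

/* port_run_command is a hypothetical wrapper name, not an Emscripten API. */
#ifdef __EMSCRIPTEN__
/* Wasm build: there is no process model, so report "not supported"
 * explicitly instead of letting the call fail in a surprising way. */
int port_run_command(const char *command) {
    (void)command;
    errno = ENOSYS;
    return -1;
}
#else
/* Native build: behavior is unchanged, delegate to the C library. */
int port_run_command(const char *command) {
    return system(command);
}
#endif
```

Callers then treat a return of -1 with errno set to ENOSYS as "feature unavailable in this build" and either skip the feature or hand the work to JavaScript.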

Expected output and verification

A successful build emits app.mjs and app.wasm. Validate the binary and confirm the runtime starts without aborting:

wasm-validate app.wasm                 # exits 0 on a well-formed module
node -e "import('./app.mjs').then(f => f.default()).then(() => console.log('runtime ok'))"

Inspect the import section to confirm you have not pulled in an unexpected syscall:

wasm-objdump -x app.wasm | grep -i import

Seeing only the expected env/wasi_snapshot_preview1 imports means your shims caught everything; a stray fd_write or environment import you did not anticipate points to an unhandled code path.

For a deeper smoke test, run the module’s main computation against a known input and compare the output to the native build byte-for-byte. Differences usually trace to one of three sources: undefined behavior that the native compiler happened to tolerate but the Wasm back end did not, integer-size assumptions (long is 32-bit under wasm32, not 64-bit as on a typical 64-bit host), or endianness code that is now dead because WebAssembly is always little-endian. Catching these at verification time, with a reference output in hand, is far cheaper than diagnosing them later from a wrong result in the browser.
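
When the outputs differ, it can help to have both builds print what they assume about type widths. The following is a minimal sketch of such a probe. The function names is_little_endian and format_abi_report are made up for this example.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Returns 1 on a little-endian target, 0 otherwise. */
int is_little_endian(void) {
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Writes a one-line summary of type widths (in bits) into buf.
 * Returns what snprintf returns: the length of the full report. */
int format_abi_report(char *buf, size_t buf_size) {
    return snprintf(buf, buf_size,
                    "int=%d long=%d ptr=%d little_endian=%d",
                    (int)(sizeof(int) * CHAR_BIT),
                    (int)(sizeof(long) * CHAR_BIT),
                    (int)(sizeof(void *) * CHAR_BIT),
                    is_little_endian());
}
```

Printing the report from both builds makes a width mismatch visible immediately: a wasm32 build should report 32 bits for both long and ptr, while a typical 64-bit Linux or macOS build reports 64 for both.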

Gotchas

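error: implicit declaration of function 'foo'. No header visible to the Emscripten build declares the function: either the sysroot's version of the POSIX header omits it, or the include is hidden behind a platform guard. (A header that is missing altogether fails earlier, with a fatal "file not found" error.) Even where the compiler only warns, the call then fails to link. Exclude the include and the call from the Wasm build with #ifndef __EMSCRIPTEN__ and provide a stub for that build, rather than forcing the header.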

wasm-ld: error: undefined symbol: pthread_create. Threading is off by default. Either remove the threading path under the Emscripten guard, or opt in with -pthread -sPTHREAD_POOL_SIZE=4 — which additionally requires cross-origin isolation (COOP/COEP) in the browser to enable SharedArrayBuffer.

RuntimeError: memory access out of bounds at runtime. A pointer escaped its allocation, often a legacy allocator writing past the heap. Rebuild with -sSAFE_HEAP=1 -sASSERTIONS=2; it adds guard checks that report the exact offending access with a stack trace instead of corrupting memory silently.

Blocking I/O hangs the page. A synchronous sleep, fread, or recv on the main thread stalls the browser event loop: the page cannot repaint or run JavaScript until the call returns. Wrap the call graph with Asyncify so the C code keeps its synchronous shape while the runtime yields to JavaScript:

emcc legacy.c -sASYNCIFY=1 -sASYNCIFY_IMPORTS='["js_sleep"]' -o app.mjs
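
On the C side, the blocking call can sit behind one wrapper so the rest of the code keeps calling a plain synchronous function. This is a sketch under two assumptions: port_sleep_ms is a hypothetical name, and the Wasm path uses Emscripten's built-in emscripten_sleep instead of a hand-written js_sleep import.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep on the native path */

#include <errno.h>
#include <time.h>

#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#endif

/* port_sleep_ms is a hypothetical wrapper name. Returns 0 on success. */
int port_sleep_ms(unsigned int ms) {
#ifdef __EMSCRIPTEN__
    /* Needs -sASYNCIFY: suspends the Wasm stack and yields to the event loop. */
    emscripten_sleep(ms);
    return 0;
#else
    struct timespec remaining = {
        (time_t)(ms / 1000u),
        (long)(ms % 1000u) * 1000000L
    };
    /* Resume the sleep if a signal interrupts it. */
    while (nanosleep(&remaining, &remaining) == -1) {
        if (errno != EINTR) {
            return -1;
        }
    }
    return 0;
#endif
}
```

Because emscripten_sleep is one of the imports Asyncify already knows about, this variant should not need its own entry in -sASYNCIFY_IMPORTS. A custom JavaScript function such as js_sleep does.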

undefined symbol: __cxa_throw or longjmp aborts. Emscripten disables C++ exception catching by default to avoid its size and speed cost, and a build can also switch setjmp/longjmp support off with -sSUPPORT_LONGJMP=0. Enable what the code needs explicitly: -fexceptions (or -fwasm-exceptions for the newer, faster scheme on supporting runtimes) for exceptions, and -sSUPPORT_LONGJMP=emscripten for longjmp. Each adds binary size and some overhead, so enable only the one your code actually uses:

emcc legacy.c -fexceptions -sSUPPORT_LONGJMP=emscripten -o app.mjs
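
Legacy C error handling of the following shape is what depends on setjmp/longjmp support. The sketch is illustrative only: parse_number and its helpers are invented names, and overflow checking is left out to keep it short.

```c
#include <setjmp.h>
#include <stddef.h>

static jmp_buf g_recover_point;  /* where parse_number resumes on failure */

/* Unwinds to the recovery point; never returns to its caller. */
static void parse_fail(void) {
    longjmp(g_recover_point, 1);
}

static int digit_value(char c) {
    if (c < '0' || c > '9') {
        parse_fail();
    }
    return c - '0';
}

/* Parses a non-empty string of decimal digits into *out.
 * Returns 0 on success, -1 if the input is empty or has a non-digit. */
int parse_number(const char *text, int *out) {
    if (setjmp(g_recover_point) != 0) {
        return -1;  /* reached via longjmp from parse_fail */
    }
    if (text == NULL || *text == '\0') {
        return -1;
    }
    int value = 0;
    for (const char *p = text; *p != '\0'; ++p) {
        value = value * 10 + digit_value(*p);
    }
    *out = value;
    return 0;
}
```

If a function like this works natively but aborts in the Wasm build, the longjmp setting is the first thing to check.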

Performance note

Asyncify is convenient but not free: it instruments the call stack to support unwinding and rewinding, which typically inflates the binary by roughly 30–50% and adds overhead to every function on the async path. Scope -sASYNCIFY_IMPORTS to the narrowest possible set of functions so the transform only touches the call chains that actually suspend. For the dead-code and Binaryen size passes that claw some of that weight back, see reducing Wasm bundle size with wasm-opt.

The other latency to budget for is instantiation. A large module compiled to a multi-megabyte binary takes measurable time to download and compile; streaming instantiation overlaps the two by compiling the bytes as they arrive, which is why serving the .wasm with Content-Type: application/wasm and using WebAssembly.instantiateStreaming matters for a migrated codebase. Profile the cold start in the DevTools Performance panel and the peak working set via Module.HEAPU8.length; legacy code that front-loads a large static reservation often reserves far more linear memory than it ever touches, and trimming -sINITIAL_MEMORY to the real high-water mark both shrinks the reservation and speeds the first paint.
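
One way to estimate the heap's real high-water mark is to count bytes in a thin wrapper around malloc during a representative run. The sketch below uses hypothetical names (tracked_malloc, tracked_free, tracked_peak_bytes). It counts requested payload bytes only, so allocator overhead, the stack, and static data still have to be added on top, and it is not thread-safe.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header stored in front of each block. The union keeps the payload
 * that follows it aligned for any type. */
typedef union {
    size_t size;
    max_align_t align;
} block_header;

static size_t g_live_bytes;  /* payload bytes currently allocated */
static size_t g_peak_bytes;  /* highest value g_live_bytes has reached */

void *tracked_malloc(size_t size) {
    block_header *header = malloc(sizeof *header + size);
    if (header == NULL) {
        return NULL;
    }
    header->size = size;
    g_live_bytes += size;
    if (g_live_bytes > g_peak_bytes) {
        g_peak_bytes = g_live_bytes;
    }
    return header + 1;  /* payload starts right after the header */
}

void tracked_free(void *ptr) {
    if (ptr == NULL) {
        return;
    }
    block_header *header = (block_header *)ptr - 1;
    g_live_bytes -= header->size;
    free(header);
}

size_t tracked_peak_bytes(void) {
    return g_peak_bytes;
}
```

Logging tracked_peak_bytes() at exit gives a lower bound to compare against the value passed to -sINITIAL_MEMORY.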

Frequently Asked Questions

Do I have to rewrite my custom allocator? Usually no — you replace it. Build with -sMALLOC=dlmalloc so calls to malloc/free route through Emscripten’s allocator instead of your sbrk-based one. Keep your allocator only if it provides domain-specific behavior, and then make it respect the linear memory model rather than assuming a flat address space.
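
Replacing the allocator usually means keeping the legacy entry points and forwarding them. A minimal sketch, assuming the project exposes hypothetical functions named pool_alloc and pool_free (substitute whatever names your allocator actually uses):

```c
#include <stddef.h>
#include <stdlib.h>

/* pool_alloc / pool_free are hypothetical stand-ins for the legacy
 * allocator's entry points. The bodies forward to the C library,
 * which under Emscripten is the allocator selected with -sMALLOC. */
void *pool_alloc(size_t size) {
    /* Assumption: legacy callers expect a usable block even for a
     * zero-size request, so ask for at least one byte. */
    return malloc(size != 0 ? size : 1);
}

void pool_free(void *ptr) {
    free(ptr);  /* free(NULL) is a no-op, so no guard is needed */
}
```

Call sites stay untouched, and the sbrk-based implementation file simply drops out of the Wasm build.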

How do I handle a fork()/exec() model? There is no process model in WebAssembly. Refactor the design to use Web Workers for parallelism (each worker is a separate module instance) and message passing instead of shared process state. There is no mechanical shim — fork is the one call you must restructure around.

Can I keep my Makefile? Yes. Wrap it with emmake make, which overrides CC/CXX to point at emcc/em++ and injects the Emscripten sysroot. For CMake projects, configure with emcmake cmake -B build instead. Avoid hardcoding -march, -mtune, or host-specific flags in the build file, since the Wasm back end rejects them — gate those behind a toolchain check or strip them in the Emscripten configuration.

Why does my ported code give the wrong answer instead of crashing? The usual culprit is undefined behavior the native compiler tolerated and the Wasm back end did not, or an integer-width assumption: long is 32-bit under the wasm32 target, not 64-bit as on most 64-bit hosts. Audit for code that stores pointers in long, or that relies on int and long being the same size, and switch to fixed-width types from <stdint.h>.
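
The fixed-width replacements can be sketched as follows. The function names are invented for the example, and the point is only the types: uintptr_t for pointers held as integers, and uint64_t where a result may exceed 32 bits.

```c
#include <stdint.h>

/* Holds a pointer as an integer. uintptr_t is defined to round-trip
 * a void pointer, which long does not guarantee. */
uintptr_t pointer_to_handle(void *ptr) {
    return (uintptr_t)ptr;
}

void *handle_to_pointer(uintptr_t handle) {
    return (void *)handle;
}

/* Multiplies in 64 bits regardless of how wide long is on the target. */
uint64_t total_bytes(uint32_t count, uint32_t item_size) {
    return (uint64_t)count * (uint64_t)item_size;
}

/* Fails the build, rather than the run, if the assumption is wrong. */
_Static_assert(sizeof(uintptr_t) >= sizeof(void *),
               "uintptr_t must be able to hold a pointer");
```

For instance, total_bytes(70000, 70000) is 4900000000 on every target, whereas the same product computed in a 32-bit long overflows.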
