Encoding Strings Across the Wasm Boundary
This guide answers one task precisely: how to send a string from JavaScript into a WebAssembly module and get a string back, byte for byte, without truncation, leaks, or traps.
A Wasm function signature carries only numbers, so a string never crosses the boundary as a string. It
crosses as a pointer and a length — two i32 values that locate UTF-8 bytes in the module’s
linear memory. JavaScript strings are UTF-16 internally and Rust strings are UTF-8, so every crossing
is also a transcode. Get the encoding, the byte count, and the freeing right and strings just work; get
any one wrong and you read garbage or leak memory.
The reason strings are the trickiest of the common payloads is that two independent things can go wrong
at once: the encoding (UTF-16 to UTF-8 and back) and the bookkeeping (allocating, copying, and
freeing the bytes in a heap the module owns). The good news is that both follow a fixed recipe. Once you
have walked the recipe a couple of times — and seen that wasm-bindgen runs the exact same steps under
the hood — you can write or debug any string boundary by hand, including the cases where the generated
glue does not give you the control you need.
Prerequisites
- [ ] A module exporting alloc(size) -> ptr, dealloc(ptr, size), and memory (the parent guide shows a minimal one)
- [ ] Browser or Node 20+ with global TextEncoder and TextDecoder
- [ ] wabt for wat2wasm and wasm-objdump if you build the module by hand
- [ ] Familiarity with the (ptr, len) convention and typed-array views over memory.buffer
JS → Wasm: send a string in
The inbound path is encode, allocate, copy, call, free — five steps that map one-to-one onto the manual ABI.
- Encode the JS string to UTF-8 bytes. TextEncoder always emits UTF-8; the result is a Uint8Array whose .length is the byte count you will pass as len.

      const enc = new TextEncoder();
      const bytes = enc.encode("café"); // 5 bytes: 'é' is two bytes in UTF-8

- Allocate that many bytes in linear memory. Call the module’s exported allocator; it returns a pointer into the heap it owns.

      const ptr = instance.exports.alloc(bytes.length);

- Copy the bytes to that offset. Build a Uint8Array view over the current memory.buffer and set() the payload at ptr. Always rebuild the view here, because a preceding alloc may have grown memory and detached the old buffer.

      new Uint8Array(instance.exports.memory.buffer, ptr, bytes.length).set(bytes);

- Call the function with (ptr, len). The module reads exactly len bytes starting at ptr.

      const outLen = instance.exports.uppercase_ascii(ptr, bytes.length);

- Free the input. Whoever allocated frees. Call dealloc(ptr, len) once the module has finished reading, and wrap it in finally so an exception cannot leak the buffer.

      instance.exports.dealloc(ptr, bytes.length);
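The five steps above can be folded into one helper with the free in a finally block. The following is a minimal sketch, not a definitive implementation: makeMockExports, count_non_ascii, liveBytes, and callWithString are hypothetical names, and the mock is a plain-JavaScript stand-in for a real module's exports so the example runs without a .wasm file.

```javascript
// Hypothetical stand-in for a real module's exports, so the sketch runs
// without a .wasm file: a bump allocator over a WebAssembly.Memory.
function makeMockExports() {
  const memory = new WebAssembly.Memory({ initial: 1 }); // one 64 KiB page
  let next = 8; // next free offset; low addresses are left unused
  let live = 0; // bytes currently allocated, to make leaks visible
  return {
    memory,
    alloc(size) {
      const ptr = next;
      next += size;
      live += size;
      return ptr;
    },
    dealloc(_ptr, size) {
      live -= size;
    },
    // Counts bytes with the high bit set in [ptr, ptr + len).
    count_non_ascii(ptr, len) {
      let n = 0;
      for (const b of new Uint8Array(memory.buffer, ptr, len)) {
        if (b & 0x80) n++;
      }
      return n;
    },
    liveBytes: () => live,
  };
}

// Inbound recipe: encode, allocate, copy, call, free.
function callWithString(exports, fn, str) {
  const bytes = new TextEncoder().encode(str); // 1. encode to UTF-8
  const ptr = exports.alloc(bytes.length); // 2. allocate
  try {
    // 3. copy; the view is built after alloc so it sees the current buffer
    new Uint8Array(exports.memory.buffer, ptr, bytes.length).set(bytes);
    return fn(ptr, bytes.length); // 4. call with (ptr, len)
  } finally {
    exports.dealloc(ptr, bytes.length); // 5. free, even if the call throws
  }
}

const mock = makeMockExports();
const nonAscii = callWithString(mock, mock.count_non_ascii, "caf\u00e9");
console.log(nonAscii, mock.liveBytes()); // 2 0
```

With a real instance, the same helper would take instance.exports in place of the mock; the only contract it relies on is the alloc, dealloc, and memory exports listed in the prerequisites.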
Wasm → JS: read a string out
When the module returns a string it hands back a pointer and a length — usually via the multi-value
return (result i32 i32), or by writing both into an out-pointer. The host then decodes and frees.
- Receive (ptr, len). With multi-value, the call returns a two-element array.

      const [outPtr, outLen] = instance.exports.build_greeting(ptr, bytes.length);

- Decode the bytes to a JS string. Build a view at (outPtr, outLen) and run it through TextDecoder. Decode immediately, or take a .slice() copy before the next alloc, because the next allocation could detach the buffer.

      const out = new Uint8Array(instance.exports.memory.buffer, outPtr, outLen);
      const text = new TextDecoder().decode(out); // decode copies into a JS string

- Free the module’s output buffer. The buffer the module allocated for its result is now yours to release.

      instance.exports.dealloc(outPtr, outLen);
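The outbound steps above, combined with the inbound ones, give a full round trip. This is a sketch under stated assumptions: makeMockGreeter, takeString, and liveBytes are hypothetical names, and build_greeting here is a JavaScript stand-in that mimics a module export returning a (ptr, len) pair, not the real thing.

```javascript
// Hypothetical stand-in for a module whose build_greeting returns (ptr, len).
function makeMockGreeter() {
  const memory = new WebAssembly.Memory({ initial: 1 });
  let next = 8;
  let live = 0;
  const alloc = (size) => {
    const ptr = next;
    next += size;
    live += size;
    return ptr;
  };
  const dealloc = (_ptr, size) => {
    live -= size;
  };
  // Reads a UTF-8 name at (ptr, len) and writes "Hello, <name>!" to a new buffer.
  const build_greeting = (ptr, len) => {
    const name = new Uint8Array(memory.buffer, ptr, len).slice();
    const prefix = [0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x2c, 0x20]; // "Hello, "
    const outLen = prefix.length + name.length + 1;
    const outPtr = alloc(outLen);
    const out = new Uint8Array(memory.buffer, outPtr, outLen);
    out.set(prefix, 0);
    out.set(name, prefix.length);
    out[outLen - 1] = 0x21; // "!"
    return [outPtr, outLen];
  };
  return { memory, alloc, dealloc, build_greeting, liveBytes: () => live };
}

// Outbound recipe: receive (ptr, len), decode immediately, then free.
function takeString(exports, outPtr, outLen) {
  try {
    const view = new Uint8Array(exports.memory.buffer, outPtr, outLen);
    return new TextDecoder().decode(view); // decode copies into a JS string
  } finally {
    exports.dealloc(outPtr, outLen);
  }
}

const greeter = makeMockGreeter();
const nameBytes = new TextEncoder().encode("caf\u00e9");
const inPtr = greeter.alloc(nameBytes.length);
let greeting;
let greetingLen;
try {
  new Uint8Array(greeter.memory.buffer, inPtr, nameBytes.length).set(nameBytes);
  const [outPtr, outLen] = greeter.build_greeting(inPtr, nameBytes.length);
  greetingLen = outLen;
  greeting = takeString(greeter, outPtr, outLen);
} finally {
  greeter.dealloc(inPtr, nameBytes.length);
}
console.log(greeting, greetingLen, greeter.liveBytes()); // Hello, café! 13 0
```

Decoding inside takeString before the free is deliberate: the decoded JS string is an independent copy, so nothing refers to the module's buffer once dealloc runs.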
What wasm-bindgen does
With wasm-bindgen you write the Rust and the glue is generated — but the glue performs exactly the
steps above. This function takes a borrowed &str and returns an owned String:
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn build_greeting(name: &str) -> String {
    format!("Hello, {name}!")
}
The emitted JavaScript calls an internal passStringToWasm helper that runs TextEncoder.encodeInto
straight into linear memory (avoiding a temporary array), passes (ptr, len), then reads the returned
(ptr, len) back with TextDecoder.decode and calls the generated __wbindgen_free. The
wasm-bindgen deep dive annotates that shim
line by line. The point of seeing the raw version first is that nothing magic happens: you can reproduce
wasm-bindgen’s string handling by hand when a hot path needs it.
Two details are worth lifting out of the generated code. First, wasm-bindgen uses encodeInto, not
encode followed by a set() — that writes the UTF-8 directly into the module heap in a single pass,
saving an intermediate Uint8Array allocation and one copy. When you hand-roll a hot string path it is
worth doing the same: enc.encodeInto(str, new Uint8Array(memory.buffer, ptr, capacity)) returns a
{ read, written } result so you know exactly how many bytes landed. Second, for a &str argument
the generated glue frees the input buffer for you immediately after the call, and for a returned
String it frees the module’s result buffer right after decoding — the same finally/decode-then-free
discipline shown above, just generated. Knowing this lets you reason about why a string-in, string-out
wasm-bindgen call allocates and frees twice (once per string), and where a manual ABI could avoid one of those round trips.
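A hand-rolled encodeInto call might look like the sketch below. It makes two assumptions: the fixed offset ptr is a hypothetical stand-in for whatever a real alloc(capacity) would return, and the capacity is sized at three bytes per UTF-16 code unit, which is the worst case for UTF-8 output.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 });
const str = "caf\u00e9";

// Worst case: each UTF-16 code unit needs at most 3 UTF-8 bytes.
const capacity = str.length * 3;
const ptr = 16; // hypothetical: where a real alloc(capacity) might point

// Encode straight into linear memory, with no intermediate Uint8Array.
const target = new Uint8Array(memory.buffer, ptr, capacity);
const { read, written } = new TextEncoder().encodeInto(str, target);

// read = UTF-16 code units consumed, written = UTF-8 bytes produced.
const landed = Array.from(new Uint8Array(memory.buffer, ptr, written));
console.log(read, written); // 4 5
console.log(landed.map((b) => b.toString(16)).join(" ")); // 63 61 66 c3 a9
```

Pass written, not capacity, as len when calling the module. If the allocator needs the original size back on free, as the dealloc(ptr, size) signature in this guide suggests, free with capacity rather than written.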
Expected output
Running the round trip and logging the decoded result and a byte-length check:
> build_greeting("café")
input bytes: [0x63,0x61,0x66,0xc3,0xa9] // 5 bytes, JS .length was 4
output text: "Hello, café!"
output bytes: 13 // not 12 — 'é' is still 2 bytes
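The byte-length lines in that log can be produced without a module at all, which makes a handy sanity check before wiring up the real call. A small sketch, where inputLine and outputLine are illustrative names:

```javascript
const enc = new TextEncoder();

const input = "caf\u00e9";
const inputBytes = enc.encode(input);
const hex = Array.from(inputBytes, (b) => "0x" + b.toString(16).padStart(2, "0"));
const inputLine =
  `input bytes: [${hex.join(",")}] // ${inputBytes.length} bytes, JS .length was ${input.length}`;

const output = "Hello, caf\u00e9!";
const outputLine = `output bytes: ${enc.encode(output).length}`;

console.log(inputLine); // input bytes: [0x63,0x61,0x66,0xc3,0xa9] // 5 bytes, JS .length was 4
console.log(outputLine); // output bytes: 13
```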
Confirm any static strings compiled into the module land where you expect with a data-section dump:
wasm-objdump -s -j data greeting.wasm
# Data[0]: ... 48 65 6c 6c 6f 2c 20 "Hello, "
Gotchas
- UTF-8 vs UTF-16 length mismatch. "café".length is 4 (UTF-16 code units) but its UTF-8 encoding is 5 bytes. Passing the JS .length as len truncates the last byte and corrupts the trailing character or, for emoji and CJK, drops whole code points. Fix: always pass new TextEncoder().encode(s).length, never s.length.
- Assuming null termination. The (ptr, len) convention carries an explicit length; the bytes are not NUL-terminated. A module that calls strlen on your buffer will read past len until it finds a zero byte, straight into adjacent allocations or off the end of memory: RuntimeError: memory access out of bounds. Fix: either always pass an explicit length, or, if the module truly needs a C string, allocate len + 1 bytes and write a trailing 0.
- Forgetting to free → leak. Each alloc for an input string, and each result buffer the module returns, must be freed. Skip it and the heap fills until memory.buffer.byteLength keeps climbing as calls accumulate. Fix: free the input in finally; free the output right after decoding.
- Stale view after a grow. Building the Uint8Array before the alloc that grows memory leaves you holding a view over a detached, zero-length buffer: indexed writes are silently dropped and set() throws. Fix: construct the view from memory.buffer after every alloc, as the steps above do.
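Two of these gotchas can be reproduced without a module. In the sketch below, memory.grow(1) is an assumption standing in for an alloc that had to grow the heap; the sample strings are café, 日本語, and a thumbs-up emoji written as escapes.

```javascript
// Length mismatch: UTF-16 code units vs UTF-8 bytes.
const enc = new TextEncoder();
const samples = ["caf\u00e9", "\u65e5\u672c\u8a9e", "\u{1f44d}"];
const lengths = samples.map((s) => [s.length, enc.encode(s).length]);
console.log(JSON.stringify(lengths)); // [[4,5],[3,9],[2,4]]

// Stale view: a view built before a grow is detached afterwards.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 2 });
const stale = new Uint8Array(memory.buffer, 0, 16);
memory.grow(1); // stands in for an alloc that grew the heap
const fresh = new Uint8Array(memory.buffer, 0, 16);

stale[0] = 0x63; // indexed write into the detached view: silently dropped
let setThrew = false;
try {
  stale.set([0x63]); // bulk copy into the detached view: throws
} catch {
  setThrew = true;
}
console.log(stale.length, fresh[0], setThrew); // 0 0 true
```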
Performance note
The transcode cost scales with byte count, not call count: TextEncoder.encode and TextDecoder.decode
are roughly linear at ~1–3 GB/s in modern engines, so a 1 KB string costs about a microsecond or less while a
1 MB string costs a few hundred microseconds to about a millisecond. The fixed per-call overhead (the alloc/free round trip and the Wasm
call itself) is a few hundred nanoseconds. The takeaway: many tiny string calls are dominated by fixed
overhead — batch them — while a few large ones are dominated by the linear transcode, which only
zero-copy layouts (rarely possible for strings, since the encodings differ) can avoid.
A concrete example makes the batching point sharp. Suppose you need to uppercase 10,000 short labels.
Calling the module once per label pays the ~300 ns fixed overhead 10,000 times — about 3 ms of pure
overhead before any work happens. Concatenating the labels with a separator, sending them as one buffer,
and splitting the result on the JavaScript side pays that overhead once and lets the linear transcode
dominate, where it belongs. The same logic applies in reverse: if a module produces many small strings,
have it write them into one contiguous buffer with a length table rather than returning each through its
own call. When you cannot batch — interactive, one-string-at-a-time work — prefer encodeInto over
encode to shave the intermediate allocation, and reuse a single scratch buffer across calls so the
allocator is not churning. None of these tricks change the asymptotics, but for string-heavy boundaries
they are routinely the difference between marshaling being invisible and marshaling being the profile.
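The batching idea can be sketched as follows. Several things here are assumptions rather than facts about any real module: makeMockUppercaser and uppercaseAll are hypothetical names, the stand-in uppercase_ascii works in place and returns the unchanged byte length, and the labels are assumed never to contain the "\n" separator.

```javascript
// Hypothetical stand-in for a module exporting an in-place uppercase_ascii.
function makeMockUppercaser() {
  const memory = new WebAssembly.Memory({ initial: 1 });
  let next = 8;
  let calls = 0;
  return {
    memory,
    alloc(size) {
      const ptr = next;
      next += size;
      return ptr;
    },
    dealloc() {}, // the bump allocator in this mock never reclaims
    // Uppercases ASCII a-z in place; returns the (unchanged) byte length.
    uppercase_ascii(ptr, len) {
      calls++;
      const view = new Uint8Array(memory.buffer, ptr, len);
      for (let i = 0; i < len; i++) {
        if (view[i] >= 0x61 && view[i] <= 0x7a) view[i] -= 0x20;
      }
      return len;
    },
    callCount: () => calls,
  };
}

// One boundary crossing for the whole batch instead of one per label.
function uppercaseAll(exports, labels) {
  const bytes = new TextEncoder().encode(labels.join("\n"));
  const ptr = exports.alloc(bytes.length);
  try {
    new Uint8Array(exports.memory.buffer, ptr, bytes.length).set(bytes);
    const outLen = exports.uppercase_ascii(ptr, bytes.length);
    const out = new Uint8Array(exports.memory.buffer, ptr, outLen);
    return new TextDecoder().decode(out).split("\n"); // decode before freeing
  } finally {
    exports.dealloc(ptr, bytes.length);
  }
}

const upper = makeMockUppercaser();
const result = uppercaseAll(upper, ["alpha", "beta", "caf\u00e9"]);
console.log(result.join("|"), upper.callCount()); // ALPHA|BETA|CAFé 1
```

The multibyte é passes through untouched because both of its UTF-8 bytes fall outside the a-z range. Note that an empty labels array would come back as a single empty string, so a caller that cares should handle that case before the call.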
Frequently Asked Questions
Can I avoid the copy entirely for strings?
Almost never, because JavaScript stores UTF-16 and Wasm wants UTF-8 — the transcode itself is a copy.
TextEncoder.encodeInto lets you encode directly into linear memory (one pass instead of two), which
is the closest you get. Truly zero-copy hand-offs are for byte buffers that need no transcode.
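For contrast, here is what a zero-copy hand-off of plain bytes looks like: a typed-array view is a window onto linear memory, while slice() takes a snapshot. The offset and length below are hypothetical values standing in for a (ptr, len) pair returned by a module.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 });
const ptr = 32; // hypothetical pointer returned by a module
const len = 4; // hypothetical length returned alongside it

// Stand-in for the module writing its result bytes.
new Uint8Array(memory.buffer, ptr, len).set([1, 2, 3, 4]);

const view = new Uint8Array(memory.buffer, ptr, len); // window, no copy
const copy = view.slice(); // independent snapshot

// A later write to linear memory shows through the view but not the copy.
new Uint8Array(memory.buffer)[ptr] = 99;
console.log(view[0], copy[0]); // 99 1
```

The view is only valid until the buffer is detached by a grow or the module frees the region, which is the price of skipping the copy.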
Why does the decoded string sometimes have a replacement character (�)?
You decoded a byte range that does not start and end on UTF-8 code-point boundaries — usually a wrong
len, or slicing a multibyte character in half. Confirm len is the exact encoded byte count the module
wrote, and that you are not truncating the view.
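The failure is easy to reproduce by cutting the encoded bytes one short. The sketch below also shows the TextDecoder fatal option, which turns silent replacement into an exception and can help catch a wrong len during development.

```javascript
const bytes = new TextEncoder().encode("caf\u00e9"); // 63 61 66 c3 a9
const truncated = bytes.subarray(0, 4); // drops a9, cutting the 'é' in half

// Default decoding substitutes U+FFFD for the incomplete sequence.
const lossy = new TextDecoder().decode(truncated);
console.log(lossy === "caf\ufffd"); // true

// With fatal: true the same bytes are rejected instead of patched.
let rejected = false;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(truncated);
} catch {
  rejected = true;
}
console.log(rejected); // true
```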
Do I free the input buffer before or after reading the output?
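After the call returns, never before. The module may still reference the input while producing the output. Once the call
has returned, free the input, then decode and free the output; if the result aliases the input buffer (an in-place transform), decode before freeing. Never reuse a pointer after dealloc.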
Related
- Returning structs from wasm to javascript — the same (ptr,len) out-pattern for non-string data.
- wasm-bindgen deep dive — the generated glue that automates this string handling.
- Reading linear memory with typed arrays — building the views that copy these bytes.