Inside Liftoff: Streaming Compilation, Single-Pass Code Gen, and How WebAssembly Outruns JavaScript’s Startup

If you’ve ever watched a large WebAssembly module load in the browser, you’ve seen what looks like a magic trick: a few tens of megabytes of binary arrive over the network, and almost instantly the module runs. The magic has a name—Liftoff, V8’s baseline compiler for WebAssembly—and it’s built on a trio of ideas that work together: streaming compilation, single-pass code generation, and a carefully engineered interface with JavaScript’s heap. In this deep-dive, I’ll walk through how Liftoff works, why it makes WebAssembly a credible alternative to JavaScript for performance-critical modules, and what happens under the hood when your browser turns raw bytes into native code before the download even finishes.

WebAssembly modules can be huge—game engines, scientific computation libraries, image processing pipelines. The traditional approach was to wait for the entire binary to buffer into an ArrayBuffer, then parse and compile it synchronously. That meant a long, blocking pause before a single instruction could execute. The streaming compilation API changed everything. By using `WebAssembly.instantiateStreaming()` or `WebAssembly.compileStreaming()`, the engine can start compiling as soon as the first bytes arrive, overlapping network transfer and compilation. This isn’t just pipelining in the usual sense; it requires a compiler that can work on chunks of raw bytecode without seeing the whole module first. Liftoff was designed from the ground up to do exactly that.
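As a rough illustration of the streaming path described above, here is a minimal sketch of a loader. The helper name `loadWasm` is my own, not part of any API; it assumes the server sends the module with the `application/wasm` MIME type, which the streaming functions require, and it falls back to the buffer-then-compile path where streaming is unavailable.

```javascript
// Hypothetical helper (not a platform API): prefer streaming instantiation,
// fall back to buffering the whole module when streaming is not available.
async function loadWasm(responseOrPromise, imports = {}) {
  if (typeof WebAssembly.instantiateStreaming === "function") {
    // The engine may decode and compile while the body is still downloading.
    // The response must be served with Content-Type: application/wasm.
    return WebAssembly.instantiateStreaming(responseOrPromise, imports);
  }
  // Fallback: wait for every byte, then compile and instantiate.
  const response = await responseOrPromise;
  const bytes = await response.arrayBuffer();
  return WebAssembly.instantiate(bytes, imports);
}
```

In a page this might be called as `const { instance } = await loadWasm(fetch("/app.wasm"), imports);`, where the URL and the import object are placeholders for whatever your module needs.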

The heart of Liftoff is single-pass code generation. Unlike optimizing compilers such as TurboFan, which build a graph-based intermediate representation and perform multiple analysis passes, Liftoff emits machine code immediately for each instruction it decodes. There’s no IR and no global optimization phase. The decoder calls back into Liftoff as it validates and parses the WebAssembly bytecode, and Liftoff responds by generating native instructions on the spot. This callback-driven approach means the compiler keeps pace with the decoder, which itself runs as chunks come off the wire. The result is compilation throughput on the order of tens of megabytes per second, bounded primarily by CPU speed rather than by any staging delay.
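To make the callback-driven shape concrete, here is a toy decoder that accepts bytes in arbitrary chunks and invokes a callback as soon as each instruction is complete. It is a sketch under heavy assumptions: the class name is mine, it understands only three real opcodes (`local.get`, `local.set`, `i32.add`), and it works byte by byte purely for illustration. V8’s real decoder is far more involved.

```javascript
// Toy streaming decoder. Only three opcodes are handled; the class and its
// callback signature are invented for this example.
const OPCODES = {
  0x20: { name: "local.get", hasImmediate: true },
  0x21: { name: "local.set", hasImmediate: true },
  0x6a: { name: "i32.add", hasImmediate: false },
};

class ToyStreamingDecoder {
  constructor(onInstruction) {
    this.onInstruction = onInstruction; // called once per complete instruction
    this.pending = null;                // opcode still waiting for its immediate
    this.value = 0;                     // partially decoded unsigned LEB128
    this.shift = 0;
  }

  // Feed one chunk; state carries over, so an instruction may span chunks.
  push(chunk) {
    for (const byte of chunk) {
      if (this.pending === null) {
        const op = OPCODES[byte];
        if (!op) throw new Error(`unsupported opcode 0x${byte.toString(16)}`);
        if (op.hasImmediate) {
          this.pending = op;
          this.value = 0;
          this.shift = 0;
        } else {
          this.onInstruction(op.name);
        }
      } else {
        // Unsigned LEB128: low 7 bits are payload, high bit means "more".
        this.value += (byte & 0x7f) * 2 ** this.shift;
        this.shift += 7;
        if ((byte & 0x80) === 0) {
          this.onInstruction(this.pending.name, this.value);
          this.pending = null;
        }
      }
    }
  }
}
```

A compiler in this style would do its code generation inside `onInstruction`, so no instruction is ever stored for a later pass.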

For this single-pass scheme to work, the compiler must know where each value lives (in a register, in a stack slot, or as a known constant) at the point it encounters each instruction. WebAssembly’s structured control flow (no `goto`; only blocks, loops, and if-trees) makes this tractable. Liftoff maintains a virtual operand stack that mirrors the WebAssembly specification’s stack semantics, but at compile time. Because validation fixes the height and types of the operand stack at every instruction, the compiler can map each value to a concrete location without any dynamic resolution. Where control flow merges, it emits whatever moves are needed so that every incoming path agrees on where each value lives. It’s a bit like a just-in-time register allocator that runs once per function body, guided entirely by the instruction stream and by the guarantee that control flow is as structured as the bytecode insists. This yields correct, though not yet optimized, machine code. The performance is adequate (my rough estimate is 30–50% of peak, and it varies by workload), but the real win is in startup latency.
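The compile-time operand stack is easier to see in miniature. The following is a toy single-pass code generator, not Liftoff’s actual algorithm: the instruction subset, the register names, the spill policy, and the emitted pseudo-assembly are all assumptions made for illustration. It tracks each value as a constant, a register, or a spill slot, and emits code in the same loop that reads the instructions.

```javascript
// Toy single-pass code generator. The instruction subset, register names,
// and emitted "assembly" are all invented for illustration.
function compileSinglePass(instructions, registerNames = ["r0", "r1", "r2"]) {
  const free = [...registerNames]; // registers not holding a live value
  const stack = [];                // compile-time model of the operand stack
  const code = [];                 // emitted pseudo-assembly, in order
  let nextSpillSlot = 0;

  // Return a usable register, spilling the oldest register-resident
  // stack value to memory if none is free.
  const allocReg = () => {
    if (free.length > 0) return free.shift();
    const victim = stack.find((v) => v.kind === "reg");
    if (!victim) throw new Error("out of registers");
    const reg = victim.reg;
    code.push(`mov [sp+${4 * nextSpillSlot}], ${reg}`);
    Object.assign(victim, { kind: "slot", slot: nextSpillSlot++, reg: undefined });
    return reg;
  };

  // Make sure a popped value sits in a register and return that register.
  const toReg = (v) => {
    if (v.kind === "reg") return v.reg;
    const reg = allocReg();
    code.push(v.kind === "const"
      ? `mov ${reg}, ${v.value}`
      : `mov ${reg}, [sp+${4 * v.slot}]`);
    return reg;
  };

  for (const [op, arg] of instructions) {
    switch (op) {
      case "i32.const": // no code yet: just remember the constant
        stack.push({ kind: "const", value: arg });
        break;
      case "local.get": {
        const reg = allocReg();
        code.push(`mov ${reg}, [fp+${4 * arg}]`);
        stack.push({ kind: "reg", reg });
        break;
      }
      case "i32.add": {
        const rhs = stack.pop();
        const lhs = stack.pop();
        const dst = toReg(lhs);
        if (rhs.kind === "const") {
          code.push(`add ${dst}, ${rhs.value}`); // constant folded into the add
        } else {
          const src = toReg(rhs);
          code.push(`add ${dst}, ${src}`);
          free.push(src);
        }
        stack.push({ kind: "reg", reg: dst });
        break;
      }
      case "local.set": {
        const reg = toReg(stack.pop());
        code.push(`mov [fp+${4 * arg}], ${reg}`);
        free.push(reg);
        break;
      }
      default:
        throw new Error(`unsupported instruction: ${op}`);
    }
  }
  return code;
}
```

Note that `i32.const` emits nothing on its own: the constant stays on the virtual stack until an instruction consumes it, which is the kind of cheap, local decision a single-pass compiler can still afford.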

But WebAssembly code doesn’t live in a vacuum; it needs memory to do useful work. The interface to JavaScript’s heap is a central design point. WebAssembly linear memory is exposed to JavaScript as a `WebAssembly.Memory` object, whose `buffer` property is an `ArrayBuffer` holding the actual bytes that the module reads and writes. Sizes are given in 64 KiB pages: `new WebAssembly.Memory({ initial: 256, maximum: 512 })` returns a memory whose `buffer` is 256 pages (16 MiB) long and which can grow to at most 512 pages (32 MiB). WebAssembly code sees a flat address space running from 0 up to the current size in bytes, and the engine maps that directly onto the underlying buffer. Because JavaScript and WebAssembly share the same physical memory, no copying is needed: JS can write through a `TypedArray` view of the buffer, and the Wasm module sees the change immediately.
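Here is a small sketch of that sharing in action. The module is hand-assembled for the example: it imports a memory as `env.mem` and exports one function, `peek`, that loads the 32-bit integer at address 0. Those names, and the helper `demoSharedMemory`, are my own choices rather than anything the platform prescribes.

```javascript
// Hand-assembled module: (import "env" "mem" (memory 1))
// (func (export "peek") (result i32) i32.const 0  i32.load)
const PEEK_MODULE = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic, version 1
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,             // type: () -> i32
  0x02, 0x0c, 0x01, 0x03, 0x65, 0x6e, 0x76,             // import "env"
  0x03, 0x6d, 0x65, 0x6d, 0x02, 0x00, 0x01,             //   "mem": memory, min 1 page
  0x03, 0x02, 0x01, 0x00,                               // one function of type 0
  0x07, 0x08, 0x01, 0x04, 0x70, 0x65, 0x65, 0x6b, 0x00, 0x00, // export "peek"
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code: one body, no locals
  0x41, 0x00, 0x28, 0x02, 0x00, 0x0b,                   // i32.const 0; i32.load; end
]);

function demoSharedMemory() {
  const memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });
  const wasmModule = new WebAssembly.Module(PEEK_MODULE);
  const instance = new WebAssembly.Instance(wasmModule, { env: { mem: memory } });

  // JavaScript writes into the buffer (Wasm memory is little-endian)...
  new DataView(memory.buffer).setUint32(0, 42, true);

  // ...and the Wasm function reads the same bytes, with no copy in between.
  return {
    byteLength: memory.buffer.byteLength, // one 64 KiB page
    peeked: instance.exports.peek(),
  };
}
```

The synchronous `WebAssembly.Module` constructor is used only because the module is tiny; for anything real, the asynchronous or streaming entry points are the better fit.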

There’s a subtlety, though, around memory growth. When `memory.grow` is called on a non-shared memory, the old `ArrayBuffer` object is detached and `memory.buffer` returns a new, larger one. Any existing `TypedArray` views into the old buffer become unusable (their length drops to zero), so JavaScript must re-create them after every grow. On the engine side, growth may move the backing store, so compiled code must never rely on a stale base address. Both Liftoff and TurboFan emit code that accesses memory through an indirection, typically a “memory base” value that the runtime updates on growth. This way, even though Liftoff’s code is generated quickly and never patched globally, memory operations stay safe and correct across resizes. The detached-buffer dance is handled transparently by the embedding, but it’s a critical part of what makes the heap interface robust.
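The detachment is easy to observe from JavaScript. This sketch (the function name `demoGrow` is mine) grows a one-page, non-shared memory and reports what happens to the old buffer and view.

```javascript
// Observe what memory.grow does to an existing buffer and view.
function demoGrow() {
  const memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });
  const oldBuffer = memory.buffer;
  const oldView = new Uint8Array(oldBuffer);
  oldView[0] = 7;

  // grow() takes a page count and returns the previous size in pages.
  const previousPages = memory.grow(1);

  // The old buffer is now detached; a fresh view must be created.
  const newView = new Uint8Array(memory.buffer);
  return {
    previousPages,
    oldByteLength: oldBuffer.byteLength,  // 0 once detached
    oldViewLength: oldView.length,        // 0 once detached
    newByteLength: memory.buffer.byteLength,
    preservedByte: newView[0],            // contents carry over
  };
}
```

The practical rule that falls out of this: never cache a view across a call that might grow memory. Re-read `memory.buffer` each time instead.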

Liftoff isn’t alone. V8 uses a tiered compilation strategy specifically for WebAssembly. In recent V8 versions, compilation is lazy by default: when a module is loaded, no function is compiled right away, and Liftoff compiles each function only when it is first called. (Earlier versions instead compiled every function eagerly with Liftoff as the bytes streamed in.) Lazy compilation means that large modules with many functions that are never invoked in a given session don’t pay the compilation cost for unused code. When a function becomes hot (executed enough to trigger the tier-up heuristics), TurboFan recompiles it with full optimizations: global register allocation, instruction scheduling, constant propagation, and inlining decisions that Liftoff never attempts. Once the optimized code is ready, new calls to the function use it; V8 does not perform on-stack replacement for WebAssembly, so invocations already in progress finish in Liftoff code. And because WebAssembly has static types and, in most cases, no dynamic deoptimization points, TurboFan can be more aggressive than it is with JavaScript, yielding code that often reaches near-native speed.
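The control flow of lazy compilation plus tier-up can be modeled in a few lines. This is a toy, not V8’s mechanism: the call-count threshold, the function name `makeTieredFunction`, and the stand-in “compilers” are all assumptions, and V8’s real heuristics are more elaborate than a simple counter.

```javascript
// Toy model of lazy baseline compilation plus tier-up. The threshold and the
// "compilers" are invented for illustration; V8's real heuristics differ.
function makeTieredFunction(compileBaseline, compileOptimized, hotThreshold = 3) {
  let impl = null;   // currently installed code, if any
  let tier = "none";
  let calls = 0;

  const call = (...args) => {
    if (impl === null) {
      impl = compileBaseline();   // first call: compile lazily, fast and simple
      tier = "baseline";
    }
    calls += 1;
    const result = impl(...args); // this invocation finishes in its current tier
    if (tier === "baseline" && calls >= hotThreshold) {
      impl = compileOptimized();  // later calls pick up the optimized code
      tier = "optimized";
    }
    return result;
  };
  call.tier = () => tier;
  return call;
}
```

A function that is never called never reaches either compiler, and a function called only once or twice never pays for the expensive one.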

This tiering, from baseline-fast to optimized-fast, is the engine’s answer to the warm-up problem that plagues JavaScript JITs. A pure TurboFan approach would mean a large module could take seconds to compile before any function runs. Liftoff cuts that to a fraction of the time, and because streaming lets decoding and compilation overlap with the download, execution can begin shortly after the last bytes arrive rather than after a separate compile phase. It feels like instant loading, and it’s a large part of why WebAssembly is displacing heavy JavaScript in domains like image filtering, 3D rendering, and cryptographic operations.

One more piece of the puzzle: code caching. Liftoff-compiled code is not cached to disk. Why? Because regenerating it from the bytecode with Liftoff is so fast that it’s nearly as quick as reading a cache, and the space saved by not storing baseline code can be put toward caching the far more valuable TurboFan output. When a module is loaded through one of the streaming APIs (`compileStreaming` or `instantiateStreaming`), the TurboFan versions of hot functions are cached incrementally after they’re generated. On subsequent loads, the browser can skip the TurboFan recompilation for those functions and go straight to optimized code. Liftoff keeps startup fast enough that the first-load experience doesn’t suffer.

What makes WebAssembly a credible alternative to JavaScript for performance-critical modules isn’t raw peak speed alone—it’s the combination of predictable execution, static typing that avoids deoptimization storms, and this compiler pipeline that delivers low-latency first interaction. Liftoff is the quiet enabler: by decoupling “start running” from “achieve peak performance,” it lets developers write in C, Rust, or any language targeting Wasm and trust that the browser will start executing code almost as soon as the binary begins to arrive. The design constraints—single-pass, virtual stack, callback-driven, heap indirection—flow directly from the need to stream-compile without sacrificing WebAssembly’s safety guarantees. It’s a masterclass in trade-offs, and it’s one of the main reasons WebAssembly feels less like a plugin and more like a native part of the web platform.
