I followed a single WebAssembly function from its raw bytes arriving over the network all the way to machine code executing at peak speed, and what I found was a carefully staged pipeline: two compilers, each with a different definition of “fast,” working in tandem so that you rarely have to wait.
The journey begins before the module has even finished downloading. While bytes are still arriving, V8’s streaming decoder hands each function body to Liftoff, V8’s baseline compiler. (When V8’s lazy compilation is in effect, Liftoff instead compiles each function on its first call.) Liftoff works in a single pass: it decodes an instruction, emits a handful of machine instructions directly, and moves on. There is no intermediate representation and no global analysis, just a direct translation of WebAssembly’s virtual stack operations into real registers and stack slots. Because compilation overlaps the download, the module can be instantiated almost as soon as its last byte arrives, and the first call can run code that is already live. That is the near-instant startup you feel. Liftoff’s code is correct and safe, but it is not fast in the way TurboFan can make it; it is just fast enough to begin.
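The toy translator below makes the single-pass idea concrete. It is not Liftoff’s code. The opcode subset, the four-register file (`REGISTER_FILE`), the pseudo-assembly syntax, and the `compileSinglePass` helper are all assumptions for illustration. The translator walks the opcode list once and tracks which register holds each value on WebAssembly’s virtual stack. It emits output immediately, with no IR in between.

```typescript
// Toy single-pass translator in the spirit of a baseline compiler.
// Not V8 code: opcodes, registers, and output syntax are illustrative.
type StackOp =
  | { kind: "i32.const"; value: number }
  | { kind: "local.get"; index: number }
  | { kind: "i32.add" }
  | { kind: "i32.mul" };

// Hypothetical register file; spilling to stack slots is not modeled.
const REGISTER_FILE = ["r0", "r1", "r2", "r3"];

function compileSinglePass(ops: StackOp[]): string[] {
  const out: string[] = [];
  const virtualStack: string[] = []; // register holding each wasm stack value
  const freeRegs = [...REGISTER_FILE];

  const alloc = (): string => {
    const reg = freeRegs.shift();
    if (reg === undefined) throw new Error("out of registers (spilling not modeled)");
    return reg;
  };

  for (const op of ops) {
    switch (op.kind) {
      case "i32.const": {
        const reg = alloc();
        out.push(`mov ${reg}, ${op.value}`);
        virtualStack.push(reg);
        break;
      }
      case "local.get": {
        const reg = alloc();
        out.push(`load ${reg}, [frame + local${op.index}]`);
        virtualStack.push(reg);
        break;
      }
      case "i32.add":
      case "i32.mul": {
        // Pop two operands, emit one instruction, push the result register.
        const rhs = virtualStack.pop();
        const lhs = virtualStack.pop();
        if (lhs === undefined || rhs === undefined) throw new Error("stack underflow");
        out.push(`${op.kind === "i32.add" ? "add" : "mul"} ${lhs}, ${rhs}`);
        freeRegs.unshift(rhs); // rhs is dead after the operation
        virtualStack.push(lhs); // the result lives in lhs
        break;
      }
    }
  }
  return out;
}

// (local0 + 2) * local1, written as a wasm-style stack program.
const asm = compileSinglePass([
  { kind: "local.get", index: 0 },
  { kind: "i32.const", value: 2 },
  { kind: "i32.add" },
  { kind: "local.get", index: 1 },
  { kind: "i32.mul" },
]);
console.log(asm.join("\n"));
```

A real baseline compiler must also handle control flow and spill values to stack slots when registers run out. This sketch simply throws in that case.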
While that first call executes, a second timeline is already ticking in the background. V8 tracks how hot each function is: Liftoff code decrements a per-function budget as it runs, including on loop back edges, so a function that spins in a long loop warms up just as surely as one that is called often. When the budget runs out, a tier-up is triggered. The trigger does not perform on-stack replacement; the currently executing Liftoff frame is left to finish. Instead, the engine queues the function for recompilation by TurboFan, and a new copy of the machine code is built on a background thread. TurboFan is the heavy tier: a multi-pass optimizing compiler that builds a full “Sea of Nodes” graph, performs aggressive register allocation, eliminates redundant loads, and inlines called functions. Because WebAssembly is statically typed, it does not need the speculative type guards TurboFan emits for JavaScript. Where V8 collects feedback during the Liftoff runs, it uses that feedback mainly to guide inlining, for example of indirect calls. The result is a tighter, specialized version of the same logic that uses every trick in the optimizing compiler’s book to shave cycles.
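The warm-up logic can be modeled as a budget counter. This is a minimal sketch under assumed numbers. `TIER_UP_BUDGET`, the per-event costs, and the helpers `chargeBudget`, `drainCompileQueue`, and `newFunction` are hypothetical. The background compile thread is simulated by an explicit drain step.

```typescript
// Toy model of budget-based tier-up. The budget and costs are assumed
// values for illustration, not V8's real tuning.
const TIER_UP_BUDGET = 1000;

interface FunctionState {
  label: string;
  budget: number;
  tier: "liftoff" | "turbofan";
  queued: boolean;
}

// Functions waiting for the (simulated) background optimizing compiler.
const compileQueue: FunctionState[] = [];

// Called by simulated Liftoff code on function entry and loop back edges.
function chargeBudget(fn: FunctionState, cost: number): void {
  if (fn.tier !== "liftoff" || fn.queued) return; // already queued or optimized
  fn.budget -= cost;
  if (fn.budget <= 0) {
    fn.queued = true; // request tier-up once; execution stays in Liftoff
    compileQueue.push(fn);
  }
}

// Stand-in for the background thread finishing its work.
function drainCompileQueue(): void {
  for (const fn of compileQueue.splice(0)) {
    fn.tier = "turbofan"; // in a real engine: publish new machine code
    fn.queued = false;
  }
}

function newFunction(label: string): FunctionState {
  return { label, budget: TIER_UP_BUDGET, tier: "liftoff", queued: false };
}

const hot = newFunction("hot");
const cold = newFunction("cold");

// One call to `hot` that runs a 300-iteration loop (cost 5 per back edge).
chargeBudget(hot, 10);
for (let i = 0; i < 300; i++) chargeBudget(hot, 5);

// Two short calls to `cold`.
chargeBudget(cold, 10);
chargeBudget(cold, 10);

const queuedBeforeDrain = compileQueue.map((f) => f.label);
drainCompileQueue();
console.log(hot.tier, cold.tier); // turbofan liftoff
```

A single call to `hot` exhausts its budget through loop back edges alone. That is why charging on back edges, and not only on calls, matters in this model.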
The handoff between the two tiers is seamless precisely because nothing is interrupted. When TurboFan finishes, V8 atomically patches the function’s entry in the module’s jump table, through which calls to that function are dispatched. Future calls therefore go straight to the TurboFan-generated code. Any call that is already running, still inside the original Liftoff frame, continues to completion with the baseline code it started with. There is no frame translation and no deoptimization checkpoint to resume from; the old frame simply returns, and the next call picks up the new code. This design choice (no on-stack replacement for WebAssembly) keeps the transition invisible and adds no bookkeeping to the switch-over itself. The flip side is that a call stuck in a long-running loop keeps executing Liftoff code until it returns. Still, speed arrives without a single frame dropped.
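The handoff can be modeled with an array standing in for the jump table. This is a hedged sketch. `jumpTable`, `callThroughTable`, and `publishOptimizedCode` are hypothetical names, and the two “tiers” are ordinary closures with identical behavior. The key point is that the slot is read once, when the call starts. A call already in flight therefore keeps the code it began with, while later calls see the patched slot.

```typescript
// Toy model of call dispatch through a per-module jump table.
// Names are hypothetical; both "tiers" are plain closures with identical behavior.
type WasmCode = (x: number) => number;

const liftoffSquare: WasmCode = (x) => x * x; // baseline code
const turbofanSquare: WasmCode = (x) => x * x; // optimized code, same semantics

const jumpTable: WasmCode[] = [liftoffSquare];
const tierLog: string[] = []; // which tier actually served each call

function callThroughTable(slot: number, x: number, whileRunning?: () => void): number {
  const code = jumpTable[slot]; // the slot is read once, when the call starts
  if (code === undefined) throw new Error(`no code in slot ${slot}`);
  tierLog.push(code === liftoffSquare ? "liftoff" : "turbofan");
  whileRunning?.(); // simulate the background compile finishing mid-call
  return code(x); // the in-flight call keeps the code it started with
}

// Tier-up publication: overwrite the slot; no live frame is touched.
function publishOptimizedCode(slot: number, code: WasmCode): void {
  jumpTable[slot] = code;
}

const first = callThroughTable(0, 3, () => publishOptimizedCode(0, turbofanSquare));
const second = callThroughTable(0, 4);
console.log(first, second, tierLog.join(",")); // 9 16 liftoff,turbofan
```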
The division of labor extends even to code caching. Liftoff code is never cached, because regenerating it costs about as much as loading it from disk: the one-pass compilation is that light. TurboFan output is different. It took intense analysis to produce, so when a module is compiled via `WebAssembly.compileStreaming` (or `WebAssembly.instantiateStreaming`), V8 caches the TurboFan-generated code once enough of it has accumulated. On the next load of the same module from the same URL, that cached optimized code can be used immediately, skipping the warm-up for those functions. Liftoff still compiles any function that was cold in the previous session, but the heavy lifting has already been done.
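The caching policy reduces to two rules: store only optimized code, and only once enough of it has accumulated. The following sketch makes several assumptions. It uses a hypothetical `CACHE_THRESHOLD_BYTES` value, keys the cache by URL, and uses made-up helpers (`maybeCache`, `startingTiers`). It models the policy, not V8’s serialization format or its real threshold.

```typescript
// Toy model of the caching policy: keep only optimized code, and only
// once enough of it exists. The threshold and URL keying are assumptions.
type Tier = "liftoff" | "turbofan";

const CACHE_THRESHOLD_BYTES = 1000000; // hypothetical

interface CompiledFunction {
  index: number;
  tier: Tier;
  sizeBytes: number;
}

const codeCache: Record<string, CompiledFunction[]> = {};

// Returns true if the module's optimized code was stored.
function maybeCache(moduleUrl: string, functions: CompiledFunction[]): boolean {
  const optimized = functions.filter((f) => f.tier === "turbofan"); // Liftoff code is dropped
  const total = optimized.reduce((sum, f) => sum + f.sizeBytes, 0);
  if (total < CACHE_THRESHOLD_BYTES) return false; // not worth serializing yet
  codeCache[moduleUrl] = optimized;
  return true;
}

// Starting tier of each function when the module is loaded again.
function startingTiers(moduleUrl: string, functionCount: number): Tier[] {
  const cached = (codeCache[moduleUrl] ?? []).map((f) => f.index);
  const tiers: Tier[] = [];
  for (let i = 0; i < functionCount; i++) {
    tiers.push(cached.indexOf(i) >= 0 ? "turbofan" : "liftoff");
  }
  return tiers;
}

const appUrl = "https://example.com/app.wasm"; // hypothetical module URL
const stored = maybeCache(appUrl, [
  { index: 0, tier: "turbofan", sizeBytes: 700000 },
  { index: 1, tier: "liftoff", sizeBytes: 50000 },
  { index: 2, tier: "turbofan", sizeBytes: 400000 },
]);
console.log(stored, startingTiers(appUrl, 3).join(",")); // true turbofan,liftoff,turbofan
```

Function 1 was cold in the first session, so on reload it starts in Liftoff again. This mirrors the last sentence of the paragraph above.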
Stepping back, what I traced is a deliberate construction of speed along two axes. Liftoff gives you the first frame now, streamed and unoptimized, because on the web the difference between zero milliseconds and fifty is the difference between a seamless experience and a glitch. TurboFan gives you the later frames fast, because once a function matters, every nanosecond counts. The two tiers aren’t competitors. They are a partnership that trades a little duplicated work on a background thread for a compound win: immediate responsiveness and eventual peak throughput, with no visible seam between them. That’s the hidden dance behind every fast WebAssembly module V8 loads.