Clear Sky Science · en
Stack-based static WebAssembly binary slicing and mutation for generating valid sub-binaries
Why splitting code into smaller pieces matters
Modern web apps increasingly rely on WebAssembly, a compact machine-like language that lets programs written in C, C++, or Rust run at near‑native speed in browsers, clouds, and even tiny embedded devices. As more systems execute WebAssembly, we need safe, thorough ways to test these execution engines for bugs and security flaws. But real‑world WebAssembly programs are large, complex, and hard to collect in sufficient variety. This paper introduces a new way to carve existing WebAssembly programs into many smaller, self‑contained mini‑programs and then cleverly tweak them. The result is a rich supply of valid, bite‑sized test cases that can stress‑test WebAssembly engines far more effectively than today’s tools.

From one big program to many small ones
The authors start from the observation that simply generating random WebAssembly code almost never works: the language is tightly checked, especially around its use of a stack that holds temporary values. Instead of making code from scratch, they begin with an existing "base" WebAssembly binary and use a technique called program slicing. A slice is a reduced version of a program that keeps only the pieces of code that influence a chosen point of interest, such as a particular instruction inside a function. By following data flow, memory use, and control flow backwards from that point, the method extracts a compact fragment that still “means something” in terms of computation, rather than being random noise.
Keeping the invisible stack in balance
WebAssembly programs rely on a strict stack discipline: each instruction pops and pushes values, and structured blocks and branches must leave the stack in exactly the right shape when they finish. Naïvely cutting out parts of a program almost always breaks this discipline, making the result invalid. Earlier work repaired slices by padding them with fixed patterns of dummy instructions, which kept them runnable but limited their variety. This paper instead introduces a dedicated stack‑balance correction algorithm. It analyzes the flow of control inside each slice, finds places where branches or block endings disturb the expected stack shape, and then inserts tailored instruction sequences that drop unneeded values and synthesize new ones of the proper types. These small “gadgets” restore correctness while allowing much richer instruction patterns than simple padding.
Building whole mini‑programs with controlled complexity
A corrected slice is still just a function body, and that function may call others or depend on module‑level features like global variables and linear memory. The authors therefore extend their method to assemble complete, executable mini‑modules. Starting from a randomly chosen entry function, they repeatedly slice and fix any functions that are called, up to a user‑chosen "call depth" that controls how many layers of calls are preserved. Beyond that depth, calls are replaced with short stand‑ins that mimic the right number of arguments and results without pulling in more code. The system also reconstructs the surrounding sections of a WebAssembly module—types, tables, memory, and so on—so that each generated binary is self‑contained and can be run directly by standard tools and runtimes.

Adding variety through careful mutation
Even with slicing, many generated binaries still resemble their parent program. To broaden coverage, the authors introduce a mutation phase that works backwards through each function. For instructions that consume inputs, the algorithm randomly chooses a new result type, finds substitute instructions that produce that type, and then recursively mutates the earlier instructions that supply its inputs so that the whole chain still fits together. This preserves stack correctness while reshaping the computation. Compared with a baseline set of real‑world WebAssembly programs, the mutated binaries contain many more distinct instruction patterns and make much heavier use of rarely seen features such as vector operations, which are important for exercising the full implementation of WebAssembly engines.
What the experiments reveal
To test their approach, the authors implemented about 17,000 lines of Python covering nearly the full WebAssembly 2.0 standard. Using a large public dataset of thousands of real‑world binaries, they showed that their technique can routinely shrink programs to a fraction of their original instruction count—often below 40% of the largest baseline sizes—while keeping them valid in over 99% of cases when mutation is enabled. The method runs quickly, typically well under a second per generated binary of moderate size, and significantly increases both the range of instruction types and the number of unique instruction sequences observed. In practical terms, this means testers and security researchers can turn a modest corpus of WebAssembly programs into a large, diverse suite of small, valid binaries that are ideal for fuzzing, differential testing, and regression testing of WebAssembly runtimes.
How this helps keep WebAssembly safe
In everyday terms, the paper describes a smart “code refiner” for WebAssembly: it chops big programs into many smaller, meaningful chunks, automatically repairs them so they still follow the language’s strict rules, and then reshapes them to explore unusual corners of the instruction set. Because these mini‑programs remain realistic yet compact, they can be run quickly and in great numbers, increasing the chances of uncovering subtle bugs or security issues in the software that executes WebAssembly. As WebAssembly spreads from browsers into clouds and devices, such systematic generation of valid, diverse test binaries offers an important tool for keeping this ecosystem robust and secure.
Citation: Choi, G., Jeon, S. Stack-based static WebAssembly binary slicing and mutation for generating valid sub-binaries. Sci Rep 16, 10910 (2026). https://doi.org/10.1038/s41598-026-45837-y
Keywords: WebAssembly testing, program slicing, binary mutation, software security, runtime verification