Reusing Legacy Code in WebAssembly: Key Challenges of Cross-Compilation and Code Semantics Preservation

📅 2024-12-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Cross-compiling legacy C/C++ code to WebAssembly (Wasm) can cause semantic divergence, silent miscompilation, and compile-time errors, undermining reliability. Method: An empirical study of 115 open-source C/C++ codebases identifies the key cross-compilation challenges, categorizes them by root cause, and proposes corresponding solutions. A differential-testing framework, WasmChecker, then checks semantic equivalence by comparing the behavior of native x86-64 and WebAssembly binaries built from the same source. Contribution/Results: The study attributes semantic deviations to four root causes: differing standard library implementations, unsupported system calls/APIs, Wasm-specific features, and compiler bugs. WasmChecker uncovered 11 new bugs in the Emscripten compiler toolchain, all confirmed by Emscripten developers. The framework and the dataset of open-source codebases are publicly available.

📝 Abstract
WebAssembly (Wasm) has emerged as a powerful technology for executing high-performance code and reusing legacy code in web browsers. With its increasing adoption, ensuring the reliability of WebAssembly code becomes paramount. In this paper, we investigate how well WebAssembly compilers support code reuse. Specifically, we ask (1) what challenges arise when cross-compiling a high-level language codebase into WebAssembly and (2) how faithfully WebAssembly compilers preserve code semantics in the resulting binary. Through a study of 115 open-source codebases, we identify the key challenges in cross-compiling legacy C/C++ code into WebAssembly, highlighting the risks of silent miscompilation and compile-time errors. We categorize these challenges by root cause and propose corresponding solutions. We then introduce a differential testing approach, implemented in a framework named WasmChecker, to investigate the semantic equivalence of code between native x86-64 and WebAssembly binaries. Using WasmChecker, we show that WebAssembly compilers do not necessarily preserve code semantics when cross-compiling high-level language code into WebAssembly, due to different implementations of standard libraries, unsupported system calls/APIs, WebAssembly's unique features, and compiler bugs. Furthermore, we have identified 11 new bugs in the Emscripten compiler toolchain, all confirmed by Emscripten developers. As a proof of concept, we make our framework and the collected dataset of open-source codebases publicly available.
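The differential-testing idea described above (build the same source natively and to Wasm, run both, compare what they do) can be illustrated with a minimal Python sketch. It is not WasmChecker's implementation: it assumes `gcc`, Emscripten's `emcc`, and Node.js are installed and on the `PATH`, and the helper names (`run`, `diff_results`, `check_source`) and output file names are hypothetical. It compares only standard output and exit code for a single input.

```python
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class RunResult:
    """Observable behavior of one program execution."""
    stdout: str
    stderr: str
    exit_code: int


def run(cmd: List[str], stdin: str = "", timeout: float = 30.0) -> RunResult:
    """Run `cmd`, feed it `stdin`, and capture stdout, stderr, and exit code."""
    proc = subprocess.run(
        cmd, input=stdin, capture_output=True, text=True, timeout=timeout
    )
    return RunResult(proc.stdout, proc.stderr, proc.returncode)


def diff_results(native: RunResult, wasm: RunResult) -> List[str]:
    """List discrepancies between two runs; [] means none were observed."""
    issues: List[str] = []
    if native.exit_code != wasm.exit_code:
        issues.append(f"exit code: native={native.exit_code} wasm={wasm.exit_code}")
    # stderr is deliberately not compared: runtimes word diagnostics differently.
    if native.stdout != wasm.stdout:
        issues.append("stdout differs")
    return issues


def check_source(src: str, stdin: str = "") -> List[str]:
    """Build `src` natively and with Emscripten, run both on `stdin`, compare."""
    # Assumption: gcc, emcc and node are on PATH; file names are hypothetical.
    subprocess.run(["gcc", "-O2", src, "-o", "prog_native"], check=True)
    # EXIT_RUNTIME=1 is intended to flush stdio at exit and propagate main's
    # return value as the Node.js process exit code.
    subprocess.run(
        ["emcc", "-O2", "-sEXIT_RUNTIME=1", src, "-o", "prog_wasm.js"], check=True
    )
    native = run(["./prog_native"], stdin)
    wasm = run(["node", "prog_wasm.js"], stdin)
    return diff_results(native, wasm)
```

For a hypothetical `legacy.c`, `check_source("legacy.c")` returns an empty list when both builds agree on that input. Because differential testing only observes the inputs it actually runs, an empty list is evidence of agreement on that input, not a proof of semantic equivalence.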
Problem

WebAssembly
code reuse
migration challenges
Innovation

WebAssembly
Code Semantic Consistency
Emscripten Compiler Bugs

Sara Baradaran
University of Southern California, USA
Liyan Huang
University of Southern California, USA
Mukund Raghothaman
University of Southern California
Weihang Wang
University of Southern California, USA