AI Summary
Branch misprediction incurs substantial performance penalties on data-dependent branches, where existing hardware predictors and profile-guided approaches deliver limited gains. This paper proposes an LLVM IR-level compiler optimization that pioneers the application of sequence alignment to branch elimination: it merges semantically similar control-flow paths at the instruction level and enforces semantic correctness via operand-level safety guards, without requiring hardware predication support. Unlike conventional if-conversion, the approach avoids x86's restrictions on memory instructions and the high overhead of full speculation by combining IR-level path alignment, static analysis, and lightweight runtime checks. Evaluated on 102 benchmarks, it achieves a 10.9% geometric mean speedup, with peak improvements of up to 32x, and introduces significantly lower static instruction overhead than baseline methods.
Abstract
Branch mispredictions impose severe performance penalties on modern processors. While hardware predictors and profile-guided techniques exist, data-dependent branches with irregular patterns remain challenging. Traditional if-conversion eliminates branches via software predication but faces limitations on architectures such as x86: it often fails on paths containing memory instructions, or it incurs excessive instruction overhead by fully speculating large branch bodies.
This paper presents Melding IR Instructions (MERIT), a compiler transformation that eliminates branches by aligning and melding similar operations from divergent paths at the IR instruction level. Building on the observation that divergent paths often perform structurally similar operations with different operands, MERIT adapts sequence alignment to discover merging opportunities and employs safe operand-level guarding to ensure semantic correctness without hardware predication. Implemented as an LLVM pass and evaluated on 102 programs from four benchmark suites, MERIT achieves a geometric mean speedup of 10.9%, with peak improvements of 32x, compared to relying on the hardware branch predictor, demonstrating its effectiveness with reduced static instruction overhead.
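The alignment-and-melding idea can be illustrated with a small sketch. This is not MERIT's implementation: the instruction encoding as `(opcode, operands)` tuples, the scoring values, the `align` and `meld` helpers, and the textual `select(c, a, b)` notation are all assumptions made for illustration. It aligns two straight-line paths with a Needleman-Wunsch-style pass that only pairs identical opcodes, then emits one merged instruction per matched pair, choosing differing operands with a select on the branch condition. How MERIT treats unmatched instructions and how it makes memory operations safe is not described in the abstract, so the sketch only flags them.

```python
from typing import List, Optional, Tuple

# Hypothetical instruction encoding: (opcode, operands).
Instr = Tuple[str, Tuple[str, ...]]
Pair = Tuple[Optional[Instr], Optional[Instr]]


def align(a: List[Instr], b: List[Instr], match: int = 2, gap: int = -1) -> List[Pair]:
    """Global alignment of two instruction paths; only equal opcodes may pair."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(score[i - 1][j], score[i][j - 1]) + gap
            if a[i - 1][0] == b[j - 1][0]:
                best = max(best, score[i - 1][j - 1] + match)
            score[i][j] = best

    # Trace back from the bottom-right corner to recover the pairing.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and a[i - 1][0] == b[j - 1][0]
                and score[i][j] == score[i - 1][j - 1] + match):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def meld(pairs: List[Pair], cond: str = "c") -> List[str]:
    """Emit one instruction per matched pair, selecting operands that differ."""
    out = []
    for t, f in pairs:
        if t is not None and f is not None:
            # Assumes equal opcodes imply equal operand counts.
            ops = [x if x == y else f"select({cond}, {x}, {y})"
                   for x, y in zip(t[1], f[1])]
            out.append(f"{t[0]} " + ", ".join(ops))
        elif t is not None:
            out.append(f"[unmatched, then-only] {t[0]} " + ", ".join(t[1]))
        else:
            out.append(f"[unmatched, else-only] {f[0]} " + ", ".join(f[1]))
    return out


then_path = [("load", ("p",)), ("add", ("x", "1")), ("store", ("x", "q"))]
else_path = [("load", ("r",)), ("mul", ("y", "2")), ("add", ("y", "1")),
             ("store", ("y", "q"))]

pairs = align(then_path, else_path)
melded = meld(pairs)
for line in melded:
    print(line)
```

In this toy input, the `load`, `add`, and `store` instructions of both paths pair up and the `mul` that exists only on the else path is left unmatched. A real pass would also have to prove that a merged memory access is safe on both paths, which the abstract's "safe operand-level guarding" is meant to address.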