On Abnormal Execution Timing of Conditional Jump Instructions

📅 2026-01-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses an anomalous timing discrepancy in modern processors, where conditional branch instructions exhibit significant performance variations due to differing offsets within the micro-operation cache and the L1 instruction cache. This phenomenon not only degrades performance but also enables high-bandwidth side-channel attacks. We present the first systematic measurement and modeling of this effect across mainstream Intel microarchitectures—including Skylake, Kaby Lake, and Coffee Lake—demonstrating its strong correlation with 32-byte alignment. Through microbenchmarks, cross-platform binary analysis, and cache behavior modeling, we confirm the ubiquity of this timing variation and show that enforcing 32-byte alignment improves performance by 2.15% on average (up to 10.54%). Furthermore, we exploit this timing channel to construct a covert channel achieving a throughput of 16.14 Mbps.

📝 Abstract
An extensive line of work on modern computing architectures has shown that the execution time of instructions can (i) depend on the operand of the instruction or (ii) be influenced by system optimizations, e.g., branch prediction and speculative execution paradigms. In this paper, we systematically measure and analyze timing variabilities in conditional jump instructions that can be macro-fused with a preceding instruction, depending on their placement within the binary. Our measurements indicate that these timing variations stem from the micro-op cache placement and the jump's offset in the L1 instruction cache of modern processors. We demonstrate that this behavior is consistent across multiple microarchitectures, including Skylake, Coffee Lake, and Kaby Lake, as well as various real-world implementations. We confirm the prevalence of this variability through extensive experiments on a large-scale set of popular binaries, including libraries from Ubuntu 24.04, Windows 10 Pro, and several open-source cryptographic libraries. We also show that one can easily avoid this timing variability by ensuring that macro-fusible instructions are 32-byte aligned - an approach initially suggested in 2019 by Intel in an overlooked short report. We quantify the performance impact of this approach across the cryptographic libraries, showing a speedup of 2.15% on average (and up to 10.54%) when avoiding the timing variability. As a by-product, we show that this variability can be exploited as a covert channel, achieving a maximum throughput of 16.14 Mbps.
Problem

Research questions and friction points this paper is trying to address.

conditional jump
execution timing
macro-fusion
micro-op cache
L1 instruction cache
Innovation

Methods, ideas, or system contributions that make the work stand out.

macro-fusion
timing variability
micro-op cache
covert channel
instruction alignment
👥 Authors

Annika Wilde
Ruhr University Bochum

Samira Briongos
NEC Laboratories Europe

Claudio Soriente
NEC Laboratories Europe

Ghassan O. Karame
Ruhr University Bochum