On Abnormal Execution Timing of Conditional Jump Instructions

📅 2026-01-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses an anomalous timing discrepancy in modern processors, where conditional branch instructions exhibit significant performance variations due to differing offsets within the micro-operation cache and the L1 instruction cache. This phenomenon not only degrades performance but also enables high-bandwidth side-channel attacks. We present the first systematic measurement and modeling of this effect across mainstream Intel microarchitectures—including Skylake, Kaby Lake, and Coffee Lake—demonstrating its strong correlation with 32-byte alignment. Through microbenchmarks, cross-platform binary analysis, and cache behavior modeling, we confirm the ubiquity of this timing variation and show that enforcing 32-byte alignment improves performance by 2.15% on average (up to 10.54%). Furthermore, we exploit this timing channel to construct a covert channel achieving a throughput of 16.14 Mbps.

📝 Abstract
An extensive line of work on modern computing architectures has shown that the execution time of instructions can (i) depend on the operand of the instruction or (ii) be influenced by system optimizations, e.g., branch prediction and speculative execution paradigms. In this paper, we systematically measure and analyze timing variabilities in conditional jump instructions that can be macro-fused with a preceding instruction, depending on their placement within the binary. Our measurements indicate that these timing variations stem from the micro-op cache placement and the jump's offset in the L1 instruction cache of modern processors. We demonstrate that this behavior is consistent across multiple microarchitectures, including Skylake, Coffee Lake, and Kaby Lake, as well as various real-world implementations. We confirm the prevalence of this variability through extensive experiments on a large-scale set of popular binaries, including libraries from Ubuntu 24.04, Windows 10 Pro, and several open-source cryptographic libraries. We also show that one can easily avoid this timing variability by ensuring that macro-fusible instructions are 32-byte aligned - an approach initially suggested in 2019 by Intel in an overlooked short report. We quantify the performance impact of this approach across the cryptographic libraries, showing a speedup of 2.15% on average (and up to 10.54%) when avoiding the timing variability. As a by-product, we show that this variability can be exploited as a covert channel, achieving a maximum throughput of 16.14 Mbps.
Problem

Research questions and friction points this paper is trying to address.

conditional jump
execution timing
macro-fusion
micro-op cache
L1 instruction cache
Innovation

Methods, ideas, or system contributions that make the work stand out.

macro-fusion
timing variability
micro-op cache
covert channel
instruction alignment
👥 Authors

Annika Wilde
Ruhr University Bochum

Samira Briongos
NEC Laboratories Europe

Claudio Soriente
NEC Laboratories Europe

Ghassan O. Karame
Ruhr University Bochum