🤖 AI Summary
This paper addresses pacing inaccuracy in user-space QUIC implementations, which is caused by coarse-grained timers, system call overhead, and OS scheduling latency. We systematically evaluate pacing in three major implementations (quiche, picoquic, and ngtcp2) and quantify the impact of Linux qdiscs (especially FQ), Generic Segmentation Offload (GSO), and hardware offloading via the Earliest TxTime First (ETF) qdisc. We compare pacing accuracy between purely user-space and kernel-assisted approaches. Furthermore, we extend and evaluate a kernel patch that paces individual packets within a GSO buffer, enabling fine-grained scheduling while preserving batch-processing efficiency. Experiments show that picoquic with BBR achieves sub-millisecond pacing accuracy in pure user-space mode; that FQ is a qdisc well suited to pacing QUIC traffic, as shown with quiche; and that kernel-enhanced GSO pacing reduces timing jitter by 40%, which could benefit real-time applications such as video streaming and video calls.
📝 Abstract
Pacing is a key mechanism in modern transport protocols, used to regulate packet transmission timing to minimize traffic burstiness, lower latency, and reduce packet loss. Standardized in 2021, QUIC is a UDP-based protocol designed to improve upon the TCP/TLS stack. While the QUIC protocol recommends pacing, and congestion control algorithms like BBR rely on it, the user-space nature of QUIC introduces unique challenges. These challenges include coarse-grained timers, system call overhead, and OS scheduling delays, all of which complicate precise packet pacing. This paper investigates how pacing is implemented differently across QUIC stacks, including quiche, picoquic, and ngtcp2, and evaluates the impact of system-level features such as Generic Segmentation Offload (GSO) and Linux queuing disciplines (qdiscs) on pacing. Using a custom measurement framework and a passive optical fiber tap, we establish a baseline with default settings and systematically explore the effects of qdiscs, hardware offloading using the ETF qdisc, and GSO on pacing precision and network performance. We also extend and evaluate a kernel patch to enable pacing of individual packets within GSO buffers, combining batching efficiency with precise pacing. We compare kernel-assisted and purely user-space pacing approaches. We show that pacing with only user-space timers can work well, as demonstrated by picoquic with BBR. With quiche, we identify FQ as a qdisc well suited for pacing QUIC traffic, as it is relatively easy to use and offers precise pacing based on packet timestamps. Our findings provide new insights into the trade-offs involved in implementing pacing in QUIC and highlight potential optimizations for real-world applications like video streaming and video calls.
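To make the idea of timestamp-based pacing concrete, the following is a minimal sketch, not the paper's code. It assumes a fixed pacing rate and computes one earliest-departure timestamp per packet, which is the kind of per-packet timing that a timestamp-aware qdisc such as FQ can enforce (on Linux, for example, via the `SO_TXTIME` socket option). The function name `departure_times_ns` and its parameters are hypothetical.

```python
def departure_times_ns(start_ns, packet_sizes, rate_bytes_per_s):
    """Return one earliest-departure timestamp (in ns) per packet.

    Hypothetical helper: each packet is scheduled once all bytes queued
    before it would have been sent at the given pacing rate.
    """
    if rate_bytes_per_s <= 0:
        raise ValueError("pacing rate must be positive")

    times = []
    bytes_before = 0  # bytes scheduled ahead of the current packet
    for size in packet_sizes:
        # Derive the offset from the cumulative byte count so that integer
        # rounding errors do not accumulate from packet to packet.
        offset_ns = bytes_before * 1_000_000_000 // rate_bytes_per_s
        times.append(start_ns + offset_ns)
        bytes_before += size
    return times


# Three 1250-byte packets at 1.25 MB/s (10 Mbit/s) are spaced 1 ms apart.
schedule = departure_times_ns(0, [1250, 1250, 1250], 1_250_000)
print(schedule)
```

A purely user-space sender would instead sleep until each timestamp before calling send, which is where timer granularity and scheduling delays affect precision.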