ETA: Efficiency through Thinking Ahead, A Dual Approach to Self-Driving with Large Models

📅 2025-06-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high inference latency of large models in autonomous driving, which hinders real-time performance, this paper proposes an asynchronous dual-path collaborative inference framework. The method runs lightweight real-time perception on the current frame concurrently with batched pre-inference by a large model over multiple future frames, feeding the predicted perceptual features back to the fast path. Key contributions include: (1) a novel "proactive computation migration" mechanism that offloads the current frame's computationally intensive operations to earlier timesteps; (2) an action-mask fusion module that uses action-guided spatial attention to focus dynamically on driving-critical regions; and (3) cross-temporal coordination between the small- and large-model pathways. Evaluated on the Bench2Drive CARLA Leaderboard-v2 benchmark, the approach achieves a driving score of 69.53, 8% above the previous state of the art, while maintaining an end-to-end inference latency of only 50 ms, i.e., near-real-time performance.
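The scheduling idea behind "proactive computation migration" can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the paper's implementation: all function names, shapes, and weights are hypothetical, and for simplicity the "large model" here reads the actual future frames in a batch, whereas ETA predicts future features from past observations.

```python
import numpy as np

rng = np.random.default_rng(0)

BATCH = 4     # the large model pre-infers features for the next 4 timesteps at once
FEAT_DIM = 8  # hypothetical feature width of the large-model pathway

W_large = rng.standard_normal((3, FEAT_DIM))  # stand-in weights for the large model

def large_model_batch(frames):
    # stand-in for the slow large model: one batched forward pass that
    # produces perceptual features for several timesteps in a single call
    return frames.mean(axis=(2, 3)) @ W_large  # (B, 3, H, W) -> (B, FEAT_DIM)

def small_model(frame):
    # stand-in for the fast per-frame feature extractor on the current frame
    return frame.mean(axis=(1, 2))  # (3, H, W) -> (3,)

def fuse(past_feat, current_feat):
    # naive stand-in for the action-mask fusion step: just concatenate
    return np.concatenate([past_feat, current_feat])

stream = rng.standard_normal((12, 3, 16, 16))  # 12 incoming camera frames
cache = {}                                     # timestep -> pre-inferred feature
outputs = []
for t, frame in enumerate(stream):
    if t % BATCH == 0:
        # "thinking ahead": batch-infer features for timesteps t..t+BATCH-1
        # now, so each of those frames later needs only a cheap cache lookup
        # plus the small model, instead of a full large-model forward pass
        feats = large_model_batch(stream[t:t + BATCH])
        cache.update({t + i: f for i, f in enumerate(feats)})
    outputs.append(fuse(cache[t], small_model(frame)))

print(len(outputs), outputs[0].shape)  # 12 fused features of size FEAT_DIM + 3
```

The per-frame critical path contains only the small model and a dictionary lookup; the expensive batched call is amortized over BATCH frames, which is how the fast path can stay near real time.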

📝 Abstract
How can we benefit from large models without sacrificing inference speed, a common dilemma in self-driving systems? A prevalent solution is a dual-system architecture, employing a small model for rapid, reactive decisions and a larger model for slower but more informative analyses. Existing dual-system designs often implement parallel architectures where inference is either directly conducted using the large model at each current frame or retrieved from previously stored inference results. However, these works still struggle to enable large models to respond in a timely manner to every online frame. Our key insight is to shift intensive computations of the current frame to previous time steps and perform a batch inference over multiple time steps, so that the large model responds promptly at each time step. To achieve this shift, we introduce Efficiency through Thinking Ahead (ETA), an asynchronous system designed to: (1) propagate informative features from the past to the current frame using future predictions from the large model, (2) extract current frame features using a small model for real-time responsiveness, and (3) integrate these dual features via an action mask mechanism that emphasizes action-critical image regions. Evaluated on the Bench2Drive CARLA Leaderboard-v2 benchmark, ETA advances state-of-the-art performance by 8% with a driving score of 69.53 while maintaining a near-real-time inference speed at 50 ms.
Problem

Research questions and friction points this paper is trying to address.

Balancing large model benefits with fast inference in self-driving
Enabling timely large model responses via asynchronous batch computation
Integrating dual-system features for real-time autonomous driving decisions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-system with small and large models
Batch inference for timely large model response
Action mask integrates past and current features
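The action-mask idea, using the planned action to weight spatially where the past (large-model) features matter most, can be sketched as follows. This is a hypothetical toy, assuming simple linear projections; the actual ETA module's architecture is not specified here, and all names and shapes are illustrative.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a flat vector
    e = np.exp(x - x.max())
    return e / e.sum()

def action_mask_fusion(past_feat, cur_feat, action, seed=0):
    # past_feat, cur_feat: (C, H, W) feature maps from the two pathways
    # action: (A,) planned driving action (e.g. steer, throttle), used as a query
    C, H, W = cur_feat.shape
    rng = np.random.default_rng(seed)
    W_a = rng.standard_normal((action.shape[0], C))   # hypothetical action->channel projection
    query = action @ W_a                              # (C,) action-conditioned query
    scores = np.einsum('c,chw->hw', query, cur_feat)  # per-location relevance scores
    mask = softmax(scores.ravel()).reshape(H, W)      # spatial attention mask, sums to 1
    # emphasize action-critical regions of the propagated past features,
    # then blend them with the small model's current-frame features
    return cur_feat + past_feat * mask[None, :, :]

fused = action_mask_fusion(
    np.ones((4, 3, 3)), np.ones((4, 3, 3)), np.array([0.1, 0.5])
)
print(fused.shape)  # (4, 3, 3)
```

With constant inputs the mask is uniform, so the fusion degrades gracefully to an even blend; with real features it up-weights the regions most relevant to the intended maneuver.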