🤖 AI Summary
Existing autonomous driving datasets primarily support single-task supervised learning and lack the multimodal, closed-loop data from extreme operating conditions needed to evaluate an entire autonomy stack: perception, planning, control, and state estimation. To address this gap, the authors introduce BETTY, a large-scale multimodal dataset collected on full-stack autonomous racing vehicles. BETTY spans 4 years of real-world racing (over 13 hours, 32 TB) across six diverse environments, including high-speed ovals, road courses with high longitudinal and lateral accelerations, and tight, GPS-denied sections. It synchronously records raw sensor streams (LiDAR, multi-camera, IMU, wheel encoders) alongside ground-truth information, the outputs of the software stack, and semantic metadata. By pairing full-stack closed-loop signals with extreme-scenario ground truth, BETTY enables joint supervised and self-supervised training. The publicly released dataset advances research in state estimation, dynamics modeling, motion forecasting, and end-to-end control under extreme conditions, e.g., 63 m/s crashes, loss of tire traction, and operation at the limit of stability.
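The summary does not specify the on-disk format; purely as an illustrative sketch, one time-synchronized multimodal sample of the kind described above could be modeled as follows. Every name here, including `BettyFrame` and its fields, is hypothetical and is not the dataset's actual schema:

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class BettyFrame:
    """Hypothetical container for one time-synchronized sample.

    Field names are illustrative only; consult the release at
    https://pitt-mit-iac.github.io/betty-dataset/ for the real layout.
    """
    stamp_ns: int                  # shared timestamp, nanoseconds
    lidar_points: np.ndarray       # (N, 4) x, y, z, intensity
    images: dict                   # camera name -> (H, W, 3) uint8 array
    imu: np.ndarray                # (6,) gyro rates + linear accelerations
    wheel_speeds: np.ndarray       # per-wheel encoder speeds, rad/s
    ground_truth_pose: np.ndarray  # (4, 4) SE(3) vehicle pose
    stack_outputs: dict = field(default_factory=dict)  # planner/controller logs
```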
📝 Abstract
We present the BETTY dataset, a large-scale, multi-modal dataset collected on several autonomous racing vehicles, targeting supervised and self-supervised state estimation, dynamics modeling, motion forecasting, perception, and more. Existing large-scale datasets, especially autonomous vehicle datasets, focus primarily on supervised perception, planning, and motion forecasting tasks. Our work enables multi-modal, data-driven methods by including all sensor inputs and the outputs of the software stack, along with semantic metadata and ground truth information. The dataset encompasses 4 years of data, currently comprising over 13 hours and 32 TB collected on autonomous racing vehicle platforms. This data spans 6 diverse racing environments: high-speed oval courses, which enable single- and multi-agent algorithm evaluation in feature-sparse scenarios, as well as high-speed road courses with high longitudinal and lateral accelerations and tight, GPS-denied sections. It captures highly dynamic states, such as 63 m/s crashes, loss of tire traction, and operation at the limit of stability. By offering a large breadth of cross-modal and dynamic data, the BETTY dataset enables the training and testing of full autonomy stack pipelines, pushing all algorithms to their performance limits. The current dataset is available at https://pitt-mit-iac.github.io/betty-dataset/.
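Because the recorded streams (LiDAR, cameras, IMU, wheel encoders, stack outputs) arrive at different rates, most cross-modal uses of such a dataset begin by aligning them on a common timeline. Below is a minimal, generic nearest-timestamp association sketch, assuming each stream is a time-sorted list of `(timestamp_ns, payload)` pairs; it is not BETTY-specific tooling:

```python
import bisect


def nearest_sync(reference, other, max_dt_ns=20_000_000):
    """Pair each (timestamp_ns, payload) sample in `reference` with the
    temporally closest sample in `other`, dropping pairs whose gap
    exceeds max_dt_ns. Both inputs must be sorted by timestamp."""
    if not other:
        return []
    other_ts = [t for t, _ in other]
    pairs = []
    for t_ref, payload_ref in reference:
        i = bisect.bisect_left(other_ts, t_ref)
        # consider the two neighbors straddling t_ref
        candidates = [j for j in (i - 1, i) if 0 <= j < len(other)]
        j = min(candidates, key=lambda k: abs(other_ts[k] - t_ref))
        if abs(other_ts[j] - t_ref) <= max_dt_ns:
            pairs.append((payload_ref, other[j][1]))
    return pairs


# e.g., associate each LiDAR sweep with the closest IMU reading:
# lidar_imu = nearest_sync(lidar_stream, imu_stream, max_dt_ns=5_000_000)
```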