The 9th AI City Challenge

📅 2025-08-19
🤖 AI Summary
The 9th AI City Challenge targets three urban domains (traffic management, industrial automation, and public safety) through four tracks: (1) multi-class 3D multi-camera tracking of people, humanoid robots, autonomous mobile robots (AMRs), and forklifts; (2) video question answering on traffic-safety incidents, using multi-camera footage enriched with 3D gaze labels; (3) fine-grained spatial reasoning in dynamic warehouses, combining RGB-D perception, geometry, and language; and (4) lightweight road-object detection from fisheye cameras for real-time use on edge devices. The Track 1 and Track 3 datasets were generated synthetically in NVIDIA Omniverse, and Track 1 provides detailed camera calibration and 3D bounding-box annotations. The challenge drew 245 teams from 15 countries, a 17% increase in participation, and its datasets have been downloaded more than 30,000 times. Submission limits and a partially held-out test set kept benchmarking fair, and several teams set new benchmarks in multiple tracks.

📝 Abstract
The ninth AI City Challenge continues to advance real-world applications of computer vision and AI in transportation, industrial automation, and public safety. The 2025 edition featured four tracks and saw a 17% increase in participation, with 245 teams from 15 countries registered on the evaluation server. Public release of challenge datasets led to over 30,000 downloads to date. Track 1 focused on multi-class 3D multi-camera tracking, involving people, humanoids, autonomous mobile robots, and forklifts, using detailed calibration and 3D bounding box annotations. Track 2 tackled video question answering in traffic safety, with multi-camera incident understanding enriched by 3D gaze labels. Track 3 addressed fine-grained spatial reasoning in dynamic warehouse environments, requiring AI systems to interpret RGB-D inputs and answer spatial questions that combine perception, geometry, and language. Both Track 1 and Track 3 datasets were generated in NVIDIA Omniverse. Track 4 emphasized efficient road object detection from fisheye cameras, supporting lightweight, real-time deployment on edge devices. The evaluation framework enforced submission limits and used a partially held-out test set to ensure fair benchmarking. Final rankings were revealed after the competition concluded, fostering reproducibility and mitigating overfitting. Several teams achieved top-tier results, setting new benchmarks in multiple tasks.
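The evaluation protocol described in the abstract (a cap on submissions, scoring on only part of the test set during the competition, and final rankings revealed afterwards) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `EvalServer` class, the accuracy metric, and the limit of two submissions are hypothetical and are not taken from the challenge's actual evaluation server.

```python
from dataclasses import dataclass, field


@dataclass
class EvalServer:
    """Toy evaluation server (hypothetical, not the challenge's real system)."""
    ground_truth: dict[str, int]   # sample id -> correct label, full test set
    public_ids: set[str]           # subset scored while the competition runs
    max_submissions: int = 2       # assumed cap; the source gives no number
    _counts: dict[str, int] = field(default_factory=dict, init=False)
    _latest: dict[str, dict[str, int]] = field(default_factory=dict, init=False)

    def _accuracy(self, preds: dict[str, int], ids) -> float:
        ids = list(ids)
        # A missing prediction counts as wrong.
        correct = sum(preds.get(i) == self.ground_truth[i] for i in ids)
        return correct / len(ids)

    def submit(self, team: str, preds: dict[str, int]) -> float:
        """Record a submission and return the score on the public subset only."""
        used = self._counts.get(team, 0)
        if used >= self.max_submissions:
            raise RuntimeError(f"{team} has used all {self.max_submissions} submissions")
        self._counts[team] = used + 1
        self._latest[team] = preds
        return self._accuracy(preds, self.public_ids)

    def final_ranking(self) -> list[tuple[str, float]]:
        """After the deadline: rank each team's latest submission on the full test set."""
        scores = {t: self._accuracy(p, self.ground_truth) for t, p in self._latest.items()}
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


gt = {"a": 1, "b": 0, "c": 1, "d": 0}
server = EvalServer(ground_truth=gt, public_ids={"a", "b"})
print(server.submit("team1", {"a": 1, "b": 0, "c": 0, "d": 1}))  # 1.0 on the public subset
print(server.submit("team2", {"a": 1, "b": 1, "c": 1, "d": 0}))  # 0.5 on the public subset
print(server.final_ranking())  # [('team2', 0.75), ('team1', 0.5)]
```

In this toy run, `team1` leads on the public subset but drops behind once the held-out samples count, which is the kind of overfitting that a partially held-out test set and deferred final rankings are meant to discourage.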
Problem

Research questions and friction points this paper is trying to address.

Multi-class 3D multi-camera tracking across diverse agents
Video question answering with 3D gaze incident understanding
Fine-grained spatial reasoning in dynamic warehouse environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-class 3D multi-camera tracking with detailed calibration
Video question answering with 3D gaze labels
RGB-D spatial reasoning in dynamic warehouse environments
Authors
Zheng Tang
NVIDIA Corporation, CA, USA
Shuo Wang
NVIDIA Corporation, CA, USA
David C. Anastasiu
Santa Clara University, Santa Clara, CA
machine learning, data mining, computational genomics, high performance computing, information retrieval
Ming-Ching Chang
University at Albany, SUNY, NY, USA
Anuj Sharma
Iowa State University, IA, USA
Quan Kong
Woven by Toyota, Japan
Norimasa Kobori
Woven by Toyota, Japan
Munkhjargal Gochoo
United Arab Emirates University, UAE
Ganzorig Batnasan
United Arab Emirates University, UAE
Munkh-Erdene Otgonbold
United Arab Emirates University, UAE
Fady Alnajjar
United Arab Emirates University, UAE
Jun-Wei Hsieh
National Yang Ming Chiao Tung University
computer vision, AI, image processing
Tomasz Kornuta
NVIDIA Corporation, CA, USA
Xiaolong Li
NVIDIA Corporation, CA, USA
Yilin Zhao
NVIDIA Corporation, CA, USA
Han Zhang
NVIDIA Corporation, CA, USA
Subhashree Radhakrishnan
NVIDIA Corporation, CA, USA
Arihant Jain
Research Scholar at IITM
Ratnesh Kumar
NVIDIA Corporation, CA, USA
Vidya N. Murali
NVIDIA Corporation, CA, USA
Yuxing Wang
NVIDIA Corporation, CA, USA
Sameer Satish Pusegaonkar
NVIDIA Corporation, CA, USA
Yizhou Wang
NVIDIA Corporation, CA, USA
Sujit Biswas
NVIDIA Corporation, CA, USA
Xunlei Wu
NVIDIA Corporation, CA, USA