CLOVER: Context-aware Long-term Object Viewpoint- and Environment- Invariant Representation Learning

📅 2024-07-12
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the challenge of static object re-identification for mobile service robots operating over extended periods in dynamic outdoor environments, this work focuses on generalizable instance-level object re-identification across varying viewpoints, illumination conditions, and weather. Existing approaches rely heavily on category-level priors or require precise foreground segmentation, and fail to model complex outdoor appearance variations robustly. To overcome these limitations, we: (1) introduce CODa Re-ID, the first large-scale in-the-wild object re-identification benchmark featuring real-world environmental diversity; (2) propose CLOVER, a segmentation-free, context-aware invariant representation learning framework that jointly incorporates multi-view geometric priors and environment-invariance constraints via contrastive self-supervised learning; and (3) demonstrate state-of-the-art performance on CODa Re-ID, with strong generalization across unseen instances and categories, enabling robust long-term object tracking and semantic understanding in realistic outdoor settings.

📝 Abstract
In many applications, robots can benefit from object-level understanding of their environments, including the ability to distinguish object instances and re-identify previously seen instances. Object re-identification is challenging across different viewpoints and in scenes with significant appearance variation arising from weather or lighting changes. Most works on object re-identification focus on specific classes; approaches that address general object re-identification require foreground segmentation and have limited consideration of challenges such as occlusions, outdoor scenes, and illumination changes. To address this problem, we introduce CODa Re-ID: an in-the-wild object re-identification dataset containing 1,037,814 observations of 557 objects of 8 classes under diverse lighting conditions and viewpoints. Further, we propose CLOVER, a representation learning method for object observations that can distinguish between static object instances. Our results show that CLOVER achieves superior performance in static object re-identification under varying lighting conditions and viewpoint changes, and can generalize to unseen instances and classes.
Problem

Research questions and friction points this paper is trying to address.

Object re-identification across varying viewpoints and lighting
Lack of datasets for outdoor scenes with illumination changes
Need for segmentation-free instance distinction in object mapping
Innovation

Methods, ideas, or system contributions that make the work stand out.

Context-aware object representation learning
No foreground segmentation required
Scalable descriptor summarization for object maps
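
To make the idea of matching against summarized descriptors concrete, here is a minimal sketch of re-identification as nearest-neighbor lookup over per-object summaries. It is not CLOVER's implementation: the mean-descriptor summary, the cosine-similarity matching, the `threshold` value, the function names, and the toy 3-D vectors are all assumptions chosen for illustration, and a learned model would supply the actual descriptors.

```python
import math

def normalize(v):
    """Scale a descriptor to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def summarize(descriptors):
    """Summarize several observations of one object as a normalized mean descriptor.

    Assumption: a simple mean is used here; the paper's summarization may differ.
    """
    dim = len(descriptors[0])
    unit = [normalize(d) for d in descriptors]
    mean = [sum(d[i] for d in unit) / len(unit) for i in range(dim)]
    return normalize(mean)

def reidentify(query, object_map, threshold=0.8):
    """Return (object_id, similarity) for the best match.

    The id is None when the best cosine similarity is below the
    (hypothetical) threshold, i.e. the observation looks like a new object.
    """
    q = normalize(query)
    best_id, best_sim = None, -1.0
    for object_id, summary in object_map.items():
        sim = sum(a * b for a, b in zip(q, summary))  # cosine similarity of unit vectors
        if sim > best_sim:
            best_id, best_sim = object_id, sim
    if best_sim < threshold:
        return None, best_sim
    return best_id, best_sim

# Toy 3-D descriptors standing in for learned embeddings (made-up values).
object_map = {
    "tree_01": summarize([[1.0, 0.1, 0.0], [0.9, 0.2, 0.0]]),
    "pole_07": summarize([[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]),
}
match_id, similarity = reidentify([0.95, 0.15, 0.0], object_map)
new_id, new_similarity = reidentify([0.0, 0.0, 1.0], object_map)
```

Storing one summary per object keeps the map size proportional to the number of objects rather than the number of observations, which is the sense in which a summarized descriptor can scale.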
Dongmyeong Lee
Department of Computer Sciences, University of Texas, Austin
Amanda Adkins
Department of Computer Sciences, University of Texas, Austin
Joydeep Biswas
Associate Professor, Computer Science Department, The University of Texas at Austin
Robotics, Artificial Intelligence, Multi Robot Systems, Localization, Mapping