The Wilhelm Tell Dataset of Affordance Demonstrations

📅 2025-07-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing functional affordance learning methods rely predominantly on static images or 3D shape annotations, and no benchmark dataset supports dynamic video modeling. To address this gap, the authors introduce the first functional affordance video dataset tailored to household tasks, comprising about seven hours of real-world human manipulation videos captured simultaneously from first-person and third-person viewpoints. The dataset is annotated with fine-grained metadata on functional behavior (including task-preparation actions, spatial reorganization, and other context-sensitive operations), enabling precise spatiotemporal alignment, and it is the first to systematically support learning temporally grounded, context-dependent affordances directly from multi-view dynamic video. This large-scale, functionally annotated demonstration dataset advances robotic understanding of functional behaviors, particularly the modeling of prerequisite actions and evolving task contexts, and establishes a new benchmark for training vision-driven functional perception models.

📝 Abstract
Affordances, i.e. the possibilities for action that an environment or the objects in it provide, are important for robots operating in human environments to perceive. Existing approaches train such capabilities on annotated static images or shapes. This work presents a novel dataset for affordance learning of common household tasks. Unlike previous approaches, our dataset consists of video sequences demonstrating the tasks from first- and third-person perspectives, along with metadata about the affordances that are manifested in each task, and is aimed at training perception systems to recognize affordance manifestations. The demonstrations were collected from several participants and in total record about seven hours of human activity. The variety of task performances also allows studying preparatory maneuvers that people may perform for a task, such as how they arrange their task space, which is also relevant for collaborative service robots.
Problem

Research questions and friction points this paper is trying to address.

Lack of a dataset for affordance learning in household tasks
Need for video sequences with first- and third-person perspectives
Training perception systems to recognize affordance manifestations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Video sequences for affordance learning
First- and third-person perspectives
Metadata on affordance manifestations