History-Aware Visuomotor Policy Learning via Point Tracking

📅 2025-09-21
🤖 AI Summary
Visuomotor policies often struggle to model long-horizon dependencies and repeated states because of the Markov assumption, and existing approaches that simply extend the observation window do not cover the full range of memory requirements. To address this, we propose an object-centric history representation based on point tracking, which abstracts past observations into compact, structured object-level trajectories. Tracked points are encoded and aggregated per object, and the resulting representation supports several memory functions (task-stage identification, spatial memorization, and action counting) as well as continuous and pre-loaded memory. The representation can be integrated into various visuomotor policies. Evaluated on diverse manipulation tasks, our method consistently outperforms both Markovian baselines and prior history-based approaches, improving overall task performance and decision accuracy.

📝 Abstract
Many manipulation tasks require memory beyond the current observation, yet most visuomotor policies rely on the Markov assumption and thus struggle with repeated states or long-horizon dependencies. Existing methods attempt to extend observation horizons but remain insufficient for diverse memory requirements. To this end, we propose an object-centric history representation based on point tracking, which abstracts past observations into a compact and structured form that retains only essential task-relevant information. Tracked points are encoded and aggregated at the object level, yielding a compact history representation that can be seamlessly integrated into various visuomotor policies. Our design provides full history-awareness with high computational efficiency, leading to improved overall task performance and decision accuracy. Through extensive evaluations on diverse manipulation tasks, we show that our method addresses multiple facets of memory requirements - such as task stage identification, spatial memorization, and action counting, as well as longer-term demands like continuous and pre-loaded memory - and consistently outperforms both Markovian baselines and prior history-based approaches. Project website: http://tonyfang.net/history
Problem

Research questions and friction points this paper is trying to address.

Addressing memory limitations in visuomotor policies for manipulation tasks
Overcoming Markov assumption constraints for repeated states and long-horizon dependencies
Providing efficient history representation for diverse memory requirements in robotics
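
The repeated-state problem listed above can be illustrated with a toy example. This is only a sketch under stated assumptions: the "press the button three times" task, the observation string `button_up`, and both policy functions are hypothetical and do not come from the paper, and a plain action log stands in for the paper's point-tracking history.

```python
from typing import List


def markov_policy(obs: str) -> str:
    # Sees only the current observation, so identical observations
    # always produce identical actions.
    return "press" if obs == "button_up" else "wait"


def history_policy(obs: str, history: List[str], target_presses: int = 3) -> str:
    # Also sees the past actions, so it can count completed presses.
    if history.count("press") >= target_presses:
        return "stop"
    return "press" if obs == "button_up" else "wait"


def rollout(use_history: bool, steps: int = 5) -> List[str]:
    history: List[str] = []
    for _ in range(steps):
        obs = "button_up"  # hypothetical: the scene looks the same before every press
        if use_history:
            action = history_policy(obs, history)
        else:
            action = markov_policy(obs)
        history.append(action)
    return history


markov_actions = rollout(use_history=False)
history_actions = rollout(use_history=True)
```

In this toy setting the Markovian policy keeps pressing because every step looks identical, while the history-aware variant stops after the third press.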
Innovation

Methods, ideas, or system contributions that make the work stand out.

Object-centric history representation via point tracking
Compact structured abstraction of past task-relevant observations
Seamless integration into various visuomotor policies
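
To make the idea of an object-level history more concrete, the following is a minimal sketch, not the authors' implementation. It assumes 2D point tracks that are already grouped by object, and it uses simple per-frame averaging as a stand-in for the learned encoding and aggregation modules. The names `object_trajectory`, `history_tokens`, and the example objects are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Track = List[Point]  # one tracked point over T frames


def object_trajectory(tracks: List[Track]) -> List[Point]:
    """Average the tracked points of one object in each frame."""
    num_frames = len(tracks[0])
    trajectory = []
    for t in range(num_frames):
        xs = [track[t][0] for track in tracks]
        ys = [track[t][1] for track in tracks]
        trajectory.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return trajectory


def history_tokens(tracks_by_object: Dict[str, List[Track]]) -> Dict[str, List[Point]]:
    """Return one compact trajectory per object instead of raw past frames."""
    return {name: object_trajectory(tracks) for name, tracks in tracks_by_object.items()}


# Hypothetical tracks: two points on a moving cup, one point on a static plate.
tracks_by_object = {
    "cup": [
        [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
        [(0.0, 2.0), (1.0, 2.0), (2.0, 2.0)],
    ],
    "plate": [
        [(5.0, 5.0), (5.0, 5.0), (5.0, 5.0)],
    ],
}
tokens = history_tokens(tracks_by_object)
```

The point of the sketch is the shape of the data: the history shrinks from full past images to one short trajectory per object, which a downstream policy could consume alongside its current observation.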
Authors

Jingjing Chen
Fudan University
Multimedia, Computer Vision, Machine Learning, Pattern Recognition

Hongjie Fang
Shanghai Jiao Tong University
Robotics, Robot Learning, Robotic Manipulation

Chenxi Wang
Noematrix

Shiquan Wang
Noematrix, Flexiv Robotics

Cewu Lu
Shanghai Jiao Tong University, Noematrix, Flexiv Robotics