🤖 AI Summary
Existing methods rely on zero-velocity constraints and joint-level contact modeling, leaving dense foot–ground contact estimation from a single RGB image largely unaddressed. Key challenges include poor generalization caused by the high visual diversity of footwear and weak feature discriminability caused by monotonous ground textures. To address these, we propose a footwear-agnostic adversarial training framework coupled with a ground-aware spatial contextual feature extractor: the former decouples shoe-style variations from contact cues, while the latter explicitly encodes geometric and material priors of the ground surface. Furthermore, we relax the zero-velocity assumption and formulate a pixel-wise foot-contact probability field. Our method significantly outperforms state-of-the-art approaches across diverse footwear and ground conditions. It is the first to achieve robust, fine-grained, and physically interpretable dense foot–ground contact estimation from a single image, establishing a new paradigm for human–robot interaction and biomechanical motion analysis.
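The style-adversarial idea above is commonly implemented with a gradient reversal layer: a style classifier tries to predict shoe style from the encoder's features, while the reversed gradient pushes the encoder to discard style-predictive information. The sketch below illustrates this mechanism with a toy linear encoder and classifier; all names, shapes, and the use of gradient reversal are illustrative assumptions, not the paper's actual architecture.

```python
# Toy sketch of style-adversarial training via gradient reversal
# (hypothetical; the paper only states that adversarial training is used).
import numpy as np

rng = np.random.default_rng(0)

def grad_reverse(grad, lam=1.0):
    """Identity in the forward pass; in the backward pass the gradient is
    multiplied by -lam, so the encoder *maximizes* the style loss and is
    pushed toward shoe-style-invariant features."""
    return -lam * grad

x = rng.normal(size=(8,))          # flattened image features (toy input)
W_enc = rng.normal(size=(4, 8))    # toy linear "encoder"
W_sty = rng.normal(size=(3, 4))    # style classifier over 3 shoe styles

z = W_enc @ x                      # encoder features
logits = W_sty @ z                 # style logits
p = np.exp(logits - logits.max())
p /= p.sum()                       # softmax over styles
y = 1                              # ground-truth style index (toy label)

# Cross-entropy gradient w.r.t. the logits, then back to the features.
g_logits = p.copy()
g_logits[y] -= 1.0
g_z = W_sty.T @ g_logits           # gradient the style classifier sends back

# The encoder receives the reversed gradient instead.
g_z_enc = grad_reverse(g_z, lam=0.5)
```

In a full model the contact-estimation head would receive `z` unmodified, so only the style branch's gradient is reversed; the contact loss still trains the encoder normally.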
📝 Abstract
Foot contact plays a critical role in human interaction with the world, and thus exploring foot contact can advance our understanding of human movement and physical interaction. Despite its importance, existing methods often approximate foot contact using a zero-velocity constraint and focus on joint-level contact, failing to capture the detailed interaction between the foot and the world. Dense estimation of foot contact is crucial for accurately modeling this interaction, yet predicting dense foot contact from a single RGB image remains largely underexplored. There are two main challenges in learning dense foot contact estimation. First, shoes exhibit highly diverse appearances, making it difficult for models to generalize across different styles. Second, the ground often has a monotonous appearance, making it difficult to extract informative features. To tackle these issues, we present a FEet COntact estimation (FECO) framework that estimates dense foot contact through shoe style-invariant and ground-aware learning. To overcome the challenge of shoe appearance diversity, our approach incorporates shoe style adversarial training that enforces shoe style-invariant features for contact estimation. To effectively utilize ground information, we introduce a ground feature extractor that captures ground properties based on spatial context. As a result, our proposed method achieves robust foot contact estimation regardless of shoe appearance and effectively leverages ground information. Code will be released.
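The dense formulation described above replaces per-joint binary contact labels with a per-pixel probability field. A minimal sketch of what such an output head produces, assuming per-pixel logits from some decoder (the function name, shapes, and thresholding are illustrative, not the paper's implementation):

```python
# Hypothetical sketch of a pixel-wise foot-contact probability field:
# each foot pixel receives a contact probability in [0, 1], rather than
# a single binary label per joint under a zero-velocity assumption.
import numpy as np

def contact_probability_field(logits):
    """Map per-pixel contact logits of shape (H, W) to probabilities
    with a sigmoid; training could use per-pixel binary cross-entropy."""
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
logits = rng.normal(size=(16, 16))   # toy per-pixel logits from a decoder
prob = contact_probability_field(logits)

# Thresholding is only needed for visualization; losses and downstream
# physics reasoning can consume the dense probability field directly.
contact_mask = prob > 0.5
```

Keeping the output continuous is what makes the estimate "fine-grained": partial contact (e.g., heel-strike vs. flat foot) shows up as a graded region rather than a single on/off joint flag.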