MAMMA: Markerless & Automatic Multi-Person Motion Action Capture

📅 2025-06-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Traditional marker-based motion capture relies on specialized hardware and manual calibration, resulting in high costs and labor-intensive workflows; existing learning-based approaches are largely limited to single-subject capture and sparse 2D keypoints, and struggle with occlusion and physical human-human interaction. To address these challenges, MAMMA introduces a markerless, multi-view motion-capture framework tailored to two-person interaction. Its core innovations are: (1) segmentation-mask-conditioned prediction of dense 2D surface landmarks; (2) a learnable query mechanism enabling robust identity association and correspondence estimation under severe occlusion; and (3) a large-scale synthetic multi-view interactive motion-capture dataset with SMPL-X ground truth. Evaluated on real-world two-person interaction sequences, MAMMA achieves SMPL-X parameter reconstruction accuracy comparable to commercial marker-based systems, without manual post-processing. The code, models, dataset, and benchmark are fully open-sourced.

📝 Abstract
We present MAMMA, a markerless motion-capture pipeline that accurately recovers SMPL-X parameters from multi-view video of two-person interaction sequences. Traditional motion-capture systems rely on physical markers. Although they offer high accuracy, their requirements of specialized hardware, manual marker placement, and extensive post-processing make them costly and time-consuming. Recent learning-based methods attempt to overcome these limitations, but most are designed for single-person capture, rely on sparse keypoints, or struggle with occlusions and physical interactions. In this work, we introduce a method that predicts dense 2D surface landmarks conditioned on segmentation masks, enabling person-specific correspondence estimation even under heavy occlusion. We employ a novel architecture that exploits learnable queries for each landmark. We demonstrate that our approach can handle complex person-person interaction and offers greater accuracy than existing methods. To train our network, we construct a large, synthetic multi-view dataset combining human motions from diverse sources, including extreme poses, hand motions, and close interactions. Our dataset yields high-variability synthetic sequences with rich body contact and occlusion, and includes SMPL-X ground-truth annotations with dense 2D landmarks. The result is a system capable of capturing human motion without the need for markers. Our approach offers competitive reconstruction quality compared to commercial marker-based motion-capture solutions, without the extensive manual cleanup. Finally, we address the absence of common benchmarks for dense-landmark prediction and markerless motion capture by introducing two evaluation settings built from real multi-view sequences. We will release our dataset, benchmark, method, training code, and pre-trained model weights for research purposes.
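The abstract's key conditioning idea, predicting landmarks only from evidence inside one person's segmentation mask, can be sketched minimally as follows. This is an illustrative assumption about how mask conditioning might work, not the paper's actual architecture; the function name and feature shapes are hypothetical.

```python
import numpy as np

def mask_conditioned_features(feature_map, person_mask):
    """Zero out feature-map locations outside one person's segmentation mask.

    feature_map : (H, W, C) array of backbone features (hypothetical shapes).
    person_mask : (H, W) boolean segmentation mask for a single person.
    Returns masked (H, W, C) features, so a downstream landmark head only
    sees image evidence belonging to that identity, even when two people
    overlap in the same view.
    """
    return feature_map * person_mask[..., None]

# Toy example: a 4x4 feature map with 2 channels; the mask keeps the left half.
feats = np.ones((4, 4, 2))
mask = np.zeros((4, 4), dtype=bool)
mask[:, :2] = True
out = mask_conditioned_features(feats, mask)
```

Conditioning per person turns the multi-person problem into repeated single-person landmark prediction, which is one plausible reading of why occluded interactions remain tractable.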
Problem

Research questions and friction points this paper is trying to address.

Accurately captures two-person interactions without markers
Overcomes occlusion challenges in multi-person motion capture
Eliminates need for manual cleanup in motion reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Markerless motion-capture using multi-view video
Dense 2D landmarks from segmentation masks
Learnable queries for landmark prediction
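The "learnable queries" bullet above suggests a DETR-style decoder in which each dense landmark owns a learned embedding that attends over image features. A minimal single-step sketch, assuming this reading; all names, shapes, and the single-head attention are illustrative, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decode_landmarks(queries, feats, w_out):
    """One cross-attention step: each learnable landmark query attends over
    flattened image features, then a linear head regresses a 2D location.

    queries : (L, d) learnable embeddings, one per dense landmark.
    feats   : (N, d) flattened per-pixel backbone features.
    w_out   : (d, 2) regression head mapping attended features to (x, y).
    """
    d = queries.shape[1]
    attn = softmax(queries @ feats.T / np.sqrt(d))  # (L, N) attention weights
    attended = attn @ feats                         # (L, d) per-landmark context
    return attended @ w_out                         # (L, 2) 2D landmark coords

L_landmarks, N_pix, d = 8, 64, 16
queries = rng.normal(size=(L_landmarks, d))
feats = rng.normal(size=(N_pix, d))
w_out = rng.normal(size=(d, 2))
coords = decode_landmarks(queries, feats, w_out)
```

Because each query is a fixed, learned slot, the same landmark index always decodes the same surface point, which is one way such a design could keep correspondences and identities stable under heavy occlusion.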
Hanz Cuevas-Velasquez
Research Engineer, Max Planck Institute
Computer Vision · Machine Learning
Anastasios Yiannakidis
Max Planck Institute for Intelligent Systems, Tübingen, Germany
Soyong Shin
Carnegie Mellon University, Pittsburgh, USA
Giorgio Becherini
Max Planck Institute for Intelligent Systems
Markus Höschle
Max Planck Institute for Intelligent Systems, Tübingen, Germany
Joachim Tesch
Max Planck Institute for Intelligent Systems, Tübingen, Germany
Taylor Obersat
Max Planck Institute for Intelligent Systems, Tübingen, Germany
Tsvetelina Alexiadis
Max Planck Institute for Intelligent Systems, Tübingen, Germany
Michael Black
Max Planck Institute for Intelligent Systems, Tübingen, Germany