🤖 AI Summary
This work addresses the accessibility barriers of traditional 3D character animation, which typically relies on specialized software or costly motion capture systems. The authors propose a lightweight vision-based motion capture approach that reframes animation creation as “digital puppetry”: users manipulate arbitrary everyday objects—such as plush toys or bananas—in front of a monocular camera, and the system translates these coarse manipulations into realistic character animations. This is achieved with a generative motion model conditioned on bounding-box representations, which leverages human motion priors learned from large-scale datasets and is trained on synthetically generated paired proxy-animation data. Notably, the method operates without precise human pose tracking or real-world paired training data, substantially lowering the entry barrier for animation creation. User studies demonstrate that the system supports a wide variety of physical proxies and enables intuitive, creative character animation.
📝 Abstract
Creating compelling 3D character animations typically requires either expert use of professional software or expensive motion capture systems operated by skilled actors. We present DancingBox, a lightweight, vision-based system that makes motion capture accessible to novices by reimagining the process as digital puppetry. Instead of tracking precise human motions, DancingBox captures the approximate movements of everyday objects manipulated by users with a single webcam. These coarse proxy motions are then refined into realistic character animations by conditioning a generative motion model on bounding-box representations, enriched with human motion priors learned from large-scale datasets. To overcome the lack of paired proxy-animation data, we synthesize training pairs by converting existing motion capture sequences into proxy representations. A user study demonstrates that DancingBox enables intuitive and creative character animation using diverse proxies, from plush toys to bananas, lowering the barrier to entry for novice animators.
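The synthetic pairing step described in the abstract, converting existing motion capture sequences into proxy representations, could be sketched as computing a per-frame bounding box over the character's joints. This is a minimal illustration, not the authors' actual pipeline: the function name and the box parameterization (axis-aligned, center plus size, no orientation) are assumptions for the sake of the example.

```python
import numpy as np

def motion_to_proxy(joints: np.ndarray) -> np.ndarray:
    """Convert a motion sequence of 3D joint positions, shape (T, J, 3),
    into a per-frame bounding-box proxy [center_xyz, size_xyz], shape (T, 6).

    Hypothetical sketch of the paper's mocap-to-proxy conversion; the real
    system may use a richer box representation (e.g., with orientation).
    """
    mins = joints.min(axis=1)   # (T, 3): per-frame minimum corner
    maxs = joints.max(axis=1)   # (T, 3): per-frame maximum corner
    center = (mins + maxs) / 2.0
    size = maxs - mins
    return np.concatenate([center, size], axis=1)

# Toy sequence: 2 frames, 3 joints each
seq = np.array([
    [[0, 0, 0], [1, 2, 0], [2, 0, 1]],
    [[1, 0, 0], [2, 2, 0], [3, 0, 1]],
], dtype=float)
proxy = motion_to_proxy(seq)  # shape (2, 6)
```

Pairing each mocap frame with such a box yields the proxy-animation training pairs the generative model is conditioned on, without requiring any real-world proxy recordings.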