Problem
Research questions and friction points this paper is trying to address.
3D human mesh recovery models are computationally expensive
Transformer layers and background tokens carry redundancy that can be merged
Accuracy must be maintained after merging, which the paper addresses with diffusion decoding and temporal context
Innovation
Methods, ideas, or system contributions that make the work stand out.
Merges redundant transformer layers while keeping the added error minimal
Merges redundant background tokens under mask guidance
Uses a diffusion decoder that integrates temporal context
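To make the mask-guided token merging idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function name `merge_background_tokens`, the `is_foreground` flag list, and the choice of averaging all background tokens into one token are assumptions made for illustration. The summary above does not say how the mask is obtained or which merging rule is used.

```python
from typing import List, Sequence


def merge_background_tokens(
    tokens: Sequence[Sequence[float]],
    is_foreground: Sequence[bool],
) -> List[List[float]]:
    """Keep foreground tokens and average all background tokens into one.

    Hypothetical helper: mean pooling is an assumed merging rule,
    not one confirmed by the source.
    """
    if len(tokens) != len(is_foreground):
        raise ValueError("tokens and is_foreground must have the same length")

    # Foreground tokens (e.g. those covering the person) pass through unchanged.
    kept = [list(t) for t, fg in zip(tokens, is_foreground) if fg]
    # Background tokens are collected so they can be collapsed.
    background = [t for t, fg in zip(tokens, is_foreground) if not fg]

    if not background:
        return kept

    # Average the background tokens dimension by dimension.
    dim = len(background[0])
    mean_bg = [sum(t[d] for t in background) / len(background) for d in range(dim)]

    # The merged background token is appended after the foreground tokens.
    return kept + [mean_bg]


tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0], [7.0, 8.0]]
is_foreground = [True, False, True, False]
merged = merge_background_tokens(tokens, is_foreground)
print(len(tokens), "->", len(merged))
```

In this toy input, two of the four tokens are background, so the sequence shrinks from four tokens to three. Since self-attention cost grows with sequence length, a shorter sequence is where savings of this kind would come from.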