🤖 AI Summary
This work addresses the limitations of conventional image-based deformation measurement methods, which rely on inter-frame infinitesimal motion assumptions and thus struggle with highly dynamic scenes while demanding high-speed cameras that incur substantial storage and computational costs. To overcome these challenges, the authors propose an event-image fusion framework that leverages event streams for temporally dense motion cues and image frames for spatial fidelity. Deformation fields are modeled via a low-dimensional, piecewise-linear representation using affine-invariant simplex parametrization. Long-term tracking stability is enhanced through a neighborhood greedy optimization strategy combined with an elastic deformation prior. Evaluated on a newly curated benchmark dataset comprising over 120 sequences, the method achieves a 1.6% higher survival rate than the current state-of-the-art while requiring only 18.9% of the storage and computational resources of high-speed video-based approaches.
📝 Abstract
Visual Deformation Measurement (VDM) aims to recover dense deformation fields by tracking surface motion from camera observations. Traditional image-based methods rely on minimal inter-frame motion to constrain the correspondence search space, which limits their applicability to highly dynamic scenes or necessitates high-speed cameras at the cost of prohibitive storage and computational overhead. We propose an event-frame fusion framework that exploits events for temporally dense motion cues and frames for spatially dense, precise estimation. Revisiting the solid elastic modeling prior, we propose an Affine Invariant Simplicial (AIS) framework: it partitions the deformation field into linearized sub-regions with a low-parametric representation, effectively mitigating motion ambiguities arising from sparse and noisy events. To speed up parameter search and reduce error accumulation, a neighborhood-greedy optimization strategy is introduced that lets well-converged sub-regions guide their poorly converged neighbors, suppressing local error accumulation in long-term dense tracking. To evaluate the proposed method, we establish a benchmark dataset with temporally aligned event streams and frames, encompassing over 120 sequences spanning diverse deformation scenarios. Experimental results show that our method outperforms the state-of-the-art baseline by 1.6% in survival rate while using only 18.9% of the data storage and processing resources of high-speed video methods.
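The core idea behind the simplicial parametrization can be illustrated with a minimal sketch (not the authors' implementation; function names and numbers are hypothetical): within each triangular sub-region, the deformation is an affine map, and because barycentric coordinates are affine-invariant, warping a point only requires expressing it in the reference triangle's barycentric frame and re-evaluating at the displaced vertices.

```python
import numpy as np

def barycentric(p, tri):
    """Barycentric coordinates of 2D point p w.r.t. triangle tri (3x2 array)."""
    A = np.vstack([tri.T, np.ones(3)])     # rows: x-coords, y-coords, 1s
    b = np.array([p[0], p[1], 1.0])
    return np.linalg.solve(A, b)           # weights summing to 1

def warp_point(p, tri_ref, tri_def):
    """Piecewise-linear warp of p from the reference to the deformed triangle.

    Barycentric coordinates are invariant under affine maps, so reusing the
    reference weights at the deformed vertices applies exactly the affine
    transform that carries tri_ref onto tri_def.
    """
    w = barycentric(p, tri_ref)
    return w @ tri_def

# Hypothetical example: one sub-region with displaced vertices.
tri_ref = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
tri_def = np.array([[0.1, 0.0], [1.2, 0.1], [0.0, 1.1]])
p = np.array([0.25, 0.25])
print(warp_point(p, tri_ref, tri_def))
```

Each sub-region thus carries only six affine parameters (three 2D vertex displacements, shared with its neighbors), which is the low-parametric representation that keeps the estimation well-posed under sparse, noisy events.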