🤖 AI Summary
To address real-time tracking of dynamic targets under high-frequency, large-magnitude disturbances and high end-to-end latency in robotic ultrasound systems (RUSS), this paper proposes a perception–control co-design paradigm. It introduces a decoupled dual-stream convolutional network that robustly estimates 3D translational state from 2D images at high frequency. This is paired with a single-step deep generative control strategy that bypasses conventional iterative optimization and directly outputs control actions. Together, the two components enable closed-loop control above 60 Hz. On dynamic phantoms, the system achieves a mean 3D tracking error below 6.5 mm and reacquires the target after displacements of more than 170 mm. It also tracks targets moving at up to 102 mm/s with a terminal positioning error below 1.7 mm, and in-vivo experiments on a human volunteer support its clinical feasibility.
📝 Abstract
Real-time tracking of dynamic targets amidst large-scale, high-frequency disturbances remains a critical unsolved challenge in Robotic Ultrasound Systems (RUSS), primarily due to the end-to-end latency of existing systems. This paper argues that breaking this latency barrier requires a fundamental shift towards the synergistic co-design of perception and control. We realize this in a novel framework with two tightly coupled contributions: (1) a Decoupled Dual-Stream Perception Network that robustly estimates 3D translational state from 2D images at high frequency, and (2) a Single-Step Flow Policy that generates entire action sequences in one inference pass, bypassing the iterative bottleneck of conventional policies. This synergy enables a closed-loop control frequency exceeding 60 Hz. On a dynamic phantom, our system tracks complex 3D trajectories with a mean error below 6.5 mm and robustly re-acquires the target after displacements of over 170 mm. It also tracks targets moving at 102 mm/s with a terminal error below 1.7 mm. In-vivo experiments on a human volunteer further validate the framework's effectiveness and robustness in a realistic clinical setting. Our work presents a RUSS holistically architected to unify high-bandwidth tracking with large-scale repositioning, a critical step towards robust autonomy in dynamic clinical environments.