A Vision-Language-Action Model for Adaptive Ultrasound-Guided Needle Insertion and Needle Tracking

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses ultrasound-guided needle insertion, a procedure made difficult by dynamic imaging conditions and poor needle visibility. Existing automated approaches rely on hand-crafted pipelines with modular controllers, and their performance degrades in challenging cases. The work proposes an end-to-end Vision-Language-Action (VLA) model that unifies adaptive needle tracking and insertion control in a single framework on a robotic ultrasound (RUS) system. Its main components are a Cross-Depth Fusion (CDF) tracking head that combines shallow positional and deep semantic features, a Tracking-Conditioning (TraCon) register for parameter-efficient adaptation of the pretrained vision backbone, an uncertainty-aware control policy, and an asynchronous VLA pipeline. In the reported experiments, the method consistently outperforms state-of-the-art trackers and manual operation in tracking accuracy, insertion success rate, and procedure time.
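To make the idea of uncertainty-aware control concrete, here is a minimal sketch. The summary does not specify the policy, so everything here is an assumption for illustration: the `TipEstimate` type, the scalar `sigma_mm` uncertainty, the thresholds, and the linear scaling rule are all hypothetical and are not the paper's method. The sketch only shows the general pattern of slowing or pausing insertion when the tracker is unsure of the needle tip.

```python
from dataclasses import dataclass


@dataclass
class TipEstimate:
    """Hypothetical tracker output: needle-tip position plus a scalar uncertainty."""
    x_mm: float
    y_mm: float
    sigma_mm: float  # assumed standard deviation of the tip estimate


def insertion_step(est: TipEstimate,
                   max_step_mm: float = 1.0,
                   sigma_stop_mm: float = 2.0) -> float:
    """Return an insertion step length (mm) that shrinks as uncertainty grows.

    Assumed rule (not from the paper): scale linearly from max_step_mm at
    sigma = 0 down to 0 at sigma >= sigma_stop_mm, so insertion pauses
    when the tip estimate is too uncertain.
    """
    if est.sigma_mm < 0:
        raise ValueError("sigma_mm must be non-negative")
    if est.sigma_mm >= sigma_stop_mm:
        return 0.0  # too uncertain: hold position
    return max_step_mm * (1.0 - est.sigma_mm / sigma_stop_mm)
```

With these illustrative defaults, a confident estimate advances by the full step, while an estimate at or above the stop threshold yields no motion at all.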

📝 Abstract

Ultrasound (US)-guided needle insertion is a critical yet challenging procedure due to dynamic imaging conditions and difficulties in needle visualization. Many methods have been proposed for automated needle insertion, but they often rely on hand-crafted pipelines with modular controllers, whose performance degrades in challenging cases. In this paper, a Vision-Language-Action (VLA) model is proposed for adaptive and automated US-guided needle insertion and tracking on a robotic ultrasound (RUS) system. This framework provides a unified approach to needle tracking and needle insertion control, enabling real-time, dynamically adaptive adjustment of insertion based on the obtained needle position and environment awareness. To achieve real-time and end-to-end tracking, a Cross-Depth Fusion (CDF) tracking head is proposed, integrating shallow positional and deep semantic features from the large-scale vision backbone. To adapt the pretrained vision backbone for tracking tasks, a Tracking-Conditioning (TraCon) register is introduced for parameter-efficient feature conditioning. After needle tracking, an uncertainty-aware control policy and an asynchronous VLA pipeline are presented for adaptive needle insertion control, ensuring timely decision-making for improved safety and outcomes. Extensive experiments on both needle tracking and insertion show that our method consistently outperforms state-of-the-art trackers and manual operation, achieving higher tracking accuracy, improved insertion success rates, and reduced procedure time, highlighting promising directions for RUS-based intelligent intervention.
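The asynchronous pipeline mentioned in the abstract can be pictured with a small, deterministic simulation. This is a sketch under stated assumptions, not the authors' implementation: `run_async_loop`, `slow_policy`, and `policy_period` are hypothetical names, and the slow model is modeled simply as a function that is queried only every few control ticks. A real system would also have inference latency and would likely run the model in a separate thread or process.

```python
from typing import Callable, List


def run_async_loop(observations: List[float],
                   slow_policy: Callable[[float], float],
                   policy_period: int = 3) -> List[float]:
    """Simulate a fast control loop that reuses the latest slow-policy output.

    The policy is queried only every `policy_period` ticks (standing in for a
    slow model forward pass). Between queries the controller keeps applying
    the most recent command instead of blocking on the model.
    """
    if policy_period < 1:
        raise ValueError("policy_period must be at least 1")
    commands: List[float] = []
    latest = 0.0  # assumed safe default before the first result: no motion
    for tick, obs in enumerate(observations):
        if tick % policy_period == 0:
            latest = slow_policy(obs)  # fresh decision from the slow model
        commands.append(latest)        # fast loop always emits a command
    return commands
```

The design point illustrated is that the control loop never waits on the model: every tick produces a command, and the command is refreshed whenever a new decision becomes available.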

Problem

Research questions and friction points this paper is trying to address.

Ultrasound-guided needle insertion

needle tracking

robotic ultrasound

adaptive control

medical robotics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language-Action model

Cross-Depth Fusion

Tracking-Conditioning