DualTrack: Sensorless 3D Ultrasound needs Local and Global Context

📅 2025-09-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Sensorless 3D ultrasound must estimate the probe trajectory from 2D images alone, and existing methods either ignore global features or couple them tightly with local feature extraction, which limits robustness. This paper proposes DualTrack, a dual-encoder architecture that decouples the learning of local features (subtle inter-frame motion cues such as speckle patterns) from global features (anatomical context). The local encoder uses dense spatiotemporal convolutions, while the global encoder combines an image backbone (e.g., a 2D CNN or foundation model) with temporal attention layers to capture high-level anatomy and long-range dependencies. A lightweight fusion module combines the two feature streams to estimate the trajectory. On a large public benchmark, DualTrack outperforms previous methods, achieving an average reconstruction error below 5 mm and globally consistent 3D reconstructions.

📝 Abstract

Three-dimensional ultrasound (US) offers many clinical advantages over conventional 2D imaging, yet its widespread adoption is limited by the cost and complexity of traditional 3D systems. Sensorless 3D US, which uses deep learning to estimate a 3D probe trajectory from a sequence of 2D US images, is a promising alternative. Local features, such as speckle patterns, can help predict frame-to-frame motion, while global features, such as coarse shapes and anatomical structures, can situate the scan relative to anatomy and help predict its general shape. In prior approaches, global features are either ignored or tightly coupled with local feature extraction, restricting the ability to robustly model these two complementary aspects. We propose DualTrack, a novel dual-encoder architecture that leverages decoupled local and global encoders specialized for their respective scales of feature extraction. The local encoder uses dense spatiotemporal convolutions to capture fine-grained features, while the global encoder utilizes an image backbone (e.g., a 2D CNN or foundation model) and temporal attention layers to embed high-level anatomical features and long-range dependencies. A lightweight fusion module then combines these features to estimate the trajectory. Experimental results on a large public benchmark show that DualTrack achieves state-of-the-art accuracy and globally consistent 3D reconstructions, outperforming previous methods and yielding an average reconstruction error below 5 mm.
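To make "estimating a 3D probe trajectory" and "reconstruction error" concrete, here is a minimal sketch in plain Python. It assumes (this is not stated in the abstract) that the model outputs one rigid frame-to-frame transform per step, represented as a 4x4 homogeneous matrix, and that error is measured as the mean distance between predicted and true positions of a few image points. All function names, the 40 x 30 mm image size, and the 0.1 mm per-frame bias are hypothetical and chosen only for illustration.

```python
import math

def identity():
    """4x4 identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def matmul(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(tx, ty, tz):
    """Homogeneous transform for a pure translation (in mm)."""
    t = identity()
    t[0][3], t[1][3], t[2][3] = tx, ty, tz
    return t

def compose_trajectory(relative_transforms):
    """Chain frame-to-frame transforms into poses relative to the first frame."""
    poses = [identity()]
    for rel in relative_transforms:
        poses.append(matmul(poses[-1], rel))
    return poses

def transform_point(pose, p):
    """Apply a 4x4 pose to a 3D point."""
    x, y, z = p
    return tuple(pose[i][0] * x + pose[i][1] * y + pose[i][2] * z + pose[i][3]
                 for i in range(3))

def mean_point_error(pred_poses, true_poses, points):
    """Mean distance between predicted and true point positions over all frames."""
    total, count = 0.0, 0
    for pred, true in zip(pred_poses, true_poses):
        for p in points:
            total += math.dist(transform_point(pred, p), transform_point(true, p))
            count += 1
    return total / count

# Hypothetical sweep: the probe truly moves 1.0 mm per frame along z,
# but every frame-to-frame estimate is biased by +0.1 mm.
true_rel = [translation(0.0, 0.0, 1.0) for _ in range(10)]
pred_rel = [translation(0.0, 0.0, 1.1) for _ in range(10)]
true_poses = compose_trajectory(true_rel)
pred_poses = compose_trajectory(pred_rel)

# Corners of an assumed 40 mm x 30 mm image plane.
corners = [(0.0, 0.0, 0.0), (40.0, 0.0, 0.0), (0.0, 30.0, 0.0), (40.0, 30.0, 0.0)]
mean_err = mean_point_error(pred_poses, true_poses, corners)
final_drift = math.dist(transform_point(pred_poses[-1], corners[0]),
                        transform_point(true_poses[-1], corners[0]))
```

Because poses are built by chaining relative estimates, a small per-frame bias accumulates along the sweep: `final_drift` grows with the number of frames. This is one way to see why purely local motion cues may not be enough for a globally consistent reconstruction.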

Problem

Research questions and friction points this paper is trying to address.

Estimating 3D probe trajectory from 2D ultrasound images

Decoupling local and global feature extraction for robustness

Achieving accurate and globally consistent 3D reconstructions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-encoder architecture for decoupled feature extraction

Local encoder uses spatiotemporal convolutions

Global encoder employs temporal attention layers
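As an illustration of the temporal attention idea in the global encoder, the following sketch implements single-head scaled dot-product self-attention across per-frame embeddings in plain Python. It is a simplification under stated assumptions, not the paper's layer: queries, keys, and values are the raw embeddings (a trained layer would use learned projections), there is no positional encoding, and the 2-dimensional embeddings are made-up stand-ins for image-backbone outputs.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_self_attention(frame_embeddings):
    """Single-head scaled dot-product self-attention over the time axis.

    Assumption: Q = K = V = input embeddings (no learned projections).
    Returns the attended embeddings and the attention weight matrix.
    """
    num_frames = len(frame_embeddings)
    dim = len(frame_embeddings[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for query in frame_embeddings:
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in frame_embeddings]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[t] * frame_embeddings[t][d] for t in range(num_frames))
                        for d in range(dim)])
    return outputs, weights

# Hypothetical per-frame embeddings: frames 0 and 3 show similar anatomy,
# frame 2 shows something different.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.0]]
attended, attn = temporal_self_attention(embeddings)
```

In this toy input, frame 0 attends to the distant frame 3 as strongly as to itself and more strongly than to the nearer frame 2, which shows how attention over time can relate frames by content rather than by adjacency.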
