🤖 AI Summary
This work addresses the manual tuning of sensing parameters in robotic laser scanning, where a mismatched configuration can cause saturation, clipping, or missing returns that degrade measurement fidelity and cannot be recovered downstream. The authors propose ScanHD, a framework that combines task intent (expressed as a natural-language instruction) with scene context from a pre-scan RGB observation to recommend a discrete configuration of scanning parameters. Using vision-language embeddings and hyperdimensional computing, ScanHD performs parameter-wise associative reasoning and yields interpretable decisions. The authors also introduce Instruct-Obs2Param, a real-world multimodal dataset for this task. On this dataset, the model achieves 92.7% average exact-match accuracy and 98.1% average Win@1 accuracy across the five parameters, outperforming rule-based heuristics, conventional multimodal models, and multimodal large language models.
📝 Abstract
Robotic laser profiling is widely used for dimensional verification and surface inspection, yet measurement fidelity is often dominated by sensor configuration rather than robot motion. Industrial profilers expose multiple coupled parameters, including sampling frequency, measurement range, exposure time, receiver dynamic range, and illumination, that are still tuned by trial and error; mismatches can cause saturation, clipping, or missing returns that cannot be recovered downstream. We formulate instruction-conditioned sensing parameter recommendation: given a pre-scan RGB observation and a natural-language inspection instruction, infer a discrete configuration over key parameters of a robot-mounted profiler. To benchmark this problem, we develop Instruct-Obs2Param, a real-world multimodal dataset that links inspection intents, together with multi-view pose and illumination variation across 16 objects, to canonical parameter regimes. We then propose ScanHD, a hyperdimensional computing framework that binds instruction and observation into a task-aware code and performs parameter-wise associative reasoning with compact memories, matching discrete scanner regimes while yielding stable, interpretable, low-latency decisions. On Instruct-Obs2Param, ScanHD achieves 92.7% average exact-match accuracy and 98.1% average Win@1 accuracy across the five parameters, with strong cross-split generalization and low-latency inference suitable for deployment, outperforming rule-based heuristics, conventional multimodal models, and multimodal large language models. This work enables autonomous, instruction-conditioned sensing configuration from task intent and scene context, eliminating manual tuning and elevating sensor configuration from a static setting to an adaptive decision variable.
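The binding and associative-memory idea described above can be illustrated with a minimal sketch of generic hyperdimensional computing. This is not the authors' implementation. It assumes bipolar hypervectors, elementwise multiplication as the binding operator, and one prototype memory per scanner parameter. Symbolic tokens stand in for the vision-language embeddings, and the names `ParameterMemory` and `encode`, the dimensionality, and the exposure levels `"short"` and `"long"` are all hypothetical.

```python
import math
import random

DIM = 2048  # hypervector dimensionality (assumed, not taken from the paper)
rng = random.Random(0)
item_memory = {}  # maps a symbol to its fixed random hypervector


def hv(symbol):
    """Return the random bipolar hypervector for a symbol, creating it once."""
    if symbol not in item_memory:
        item_memory[symbol] = [rng.choice((-1, 1)) for _ in range(DIM)]
    return item_memory[symbol]


def bind(a, b):
    """Bind two bipolar hypervectors by elementwise multiplication."""
    return [x * y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def encode(instruction, scene):
    """Hypothetical task-aware code: instruction bound with observation.
    Symbols stand in for projected vision-language embeddings."""
    return bind(hv("instr:" + instruction), hv("scene:" + scene))


class ParameterMemory:
    """Associative memory for one parameter: one prototype per discrete level."""

    def __init__(self, levels):
        self.prototypes = {level: [0] * DIM for level in levels}

    def train(self, code, level):
        # Bundle the example into its level's prototype by elementwise addition.
        proto = self.prototypes[level]
        for i, value in enumerate(code):
            proto[i] += value

    def predict(self, code):
        # Recommend the level whose prototype is most similar to the query.
        return max(self.prototypes, key=lambda lv: cosine(self.prototypes[lv], code))


# Hypothetical exposure-time memory with two made-up levels.
exposure = ParameterMemory(["short", "long"])
exposure.train(encode("inspect surface scratches", "shiny metal"), "short")
exposure.train(encode("inspect surface scratches", "matte black plastic"), "long")

prediction = exposure.predict(encode("inspect surface scratches", "shiny metal"))
print(prediction)
```

In this sketch, a full recommendation would keep one such memory for each of the five parameters and query them independently with the same task-aware code. Because each decision reduces to a similarity score against a named prototype, the scores themselves can be inspected, which is one plausible reading of the interpretability claim.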