🤖 AI Summary
Existing vision-language-action models rely on discrete waypoint prediction, which struggles to capture the continuity of physical motion: sampling resolution is capped, higher-order derivatives are unavailable, and quantization introduces artifacts. This work proposes the Neural Implicit Action Field (NIAF), which, for the first time, formulates action representation as a continuous, differentiable function. NIAF uses a multimodal large language model (MLLM) as a hierarchical spectral modulator over a learnable motion prior, yielding trajectories that can be sampled at effectively infinite resolution. Because the representation is analytically differentiable, the framework supports explicit supervision of velocity, acceleration, and jerk, seamlessly coupling semantic understanding with dynamic execution. It attains state-of-the-art performance on the CALVIN and LIBERO benchmarks and demonstrates robust impedance control in real-world robotic experiments.
📝 Abstract
Despite the rapid progress of Vision-Language-Action (VLA) models, the prevailing paradigm of predicting discrete waypoints remains fundamentally misaligned with the intrinsic continuity of physical motion. This discretization imposes rigid sampling rates, lacks high-order differentiability, and introduces quantization artifacts that hinder precise, compliant interaction. We propose Neural Implicit Action Fields (NIAF), a paradigm shift that reformulates action prediction from discrete waypoint regression to continuous action function regression. By employing a multimodal large language model (MLLM) as a hierarchical spectral modulator over a learnable motion prior, NIAF synthesizes infinite-resolution trajectories as continuous-time manifolds. This formulation enables analytical differentiability, allowing for explicit supervision of velocity, acceleration, and jerk to ensure mathematical consistency and physical plausibility. Our approach achieves state-of-the-art results on the CALVIN and LIBERO benchmarks across diverse backbones. Furthermore, real-world experiments demonstrate that NIAF enables stable impedance control, bridging the gap between high-level semantic understanding and low-level dynamic execution.
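To make the core idea concrete, here is a minimal, self-contained sketch of a continuous action field. It is not the paper's implementation: the spectral coefficients below are fixed constants, whereas in NIAF they would be produced by the MLLM conditioned on vision and language, and the learnable motion prior is omitted. The sketch only illustrates two claims from the abstract: the trajectory is a continuous function of time (so it can be queried at any resolution, with no waypoint grid), and its velocity, acceleration, and jerk are available in closed form for supervision.

```python
import math

def make_action_field(coeffs, base_freq=0.5):
    """Continuous-time 1-D trajectory as a truncated sine series.

    `coeffs` plays the role of the spectral modulation (here hand-picked,
    not MLLM-predicted). Returns closed-form position and its first three
    time derivatives, so velocity/acceleration/jerk losses need no
    numerical differentiation.
    """
    # Precompute (amplitude, angular frequency) per harmonic.
    terms = [(a, 2.0 * math.pi * base_freq * (k + 1))
             for k, a in enumerate(coeffs)]

    def position(t):
        return sum(a * math.sin(w * t) for a, w in terms)

    def velocity(t):  # first analytic derivative
        return sum(a * w * math.cos(w * t) for a, w in terms)

    def acceleration(t):  # second analytic derivative
        return sum(-a * w * w * math.sin(w * t) for a, w in terms)

    def jerk(t):  # third analytic derivative
        return sum(-a * w ** 3 * math.cos(w * t) for a, w in terms)

    return position, velocity, acceleration, jerk

pos, vel, acc, jrk = make_action_field([0.8, 0.15, 0.05])

# Sample at arbitrary resolution -- no fixed waypoint grid is involved.
fine = [pos(i / 1000.0) for i in range(1001)]

# The analytic velocity agrees with a central finite-difference estimate.
h = 1e-6
fd_vel = (pos(0.3 + h) - pos(0.3 - h)) / (2 * h)
print(abs(fd_vel - vel(0.3)) < 1e-6)
```

In an actual VLA pipeline the derivative functions would feed the explicit velocity/acceleration/jerk losses, and a downstream impedance controller could query `pos`/`vel` at its own control rate rather than interpolating between waypoints.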