🤖 AI Summary
Transformer hidden states in motion prediction lack clear physical semantics and are difficult to edit controllably. Method: We propose an interpretable control framework built on supervised linear probes and sparse autoencoders (SAEs). First, linear probes identify functionally meaningful, physically grounded directions (e.g., velocity, heading) in the latent space. Second, we construct additive, semantically aligned linear control vectors that enable zero-shot generalization to unseen motion patterns. Third, SAEs refine the latent representations to improve control linearity and mechanistic interpretability. Results: Controlled predictions preserve physical plausibility; control responses are highly linear; and zero-shot adaptation incurs only millisecond-level inference overhead, with no fine-tuning required. This work establishes the first method for extracting semantically explicit, plug-and-play linear control vectors and enabling generalized, interpretable modulation in motion Transformers.
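To make the additive control-vector idea concrete, here is a minimal sketch of the standard difference-of-means construction on synthetic data. All names, dimensions, and data below are hypothetical illustrations, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hidden states from a motion Transformer layer:
# one batch exhibiting the target feature (e.g. high velocity), one lacking it.
h_pos = rng.normal(loc=1.0, size=(64, 256))
h_neg = rng.normal(loc=-1.0, size=(64, 256))

# Difference-of-means control vector, normalized to unit length.
v = h_pos.mean(axis=0) - h_neg.mean(axis=0)
v /= np.linalg.norm(v)

def steer(hidden: np.ndarray, alpha: float) -> np.ndarray:
    """Additively steer a hidden state along the control direction."""
    return hidden + alpha * v

h = rng.normal(size=256)
h_steered = steer(h, alpha=2.0)

# The edit moves the state along v by exactly alpha (up to float error).
print(np.dot(h_steered - h, v))  # prints 2.0 (up to floating-point error)
```

Because the edit is purely additive, scaling `alpha` scales the intervention linearly, which is the property the SAE refinement step is meant to strengthen in the model's actual predictions.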
📄 Abstract
Transformer-based models generate hidden states that are difficult to interpret. In this work, we analyze hidden states and modify them at inference time, with a focus on motion forecasting. We use linear probing to test whether interpretable features are embedded in the hidden states. Our experiments reveal high probing accuracy, indicating latent-space regularities with functionally important directions. Building on this, we fit control vectors from the directions between hidden states with opposing features. At inference, we add these control vectors to hidden states and evaluate their impact on predictions. Remarkably, such modifications preserve the feasibility of the predictions. We further refine our control vectors using sparse autoencoders (SAEs), which yields more linear changes in predictions as the control vectors are scaled. Our approach enables mechanistic interpretation as well as zero-shot generalization to unseen dataset characteristics with negligible computational overhead.
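The linear-probing step can be illustrated with a small synthetic sketch: if a binary feature is linearly embedded in the hidden states, a simple linear probe should recover it with high accuracy. The data, dimensions, and least-squares probe below are hypothetical stand-ins, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical hidden states (n samples, d dims) with a binary feature label
# (e.g. turning left vs. right) embedded along a ground-truth direction.
d, n = 128, 512
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_true)  # +/-1 labels

# Linear probe: least-squares fit of the labels from the hidden states.
w_probe, *_ = np.linalg.lstsq(X, y, rcond=None)

# High probing accuracy indicates the feature occupies a linear direction.
acc = np.mean(np.sign(X @ w_probe) == y)
print(f"probing accuracy: {acc:.2f}")
```

When the probe succeeds, its weight vector points along a functionally important direction of the latent space, which is exactly the kind of direction the control vectors are then built from.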