AI Summary
This work investigates the problem of accurately recovering the internal attention parameters of a Transformer model when only black-box access to its outputs is available. For single-head softmax attention regressors, the paper provides the first theoretical guarantee that the parameters are exactly learnable: an elementary adaptive query algorithm recovers them with O(d²) queries, and a randomised algorithm based on compressed sensing reduces this to O(rd) when the head dimension r satisfies r ≪ d. Given any algorithm for learning ReLU feedforward networks, the approach extends to one-layer Transformers, and a robustness analysis shows that ε-accurate estimation remains possible under noisy oracle access with only polynomially many queries. The study also establishes a fundamental identifiability barrier for multi-head attention, showing that its parameters cannot be uniquely recovered from value queries without additional structural assumptions.
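One concrete source of multi-head non-identifiability is a simple permutation symmetry: because the per-head outputs are summed, relabelling the heads yields a distinct parameterisation with exactly the same input-output map. The sketch below illustrates this in a toy numpy model of multi-head attention (our own minimal formalisation; the paper's barrier may rest on different or stronger constructions):

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, n, H = 6, 2, 4, 3  # width, head dimension, sequence length, heads

def softmax(S):
    E = np.exp(S - S.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def multi_head(X, params):
    # Standard multi-head attention: per-head contributions are summed,
    # so the map is invariant to any permutation of the heads.
    out = np.zeros((X.shape[0], d))
    for Wq, Wk, Wv, Wo in params:
        A = softmax((X @ Wq) @ (X @ Wk).T / np.sqrt(r))  # n x n weights
        out += (A @ X @ Wv) @ Wo                          # n x d per head
    return out

params = [tuple(rng.standard_normal(s) for s in [(d, r), (d, r), (d, r), (r, d)])
          for _ in range(H)]
permuted = [params[i] for i in (2, 0, 1)]  # a genuinely different parameter list

X = rng.standard_normal((n, d))
# Distinct parameterisations, identical function values:
assert np.allclose(multi_head(X, params), multi_head(X, permuted))
```

A value-query learner therefore cannot distinguish `params` from `permuted`, which is the flavour of obstruction behind the identifiability barrier.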
Abstract
We study the problem of learning Transformer-based sequence models with black-box access to their outputs. In this setting, a learner may adaptively query the oracle with any sequence of vectors and observe the corresponding real-valued output. We begin with the simplest case, a single-head softmax-attention regressor. We show that for a model with width $d$, there is an elementary algorithm to learn the parameters of single-head attention exactly with $O(d^2)$ queries. Further, we show that if there exists an algorithm to learn ReLU feedforward networks (FFNs), then the single-head algorithm can be easily adapted to learn one-layer Transformers with single-head attention. Next, motivated by the regime where the head dimension $r \ll d$, we provide a randomised algorithm that learns single-head attention-based models with $O(rd)$ queries via compressed sensing arguments. We also study robustness to noisy oracle access, proving that under mild norm and margin conditions, the parameters can be estimated to $\varepsilon$ accuracy with a polynomial number of queries even when outputs are only provided up to additive tolerance. Finally, we show that multi-head attention parameters are not identifiable from value queries in general -- distinct parameterisations can induce the same input-output map. Hence, guarantees analogous to the single-head setting are impossible without additional structural assumptions.
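To make the query model concrete, here is a minimal sketch of the kind of black-box value oracle the learner interacts with: a toy single-head softmax-attention regressor of width $d$ and head dimension $r$, queried with an arbitrary sequence of vectors and returning a single real number. The specific architecture (low-rank query/key maps, linear read-out from the first token) is our own illustrative assumption, not necessarily the paper's exact model:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 8, 2, 5  # width d, head dimension r, sequence length n

# Hidden parameters the learner is trying to recover.
Wq = rng.standard_normal((d, r))
Wk = rng.standard_normal((d, r))
v = rng.standard_normal(d)

def oracle(X):
    """Black-box value oracle: maps a sequence X (n x d) to a real number."""
    scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(r)        # n x n attention scores
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)                  # row-wise softmax
    H = A @ X                                          # attended sequence, n x d
    return float(H[0] @ v)                             # linear read-out, first token

X = rng.standard_normal((n, d))
y = oracle(X)  # the learner adaptively chooses X and observes only such scalars
```

The algorithms in the paper choose query sequences `X` adaptively and use the observed scalars to pin down `Wq`, `Wk`, and the read-out exactly.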