A mathematical framework for parameter recovery in large language models via a joint Euclidean mirror

πŸ“… 2026-04-08
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This study addresses the problem of inferring fine-tuning parameters of black-box large language models from their output responses. Treating the mapping from model parameters to response distributions as a geometrically structured family of probability measures, the authors propose a β€œJoint Euclidean Mirror” framework that embeds discrepancies between response distributions into a low-dimensional Euclidean space. This approach achieves, for the first time, consistent recovery of unknown fine-tuning parameters while endowing the response space with an interpretable geometric structure. Theoretical analysis and experiments demonstrate that distinct fine-tuning parameters correspond to distinguishable directions in the embedding space, and the proposed estimator exhibits statistical consistency and favorable asymptotic properties.
πŸ“ Abstract
Understanding the behavior of black-box large language models and determining effective means of comparing their performance is a key task in modern machine learning. We consider how large language models respond to a specific query by analyzing how the distributions of responses vary over different values of tuning parameters. We frame this problem in a general mathematical setting, treating the mapping from model parameters to response distributions as a structured family of probability measures, endowed with a geometry via a dissimilarity measure. We show how dissimilarities between response distributions can be represented in low-dimensional Euclidean space through a joint Euclidean mirror surface encoding the underlying geometry, which permits both qualitative and quantitative analysis of large language models and provides insight into predicting response distributions for different values of tuning parameters. We propose an estimation procedure for the underlying joint Euclidean mirror based on observed samples from the response distributions, and we prove its asymptotic properties. Additionally, we propose a statistically consistent procedure to infer the value of an unknown model parameter based on samples from the corresponding response distribution and the estimated joint Euclidean mirror. In an experimental setting with large language models, we find that changes in different tuning parameter values correspond to distinct directions in the embedding space, making it possible to estimate the tuning parameters that were used to generate a given response.
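The abstract's pipeline — sample response distributions over a grid of tuning parameters, embed their pairwise dissimilarities in low-dimensional Euclidean space, and infer an unknown parameter from fresh samples — can be illustrated with a toy sketch. This is not the paper's "joint Euclidean mirror" construction: the response distributions here are stand-in 1-D Gaussians N(theta, 1), the dissimilarity is the empirical 1-Wasserstein distance, and the embedding is classical multidimensional scaling. All function names and the parameter grid are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_responses(theta, n=500):
    # Toy stand-in for an LLM's response distribution at tuning parameter theta.
    return rng.normal(theta, 1.0, size=n)

def w1(x, y):
    # Empirical 1-Wasserstein distance between equal-size 1-D samples.
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

def classical_mds(D, dim=1):
    # Embed a dissimilarity matrix D into dim-dimensional Euclidean space
    # via classical multidimensional scaling (double centering + eigendecomposition).
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # Gram matrix of the embedding
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Known tuning-parameter grid and samples from each response distribution.
thetas = np.linspace(0.0, 2.0, 9)
samples = [sample_responses(t) for t in thetas]
D = np.array([[w1(a, b) for b in samples] for a in samples])

# Low-dimensional Euclidean picture of the response-distribution geometry:
# for this toy family the 1-D embedding coordinate varies monotonically
# with theta, mirroring the parameter axis.
Z = classical_mds(D, dim=1)

# Infer an unknown parameter from samples of its response distribution:
# match against the grid by dissimilarity and read off the nearest value.
hidden = sample_responses(1.25)
dists = np.array([w1(hidden, s) for s in samples])
theta_hat = thetas[np.argmin(dists)]
```

Here the recovery step is a nearest-neighbor match over the parameter grid; the paper instead proves consistency for an estimator built on the estimated mirror itself, but the toy version conveys why distinct parameter values map to distinguishable locations in the embedding.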
Problem

Research questions and friction points this paper is trying to address.

parameter recovery
large language models
response distributions
tuning parameters
black-box models
Innovation

Methods, ideas, or system contributions that make the work stand out.

joint Euclidean mirror
parameter recovery
response distribution geometry
large language models
dissimilarity embedding
πŸ”Ž Similar Papers
No similar papers found.