Learning Parametric Distributions from Samples and Preferences

📅 2025-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work investigates when preference feedback improves the estimation of unknown parameters in continuous parametric distributions (e.g., Gaussian, Laplace). Estimators that rely solely on i.i.d. samples are limited to a Θ(1/√n) error rate. We first show that preference-based M-estimators achieve a better asymptotic variance than sample-only M-estimators, and then propose an estimator that exploits the hard constraints revealed by deterministic pairwise preferences. Under assumptions that are restrictive but hold for Gaussian and Laplace distributions with a log-probability reward, this estimator attains an O(1/n) estimation error, and we derive a lower bound that matches this rate up to dimension- and problem-dependent constants. Empirical results corroborate the predicted statistical gains.

📝 Abstract

Recent advances in language modeling have underscored the role of preference feedback in enhancing model performance. This paper investigates the conditions under which preference feedback improves parameter estimation in classes of continuous parametric distributions. In our framework, the learner observes pairs of samples from an unknown distribution along with their relative preferences, which depend on the same unknown parameter. We show that preference-based M-estimators achieve a better asymptotic variance than sample-only M-estimators, and that this variance is further improved by deterministic preferences. Leveraging the hard constraints revealed by deterministic preferences, we propose an estimator achieving an estimation error scaling of $\mathcal{O}(1/n)$, a significant improvement over the $\Theta(1/\sqrt{n})$ rate attainable with samples alone. Next, we establish a lower bound that matches this accelerated rate, up to dimension and problem-dependent constants. While the assumptions underpinning our analysis are restrictive, they are satisfied by notable cases such as Gaussian or Laplace distributions for preferences based on the log-probability reward.
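To make the role of the hard constraints concrete, here is a minimal simulation sketch. It is not the paper's estimator: it assumes a one-dimensional Gaussian with unknown mean and known unit variance, where a deterministic log-probability preference means the preferred sample is the one closer to the mean. The mean must then lie on the preferred sample's side of the pair's midpoint, so every pair cuts the feasible set down to an interval. The function names (`simulate_pairs`, `feasible_interval`, `constraint_estimate`) and the choice of the interval midpoint as the estimate are illustrative assumptions.

```python
import numpy as np


def simulate_pairs(n, theta, rng):
    """Draw n pairs from N(theta, 1) with deterministic preferences.

    Assumption: the reward is the log-density, so x is preferred over y
    exactly when |x - theta| <= |y - theta|.
    """
    x = rng.normal(theta, 1.0, size=n)
    y = rng.normal(theta, 1.0, size=n)
    x_preferred = np.abs(x - theta) <= np.abs(y - theta)
    return x, y, x_preferred


def feasible_interval(x, y, x_preferred):
    """Intersect the half-lines implied by each preference.

    theta lies on the same side of the midpoint (x + y) / 2 as the
    preferred sample, so each pair yields a lower or an upper bound.
    """
    mid = (x + y) / 2.0
    winner = np.where(x_preferred, x, y)
    lower_bounds = mid[winner > mid]  # winner above midpoint: theta >= mid
    upper_bounds = mid[winner < mid]  # winner below midpoint: theta <= mid
    lo = lower_bounds.max() if lower_bounds.size else -np.inf
    hi = upper_bounds.min() if upper_bounds.size else np.inf
    return lo, hi


def constraint_estimate(x, y, x_preferred):
    """Illustrative estimate: midpoint of the feasible interval.

    Falls back to the sample mean when the interval is unbounded.
    """
    lo, hi = feasible_interval(x, y, x_preferred)
    if np.isfinite(lo) and np.isfinite(hi):
        return float((lo + hi) / 2.0)
    return float(np.mean(np.concatenate([x, y])))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = 0.7  # hypothetical ground-truth mean
    for n in (100, 1_000, 10_000):
        x, y, pref = simulate_pairs(n, theta, rng)
        err_pref = abs(constraint_estimate(x, y, pref) - theta)
        err_mean = abs(np.mean(np.concatenate([x, y])) - theta)
        print(f"n={n:>6}  constraint error={err_pref:.2e}  sample-mean error={err_mean:.2e}")
```

In this toy setting the feasible interval is bounded by the midpoints closest to the true mean on either side, and the gap to the nearest of n random midpoints shrinks on the order of 1/n, whereas the sample mean fluctuates on the order of 1/√n. Running the script for increasing n gives a rough empirical look at the two scalings.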

Problem

Research questions and friction points this paper is trying to address.

Investigates the conditions under which preference feedback improves parameter estimation in continuous parametric distributions

Shows preference-based M-estimators outperform sample-only estimators in asymptotic variance

Proves lower bound matching accelerated error rate under restrictive assumptions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Preference-based M-estimators achieve a lower asymptotic variance than sample-only M-estimators

Deterministic preferences improve the estimation error scaling from Θ(1/√n) to O(1/n)

Lower bound matches accelerated estimation rate
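
As a worked illustration of why deterministic preferences act as hard constraints, consider the Gaussian case with the log-probability reward. The derivation below is a sketch under the assumption of a known covariance $\Sigma$ and unknown mean $\theta$; the notation ($x$ for the preferred sample, $y$ for the other) is introduced here for illustration and is not taken from the paper.

```latex
% Assumption: p_\theta = \mathcal{N}(\theta, \Sigma) with known \Sigma, so
% \log p_\theta(x) = -\tfrac{1}{2}(x-\theta)^\top \Sigma^{-1} (x-\theta) + \text{const}.
\begin{align*}
\log p_\theta(x) \ge \log p_\theta(y)
  &\iff (x-\theta)^\top \Sigma^{-1} (x-\theta) \le (y-\theta)^\top \Sigma^{-1} (y-\theta) \\
  &\iff 2\,\theta^\top \Sigma^{-1} (x-y) \ge x^\top \Sigma^{-1} x - y^\top \Sigma^{-1} y
        = (x-y)^\top \Sigma^{-1} (x+y) \\
  &\iff (x-y)^\top \Sigma^{-1} \Bigl(\theta - \tfrac{x+y}{2}\Bigr) \ge 0 .
\end{align*}
```

Each observed preference therefore restricts $\theta$ to a halfspace whose boundary passes through the midpoint $(x+y)/2$, and $n$ preferences restrict it to the intersection of $n$ such halfspaces.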
