🤖 AI Summary
Astronomical data analyses often yield biased parameter estimates when the model is mis-specified, most commonly because of outliers, and conventional heuristics such as sigma-clipping depend on subjectively chosen thresholds. To address this, we develop a generic Bayesian linear regression framework based on Student's *t*-distributions. The heavy-tailed likelihood makes the inference robust to outliers and to mis-specification of the noise model without ad hoc data rejection. Inference is performed via Markov Chain Monte Carlo (MCMC), and we release an open-source Python package, *t-cup*, to facilitate implementation. On simulated datasets with varying degrees of mis-specification, the derived constraints are systematically less biased than those from a similar model using normal distributions. For a dataset without outliers, a worst-case inference with *t*-distributions remains unbiased and increases the reported parameter uncertainties by at most about 10 per cent. On real-world datasets, the results differ qualitatively from existing analyses that used normal distributions and agree with those that applied more robust methods. The core contribution is a principled alternative to threshold-based outlier handling that removes the need for pre-specified clipping criteria.
📝 Abstract
Model mis-specification (e.g. the presence of outliers) is commonly encountered in astronomical analyses, often requiring the use of ad hoc algorithms (e.g. sigma-clipping). We develop and implement a generic Bayesian approach to linear regression, based on Student's t-distributions, that is robust to outliers and mis-specification of the noise model. Our method is validated using simulated datasets with various degrees of model mis-specification; the derived constraints are shown to be systematically less biased than those from a similar model using normal distributions. We demonstrate that, for a dataset without outliers, a worst-case inference using t-distributions would give unbiased results with $\lesssim\!10$ per cent increase in the reported parameter uncertainties. We also compare with existing analyses of real-world datasets, finding qualitatively different results where normal distributions have been used and agreement where more robust methods have been applied. A Python implementation of this model, t-cup, is made available for others to use.
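To illustrate why a Student's t likelihood resists outliers, here is a minimal sketch in plain Python. It is not the t-cup API and it does not perform the Bayesian inference described above. Instead it finds a maximum-likelihood straight line by iteratively reweighted least squares, under the simplifying assumptions that the noise scale `sigma` and the degrees of freedom `nu` are known and fixed. The function names `weighted_line_fit` and `student_t_line_fit`, the synthetic data, and the choice `nu=3` are all hypothetical. Each point receives the weight $w_i = (\nu + 1)/(\nu + z_i^2)$, where $z_i$ is its residual in units of `sigma`, so points far from the line are smoothly down-weighted rather than rejected at a threshold.

```python
import random


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def student_t_line_fit(x, y, sigma, nu=3.0, n_iter=50):
    """Maximum-likelihood line under Student's t noise.

    Assumes a known, fixed scale `sigma` and degrees of freedom `nu`
    (an illustrative simplification, not the full Bayesian model).
    """
    # Start from the ordinary (normal-likelihood) least-squares solution.
    a, b = weighted_line_fit(x, y, [1.0] * len(x))
    for _ in range(n_iter):
        # Weights shrink as the scaled residual grows, so outliers
        # contribute little to the next weighted fit.
        w = [
            (nu + 1.0) / (nu + ((yi - a - b * xi) / sigma) ** 2)
            for xi, yi in zip(x, y)
        ]
        a, b = weighted_line_fit(x, y, w)
    return a, b


# Synthetic data: true line y = 1 + 2x with Gaussian noise of scale 0.5.
rng = random.Random(0)
x = [i / 2 for i in range(40)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5) for xi in x]

# Contaminate three high-x points with large positive offsets.
for i in (35, 37, 39):
    y[i] += 30.0

normal_fit = weighted_line_fit(x, y, [1.0] * len(x))
t_fit = student_t_line_fit(x, y, sigma=0.5)

print("normal likelihood (intercept, slope):", normal_fit)
print("Student's t likelihood (intercept, slope):", t_fit)
```

In this toy setup the normal-likelihood slope is pulled well away from the true value of 2 by the three contaminated points, while the t-based fit stays close to it. A full Bayesian treatment would additionally infer the scale and the degrees of freedom and would report posterior uncertainties rather than a point estimate.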