Large Language Models for Limited Noisy Data: A Gravitational Wave Identification Study

📅 2025-12-03
🤖 AI Summary
This study addresses the challenge of detecting astrophysical signals—such as gravitational waves—in real observational data characterized by strong non-Gaussian, non-stationary noise and severe scarcity of labeled examples. We investigate the modeling advantages of large language models (LLMs) over conventional neural networks in this low-resource regime. We propose a novel LLM fine-tuning paradigm for few-shot astronomical time-series classification that bypasses reliance on large-scale synthetic waveforms and instead learns discriminative temporal structures directly from authentic observations. Fine-tuned on only 90 real LIGO gravitational-wave events, our model achieves 97.4% detection accuracy—substantially outperforming CNN and LSTM baselines trained on the same limited dataset. Performance scales predictably with both model size and data quantity. This work constitutes the first empirical validation of LLMs’ strong generalization capability in high-noise, low-label astrophysical tasks, establishing a new paradigm for scientific discovery under extreme annotation scarcity.

📝 Abstract
This work investigates whether large language models (LLMs) offer advantages over traditional neural networks for astronomical data processing in regimes with non-Gaussian, non-stationary noise and limited labeled samples. Gravitational wave observations provide a suitable test case: using only 90 LIGO events, fine-tuned LLMs achieve 97.4% accuracy in identifying signals. Further experiments show that, in contrast to traditional networks that rely on large simulated datasets, additional simulated samples do not improve LLM performance, while scaling studies reveal predictable gains with increasing model size and dataset size. These results indicate that LLMs can extract discriminative structure directly from observational data and provide an efficient assessment for gravitational wave identification. The same strategy may extend to other astronomical domains with similar noise properties, such as radio or pulsar observations.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs vs traditional networks for astronomical data with noisy, limited samples.
Testing LLMs on gravitational wave identification using only 90 real LIGO events.
Extending LLM efficiency to other astronomical domains with similar noise challenges.
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs achieve high accuracy with limited real gravitational wave data.
LLMs do not require large simulated datasets for performance.
LLMs extract discriminative structure directly from observational data.
Yixuan Li
School of Mathematics and Physics, University of South China, Hengyang, 421001, China
Yuhao Lu
School of Computer Science, University of South China, Hengyang, 421001, China
Yang Liu
Department of Physics E. Pancini, University Federico II, Naples 80126, Italy
Liang Li
Institute of Fundamental Physics and Quantum Technology, Ningbo University, Ningbo, Zhejiang 315211, People’s Republic of China
R. Ruffini
ICRANet, Piazza della Repubblica 10, 65122 Pescara, Italy
Di Li
Associate Professor of Finance, Peking University HSBC Business School
Mergers and Acquisitions, Corporate Governance, Corporate Finance, Structural Estimation
Rong-Gen Cai
Institute of Fundamental Physics and Quantum Technology, Ningbo University, Ningbo, Zhejiang 315211, People’s Republic of China
Xiaoyan Zhu
Tsinghua University
Wenbin Lin
School of Computer Science, University of South China, Hengyang, 421001, China
Yu Wang
ICRA, Dip. di Fisica, Sapienza Università di Roma, Piazzale Aldo Moro 5, I-00185 Roma, Italy