TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving

📅 2025-06-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of specialized evaluation benchmarks for assessing large language models’ (LLMs) mathematical reasoning capabilities in telecommunications—spanning signal processing, network optimization, and performance analysis. To this end, we introduce TeleMath, the first domain-specific benchmark comprising 500 numerically answerable problems covering core telecommunication mathematics. We propose a domain-expert-driven, scalable, structured Q&A generation pipeline and establish a zero-shot and few-shot multi-model evaluation framework. Our key contributions are: (1) the release of TeleMath—the first dedicated benchmark for telecom mathematics; (2) open-sourcing of the TeleMath dataset and evaluation code; and (3) empirical findings demonstrating substantial limitations of general-purpose LLMs on these tasks, while math- and logic-specialized models achieve up to a 32.7% absolute accuracy gain.
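Since TeleMath restricts itself to numerically answerable problems, model outputs can be scored automatically by comparing the extracted number against the reference value. A minimal sketch of such a grader is shown below; the tolerance value and function name are illustrative assumptions, not the paper's exact grading rule.

```python
import math

def grade_numeric_answer(predicted: float, reference: float,
                         rel_tol: float = 1e-2) -> bool:
    """Mark a predicted numeric answer correct if it falls within a
    relative tolerance of the reference value.

    Note: the 1% relative tolerance is a hypothetical choice for
    illustration; the benchmark's actual matching criterion may differ.
    """
    return math.isclose(predicted, reference, rel_tol=rel_tol)

# A prediction within 1% of the reference counts as correct.
print(grade_numeric_answer(20.04, 20.0))  # → True
print(grade_numeric_answer(25.0, 20.0))   # → False
```

Relative (rather than absolute) tolerance is a natural fit here because telecom quantities span many orders of magnitude, from bit error rates to data rates in bits per second.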

📝 Abstract
The increasing adoption of artificial intelligence in telecommunications has raised interest in the capability of Large Language Models (LLMs) to address domain-specific, mathematically intensive tasks. Although recent advancements have improved the performance of LLMs in general mathematical reasoning, their effectiveness within specialized domains, such as signal processing, network optimization, and performance analysis, remains largely unexplored. To address this gap, we introduce TeleMath, the first benchmark dataset specifically designed to evaluate LLM performance in solving mathematical problems with numerical solutions in the telecommunications domain. Comprising 500 question-answer (QnA) pairs, TeleMath covers a wide spectrum of topics in the telecommunications field. This paper outlines the proposed QnA generation pipeline, starting from a selected seed of problems crafted by Subject Matter Experts. The evaluation of a wide range of open-source LLMs reveals that the best performance on TeleMath is achieved by recent models explicitly designed for mathematical or logical reasoning. In contrast, general-purpose models, even those with a large number of parameters, often struggle with these challenges. We have released the dataset and the evaluation code to ease result reproducibility and support future research.
Problem

Research questions and friction points this paper is trying to address.

Evaluate LLMs in telecom-specific mathematical problem solving
Assess LLM performance in signal processing and network optimization
Bridge gap in specialized domain mathematical reasoning for LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

TeleMath benchmark for telecom math problems
QnA generation pipeline with expert-crafted seeds
Evaluation of LLMs on domain-specific performance
Vincenzo Colle
Università degli Studi di Cassino e del Lazio Meridionale, Cassino, Italy
Mohamed Sana
Paris Research Center, Huawei Technologies, Boulogne-Billancourt, France
Nicola Piovesan
Huawei Technologies
Mobile networks, Energy efficiency, Machine learning, Large Language Models, Generative AI
Antonio De Domenico
Huawei Technologies
Machine learning, mobile networks, 5G, wireless communications
Fadhel Ayed
Department of Statistics, University of Oxford
Statistics, Machine Learning
M. Debbah
Khalifa University of Science and Technology, Abu Dhabi, UAE