🤖 AI Summary
This work addresses the lack of specialized evaluation benchmarks for assessing large language models' (LLMs) mathematical reasoning capabilities in telecommunications, spanning signal processing, network optimization, and performance analysis. To this end, we introduce TeleMath, the first domain-specific benchmark comprising 500 numerically answerable problems covering core telecommunication mathematics. We propose a scalable, structured question-and-answer (QnA) generation pipeline driven by domain experts and establish a zero-shot and few-shot multi-model evaluation framework. Our key contributions are: (1) the release of TeleMath, the first dedicated benchmark for telecom mathematics; (2) the open-sourcing of the TeleMath dataset and evaluation code; and (3) empirical findings demonstrating substantial limitations of general-purpose LLMs on these tasks, while models specialized for mathematical or logical reasoning achieve up to a 32.7% absolute accuracy gain.
📝 Abstract
The increasing adoption of artificial intelligence in telecommunications has raised interest in the capability of Large Language Models (LLMs) to address domain-specific, mathematically intensive tasks. Although recent advancements have improved the performance of LLMs in general mathematical reasoning, their effectiveness within specialized domains, such as signal processing, network optimization, and performance analysis, remains largely unexplored. To address this gap, we introduce TeleMath, the first benchmark dataset specifically designed to evaluate LLM performance in solving mathematical problems with numerical solutions in the telecommunications domain. Comprising 500 question-answer (QnA) pairs, TeleMath covers a wide spectrum of topics in the telecommunications field. This paper outlines the proposed QnA generation pipeline, starting from a seed set of problems crafted by Subject Matter Experts. The evaluation of a wide range of open-source LLMs reveals that the best performance on TeleMath is achieved by recent models explicitly designed for mathematical or logical reasoning. In contrast, general-purpose models, even those with a large number of parameters, often struggle with these challenges. We have released the dataset and the evaluation code to ease reproducibility of results and support future research.
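Since every TeleMath problem has a numerical answer, evaluation reduces to comparing a model's predicted value against the reference value. The sketch below is a minimal, hypothetical scoring loop assuming a relative-tolerance matching criterion; the function names (`is_correct`, `accuracy`) and the tolerance value are illustrative assumptions, not the paper's exact rule.

```python
import math

def is_correct(predicted: float, reference: float, rel_tol: float = 1e-2) -> bool:
    """Judge a model's numerical answer against the reference value.

    A relative tolerance is used so that small rounding differences
    (e.g. 1.41 vs 1.414) are not penalized. The 1% tolerance here is
    an illustrative choice, not the benchmark's official criterion.
    """
    return math.isclose(predicted, reference, rel_tol=rel_tol)

def accuracy(predictions: list[float], references: list[float],
             rel_tol: float = 1e-2) -> float:
    """Fraction of problems whose predicted value matches the reference."""
    hits = sum(is_correct(p, r, rel_tol) for p, r in zip(predictions, references))
    return hits / len(references)

# Hypothetical run: 3 of 4 answers fall within 1% of the reference values.
preds = [0.250, 12.60, 1.41, 99.0]
refs = [0.25, 12.5, 1.414, 87.0]
print(accuracy(preds, refs))  # → 0.75
```

Relative (rather than absolute) tolerance keeps the criterion meaningful across answers of very different magnitudes, e.g. a bit-error rate of 1e-5 versus a throughput of several Mbps.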