StringLLM: Understanding the String Processing Capability of Large Language Models

📅 2024-10-02
🏛️ arXiv.org
🤖 AI Summary
Large language models (LLMs) exhibit significant deficiencies in string analysis and manipulation tasks, despite their broad linguistic capabilities. Method: The authors propose StringLLM, a method for constructing datasets that benchmark the string processing capability of LLMs, and use it to build StringBench, a suite of datasets spanning a wide range of string processing tasks. Contribution/Results: Evaluations show that mainstream LLMs achieve consistently low accuracy on StringBench compared to humans; fine-tuning on StringLLM-constructed data yields an average accuracy improvement of 32.7%. An in-depth analysis further identifies bottlenecks in length sensitivity, pattern generalization, and symbolic reasoning. Code, datasets, and the benchmark are publicly released, providing a foundation for fine-grained assessment and enhancement of LLMs' string processing capability.

📝 Abstract
String processing, which mainly involves the analysis and manipulation of strings, is a fundamental component of modern computing. Despite the significant advancements of large language models (LLMs) in various natural language processing (NLP) tasks, their capability in string processing remains underexplored and underdeveloped. To bridge this gap, we present a comprehensive study of LLMs' string processing capability. In particular, we first propose StringLLM, a method to construct datasets for benchmarking string processing capability of LLMs. We use StringLLM to build a series of datasets, referred to as StringBench. It encompasses a wide range of string processing tasks, allowing us to systematically evaluate LLMs' performance in this area. Our evaluations indicate that LLMs struggle with accurately processing strings compared to humans. To uncover the underlying reasons for this limitation, we conduct an in-depth analysis and subsequently propose an effective approach that significantly enhances LLMs' string processing capability via fine-tuning. This work provides a foundation for future research to understand LLMs' string processing capability. Our code and data are available at https://github.com/wxl-lxw/StringLLM.
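To make the abstract's notion of "string processing tasks" concrete, here is a minimal sketch of the kind of deterministic string tasks such a benchmark could pose. The specific tasks below (character counting, reversal, substring search) are illustrative assumptions, not the paper's actual task list; each has a programmatically checkable ground truth against which an LLM's answer can be scored.

```python
# Hypothetical examples of benchmark-style string tasks with
# deterministic ground truth (not taken from the paper itself).

def count_char(s: str, c: str) -> int:
    """How many times does the character c appear in s?"""
    return s.count(c)

def reverse_string(s: str) -> str:
    """Return s reversed."""
    return s[::-1]

def find_substring(s: str, sub: str) -> int:
    """Index of the first occurrence of sub in s, or -1 if absent."""
    return s.find(sub)

# Ground-truth answers an evaluator would compare LLM output against:
print(count_char("strawberry", "r"))        # 3
print(reverse_string("StringLLM"))          # MLLgnirtS
print(find_substring("benchmark", "mark"))  # 5
```

Tasks like these are trivial for a short program yet, as the abstract notes, often answered incorrectly by LLMs operating over tokenized text, which is what makes them useful probes of string processing capability.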
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Text String Tasks
Performance and Limitations
Innovation

Methods, ideas, or system contributions that make the work stand out.

StringLLM
StringBench
Fine-tuning