🤖 AI Summary
Large language models (LLMs) exhibit notable deficiencies in basic symbolic manipulation tasks, such as counting how often a letter occurs in a word. This work investigates ChatGPT's zero-shot failures on multilingual letter-counting tasks. We propose a lightweight prompt optimization method that combines chain-of-thought (CoT) reasoning, position-aware instruction design, and a systematic few-shot evaluation framework, requiring no model fine-tuning. Our approach elicits LLMs' latent symbolic processing capabilities, raising ChatGPT's accuracy on a cross-lingual letter-counting benchmark from under 20% to 92.3%, well above its zero-shot performance ceiling. The study shows that structured prompt engineering is critical for reliable deterministic symbolic operations in LLMs, offering empirical evidence and methodological insight into the plasticity and controllability of large-model reasoning.
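The summary's exact prompt template is not reproduced here, but a position-aware CoT prompt of the kind it describes can be sketched as follows. This is a hypothetical illustration, not the paper's wording: `position_aware_prompt` spells the target word with 1-based indices before asking for the count, and `count_letter` is the ground-truth scorer an evaluation harness would compare model answers against.

```python
def count_letter(word: str, letter: str) -> int:
    """Ground-truth count used to score a model's answer (case-insensitive)."""
    return word.lower().count(letter.lower())

def position_aware_prompt(word: str, letter: str) -> str:
    """Hypothetical position-aware CoT prompt: enumerate each letter with
    its 1-based position, then ask the model to count step by step."""
    spelled = ", ".join(f"{i}:{ch}" for i, ch in enumerate(word, 1))
    return (
        f"The word spelled out with positions is: {spelled}.\n"
        f"Count how many of these letters are '{letter}'. "
        "Think step by step, then answer with a single number."
    )

# Example: the prompt makes each occurrence of 'r' individually addressable.
print(position_aware_prompt("berry", "r"))
print(count_letter("strawberry", "r"))  # → 3
```

Spelling the word out letter by letter sidesteps subword tokenization, which is the usual explanation for why zero-shot letter counting fails; the few-shot framework in the summary would prepend worked examples in the same format.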
📝 Abstract
Large language models (LLMs) struggle with simple tasks such as counting the occurrences of a letter in a word. In this paper, we investigate whether ChatGPT can learn to count letters and propose an efficient solution.