LLMSR@XLLM25: An Empirical Study of LLM for Structural Reasoning

📅 2025-05-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of large language models (LLMs) in structured reasoning—specifically, their insufficient fine-grained control, interpretability, and logical fidelity. The authors propose a lightweight method grounded solely in prompt engineering, requiring no fine-tuning, retrieval, or ensembling. The approach employs multi-turn few-shot prompting to guide Llama-3-8B-Instruct in precisely extracting problem constraints and explicitly decomposing chain-of-thought reasoning into statement–evidence pairs, each checked for logical validity. To ensure structural consistency, the pipeline integrates regex-based span normalisation and strict JSON schema validation. Evaluated on LLMSR@XLLM25, the method ranks fifth overall, with macro-F1 scores matching those of significantly more complex, resource-intensive baselines. This demonstrates that high-fidelity structured reasoning can be achieved efficiently through carefully designed prompting and minimal post-processing—without architectural modification or external components.
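The post-processing step described above can be sketched as follows. This is a minimal illustration, not the authors' released code: the `statement`/`evidence`/`verification` keys and the cleanup patterns are assumptions standing in for the official task schema.

```python
import json
import re

# Hypothetical record layout: the official LLMSR@XLLM25 schema is not
# reproduced in this summary, so these key names are illustrative.
REQUIRED_KEYS = {"statement", "evidence", "verification"}

def normalise_span(span: str) -> str:
    """Regex-based cleanup: collapse whitespace, strip stray quotes and bullets."""
    span = re.sub(r"\s+", " ", span).strip()
    span = re.sub(r'^["\'\-\u2022]+|["\']+$', "", span).strip()
    return span

def enforce_schema(raw: str) -> list[dict]:
    """Parse model output and keep only records with the expected keys."""
    records = json.loads(raw)
    cleaned = []
    for rec in records:
        if not REQUIRED_KEYS <= rec.keys():
            continue  # drop malformed records rather than crash downstream
        cleaned.append({k: normalise_span(str(rec[k])) for k in REQUIRED_KEYS})
    return cleaned
```

Dropping malformed records instead of raising keeps the pipeline robust to occasional format drift in the model's JSON output, at the cost of silently losing those steps.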

📝 Abstract
We present Team asdfo123's submission to the LLMSR@XLLM25 shared task, which evaluates large language models on producing fine-grained, controllable, and interpretable reasoning processes. Systems must extract all problem conditions, decompose a chain of thought into statement-evidence pairs, and verify the logical validity of each pair. Leveraging only the off-the-shelf Meta-Llama-3-8B-Instruct, we craft a concise few-shot, multi-turn prompt that first enumerates all conditions and then guides the model to label, cite, and adjudicate every reasoning step. A lightweight post-processor based on regular expressions normalises spans and enforces the official JSON schema. Without fine-tuning, external retrieval, or ensembling, our method ranks 5th overall, achieving macro F1 scores on par with substantially more complex and resource-consuming pipelines. We conclude by analysing the strengths and limitations of our approach and outlining directions for future research in structural reasoning with LLMs. Our code is available at https://github.com/asdfo123/LLMSR-asdfo123.
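The two-stage prompt described in the abstract — first enumerate conditions, then label, cite, and adjudicate each step — can be sketched as a chat-message builder. The wording and structure below are illustrative assumptions; the authors' actual prompts and few-shot examples live in their repository.

```python
def build_messages(problem: str, chain_of_thought: str) -> list[dict]:
    """Assemble a multi-turn chat for Meta-Llama-3-8B-Instruct (sketch).

    Turn 1 asks the model to enumerate all problem conditions; turn 2 asks it
    to decompose the chain of thought into statement-evidence pairs and judge
    each pair's logical validity.
    """
    system = (
        "You are a careful structural-reasoning assistant. "
        "Answer strictly in the requested JSON format."
    )
    turn1 = (
        "List every condition stated in the problem, one per line.\n\n"
        f"Problem:\n{problem}"
    )
    turn2 = (
        "Decompose the chain of thought into statement-evidence pairs. "
        "For each pair, cite the supporting condition and state whether "
        "the step is logically valid (true/false).\n\n"
        f"Chain of thought:\n{chain_of_thought}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": turn1},
        # In a real multi-turn run, the model's condition list would be
        # appended here as an assistant turn before the second request.
        {"role": "user", "content": turn2},
    ]
```

Few-shot demonstrations would be spliced in as additional user/assistant turns before the real problem, which is what makes the format controllable without any fine-tuning.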
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs for fine-grained controllable reasoning processes
Decomposing reasoning into verifiable statement-evidence pairs
Achieving competitive performance without fine-tuning or external resources
Innovation

Methods, ideas, or system contributions that make the work stand out.

Few-shot, multi-turn prompt guiding condition extraction and step adjudication
Regex-based post-processing for span normalisation and schema enforcement
Off-the-shelf LLM with no fine-tuning, retrieval, or ensembling