HyGenar: An LLM-Driven Hybrid Genetic Algorithm for Few-Shot Grammar Generation

📅 2025-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Large language models (LLMs) exhibit weak syntactic inference capabilities in few-shot settings and generate Backus-Naur Form (BNF) grammars with insufficient structural fidelity and semantic correctness. Method: We propose HyGenar, the first LLM-driven hybrid genetic algorithm for BNF generation, integrating prompt engineering, LLM-guided population initialization, symbolic evolutionary operators, grammar validity verification, and multi-objective fitness evaluation to jointly optimize structural evolution and semantic guidance. Contribution/Results: We introduce the first benchmark dataset comprising 540 BNF-generation tasks and a six-dimensional evaluation framework. Experiments across multiple mainstream LLMs demonstrate substantial improvements: average BNF structural accuracy increases by 37.2%, semantic compliance by 41.5%, and overall performance surpasses the best baseline by 28.6%.
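To make the moving parts of such a hybrid loop concrete, here is a minimal Python sketch of a generic evolutionary search. It is not HyGenar's actual algorithm: the function `evolve` and its parameters `init`, `mutate`, `is_valid`, and `fitness` are hypothetical names introduced for illustration. In an LLM-driven setting, `init` and `mutate` would presumably wrap model calls, `is_valid` would be a grammar validity check, and `fitness` would score a candidate against the examples. Crossover and multi-objective scoring are omitted for brevity.

```python
import random
from typing import Callable, List, TypeVar

G = TypeVar("G")  # candidate type, e.g. a BNF grammar string


def evolve(
    init: Callable[[], G],           # hypothetical: would wrap an LLM proposal
    mutate: Callable[[G], G],        # hypothetical: LLM or symbolic rewrite
    is_valid: Callable[[G], bool],   # hypothetical: grammar validity check
    fitness: Callable[[G], float],   # hypothetical: score in [0, 1]
    pop_size: int = 8,
    generations: int = 20,
    seed: int = 0,
) -> G:
    rng = random.Random(seed)
    population: List[G] = [init() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        if fitness(ranked[0]) >= 1.0:
            break  # a candidate already satisfies every objective
        # Keep the better half as parents (simple truncation selection).
        parents = ranked[: max(1, pop_size // 2)]
        children: List[G] = []
        for _ in range(pop_size - len(parents)):
            parent = rng.choice(parents)
            child = mutate(parent)
            # Discard invalid offspring and keep the parent instead.
            children.append(child if is_valid(child) else parent)
        population = parents + children
    return max(population, key=fitness)


# Toy usage: integers stand in for grammars; fitness saturates at 5.
best = evolve(
    init=lambda: 0,
    mutate=lambda x: x + 1,
    is_valid=lambda x: True,
    fitness=lambda x: min(x / 5, 1.0),
)
print(best)
```

The toy usage only exercises the control flow. Swapping in real proposal, validity, and scoring functions is where the LLM and the grammar tooling would enter.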

📝 Abstract

Grammar plays a critical role in natural language processing and text/code generation by enabling the definition of syntax, the creation of parsers, and the guidance of structured outputs. Although large language models (LLMs) demonstrate impressive capabilities across domains, their ability to infer and generate grammars has not yet been thoroughly explored. In this paper, we study and improve the ability of LLMs to perform few-shot grammar generation, where grammars are inferred from small sets of positive and negative examples and generated in Backus-Naur Form. To explore this, we introduce a novel dataset comprising 540 structured grammar generation challenges, devise 6 metrics, and evaluate 8 LLMs on it. Our findings reveal that existing LLMs perform sub-optimally in grammar generation. To address this, we propose an LLM-driven hybrid genetic algorithm, namely HyGenar, to optimize grammar generation. HyGenar achieves substantial improvements in both the syntactic and semantic correctness of generated grammars across LLMs.
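As an illustration of the task setup, the following Python sketch checks a candidate BNF grammar against positive and negative examples. Everything here is an assumption made for illustration: the tiny BNF dialect (one rule per line, terminals in double quotes, no `|` inside terminals), the helper names `parse_bnf`, `accepts`, and `example_accuracy`, and the accuracy score itself, which is not one of the paper's six metrics. The recognizer also assumes the grammar is not left-recursive.

```python
from functools import lru_cache

# Hypothetical candidate grammar for the language a^n b^n (n >= 1).
GRAMMAR = '''
<s> ::= "a" <s> "b" | "a" "b"
'''


def parse_bnf(text):
    """Parse the tiny BNF dialect into {nonterminal: [alternatives]}."""
    rules = {}
    for line in text.strip().splitlines():
        head, body = line.split("::=")
        rules[head.strip()] = [tuple(alt.split()) for alt in body.split("|")]
    return rules


def accepts(rules, start, s):
    """Return True if `start` derives exactly the string `s`."""

    @lru_cache(maxsize=None)
    def ends(symbol, i):
        # Set of positions j such that `symbol` derives s[i:j].
        if symbol.startswith('"'):
            lit = symbol[1:-1]
            return frozenset({i + len(lit)}) if s.startswith(lit, i) else frozenset()
        result = set()
        for alt in rules.get(symbol, []):
            positions = {i}
            for sym in alt:
                positions = {j for p in positions for j in ends(sym, p)}
                if not positions:
                    break
            result |= positions
        return frozenset(result)

    return len(s) in ends(start, 0)


def example_accuracy(rules, start, positives, negatives):
    """Fraction of examples classified correctly (illustrative score only)."""
    hits = sum(accepts(rules, start, s) for s in positives)
    hits += sum(not accepts(rules, start, s) for s in negatives)
    return hits / (len(positives) + len(negatives))


rules = parse_bnf(GRAMMAR)
positives = ["ab", "aabb", "aaabbb"]
negatives = ["", "a", "ba", "aab", "abab"]
print(example_accuracy(rules, "<s>", positives, negatives))  # prints 1.0
```

A grammar that is too permissive accepts some negatives, and one that is too strict rejects some positives, so a score of this kind penalizes both failure modes.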

Problem

Research questions and friction points this paper is trying to address.

Improving LLMs' ability for few-shot grammar generation

Evaluating LLMs on structured grammar generation challenges

Proposing HyGenar to improve the correctness of generated grammars

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid genetic algorithm optimizes grammar generation

LLM-driven approach improves both syntactic and semantic correctness

Novel dataset evaluates grammar generation challenges

🔎 Similar Papers

Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models