Bootstrapping Fuzzers for Compilers of Low-Resource Language Dialects Using Language Models

📅 2025-12-05

🤖 AI Summary

To address the challenge of making compiler testing for low-resource language dialects both dialect-agnostic and dialect-effective, this paper proposes a fully automated test generation method that needs no handcrafted seed inputs. It combines pretrained large language models (LLMs) with grammars extracted automatically from dialect specifications to produce representative, dialect-aware seed inputs, which then bootstrap coverage-guided fuzzers. The approach is implemented for the MLIR framework in a tool called Germinator. Evaluated on six MLIR projects spanning 91 dialects, the generated seeds improve line coverage by 10-120% over grammar-based baselines. The method uncovered 88 previously unknown bugs (40 confirmed), including 23 in dialects that previously had no automated test generators. Its core contribution is unifying cross-dialect generality with dialect-specific testing capability, without requiring manual input or training data.

📝 Abstract

Modern extensible compiler frameworks, such as MLIR, enable rapid creation of domain-specific language dialects. This flexibility, however, makes correctness harder to ensure, because the same extensibility that accelerates development also complicates maintaining the testing infrastructure. Extensible languages require automated test generation that is both dialect-agnostic (works across dialects without manual adaptation) and dialect-effective (targets dialect-specific features to find bugs). Existing approaches typically sacrifice one of these goals, either by requiring manually constructed seed corpora for each dialect or by failing to exercise dialect-specific features effectively. We present a dialect-agnostic and dialect-effective grammar-based and coverage-guided fuzzing approach for extensible compilers that combines two key insights from existing work: (i) the grammars of dialects, which already encode the structural and type constraints, can often be extracted automatically from the dialect specification; and (ii) these grammars can be used in combination with pre-trained large language models to automatically generate representative and diverse seed inputs from the full dialect space without requiring any manual input or training data. These seeds can then be used to bootstrap coverage-guided fuzzers. We built this approach into a tool, Germinator. When evaluated on six MLIR projects spanning 91 dialects, Germinator-generated seeds improve line coverage by 10-120% over grammar-based baselines. We compare against grammar-based baselines because they are the only class of existing automatic seed generators that can be applied uniformly across MLIR's heterogeneous dialect ecosystem. Germinator discovers 88 previously unknown bugs (40 confirmed), including 23 in dialects with no prior automated test generators, demonstrating effective and controllable testing of low-resource dialects at scale.
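
To make the idea of grammar-driven seed generation concrete, the following is a minimal sketch and not Germinator's implementation. It assumes a hand-written toy grammar (`TOY_GRAMMAR`, with made-up operations `toy.add` and `toy.mul`) in place of a grammar extracted from a real dialect specification, and it swaps the language model for a plain random chooser. The `choose` parameter marks the point where a model-guided choice could be plugged in.

```python
import random

# Hypothetical toy grammar, loosely shaped like a textual IR operation.
# It is NOT extracted from any real dialect specification.
TOY_GRAMMAR = {
    "<op>": [["<result>", " = ", "<name>", " ", "<operand>", ", ",
              "<operand>", " : ", "<type>"]],
    "<name>": [["toy.add"], ["toy.mul"]],
    "<result>": [["%r"]],
    "<operand>": [["%a"], ["%b"]],
    "<type>": [["i32"], ["f32"]],
}


def generate(symbol, grammar, choose):
    """Expand `symbol` recursively; strings not in the grammar are terminals."""
    if symbol not in grammar:
        return symbol
    production = choose(grammar[symbol])  # pick one alternative
    return "".join(generate(part, grammar, choose) for part in production)


def make_seeds(grammar, count, rng):
    """Produce `count` seed programs using a random chooser (assumption:
    a language model would replace `rng.choice` in a real system)."""
    return [generate("<op>", grammar, rng.choice) for _ in range(count)]


seeds = make_seeds(TOY_GRAMMAR, 5, random.Random(0))
for seed in seeds:
    print(seed)
```

Every generated string is well-formed with respect to the toy grammar by construction, which is the property that makes grammar-derived seeds usable as fuzzer inputs without manual curation.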

Problem

Research questions and friction points this paper is trying to address.

Automated test generation for extensible compiler dialects

Ensuring dialect-agnostic and dialect-effective fuzzing

Bootstrapping fuzzers for low-resource language dialects

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatically extracts grammars from dialect specifications

Uses large language models to generate diverse seed inputs

Bootstraps coverage-guided fuzzers for low-resource dialects
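
As an illustration of what bootstrapping a coverage-guided fuzzer from seeds means, here is a small sketch under stated assumptions. `toy_compiler` is a hypothetical stand-in for an instrumented compiler that reports which branches an input reached, and `mutate` is a deliberately simple token swap. Neither reflects the paper's tooling or any real MLIR API.

```python
import random

OPS = ["toy.add", "toy.mul"]   # hypothetical operation names
TYPES = ["i32", "f32"]         # hypothetical type names


def toy_compiler(program):
    """Stand-in for an instrumented compiler: returns IDs of branches hit."""
    covered = {"parse"}
    if "toy.add" in program:
        covered.add("add")
    if "toy.mul" in program:
        covered.add("mul")
    if "f32" in program:
        covered.add("float")
    if "toy.mul" in program and "f32" in program:
        covered.add("float-mul")
    return covered


def mutate(program, rng):
    """Swap the operation name or the type for another one of the same kind."""
    group = rng.choice([OPS, TYPES])
    for token in group:
        if token in program:
            return program.replace(token, rng.choice(group), 1)
    return program


def fuzz(seeds, iterations, rng):
    """Keep a mutated input only if it reaches a branch not seen before."""
    corpus = list(seeds)
    total = set()
    for seed in corpus:
        total |= toy_compiler(seed)
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        new_branches = toy_compiler(candidate) - total
        if new_branches:
            corpus.append(candidate)
            total |= new_branches
    return corpus, total


corpus, coverage = fuzz(["%r = toy.add %a, %b : i32"], 200, random.Random(0))
print(len(corpus), sorted(coverage))
```

The loop only ever grows the corpus from inputs it already has, so the quality of the initial seeds bounds how quickly new behavior is reached. An empty or unrepresentative seed set leaves the mutator with little to work from.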
