From Brute Force to Semantic Insight: Performance-Guided Data Transformation Design with LLMs

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work addresses a limitation of existing LLM-based approaches to data augmentation, which typically rely on heuristic or brute-force search strategies without awareness of downstream task performance. The authors propose the first performance-aware closed-loop system that enables LLMs to internalize semantic-level performance signals—guided solely by empirical performance rankings, without reinforcement learning or explicit reward models—and autonomously design task-aligned data transformation strategies. Leveraging a curated dataset of over 6,000 PyTorch augmentation functions annotated with downstream accuracy, the system aligns LLM outputs via LoRA fine-tuning and pairwise performance ranking. Compared to brute-force search, the approach reduces candidate evaluations by up to 600× while maintaining competitive peak accuracy, demonstrating that the model learns semantic performance cues rather than memorizing syntactic patterns.
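The pairwise setup the summary describes can be sketched as turning accuracy-annotated candidates into ordered (better, worse) training pairs. This is a minimal illustration only: the function names, data layout, and `min_gap` threshold are assumptions, not the paper's actual code.

```python
# Hedged sketch: building better/worse pairs from augmentation functions
# annotated with downstream accuracy, for pairwise preference fine-tuning.
from itertools import combinations

def build_ranked_pairs(candidates, min_gap=0.01):
    """candidates: list of (source_code, downstream_accuracy) tuples.
    Returns (better_code, worse_code) pairs whose accuracy gap is at
    least `min_gap`, so near-ties do not produce noisy labels."""
    pairs = []
    for (code_a, acc_a), (code_b, acc_b) in combinations(candidates, 2):
        if acc_a - acc_b >= min_gap:
            pairs.append((code_a, code_b))
        elif acc_b - acc_a >= min_gap:
            pairs.append((code_b, code_a))
    return pairs

# Toy candidates standing in for the paper's 6,000-function repository.
candidates = [
    ("def aug_flip(x): ...", 0.81),
    ("def aug_noise(x): ...", 0.74),
    ("def aug_mixup(x, y): ...", 0.86),
]
pairs = build_ranked_pairs(candidates)
```

Each resulting pair can then serve as one "better-worse" training example for the alignment step described below.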

📝 Abstract
Large language models (LLMs) have achieved notable performance in code synthesis; however, data-aware augmentation remains a limiting factor, handled via heuristic design or brute-force approaches. We introduce a performance-aware, closed-loop solution in the NNGPT ecosystem of projects that enables LLMs to autonomously engineer optimal transformations by internalizing empirical performance cues. We fine-tune LLMs with Low-Rank Adaptation on a novel repository of more than 6,000 empirically evaluated PyTorch augmentation functions, each annotated solely by downstream model accuracy. Training uses pairwise performance ordering (better-worse transformations), enabling alignment through empirical feedback without reinforcement learning, reward models, or symbolic objectives. This reduces the need for exhaustive search, requiring up to 600× fewer evaluated candidates than brute-force discovery while maintaining competitive peak accuracy and shifting generation from random synthesis to task-aligned design. Ablation studies show that structured Chain-of-Thought prompting introduces syntactic noise and degrades performance, whereas direct prompting ensures stable optimization in performance-critical code tasks. Qualitative and quantitative analyses demonstrate that the model internalizes semantic performance cues rather than memorizing syntax. These results show that LLMs can exhibit task-level reasoning through non-textual feedback loops, bypassing explicit symbolic rewards.
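The pairwise performance-ordering objective can be illustrated with a Bradley-Terry-style logistic loss over scalar scores for the better and worse transformation. The scoring values here are toy stand-ins for an LLM's sequence log-likelihoods; the paper's actual implementation fine-tunes an LLM with LoRA, which this sketch does not attempt to reproduce.

```python
import math

def pairwise_ranking_loss(score_better, score_worse):
    """Logistic pairwise loss: -log(sigmoid(s_better - s_worse)).
    Low when the model scores the empirically better transformation
    above the worse one; symmetric ties cost exactly log(2)."""
    margin = score_better - score_worse
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy scores standing in for model log-likelihoods of two candidates.
loss_correct = pairwise_ranking_loss(2.0, 0.5)  # model already prefers the better one
loss_wrong = pairwise_ranking_loss(0.5, 2.0)    # model prefers the worse one
```

Minimizing this loss over many (better, worse) pairs pushes the model toward the empirically stronger transformation without ever needing an explicit reward model, which matches the abstract's "alignment through empirical feedback" framing.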
Problem

Research questions and friction points this paper is trying to address.

data augmentation
large language models
code synthesis
performance-guided design
empirical feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

performance-guided transformation
LLM fine-tuning
data augmentation
empirical feedback loop
Low-Rank Adaptation