KForge: Program Synthesis for Diverse AI Hardware Accelerators

📅 2025-11-17

🤖 AI Summary

Problem: Architectural fragmentation across AI accelerators makes it hard to optimize GPU kernels efficiently across platforms. Method: The paper proposes a platform-agnostic, LLM-driven program synthesis framework with two agents: a generation agent and a performance analysis agent. The framework adapts to a new platform from a single example and optimizes kernels iteratively, guided by compiler feedback, correctness verification, and performance data parsed from multiple sources, including API- and GUI-based profiling tools. Contribution/Results: The framework introduces cross-architecture knowledge transfer and automates high-performance kernel generation for diverse hardware backends (e.g., NVIDIA CUDA, Apple Metal). Experimental evaluation demonstrates substantial improvements in code quality and optimization efficiency, achieving superior performance over hand-tuned implementations across heterogeneous platforms.

📝 Abstract

GPU kernels are critical for ML performance but difficult to optimize across diverse accelerators. We present KForge, a platform-agnostic framework built on two collaborative LLM-based agents: a generation agent that produces and iteratively refines programs through compilation and correctness feedback, and a performance analysis agent that interprets profiling data to guide optimization. This agent-based architecture requires only a single-shot example to target new platforms. We make three key contributions: (1) introducing an iterative refinement system where the generation agent and performance analysis agent collaborate through functional and optimization passes, interpreting diverse profiling data (from programmatic APIs to GUI-based tools) to generate actionable recommendations that guide program synthesis for arbitrary accelerators; (2) demonstrating that the generation agent effectively leverages cross-platform knowledge transfer, where a reference implementation from one architecture substantially improves generation quality for different hardware targets; and (3) validating the platform-agnostic nature of our approach by demonstrating effective program synthesis across fundamentally different parallel computing platforms: NVIDIA CUDA and Apple Metal.
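The two-agent loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not KForge's actual implementation: the names `Feedback`, `refine`, `generate`, `evaluate`, and `analyze` are hypothetical, and the agents are passed in as plain callables so the control flow can be read without any LLM or compiler attached.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Feedback:
    """Hypothetical result of compiling, checking, and timing one candidate."""
    compiled: bool
    correct: bool
    runtime_ms: Optional[float] = None
    log: str = ""


def refine(
    generate: Callable[[str, str], str],      # stand-in for the generation agent
    evaluate: Callable[[str], Feedback],      # stand-in for compile + correctness + profiling
    analyze: Callable[[str, Feedback], str],  # stand-in for the performance analysis agent
    task: str,
    max_iters: int = 5,
) -> Tuple[Optional[str], float]:
    """Return the fastest correct program found and its runtime."""
    best_program, best_runtime = None, float("inf")
    hint = ""
    for _ in range(max_iters):
        program = generate(task, hint)
        fb = evaluate(program)
        if not (fb.compiled and fb.correct):
            # Functional pass: feed compiler or correctness errors back.
            hint = fb.log
            continue
        if fb.runtime_ms is not None and fb.runtime_ms < best_runtime:
            best_program, best_runtime = program, fb.runtime_ms
        # Optimization pass: turn profiling data into a recommendation.
        hint = analyze(program, fb)
    return best_program, best_runtime
```

In this sketch a failing candidate only ever produces functional feedback, and profiling-based recommendations are generated only for candidates that already compile and pass the correctness check. That ordering is an assumption drawn from the abstract's mention of "functional and optimization passes".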

Problem

Research questions and friction points this paper is trying to address.

Optimizing GPU kernels across diverse AI hardware accelerators

Automating program synthesis through collaborative LLM-based agents

Enabling platform-agnostic code generation with minimal examples

Innovation

Methods, ideas, or system contributions that make the work stand out.

Platform-agnostic framework with collaborative LLM-based agents

Iterative refinement system using compilation feedback and profiling

Cross-platform knowledge transfer requiring only single-shot examples
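As a rough illustration of the last point, the sketch below assembles a generation prompt from a single example kernel for the target platform plus an optional reference implementation from another architecture. The function `build_prompt`, its parameters, and the prompt wording are all hypothetical; the source does not describe the prompt format KForge uses.

```python
from typing import Optional


def build_prompt(
    target_platform: str,
    task_description: str,
    single_shot_example: str,
    reference_impl: Optional[str] = None,
    reference_platform: Optional[str] = None,
) -> str:
    """Assemble a generation prompt (hypothetical layout, for illustration only)."""
    parts = [
        f"Target platform: {target_platform}",
        f"Task: {task_description}",
        "Example kernel for the target platform:",
        single_shot_example,
    ]
    if reference_impl is not None:
        # Cross-platform transfer: include a working kernel from another architecture.
        source = reference_platform or "another platform"
        parts.append(f"Reference implementation from {source}:")
        parts.append(reference_impl)
    parts.append("Write a kernel for the target platform that solves the task.")
    return "\n\n".join(parts)
```

For example, a call with `target_platform="Apple Metal"` and a CUDA kernel passed as `reference_impl` yields a prompt containing both the Metal example and the CUDA reference, while omitting `reference_impl` yields the single-shot prompt alone.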
