NineToothed: A Triton-Based High-Level Domain-Specific Language for Machine Learning

📅 2025-07-16

🤖 AI Summary

Deep learning domain-specific languages (DSLs) such as Triton require expertise in parallel programming and expose low-level hardware details, which raises the cost of developing and maintaining kernels. To address this, we propose NineToothed, a high-level DSL for machine learning that lets developers write sequential code and automatically generates efficient parallel code from it. Our key contributions are: (1) tensor-oriented metaprogramming (TOM), which enables abstract, block-wise specification of computations; (2) an arrange-and-apply paradigm that decouples algorithmic logic from hardware-specific scheduling; and (3) a fully automated sequential-to-parallel translation framework with a high-performance code generator. Evaluation shows that NineToothed achieves near-Triton performance, with average overhead under 5%, while significantly reducing development complexity, improving maintainability, and enhancing programmer productivity.

📝 Abstract

The emergence of deep learning domain-specific languages (DSLs) has substantially reduced the obstacles in developing high-performance, cross-platform compute kernels. However, current DSLs, such as Triton, still demand that developers possess expertise in parallel programming and expose them to many low-level details. This requirement complicates the development process and adds to the difficulty of maintaining compute kernels. Consequently, developing a new programming model that supports serial programming for deep learning workloads is crucial. This paper introduces NineToothed, a domain-specific language that offers serial semantics for machine learning programming. Through the automatic transformation of serial code into parallel code, NineToothed significantly streamlines the development process while causing minimal performance degradation. NineToothed encompasses (1) a language with tensor-oriented metaprogramming (TOM) that adopts the arrange-and-apply paradigm, enabling the expression of tiled computations without the need to manage low-level details and (2) a code generator for generating high-performance parallel code. Our evaluation results indicate that NineToothed can greatly simplify compute kernel development while maintaining performance comparable to that of Triton.
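
The arrange-and-apply idea described in the abstract can be illustrated with a small, purely conceptual Python sketch. It does not use NineToothed or Triton: the names `arrange`, `apply`, and `launch` are hypothetical, and a thread pool merely stands in for the parallel code a real generator would emit. The point is only that the per-tile logic is written serially, while the tiling and the dispatch over tiles live elsewhere.

```python
from concurrent.futures import ThreadPoolExecutor


def arrange(length, block_size):
    """Hypothetical 'arrange' step: describe how a vector is cut into tiles."""
    return [
        (start, min(start + block_size, length))
        for start in range(0, length, block_size)
    ]


def apply(x_tile, y_tile):
    """Hypothetical 'apply' step: serial logic written for a single tile."""
    return [a + b for a, b in zip(x_tile, y_tile)]


def launch(x, y, block_size=4):
    """Stand-in for generated code: run 'apply' on every tile concurrently."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    out = [0] * len(x)

    def run(tile):
        start, stop = tile
        # Tiles are disjoint, so each worker writes its own slice of `out`.
        out[start:stop] = apply(x[start:stop], y[start:stop])

    with ThreadPoolExecutor() as pool:
        # Consume the iterator so any exception raised in a worker surfaces.
        list(pool.map(run, arrange(len(x), block_size)))
    return out
```

In this sketch, changing the tiling (for example, a different `block_size`) requires no change to `apply`, which mirrors the separation of algorithmic logic from scheduling that the abstract attributes to the paradigm.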

Problem

Research questions and friction points this paper is trying to address.

Reducing complexity in developing high-performance ML kernels

Eliminating need for parallel programming expertise in DSLs

Enabling serial-to-parallel code transformation for ML workloads

Innovation

Methods, ideas, or system contributions that make the work stand out.

Serial semantics for ML programming

Automatic serial-to-parallel code transformation

Tensor-oriented metaprogramming with arrange-and-apply paradigm
