Carbon Aware Transformers Through Joint Model-Hardware Optimization

📅 2025-05-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Prior carbon footprint assessments of machine learning systems have largely neglected embodied emissions, such as those from hardware manufacturing, and lack a holistic, lifecycle-aware framework for quantification and optimization. Method: The authors propose the first model-hardware co-optimization framework that jointly minimizes operational carbon (from training and inference) and embodied carbon (from hardware fabrication, amortized over device lifetime). It is also the first to incorporate embodied carbon into the neural architecture search (NAS) objective, revealing fundamental discrepancies between carbon-optimal solutions and conventional latency- or energy-optimal ones. The framework enables early-stage, hardware-aware NAS for low-carbon Transformer architectures. Contribution/Results: Using this framework, the authors develop the CarbonCLIP family, achieving up to a 17% reduction in total carbon emissions over small edge CLIP baselines without compromising accuracy or inference latency.
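The summary's core accounting idea, total carbon as operational emissions plus embodied emissions amortized over hardware lifetime, can be sketched as follows. This is an illustrative model only, not the paper's implementation; all function names, parameters, and numbers are assumptions.

```python
def operational_carbon(energy_kwh: float, grid_intensity_kg_per_kwh: float) -> float:
    """Carbon (kg CO2e) from executing training/inference workloads."""
    return energy_kwh * grid_intensity_kg_per_kwh

def amortized_embodied_carbon(chip_embodied_kg: float,
                              task_hours: float,
                              lifetime_hours: float) -> float:
    """Manufacturing carbon, amortized over the fraction of the hardware's
    lifetime that this workload occupies."""
    return chip_embodied_kg * (task_hours / lifetime_hours)

def total_carbon(energy_kwh: float, grid_intensity: float,
                 chip_embodied_kg: float, task_hours: float,
                 lifetime_hours: float) -> float:
    """Total footprint = operational + amortized embodied (illustrative)."""
    return (operational_carbon(energy_kwh, grid_intensity)
            + amortized_embodied_carbon(chip_embodied_kg, task_hours, lifetime_hours))

# Illustrative numbers only: 120 kWh workload on a 0.4 kg/kWh grid,
# 15 kg embodied chip carbon, 100 h of use over a 3-year lifetime.
print(total_carbon(120.0, 0.4, 15.0, 100.0, 3 * 365 * 24))
```

A NAS objective built on this quantity trades off model accuracy against both terms, which is why carbon-optimal designs can diverge from latency- or energy-optimal ones: a faster accelerator may lower operational carbon while raising the embodied term.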

📝 Abstract
The rapid growth of machine learning (ML) systems necessitates a more comprehensive evaluation of their environmental impact, particularly their carbon footprint, which comprises operational carbon from training and inference execution and embodied carbon from hardware manufacturing and its entire life-cycle. Despite the increasing importance of embodied emissions, there is a lack of tools and frameworks to holistically quantify and optimize the total carbon footprint of ML systems. To address this, we propose CATransformers, a carbon-aware architecture search framework that enables sustainability-driven co-optimization of ML models and hardware architectures. By incorporating both operational and embodied carbon metrics into early design space exploration of domain-specific hardware accelerators, CATransformers demonstrates that optimizing for carbon yields design choices distinct from those optimized solely for latency or energy efficiency. We apply our framework to multi-modal CLIP-based models, producing CarbonCLIP, a family of CLIP models achieving up to 17% reduction in total carbon emissions while maintaining accuracy and latency compared to state-of-the-art small edge CLIP baselines. This work underscores the need for holistic optimization methods to design high-performance, environmentally sustainable AI systems.
Problem

Research questions and friction points this paper is trying to address.

Lack of tools to quantify ML systems' total carbon footprint
Need for joint optimization of models and hardware for sustainability
Current designs prioritize latency over carbon emissions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Carbon-aware architecture search framework
Joint model-hardware co-optimization
Holistic operational and embodied carbon metrics
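The co-optimization idea in the bullets above can be illustrated with a toy joint search: enumerate (model, accelerator) pairs, keep those meeting an accuracy floor, and pick the pair with the lowest total carbon. This is a hedged sketch in the spirit of the framework, not its actual search algorithm; all candidate names, cost numbers, and constants are made up for illustration.

```python
from itertools import product

# Illustrative model candidates: (name, accuracy, kWh per 1k queries).
models = [
    ("clip-tiny", 0.58, 0.02),
    ("clip-small", 0.63, 0.05),
    ("clip-base", 0.66, 0.12),
]
# Illustrative accelerator candidates: (name, embodied kg CO2e, energy scale).
accelerators = [
    ("accel-small", 8.0, 1.0),
    ("accel-large", 20.0, 0.6),  # more embodied carbon, more efficient
]

LIFETIME_QUERIES = 1e9   # queries served over the device lifetime (assumed)
GRID = 0.4               # kg CO2e per kWh (assumed)

def pair_carbon(model, hw):
    """Total carbon of deploying `model` on `hw` over its lifetime."""
    _, _, kwh_per_1k = model
    _, embodied_kg, energy_scale = hw
    operational = (kwh_per_1k / 1000.0) * energy_scale * LIFETIME_QUERIES * GRID
    return operational + embodied_kg

# Joint search: minimize carbon subject to an accuracy constraint.
best = min(
    ((m, h) for m, h in product(models, accelerators) if m[1] >= 0.60),
    key=lambda pair: pair_carbon(*pair),
)
print(best[0][0], best[1][0])  # → clip-small accel-large
```

Note how the winner pairs a smaller model with the higher-embodied but more efficient accelerator: at this (assumed) query volume, operational savings outweigh the extra manufacturing carbon, which is exactly the kind of trade-off a latency- or energy-only objective would miss.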