Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models

📅 2026-04-28

🤖 AI Summary

This work addresses the high computational cost, substantial carbon emissions, and inefficient deployment of large language models in software engineering by bringing a carbon pricing mechanism into model compression. Inspired by carbon taxation, the proposed approach is a systematic multi-stage compression pipeline that combines pruning, quantization, and knowledge distillation. By penalizing inefficient architectural components and rewarding energy-efficient compression, the method adapts to encoder-only, decoder-only, and encoder-decoder models. Evaluated on three software engineering tasks (code clone detection, code summarization, and code generation), it retains about 98% of the original accuracy on clone detection, about 89% on summarization, and up to 91% on textual metrics (68% on pass@1) for generation, while achieving up to 49× memory reduction, up to 10× faster inference, and up to 81% lower CO2 emissions.
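To make two of the stages named above concrete, here is a minimal sketch of magnitude pruning followed by symmetric int8 quantization on a flat list of weights. It is a toy illustration, not the paper's pipeline: the function names, the per-tensor scale, and the prune-then-quantize order are assumptions made here, and knowledge distillation is left out because it requires a training loop.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: w is approximated by q * scale, q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]


def compress(weights: List[float], sparsity: float) -> Tuple[List[int], float]:
    """Prune, then quantize (an ordering assumed here for illustration only)."""
    return quantize_int8(magnitude_prune(weights, sparsity))
```

Calling `compress([0.5, -0.1, 0.02, -1.27], sparsity=0.5)` zeroes the two smallest-magnitude weights and stores the rest as 8-bit integers plus one float scale, which is where the memory saving of this pair of stages comes from.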

📝 Abstract

The accelerating adoption of Large Language Models (LLMs) in software engineering (SE) has brought with it a silent crisis: unsustainable computational cost. While these models demonstrate remarkable capabilities across SE tasks, they are unmanageably large, slow to deploy, memory-intensive, and carbon-heavy. This reality threatens not only the scalability and accessibility of AI-powered SE, but also its long-term environmental sustainability. The research challenge is clear: we must go beyond accuracy and address efficiency and environmental cost as first-class design constraints. To meet this challenge, we introduce Carbon-Taxed Transformers (CTT), a systematic compression pipeline with a principled stage ordering that applies across model architectures. Drawing on the economic concept of carbon pricing, CTT operationalizes a computational carbon tax that penalizes architectural inefficiencies and rewards deployment-ready compression. We evaluate CTT on three core SE tasks (code clone detection, code summarization, and code generation) with models spanning encoder-only, encoder-decoder, and decoder-only architectures. Our results show that, at inference, CTT delivers (1) up to 49x memory reduction; (2) inference-time reductions of up to 8-10x for clone detection, up to 3x for summarization, and 4-7x for generation; (3) up to 81% reduction in CO2 emissions; and (4) retention of around 98% accuracy on clone detection, around 89% on summarization, and up to 91% (textual metrics) and 68% (pass@1) on generation. Two ablation studies show that both the pipeline ordering and the contribution of each individual component are essential, providing empirical justification for CTT's design and effectiveness. This work establishes a viable path toward responsible AI in SE through aggressive yet performance-preserving compression.
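The abstract reports results in three forms: reduction factors ("49x"), percentage reductions ("81%"), and retained performance ("98%"). A small sketch of the arithmetic usually behind such figures follows. The helper names are made up for this example, and the abstract does not state its exact formulas, so treat these definitions as assumptions.

```python
def reduction_factor(baseline: float, compressed: float) -> float:
    """Ratio baseline / compressed; 49.0 reads as "49x smaller" or "49x faster"."""
    if baseline <= 0 or compressed <= 0:
        raise ValueError("both values must be positive")
    return baseline / compressed


def percent_reduction(baseline: float, compressed: float) -> float:
    """Share of the baseline that was removed, as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - compressed) / baseline


def retention(baseline_score: float, compressed_score: float) -> float:
    """Compressed-model score as a percentage of the original model's score."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return 100.0 * compressed_score / baseline_score


# Placeholder inputs chosen only to show the arithmetic; they are not
# measurements from the paper.
memory_x = reduction_factor(baseline=490.0, compressed=10.0)
co2_pct = percent_reduction(baseline=100.0, compressed=19.0)
kept_pct = retention(baseline_score=50.0, compressed_score=49.0)
```

The two reduction measures are related: a reduction factor of r corresponds to a percentage reduction of 100 * (1 - 1/r), so a 49x memory reduction removes about 98% of the original footprint.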

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Computational Cost

Carbon Emissions

Model Compression

Software Engineering

Innovation

Methods, ideas, or system contributions that make the work stand out.

Carbon-Taxed Transformers

Model Compression

Green AI