CarbonCall: Sustainability-Aware Function Calling for Large Language Models on Edge Devices

📅 2025-04-29
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Real-time function calling by large language models (LLMs) on edge devices incurs excessive power consumption and high carbon emissions. Method: We propose the first sustainability-first function-calling framework for edge LLMs, featuring a carbon-aware execution mechanism that jointly optimizes dynamic tool selection, real-time carbon-intensity-driven adjustment of power thresholds, and coordinated scheduling of multi-precision LLM variants. The framework integrates carbon-intensity forecasting, dynamic power gating, and a lightweight tool selector, enabling end-to-end optimization on the NVIDIA Jetson AGX Orin platform. Contributions/Results: Experiments demonstrate up to a 52% reduction in carbon emissions, 30% lower power consumption, and 30% shorter end-to-end latency versus baselines, while sustaining high tokens-per-second throughput. To our knowledge, this is the first work to jointly optimize energy efficiency, latency, and carbon footprint in edge LLM inference.
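The lightweight tool-selection step described above could look roughly like the following sketch: only the tools most relevant to a query are exposed to the LLM, shrinking the prompt and therefore the energy spent per call. The tool names, the token-overlap scoring, and the cutoff are illustrative assumptions, not the paper's actual selector.

```python
# Hypothetical sketch of lightweight dynamic tool selection.
# Scoring by keyword overlap is an illustrative stand-in for the
# paper's (unspecified here) selection mechanism.

TOOLS = {
    "get_weather": "current weather forecast temperature city",
    "send_email": "send email message recipient subject",
    "unit_convert": "convert units temperature distance weight",
}

def select_tools(query: str, k: int = 1) -> list[str]:
    """Return the k tool names whose descriptions best match the query."""
    q = set(query.lower().split())
    scored = sorted(
        TOOLS,
        key=lambda name: len(q & set(TOOLS[name].split())),
        reverse=True,
    )
    return scored[:k]

print(select_tools("weather forecast for paris today"))  # ['get_weather']
```

Exposing one or two tools instead of the full catalog cuts prompt tokens, which is where most of the per-call energy goes on an edge accelerator.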

๐Ÿ“ Abstract
Large Language Models (LLMs) enable real-time function calling in edge AI systems but introduce significant computational overhead, leading to high power consumption and carbon emissions. Existing methods optimize for performance while neglecting sustainability, making them inefficient for energy-constrained environments. We introduce CarbonCall, a sustainability-aware function-calling framework that integrates dynamic tool selection, carbon-aware execution, and quantized LLM adaptation. CarbonCall adjusts power thresholds based on real-time carbon intensity forecasts and switches between model variants to sustain high tokens-per-second throughput under power constraints. Experiments on an NVIDIA Jetson AGX Orin show that CarbonCall reduces carbon emissions by up to 52%, power consumption by 30%, and execution time by 30%, while maintaining high efficiency.
Problem

Research questions and friction points this paper is trying to address.

Reducing computational overhead in edge AI LLMs
Addressing high power consumption and emissions
Optimizing sustainability without sacrificing performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic tool selection for sustainability
Carbon-aware execution with real-time forecasts
Quantized LLM adaptation under constraints
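The carbon-aware execution and quantized-variant switching listed above can be sketched minimally as follows. All names, wattages, and the linear intensity-to-threshold mapping are hypothetical illustrations, not the paper's actual policy: the power cap tightens as forecast grid carbon intensity rises, and the highest-quality model variant that fits under the cap is chosen.

```python
# Hypothetical sketch of carbon-aware power gating with multi-precision
# variant switching. Variants are ordered best-quality first; numbers
# are illustrative, not measured on Jetson AGX Orin.

MODEL_VARIANTS = [
    # (name, typical power draw in watts)
    ("llm-fp16", 38.0),
    ("llm-int8", 27.0),
    ("llm-int4", 18.0),
]

def power_threshold(carbon_gco2_kwh: float, base_watts: float = 40.0,
                    low: float = 100.0, high: float = 500.0,
                    min_watts: float = 15.0) -> float:
    """Map forecast carbon intensity (gCO2/kWh) to a power cap (W):
    clean grid -> full budget, carbon-intense grid -> tightened budget."""
    frac = (carbon_gco2_kwh - low) / (high - low)
    frac = min(max(frac, 0.0), 1.0)
    return base_watts - frac * (base_watts - min_watts)

def select_variant(carbon_gco2_kwh: float) -> str:
    """Pick the best-quality variant fitting under the current cap."""
    cap = power_threshold(carbon_gco2_kwh)
    for name, watts in MODEL_VARIANTS:
        if watts <= cap:
            return name
    return MODEL_VARIANTS[-1][0]  # fall back to the smallest variant

print(select_variant(120.0))  # clean grid -> llm-fp16
print(select_variant(480.0))  # carbon-intense grid -> llm-int4
```

The design choice this illustrates: rather than throttling clock speed alone, the framework degrades model precision gracefully, trading a small quality loss for staying under a carbon-driven power budget while keeping throughput high.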