🤖 AI Summary
Real-time function calling by large language models (LLMs) on edge devices incurs excessive power consumption and high carbon emissions. Method: We propose CarbonCall, the first sustainability-first function-calling framework for edge LLMs, featuring a carbon-aware execution mechanism that jointly optimizes dynamic tool selection, real-time carbon-intensity-driven adaptive power-threshold adjustment, and coordinated scheduling of multi-precision LLM variants. The framework integrates carbon-intensity forecasting, dynamic power gating, and a lightweight tool selector, enabling end-to-end optimization on the NVIDIA Jetson AGX Orin platform. Contributions/Results: Experiments demonstrate up to a 52% reduction in carbon emissions, 30% lower power consumption, and 30% shorter end-to-end latency versus baselines, while sustaining high tokens-per-second throughput. To our knowledge, this is the first work to jointly optimize energy efficiency, latency, and carbon footprint in edge LLM inference.
📝 Abstract
Large Language Models (LLMs) enable real-time function calling in edge AI systems but introduce significant computational overhead, leading to high power consumption and carbon emissions. Existing methods optimize for performance while neglecting sustainability, making them inefficient for energy-constrained environments. We introduce CarbonCall, a sustainability-aware function-calling framework that integrates dynamic tool selection, carbon-aware execution, and quantized LLM adaptation. CarbonCall adjusts power thresholds based on real-time carbon intensity forecasts and switches between model variants to sustain high tokens-per-second throughput under power constraints. Experiments on an NVIDIA Jetson AGX Orin show that CarbonCall reduces carbon emissions by up to 52%, power consumption by 30%, and execution time by 30%, while maintaining high efficiency.
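To make the carbon-aware execution idea concrete, here is a minimal sketch of how a power threshold could be derived from a carbon-intensity forecast and used to pick among quantized model variants. All names, power figures, and thresholds below are illustrative assumptions, not values from the paper:

```python
# Hypothetical sketch of CarbonCall-style carbon-aware variant selection.
# Variant specs, power numbers, and thresholds are illustrative only.
from dataclasses import dataclass

@dataclass
class ModelVariant:
    name: str
    bits: int            # quantization precision
    power_w: float       # assumed typical power draw
    tokens_per_s: float  # assumed throughput at that power

VARIANTS = [
    ModelVariant("llm-fp16", 16, 40.0, 38.0),
    ModelVariant("llm-int8",  8, 28.0, 33.0),
    ModelVariant("llm-int4",  4, 18.0, 27.0),
]

def power_threshold(carbon_gco2_per_kwh: float,
                    low: float = 100.0, high: float = 400.0,
                    p_min: float = 15.0, p_max: float = 45.0) -> float:
    """Map forecast grid carbon intensity to an allowed power budget:
    a clean grid gets a generous budget, a dirty grid a tight one."""
    frac = min(max((carbon_gco2_per_kwh - low) / (high - low), 0.0), 1.0)
    return p_max - frac * (p_max - p_min)

def select_variant(carbon_gco2_per_kwh: float,
                   min_tps: float = 20.0) -> ModelVariant:
    """Pick the highest-precision variant that fits the current power
    budget while sustaining a minimum tokens-per-second throughput."""
    budget = power_threshold(carbon_gco2_per_kwh)
    feasible = [v for v in VARIANTS
                if v.power_w <= budget and v.tokens_per_s >= min_tps]
    # Fall back to the lowest-power variant if nothing fits the budget.
    return max(feasible, key=lambda v: v.bits) if feasible else VARIANTS[-1]
```

Under these assumed numbers, a clean grid (e.g. 50 gCO2/kWh) keeps the full-precision model, a mid-range intensity drops to the 8-bit variant, and a dirty grid forces the 4-bit variant; the real framework would additionally gate power and re-rank tools.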