LightCode: Compiling LLM Inference for Photonic-Electronic Systems

📅 2025-09-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the latency and energy-efficiency bottlenecks of GPU-based large language model (LLM) inference, this paper proposes LightCode, a photonic–electronic hybrid inference architecture and compiler framework. Methodologically, LightCode introduces a Stacked Graph intermediate representation and jointly optimizes photonic tensor unit (PTU) and electronic processor scheduling via constraint-aware subgraph selection under a parametric energy/latency cost model; it further integrates a PTU simulation module to enable hardware-aware subgraph partitioning and resource allocation. Evaluation on GPT-2 and Llama-7B prefilling shows that LightCode achieves up to 50% energy reduction and over 10× latency improvement versus purely electronic baselines. Moreover, it supports customizable hardware mapping strategies (e.g., power- or latency-optimized), enhancing the compilation efficiency and adaptability of hybrid photonic–electronic systems.

📝 Abstract
The growing demand for low-latency, energy-efficient inference in large language models (LLMs) has catalyzed interest in heterogeneous architectures. While GPUs remain dominant, they are poorly suited for integration with emerging domain-specific accelerators like Photonic Tensor Units (PTUs), which offer low-power, high-throughput linear computation. This motivates hybrid compilation strategies that combine photonic and electronic resources. We present LightCode, a compiler framework and simulator for mapping LLM inference workloads across hybrid photonic-electronic systems. LightCode introduces the Stacked Graph, an intermediate representation that encodes multiple hardware-specific realizations of each tensor operation. Hardware assignment is formulated as a constrained subgraph selection problem optimized for latency or energy under parametric cost models. We evaluate LightCode on the prefill stage of GPT-2 and Llama-7B, showing that under our workload and hardware assumptions, (i) photonic hardware reduced energy by up to 50% in our simulated workloads at maximum sequence length; (ii) multiplexing and assignment strategy yielded latency improvements exceeding 10×; and (iii) optimizing for latency or energy resulted in distinct hardware mappings in our simulations. LightCode offers a modular, foundational framework and simulator for compiling LLMs to emerging photonic accelerators.
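To make the abstract's formulation concrete, the sketch below reduces Stacked Graph hardware assignment to a per-operation choice: each tensor op carries several candidate hardware realizations with latency and energy costs, and the compiler selects one per op to minimize the chosen objective. This is a deliberately simplified illustration, not the paper's actual algorithm; the class names, cost numbers, and the restriction to independent per-op choices (ignoring cross-op constraints such as data-movement costs) are assumptions for this example.

```python
from dataclasses import dataclass

@dataclass
class Realization:
    """One hardware-specific realization of a tensor op (illustrative)."""
    hardware: str   # "PTU" (photonic) or "GPU" (electronic)
    latency: float  # from a parametric cost model, e.g. microseconds
    energy: float   # e.g. millijoules

def assign(stacked_graph, objective="latency"):
    """Pick one realization per op minimizing the chosen cost metric."""
    key = (lambda r: r.latency) if objective == "latency" else (lambda r: r.energy)
    return {op: min(cands, key=key) for op, cands in stacked_graph.items()}

# Toy stacked graph: op name -> candidate realizations. Nonlinear ops
# (e.g. softmax) get only an electronic candidate, since PTUs target
# linear computation.
graph = {
    "matmul_qk": [Realization("PTU", 1.0, 0.2), Realization("GPU", 4.0, 1.0)],
    "softmax":   [Realization("GPU", 0.5, 0.3)],
}

by_latency = assign(graph, "latency")  # matmul routed to the PTU
by_energy = assign(graph, "energy")
```

As in the paper's finding (iii), the two objectives can produce distinct mappings once the candidate costs trade off against each other; in this toy instance the PTU happens to win on both.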
Problem

Research questions and friction points this paper is trying to address.

Compiling LLM inference for hybrid photonic-electronic systems efficiently
Formulating hardware assignment as a constrained subgraph selection problem
Reducing energy consumption and latency in LLM inference workloads
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compiler framework mapping LLMs to photonic-electronic systems
Stacked Graph IR encoding multiple hardware-specific realizations
Constrained subgraph selection for latency or energy optimization