Double Duty: FPGA Architecture to Enable Concurrent LUT and Adder Chain Usage

📅 2025-07-15
🤖 AI Summary
Modern FPGA logic blocks cannot use their LUTs and carry-chain adders concurrently, because adder chain inputs can only be driven by LUT outputs, which limits area efficiency. This paper proposes the Double Duty architecture, which repurposes four existing logic block inputs to bypass the LUTs and drive the adder chain directly, enabling concurrent LUT and adder use within a single logic element without adding logic cluster inputs. The authors model the change at both the circuit and CAD levels in an open-source FPGA toolchain targeting a Stratix-10-like architecture and evaluate it on the Kratos, Koios, and VTR benchmark suites. Reported logic area reductions are 21.6% on Kratos, 9.3% on Koios, and 8.2% on VTR, with no impact on critical path delay and a 9.7% average improvement in area-delay product.

📝 Abstract
Flexibility and customization are key strengths of Field-Programmable Gate Arrays (FPGAs) when compared to other computing devices. For instance, FPGAs can efficiently implement arbitrary-precision arithmetic operations, and can perform aggressive synthesis optimizations to eliminate ineffectual operations. Motivated by sparsity and mixed precision in deep neural networks (DNNs), we investigate how to optimize the current logic block architecture to increase its arithmetic density. We find that modern FPGA logic block architectures prevent the independent use of adder chains, and instead only allow adder chain inputs to be fed by look-up table (LUT) outputs. This allows only one of the two primitives, either adders or LUTs, to be used independently in one logic element and prevents their concurrent use, hampering area optimizations. In this work, we propose the Double Duty logic block architecture to enable the concurrent use of the adders and LUTs within a logic element. Without adding expensive logic cluster inputs, we use 4 of the existing inputs to bypass the LUTs and connect directly to the adder chain inputs. We accurately model our changes at both the circuit and CAD levels using open-source FPGA development tools. Our experimental evaluation on a Stratix-10-like architecture demonstrates area reductions of 21.6% on adder-intensive circuits from the Kratos benchmarks, and 9.3% and 8.2% on the more general Koios and VTR benchmarks, respectively. These area improvements come without an impact on critical path delay, demonstrating that higher density is feasible on modern FPGA architectures by adding more flexibility in how the adder chain is used. Averaged across all circuits from our three evaluated benchmark sets, our Double Duty FPGA architecture improves area-delay product by 9.7%.
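The difference between the conventional logic element and the Double Duty one can be illustrated with a small behavioral model. This is a minimal sketch under simplifying assumptions, not the authors' circuit or CAD model: it represents a single adder bit fed by two operands, uses 4-input LUTs, and all function names (`lut`, `full_adder`, `baseline_le`, `double_duty_le`) are hypothetical. It does not model all 4 bypassed inputs or the real Stratix-10-like logic element.

```python
from typing import Tuple

def lut(truth_table: int, inputs: Tuple[int, ...]) -> int:
    """Evaluate a k-input LUT: the inputs form an index into the truth table."""
    index = 0
    for i, bit in enumerate(inputs):
        index |= (bit & 1) << i
    return (truth_table >> index) & 1

def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """One bit of a carry chain: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

# Truth table that passes input 0 straight through a 4-input LUT.
PASS_INPUT0 = 0xAAAA

def baseline_le(tt_a: int, in_a: Tuple[int, ...],
                tt_b: int, in_b: Tuple[int, ...],
                cin: int) -> Tuple[int, int]:
    """Conventional element (simplified): adder operands can only come from
    LUT outputs, so a plain addition spends both LUTs as wires."""
    return full_adder(lut(tt_a, in_a), lut(tt_b, in_b), cin)

def double_duty_le(tt: int, lut_in: Tuple[int, ...],
                   x: int, y: int, cin: int) -> Tuple[int, int, int]:
    """Double Duty element (simplified): x and y bypass the LUT and drive the
    adder directly, so the LUT output is available as independent logic."""
    s, cout = full_adder(x, y, cin)
    return lut(tt, lut_in), s, cout

if __name__ == "__main__":
    AND4 = 0x8000  # 4-input AND: only index 15 produces 1
    # Baseline: x + y occupies both LUTs as pass-throughs.
    print(baseline_le(PASS_INPUT0, (1, 0, 0, 0), PASS_INPUT0, (1, 0, 0, 0), 0))
    # Double Duty: the same addition plus an unrelated 4-input AND.
    print(double_duty_le(AND4, (1, 1, 1, 1), 1, 1, 0))
```

In the baseline model an addition of two signals configures both LUTs as pass-throughs, so no unrelated logic fits in the same element. In the Double Duty model the same addition leaves the LUT free, which is the source of the density gain the abstract describes.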

Problem

Research questions and friction points this paper is trying to address.

Enable concurrent LUT and adder chain usage in FPGAs
Optimize logic block architecture for higher arithmetic density
Improve area efficiency without impacting critical path delay

Innovation

Methods, ideas, or system contributions that make the work stand out.

Enables concurrent LUT and adder chain usage
Bypasses LUTs with existing inputs for adders
Improves area-delay product by 9.7%
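
Area-delay product is the product of a circuit's area and its critical path delay, so a relative change in it follows from the relative changes in the two factors. The sketch below shows that arithmetic for a single circuit. The function name `adp_change` and the example inputs are illustrative assumptions, not values from the paper, and the sketch does not reproduce the 9.7% figure, which the abstract reports as an average across all benchmark circuits.

```python
def adp_change(area_change: float, delay_change: float) -> float:
    """Relative change in area-delay product for one circuit.

    Both arguments are relative changes, e.g. -0.10 for a 10% reduction
    and 0.0 for no change. A negative result is an improvement.
    """
    return (1.0 + area_change) * (1.0 + delay_change) - 1.0

if __name__ == "__main__":
    # Hypothetical circuit: 10% smaller, delay unchanged.
    print(round(adp_change(-0.10, 0.0), 3))
    # Hypothetical circuit: 10% smaller but 2% slower.
    print(round(adp_change(-0.10, 0.02), 3))
```

When delay is unchanged, the area-delay product improves by the same fraction as the area, which is why an area reduction with no critical path penalty translates directly into a better area-delay product.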