Cross-Layer Co-Optimized LSTM Accelerator for Real-Time Gait Analysis

📅 2026-04-15

🤖 AI Summary
This work proposes a real-time, high-accuracy gait anomaly detection system for edge devices, aimed at preventing falls. Through a holistic hardware-software co-design approach, it presents what the authors believe is the first cross-layer co-optimized LSTM accelerator for gait analysis, jointly refining the algorithm, the hardware implementation, and the physical layout. Using hardware-aware quantization, register-transfer-level (RTL) design space exploration, and alternative layout generation, the accelerator is taken through physical synthesis in a 65 nm technology. The high-accuracy variant occupies a die area of only 0.325 mm², while a low-complexity variant reduces area by a further 15.4% at slightly lower accuracy. Both designs detect gait abnormalities 4.05× faster than the application requires.
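
The 15.4% reduction can be turned into an approximate absolute area. Assuming (the abstract does not state this explicitly) that the reduction is measured against the 0.325 mm² die of the high-accuracy variant, the low-complexity layout comes to roughly the value below. This is a derived estimate, not a figure reported in the paper.

```latex
A_{\text{low}} = A_{\text{high}}\,(1 - 0.154) = 0.325\ \text{mm}^2 \times 0.846 \approx 0.275\ \text{mm}^2
```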

📝 Abstract
Long Short-Term Memory (LSTM) neural networks are increasingly used in healthcare applications where real-time operation and edge computing capabilities are essential. Gait analysis, which detects abnormal steps to prevent patients from falling, is a prominent example of such an application. Given the extremely stringent requirements on performance, power dissipation, and area, an Application-Specific Integrated Circuit (ASIC) enables efficient, real-time use of LSTMs for gait analysis while achieving high accuracy. To the best of our knowledge, this work presents the first cross-layer co-optimized LSTM accelerator for real-time gait analysis targeting an ASIC. We conduct a comprehensive design space exploration from software down to layout. At the software level, we optimize bit widths with hardware-aware quantization to reduce hardware complexity; at the register-transfer level, we explore various designs; and at the physical level, we generate alternative layouts to find efficient realizations of the LSTM accelerator in terms of hardware complexity and accuracy. Physical synthesis results in a 65 nm technology show that the die size of the accelerator layout optimized for the highest accuracy is 0.325 mm², while the alternative design optimized for hardware complexity occupies 15.4% less area at slightly lower accuracy. Moreover, the designed accelerators detect gait abnormalities accurately 4.05× faster than the given application requirement.
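
The software-level bit-width optimization can be illustrated with a minimal sketch. It assumes signed fixed-point weights with a power-of-two scale (so rescaling reduces to a shift in hardware) and uses the worst-case weight rounding error as a stand-in for the accuracy check. The paper's actual quantization scheme, bit widths, and selection criterion are not given here, and the function names `quantize_fixed_point` and `choose_bit_width` are hypothetical. A real hardware-aware flow would more likely re-evaluate the quantized LSTM's detection accuracy at each candidate width.

```python
import random

# Minimal sketch (assumption): signed fixed-point with a power-of-two scale,
# round-to-nearest, and saturation. Not the paper's actual quantizer.
def quantize_fixed_point(x, total_bits, frac_bits):
    """Map x onto a signed `total_bits`-bit grid with `frac_bits` fractional bits."""
    scale = 2 ** frac_bits
    q_min = -(2 ** (total_bits - 1))
    q_max = 2 ** (total_bits - 1) - 1
    q = round(x * scale)              # nearest representable step
    q = max(q_min, min(q_max, q))     # saturate instead of wrapping around
    return q / scale


# Hypothetical selection rule: the smallest total width whose worst-case
# weight error stays within `max_error`. A hardware-aware flow would more
# likely re-check the quantized network's detection accuracy instead.
def choose_bit_width(weights, int_bits, max_error, widths=range(4, 17)):
    for total_bits in widths:
        frac_bits = total_bits - 1 - int_bits   # 1 sign bit + integer bits
        if frac_bits < 0:
            continue
        worst = max(abs(w - quantize_fixed_point(w, total_bits, frac_bits))
                    for w in weights)
        if worst <= max_error:
            return total_bits, frac_bits
    return None  # no width in `widths` meets the tolerance


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-in for trained LSTM weights (not real data).
    weights = [random.uniform(-0.9, 0.9) for _ in range(1000)]
    print(choose_bit_width(weights, int_bits=0, max_error=0.01))
```

Narrower words shrink the multipliers, adders, and registers of the datapath, which is the kind of lever that lets a low-complexity variant trade a little accuracy for area.
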
Problem

Research questions and friction points this paper is trying to address.

LSTM
gait analysis
real-time
ASIC
edge computing
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-layer co-optimization
LSTM accelerator
hardware-aware quantization
ASIC design
gait analysis