🤖 AI Summary
To address the limited reliability of edge DNN accelerators under hardware faults and communication errors, this paper proposes an accuracy-aware, fault-tolerant partitioning framework. Unlike conventional optimization paradigms that focus primarily on latency or energy consumption, our approach treats accuracy degradation under fault conditions as a core objective. We formulate a multi-objective optimization model that jointly minimizes latency, energy consumption, and accuracy loss, and solve it with the NSGA-II algorithm to obtain Pareto-optimal partitioning schemes. The framework integrates runtime fault injection, accuracy monitoring, and feedback-driven reconfiguration. Experimental evaluation on AlexNet, SqueezeNet, and ResNet18 shows that our method achieves up to a 27.7% improvement in fault tolerance with negligible performance overhead, significantly enhancing the robustness and safety of DNN inference in heterogeneous edge systems.
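The Pareto-optimal selection described above rests on the dominance test at the heart of NSGA-II. As a minimal sketch, the snippet below applies that test to hypothetical partitioning schemes scored on (latency, energy, accuracy loss); the scheme names and cost values are illustrative assumptions, not results from the paper.

```python
def dominates(a, b):
    """True if objective vector `a` Pareto-dominates `b` (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the non-dominated subset of (name, objectives) pairs."""
    return [
        (name, obj) for name, obj in candidates
        if not any(dominates(other, obj) for _, other in candidates if other != obj)
    ]

# Hypothetical partitioning schemes: (latency ms, energy mJ, accuracy loss %)
schemes = [
    ("edge_only",   (42.0, 15.0, 4.1)),
    ("cloud_heavy", (60.0, 22.0, 0.9)),
    ("balanced",    (48.0, 18.0, 1.5)),
    ("dominated",   (65.0, 25.0, 4.5)),  # worse on every objective -> filtered out
]

for name, obj in pareto_front(schemes):
    print(name, obj)
```

NSGA-II repeats this non-dominated sorting over an evolving population (with crowding-distance tie-breaking) rather than a fixed list, but the dominance relation it ranks by is exactly the one shown here.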
📝 Abstract
Deep Neural Networks (DNNs) are increasingly deployed across distributed and resource-constrained platforms, such as System-on-Chip (SoC) accelerators and edge-cloud systems. DNNs are often partitioned and executed across heterogeneous processing units to optimize latency and energy. However, the reliability of these partitioned models under hardware faults and communication errors remains a critical yet underexplored topic, especially in safety-critical applications. In this paper, we propose an accuracy-aware, fault-resilient DNN partitioning framework that performs multi-objective optimization using NSGA-II, where accuracy degradation under fault conditions is introduced as a core metric alongside energy and latency. Our framework performs runtime fault injection during optimization and utilizes a feedback loop to prioritize fault-tolerant partitioning. We evaluate our approach on benchmark CNNs including AlexNet, SqueezeNet, and ResNet18 on hardware accelerators, and demonstrate up to a 27.7% improvement in fault tolerance with minimal performance overhead. Our results highlight the importance of incorporating resilience into DNN partitioning, thereby paving the way for robust AI inference in error-prone environments.