Improving DNN Modularization via Activation-Driven Training

📅 2024-11-01
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high retraining cost and accumulating technical debt of adapting deep neural networks (DNNs) to new tasks, this paper proposes MODA, an activation-driven modular training framework. Unlike existing modularization methods that rely on auxiliary masks or post-hoc processing, MODA introduces an end-to-end differentiable objective over the activation space, jointly pursuing intra-class affinity, inter-class dispersion, and compactness, which yields a natural, low-overlap modular decomposition across all layers (not only convolutional ones). By regulating activations directly, MODA allows plug-and-play module replacement without fine-tuning. Experiments show that MODA cuts training time by 29%, reduces module parameters by 58% (2.4x fewer weights), lowers weight overlap by 71% (3.5x less), and preserves the original model's accuracy without additional fine-tuning. After module replacement, target-class accuracy improves by 12% on average, while accuracy on other classes varies by less than 0.5%.
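The summary's three objectives can be illustrated with a small sketch. This is not the paper's actual formulation or code; it is a hypothetical loss over a batch of layer activations, where `alpha`, `beta`, and `gamma` are assumed weighting coefficients: intra-class affinity pulls same-class activations toward their class centroid, inter-class dispersion pushes class centroids apart, and compactness (here an L1 penalty, one common choice) encourages each class to rely on a sparse subset of units.

```python
import numpy as np

def modular_activation_loss(acts, labels, alpha=1.0, beta=1.0, gamma=0.1):
    """Sketch of an activation-space modular objective with three terms.

    acts:   (batch, units) activation matrix from one layer
    labels: (batch,) integer class labels
    The term weights alpha/beta/gamma are illustrative, not from the paper.
    """
    classes = np.unique(labels)
    centroids = np.stack([acts[labels == c].mean(axis=0) for c in classes])

    # Intra-class affinity: mean squared distance of each activation
    # to its own class centroid (smaller = tighter class clusters).
    intra = np.mean([np.mean((acts[labels == c] - centroids[i]) ** 2)
                     for i, c in enumerate(classes)])

    # Inter-class dispersion: negative mean pairwise centroid distance,
    # so minimizing the loss pushes class centroids apart.
    pair_dists = [np.linalg.norm(centroids[i] - centroids[j])
                  for i in range(len(classes))
                  for j in range(i + 1, len(classes))]
    inter = -np.mean(pair_dists)

    # Compactness: L1 penalty encouraging sparse activations, so each
    # class can be served by a small, separable subset of units.
    compact = np.mean(np.abs(acts))

    return alpha * intra + beta * inter + gamma * compact
```

On a toy batch, activations that form tight, well-separated class clusters score a lower loss than activations whose classes overlap, which is the direction a modular training signal would reward.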

📝 Abstract
Deep Neural Networks (DNNs) suffer from significant retraining costs when adapting to evolving requirements. Modularizing DNNs offers the promise of improving their reusability. Previous work has proposed techniques to decompose DNN models into modules both during and after training. However, these strategies yield several shortcomings, including significant weight overlaps and accuracy losses across modules, restricted focus on convolutional layers only, and added complexity and training time by introducing auxiliary masks to control modularity. In this work, we propose MODA, an activation-driven modular training approach. MODA promotes inherent modularity within a DNN model by directly regulating the activation outputs of its layers based on three modular objectives: intra-class affinity, inter-class dispersion, and compactness. MODA is evaluated using three well-known DNN models and three datasets with varying sizes. This evaluation indicates that, compared to the existing state-of-the-art, using MODA yields several advantages: (1) MODA accomplishes modularization with 29% less training time; (2) the resultant modules generated by MODA comprise 2.4x fewer weights and 3.5x less weight overlap while (3) preserving the original model's accuracy without additional fine-tuning; in module replacement scenarios, (4) MODA improves the accuracy of a target class by 12% on average while ensuring minimal impact on the accuracy of other classes.
Problem

Research questions and friction points this paper is trying to address.

Reducing technical debt and retraining costs in DNNs
Minimizing weight overlaps and accuracy losses in modules
Improving modularization efficiency without auxiliary masks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Activation-driven modular training approach
Regulates activation outputs for modular objectives
Reduces training time and weight overlap