🤖 AI Summary
In few-shot reinforcement learning (RL), scarce unstructured data severely hampers policy generalization. The problem is acute in embedded Dynamic Voltage and Frequency Scaling (DVFS) control, where labeled state-action trajectories are extremely limited.
Method: This paper proposes a distribution-aware flow-matching generative framework tailored to embedded DVFS. It combines flow matching (a training method for continuous normalizing flows, used here as a sample-efficient generative model) with Random Forest feature-importance weighting and latent-space bootstrapping to synthesize precise, diverse state-action pairs from few observed samples.
Contribution/Results: The framework addresses overfitting and data correlation, and it aligns with the Law of Large Numbers in that empirical consistency and policy improvement are supported as the number of samples increases. Experiments on a low-energy DVFS application show stable convergence of the maximum Q-value and a 30% frame-rate improvement in the initial timestamps, making RL training more efficient in resource-constrained environments.
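For readers who want the Law of Large Numbers argument spelled out, the statement below is the standard strong form applied to synthetic samples. The symbols are assumptions introduced here for illustration and do not come from the paper: $p_\theta$ is the learned generator, $f$ is any quantity with finite mean that is averaged over samples (for example a reward or a Q-value target), and $N$ is the number of synthetic samples.

```latex
\[
\hat{\mu}_N = \frac{1}{N}\sum_{i=1}^{N} f(x_i),
\quad x_i \overset{\text{i.i.d.}}{\sim} p_\theta
\qquad\Longrightarrow\qquad
\hat{\mu}_N \xrightarrow{\text{a.s.}} \mathbb{E}_{x \sim p_\theta}\bigl[f(x)\bigr]
\quad \text{as } N \to \infty
\]
```

Note that the limit is the expectation under the generator $p_\theta$. How closely it matches the expectation under the true data distribution depends on how well the generator fits the few observed samples.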
📝 Abstract
Generating realistic and diverse unstructured data is a significant challenge in reinforcement learning (RL), particularly in few-shot learning scenarios with limited data availability. Traditional RL methods often rely on real data for exploration, which can be time-consuming and inefficient. In this paper, we introduce a distribution-aware flow matching approach designed to generate synthetic unstructured data, specifically tailored for the few-shot RL application of Dynamic Voltage and Frequency Scaling (DVFS) on embedded processors. Our method leverages the flow matching algorithm as a sample-efficient generative model and incorporates bootstrapping techniques to enhance latent space diversity and generalization. Additionally, we apply feature weighting using Random Forests to prioritize critical features, improving the precision of the generated synthetic data. Our approach addresses key challenges in traditional model-based RL, such as overfitting and data correlation, while aligning with the principles of the Law of Large Numbers to support empirical consistency and policy improvement as the number of samples increases. We validate our approach through extensive experimentation on a DVFS application for low-energy processing. Results demonstrate that our method achieves stable convergence in terms of maximum Q-value while enhancing frame rates by 30% in the initial timestamps. These improvements make the proposed RL model more efficient in resource-constrained environments.
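The data-generation pipeline described in the abstract can be pictured with a minimal, standard-library-only sketch. It is not the authors' implementation. It assumes a standard-normal source distribution and the common straight-line flow-matching path $x_t = (1 - t)\,x_0 + t\,x_1$ with velocity target $x_1 - x_0$, and it bootstraps directly in data space rather than in a latent space. The feature names, the three observed rows, and all function names are hypothetical, and the Random Forest feature-weighting step is omitted.

```python
import random


def bootstrap(samples, n_draws, rng):
    """Resample with replacement from the few observed samples."""
    return [rng.choice(samples) for _ in range(n_draws)]


def cfm_pair(x1, rng):
    """Build one conditional flow-matching training example for target x1.

    Assumption: standard-normal source x0 and a straight-line path
    x_t = (1 - t) * x0 + t * x1, whose velocity target is x1 - x0.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    t = rng.random()  # uniform in [0, 1)
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]
    return t, x_t, target


def cfm_loss(velocity_fn, batch):
    """Mean squared error between predicted and target velocities."""
    total = 0.0
    for t, x_t, target in batch:
        pred = velocity_fn(x_t, t)
        total += sum((p - u) ** 2 for p, u in zip(pred, target))
    return total / len(batch)


rng = random.Random(0)

# Hypothetical observed rows: (utilization, temperature_C, frequency_GHz).
observed = [(0.35, 48.0, 0.8), (0.60, 55.0, 1.2), (0.90, 67.0, 1.8)]

# Bootstrap the small dataset, then pair each draw with fresh noise and a time.
resampled = bootstrap(observed, n_draws=256, rng=rng)
batch = [cfm_pair(x1, rng) for x1 in resampled]

# Placeholder velocity model that always predicts zero. A trained network
# would replace it and be fitted by minimizing cfm_loss.
zero_field = lambda x_t, t: [0.0] * len(x_t)
baseline_loss = cfm_loss(zero_field, batch)
```

In a full pipeline, a learned velocity network would be trained to minimize `cfm_loss`, and new samples would be generated by integrating that network from noise at $t = 0$ to $t = 1$. Redrawing noise and time for every bootstrap replicate is what lets a handful of observations yield many distinct training examples.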