State Diversity Matters in Offline Behavior Distillation

📅 2025-12-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Offline Behavior Distillation (OBD) suffers from a misalignment between raw data quality and distilled policy performance: high-quality raw data does not necessarily yield superior synthetic datasets. We observe that state diversity dominates over state quality when training loss is large; the priority reverses under low loss. To address this, we propose state density weighted (SDW) OBD: it models the state distribution via kernel density estimation and weights the distillation objective by the reciprocal of state density, thereby upweighting low-density (i.e., rarely visited) regions. This mitigates policy degradation caused by insufficient state coverage under high prediction error. Evaluated on multiple D4RL benchmarks, SDW significantly improves both behavior cloning and downstream policy optimization when the original dataset exhibits limited state diversity. Our work identifies and leverages the pivotal, context-dependent role of state diversity in OBD, offering a principled handle on the quality-diversity trade-off.

📝 Abstract
Offline Behavior Distillation (OBD), which condenses massive offline RL data into a compact synthetic behavioral dataset, offers a promising approach for efficient policy training and can be applied across various downstream RL tasks. In this paper, we uncover a misalignment between original and distilled datasets, observing that a high-quality original dataset does not necessarily yield a superior synthetic dataset. Through an empirical analysis of policy performance under varying levels of training loss, we show that datasets with greater state diversity outperform those with higher state quality when training loss is substantial, as is often the case in OBD, whereas the relationship reverses under minimal loss, which contributes to the misalignment. By associating state quality and diversity with reducing pivotal and surrounding error, respectively, our theoretical analysis establishes that surrounding error plays a more crucial role in policy performance when pivotal error is large, thereby highlighting the importance of state diversity in the OBD scenario. Furthermore, we propose a novel yet simple algorithm, state density weighted (SDW) OBD, which emphasizes state diversity by weighting the distillation objective using the reciprocal of state density, thereby distilling more diverse state information into synthetic data. Extensive experiments across multiple D4RL datasets confirm that SDW significantly enhances OBD performance when the original dataset exhibits limited state diversity.
Problem

Research questions and friction points this paper is trying to address.

Addresses misalignment between original and distilled datasets in Offline Behavior Distillation
Analyzes impact of state diversity versus state quality on policy performance
Proposes algorithm to enhance state diversity in synthetic behavioral datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Weight distillation objective by state density reciprocal
Emphasize state diversity over quality in synthetic datasets
Propose SDW algorithm to enhance OBD performance
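The inverse-density weighting idea in the bullets above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the Gaussian kernel, the fixed bandwidth, the `eps` smoothing, the mean-one weight normalization, and the behavior-cloning-style squared-error loss are all assumptions here, and the function names (`inverse_density_weights`, `weighted_bc_loss`) are hypothetical.

```python
import numpy as np

def inverse_density_weights(states, bandwidth=0.5, eps=1e-6):
    """Per-state weights proportional to the reciprocal of a KDE density.

    states: (n, d) array of visited states.
    Returns an (n,) array normalized so the average weight is 1.
    """
    diffs = states[:, None, :] - states[None, :, :]       # (n, n, d) pairwise differences
    sq_dist = (diffs ** 2).sum(axis=-1)                   # (n, n) squared distances
    kernel = np.exp(-sq_dist / (2.0 * bandwidth ** 2))    # Gaussian kernel matrix
    density = kernel.mean(axis=1)                         # unnormalized KDE estimate
    weights = 1.0 / (density + eps)                       # upweight low-density states
    return weights / weights.mean()                       # keep loss scale comparable

def weighted_bc_loss(pred_actions, target_actions, weights):
    """Density-weighted squared-error loss over (n, action_dim) arrays."""
    per_sample = ((pred_actions - target_actions) ** 2).mean(axis=1)
    return float((weights * per_sample).mean())
```

On a dataset with one dense cluster and a single outlier state, the outlier receives the largest weight, so a distillation objective using these weights would prioritize preserving that sparsely covered region.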