🤖 AI Summary
In multi-source reinforcement learning, manually designed state encoders struggle to adapt to heterogeneous inputs (e.g., sensor measurements, time-series signals, images, and text). Method: This paper proposes a composite neural architecture search (NAS) framework that jointly optimizes multiple source-specific modules and a fusion module. It incorporates large language model (LLM)-derived prior knowledge to guide search-space construction and uses intermediate-layer output signals together with multi-source representation-quality feedback to improve sample efficiency. Results: Evaluated on a mixed-autonomy traffic control task, the method discovers high-performing encoders with significantly fewer candidate-architecture evaluations than conventional NAS baselines and the LLM-based GENIUS framework. The resulting policies achieve superior performance and improved generalization across diverse traffic scenarios.
📝 Abstract
Designing state encoders for reinforcement learning (RL) with multiple information sources -- such as sensor measurements, time-series signals, image observations, and textual instructions -- remains underexplored and often requires manual design. We formalize this challenge as a problem of composite neural architecture search (NAS), where multiple source-specific modules and a fusion module are jointly optimized. Existing NAS methods overlook useful side information from the intermediate outputs of these modules -- such as their representation quality -- limiting sample efficiency in multi-source RL settings. To address this, we propose an LLM-driven NAS pipeline that leverages language-model priors and intermediate-output signals to guide sample-efficient search for high-performing composite state encoders. On a mixed-autonomy traffic control task, our approach discovers higher-performing architectures with fewer candidate evaluations than traditional NAS baselines and the LLM-based GENIUS framework.
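The composite search space described above, with one encoder module per input source plus a fusion module, all chosen jointly, can be sketched as follows. This is a minimal illustration with hypothetical module names and a placeholder scoring function, not the paper's actual implementation; a real pipeline would train an RL policy per candidate and additionally read intermediate-layer representation-quality signals.

```python
import itertools
import random

# Hypothetical composite search space: each information source gets its own
# encoder-module choice, and one fusion module combines their outputs.
SEARCH_SPACE = {
    "sensor": ["mlp_small", "mlp_large"],
    "series": ["gru", "tcn"],
    "image":  ["cnn_small", "resnet_block"],
    "text":   ["bow", "transformer_tiny"],
}
FUSION_OPTIONS = ["concat", "attention", "gated_sum"]

def enumerate_candidates(space, fusions):
    """Yield every composite architecture as a (per-source choices, fusion) pair."""
    sources = sorted(space)
    for combo in itertools.product(*(space[s] for s in sources)):
        for fusion in fusions:
            yield dict(zip(sources, combo)), fusion

def evaluate(candidate, fusion):
    # Placeholder score only. In the paper's setting, evaluating a candidate
    # means training/rolling out an RL policy with this encoder, which is why
    # reducing the number of candidate evaluations matters.
    random.seed(hash((tuple(sorted(candidate.items())), fusion)) % (2**32))
    return random.random()

candidates = list(enumerate_candidates(SEARCH_SPACE, FUSION_OPTIONS))
best = max(candidates, key=lambda c: evaluate(*c))
print(len(candidates))  # 2*2*2*2 module combinations x 3 fusion choices = 48
```

Even this toy space has 48 composite candidates, and it grows multiplicatively with each added source or module option, which is the motivation for using LLM priors and intermediate-output feedback to prune where the search spends its evaluations.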