🤖 AI Summary
This paper addresses the joint learning of state representations and controllers for unknown partially observable linear systems under the LQG control paradigm. Unlike conventional representation learning approaches that require observation reconstruction, the authors propose a cost-driven latent dynamics modeling framework that directly predicts multi-step control costs, enabling end-to-end joint learning of representations and controllers without ever reconstructing observations. Theoretically, the paper establishes the first finite-sample guarantees for cost-driven latent model learning, showing that accurate multi-step cost prediction suffices to recover a near-optimal state representation and a near-optimal controller. Methodologically, the analysis combines empirical risk minimization, system identification, and robust control. Given sufficiently many samples, the learned representation and controller are provably near-optimal, closing a long-standing gap in provable guarantees for this paradigm.
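For context, the standard LQG formulation referred to above is the partially observable linear system with Gaussian noise and quadratic cost:

$$
x_{t+1} = A x_t + B u_t + w_t, \qquad y_t = C x_t + v_t,
$$

where $w_t \sim \mathcal{N}(0, \Sigma_w)$ and $v_t \sim \mathcal{N}(0, \Sigma_v)$, and the controller chooses actions $u_t$ from the observations $y_{0:t}$ to minimize the expected quadratic cost

$$
J = \mathbb{E}\left[\frac{1}{T}\sum_{t=0}^{T-1} \left(x_t^\top Q x_t + u_t^\top R u_t\right)\right].
$$

In the setting of this paper the system matrices $(A, B, C)$ are unknown, and the observations $y_t$ may be high-dimensional, which is what motivates learning a latent state representation.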
📝 Abstract
We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system. We pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning (e.g., costs) without reconstructing the observations. In particular, we focus on an intuitive cost-driven state representation learning method for solving Linear Quadratic Gaussian (LQG) control, one of the most fundamental partially observable control problems. As our main results, we establish finite-sample guarantees of finding a near-optimal state representation function and a near-optimal controller using the directly learned latent model. To the best of our knowledge, despite various empirical successes, prior to this work it was unclear if such a cost-driven latent model learner enjoys finite-sample guarantees. Our work underscores the value of predicting multi-step costs, an idea that is key to our theory, and notably also an idea that is known to be empirically valuable for learning state representations.
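To make the cost-driven idea concrete, the sketch below shows what a multi-step cost-prediction loss might look like for a candidate latent model. This is an illustrative toy, not the paper's algorithm: the encoder `E`, latent dynamics `(A, B)`, cost matrices `(Q, R)`, dimensions, and horizon `H` are all hypothetical placeholders. The encoder maps an observation to a latent state, the latent dynamics are rolled forward under logged actions, and the predicted quadratic costs are compared against the costs actually observed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (not from the paper):
# observation dim, latent dim, action dim, prediction horizon
d_obs, d_lat, d_act, H = 6, 2, 1, 4

# Candidate latent model: encoder E, latent dynamics (A, B), cost matrices (Q, R)
E = 0.1 * rng.normal(size=(d_lat, d_obs))
A = 0.9 * np.eye(d_lat)
B = 0.1 * rng.normal(size=(d_lat, d_act))
Q, R = np.eye(d_lat), np.eye(d_act)

def multi_step_cost_loss(obs0, actions, observed_costs):
    """Mean squared error between predicted and observed costs over H steps.

    Encodes the initial observation into a latent state, rolls the latent
    dynamics forward under the logged actions, and compares the predicted
    quadratic latent costs to the observed scalar costs. No observation
    reconstruction is involved; only costs supervise the representation.
    """
    z = E @ obs0
    loss = 0.0
    for t in range(H):
        u = actions[t]
        pred_cost = z @ Q @ z + u @ R @ u   # predicted quadratic cost at step t
        loss += (pred_cost - observed_costs[t]) ** 2
        z = A @ z + B @ u                   # roll latent dynamics one step
    return loss / H

# One synthetic trajectory segment
obs0 = rng.normal(size=d_obs)
actions = rng.normal(size=(H, d_act))
observed_costs = rng.normal(size=H) ** 2    # nonnegative stand-in costs

print(multi_step_cost_loss(obs0, actions, observed_costs))
```

In a full method, this loss would be minimized over the encoder and latent model parameters across many trajectories; the abstract's point is that supervising with multi-step (rather than single-step) costs is what makes the learned representation suitable for control.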