AI Summary
To address view redundancy caused by human motion in indoor monocular videos for novel-view synthesis, this paper proposes a diversity-driven framework for selecting an optimal subset of views. Methodologically, we are the first to formulate view selection as a combinatorial optimization problem, integrating theoretically grounded utility functions with an explicit diversity measure, and we introduce IndoorTraj, the first indoor video benchmark featuring complex human trajectories. Experiments demonstrate that selecting only 5–20% of the input views on IndoorTraj surpasses the full-view baseline in both modeling efficiency and synthesis quality. Our core contributions are: (1) theory-driven utility functions with provable optimality guarantees; (2) a quantitative diversity model that explicitly captures the spatial and temporal complementarity of views; and (3) a realistic evaluation paradigm tailored to dynamic indoor scenes with articulated human motion.
Abstract
Novel view synthesis of indoor scenes can be achieved by capturing a monocular video sequence of the environment. However, redundant information caused by artificial movements in the input video reduces the efficiency of scene modeling. To address this, we formulate the problem as a combinatorial optimization task for view subset selection. In this work, we propose a novel subset selection framework that integrates a comprehensive diversity-based measurement with well-designed utility functions. We provide a theoretical analysis of these utility functions and validate their effectiveness through extensive experiments. Furthermore, we introduce IndoorTraj, a novel dataset designed for indoor novel view synthesis, featuring complex and extended trajectories that simulate intricate human behaviors. Experiments on IndoorTraj show that our framework consistently outperforms baseline strategies while using only 5–20% of the data, highlighting its remarkable efficiency and effectiveness. The code is available at: https://github.com/zehao-wang/IndoorTraj
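To make the "utility plus diversity" objective concrete, the sketch below shows a generic greedy view-subset selector. It is a minimal illustration only, not the paper's actual method: the per-view utility (here, a stand-in feature-norm score), the diversity term (distance to the nearest already-selected view), and the trade-off weight `lam` are all hypothetical placeholders for the paper's theoretically grounded functions.

```python
import numpy as np

def select_views(features, k, lam=0.5):
    """Greedily pick k views, trading off a per-view utility score
    against diversity (distance to the closest already-chosen view).

    features: (N, d) array of per-frame descriptors (e.g. pose/appearance
    embeddings). This is an illustrative sketch, not the paper's objective.
    """
    utility = np.linalg.norm(features, axis=1)   # stand-in utility score
    selected = [int(np.argmax(utility))]         # seed with the best view
    while len(selected) < k:
        # diversity: each candidate's distance to its nearest selected view
        diffs = features[:, None, :] - features[selected][None, :, :]
        div = np.min(np.linalg.norm(diffs, axis=2), axis=1)
        score = utility + lam * div
        score[selected] = -np.inf                # never re-pick a view
        selected.append(int(np.argmax(score)))
    return selected
```

In practice, objectives of this utility-plus-diversity form are often submodular, which is what makes a simple greedy loop like this come with approximation guarantees; the paper's theoretical analysis of its utility functions plays an analogous role.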