🤖 AI Summary
This work addresses the lack of coordinated pre-capture 3D planning in portrait photography. It proposes the first framework for aesthetic 3D portrait planning that jointly optimizes human pose, camera parameters, lighting configuration, and exposure. The method introduces a photographic scene graph that unifies the representation of scene affordances, subject–environment relationships, and illumination structure. It also employs an aesthetics-guided comparative planning mechanism that iterates over previous attempts and current viewfinder observations. By integrating 3D scene understanding, differentiable rendering, and multimodal aesthetic evaluation, the framework generates visually compelling and physically feasible shooting plans. Experiments across diverse indoor and outdoor scenes show that both human raters and multimodal large language model (MLLM) evaluators prefer the resulting portraits over those from baseline methods. The work thereby moves computational photography from post-capture correction toward intelligent pre-capture planning.
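To make the idea of a photographic scene graph more concrete, here is a minimal sketch of such a data structure. It is not the authors' representation. The node kinds ("object", "subject", "light"), the affordance labels, the relation names, and the `PhotoSceneGraph` class with its methods are all hypothetical. The sketch only shows how scene affordances, subject–scene relations, and lighting relations could live in one graph and be queried during planning.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a hypothetical photographic scene graph."""
    name: str
    kind: str  # hypothetical categories: "object", "subject", or "light"
    affordances: set[str] = field(default_factory=set)  # e.g. {"sit", "lean"}


@dataclass
class Edge:
    """A directed, labeled relation such as ("person", "sits_on", "bench")."""
    src: str
    relation: str
    dst: str


@dataclass
class PhotoSceneGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node

    def relate(self, src: str, relation: str, dst: str) -> None:
        # Only allow relations between nodes that already exist.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown node in relation {src!r} -> {dst!r}")
        self.edges.append(Edge(src, relation, dst))

    def objects_affording(self, action: str) -> list[str]:
        # Scene objects that support an action, e.g. where the subject could lean.
        return sorted(
            n.name for n in self.nodes.values()
            if n.kind == "object" and action in n.affordances
        )

    def lights_for(self, subject: str) -> list[tuple[str, str]]:
        # Light sources related to a subject, with the lighting role they play.
        return [
            (e.src, e.relation) for e in self.edges
            if e.dst == subject and self.nodes[e.src].kind == "light"
        ]


# Small example scene (hypothetical content).
g = PhotoSceneGraph()
g.add_node(Node("bench", "object", {"sit", "lean"}))
g.add_node(Node("wall", "object", {"lean"}))
g.add_node(Node("person", "subject"))
g.add_node(Node("window", "light"))
g.relate("person", "sits_on", "bench")
g.relate("window", "key_light_for", "person")
print(g.objects_affording("lean"), g.lights_for("person"))
```

In this toy scene, `objects_affording("lean")` returns `["bench", "wall"]` and `lights_for("person")` returns `[("window", "key_light_for")]`. Storing affordances on nodes and relations on edges keeps pose candidates (where the subject can sit or lean) and lighting roles in a single structure that a planner could query.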
📝 Abstract
Portrait photography is largely decided before the shutter opens: the subject's pose, the camera configuration, and the lighting devices must be coordinated within the surrounding 3D scene. In contrast, most existing computational methods focus on post-production in 2D image space, such as retouching, relighting, or editing images that already exist; pre-capture photographic planning remains largely unexplored. We introduce 3D aesthetic portrait planning, the task of generating human pose, camera, lighting, and exposure plans that produce visually compelling portraits while satisfying geometric and photometric feasibility in a 3D scene. Our approach builds a Photographic Scene Graph that represents scene affordances, subject-scene relations, and portrait-relevant lighting structure. Built on this representation, we perform aesthetic-guided comparative planning over previous attempts and current viewfinder observations. Experiments across diverse indoor and outdoor scenes show that our method produces portraits preferred by human raters and MLLM evaluators over competitive baselines, while maintaining high physical plausibility. Together, our results suggest a path from post-capture correction toward pre-capture computational portrait planning. Project repository: https://github.com/songrise/Before-the-Shutter
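The comparative planning described above can be illustrated with a generic propose, check, render, and compare loop. The following is a minimal sketch under strong assumptions and should not be read as the authors' algorithm. The `Plan` fields, `render`, `penalty`, `prefer`, `is_feasible`, `propose`, and `comparative_plan` are all hypothetical. `render` is a toy stand-in for producing a viewfinder observation, and `prefer` is a toy pairwise judge standing in for a human or MLLM aesthetic comparison. Its hand-written preference for a key light near 45 degrees, exposure near 0 EV, and a slight body turn is purely illustrative.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Plan:
    """Hypothetical shooting plan; scalar knobs stand in for pose, camera, lighting, exposure."""
    pose_turn_deg: float   # subject body rotation relative to the camera
    focal_mm: float        # lens focal length
    key_light_deg: float   # key light azimuth relative to the camera axis
    exposure_ev: float     # exposure compensation


def render(plan: Plan) -> dict[str, float]:
    """Toy stand-in for a viewfinder observation (not a real renderer)."""
    return {"turn": plan.pose_turn_deg, "light": plan.key_light_deg, "ev": plan.exposure_ev}


def penalty(obs: dict[str, float]) -> float:
    """Toy aesthetic penalty; lower is better."""
    return abs(obs["light"] - 45.0) / 45.0 + abs(obs["ev"]) + abs(obs["turn"] - 30.0) / 30.0


def prefer(a: dict[str, float], b: dict[str, float]) -> bool:
    """Pairwise judge: True if observation a is preferred over b."""
    return penalty(a) < penalty(b)


def is_feasible(plan: Plan) -> bool:
    """Toy physical-feasibility constraints on the hypothetical parameters."""
    return (24.0 <= plan.focal_mm <= 200.0
            and -3.0 <= plan.exposure_ev <= 3.0
            and abs(plan.pose_turn_deg) <= 90.0)


def propose(plan: Plan, rng: random.Random) -> Plan:
    """Randomly perturb the current best plan to obtain a new candidate."""
    return replace(
        plan,
        pose_turn_deg=plan.pose_turn_deg + rng.gauss(0.0, 10.0),
        focal_mm=plan.focal_mm + rng.gauss(0.0, 10.0),
        key_light_deg=plan.key_light_deg + rng.gauss(0.0, 10.0),
        exposure_ev=plan.exposure_ev + rng.gauss(0.0, 0.3),
    )


def comparative_plan(initial: Plan, iterations: int, seed: int = 0):
    rng = random.Random(seed)
    best_plan, best_obs = initial, render(initial)
    history = [(best_plan, best_obs)]  # previous attempts and their observations
    for _ in range(iterations):
        candidate = propose(best_plan, rng)
        if not is_feasible(candidate):
            continue  # skip infeasible plans before rendering them
        obs = render(candidate)
        history.append((candidate, obs))
        # Compare the current observation against the best previous attempt.
        if prefer(obs, best_obs):
            best_plan, best_obs = candidate, obs
    return best_plan, history


start = Plan(pose_turn_deg=0.0, focal_mm=50.0, key_light_deg=0.0, exposure_ev=1.5)
best, attempts = comparative_plan(start, iterations=200, seed=0)
print(best, len(attempts))
```

The loop replaces the current best plan only when the judge prefers the new observation, so the kept plan never gets worse under the toy judge. Each candidate is checked for feasibility before rendering. A real system would swap in 3D rendering, a learned or MLLM-based comparator, and a smarter proposal step than random perturbation.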