🤖 AI Summary
Existing 3D scene editing methods struggle to insert arbitrary novel objects with high fidelity while preserving geometric and appearance consistency across views. This paper introduces the first generative object insertion framework tailored to Gaussian Splatting representations, pioneering the integration of multi-view-consistent diffusion models into 3D editing and enabling end-to-end, mask-free, fine-tuning-free, view-consistent synthesis. The approach jointly optimizes differentiable Gaussian rendering, a cross-view feature alignment loss, and implicit shape-appearance co-modeling. Evaluated on real multi-view imagery, the method generates novel objects that remain consistent in illumination, pose, and semantics across all viewpoints. Both qualitative and quantitative evaluations demonstrate superior performance over NeRF-based and conventional editing baselines. Moreover, the method achieves significantly faster inference than generative NeRF approaches, offering a practical, high-fidelity solution for 3D scene composition.
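The summary does not give the exact formulation of the cross-view feature alignment loss, but a common hypothetical form is to penalize feature disagreement between views at corresponding 3D points. A minimal sketch, assuming the loss is the mean pairwise cosine distance between per-view feature vectors (the function name and formulation are illustrative, not the paper's):

```python
import math

def _cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cross_view_alignment_loss(view_features):
    """Hypothetical alignment loss: mean pairwise (1 - cosine similarity)
    across features extracted at the same 3D point from different views.

    Identical features across all views yield a loss of 0; the loss grows
    as the per-view features diverge, pushing the optimization toward
    view-consistent appearance.
    """
    n = len(view_features)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(1.0 - _cosine(view_features[i], view_features[j])
               for i, j in pairs) / len(pairs)
```

In a real pipeline this term would be summed over sampled surface points and backpropagated through the differentiable Gaussian renderer alongside the other objectives.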