🤖 AI Summary
Existing 3D local editing methods often suffer from identity leakage, weakened edits, and global distortion because they enforce locality outside the ODE sampler. This work proposes VS3D, a framework that achieves high-fidelity local editing without inversion, training, or masks by intervening directly in velocity space during ODE sampling, with one module per stage of the editing pipeline. Reconstruction-Anchored Source Injection (RASI) suppresses identity leakage, Partial-Mean Guidance (PMG) amplifies the edit signal, and Twin-Agreement Residual injection (TAR) makes per-token preservation decisions at the geometry and material stages. By operating entirely inside the ODE solver, VS3D avoids the limitations of external constraints, enabling precise control over target-region geometry and appearance while preserving non-edited regions.
📝 Abstract
Editing a 3D asset locally, modifying a target region while preserving the rest, is a fundamental requirement of native 3D editing. Existing methods enforce locality through mechanisms external to the generator, such as manual 3D masks, post-hoc voxel merging, or 2D multi-view lifting. None of them intervenes where the corruption actually originates: inside the ODE sampler. For a rectified-flow generator to achieve faithful local editing, its velocity field should be strong over the target editing region while vanishing on preserved content. Yet a single velocity field can hardly satisfy both requirements simultaneously, leading to three problems: (i) identity leakage that keeps the edit signal non-zero on preserved regions; (ii) no dedicated edit-amplification channel, so strengthening the edit inevitably perturbs identity; and (iii) an identity drag at the geometry and material stages, where a global condition pulls every token toward the target. We propose VS3D (Velocity-Space 3D Asset editing), an inversion-free, training-free, and mask-free framework that addresses each problem with a targeted intervention inside the sampler. VS3D integrates three complementary modules, each corresponding to a specific stage of the editing pipeline. Reconstruction-Anchored Source Injection (RASI) absorbs identity leakage by turning the unconditional embedding into a per-step, asset-specific anchor calibrated through source reconstruction. Partial-Mean Guidance (PMG) amplifies the edit signal by contrasting high- and low-quality subsample estimates of the velocity difference, active only where a consistent edit exists. Twin-Agreement Residual injection (TAR) lets the sampler decide token by token what to preserve at the geometry and material stages.
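The requirement that the edit velocity be strong on the target region and vanish on preserved content can be illustrated with a toy example. The sketch below is not VS3D's implementation and does not reproduce RASI, PMG, or TAR. It assumes an idealized rectified-flow velocity on a straight noise-to-data path, a generic inversion-free scheme that Euler-integrates the target-minus-source velocity starting from the source asset, and a constant bias `leak` as a crude stand-in for identity leakage. All names (`toy_velocity`, `edit_by_velocity_difference`, `leak`) are hypothetical.

```python
import numpy as np

def toy_velocity(noise, target):
    """Idealized rectified-flow velocity on the straight path from noise to target."""
    return target - noise

def edit_by_velocity_difference(source, target, noise, leak=0.0, steps=10):
    """Euler-integrate (target velocity - source velocity), starting from the source.

    `leak` is an assumed constant bias added to the target-conditioned velocity
    on every token; it mimics an edit signal that stays non-zero on preserved
    tokens.
    """
    x = source.copy()
    dt = 1.0 / steps
    for _ in range(steps):
        v_src = toy_velocity(noise, source)
        v_tgt = toy_velocity(noise, target) + leak
        x = x + dt * (v_tgt - v_src)
    return x

rng = np.random.default_rng(0)
source = rng.normal(size=(6, 3))   # 6 tokens, 3 features each
target = source.copy()
target[:2] += 1.0                  # only the first two tokens are edited
noise = rng.normal(size=(6, 3))

clean = edit_by_velocity_difference(source, target, noise)
leaky = edit_by_velocity_difference(source, target, noise, leak=0.05)

# Maximum drift on the four tokens that should be preserved.
preserved_drift_clean = np.abs(clean[2:] - source[2:]).max()
preserved_drift_leaky = np.abs(leaky[2:] - source[2:]).max()
```

In this toy setting the velocity difference is exactly zero on the untouched tokens, so `clean` leaves them unchanged, while even a small uniform bias accumulates over the integration and shifts every preserved token in `leaky`. Real generators only approximate the ideal field, which is why leakage appears in practice.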