🤖 AI Summary
This paper addresses the difficult fitting problem that arises when parsing scene geometry with constructive solid geometry (CSG) models. The problem is characterized by an unknown number of primitives, strong coupling between positive and negative primitives, and highly non-convex optimization. We propose an ensemble framework over multiple regression procedures that integrates learned initialization, gradient-based geometric optimization, and Boolean composition modeling. Crucially, we introduce an explicit representation of *negative convex primitives* to model voids and occlusions, a first in CSG-based reconstruction. Furthermore, we design a "refine-then-choose" ensemble strategy, in contrast to the conventional choose-then-refine ordering. On a standard dataset, our method significantly reduces depth and surface normal prediction errors; over 70% of images benefit from negative primitive modeling. The ensemble strategy substantially outperforms baselines, empirically confirming the problem's severe non-convexity and validating the effectiveness of our approach.
📝 Abstract
Describing a scene in terms of primitives -- geometrically simple shapes that offer a parsimonious but accurate abstraction of structure -- is an established vision problem. This is a good model of a difficult fitting problem: different scenes require different numbers of primitives and primitives interact strongly, but any proposed solution can be evaluated at inference time. The state-of-the-art method uses a learned regression procedure to predict a start point consisting of a fixed number of primitives, followed by a descent method that refines the geometry and removes redundant primitives. Methods are evaluated by accuracy in depth and normal prediction and in scene segmentation. This paper shows that very significant improvements in accuracy can be obtained by (a) incorporating a small number of negative primitives and (b) ensembling over a number of different regression procedures. Ensembling works by refining each predicted start point, then choosing the best by fitting loss. Extensive experiments on a standard dataset confirm that negative primitives are useful in a large fraction of images, and that our refine-then-choose strategy outperforms choose-then-refine, confirming that the fitting problem is very difficult.
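The difference between refine-then-choose and choose-then-refine can be illustrated on a toy problem. The sketch below is not the paper's method: it assumes a one-dimensional non-convex loss with two basins in place of the primitive fitting loss, plain gradient descent in place of the descent method, and three hypothetical start points in place of the regressors' predictions. All function names and constants are illustrative.

```python
# Toy 1-D stand-in for the fitting loss (an assumption, not the paper's loss):
# two basins, with the better minimum near x = -1 and a worse one near x = +1.
def fitting_loss(x):
    return (x * x - 1.0) ** 2 + 0.3 * x


def loss_gradient(x):
    # Analytic derivative of fitting_loss.
    return 4.0 * x * (x * x - 1.0) + 0.3


def refine(x, step=0.05, iterations=200):
    """Plain gradient descent from start point x (stand-in for the descent method)."""
    for _ in range(iterations):
        x -= step * loss_gradient(x)
    return x


def refine_then_choose(starts):
    """Refine every start point, then keep the one with the lowest final loss."""
    refined = [refine(x) for x in starts]
    return min(refined, key=fitting_loss)


def choose_then_refine(starts):
    """Keep the start point with the lowest initial loss, then refine only that one."""
    best_start = min(starts, key=fitting_loss)
    return refine(best_start)


# Hypothetical start points, standing in for the outputs of three regressors.
starts = [0.9, -0.3, 1.5]

x_rtc = refine_then_choose(starts)
x_ctr = choose_then_refine(starts)

print(f"refine-then-choose: x = {x_rtc:.3f}, loss = {fitting_loss(x_rtc):.3f}")
print(f"choose-then-refine: x = {x_ctr:.3f}, loss = {fitting_loss(x_ctr):.3f}")
```

In this toy setting the start point with the lowest initial loss (0.9) lies in the basin of the worse minimum, so choosing before refining gets stuck there, while refining every start point first recovers the better minimum reached from -0.3. The cost is one descent run per start point instead of one in total.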