Chiseling: Powerful and Valid Subgroup Selection via Interactive Machine Learning

📅 2025-09-23

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

In regression and causal inference, controlled subgroup selection aims to identify a subpopulation, defined as a subset of the covariate space, whose average response or treatment effect exceeds a prespecified threshold, with rigorous inferential guarantees. Existing methods either sacrifice efficiency (e.g., via naive data splitting), heavily restrict the search for the subgroup, or fail to ensure valid inference. This paper introduces chiseling, an interactive machine learning framework in which the analyst refines and tests a candidate subgroup by iteratively shrinking it. The key restriction is that each shrinkage direction is determined solely from data outside the current subgroup; in randomized experiments, this yields valid inference under only bounded moment conditions. The framework incorporates prior domain knowledge and accommodates arbitrary machine learning algorithms, supporting both randomized experiments and observational studies. On simulated datasets and a real survey experiment, chiseling identifies substantially better subgroups than existing methods with inferential guarantees, without compromising statistical validity.

📝 Abstract

In regression and causal inference, controlled subgroup selection aims to identify, with inferential guarantees, a subgroup (defined as a subset of the covariate space) on which the average response or treatment effect is above a given threshold. For instance, in a clinical trial, it may be of interest to find a subgroup with a positive average treatment effect. However, existing methods either lack inferential guarantees, heavily restrict the search for the subgroup, or sacrifice efficiency by naive data splitting. We propose a novel framework called chiseling that allows the analyst to interactively refine and test a candidate subgroup by iteratively shrinking it. The sole restriction is that the shrinkage direction depends only on the points outside the current subgroup; otherwise, the analyst may leverage any prior information or machine learning algorithm. Despite this flexibility, chiseling controls the probability that the discovered subgroup is null (e.g., has a non-positive average treatment effect) under minimal assumptions: for example, in randomized experiments, this inferential validity guarantee holds under only bounded moment conditions. When applied to a variety of simulated datasets and a real survey experiment, chiseling identifies substantially better subgroups than existing methods with inferential guarantees.
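
The goal stated in the abstract can be written compactly. The notation below is an assumption introduced here for illustration and is not taken from the paper: $\mathcal{X}$ is the covariate space, $\tau$ the given threshold, $\alpha$ a nominal error level, and $\hat{R}$ the data-dependent subgroup that is reported.

```latex
% Subgroup mean (regression) or subgroup average treatment effect (causal inference)
\mu(R) = \mathbb{E}\left[\, Y \mid X \in R \,\right]
\quad \text{or} \quad
\mu(R) = \mathbb{E}\left[\, Y(1) - Y(0) \mid X \in R \,\right],
\qquad R \subseteq \mathcal{X}.

% A subgroup is null when its average does not exceed the threshold
R \text{ is null} \iff \mu(R) \le \tau.

% Validity: the reported subgroup is null with probability at most \alpha
\mathbb{P}\left( \hat{R} \neq \emptyset \ \text{and} \ \mu(\hat{R}) \le \tau \right) \le \alpha.
```

With $\tau = 0$ and the treatment-effect version of $\mu$, the null condition matches the abstract's example of a subgroup with a non-positive average treatment effect.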

Problem

Research questions and friction points this paper is trying to address.

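Identifying subgroups whose average response or treatment effect exceeds a given threshold, with inferential guarantees

Overcoming the limitations of existing methods, which lack inferential guarantees, heavily restrict the subgroup search, or lose efficiency through naive data splitting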

Enabling interactive refinement of subgroups while maintaining statistical control

Innovation

Methods, ideas, or system contributions that make the work stand out.

Interactive subgroup refinement via chiseling framework

Inferential validity under minimal assumptions like bounded moments

Leverages prior information and machine learning flexibly
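
The restriction behind the framework (each shrinkage direction may depend only on points outside the current subgroup) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration: the subgroup is an axis-aligned box in $[0, 1]^d$, the direction comes from a least-squares fit, the step size is fixed in advance, and the names `in_box`, `propose_face`, and `chisel_sketch` are hypothetical. The sketch does not reproduce the paper's testing procedure and therefore carries no inferential guarantee on its own.

```python
import numpy as np

def in_box(X, lo, hi):
    """Boolean mask of the rows of X that lie inside the box [lo, hi]."""
    return np.all((X >= lo) & (X <= hi), axis=1)

def propose_face(X_out, Y_out, d):
    """Choose which face of the box to shave, using ONLY outside points.

    Returns (dimension, side). With too few outside points, fall back to a
    fixed default that stands in for the analyst's prior information.
    """
    if len(Y_out) < d + 2:
        return 0, "low"  # assumed prior choice, not data-driven
    # Least-squares fit of the response on the covariates (outside points only).
    A = np.column_stack([np.ones(len(Y_out)), X_out])
    coef, *_ = np.linalg.lstsq(A, Y_out, rcond=None)
    slope = coef[1:]
    j = int(np.argmax(np.abs(slope)))
    # Positive slope: low values of x_j look worse, so shave the low face.
    return j, ("low" if slope[j] > 0 else "high")

def chisel_sketch(X, Y, step=0.1, n_steps=5):
    """Return a nested sequence of boxes, starting from the full unit cube.

    The step size and number of steps are fixed in advance, so nothing about
    the responses inside the current box influences how it shrinks.
    """
    d = X.shape[1]
    lo, hi = np.zeros(d), np.ones(d)
    boxes = [(lo.copy(), hi.copy())]
    for _ in range(n_steps):
        outside = ~in_box(X, lo, hi)
        j, side = propose_face(X[outside], Y[outside], d)
        if hi[j] - lo[j] <= step:
            break  # box is too thin along the chosen dimension
        if side == "low":
            lo[j] += step
        else:
            hi[j] -= step
        boxes.append((lo.copy(), hi.copy()))
    return boxes
```

Any model could replace the least-squares fit in `propose_face`, as long as it is trained only on the points it is handed. Passing it just `X[outside]` and `Y[outside]` is a simple way to make the restriction hard to violate by accident.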

🔎 Similar Papers

Sample Selection Bias in Machine Learning for Healthcare