Event
In-Sample Evaluation of Subgroups Identified by Generic Machine Learning
Xinzhou Guo
Assistant Professor
Hong Kong University of Science and Technology
Date: 24 February 2026, Tuesday
Time: 3 pm (Singapore time)
Venue: S16-06-118, Seminar Room
When a subgroup is identified from the data, the post-hoc identified subgroup must be evaluated in a replicable way. The usual in-sample approach, which treats the post-hoc identified subgroup as if it were predefined, may suffer from selection bias. This issue is particularly challenging because of two intrinsic characteristics of subgroup analysis: subgroup identification by generic machine learning and non-smooth subgroup boundaries, the latter also known as nonregularity. The out-of-sample approach, which splits the data into two parts, one for subgroup identification and the other for evaluation, can address selection bias but may suffer from efficiency loss and instability, as the subgroup is identified using only part of the data. In this paper, we propose a conditional $m$-out-of-$n$ perturbation approach that removes selection bias in in-sample subgroup evaluation and delivers valid inference on post-hoc identified subgroups when the subgroup is identified from the whole dataset by generic machine learning. The proposed method is easy to compute and model-free, and remains valid regardless of whether regularity holds. Through a novel theoretical framework of triple robustness linking the rates of subgroup identification and nuisance estimation, we show that the proposed method, with an adaptive choice of the subsample size, achieves full efficiency across a broad range of scenarios in generic machine learning for subgroup analysis. The merits of the proposed method are demonstrated by a re-analysis of the ACTG 175 trial.
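To make the contrast between naive in-sample evaluation and subsample-based evaluation concrete, the sketch below illustrates only the generic $m$-out-of-$n$ resampling idea on simulated trial data. It is not the conditional perturbation procedure proposed in the paper: the subgroup learner `identify_subgroup` (a stand-in for a generic machine learning rule), the difference-in-means effect estimate, and the choices of $m$ and the number of replicates are all hypothetical illustrations.

```python
# Minimal sketch of m-out-of-n resampling for subgroup evaluation.
# NOT the paper's conditional perturbation method; the learner and
# effect estimate here are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

def identify_subgroup(X, A, Y):
    """Hypothetical subgroup learner: threshold the first covariate at
    the cutoff that maximizes the estimated subgroup treatment effect."""
    best_c, best_effect = None, -np.inf
    for c in np.quantile(X[:, 0], np.linspace(0.1, 0.9, 17)):
        s = X[:, 0] > c
        # Skip cutoffs with too few subjects or only one treatment arm.
        if s.sum() < 20 or A[s].sum() in (0, s.sum()):
            continue
        eff = Y[s][A[s] == 1].mean() - Y[s][A[s] == 0].mean()
        if eff > best_effect:
            best_c, best_effect = c, eff
    return lambda X: X[:, 0] > best_c

def subgroup_effect(s, A, Y):
    """Difference-in-means treatment effect inside subgroup s."""
    return Y[s & (A == 1)].mean() - Y[s & (A == 0)].mean()

# Simulated randomized trial: the effect exists only when X[:, 0] > 0.
n = 2000
X = rng.normal(size=(n, 3))
A = rng.integers(0, 2, size=n)
Y = 0.5 * A * (X[:, 0] > 0) + rng.normal(size=n)

# Naive in-sample evaluation: identify and evaluate on the same data.
rule = identify_subgroup(X, A, Y)
naive = subgroup_effect(rule(X), A, Y)

# m-out-of-n resampling: re-identify the subgroup on subsamples of
# size m < n, evaluate each re-identified subgroup on the full sample,
# and let the replicates reflect the variability of the selection step.
m, B = n // 4, 200
reps = []
for _ in range(B):
    idx = rng.choice(n, size=m, replace=False)
    rule_b = identify_subgroup(X[idx], A[idx], Y[idx])
    reps.append(subgroup_effect(rule_b(X), A, Y))
reps = np.array(reps)

print(f"naive in-sample estimate: {naive:.3f}")
print(f"m-out-of-n replicates: mean {reps.mean():.3f}, sd {reps.std():.3f}")
```

In this toy setting, the naive estimate tends to overstate the subgroup effect because the same data both select the cutoff and evaluate it, while the spread of the $m$-out-of-$n$ replicates captures the variability induced by the selection step itself; the abstract's point about adaptively selecting the subsample size is that this choice governs the trade-off between removing selection bias and retaining efficiency.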