Location: S16-06-118, DSAP Seminar Room, Faculty of Science
Time: 03:00pm - 04:00pm
The recent model-X knockoffs method selects variables with provable, non-asymptotic error control and places no restrictions or assumptions on the dimensionality of the data or on the conditional distribution of the response given the covariates. The one requirement for the procedure is that the covariate samples are independent and identically distributed draws from a precisely known distribution. In this talk, I will show that exactly the same guarantees can be made without knowing the covariate distribution fully, but instead knowing it only up to a parametric model with as many as Ω(np) parameters, where p is the dimension and n is the number of covariate samples (including unlabeled samples if available). The key is to treat the covariates as if they were drawn conditionally on the observed value of a sufficient statistic for the model. Although this idea is simple, even in Gaussian models conditioning on a sufficient statistic leads to a distribution supported on a set of zero Lebesgue measure, so techniques from topological measure theory are required to establish valid algorithms. I will demonstrate how to do this for medium-dimensional Gaussian models, high-dimensional Gaussian graphical models, and discrete graphical models. Simulations show that the new approach remains powerful under the weaker assumptions.
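As a rough illustration of the conditioning idea only (not the knockoff construction presented in the talk), consider a toy model assumed here for simplicity: n i.i.d. rows from N(mu, I_p) with unknown mean mu and known identity covariance. The column-wise sample mean is a sufficient statistic for mu, and the centered data are independent of it, so a fresh data set can be drawn from the conditional distribution without knowing mu. The function name `sample_given_mean` and the toy model are hypothetical choices made for this sketch.

```python
import numpy as np

def sample_given_mean(X, rng):
    """Draw a new n-by-p matrix from the conditional distribution of the data
    given its column means, assuming rows are i.i.d. N(mu, I_p) with unknown mu.

    Assumption for this sketch: the covariance is the identity, so the
    centered data are independent of the sample mean.
    """
    xbar = X.mean(axis=0)                # observed sufficient statistic
    Z = rng.standard_normal(X.shape)     # fresh standard Gaussian noise
    # Center the noise, then shift it so the column means match xbar exactly.
    return xbar + (Z - Z.mean(axis=0))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3)) + np.array([1.0, -2.0, 0.5])
X_new = sample_given_mean(X, rng)

# The draw never uses mu, and its column means equal those of X.
print(np.allclose(X_new.mean(axis=0), X.mean(axis=0)))
```

Every such draw satisfies p linear constraints (its column means are fixed), so it lies on an affine subspace of dimension (n-1)p inside the np-dimensional sample space. This is a simple instance of the zero-Lebesgue-measure support mentioned in the abstract. Building valid knockoffs additionally requires an exchangeability property that this sketch does not attempt to provide.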