Location: S16-06-118, DSAP Seminar Room, Faculty of Science
Time: 03:00pm - 04:00pm
Community detection in networks has found much success in diverse areas of science. However, many open problems remain. In this talk, we show that tackling these problems may require us both to zoom in on a specific application for an appropriate remedy and to zoom out to look for general patterns and hidden connections. We start with a specific application, the study of the 3D conformation of chromatin, where the well-studied stochastic block model and its variants are too restrictive. We propose a new network model for detecting topologically associating domains (TADs) using Hi-C data. Our model leads to a likelihood objective that can be efficiently optimised via relaxation with theoretical guarantees. Furthermore, the model can be easily generalised to perform joint TAD calling across multiple cell lines. Next, zooming out to the general problem of clustering as a form of unsupervised learning, we note that selecting hyperparameters in this setting is often challenging due to the lack of ground truth for validation. We propose a unified framework with provable guarantees that works for various network models as well as sub-Gaussian mixtures. The hyperparameters considered include the Lagrange multiplier in semidefinite programming relaxations, the bandwidth parameter in kernel spectral clustering, and the number of clusters in network models. In a variety of simulated and real data experiments, we compare our framework with other widely used tuning procedures across a broad range of parameter settings.