Date: 08 November 2017, Wednesday
Location: S16-05-96, Computer Lab 4
Time: 11:00am - 12:00pm
PHD ORAL PRESENTATION
In modern scientific research, data with a mixture structure arise frequently in areas such as finance, zoology, and psychology. An important feature of such data is that the observations typically come from several different subpopulations, and the subpopulation membership of each observation is unknown to some extent; this poses significant challenges for the subsequent analysis and leads to many interesting research topics in statistics. In this thesis, we consider density estimation for two-sample mixture data, where we assume that, for each observation, the probabilities that it belongs to each subpopulation are known, and that the densities of the two subpopulations satisfy a likelihood ratio ordering condition. Likelihood ratio ordering is an important concept in various fields, such as mechanism design, economics, and finance. This thesis contains two main parts. In the first part, using a smoothed likelihood principle, we propose a kernel-based nonparametric method for estimating the two-sample densities in the aforementioned mixture data problem, and we derive a majorization-minimization algorithm to compute the density estimates numerically. Interestingly, bandwidth selection can be incorporated adaptively into the majorization-minimization algorithm. We establish theoretical properties of the proposed method. In particular, we show that the proposed algorithm converges from any initial value, and we establish the asymptotic convergence rate of the proposed estimators. We conduct extensive simulation studies to compare the proposed method with existing methods in the literature. A malaria data example illustrates the application of our method in practice.
In the second part, we present two applications of the proposed density estimates: (1) receiver operating characteristic (ROC) curve estimation; and (2) estimation of the posterior probabilities in binary response data, with an application to classification. In both applications, we conduct extensive simulation studies to compare our methods with existing methods in the literature; in all the numerical studies, the performance of our methods is promising.
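To make the first application concrete, the following is a minimal sketch of how an ROC curve can be read off from a pair of density estimates. It is not the estimator proposed in the thesis: it assumes fully labeled samples, uses an ordinary Gaussian kernel density estimate with a fixed bandwidth, and imposes no likelihood ratio ordering constraint. The function names (`kde_on_grid`, `roc_from_densities`, `auc`), the simulated normal samples, the grid, and the bandwidth of 0.3 are all illustrative assumptions.

```python
import numpy as np


def kde_on_grid(sample, grid, bandwidth):
    """Plain Gaussian kernel density estimate of `sample`, evaluated on `grid`."""
    z = (grid[:, None] - sample[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (len(sample) * bandwidth)


def roc_from_densities(f_pos, g_neg, grid):
    """Turn two densities on an equally spaced grid into ROC points (FPR, TPR)."""
    dx = grid[1] - grid[0]
    # Survival functions P(X > c) at each threshold c, via a right-to-left Riemann sum.
    tpr = np.cumsum(f_pos[::-1])[::-1] * dx
    fpr = np.cumsum(g_neg[::-1])[::-1] * dx
    # Normalize so that the lowest threshold maps exactly to the point (1, 1).
    tpr = tpr / tpr[0]
    fpr = fpr / fpr[0]
    # Reverse so that FPR is increasing, and prepend the point (0, 0).
    fpr = np.concatenate(([0.0], fpr[::-1]))
    tpr = np.concatenate(([0.0], tpr[::-1]))
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


# Illustrative data only (hypothetical): two labeled normal samples.
rng = np.random.default_rng(0)
neg = rng.normal(0.0, 1.0, size=500)   # e.g. the "healthy" subpopulation
pos = rng.normal(1.0, 1.0, size=500)   # e.g. the "diseased" subpopulation

grid = np.linspace(-6.0, 7.0, 1301)
f_hat = kde_on_grid(pos, grid, bandwidth=0.3)
g_hat = kde_on_grid(neg, grid, bandwidth=0.3)

fpr, tpr = roc_from_densities(f_hat, g_hat, grid)
area = auc(fpr, tpr)
```

In the mixture setting described above, the two plain kernel estimates would be replaced by density estimates that account for the known membership probabilities and the ordering constraint; the step from densities to the ROC curve would stay the same.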