Clustering, a form of unsupervised learning, is a major problem in statistics with many applications. In the Big Data era, it faces two main challenges: (1) the number of features is much larger than the sample size; (2) the signals are sparse and weak, masked by a large amount of noise. Both challenges are common in high-dimensional settings, including the related detection and signal recovery problems.
We consider the two-class clustering problem under the proposed asymptotic rare-and-weak signal model. In the two-dimensional phase space calibrating the rarity and strength of the signals, we find the precise demarcation between the Region of Impossibility (where clustering is impossible for any method) and the Region of Possibility (where successful clustering methods exist). In particular, we zoom in on an interesting region of this phase diagram and find a more delicate demarcation that results from a newly proposed method, IF-PCA.
We also study the phase diagrams for the related detection and signal recovery problems.