Supplementary Materials for Sparse Optimal Scoring

by Chenlei Leng.

The paper: Sparse optimal scoring for multiclass cancer diagnosis and biomarker detection using microarray data.

Motivation
Gene expression data sets hold the promise of cancer diagnosis at the molecular level. However, using all the gene profiles for diagnosis may be suboptimal. Detecting molecular signatures not only reduces the number of genes needed for discrimination, but may also elucidate the roles these genes play in biological processes. Therefore, a central part of diagnosis is to detect a small set of tumor biomarkers that can be used for accurate multiclass cancer classification. This task calls for effective multiclass classifiers with a built-in biomarker selection mechanism.

Result
We propose the sparse optimal scoring (SOS) method for multiclass cancer characterization. SOS is a simple prototype classifier in which predictive biomarkers can be determined automatically together with accurate classification. SOS thus differs from many other commonly used classifiers, in which gene preselection must be applied before classification. We obtain satisfactory performance when applying SOS to several public data sets.

Figure 1, Figure 2 and Figure 3.

Further results for the Brown dataset and the SRBCT dataset.

Additional datasets: GCM dataset. Brown dataset.