QLS Seminar Series - Sebastien Lemieux
Challenges and opportunities for biomarker discovery when applying machine learning techniques to large RNA-Seq cohorts
Sebastien Lemieux (University of Montreal)
Tuesday, September 29, 12-1 pm
Zoom Link:
Abstract: Over two decades of statistical developments have made transcriptomics, from microarrays to RNA-Seq, an indispensable tool for characterizing changes in expression profiles. Computationally, a central step is applying specialized parametric statistical tests such as DESeq2, edgeR or voom to single out differentially expressed genes. In parallel, several new techniques have been developed within the machine learning framework to facilitate the training of high-dimensional classifiers that can exploit whole-transcriptome characterizations. Unfortunately, identifying biomarkers from these trained classifiers has proven more difficult than expected. These computational advances, coupled with reduced costs and more stable protocols for RNA-Seq, have led to the emergence of large cohorts comprising hundreds of high-quality RNA-Seq expression profiles. Using large datasets developed to refine the subtyping of acute myeloid leukemia (AML), I will show that standard statistical tools prove inadequate for identifying biomarkers. I will then present a novel approach to biomarker identification, based on machine learning principles, that scales well to large datasets.