Extensive benchmark of machine learning methods for quantitative microbiome data

Magali Berland (MetaGenoPolis, INRAE)

18 June 2021

Characterization of microbial communities with omics technologies has brought to light powerful biomarkers for diagnosis and prognosis in human health. In particular, shotgun metagenomics allows highly precise microbiome profiling. Predicting phenotypic features, such as clinical status or disease state, can help to stratify patients, which is the first step toward precision medicine. Many machine learning (ML) methods have been developed to tackle classification and regression problems, yet the statistical specificities of metagenomic data make the learning task difficult. We developed an R workflow designed to compare ML methods for classification or regression from the caret package. The Activeeon ProActive engine was used to efficiently distribute the computing load across multiple servers. We then applied our workflow to a dataset in which the fecal microbiota of patients with cardiovascular diseases is compared with that of healthy controls using shotgun metagenomics.