Statistical multivariate modelling of omics data with copulas

Gildas Mazo (MaIAGE, INRAE)

9 Sept. 2022

Many omics data produced by next-generation sequencing technologies are of different types (e.g., discrete read counts and continuous methylation data). From an integrative biology viewpoint, this heterogeneity makes building statistical models difficult. To address this issue, one can rely on copula theory to build multivariate models by “adding” a dependence structure to “couple” arbitrary random variables. However, inference in copula-based models for heterogeneous datasets is challenging, because the log-likelihood is complex and becomes intractable as the dimension increases. To alleviate this issue, a randomized pairwise likelihood method is presented. Randomization makes it possible to control the tradeoff between statistical and computational efficiency. The method is illustrated in theory, on simulations, and on an RNA-seq dataset. Future directions of research are discussed.
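To make the two ideas above concrete, here is a minimal Python sketch; it is an illustration under stated assumptions, not the speaker's method. It couples a discrete count and a continuous value in (0, 1) through a Gaussian copula, then draws a random subset of variable pairs over which a pairwise likelihood could be summed. The Gaussian copula, the Poisson and Kumaraswamy marginals, the rule "keep each pair independently with probability p", and all function names are assumptions chosen for illustration.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist


def poisson_quantile(u, lam):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(lam)."""
    u = min(u, 1.0 - 1e-12)  # guard against u == 1.0 (quantile would be infinite)
    k, pmf = 0, math.exp(-lam)
    cdf = pmf
    while cdf < u and pmf > 0.0:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def kumaraswamy_quantile(u, a, b):
    """Closed-form inverse CDF of a Kumaraswamy(a, b) variable on (0, 1)."""
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)


def sample_coupled(n, rho, lam=5.0, a=2.0, b=3.0, seed=0):
    """Draw n (count, proportion) pairs coupled by a Gaussian copula (assumed)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    sample = []
    for _ in range(n):
        # Correlated standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
        # Map to uniforms, then to the chosen marginals.
        u1, u2 = std_normal.cdf(z1), std_normal.cdf(z2)
        sample.append((poisson_quantile(u1, lam), kumaraswamy_quantile(u2, a, b)))
    return sample


def random_pairs(d, p, seed=0):
    """Keep each of the d*(d-1)/2 variable pairs independently with probability p."""
    rng = random.Random(seed)
    return [pair for pair in combinations(range(d), 2) if rng.random() < p]


data = sample_coupled(n=200, rho=0.8)
pairs = random_pairs(d=6, p=0.5)
```

In this sketch, p = 1 keeps all pairs (the full pairwise likelihood), while a smaller p evaluates fewer bivariate terms at the cost of using less information, which is one way to read the tradeoff between statistical and computational efficiency.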