Gene Network Inference Meeting 2013
STATSEQ meeting on Gene Network Inference with Systems Genetics data and beyond.
28-29 March 2013, AgroParisTech - ENGREF, 19 avenue du Maine, Paris 15e, France
Systems Genetics (SG) data consist of genotyping data together with other datasets that potentially reflect the effect of perturbing the system through these naturally diverse genotypes: phenotypes of interest (e.g. disease, biomass yield) and ‘molecular phenotypes’ such as omics datasets (gene expression levels, gene methylation, protein and metabolite levels). Regular genetic studies (with genotyping and phenotyping data alone) permit the identification of genetic loci that affect a given phenotype. The availability of measurements of tens of thousands of molecular phenotypes enables algorithms to elucidate the regulatory networks underlying complex genotype-phenotype relationships. Beyond the causal effect of genotypes on phenotypes, other types of data, such as time series, can also be used to infer gene network structure.
A large number of network inference methods for SG data and other types of data have been proposed, and more algorithms are expected as genotyping and expression data become increasingly available through the growing use of Next Generation Sequencing. The meeting focuses on existing methods for network inference from high-throughput data.
The workshop will take place at AgroParisTech - ENGREF, Amphitheater B208 (Metro: Montparnasse-Bienvenüe).
|28 March||Day 1|
|14H00||De la Fuente A: Simulating Systems Genetics for algorithm evaluation (Slides)|
|14H30||Flassig R: An Effective Framework for Reconstructing Gene Regulatory Networks From Genetical Genomics Data (Slides)|
|15H00||Huynh-Thu VA, Geurts P: Gene regulatory network inference from expression and genetic data using tree-based methods|
|16H00||Ramayo-Caldas Y, Reverter A, Fortes MRS, Ballester M, Esteve-Codina A, Noguera JL, Fernández AI, Pérez-Enciso M, Folch JM: Co-association and gene network analysis for pig intramuscular fatty acid composition (Slides)|
|16H20||Behrouzi P, Johannes F, Wit EC: Sparse latent graphical models in high dimensional setting with application to genetics|
|16H40||Mohammadi A, Wit EC: Gene network inference for high-dimensional problems (Slides)|
|29 March||Day 2|
|9H00||Vignes M: Exploration of methods to infer gene regulatory networks from simulated Systems Genetics data: what could be learnt? (Slides)|
|9H30||Kueffner R: Extending partially known networks (Slides)|
|10H30||Li Y: Causal inference with QTL data (Slides)|
|11H00||Wang H, van Eeuwijk F: A general method for inferring causal relationships between associated phenotypes using both phenotypic and QTL information|
|11H20||Rau A, Jaffrézic F, Nuel G: Joint estimation of causal effects from observational and intervention gene expression data (Slides)|
|11H40||Lim N, Senbabaoglu Y, Michailidis G, d'Alché-Buc F: Gene regulatory network inference using a boosting algorithm and operator-valued kernels|
|13H30||Zhu J: RIMBANet for Reconstructing Integrative Molecular Bayesian Networks (Slides)|
|14H00||Aravena A, Eveillard D, Maass A, Siegel A: A logics-based integrative approach to decipher putative regulatory relationships inferred from genomic and transcriptomic data|
|14H20||Barah P, Jayavelu ND, Rasmussen S, Nielsen HB, Mundy J, Bones AM: Transcriptional regulatory network in Arabidopsis thaliana during response to single and combined stresses|
|14H40||Whalley J, Birméle E, Rizzon C: What can biological networks tell us about the fate of duplicate genes in Arabidopsis thaliana? (Slides)|
|15H30||Di Camillo B: From genetic variants to RNAs and protein signaling networks towards a system level understanding of biology (Slides)|
|16H00||di Bernardo D: Reverse-engineering post-translational modifications from gene expression profiles (Slides)|
The meeting is intended for researchers working on gene regulatory network inference from real or simulated data, whether Systems Genetics data, high-throughput expression data or other types of data, and using any type of method from Statistics, Machine Learning or Computer Science. The number of participants is strictly limited, and priority will be given to authors of contributed talks.
Thanks to the support of STATSEQ and the MIA department of INRA (see Sponsors), registration for the workshop is free. We have now reached the maximum number of registrations, and registration is closed.
Accommodation and food: lunch on 28 March is provided; however, participants must book their own hotel in Paris and arrange their own dinner on 27 March.
The call for contributed talks is now CLOSED.
- March 8: deadline for registration (see above). The room holds roughly 60 people.
- March 28-29: meeting.
- Jun Zhu, Genetics and Genomic Sciences, The Mount Sinai Hospital, New York, USA.
- Title: RIMBANet for Reconstructing Integrative Molecular Bayesian Networks
- Abstract: A diverse population provides a good source of natural and systematic genetic perturbations. By leveraging these genetic perturbations, genome-wide association studies (GWAS) have identified thousands of candidate variants for human diseases. To interpret these candidate variants in a meaningful way, we need a comprehensive biological context within which these variants are functional. We developed RIMBANet, a general Bayesian network framework to integrate genetic data with diverse omic data. We recently extended the framework to integrate data including endogenous metabolite concentrations, miRNA expression variation, DNA mutations and DNA copy number variations, DNA-protein binding, protein-metabolite interaction, protein-protein interaction data, and more, to construct probabilistic causal networks that elucidate the complexity of cell regulation. Using available time-series data, we are further extending the systems genetics approach to elucidate dynamic changes of biological pathways under genetic control.
- Alberto de la Fuente, CRS4 Bioinformatica, Italy.
- Title: Simulating Systems Genetics for algorithm evaluation
- Abstract: Systems Genetics combines Systems Biology with the study of the genetic variation underlying complex phenotypes. A large number of methods for SG data analysis are available, but at present not much is known about their strengths and weaknesses.
- One of the goals in Systems Genetics research is to elucidate the gene networks underlying complex phenotypes. Benchmarking on real biological data is challenging as true regulatory networks are largely unknown. The availability of realistically simulated datasets, which are generated under a set of assumptions most relevant to real SG data, is of utmost importance for the verification of algorithms for SG data analysis. Only for these data are we certain about the true complex system underlying the data.
- I will present an in silico Systems Genetics dataset generated with the program SysGenSIM. This dataset has been used by an international group of scientists to evaluate and compare different data analysis approaches.
- Robert Kueffner, Lehr- und Forschungseinheit Bioinformatik, Institut für Informatik Ludwig-Maximilians-Universität München, Germany.
- Title: Extending partially known networks
- Abstract: Besides experimental techniques, computational inference approaches have contributed to the reconstruction of gene regulatory networks in model organisms. Particularly successful are supervised approaches that take known regulatory interactions and gene expression data into account. However, they have not yet been applied to systems genetics data, in which individuals are genotyped and genetic polymorphisms are the major source of variation in gene expression profiles.
- We apply a supervised inference framework to expression datasets, genotype information and known gene regulatory interactions generated in a standardized setup by the SysGenSIM software. We confirmed in this setup that supervised approaches exploiting the known interactions perform better than purely expression-based methods and than methods exploiting expression data and genotype information. The performance of supervised methods was robust with respect to parameterization and data pre-processing. Furthermore, whether or not the genotype information was explicitly used had little influence on the performance of supervised approaches. We also analyzed differences between real and artificial data and setups to assess the chances of successful inference in real systems. For reasons we discuss, several extensions of supervised approaches that considerably improve performance on real data were not effective in the SysGenSIM case.
- Our thorough comparison between real and artificial setups suggests that supervised approaches might be more robust and straightforward to apply to real systems than current unsupervised approaches. In particular, as real genotypes are likely more complex and cause more varied responses, the finding that supervised approaches do not depend on an explicit representation of genotype information might prove advantageous.
- Barbara Di Camillo, Department of Information Engineering, Università degli Studi di Padova, Italy.
- Title: From genetic variants to RNAs and protein signaling networks towards a system level understanding of biology
- Abstract: Genetic variations can be thought of as perturbations, and the gene/protein expression profile of each individual as the system's response to his/her specific set of perturbations. With this rationale, we developed SP-ABACUS (SNP-Pathways Analysis based on Bivariate Cumulative Statistic), an algorithm that exploits the pathway concept to search for combinations of rare and common variants, belonging to the same pathway/functional group, that are likely to contribute to the disease.
- Applied to the WTCCC (Wellcome Trust Case Control Consortium, Nature, 2007) type 2 diabetes GWAS datasets, SP-ABACUS allowed the identification of genetic variants that were then integrated at different levels with transcriptomic and protein signaling data, to better understand the mechanisms behind the pathogenesis of the disease. For example, the consistency between disease-associated SNPs in specific pathways and gene expression was evaluated by analyzing whether the SNP markers influence the expression of specific transcripts. Moreover, since cell signaling underlies development, function and homeostasis, a mass action model of insulin signaling in muscle cells treated with insulin was used to analyze changes in the activity of multiple proteins in normal and insulin-resistant muscle cells. The modeling approach has yielded important insights into the reciprocal relationships between insulin resistance and changes in PI3K/Akt pathways that might be relevant for developing novel therapeutic approaches.
- Yang Li, Faculty of Mathematics and Natural Sciences, Bioinformatics - Groningen Biomolecular Sciences and Biotechnology Institute, Groningen, The Netherlands
- Title: Causal inference with QTL data
- Abstract: Systems genetics aims to unravel the biological information flow from genotype to phenotype using natural genetic and phenotypic variation. The identification of polymorphic loci that genetically control molecular and traditional phenotypes (e.g. disease) is followed by the elucidation of the regulators residing in these loci and of the networks through which they cause pathogenesis. Here we present a method to evaluate the causal links inferred from quantitative trait loci (QTL) data, based on sample size, genotype ratio and QTL effect size. The method is further extended to combine causal results from multiple studies.
- Diego di Bernardo, Systems and Synthetic Biology Lab, Dept. of Computer and Systems Engineering, Università “Federico II” of Naples, Italy.
- Title: Reverse-engineering post-translational modifications from gene expression profiles
- Abstract: Protein activity is tightly regulated by the interplay among protein–protein interactions and post-translational modifications (PTMs). “Reverse engineering” of gene networks from gene expression profiles (GEPs) has been extensively used to show the complexity of transcriptional programs and to identify key genes acting as master regulators of transcription in physiological and pathological conditions. An open challenge in reverse engineering is the identification of PTMs starting from GEPs. Here we propose a general method, which we call Differential Multi-Information (DMI), to identify post-translational modulators M of a transcription factor (TF) by observing the change in co-regulation (measured as multi-information) among a set of n targets G1...Gn in the presence or absence of the modulator M. We tested DMI on a real dataset by identifying the post-translational modulators of the transcription factor p53 and novel modulators of the transcription factor TFEB. Our method could be instrumental in identifying post-translational regulatory interactions in an efficient and cost-effective manner, thus filling the gap between transcriptional networks, identified by classic reverse-engineering approaches, and signaling networks, identified by ad hoc experimental approaches.
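To make the DMI idea concrete, the sketch below scores a candidate modulator M as the multi-information of the targets G1...Gn in samples where M is present minus the same quantity in samples where M is absent. Assumptions not taken from the abstract: multi-information is estimated under a Gaussian model as -0.5 log det(R), with R the targets' correlation matrix, and the samples have already been split by modulator status. The authors' estimator and sample split may differ, and all function names are hypothetical.

```python
import math
import random
from statistics import fmean


def corr_matrix(cols):
    """Pearson correlation matrix of equally long, non-constant columns."""
    centred = []
    for c in cols:
        m = fmean(c)
        d = [v - m for v in c]
        norm = math.sqrt(sum(v * v for v in d))
        centred.append([v / norm for v in d])
    return [[sum(a * b for a, b in zip(u, v)) for v in centred] for u in centred]


def log_det_spd(a):
    """Log-determinant of a symmetric positive-definite matrix (elimination without pivoting)."""
    a = [row[:] for row in a]
    total = 0.0
    for i in range(len(a)):
        pivot = a[i][i]
        total += math.log(pivot)
        for r in range(i + 1, len(a)):
            f = a[r][i] / pivot
            for c in range(i, len(a)):
                a[r][c] -= f * a[i][c]
    return total


def gaussian_multi_information(cols):
    """Multi-information (total correlation) assuming jointly Gaussian targets: -0.5 * log det(R)."""
    return -0.5 * log_det_spd(corr_matrix(cols))


def dmi(targets_with_m, targets_without_m):
    """Hypothetical DMI score: target co-regulation with M minus co-regulation without M."""
    return gaussian_multi_information(targets_with_m) - gaussian_multi_information(targets_without_m)


# Toy data (hypothetical): the targets share a common driver only when M is present.
rng = random.Random(0)


def simulate(coupled, n=200, n_targets=3):
    driver = [rng.gauss(0, 1) for _ in range(n)]
    return [[(driver[s] if coupled else 0.0) + 0.3 * rng.gauss(0, 1) for s in range(n)]
            for _ in range(n_targets)]


with_m, without_m = simulate(True), simulate(False)
score = dmi(with_m, without_m)
print(f"DMI = {score:.2f}")
```

A large positive score means the targets are far more tightly co-regulated when M is present, which is the signal DMI looks for; a score near zero suggests M does not modulate the TF's activity on these targets.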
- Robert Flassig, Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany.
- Title: An Effective Framework for Reconstructing Gene Regulatory Networks From Genetical Genomics Data
- Abstract: Systems Genetics approaches that rely on genetical genomics data use naturally occurring multifactorial perturbations (e.g. polymorphisms) in properly controlled and screened genetic crosses to elucidate causal relationships in biological networks. Although genetical genomics data contain rich information, a clear dissection of causes and effects as required for reconstructing gene regulatory networks is not easily possible.
- We present our recently developed framework for reconstructing gene regulatory networks from genetical genomics data [1]. Following a simple-yet-effective paradigm, we use genotype and phenotype correlation measures to derive an initial graph, which is subsequently reduced by pruning strategies [2] to minimize false positive predictions. Applied to realistic simulated genetic data from a recent DREAM challenge, we demonstrate that our approach is simple yet effective and outperforms more complex methods (including the best performer) with respect to (i) reconstruction quality (especially for small sample sizes) and (ii) applicability to large data sets due to relatively low computational costs. We further present how our framework performs on real genetical genomics data from yeast.
- [1] Flassig, R. J. et al. An Effective Framework for Reconstructing Gene Regulatory Networks From Genetical Genomics Data. Bioinformatics, 29:246-254, 2013.
- [2] Klamt, S. et al. TRANSWESD: inferring cellular networks with transitive reduction. Bioinformatics, 26:2160-2168, 2010.
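The two-step paradigm described in this abstract (a correlation-based initial graph, then pruning of edges that indirect paths can explain) can be illustrated as follows. This is a hypothetical sketch, not the published framework: it assumes each gene has exactly one local marker, uses an arbitrary correlation threshold of 0.3, and applies a simplified, TRANSWESD-inspired rule that removes an edge i -> k whenever some alternative path from i to k has a larger bottleneck (minimum edge) weight. All function names are made up for illustration.

```python
import heapq
import math
from statistics import fmean


def pearson(x, y):
    """Pearson correlation; 0.0 if either vector is constant."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def initial_graph(markers, expr, threshold=0.3):
    """Edge i -> j weighted by |corr(marker of gene i, expression of gene j)| if above threshold.

    Hypothetical assumption: markers[i] is the single local marker perturbing gene i.
    """
    edges = {}
    for i, m in enumerate(markers):
        for j, e in enumerate(expr):
            if i != j:
                w = abs(pearson(m, e))
                if w >= threshold:
                    edges[(i, j)] = w
    return edges


def widest_path(weights, src, dst, skip):
    """Largest bottleneck weight over paths src -> dst that avoid the edge `skip`."""
    best = {src: math.inf}
    heap = [(-math.inf, src)]
    while heap:
        neg_w, u = heapq.heappop(heap)
        w = -neg_w
        if u == dst:
            return w
        if w < best.get(u, 0.0):
            continue  # stale heap entry
        for (a, b), wab in weights.items():
            if a == u and (a, b) != skip:
                cand = min(w, wab)
                if cand > best.get(b, 0.0):
                    best[b] = cand
                    heapq.heappush(heap, (-cand, b))
    return 0.0


def prune_indirect(weights):
    """Drop i -> k when an indirect path i ~> k has a larger bottleneck weight than w(i, k)."""
    return {e: w for e, w in weights.items()
            if widest_path(weights, e[0], e[1], skip=e) <= w}


# Toy data: gene 0 follows marker 0; gene 1 = gene 0 + marker 1.
markers = [[0, 0, 1, 1], [0, 1, 0, 1]]
expr = [[0, 0, 1, 1], [0, 1, 1, 2]]
graph = initial_graph(markers, expr)  # {(0, 1): 0.7071...}
pruned = prune_indirect({(0, 1): 0.9, (1, 2): 0.8, (0, 2): 0.5})
# pruned == {(0, 1): 0.9, (1, 2): 0.8}: the edge 0 -> 2 is explained by 0 -> 1 -> 2
```

Using the bottleneck of the best alternative path is one way to keep the pruning conservative: a weak direct edge is dropped only when a chain of stronger edges already accounts for it.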
- Vân Anh Huynh-Thu & Pierre Geurts, University of Liège, Department of Electrical Engineering and Computer Science, Institut Montefiore, Liège, Belgium.
- Title: Gene regulatory network inference from expression and genetic data using tree-based methods
- Abstract: One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs). In our previous work, we proposed a method, called GENIE3, for the unsupervised inference of gene regulatory networks from expression data. This method decomposes the prediction of a regulatory network between p genes into p different regression problems and solves each of these problems using tree-based ensemble methods such as Random Forests. After a brief presentation of the method and a discussion of its performance when applied to steady-state and time series expression data, we will discuss its adaptation to systems genetics data.
- The idea of systems genetics is to exploit the natural variation between the DNA sequences of related individuals, which can act as the randomized and multifactorial perturbations needed to recover GRNs. We propose two new methods, called GENIE3-SG-joint and GENIE3-SG-sep, that incorporate information about genetic markers into the original GENIE3 method. The first builds a single joint model incorporating both expression data and genetic markers, while the second builds two separate models and aggregates their predictions a posteriori. Experiments on the artificial data of the DREAM5 Systems Genetics challenge and of the more recent StatSeq benchmark show that both methods can benefit strongly from genetic data, with a significant advantage for GENIE3-SG-sep, which outperforms the official best-performing method of the DREAM5 challenge by a large margin.
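The GENIE3 decomposition described in this abstract (one regression problem per target gene, with regulator importances read from a tree ensemble) can be sketched in a few lines. This is a minimal, standard-library sketch under stated assumptions, not the authors' GENIE3 code: the ensemble uses bagged depth-one regression trees (stumps) with random-forest-style feature subsampling instead of full Random Forests, importance is the variance reduction of each chosen split, and the names `stump_gain` and `genie3_like` are hypothetical. The marker-based GENIE3-SG variants are not shown.

```python
import random
from statistics import fmean, pstdev


def stump_gain(xs, ys):
    """Largest reduction in squared error achievable by one split on feature values xs."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    tot_sum = sum(y for _, y in pairs)
    tot_sq = sum(y * y for _, y in pairs)
    base_sse = tot_sq - tot_sum ** 2 / n
    best = left_sum = left_sq = 0.0
    for k in range(n - 1):
        y = pairs[k][1]
        left_sum += y
        left_sq += y * y
        if pairs[k][0] == pairs[k + 1][0]:
            continue  # no threshold separates identical feature values
        nl, nr = k + 1, n - k - 1
        right_sum, right_sq = tot_sum - left_sum, tot_sq - left_sq
        sse = (left_sq - left_sum ** 2 / nl) + (right_sq - right_sum ** 2 / nr)
        best = max(best, base_sse - sse)
    return best


def genie3_like(expr, n_trees=200, seed=0):
    """expr[s][g] = expression of gene g in sample s; returns {(regulator, target): score}."""
    rng = random.Random(seed)
    n, p = len(expr), len(expr[0])
    scores = {}
    for target in range(p):  # one regression problem per target gene
        col = [row[target] for row in expr]
        sd = pstdev(col)
        cands = [g for g in range(p) if g != target]
        if sd == 0 or not cands:
            continue
        mu = fmean(col)
        y = [(v - mu) / sd for v in col]  # unit variance makes scores comparable across targets
        k = max(1, int(len(cands) ** 0.5))  # random-forest-style feature subsampling
        imp = dict.fromkeys(cands, 0.0)
        for _ in range(n_trees):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample of the samples
            ys = [y[i] for i in idx]
            best_g, best_gain = None, 0.0
            for g in rng.sample(cands, k):
                gain = stump_gain([expr[i][g] for i in idx], ys)
                if gain > best_gain:
                    best_g, best_gain = g, gain
            if best_g is not None:
                imp[best_g] += best_gain / (n * n_trees)
        for g, s in imp.items():
            scores[(g, target)] = s
    return scores


# Toy data: gene 1 is driven by gene 0; gene 2 is unrelated noise.
rng = random.Random(1)
data = []
for _ in range(60):
    g0 = rng.random()
    data.append([g0, 2 * g0 + 0.05 * rng.gauss(0, 1), rng.random()])
ranking = sorted(genie3_like(data).items(), key=lambda kv: -kv[1])
print(ranking[:2])
```

On this toy example both orientations of the 0-1 link should rank highest, because expression data alone cannot orient the edge; orienting edges is one place where genetic markers can help.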
- Matthieu Vignes, BIA, INRA Toulouse, France.
- Title: Exploration of methods to infer gene regulatory networks from simulated Systems Genetics data: what could be learnt?
- Abstract: In this talk, we present three different methods that were applied to simulated Systems Genetics datasets (see A. de la Fuente's contribution to this meeting): (i) bootstrapped versions of lasso and Dantzig regressions, (ii) a bootstrapped greedy search in Bayesian network modelling and (iii) a random forest algorithm. In particular, we emphasize the ability of these approaches to provide scores that we believe could reflect the confidence in predicted relationships between genes. The aim of this work was to determine whether, in the context of network reconstruction, some methods perform differently from others depending on the simulation settings. In particular, we study the effect of network size, sample size, marker spacing and gene heritability as parameters of the simulated data. We give some first, cautious recommendations for analysing gene networks from Systems Genetics data in the form of a discussion of the results we obtained.
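Approach (i) can be sketched as bootstrap selection frequencies: fit a lasso for one target gene on many bootstrap resamples and score each candidate regulator by how often it receives a non-zero coefficient. This is a minimal sketch assuming plain cyclic coordinate descent, a fixed, arbitrarily chosen penalty `lam=0.3` on standardised predictors, and hypothetical function names; it is not the authors' implementation, and the Dantzig, Bayesian network and random forest variants are not shown.

```python
import math
import random
from statistics import fmean, pstdev


def standardise(col):
    m, s = fmean(col), pstdev(col)
    return [(v - m) / s for v in col] if s > 0 else [0.0] * len(col)


def soft_threshold(z, lam):
    return math.copysign(max(abs(z) - lam, 0.0), z)


def lasso_cd(cols, y, lam, n_iter=50):
    """Lasso, min (1/2n)||y - Xb||^2 + lam*||b||_1, by cyclic coordinate descent.

    Predictor columns are standardised (mean 0, variance 1) and y is centred.
    """
    n = len(y)
    xs = [standardise(c) for c in cols]
    my = fmean(y)
    r = [v - my for v in y]  # residual for beta = 0
    beta = [0.0] * len(xs)
    for _ in range(n_iter):
        for j, xj in enumerate(xs):
            # with unit-variance xj, the univariate least-squares update is xj.r/n + beta_j
            z = sum(a * b for a, b in zip(xj, r)) / n + beta[j]
            new = soft_threshold(z, lam)
            if new != beta[j]:
                r = [ri - (new - beta[j]) * xij for ri, xij in zip(r, xj)]
                beta[j] = new
    return beta


def bootstrap_selection_scores(cols, y, lam=0.3, n_boot=50, seed=0):
    """Fraction of bootstrap resamples in which each candidate gets a non-zero coefficient."""
    rng = random.Random(seed)
    n = len(y)
    counts = [0] * len(cols)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        beta = lasso_cd([[c[i] for i in idx] for c in cols], [y[i] for i in idx], lam)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


# Toy data: target = 2 * x0 + noise; x1 and x2 are irrelevant candidate regulators.
rng = random.Random(2)
n = 80
x = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(3)]
target = [2 * a + 0.5 * rng.gauss(0, 1) for a in x[0]]
scores = bootstrap_selection_scores(x, target)
print(scores)
```

In a network setting, the same scoring would be repeated with each gene in turn as the target and the other genes (and possibly markers) as candidates; selection frequencies then serve as edge confidence scores.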
Abstracts of contributed talks
- Alberto de la Fuente (CRS4 Bioinformatica, Italy)
- Brigitte Mangin (BIA, INRA Toulouse, France)
- Stéphane Robin (AgroParisTech, Paris, France)
- Thomas Schiex (BIA, INRA Toulouse, France)
For submission or enquiries about the meeting, please contact the organisation committee at GNISG13firstname.lastname@example.org