Statistics and Algorithms for Biology (SaAB)
Team leader: Simon de Givry (+33 561285074) simon[dot]de-givry[at]inra[dot]fr
The team develops mathematical, statistical and computational methods to address life science research problems. These methods are usually directly made available to biologists through dedicated software.
Bioinformatics problems addressed
The topics addressed in the team concern the localization and identification of functional elements in bacterial, plant and animal genomes. Three investigation levels are considered.
- Genetic level: A genome is essentially seen through molecular markers whose locations on a chromosome are highly informative in genetic investigations. These markers are first localized on the chromosomes (genetic mapping and radiation hybrid mapping: Carthagène). Regions linked to quantitative traits of interest (disease resistance, yield, ...) are then located with respect to those markers (QTL, or quantitative trait loci, localization by analyzing allelic transmission: MCQTL, and by modelling linkage disequilibrium: HAPim). These QTLs can then be used in selecting varieties that combine several desirable traits.
- Molecular level: At the molecular level, the DNA sequence of the genome is directly analyzed to decode and identify functional regions in the sequence. These may be genes coding for proteins (in bacterial genomes and EST clusters: FrameD, or in eukaryotic genomes: EuGène) or non-coding genes corresponding to functional RNAs (MilPat, DARN!, ApolloRNA, RNAspace). Comparing the genomes of different species and identifying the key events that separate them (recombination) can enable the transfer of information between genomes.
- Gene expression level: DNA microarrays make it possible to partially observe cellular activity at a given time. A link can then be established between the contextual conditions of the cell at observation time (disease, polluted environment) and the genes that are over- (or under-) expressed. This link may help trace the genes related to a disease or support a diagnosis.
To go beyond the localization of isolated functional elements, we are now increasingly interested in approaches that aim to infer gene regulatory networks. We are currently studying the simultaneous analysis of expression data and polymorphism data (such as SNPs) on a collection of individuals. This makes it possible to observe different perturbed modes of operation of the network and thus to better infer gene network structures.
Statistical and computer science methods
To address the above problems, the team exploits and develops methods in mathematics, statistics, probability (modeling, inference, mixture models, penalized regression, graph-based models, processes), and computer science (modeling, combinatorial optimization, constraint networks, algorithmics). The goal is to embed the methods developed in software tools that can be used directly by biologists and that faithfully account for the complexity and variety of usable data.
The team develops innovative methods, especially in the field of combinatorial optimization on weighted constraint networks, a type of graph-based model that is dedicated to optimization and that generalizes the constraint networks used in constraint programming. These techniques, implemented in the toulbar2 software (developed by the team and top-ranked in various international competitions), are then applied to bioinformatics problems (localization of RNAs of known families, diagnosis of large, complex pedigrees, computational protein design, ...).
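To make the notion of a weighted constraint network more concrete, here is a minimal sketch in Python. It is an illustration only: the variables, domains and cost functions are hypothetical, and the exhaustive enumeration used here is a naive stand-in chosen for readability, not the algorithm used by the team's solver. Each cost function maps an assignment of its scope to a cost, an infinite cost plays the role of a hard constraint, and the goal is to find the complete assignment of minimum total cost.

```python
from itertools import product

INF = float("inf")  # an infinite cost marks a forbidden combination (hard constraint)

# Hypothetical toy network: three variables, each with domain {0, 1}.
DOMAINS = {"x": [0, 1], "y": [0, 1], "z": [0, 1]}

# Each cost function is a (scope, function) pair; the function returns a cost >= 0.
COST_FUNCTIONS = [
    (("x",), lambda x: 1 if x == 0 else 0),            # soft preference for x = 1
    (("x", "y"), lambda x, y: INF if x == y else 0),   # hard constraint: x != y
    (("y", "z"), lambda y, z: 0 if y == z else 2),     # soft preference for y = z
    (("z",), lambda z: 3 if z == 0 else 0),            # soft preference for z = 1
]


def total_cost(assignment, cost_functions):
    """Sum the costs of all cost functions for a complete assignment."""
    return sum(f(*(assignment[v] for v in scope)) for scope, f in cost_functions)


def solve_by_enumeration(domains, cost_functions):
    """Return (best_assignment, best_cost) by trying every complete assignment.

    Only practical for tiny networks: the search space grows exponentially
    with the number of variables.
    """
    names = list(domains)
    best_assignment, best_cost = None, INF
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        cost = total_cost(assignment, cost_functions)
        if cost < best_cost:
            best_assignment, best_cost = assignment, cost
    return best_assignment, best_cost


if __name__ == "__main__":
    best, cost = solve_by_enumeration(DOMAINS, COST_FUNCTIONS)
    print(best, cost)
```

A classical constraint network corresponds to the special case in which every cost is either 0 or infinite, which is the sense in which the weighted model generalizes it.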
On this topic, our closest partners are the Institut de Recherche en Informatique de Toulouse and the ONERA research center of Toulouse. Toulbar2 also benefits from collaboration with the University of Caen (GREYC), the University of Aix-Marseilles (LSIS), the Polytechnic University of Catalonia and the Artificial Intelligence Research Institute (CSIC) in Barcelona, and the Chinese University of Hong Kong (CUHK).