DeepG4: A deep learning model for sequence-driven DNA G4 formation

Vincent Rocher (CBI, Université Paul Sabatier)

19 March 2021

G-quadruplexes (G4s) are alternative DNA secondary structures formed by guanine-rich sequences, which can fold into a four-stranded structure from a single strand, leaving the complementary strand free. These structures were first found at telomeres, but more recent studies have found them enriched at the promoters of active genes, suggesting an active role in the transcription of these genes. Earlier in silico methods to detect and study G4s relied mostly on the detection of a specific sequence motif, but recent methods have been developed to identify G4s genome-wide using next-generation sequencing approaches, such as G4-seq (in vitro G4s) and BG4-seq (in vivo G4s). Here, we propose a sequence-based deep learning model that predicts in vivo DNA G4s from the DNA sequences of BG4-seq peaks, with the aim of detecting new motifs involved in G4 prediction. Deep learning is a recent and popular set of machine learning approaches in which models learn features directly from the data, which means we can identify de novo motifs related to G4 prediction. This model can be applied to any DNA sequence to predict G4 formation, and can be used in genetics to study the impact of SNPs on DNA G4-formation propensity.
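As a rough illustration of the ideas above, here is a minimal Python sketch under stated assumptions. It has three parts. First, a motif-based baseline using the canonical G4 pattern: four runs of at least three guanines separated by loops of 1 to 7 nucleotides, searched on both strands. This is the kind of fixed pattern that motif-based methods look for, not necessarily the exact rule of any particular tool. Second, a one-hot encoding of DNA. This is a common input representation for sequence-based deep learning models, but the abstract does not state it, so treat it as an assumption. Third, an SNP effect computed as the alternative-sequence score minus the reference-sequence score. `score_fn` is a hypothetical placeholder for a trained predictor, and the motif counter stands in for it here only for illustration.

```python
import re

# Canonical G4 motif (assumption: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+).
G4_MOTIF = re.compile(r"G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}")

# Complement table for A, C, G, T and the ambiguous base N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_g4_motifs(seq):
    """Count non-overlapping canonical G4 motifs on both strands.

    A C-rich forward strand means a G-rich reverse strand, so the
    reverse complement is searched as well.
    """
    seq = seq.upper()
    forward = len(G4_MOTIF.findall(seq))
    reverse = len(G4_MOTIF.findall(reverse_complement(seq)))
    return forward + reverse


def one_hot(seq):
    """One-hot encode DNA as a 4 x L list of lists (rows: A, C, G, T).

    Ambiguous bases such as N are encoded as an all-zero column.
    """
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    matrix = [[0] * len(seq) for _ in range(4)]
    for j, base in enumerate(seq.upper()):
        if base in index:
            matrix[index[base]][j] = 1
    return matrix


def snp_effect(seq, pos, alt, score_fn):
    """Score change caused by substituting `alt` at 0-based `pos`.

    `score_fn` is a hypothetical callable mapping a sequence to a number,
    e.g. a trained G4 predictor; a negative result means the variant
    lowers the predicted G4 propensity.
    """
    alt = alt.upper()
    if alt not in "ACGT" or len(alt) != 1:
        raise ValueError("alt must be a single base among A, C, G, T")
    if not 0 <= pos < len(seq):
        raise ValueError("pos is outside the sequence")
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    return score_fn(alt_seq) - score_fn(seq)


if __name__ == "__main__":
    telomeric = "GGGTTAGGGTTAGGGTTAGGG"
    print(count_g4_motifs(telomeric))  # one motif on the forward strand
    # Breaking the second G-run (G -> A at position 7) removes the motif.
    print(snp_effect(telomeric, 7, "A", count_g4_motifs))
```

With the motif counter, the telomeric repeat yields one motif, and mutating a guanine in the second run drops the score by one. A learned model plugged in as `score_fn` would instead return a continuous change in predicted G4 propensity. This is the kind of SNP analysis the abstract refers to, here expressed only as a sketch.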