Titles and abstracts below are tentative and subject to change.
Genomic variation and the rise of antibiotic-resistant microbes
(Mark Adams - J. Craig Venter Institute)
Increasing rates of antibiotic resistance in bacterial pathogens have caused global concern, leading to increased attention to the diagnosis and treatment of infections with the goal of preventing the spread of resistant organisms. Understanding the genetic basis of drug resistance is important in the design of effective diagnostic and surveillance strategies. Many resistance mechanisms are well understood, but less is known about their genetic context and the ways in which they spread through bacterial populations. Through comprehensive analysis of hundreds of genomes of the Gram-negative pathogens Acinetobacter baumannii and Klebsiella pneumoniae, we have found evidence for considerable variation at multiple levels, from single nucleotides to gene content to insertion sequence and mobile element locations. Together, these data present a picture of continued evolution in these pathogens driven by both antibiotic resistance and host adaptation.
Bioinformatic prediction of interactions from metagenomes
(Bas Dutilh - Utrecht University)
Powered by recent advances in next-generation sequencing technologies, metagenomics has unveiled the microbial biodiversity in a range of environments at unprecedented resolution. Moreover, metagenomic datasets provide information about the ecological interactions of microbes and viruses in these microbial communities. Besides inferring distributions of species and functions in the whole community, bioinformatic advances now allow complete microbial and viral genomes to be reconstructed from deeply sequenced metagenomes. I will show some examples of how ecological interactions can be inferred from metagenomes, going from "traditional" species profiling to genomes from metagenomes.
Increasing the impact of genomics and bioinformatics on Chagas disease research; Finally, there are complete genome sequences.
(Bjorn Andersson - Karolinska Institutet)
Trypanosoma cruzi, a zoonotic kinetoplastid protozoan, is the causative agent of American trypanosomiasis (Chagas disease). The genome of the parasite is highly complex. It contains thousands of related copies of genes encoding a highly diverse repertoire of surface molecules, with roles in cell invasion, immune evasion and pathogenesis. Genome sequencing has therefore been difficult, and the reference sequences that have been published have been incomplete and fragmented. We have produced complete genome assemblies for a T. cruzi I (TcI) strain and a TcII strain using high-coverage PacBio single-molecule sequencing, and the complete genomes have been resolved into entire chromosomes. For the first time, we can now study surface molecule genes and their importance for immune evasion and pathogenesis. The organization of the surface molecule genes and the overrepresentation of repeat elements in subtelomeres suggest mechanisms for the generation of novel surface molecule variants, and these have been confirmed in comparative studies. I will present the results from this study and also several large comparative genomics efforts that are in progress, including: (I) a comparative analysis of 35 T. cruzi TcI isolates and clones from different geographic locations, sample sources and clinical outcomes; (II) a study of TcIV strains from Venezuela and Brazil that has revealed the complex hybrid nature of this clade; and (III) studies of clinical and field isolates from Ecuador.
Accelerating the Search for the Human Proteome’s “Missing Proteins”
(Shoba Ranganathan - Macquarie University)
The Human Proteome Project (HPP) will support defining what it is to be human in molecular terms. It currently strives to “know thyself” by finding high-stringency evidence for the 20,000 or so proteins encoded by the human genome. Here, we focus on what has been called the HPP’s “missing proteins” and what renders them unobservable in high-stringency proteomic experiments. Our re-analysis of publicly available mass spectrometry (MS) data for the largest missing family (olfactory receptors) along with other specific examples reveals an argument for the capture of as much additional complementary evidence as possible. To do so, we present MissingProteinPedia, a community strategy to capture and pool comprehensive biological data (complementary to the current high stringency method) with an aim to accelerate discovery of the missing human proteome.
Alternative synthetic biology strategies for medical diagnosis applications
(Franck Molina - Sys2Diag / CNRS)
Synthetic biology approaches to personalized medicine provide new ways to probe, monitor and interface with human physiopathology. This emerging field applies engineering principles to design and build clinically compliant biological systems. We developed and compared alternative routes to design versatile, programmable and robust diagnostic devices closely interconnected with therapy. These methodologies require tight relationships between clinics, engineering, modeling and biotechnologies in order to address the complexity and the stochasticity of human biology.
Computational design of new protein function using homologous backbone fragments
(Sarel Fleishman - Weizmann Institute of Science)
Dr. Sarel Fleishman started his academic career in the Adi Lautman Interdisciplinary Program for Outstanding Students at Tel Aviv University, where he studied molecular biology, chemistry, physics, history, and philosophy. He conducted his Ph.D. research as a Clore Doctoral Fellow with Prof. Nir Ben-Tal at Tel Aviv University, where he developed novel tools for predicting structure and dynamics in membrane proteins. Several of his predictions on membrane transporters, channels, and receptors were subsequently confirmed experimentally, and Sarel earned the prestigious Science Magazine and GE Healthcare Award for Young Life Scientists for these studies. Following completion of his Ph.D., Sarel conducted his postdoctoral training as a Human Frontier Fellow with Prof. David Baker at the University of Washington (Seattle), where he developed the first general methodology for de novo design of protein interactions. He applied this approach to generate novel protein inhibitors of influenza hemagglutinin, which neutralize pathogenic influenza strains. Such methods could unlock the vast potential of controlling molecular interaction networks, producing novel diagnostics and therapeutics. Sarel joined the Weizmann Institute's Department of Biomolecular Sciences in the summer of 2011.
(Natasa Przulj - University College London)
We are faced with a flood of molecular and clinical data. Various biomolecules interact in a cell to perform biological function, forming large, complex systems. Large amounts of patient-specific datasets are available, providing complementary information on the same disease type. The challenge is how to mine these complex data systems to answer fundamental questions, gain new insight into diseases and improve therapeutics. Just as computational approaches for analyzing genetic sequence data have revolutionized biological understanding, the expectation is that analyses of networked “omics” and clinical data will have similar ground-breaking impacts. However, dealing with these data is nontrivial, since many questions we ask about them fall into the category of computationally intractable problems, necessitating the development of heuristic methods for finding approximate solutions. We develop methods for extracting new biomedical knowledge from the wiring patterns of large networked biomedical data, linking network wiring patterns with function and translating the information hidden in the wiring patterns into everyday language. We introduce a versatile data fusion (integration) framework that can effectively integrate somatic mutation data, molecular interactions and drug chemical data to address three key challenges in cancer research: stratification of patients into groups having different clinical outcomes, prediction of driver genes whose mutations trigger the onset and development of cancers, and re-purposing of drugs for treating particular cancer patient groups. Our new methods stem from network science approaches coupled with graph-regularised non-negative matrix tri-factorization, a machine learning technique for co-clustering heterogeneous datasets.
Bioinformatics for personalized medicine, a French experience
(François Artiguenave - Centre National de Genotypage)
The CNG, the French national platform for human genomics, is involved in national and international projects for technological development and disease studies. It takes part in many projects to identify genetic causes of different diseases, including rare diseases and complex diseases such as autism, as well as to identify the genetic contribution to drug toxicity, for example in breast cancer. Recently, the CNG has significantly increased its sequencing capacity in order to tackle the rising need for genomics data in medical practice. I will give an overview of the informatics and bioinformatics challenges that have to be solved to leverage this analysis capacity. I will give some details on solutions implemented at the CNG for WGS, RNA and epigenetics workflows. In the context of personalized medicine, I will also present solutions we propose to deliver results to end-users (clinicians) so that they can benefit from genome information in clinical decision-making, giving some indications of the performance, security and scalability of the system.
Systems BioMedicine - When signaling networks meet multiple OMICS data types
(Jan Baumbach - University of Southern Denmark)
Recent advances in modern OMICS technology allow measuring the expression of all kinds of biological entities (genes, proteins, metabolites, miRNAs, etc.) at low cost and in high throughput. Computational challenges for analyzing such big data emerge, ranging from the low signal-to-noise ratio to high model complexity, which render simple statistical questions arbitrarily complicated. We will discuss three bioinformatics tools for de-isolating biological networks and multiple OMICS data types: de novo pathway enrichment, in vitro high-throughput screening (HTS) data integration, and cancer subtyping. Using Huntington’s disease patients’ expression data as a running example, we will employ a guilt-by-association approach to illuminate the power of networks to identify novel disease mechanisms. We will then extend this principle to study HTS data gained from large-scale chemical compound screens, siRNA knock-down and CRISPR/Cas9 knock-out screens, as well as microRNA inhibitor and mimic screens. We will present two use cases that demonstrate how one may fully exploit HTS data in quite heterogeneous contexts to generate novel hypotheses for follow-up experiments: (1) a combined siRNA and miRNA mimic screen on vorinostat resistance, and (2) a small compound screen on KRAS synthetic lethality. Finally, we discuss how this kind of computational network biology has strong potential to enable precision medicine by classifying breast cancer subtypes using complex combo-features gained from combining networks with gene expression data and DNA methylation data.
Co-evolution between Argonaute proteins and small non-coding RNAs
(João Marques - Universidade Federal de Minas Gerais)
RNA interference utilizes small non-coding RNAs associated with Argonaute proteins to regulate gene expression. A large diversity of Argonautes and small RNAs define different RNA interference mechanisms in eukaryotes. However, it remains unclear how new populations of small RNAs evolve compared to the phylogeny of Argonautes. Here, we compared small RNAs in Drosophila melanogaster, Aedes aegypti, and Lutzomyia longipalpis representing three branches of dipteran insects separated by ~200 million years. Insects are highly diverse and have four major clades of Argonautes that associate with three distinct classes of small RNAs. Since most small RNA classes cannot be identified by sequence conservation, we utilized molecular characteristics to identify, classify and analyze the evolution of small RNA populations in the three dipterans compared to the classical phylogeny of Argonaute proteins. Altogether, our results show that molecular characteristics of small RNAs respond in different manners to changes in Argonaute proteins.
Using a combined NGS/bioinformatics approach to identify microRNAs and their targets
(Simon Moxon - The University of East Anglia)
MicroRNAs (miRNAs) are short non-coding RNAs with critical functions in gene regulation. They regulate gene expression post-transcriptionally by binding to target mRNAs leading to transcript degradation. My group uses computational approaches to analyse a variety of next-generation sequencing datasets to identify novel miRNA sequences and miRNA-mRNA interactions. I will talk about approaches for miRNA prediction and introduce novel methods for target identification and visualisation of miRNA-target interaction networks.
Global analysis of the RNA secondary structure and RNA-protein interaction landscapes of plants
(Brian Gregory - University of Pennsylvania)
At the heart of post-transcriptional regulatory pathways in eukaryotes are cis- and trans-acting features and factors, including RNA secondary structure as well as RNA-binding proteins (RBPs) and their recognition sites on target RNAs. This is especially evident for RNA molecules whose functionality, maturation, and regulation require formation of correct secondary structure and RNA-protein interactions. However, the global influence of these features on plant gene expression is still largely unclear. We have recently developed a high-throughput sequencing-based approach that allows a simultaneous, transcriptome-wide view of the RNA secondary structure and RNA-protein interaction site landscapes in eukaryotes. We have used this approach on multiple plant species and during their responses to various conditions and treatments. Our most recent findings from these studies will be presented.
Mobile real-time surveillance of Zika virus in Brazil
(Nuno Faria - University of Oxford)
Knowledge of Zika virus (ZIKV) genomic epidemiology is currently limited, owing to challenges in obtaining and processing samples for sequencing. In order to improve this situation, we initiated a pilot project named ZiBRA (Zika in Brazil real-time analysis), which aimed at improving the molecular surveillance and sequencing of ZIKV in Brazil directly from clinical samples. From 2 to 17 June 2016, we engaged in a 2,000 km road trip across Rio Grande do Norte, Paraíba, Recife, Maceió and Bahia. Using our mobile lab and central laboratory facilities, we tested 1,349 samples and found 180 ZIKV RNA+ isolates. PCR results were shared with the central laboratories and the Ministry of Health within 48 hours of analysis. Complete genomic characterisation was performed using portable whole genome sequencing after tiling PCR. The novel genome data generated by ZiBRA represent over a third of all publicly available data from the Americas. These data help elucidate the origins of ZIKV in Brazil and clarify its epidemic trajectory across the Americas.
Comparative and evolutionary genomics reveal idiosyncrasies linked to plant-parasitism and asexual reproduction in a devastating plant pest.
(Etienne Danchin - French National Institute for Agricultural Research)
With several billion dollars of damage to worldwide agriculture every year, the root-knot nematodes are among the most damaging pests. In collaboration with an international community, our laboratory has conducted an analysis of the genome of the most damaging root-knot nematode, Meloidogyne incognita. Comparative and evolutionary genomics analyses have revealed three main singularities in this genome. (i) A whole set of genes involved in the degradation of plant carbohydrates has been acquired via horizontal gene transfers. Enzymes for the degradation of plant cell walls are usually not encoded by the genomes of animals but by their associated gut microorganisms. However, in the genome of M. incognita, we have identified a full set of genes encoding enzymes for the degradation of the plant cell wall. These genes have no homologs in other animals but resemble bacterial genes. Systematic phylogenetic analyses showed that these genes have been acquired via horizontal gene transfers and domesticated by the nematode genome. (ii) A proportion of genes are specific to plant parasites and represent good candidates for nematode control. Comparison of the M. incognita predicted proteins to those of free-living nematodes and other animals reveals a proportion of ‘pioneer’ genes without clear homology in animals. Further bioinformatics analyses revealed that these genes are actually transcribed by the nematode and that they are specific to root-knot nematodes and possibly other plant parasites. Silencing of some of these genes via RNAi yielded reduced infection by the nematode, suggesting these genes represent interesting targets for new and safer control methods. (iii) A peculiar duplicated and diverged genome structure is linked to a hybrid origin and asexual reproduction. The genome of M. incognita is made of duplicated blocks of synteny with high nucleotide divergence. These blocks present synteny breakpoints and are sometimes on the same scaffold, which is incompatible with conventional meiosis. We have shown that these blocks most probably result from recent hybridization events and allow functional divergence between the gene copies. This peculiar genome structure might thus allow plasticity in the absence of sexual reproduction. I will briefly present these three main topics and detail one of them further during the presentation.
Inferring Latin American evolutionary history using Approximate Bayesian Computation (ABC)
(Eduardo Tarazona - Universidade Federal de Minas Gerais)
ABC is a general, powerful and flexible computational approach that was introduced in population genetics a few years ago and has become popular. It allows for model-based inferences about the evolutionary history of human populations. ABC inferences require a set of validations of the performed inferences. We show how our research group is using genome-wide data and ABC to address questions about the evolutionary history of Native American populations, as well as to study the admixture dynamics in South America during the last 500 years.
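As a rough illustration of the general idea behind ABC, the sketch below implements plain rejection sampling on a toy admixture model. It is not the speaker's method or data: the model (a single allele whose frequency in an admixed sample is a mixture of two source-population frequencies), the function names, the uniform prior and the tolerance are all assumptions chosen for the example.

```python
import random
import statistics


def simulate_allele_count(alpha, p_source_a, p_source_b, n_chromosomes, rng):
    """Simulate how many of n_chromosomes carry the allele in an admixed sample.

    alpha is the (hypothetical) ancestry proportion from source population A.
    """
    p_admixed = alpha * p_source_a + (1.0 - alpha) * p_source_b
    return sum(1 for _ in range(n_chromosomes) if rng.random() < p_admixed)


def abc_rejection(observed_count, p_source_a, p_source_b, n_chromosomes,
                  n_draws=5000, tolerance=2, seed=1):
    """Return prior draws of alpha whose simulated data resemble the observation.

    Steps of ABC rejection sampling:
      1. draw a parameter value from the prior (here uniform on [0, 1));
      2. simulate data under the model with that value;
      3. keep the value if the simulated summary statistic (the allele count)
         is within `tolerance` of the observed one.
    The kept values approximate the posterior distribution of alpha.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        alpha = rng.random()
        simulated = simulate_allele_count(
            alpha, p_source_a, p_source_b, n_chromosomes, rng)
        if abs(simulated - observed_count) <= tolerance:
            accepted.append(alpha)
    return accepted


# Toy usage: made-up numbers, 40 allele copies observed among 100 chromosomes.
posterior = abc_rejection(observed_count=40, p_source_a=0.8, p_source_b=0.2,
                          n_chromosomes=100)
print("accepted draws:", len(posterior))
print("posterior mean of alpha:", round(statistics.mean(posterior), 3))
```

In practice, ABC studies use richer demographic simulators, several summary statistics and more efficient samplers; the tolerance trades accuracy of the approximation against the number of accepted draws.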
Animal Genomics and Phylogenomics: software development and comparative analyses - Young Scientist Highlight
(Francisco Prosdocimi - Universidade Federal do Rio de Janeiro)
Since the publication of the worm genome in 1998, animals have been an important target of genome sequencing. A comprehensive analysis of ~40 animal genomes available in the Ensembl database suggested that genomes in version 1.0 are still drafts and may contain a significant number of errors (Prosdocimi et al., 2012a). After studying the genomes of dozens of animals for the first time, we have produced an intellectual workflow to apply to partial genome data: the generic genome paper. In order to accomplish the first step in the workflow, we developed a set of bioinformatics tools to assemble, annotate and perform phylogenomic analyses of mitochondria (Uliano-Silva et al., 2016). The most striking results obtained in the genome sequencing and analyses of 40 complete bird genomes will be shown, including the tree obtained from a super-matrix dataset containing 8251 concatenated genes (Jarvis et al., 2014). Finally, software to find anonymous loci in complete genomes and execute phylogenomic analyses will be presented (Costa et al., 2016).
Let’s work together: Integrate, analyze, and discover with BaseSpace Sequence Hub
(Christopher Larson - Illumina)
BaseSpace Sequence Hub is the Illumina cloud-based genomics computing environment for next-generation sequencing (NGS) data management and analysis. Sequencing labs can store and share sequencing data, and researchers can simplify and accelerate NGS data analysis with push-button tools. Labs can also set up and monitor their sequencing runs in real time on any Illumina instrument. BaseSpace Sequence Hub can be accessed via an intuitive web-based interface or a Linux-based command-line tool. In this seminar we will introduce and guide you through BaseSpace and show how this platform can be the right choice for beginners and experts in bioinformatics.
HEAT-Seq target enrichment system. New option for oncology genes analysis
(Felipe Camargo Braga - Roche)
The ability to detect genetic variants quickly and efficiently is an increasingly important consideration in the selection of a sequencing-based research tool. While Whole Exome Sequencing (WES) can serve as a comprehensive tool for detecting variants, it is also accompanied by a greater sequencing and informatics analysis burden. Focused target enrichment methods simplify the analysis and enable higher throughput. One such technique uses Molecular Inversion Probes (MIPs), which bring additional advantages such as the elimination of a separate library prep, a fast workflow and inherent scalability.
Agilent Whole Transcriptome & Targeted RNA-seq Solutions: More Agile, More Flexible Options
(Yuri Moreira - Agilent)
RNA-Seq is a powerful tool to determine expression levels of various transcripts using shallow sequencing approaches. However, in order to find novel splice variants, gene fusions and rare transcripts, deeper sequencing is required, and keeping strand information is also critical for capturing overlapping antisense transcription and understanding novel transcribed regions. Agilent’s SureSelect Strand Specific RNA Library Preparation Kit offers a highly sensitive, strand-specific method for preparing libraries for whole transcriptome analysis. It is also compatible with targeted RNA-seq, allowing research to focus on the regions that matter most, for better discovery of novel transcripts and gene fusions. The SureSelect RNA-Seq Library Preparation Kit is an integral part of Agilent’s complete NGS Gene Regulation solutions, including SureSelect RNA Target Enrichment, SureSelect Human Methyl-Seq, and GeneSpring NGS data analysis.