My undergraduate degree was in Applied Physics at Delft University of Technology (Netherlands), and I obtained my Ph.D. in Physics in 2001 from the University of California, Berkeley, where I was first introduced to computational biology. I did postdoctoral work in bioinformatics, first at the University of Tokyo and then at Columbia University in New York. In 2007 I moved to RIKEN in Yokohama, where I am now team leader of the Laboratory of Applied Computational Genomics in the RIKEN Center for Integrative Medical Sciences. At RIKEN I have been heavily involved in the FANTOM (Functional ANnoTation Of the Mammalian genome) project, a large consortium effort to understand the mammalian (mainly human and mouse) genome and transcriptome. I am currently co-leading (together with Jay Shin and Piero Carninci at RIKEN) the sixth edition of FANTOM, where our goal is to functionally classify and annotate long non-coding RNAs.
Functional Annotation of the Mammalian Genome in the FANTOM projects
Mammals are complex multicellular organisms composed of hundreds of cell types that vary widely in shape, function, development, mutual interactions, and localization. This extraordinary variety in cellular behavior is achieved by using the same genomic information encoded in the DNA in different ways, in particular by expressing coding and non-coding transcripts in a cell-type-specific manner under the control of transcription factors and regulatory RNAs. FANTOM (Functional ANnoTation Of the Mammalian genome) is an international research consortium that aims to comprehensively identify mammalian transcripts and annotate their functions.
In the fifth edition of FANTOM (FANTOM5) [1,2], we used single-molecule sequencing across a broad panel of primary cells, cell lines, and tissues to produce a comprehensive atlas of gene expression in mammalian cells, mapping transcription start sites at single-nucleotide resolution using CAGE (Cap Analysis Gene Expression).
Using this atlas, we identified cell-type-specific promoter usage, key transcription factors, novel transcripts, and enhancer activity profiles as signatures of cell states. In addition, we performed short RNA sequencing to profile microRNAs in human and mouse cells, and systematically identified the transcription start sites of primary microRNA transcripts in human and mouse.
We also used CAGE to identify with high confidence the 5' end, and therefore the promoter region, of 27,919 long non-coding RNA (lncRNA) genes in human. An analysis of expression quantitative trait loci (eQTL)-associated and disease-associated single nucleotide polymorphisms (SNPs) overlapping lncRNA loci suggested that lncRNAs are biologically significant in regulation and disease.
While the number of lncRNAs encoded in mammalian genomes exceeds that of protein-coding genes, no functional annotation is currently available for the vast majority of them. In the sixth edition of FANTOM, we build on our unique expression atlas and collection of lncRNA annotations to create the first broad functional annotation and categorization of lncRNAs.
[1] Forrest AR, et al. Nature 507: 462-470 (2014).
[2] Arner E, et al. Science 347: 1010-1014 (2015).
[3] Andersson R, et al. Nature 507: 455-461 (2014).
[4] De Rie D, et al. Nature Biotechnology 35: 872-878 (2017).
[5] Hon CC, et al. Nature 543: 199-204 (2017).