William Pearson - University of Virginia School of Medicine

William Pearson

Dr. Pearson received his PhD in Biochemistry from the California Institute of Technology and did a post-doctoral fellowship at Johns Hopkins School of Medicine. In 1983, he joined the faculty in the Department of Biochemistry at the University of Virginia, and, while waiting to recruit his first graduate student, wrote the FASTP program for rapid protein similarity searches (working with David Lipman). FASTP evolved into FASTA, which revolutionized the process of making DNA and protein sequence comparisons by providing faster, more rigorous approaches and improved scoring techniques. The FASTA text-based sequence format became a standard format in bioinformatics. FASTA is the second most widely used program for searching protein and DNA sequence databases. More recently, Dr. Pearson has become interested in more effective strategies for integrating sequence annotations into sequence alignments. Dr. Pearson has published more than 75 articles and 23 book chapters. His 1988 PNAS paper co-authored with David Lipman, “Improved tools for Biological Sequence Comparison”, has been cited more than 10,000 times.
Dr. Pearson is a fellow of the American Association for the Advancement of Science. He speaks widely on bioinformatics topics, teaches a course in computational biology at the University of Virginia, has co-directed the CSHL Computational and Comparative Genomics course for 20 years, and has participated in the Woods Hole Workshop on Molecular Evolution since 1995. In 2018, he was elected a Fellow of the International Society for Computational Biology.

Abstract:

From Sequences to Science: lessons and challenges from 30 years of biological sequence comparison

More than 30 years ago, the convergence of cloning technology, DNA sequencing, publicly available protein and DNA databases, and rapid biological sequence comparison revolutionized biological discovery. New proteins were sequenced based on differential expression rather than enzyme purification, and hundreds of novel biological pathways were discovered. Ten years later, genomes were sequenced, and for the first time, biologists had complete sets of proteins from living organisms. More recently, next-generation sequencing technologies have produced explosive growth in biological sequence databases. As a result, the pre-genome focus on search sensitivity must shift towards a post-genome focus on alignment accuracy. The size and redundancy of modern sequence databases require improvement and evaluation strategies that confront the special challenges of working with datasets with complex evolutionary histories. While similarity searching will remain the most powerful tool for annotating genome function, new approaches will exploit non-sequence functional and genetic resources to improve sequence-based biological inference.