The SMRT way to sequence a yeast: de novo genome sequencing and assembly with PacBio

Dr Richard Edwards
School of BABS
4 September 2015 - 3:00pm
Rountree Room 356, Level 3, Biological Sciences Building D26

We have performed PacBio single molecule real time (SMRT) sequencing of three yeast whole genomes. A haploid reference yeast strain (S288C) and two novel diploid strains were sequenced as part of a larger functional genomics project. For each strain, 2-2.6 Gb of usable sequence data was generated with read lengths of up to 53.3 kb. Pure PacBio whole genome de novo assemblies were generated using the HGAP3 pipeline. Initial assembly of S288C yielded over 99.9% genome coverage at 99.997% accuracy, with 15 of 17 reference chromosomes (16 nuclear chromosomes plus the mitochondrion) each essentially returned as a single, complete unitig. We are now using the S288C data to optimise the assembly process and derive assembly settings for the two novel strains. To this end, we have developed a new pipeline for the comparative assessment of high-quality whole genomes against a reference. We are also exploring the trade-off between accuracy and sequencing depth of the PacBio “pre-assembly” and how this affects the final assembly.

Biography: Rich Edwards is a Senior Lecturer in Bioinformatics in BABS. Originally from southern England, Rich trained as a geneticist at the University of Nottingham before moving to Dublin, Ireland to become a full-time bioinformatician in 2001. The lab was established in 2007 when Rich moved to Southampton, and relocated to UNSW in 2013. Core research in the lab focuses on protein and DNA sequence analysis, protein-protein interactions (including host-pathogen interactions) and molecular evolution.