I develop software that takes fragmented genome sequencing data and attempts to reassemble the original genome from which the short fragments were derived. I analyze genome sequencing data from a variety of species, including spruce trees and humans. I have a PhD in bioinformatics, and my undergraduate degree is in computer engineering. I'm a programmer, an avid traveller, a singer, and an enthusiastic cook. I live in Vancouver, Canada.

ORCA

A Comprehensive Bioinformatics Container Environment for Education and Research

The ORCA bioinformatics environment is a Docker image that contains hundreds of bioinformatics tools and their dependencies. The ORCA image and accompanying server infrastructure provide a comprehensive bioinformatics environment for education and research. The ORCA environment on a server is implemented using Docker containers, but without requiring users to interact...

Three Minute Thesis 2019

DNA Sequencing and Assembly

Bioinformatics applies programming and statistics to better understand biology. I first became interested in bioinformatics in 2003, when the Human Genome Project had just sequenced the three billion nucleotides of the human genome. A genome sequencing machine is much like a scanner for paper documents. It would ideally scan...

Tigmint

Correct Misassemblies Using Linked Reads From Large Molecules

Tigmint identifies and corrects misassemblies using linked reads from 10x Genomics Chromium. The reads are first aligned to the assembly, and the extents of the large DNA molecules are inferred from the alignments of the reads. The physical coverage of the large molecules is more consistent and less prone to...
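The molecule-inference step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Tigmint's implementation: it assumes each read alignment has already been reduced to a hypothetical `(barcode, contig, position)` record, groups reads that share a barcode on the same contig, and splits a group into separate molecules wherever consecutive reads are more than `max_gap` bases apart. It then counts how many molecules span each position and reports stretches where that physical coverage drops below `min_cov` as candidate misassemblies. All function names, parameters, and thresholds here are hypothetical, and Tigmint's real criteria differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input record: (barcode, contig, position) of one aligned read.
Alignment = Tuple[str, str, int]
# Inferred molecule extent: (contig, start, end), end inclusive.
Molecule = Tuple[str, int, int]


def infer_molecules(alignments: List[Alignment], max_gap: int,
                    min_reads: int) -> List[Molecule]:
    """Group reads by barcode and contig, then split each group wherever two
    consecutive reads are more than max_gap apart. Each run containing at
    least min_reads reads is reported as one molecule."""
    groups: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for barcode, contig, pos in alignments:
        groups[(barcode, contig)].append(pos)

    molecules: List[Molecule] = []
    for (_barcode, contig), positions in groups.items():
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:  # gap too large: close the current molecule
                if count >= min_reads:
                    molecules.append((contig, start, prev))
                start, count = pos, 0
            prev = pos
            count += 1
        if count >= min_reads:
            molecules.append((contig, start, prev))
    return molecules


def physical_coverage(molecules: List[Molecule], contig: str, length: int) -> List[int]:
    """Number of molecules spanning each position of one contig, computed
    with a difference array (+1 at each start, -1 just past each end)."""
    diff = [0] * (length + 1)
    for c, start, end in molecules:
        if c == contig:
            diff[start] += 1
            diff[end + 1] -= 1
    coverage, depth = [], 0
    for i in range(length):
        depth += diff[i]
        coverage.append(depth)
    return coverage


def low_coverage_regions(coverage: List[int], min_cov: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) intervals where coverage falls below min_cov;
    these are the candidate misassembly breakpoints."""
    regions, start = [], None
    for i, depth in enumerate(coverage):
        if depth < min_cov and start is None:
            start = i
        elif depth >= min_cov and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(coverage)))
    return regions
```

With toy inputs where four barcodes cover positions 0 to 48 and 52 to 99 of a 100 bp contig, no molecule spans positions 49 to 51, and `low_coverage_regions(coverage, 1)` reports `(49, 52)`. A barcode whose two reads lie far apart is split into two runs, each too small to count as a molecule.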

ABySS 2.0

Resource-Efficient Assembly of Large Genomes using a Bloom Filter

ABySS 1.0 was the first genome sequence assembly software capable of assembling a human genome using short read sequencing data. To aggregate enough memory to make that possible, multiple machines worked together in parallel and communicated using the message passing interface (MPI). ABySS 2.0, on the other hand, reduces the...
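A Bloom filter stores set membership in a fixed-size bit array, which is what lets a set of k-mers fit in far less memory than a hash table. The sketch below is an illustrative Python version, not ABySS 2.0's code. It assumes canonical k-mers (the smaller of a k-mer and its reverse complement), simulates independent hash functions by salting SHA-256, and, in the hypothetical `solid_kmer_filter`, chains two filters so that only k-mers seen at least twice are kept. My understanding is that ABySS 2.0 uses a similar cascading filter to discard likely sequencing errors, but the filter sizes, hash functions, and function names here are assumptions.

```python
import hashlib

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement,
    so both strands of the same DNA map to one key."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmers(seq: str, k: int):
    """Yield the canonical form of every k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield canonical(seq[i:i + k])


class BloomFilter:
    """Bit array plus num_hashes hash functions. Membership tests can return
    false positives but never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Simulate independent hash functions by salting SHA-256 with the index.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def solid_kmer_filter(reads, k: int, num_bits: int, num_hashes: int) -> BloomFilter:
    """Two-level cascade: a k-mer reaches the second filter only when it is
    seen a second time, so most k-mers seen once (often errors) are dropped."""
    seen_once = BloomFilter(num_bits, num_hashes)
    seen_twice = BloomFilter(num_bits, num_hashes)
    for read in reads:
        for km in kmers(read, k):
            if km in seen_once:
                seen_twice.add(km)
            else:
                seen_once.add(km)
    return seen_twice
```

Querying `canonical("TACGT")` gives the same answer as querying `"ACGTA"`, because both strands map to one key. The price of the compact representation is a small false-positive rate, which shrinks as `num_bits` grows relative to the number of stored k-mers.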

Automating data-analysis pipelines using R and Make

Slides and a hands-on activity

‘Automating’ comes from the roots ‘auto-’, meaning ‘self-’, and ‘mating’, meaning ‘screwing’. Bioinformatics analysis often involves designing a pipeline of commands and running that pipeline on many data sets. There are many ways to tackle this common task. Running commands interactively at the command line...
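Make decides what to rerun by comparing file modification times. The functions below sketch that core rule in Python purely as an illustration; `needs_rebuild` and `make` are hypothetical names, and the sketch leaves out Make's pattern rules, recursive prerequisite resolution, and parallel execution.

```python
import os
from typing import Callable, List


def needs_rebuild(target: str, prerequisites: List[str]) -> bool:
    """Make's core rule: rebuild when the target is missing or older than any
    prerequisite. A missing prerequisite raises FileNotFoundError here,
    whereas Make would first try to build it from another rule."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


def make(target: str, prerequisites: List[str],
         recipe: Callable[[str, List[str]], None]) -> bool:
    """Run recipe(target, prerequisites) only if the target is out of date.
    Returns True if the recipe ran."""
    if needs_rebuild(target, prerequisites):
        recipe(target, prerequisites)
        return True
    return False
```

To run a pipeline on many data sets, one would call `make` once per sample; samples whose outputs are already newer than their inputs are skipped. That incremental behaviour is the main benefit of describing a pipeline as targets and prerequisites rather than as a script that reruns every step.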

UniqTag

Abbreviate gene sequences to unique and stable identifiers

UniqTag is used to abbreviate gene sequences to unique and stable identifiers. It selects a representative k-mer from the sequence of each gene to be used as a systematic identifier for that gene. Unlike serial numbers, these identifiers are stable between different assemblies and annotations of the same data without...
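The selection of a representative k-mer can be sketched as below, assuming the rule is to choose the k-mer that occurs in the fewest genes and to break ties by lexicographic order. That rule is a hedged paraphrase of UniqTag's approach, and the code is only an illustration: it omits details such as how identical genes that receive the same tag are told apart, and `uniqtags` is a hypothetical function name.

```python
from collections import Counter
from typing import Dict, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """Distinct k-mers of one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def uniqtags(genes: Dict[str, str], k: int) -> Dict[str, str]:
    """Pick one k-mer per gene: the k-mer shared by the fewest genes, with
    ties broken lexicographically. Genes shorter than k are skipped."""
    sets = {name: kmer_set(seq, k) for name, seq in genes.items()}
    # How many genes contain each k-mer (each gene counted once per k-mer).
    freq = Counter(km for s in sets.values() for km in s)
    return {name: min(s, key=lambda km: (freq[km], km))
            for name, s in sets.items() if s}
```

For example, with `k = 3` the genes `ACGTAC` and `ACGTTT` share `ACG` and `CGT`, so they receive the tags `GTA` and `GTT`. Adding an unrelated gene such as `CCCCC` leaves both tags unchanged, whereas serial numbers could shift when genes are added or reordered.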

ABySS

Genome sequence assembler for large genomes

ABySS is a genome sequence assembler that distributes the computation of large genome sequence assembly over a cluster of computers using MPI. ABySS was used to assemble the 20-gigabase white spruce (Picea glauca) genome, seven times the size of the human genome.
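Distributing an assembly over a cluster requires each process to know which process stores a given k-mer. A common approach, stated here as a hedged assumption about the general idea rather than a description of ABySS's actual code, is to hash each canonical k-mer and take the result modulo the number of processes. The sketch simulates that partitioning in a single Python process; `owner` and `partition_kmers` are hypothetical names, CRC-32 is an arbitrary choice of hash, and no MPI calls are made.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, Set

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def owner(kmer: str, num_procs: int) -> int:
    """Rank that stores this k-mer. Hashing the canonical form sends both
    strands to the same rank, and any rank can compute the owner locally."""
    return zlib.crc32(canonical(kmer).encode()) % num_procs


def partition_kmers(reads: Iterable[str], k: int, num_procs: int) -> Dict[int, Set[str]]:
    """Simulate distributing k-mers over num_procs ranks (no actual MPI)."""
    parts: Dict[int, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            km = canonical(read[i:i + k])
            parts[owner(km, num_procs)].add(km)
    return dict(parts)
```

Because the owner depends only on the k-mer itself, any process can route a lookup to the right rank without consulting a central index, and the k-mer table is split roughly evenly across the machines' memory.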