ABySS 2.0

ABySS 1.0 was the first genome sequence assembly software capable of assembling a human genome using short read sequencing data. To aggregate enough memory to make that possible, multiple machines worked together in parallel and communicated using the message passing interface (MPI). ABySS 2.0, on the other hand, reduces the amount of memory required to assemble a human genome by using an efficient data structure to store the sequencing data, called a Bloom filter. The assembly is parallelized using OpenMP rather than MPI, which requires that all the memory be available on a single machine. Message passing is no longer needed as the parallel computation threads share the same memory. The memory required is ten fold less, so a single machine is typically all that is needed.

ABySS 2.0 assembles scaffolds of DNA sequence that are megabases in size using only Illumina short read sequencing data. Combined with a BioNano optical map, most chromosome arms are assembled in two to four scaffolds. The smaller chromosome arms are assembled in a single scaffold. Genome sequence scaffolds contain gaps where sequencing data is missing or where the sequence is too complex to resolve using short read sequencing. Filling in these gaps using only short read sequencing is a remaining challenge of genome sequence assembly.

Read the ABySS 2.0 paper published in the special issue on genome assembly of Genome Research, doi:10.1101/gr.214346.116. ABySS 2.0 has been released and is available at https://github.com/bcgsc/abyss.

ABySS 2.0 chromosomes