ABySS 1.0 was the first genome sequence assembly software capable of assembling a human genome using short read sequencing data. To aggregate enough memory to make that possible, multiple machines worked together in parallel and communicated using the message passing interface (MPI). ABySS 2.0, on the other hand, reduces the amount of memory required to assemble a human genome by using an efficient data structure to store the sequencing data, called a Bloom filter. The assembly is parallelized using OpenMPI rather than MPI, which requires that all the memory be available on a single machine. Message passing is no longer needed as the parallel computation threads share the same memory. The memory required is ten fold less, so a single machine is typically all that is needed.
ABySS 2.0 assembles scaffolds of DNA sequence that are megabases in size using only Illumina short read sequencing data. Combined with a BioNano optical map, most chromosome arms are assembled in two to four scaffolds. The smaller chromosome arms are assembled in a single scaffold. Genome sequence scaffolds contain gaps where sequencing data is missing or where the sequence is too complex to resolve using short read sequencing. Filling in these gaps using only short read sequencing is a remaining challenge of genome sequence assembly.
Read the preprint on bioRxiv doi:10.1101/068338. The ABySS 2.0 software will be tagged and released once the paper has been reviewed. Check out out the ABySS GitHub branch
bloom-abyss-preview in the mean time.