Assembling Genomes Using Large Molecule Sequencing

Shaun D Jackman

2016-06-30

Creative Commons Attribution License

Shaun Jackman

BC Cancer Agency Genome Sciences Centre
Vancouver, Canada
@sjackman | github.com/sjackman | sjackman.ca

Sequencing Technologies

Paired-end Sequencing

  • 40 kbp Fosmid
  • 200 kbp BAC
  • 800 bp Illumina paired-end
  • 6 kbp Illumina mate-pair

Large Molecule Sequencing

  • PacBio
    • up to 100 kbp reads
  • Oxford Nanopore
    • up to 500 kbp reads

New Technologies

  • 10x Genomics Chromium Linked Reads
    • up to 200 kbp molecules
  • BioNano Genomics Irys
    • up to 1000 kbp molecules

Assemble Long Reads

Long Read Assemblers

Polish

10x Genomics Chromium

10x Genomics Chromium Linked Reads

Assemble Chromium

Scaffold with Chromium

BioNano Genomics Irys

BioNano Genomics Irys

Scaffold with BioNano

Sitka Spruce Picea sitchensis

Sitka Spruce Mitochondrion

Aim

Assemble the Sitka spruce mitochondrion into a single scaffold* using 10x Chromium data.

* if it has a single chromosome

Method

  • Align Sitka spruce reads to white spruce organelles
  • Filter out mismapped nuclear reads (alignment score AS < 40)
  • Identify 10x barcodes that contain at least one mitochondrial molecule
    • Criterion: four properly paired mitochondrial reads
  • Select all the reads of these mitochondrial barcodes
  • Assemble with ABySS
  • Scaffold with ARCS and LINKS
  • Fill gaps with Sealer
  • Annotate genes with MAKER
  • Submit to NCBI GenBank
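
The barcode selection steps above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used for the assembly. It assumes the alignments are available as SAM text, that the 10x barcode is stored in a BX:Z tag and the alignment score in an AS:i tag, and that "four" means at least four reads per barcode. The function names and the mito_refs parameter are hypothetical.

```python
from collections import Counter

PROPER_PAIR = 0x2  # SAM flag bit: read is mapped in a proper pair
UNMAPPED = 0x4     # SAM flag bit: read is unmapped


def parse_sam_line(line):
    """Return (flag, reference name, tags) for one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rname = fields[2]
    tags = {}
    for field in fields[11:]:  # optional fields look like TAG:TYPE:VALUE
        name, _type, value = field.split(":", 2)
        tags[name] = value
    return flag, rname, tags


def mitochondrial_barcodes(sam_lines, mito_refs=None, min_score=40, min_reads=4):
    """Return barcodes with at least min_reads well-aligned, properly paired reads.

    mito_refs is an optional set of mitochondrial reference names (hypothetical);
    when it is None, alignments to every reference are counted.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        flag, rname, tags = parse_sam_line(line)
        if flag & UNMAPPED or not flag & PROPER_PAIR:
            continue
        if mito_refs is not None and rname not in mito_refs:
            continue
        # Drop likely mismapped nuclear reads: AS below the threshold (or absent).
        if "AS" not in tags or int(tags["AS"]) < min_score:
            continue
        if "BX" in tags:  # assumed barcode tag
            counts[tags["BX"]] += 1
    return {barcode for barcode, n in counts.items() if n >= min_reads}


def select_reads(sam_lines, barcodes):
    """Yield every alignment line whose barcode is in barcodes, mapped or not."""
    for line in sam_lines:
        if line.startswith("@"):
            continue
        _flag, _rname, tags = parse_sam_line(line)
        if tags.get("BX") in barcodes:
            yield line
```

The selection is done in two passes because a barcode only qualifies once all of its reads have been counted. The second pass keeps every read carrying a qualifying barcode, including reads that did not align to the white spruce organelles, so that sequence unique to Sitka spruce can still reach the assembler.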

fin
