Shaun Jackman

ORCA

A Comprehensive Bioinformatics Container Environment for Education and Research

Posted on April 11, 2019

The ORCA bioinformatics environment is a Docker image that contains hundreds of bioinformatics tools and their dependencies. The ORCA image and accompanying server infrastructure provide a comprehensive bioinformatics environment for education and research. The ORCA environment on a server is implemented using Docker containers, but without requiring users to interact... [Read More]

Three Minute Thesis 2019

DNA Sequencing and Assembly

Posted on February 28, 2019

Bioinformatics applies programming and statistics to better understand biology. I first became interested in bioinformatics in 2003, when the Human Genome Project had just sequenced the three billion nucleotides of the human genome. A genome sequencing machine is much like a scanner for scanning paper documents. It would ideally scan... [Read More]

Tigmint

Correct Misassemblies Using Linked Reads From Large Molecules

Posted on July 25, 2017

Tigmint identifies and corrects misassemblies using linked reads from 10x Genomics Chromium. The reads are first aligned to the assembly, and the extents of the large DNA molecules are inferred from the alignments of the reads. The physical coverage of the large molecules is more consistent and less prone to... [Read More]

ABySS 2.0

Resource-Efficient Assembly of Large Genomes using a Bloom Filter

Posted on August 8, 2016

ABySS 1.0 was the first genome sequence assembly software capable of assembling a human genome using short read sequencing data. To aggregate enough memory to make that possible, multiple machines worked together in parallel and communicated using the Message Passing Interface (MPI). ABySS 2.0, on the other hand, reduces the... [Read More]

How to read the written plan of living cells

My research explained using only the ten hundred words used most often

Posted on March 2, 2016

The written plan of living cells is very large. Computers read these plans, but they are not very good at it. They read lots of small little bits of the written plan at a time, and that makes lots of problems. I make a computer thing that takes the small... [Read More]

Linuxbrew and Homebrew-Science

The Homebrew package manager for Linux

Posted on March 1, 2016

Linuxbrew is a package manager for Linux derived from Homebrew, the Mac OS package manager. It can be installed in your home directory and does not require root access. The same package manager can be used on both your Linux server and your Mac laptop. Installing a modern version of... [Read More]

Automating data-analysis pipelines using R and Make

Slides and a hands-on activity

Posted on February 15, 2016

‘Automating’ comes from the roots ‘auto-’, meaning ‘self-’, and ‘mating’, meaning ‘screwing’. Bioinformatics analysis often involves designing a pipeline of commands and running that pipeline on many data sets. There are many ways to tackle this common task. Running commands interactively at the command line... [Read More]

Organellar genomes of white spruce

The plastid and mitochondrial genomes of Picea glauca

Posted on February 13, 2016

Chloroplast genomes of gymnosperms, including conifers, are well studied, but little is known about the mitochondria of gymnosperms. In fact, only a single gymnosperm mitochondrial genome is found in NCBI GenBank, and no conifer mitochondrial genomes were to be found at all, until now. Roughly one percent of the whole genome... [Read More]

UniqTag

Abbreviate gene sequences to unique and stable identifiers

Posted on August 8, 2014

UniqTag is used to abbreviate gene sequences to unique and stable identifiers. It selects a representative k-mer from the sequence of each gene to be used as a systematic identifier for that gene. Unlike serial numbers, these identifiers are stable between different assemblies and annotations of the same data without... [Read More]
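
As a rough illustration of picking a representative k-mer per gene, here is a minimal Python sketch. It is not the UniqTag implementation: the selection rule used here (the k-mer found in the fewest genes, with ties broken by taking the lexicographically smallest), the function names `kmers` and `representative_kmers`, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def representative_kmers(genes, k):
    """Map each gene name to one representative k-mer.

    Assumed rule (hypothetical, for illustration): choose the k-mer that
    occurs in the fewest genes, breaking ties lexicographically.
    """
    # Count, for every k-mer, the number of genes that contain it.
    counts = Counter()
    for seq in genes.values():
        counts.update(kmers(seq, k))

    tags = {}
    for name, seq in genes.items():
        candidates = kmers(seq, k)
        if not candidates:
            continue  # sequence is shorter than k, so it has no k-mers
        tags[name] = min(candidates, key=lambda kmer: (counts[kmer], kmer))
    return tags


# Toy input: two made-up gene sequences that share the k-mer ACGT.
genes = {"geneA": "ACGTACGG", "geneB": "ACGTTTGA"}
tags = representative_kmers(genes, k=4)
print(tags)
```

Because the tag depends only on the sequences themselves and not on the order in which genes are listed, renumbering or reordering the genes leaves each tag unchanged in this sketch.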

White spruce genome assembly

Assembly of a twenty gigabase genome using ABySS

Posted on July 13, 2014

The SMarTForests project used ABySS to assemble the genome of white spruce (Picea glauca), which is twenty gigabases, seven times larger than the human genome. I also assembled and annotated the organellar genomes of white spruce, the chloroplast and mitochondrion. [Read More]

ABySS

Genome sequence assembler for large genomes

Posted on July 12, 2014

ABySS is a genome sequence assembler that distributes the computation of large genome sequence assembly over a cluster of computers using MPI. ABySS was used to assemble the twenty gigabase white spruce (Picea glauca) genome, seven times the size of the human genome. [Read More]