Monday, November 3, 2014

Liftover - what gets lost in translation?

Many genomic analyses require converting genomic coordinates from one genome assembly to another. When a new version of a genome assembly is released, all the data associated with the older version (such as annotations, recombination maps, variant calls and other genomic features) need to be "lifted" over to the new assembly.

Similarly, comparative genomic analyses require identifying homologous regions between genomes that are sometimes rather distantly related. The initial draft annotation of a new genome typically uses annotations lifted over from high-quality genomes as its primary raw material. This is an extremely important source of information when RNA-seq data are not available.

Given the growing importance of lifting over genomic coordinates, the number of tools that perform this seemingly simple task is increasing. Apart from the established liftOver utility from UCSC, CrossMap, Kraken and the NCBI Genome Remapping Service also provide well-established pipelines for converting genomic coordinates in files of different formats, such as BED, GFF, SAM/BAM, VCF, Wiggle and BigWig. The UCSC liftOver tool has been implemented in the Galaxy suite and is used in VTools, pyliftOver and rtracklayer (a Bioconductor package). All implementations of the UCSC liftOver tool, as well as CrossMap, rely on chain files, which take many steps to create. If you are lucky enough to be working on one of the genomes supported by UCSC, you can download ready-to-use chain files directly from the UCSC website.
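To make the role of chain files more concrete, here is a minimal Python sketch of how a single position could be lifted through one. It assumes the UCSC chain format: a header line (`chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`) followed by lines of `size dt dq`, where `size` is an ungapped aligned block and `dt`/`dq` are the gaps that follow it in the old (target) and new (query) assembly, with the last line of each chain holding only `size`. The helper names and the toy chain are hypothetical, only `+` strand chains are handled and chain scores are ignored, so treat this as an illustration rather than a substitute for liftOver.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Chain:
    """One + strand chain: aligned blocks as (t_start, q_start, size), 0-based."""
    t_name: str
    q_name: str
    blocks: List[Tuple[int, int, int]] = field(default_factory=list)

def parse_chains(text: str) -> List[Chain]:
    """Parse chain-format text into Chain objects (hypothetical helper)."""
    chains: List[Chain] = []
    t_pos = q_pos = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines between chains and comment lines
        fields = line.split()
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            if fields[4] != "+" or fields[9] != "+":
                raise ValueError("this sketch only handles + strand chains")
            chains.append(Chain(t_name=fields[2], q_name=fields[7]))
            t_pos, q_pos = int(fields[5]), int(fields[10])
        else:
            size = int(fields[0])
            chains[-1].blocks.append((t_pos, q_pos, size))
            t_pos += size
            q_pos += size
            if len(fields) == 3:
                t_pos += int(fields[1])  # dt: gap in the old (target) assembly
                q_pos += int(fields[2])  # dq: gap in the new (query) assembly
    return chains

def lift_position(chains: List[Chain], chrom: str, pos: int) -> Optional[Tuple[str, int]]:
    """Map a 0-based position; None if it falls in a gap or outside every chain."""
    for chain in chains:
        if chain.t_name != chrom:
            continue
        for t_start, q_start, size in chain.blocks:
            if t_start <= pos < t_start + size:
                return chain.q_name, q_start + (pos - t_start)
    return None

# Hypothetical toy chain: 100 aligned bases, a 10 bp gap in the old assembly
# that corresponds to a 20 bp gap in the new one, then 90 aligned bases.
EXAMPLE_CHAIN = """\
chain 1000 chr1 1000 + 0 200 chr1 1000 + 0 210 1
100 10 20
90
"""

chains = parse_chains(EXAMPLE_CHAIN)
print(lift_position(chains, "chr1", 50))   # ('chr1', 50)
print(lift_position(chains, "chr1", 105))  # None: inside the gap
print(lift_position(chains, "chr1", 150))  # ('chr1', 160)
```

A position that falls into an alignment gap (105 in the toy chain) simply has no counterpart in the new assembly, which is one way information can get lost in translation.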

The original paper that introduced the liftOver utility contains useful technical information about how the process works and how it performs in different regions of the genome. Kraken uses a different method, which does not require all-vs-all alignments to lift over coordinates. One of the key features of the liftOver utility is its ability to "handle large gaps" in alignments. Thanks to its robustness and ease of use, liftOver has been used in numerous, very different studies.
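Intervals are trickier than single positions, because a region can straddle a gap between aligned blocks. The sketch below lifts an interval through the blocks of one chain and rejects it if too few of its bases map, loosely modelled on liftOver's `-minMatch` option (minimum ratio of bases that must remap, default 0.95). It is a simplified illustration under our own assumptions, not liftOver's actual algorithm: `lift_interval`, `BLOCKS` and the rule of reporting the span from the first to the last mapped base are hypothetical choices.

```python
from typing import List, Optional, Tuple

# Hypothetical aligned blocks of one + strand chain as (t_start, q_start, size),
# 0-based half-open: a 10 bp gap in the old assembly became a 20 bp gap in the new one.
BLOCKS: List[Tuple[int, int, int]] = [(0, 0, 100), (110, 120, 90)]

def lift_interval(blocks, start, end, min_match=0.95) -> Optional[Tuple[int, int]]:
    """Lift the half-open interval [start, end) through one chain's blocks.

    Returns the span from the first to the last mapped base in the new assembly,
    or None if less than min_match of the interval's bases fall in aligned blocks.
    """
    length = end - start
    mapped = 0
    new_start: Optional[int] = None
    new_end: Optional[int] = None
    for t_start, q_start, size in blocks:
        lo = max(start, t_start)
        hi = min(end, t_start + size)
        if lo >= hi:
            continue  # interval does not overlap this block
        mapped += hi - lo
        q_lo = q_start + (lo - t_start)
        q_hi = q_start + (hi - t_start)
        new_start = q_lo if new_start is None else min(new_start, q_lo)
        new_end = q_hi if new_end is None else max(new_end, q_hi)
    if length <= 0 or mapped == 0 or mapped / length < min_match:
        return None
    return new_start, new_end

print(lift_interval(BLOCKS, 10, 60))                   # (10, 60)
print(lift_interval(BLOCKS, 50, 150))                  # None: only 90 of 100 bases map
print(lift_interval(BLOCKS, 50, 150, min_match=0.85))  # (50, 160)
```

With the default threshold, the 100 bp interval [50, 150) is dropped because only 90 of its bases fall in aligned blocks; with a threshold of 0.85 it is lifted to [50, 160), 10 bp longer than the original because the gap grew from 10 to 20 bp.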

In a series of blog posts, we will analyze and quantify the performance of the UCSC liftOver utility.


