Monday, June 16, 2014

Largest known protein - Titin

With a length of more than 30,000 amino acids, Titin is the largest known protein encoded by the human genome. Because it uses the same domains over and over, it is thought to have undergone highly unpredictable exon losses.

Apart from the biological reality of exon loss, the sheer length of the gene makes it prone to annotation errors. Large-scale annotation errors of this kind make it impossible to study the finer details of the gene's biology. This post is an attempt to identify such potential annotation errors, in the hope of contributing to a better annotation.

Mutations in the Titin gene have been implicated in many diseases. Its exceptional length also makes it interesting from an evolutionary point of view.

Chicken:

The human version of the gene (on chromosome 2) has 363 annotated exons in Ensembl release 75. The chicken version (on chromosome 7) has only 47 annotated exons in the same release. Its flanking genes are PLEKHA3 and CCDC141. However, a new gene, "ENSGALG00000026366", has been annotated in chicken between PLEKHA3 and TTN. Although it has a microRNA-style name, "gga-mir-7474", and a link to miRBase, its gene type is protein coding. If that were not confusing enough, this gene encodes a protein of ~30,000 amino acids (from 269 exons). Given its location and length, it appears that the chicken Titin gene has been incorrectly split into two genes (Titin itself and gga-mir-7474). As expected, the two genes are connected by a chicken cDNA EST (see figure below).
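The EST argument can be made concrete with a simple interval check: if the aligned blocks of one spliced EST overlap both gene models, the transcript spans them, which hints that they belong to a single gene. This is a minimal sketch assuming half-open coordinates; the `Interval` type, the `est_bridges` helper and all coordinates below are hypothetical and are not taken from Ensembl.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A half-open genomic interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def est_bridges(est_blocks: list[Interval], gene_a: Interval, gene_b: Interval) -> bool:
    """True if the blocks of a single EST alignment overlap both gene models.

    A spliced EST alignment is a list of exon-like blocks. If some block lands
    in gene_a and some block lands in gene_b, the transcript spans both models.
    """
    hits_a = any(overlaps(block, gene_a) for block in est_blocks)
    hits_b = any(overlaps(block, gene_b) for block in est_blocks)
    return hits_a and hits_b


# Hypothetical coordinates for illustration only (not real chicken chr7 positions).
gene_ttn = Interval("7", 1_000_000, 1_100_000)
gene_other = Interval("7", 1_120_000, 1_300_000)
est = [Interval("7", 1_099_500, 1_099_800), Interval("7", 1_120_100, 1_120_400)]

print(est_bridges(est, gene_ttn, gene_other))  # True
```

A real check would take the EST blocks from a genome alignment (for example, the EST track shown in a genome browser) rather than hand-written coordinates.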



A BLAST search with this "gga-mir-7474" gene returns Titin as the top hit.
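One way to reproduce a "top BLAST hit" check is to run BLAST with tabular output (`-outfmt 6`) and keep the highest-bitscore subject for each query. The sketch below assumes the default 12-column layout of that format; the `top_hits` function, the query and subject IDs, and the example rows are all hypothetical, not the actual search behind this post.

```python
import csv
import io

# Hypothetical BLAST tabular output (-outfmt 6, default 12 columns):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
# IDs and numbers are made up for illustration only.
EXAMPLE_HITS = """\
query_gene\tTTN_subject\t78.5\t5000\t1055\t20\t1\t5000\t1\t5000\t0.0\t7000
query_gene\tOTHER_subject\t35.0\t300\t190\t5\t10\t310\t50\t350\t1e-20\t95.2
"""

BITSCORE_COL = 11  # 0-based index of the bitscore column in default outfmt 6


def top_hits(tabular_text: str) -> dict[str, tuple[str, float]]:
    """Return {query_id: (best_subject_id, bitscore)}, keeping the highest bitscore per query."""
    best: dict[str, tuple[str, float]] = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (as written by -outfmt 7)
        query, subject = row[0], row[1]
        score = float(row[BITSCORE_COL])
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return best


print(top_hits(EXAMPLE_HITS))  # {'query_gene': ('TTN_subject', 7000.0)}
```

Ranking by bitscore rather than e-value avoids ties when several hits all report an e-value of 0.0.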

Based on the EST and BLAST evidence, these two genes can be merged into a single Titin gene. Even after the merge, the chicken gene is still 47 exons short of the 363 annotated in human.
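The exon shortfall follows from the counts quoted above. A quick check, assuming (as a simplification) that the two chicken models share no exons, so their counts can simply be added:

```python
# Exon counts as quoted in this post (Ensembl release 75).
human_ttn_exons = 363
chicken_ttn_exons = 47
chicken_split_gene_exons = 269  # ENSGALG00000026366 ("gga-mir-7474")

# Assumption: the two chicken models do not share exons, so a plain sum is valid.
merged_chicken_exons = chicken_ttn_exons + chicken_split_gene_exons
shortfall = human_ttn_exons - merged_chicken_exons

print(merged_chicken_exons, shortfall)  # 316 47
```

The shortfall happens to equal the 47 exons of the original chicken TTN model.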

Flycatcher:

Ensembl release 75 annotates a ~61 kb TTN gene with 106 exons in flycatcher. Two very short genes downstream of this TTN gene, ENSFALT00000015943 and ENSFALT00000003626 (with 1 and 2 exons), also return TTN as their top BLAST hit.
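The flycatcher case suggests a general screen: look for short genes near a TTN model (upstream or downstream) whose top BLAST hit is that same model, since they may be fragments of one gene split during annotation. This is a sketch under stated assumptions; the `GeneModel` type, the 200 kb window, the 5-exon cutoff, and all IDs and coordinates in the example are hypothetical choices for illustration.

```python
from typing import NamedTuple


class GeneModel(NamedTuple):
    """A gene model with a half-open span [start, end) and its exon count."""
    gene_id: str
    chrom: str
    start: int
    end: int
    n_exons: int


def candidate_fragments(
    target: GeneModel,
    genes: list[GeneModel],
    top_hit: dict[str, str],
    window: int = 200_000,  # hypothetical search distance in bp
    max_exons: int = 5,     # hypothetical cutoff for a "very short" gene
) -> list[str]:
    """Return IDs of short genes within `window` bp of `target` whose top BLAST hit is `target`.

    `top_hit` maps gene_id -> ID of that gene's best BLAST subject.
    """
    found = []
    for g in genes:
        if g.gene_id == target.gene_id or g.chrom != target.chrom:
            continue
        # Distance between the two spans (0 if they overlap).
        gap = max(0, max(g.start, target.start) - min(g.end, target.end))
        if gap <= window and g.n_exons <= max_exons and top_hit.get(g.gene_id) == target.gene_id:
            found.append(g.gene_id)
    return found


# Hypothetical example data (not real flycatcher annotation).
ttn = GeneModel("TTN_model", "chrA", 1_000_000, 1_061_000, 106)
genes = [
    ttn,
    GeneModel("frag_1", "chrA", 1_070_000, 1_071_000, 1),
    GeneModel("frag_2", "chrA", 1_080_000, 1_083_000, 2),
    GeneModel("unrelated", "chrA", 1_090_000, 1_095_000, 2),
]
top_hit = {"frag_1": "TTN_model", "frag_2": "TTN_model", "unrelated": "SOME_OTHER"}

print(candidate_fragments(ttn, genes, top_hit))  # ['frag_1', 'frag_2']
```

The window and exon cutoff are tuning knobs; a flagged gene is only a candidate for merging and would still need EST or other transcript evidence.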

Even with EST evidence and some very good objective predictions available, the annotation seems rather sketchy.

