RNA-seq is used by a diverse group of researchers to investigate a wide range of problems. De novo assembly is seen as an easy-to-use and effective way to harness the power of RNA sequencing in non-model organisms. While genome assembly programs have been evaluated in various reviews and in efforts such as Assemblathon and GAGE, de novo transcriptome assemblers have not been examined in comparable detail. We looked at de novo transcriptome assemblies generated by assemblers for Illumina reads from many perspectives. Apart from the obvious characterization of structural errors, we considered other factors such as sequencing error, polymorphism (pi), and paralogs, to name a few.
Real data would be ideal for testing programs. With real data, however, it becomes difficult to distinguish artifacts from novel biology. Simulation, where the true transcripts are known in advance, is therefore a good way to validate assemblies and to quantify errors of various types.
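To make the appeal of simulation concrete, here is a minimal Python sketch of the general idea. It is not the pipeline used in the paper. The function names `simulate_reads` and `count_mismatches`, the uniform sampling of read positions, and the substitution-only error model are all illustrative assumptions. Because the source transcript and each read's origin are known, every mismatch can be counted as an introduced error rather than guessed at.

```python
import random


def simulate_reads(transcript, read_len, n_reads, error_rate, seed=0):
    """Sample reads uniformly from a known transcript and inject
    substitution errors at the given per-base rate (hypothetical model)."""
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(transcript) - read_len + 1)
        truth = transcript[start:start + read_len]
        bases = []
        for base in truth:
            if rng.random() < error_rate:
                # Replace the base with one of the three other nucleotides.
                bases.append(rng.choice([b for b in "ACGT" if b != base]))
            else:
                bases.append(base)
        # Keep the true origin so errors can be scored later.
        reads.append((start, "".join(bases)))
    return reads


def count_mismatches(transcript, reads):
    """Count bases that differ from the known source sequence."""
    total = 0
    for start, read in reads:
        truth = transcript[start:start + len(read)]
        total += sum(1 for a, b in zip(truth, read) if a != b)
    return total


if __name__ == "__main__":
    transcript = "ACGT" * 25  # toy 100-base transcript, purely illustrative
    reads = simulate_reads(transcript, read_len=20, n_reads=50, error_rate=0.01)
    print(count_mismatches(transcript, reads), "substitution errors introduced")
```

With real reads the equivalent count would mix sequencing errors with genuine polymorphism, which is exactly the ambiguity that simulation avoids.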
The simulation paper on RNA-seq for the special issue on Next-gen sequencing is finally available online: "Challenges and strategies in transcriptome assembly and differential gene expression quantification. A comprehensive in silico assessment of RNA-seq experiments."