Friday, August 22, 2014

Merge multiple kmer count hashes into one

A previous attempt at merging two kmer count hashes was neither memory efficient nor capable of merging more than two hashes. Here, we use the age-old trick of sorting to write a more memory-efficient script that can handle any number of hashes.


 cat *_"$kmer"_counts.fa|sort > sorted_"$kmer"_all.fa  

The above command concatenates all the hashes and sorts the result. The sorted file can then be passed to the Perl script below to merge the hashes. Since all lines for a given kmer are adjacent after sorting, the script only needs to hold one kmer and its running count in memory at a time, so it needs far less memory than the previous script.
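To see why sorting makes the merge cheap, here is a small sketch of the concatenate-and-sort step on toy data. The two sample files, their names, and their counts are made up for illustration, and the two-column "kmer count" layout is assumed from the way the script splits each line. LC_ALL=C is an addition to the original command; it gives a byte-wise ordering that does not depend on the system locale.

```shell
#!/bin/sh
# Toy demo of the concatenate-and-sort step.
# File names and counts below are hypothetical, chosen only for illustration.
demo_sort() {
    (
        # Work in a throwaway directory so the glob only sees the toy files
        workdir=$(mktemp -d) && cd "$workdir" || exit 1
        kmer=3
        printf 'AAA 2\nACG 5\n' > sampleA_"$kmer"_counts.fa
        printf 'AAA 1\nTTT 4\n' > sampleB_"$kmer"_counts.fa
        # Same step as in the post, with LC_ALL=C added for byte-wise ordering
        cat ./*_"$kmer"_counts.fa | LC_ALL=C sort > sorted_"$kmer"_all.fa
        cat sorted_"$kmer"_all.fa
        cd / && rm -rf "$workdir"
    )
}

demo_sort
```

In the sorted output the two AAA lines sit next to each other, so a single pass over the file is enough to add up their counts.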


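 #!/usr/bin/perl
 use strict;
 use warnings;
 # Input: the sorted, concatenated count file (one "kmer count" pair per line)
 open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
 # header line
 print "Kmer\tKmercount\n";
 my ($previous, $previousCount);
 while (my $line = <$fh>) {
      chomp $line;
      my ($kmer, $count) = split(/\s+/, $line);
      next unless defined $count;   # skip blank or malformed lines
      if (defined $previous && $previous eq $kmer) {
           # same kmer as the previous line: add to the running total
           $previousCount += $count;
      }
      else {
           # new kmer: print the finished one, then start a new total
           print "$previous\t$previousCount\n" if defined $previous;
           ($previous, $previousCount) = ($kmer, $count);
      }
 }
 # print the last kmer, which the loop itself never prints
 print "$previous\t$previousCount\n" if defined $previous;
 close $fh;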
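For a quick sanity check of the merge logic without a Perl file at hand, the same single-pass idea can be sketched with awk. This is a stand-in written for illustration, not the script from this post: the function name merge_sorted_counts is hypothetical, it assumes sorted two-column "kmer count" input, and it does not print a header line.

```shell
#!/bin/sh
# awk stand-in for the merge step (an assumption; the post itself uses Perl).
merge_sorted_counts() {
    # Reads sorted "kmer count" lines on stdin and prints one
    # "kmer<TAB>total" line per distinct kmer.
    awk '
        # kmer changed: print the finished total and reset it
        NR > 1 && $1 != prev { print prev "\t" total; total = 0 }
        # every line: remember the kmer and add its count
        { prev = $1; total += $2 }
        # end of input: print the last kmer, if there was any input
        END { if (NR > 0) print prev "\t" total }
    '
}

printf 'AAA 1\nAAA 2\nACG 5\nTTT 4\n' | merge_sorted_counts
```

On this toy input the two AAA lines collapse into a single line with a count of 3, while ACG and TTT pass through unchanged. Only the current kmer and its running total are held in memory, which is the whole point of sorting first.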
