SAM is a new file format for representing genomic sequence alignments. There are apis available for Java, Perl, and Python. But none for Ruby as far as I know, which is a shame when you want to manipulate SAM files in a rakefile. But – there is a little hackette we can pull. First, install RJB (the Ruby Java Bridge). Then we can do interesting things with the Picard Java API:
This routine will load a bam file (i.e. a binary SAM file), excise the sequences overlapping the regions listed in file “regions” (as “chr start end”), and print them out in fastq format.
require 'rjb'
#Setup JVM:
Rjb::load(classpath = '.:./sam-jdk-1.0.3.jar', jvmargs=[])
bam = "alignment.bam"
bai = bam+".bai"
#Some ugly bridging code:
file = Rjb::import('java.io.File')
samfilereader = Rjb::import('net.sf.samtools.SAMFileReader')
#most BAM files I've seen have errors, so let's be lenient:
samfilereader.setDefaultValidationStringency(
Rjb::import('net.sf.samtools.SAMFileReader$ValidationStringency').LENIENT
)
#Instantiate the java object:
sam=samfilereader.new_with_sig(
'Ljava.io.File;Ljava.io.File;',file.new(bam),file.new(bai)
)
#From here on it's plain sailing
File.open("regions").each_line do |line|
l = line.split
#we can use ruby ints and strings in java methods:
overlapping = sam.queryOverlapping(l[0],l[1].to_i,l[2].to_i)
while (overlapping.hasNext)
r = overlapping.next
#pull out what we need from the SAMRecord object:
puts "@"+r.getReadName
puts r.getReadString
puts "+"
puts r.getBaseQualityString
end
overlapping.close
end
I was pleasantly suprised that once the SAMFileReader object is created, using the API is quite nice! Hope this is useful to someone using SAM in Ruby.