For ws3, we have $ zcat ws3.expression.sam.gz |grep scaffold_25_contig_4 @SQ SN:scaffold_25_contig_4 LN:186 i.e. no reads align to this scaffold according to rsem-compute-expression. If we just look for lines containing scaffold_25, we find @SQ SN:scaffold_25_contig_0 LN:76 @SQ SN:scaffold_25_contig_1 LN:187 @SQ SN:scaffold_25_contig_10 LN:76 @SQ SN:scaffold_25_contig_2 LN:201 @SQ SN:scaffold_25_contig_3 LN:76 @SQ SN:scaffold_25_contig_4 LN:186 @SQ SN:scaffold_25_contig_5 LN:86 @SQ SN:scaffold_25_contig_6 LN:220 @SQ SN:scaffold_25_contig_7 LN:121 @SQ SN:scaffold_25_contig_8 LN:325 @SQ SN:scaffold_25_contig_9 LN:97 SRR203276.5781444 163 scaffold_25_contig_6 1 255 76M = 145 220 GGACTCTTCATTTTTAATTGTTATTCAATGTTTTATTTTCCAGAACTCATTTTTCACATTCAGGGGTGGGAGCTGA CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCBBBBCCCCCCCCCCCCCCCACCCCCACCCCCC@@;CCCC< XA:i:0 MD:Z:76 NM:i:0 SRR203276.5781444 83 scaffold_25_contig_6 145 255 76M = 1 -220 CGGGGCAATAGCTCCTGAGCAGACTTTCAGAAGGGAGAGGTCTTCTCAGAAGCCTGTTGGACCTGGCTTGCAGAGG BBDBB@BDBDBCDDACCDDCDCDCCDDCCCCCCCCCCCCCCDCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC XA:i:0 MD:Z:76 NM:i:0 SRR203276.6596098 163 scaffold_25_contig_8 1 255 76M = 250 325 GTGGGCTGAAAACAGTTTCTTCAGGGGATTACTAAACTCAAGAATTATTAACCTTTCTACTTTTTGAAACAATGAT CCCCCCCCCCCCCCCBCCCCCCCCCCCC?CCCCC@CDCCDBCADCCACCCCCDBCDCCACDCCCCCBBCCA9CCB@ XA:i:0 MD:Z:76 NM:i:0 SRR203276.6596098 83 scaffold_25_contig_8 250 255 76M = 1 -325 GAAAGACAACTGTTTATATAGACCCCATTGGAAAAATCCCAGTTCTGTACTGAGATCAGAGACCCCAAACTCTTAC #B9C@@AB@CCCABDCCAACBBCABCC@?DDCCCCABCCCCBCCCCCCCDCCCCCCCCCCCADCCCCCCCCCCCCC XA:i:0 MD:Z:76 NM:i:0 SRR203276.50638906 163 scaffold_25_contig_6 1 255 76M = 145 220 GGACTCTTCATTTTTAATTGTTATTCAATGTTTTATTTTCCAGAACTCATTTTTCACATTCAGGGGTGGGAGCTGA CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCBBBBCCCCCCCCCCCCCCCCCCCCC?CCCCBC@@>CDCC@ XA:i:0 MD:Z:76 NM:i:0 SRR203276.50638906 83 scaffold_25_contig_6 145 255 76M = 1 -220 CGGGGCAATAGCTCCTGAGCAGACTTTCAGAAGGGAGAGGTCTTCTCAGAAGCCTGTTGGACCTGGCTTGCAGAGA @DDBCADDDDBCDCDCCBCCDCBCCCDCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC XA:i:1 MD:Z:75G0 NM:i:1 i.e. only contigs 6 and 8 are said to be aligned. If we run $ bowtie -l 13 -q --phred33-quals -n 3 -e 99999999 -I 1 -X 1000 -p 8 -a -m 200 -S /tier2/deweylab/nathanae/rnaseq-eval/data-jul26//trinity-gridsearch/rsem_oracleset-summary.ws3.rsem-eval-workingdir/reference -1 /tier2/deweylab/nathanae/rnaseq-eval/data-jul26/sample_1.fq -2 /tier2/deweylab/nathanae/rnaseq-eval/data-jul26/sample_2.fq|grep scaffold_25_contig_4>>differences2.txt (the same bowtie command that rsem-compute-expression runs, except with -l 13 and -n 3 instead of -l 25 and -n 2) we find the matches @SQ SN:scaffold_25_contig_4 LN:186 SRR203276.32744748 99 scaffold_25_contig_4 61 255 76M = 108 123 AGGGGAAACTGTGGCGCCCAGGGGTTGAACTGACCTACCCTCAAAAATAGAGACTTTTGTAGGAGGCTGACCTTAG ############################################################################ XA:i:3 MD:Z:1T1A7A1C2T0G0G0C0C0T0T0C0G0C0T1C0T0A0T0T0A0T0G2A0A0A0A1T1T0C0A0T0C0C0A0C0A0A0G0A0C1G0C2G0C0A0A0G0C0G1T0A0C0G0T0 NM:i:55 SRR203276.32744748 147 scaffold_25_contig_4 108 255 76M = 61 -123 TTCCCCCAACTGAACCCACGGTCAACCCGACTTTAAAAAACCAAGGCCCCAAACAAGTTGGCGGGATACACCCGAT ############################################################################ XA:i:3 MD:Z:0A3A1A1G0A0C1G0C0G0G0G0C0A0A1C0G0C0T0A1G0T1T0A0C0C0G0C0T0T0T0G0T0G0T0G0C1A2T0G0C1G1G0C0C2C0T11C0T0G0 NM:i:49 Note that these two matches involve the two pairs of the single read SRR203276.32744748. The first match comes from sample_1.fq and the second match comes from sample_2.fq (reverse complemented). >scaffold_25_contig_4 scaffold_25 tid=9388 pos=672 length=2123 number_of_contigs=12 CTGACCCAGATGAGGTGGCCAGGAGATGGGGAAAGAGGAAAAACAAACCTAAGATGAATTATGAGAAACTGAGCCGTGGCCTTCGCTACTATTATGACAAAAATATCATCCACAAGACGGCGGGCAAGCGCTACGTATACCGCTTTGTGTGCGACCTGCAGAGCCTGCTGGGATACACCCCTGAAG which has reverse complement CTTCAGGGGTGTATCCCAGCAGGCTCTGCAGGTCGCACACAAAGCGGTATACGTAGCGCTTGCCCGCCGTCTTGTGGATGATATTTTTGTCATAATAGTAGCGAAGGCCACGGCTCAGTTTCTCATAATTCATCTTAGGTTTGTTTTTCCTCTTTCCCCATCTCCTGGCCACCTCATCTGGGTCAG The paired reads SRR203276.32744748 are in some sense the closest pair of reads to scaffold_25_contig_4, so one would think that they should have a good alignment to this scaffold. (This is because if the scaffold was included by collectContigs one would think that at least one read should really cover the scaffold well.) However, if we align the reads to this contig or its reverse complement using water, we find that the alignment is not very good at all. E.g. try water -asequence blah.a.fa -bsequence blah.b.fa -gapopen 10 -gapextend 0.5 -outfile /dev/stdout |less or water -asequence blah.a.fa -bsequence blah.b.fa -gapopen 100 -gapextend 0.5 -outfile /dev/stdout |less For the left read (from sample_1.fq) it is better in terms of score to align to the un-reverse complemented contig. With -gapopen 10 one gets SRR203276.327 10 TGTGGCGCCCAGGGGTTG-------------AACTGACCTACCCTCAAAA 46 ||.||.|.|||||.|.|| |||..|||||...|.||.. scaffold_25_c 11 TGAGGTGGCCAGGAGATGGGGAAAGAGGAAAAACAAACCTAAGATGAATT 60 SRR203276.327 47 AT-AGAGACTTTTGTAGGAGGC-TGACCTTAG 76 || |||.||| |||.| ||.||||.| scaffold_25_c 61 ATGAGAAACT-------GAGCCGTGGCCTTCG 85 and with -gapopen 100 one gets SRR203276.327 10 TGTGGCGCCCAGGGGTTG 27 ||.||.|.|||||.|.|| scaffold_25_c 11 TGAGGTGGCCAGGAGATG 28 It is similar for the right read. ---- We can also look at the BAM file produced by RSEM's expression computation of each true isoform wrt the reads. This is the BAM file that extract alignment information from needed by collectContigs. We do this using samtools view ../expression.rsem-expr-workingdir/expression.sorted.bam|less Note that the read SRR203276.32744748 doesn't appear anywhere in this output. How could this happen? In a sense, it is not that surprising, because the alignment of SRR203276.32744748 to scaffold_25_contig_4 is really bad.