Columns of assembly statistics

assembly.N25 the number L such that 25% of all bases in the assembly are in contigs of length less than L.
assembly.N50 the number L such that 50% of all bases in the assembly are in contigs of length less than L.
assembly.N75 the number L such that 75% of all bases in the assembly are in contigs of length less than L.
assembly.longest the length of the longest contig in the assembly.
assembly.mean the mean contig length in the assembly.
assembly.median the median contig length in the assembly.
assembly.shortest the length of the shortest contig in the assembly.
assembly.num.contigs the number of contigs in the assembly.
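
As a rough illustration of the statistics above, here is a minimal Python sketch. It is not the code that produced these columns. The function names (nx, assembly_stats) are made up for this example, and the tie handling is an assumption: the sketch sorts contigs from shortest to longest and reports the first contig length at which the running base count reaches the requested percentage, which reads "less than L" as "at most L".

```python
from statistics import mean, median


def nx(lengths, percent):
    """Smallest contig length L such that contigs of length <= L hold at
    least `percent` percent of all bases (an assumed reading of the text)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length
    raise ValueError("lengths must be non-empty")


def assembly_stats(lengths):
    """Return the assembly.* columns for a list of contig lengths."""
    return {
        "assembly.N25": nx(lengths, 25),
        "assembly.N50": nx(lengths, 50),
        "assembly.N75": nx(lengths, 75),
        "assembly.longest": max(lengths),
        "assembly.mean": mean(lengths),
        "assembly.median": median(lengths),
        "assembly.shortest": min(lengths),
        "assembly.num.contigs": len(lengths),
    }
```

For contig lengths 10, 2, 5 and 3 (20 bases in total), the running totals over the sorted lengths are 2, 5, 10 and 20, so this sketch reports N25 = 3, N50 = 5 and N75 = 10.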

Columns relating oracleset to assembly, based on "matches"

Let A and B be assemblies or oraclesets. Let s be a contig in A and t be a contig in B. We define "s matches t" to mean that:

These percentages are measured with respect to the length of t.

We define "num B in A" as follows. Let

C = \{(s, t)\in A\times B : s matches t\}.

Enumerate C according to the percent identity of s with t, and secondarily according to the negative of the percent insdel of s with respect to t. For each pair (s,t), if either s or t has already been seen earlier in the enumeration, throw out (s,t). Let "num B in A" be the cardinality of the remaining set.
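
The enumeration above can be sketched in Python as follows. This is an illustration under stated assumptions, not the original implementation. The Match record and the function name num_b_in_a are hypothetical, the input is assumed to already be the set C (pairs that pass the "s matches t" test, with their percentages attached), and the ordering is assumed to be best first, i.e. descending by (percent identity, negative percent insdel). The text leaves open whether a contig counts as "seen" once it has appeared in a discarded pair; the mark_discarded flag selects between the two readings.

```python
from typing import Iterable, NamedTuple


class Match(NamedTuple):
    s: str               # contig name in A
    t: str               # contig name in B
    pct_identity: float  # percent identity of s with t
    pct_insdel: float    # percent insdel of s with respect to t


def num_b_in_a(matches: Iterable[Match], mark_discarded: bool = False) -> int:
    """Count the pairs that survive the greedy enumeration of C."""
    # Assumed order: highest identity first, then lowest insdel.
    ordered = sorted(
        matches,
        key=lambda m: (m.pct_identity, -m.pct_insdel),
        reverse=True,
    )
    seen_s, seen_t = set(), set()
    kept = 0
    for m in ordered:
        fresh = m.s not in seen_s and m.t not in seen_t
        if fresh:
            kept += 1
        # By default only kept pairs mark their contigs as seen; with
        # mark_discarded=True every enumerated pair does.
        if fresh or mark_discarded:
            seen_s.add(m.s)
            seen_t.add(m.t)
    return kept
```

With the default flag, each contig of A and each contig of B appears in at most one surviving pair, so the count never exceeds min(|A|, |B|).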

The above definitions follow the Trinity paper.

We define "s matches t without checking insdel" to mean that:

This percentage is measured with respect to the length of t.

We define "num B in A without checking insdel" as above, but replacing "s matches t" with "s matches t without checking insdel".

num.oracleset.in.assembly num oracleset in assembly
frac.oracleset.in.assembly (num oracleset in assembly)/|oracleset|
num.assembly.in.oracleset num assembly in oracleset
frac.assembly.in.oracleset (num assembly in oracleset)/|assembly|
num.oracleset.in.assembly.without.check.insdel num oracleset in assembly without checking insdel
frac.oracleset.in.assembly.without.check.insdel (num oracleset in assembly without checking insdel)/|oracleset|
num.assembly.in.oracleset.without.check.insdel num assembly in oracleset without checking insdel
frac.assembly.in.oracleset.without.check.insdel (num assembly in oracleset without checking insdel)/|assembly|

Columns relating oracleset to assembly, based on "allmatches"

Let "s matches t" be as defined in the previous section. Let

C = \{t \in B : there exists s \in A such that s matches t\}.

We define "num B in A via allmatches" to be the cardinality of C.
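
A minimal Python sketch of this count, assuming the matching pairs (s, t) have already been computed elsewhere; the function names are hypothetical and chosen only for this example.

```python
def num_b_in_a_via_allmatches(matching_pairs):
    """Number of distinct contigs t in B matched by at least one s in A."""
    return len({t for (_s, t) in matching_pairs})


def frac_b_in_a_via_allmatches(matching_pairs, size_of_b):
    """The count above divided by |B|."""
    return num_b_in_a_via_allmatches(matching_pairs) / size_of_b
```

Unlike the enumeration in the previous section, nothing is thrown out here, so a single contig s in A can account for several contigs t in B.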

Let "s matches t without checking insdel" be as defined in the previous section. We define "num B in A via allmatches without checking insdel" as above (for "num B in A via allmatches"), but replacing "s matches t" with "s matches t without checking insdel".

allmatches.num.oracleset.in.assembly num oracleset in assembly via allmatches
allmatches.frac.oracleset.in.assembly (num oracleset in assembly via allmatches)/|oracleset|
allmatches.num.assembly.in.oracleset num assembly in oracleset via allmatches
allmatches.frac.assembly.in.oracleset (num assembly in oracleset via allmatches)/|assembly|
allmatches.num.oracleset.in.assembly.without.check.insdel num oracleset in assembly via allmatches without checking insdel
allmatches.frac.oracleset.in.assembly.without.check.insdel (num oracleset in assembly via allmatches without checking insdel)/|oracleset|
allmatches.num.assembly.in.oracleset.without.check.insdel num assembly in oracleset via allmatches without checking insdel
allmatches.frac.assembly.in.oracleset.without.check.insdel (num assembly in oracleset via allmatches without checking insdel)/|assembly|

RSEM approx columns

These columns come from expression.approx in RSEM's output.
rsem.approx.approx I believe that this is the log model evidence, log P(D), computed using a convex approximation.
rsem.approx.bic Log model evidence, log P(D), computed using BIC.
rsem.approx.loglikelihood I believe that this is the log likelihood, log P(D|\theta), computed at the MAP \theta.
rsem.approx.loglikelihood.penalty This is the BIC penalty.
It should be the case that rsem.approx.bic = rsem.approx.loglikelihood - rsem.approx.loglikelihood.penalty.
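
The relation above can be checked per row with a short Python sketch. The function name and the tolerances are assumptions made for this example; row is assumed to be a dict keyed by the column names listed here.

```python
import math


def bic_columns_consistent(row, rel_tol=1e-9, abs_tol=1e-6):
    """True if bic equals loglikelihood minus penalty, up to rounding."""
    expected = (
        row["rsem.approx.loglikelihood"]
        - row["rsem.approx.loglikelihood.penalty"]
    )
    return math.isclose(
        row["rsem.approx.bic"], expected, rel_tol=rel_tol, abs_tol=abs_tol
    )
```

For example, a row with loglikelihood -1000.0, penalty 50.0 and bic -1050.0 (made-up values) passes the check.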

RSEM eval columns

These columns come from expression.eval in RSEM's output.
rsem.eval.lognumer.minus.logdenom I believe that this is \log P(D|\theta') + \log P(\theta') - \log P(\theta'|D), where \theta' is a posterior mean estimate.
rsem.eval.logprior I believe that this is \log P(\theta').
rsem.eval.loglikelihood I believe that this is \log P(D|\theta').
rsem.eval.logdenom I believe that this is \log P(\theta'|D).
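
If these readings are right, the first column is an estimate of the log model evidence, because Bayes' theorem, rearranged, holds for every \theta' with nonzero posterior density:

```latex
\log P(D) = \log P(D|\theta') + \log P(\theta') - \log P(\theta'|D)
```

If so, it should be the case that rsem.eval.lognumer.minus.logdenom = rsem.eval.loglikelihood + rsem.eval.logprior - rsem.eval.logdenom.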

RSEM prior columns

These columns come from expression.prior in RSEM's output.
rsem.prior.log.prob.M This is \log P(M).
rsem.prior.log.prob.L.given.M This is \log P(L|M).
rsem.prior.log.prob.Sequences.given.L.and.M This is \log P(Sequences|L, M).
rsem.prior.log.prob.A This is \log P(A) = \log P(M) + \log P(L|M) + \log P(Sequences|L, M).
rsem.eval.loglikelihood.plus.rsem.prior.log.prob.A This is \log P(A) + rsem.eval.loglikelihood.
rsem.approx.loglikelihood.plus.rsem.prior.log.prob.A This is \log P(A) + rsem.approx.loglikelihood.
rsem.approx.approx.plus.rsem.prior.log.prob.A This is \log P(A) + rsem.approx.approx.
rsem.approx.bic.plus.rsem.prior.log.prob.A This is \log P(A) + rsem.approx.bic.
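
The derived columns in this section are plain sums, which the following Python sketch spells out. The function name add_prior_columns is hypothetical, and row is assumed to be a dict that already holds the three prior terms and the four rsem.eval / rsem.approx columns being combined.

```python
def add_prior_columns(row):
    """Return a copy of row with log P(A) and the four '.plus.' columns."""
    out = dict(row)
    # log P(A) = log P(M) + log P(L|M) + log P(Sequences|L, M)
    log_prob_a = (
        row["rsem.prior.log.prob.M"]
        + row["rsem.prior.log.prob.L.given.M"]
        + row["rsem.prior.log.prob.Sequences.given.L.and.M"]
    )
    out["rsem.prior.log.prob.A"] = log_prob_a
    for base in (
        "rsem.eval.loglikelihood",
        "rsem.approx.loglikelihood",
        "rsem.approx.approx",
        "rsem.approx.bic",
    ):
        out[base + ".plus.rsem.prior.log.prob.A"] = log_prob_a + row[base]
    return out
```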

RSEM ss columns

These columns come from expression.ss in RSEM's output.
rsem.ss.mean.num.reads.per.transcript Mean number of reads aligning to each transcript (based on countvs).
rsem.ss.median.num.reads.per.transcript Median number of reads aligning to each transcript (based on countvs).
rsem.ss.num.transcripts.with.zero.reads Number of transcripts with no reads aligning to them (based on countvs).
rsem.ss.num.matching.bases Number of matching bases, based on the qpro profiles.
rsem.ss.num.mismatching.bases Number of mismatching bases, based on the qpro profiles.