Columns of assembly statistics

assembly.N25	the length L such that 25% of all bases in the assembly are in contigs of length less than L.
assembly.N50	the length L such that 50% of all bases in the assembly are in contigs of length less than L.
assembly.N75	the length L such that 75% of all bases in the assembly are in contigs of length less than L.
assembly.longest	the length of the longest contig in the assembly.
assembly.mean	the mean contig length in the assembly.
assembly.median	the median contig length in the assembly.
assembly.shortest	the length of the shortest contig in the assembly.
assembly.num.contigs	the number of contigs in the assembly.
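
As a rough illustration of the columns above, the following sketch computes them from a list of contig lengths. The function names nxx and assembly_stats are hypothetical. The boundary handling is also an assumption: the sketch accumulates contigs from shortest to longest and returns the first length at which the running total reaches the requested fraction of bases. The actual tool may treat ties and boundaries differently.

```python
from statistics import mean, median


def nxx(contig_lengths, fraction):
    """Smallest length L such that contigs of length <= L hold at least
    `fraction` of all bases (assumed boundary convention)."""
    total = sum(contig_lengths)
    running = 0
    # Walk the contigs from shortest to longest, accumulating bases.
    for length in sorted(contig_lengths):
        running += length
        if running >= fraction * total:
            return length
    raise ValueError("contig_lengths must be non-empty and fraction in (0, 1]")


def assembly_stats(contig_lengths):
    """Return the eight assembly.* columns for a list of contig lengths."""
    return {
        "assembly.N25": nxx(contig_lengths, 0.25),
        "assembly.N50": nxx(contig_lengths, 0.50),
        "assembly.N75": nxx(contig_lengths, 0.75),
        "assembly.longest": max(contig_lengths),
        "assembly.mean": mean(contig_lengths),
        "assembly.median": median(contig_lengths),
        "assembly.shortest": min(contig_lengths),
        "assembly.num.contigs": len(contig_lengths),
    }
```

With this convention N25 <= N50 <= N75, because a larger fraction of bases is only reached at a longer contig.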

Columns relating oracleset to assembly, based on "matches"

Let A and B be assemblies or oraclesets. Let s be a contig in A and t be a contig in B. We define "s matches t" to mean that:

s has at most 5% insertions or deletions with respect to t.
s has at least 95% sequence identity with t.

These percentages are measured with respect to the length of t.

We define "num B in A" as follows. Let

C = \{(s, t)\in A\times B : s matches t\}.

Enumerate C according to the percent identity of s with t, and secondarily according to the negative of the percent insdel of s with respect to t. For each pair (s,t), if either s or t has already been seen earlier in the enumeration, throw out (s,t). Let "num B in A" be the cardinality of the remaining set.

The above definitions follow the Trinity paper.

We define "s matches t without checking insdel" to mean that:

s has at least 95% sequence identity with t.

This percentage is measured with respect to the length of t.

We define "num B in A without checking insdel" as above, but replacing "s matches t" with "s matches t without checking insdel".
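
The enumeration that defines "num B in A" can be sketched as follows. This is a minimal sketch under several assumptions, not the tool's actual code. The function name num_b_in_a and the input layout are hypothetical: a list of (s, t, identity, indel) tuples with both values given as fractions of the length of t. The pairs are assumed to be enumerated best first (highest identity, then lowest indel fraction). A contig is treated as "seen" once it appears in any earlier pair, kept or not, which is the literal reading of the definition. Marking only the contigs of kept pairs would be another reading and can give a larger count.

```python
def num_b_in_a(candidates, check_indel=True, min_identity=0.95, max_indel=0.05):
    """Count the pairs kept by the enumeration described above.

    candidates: iterable of (s, t, identity, indel), with s from A and t
    from B, and identity/indel as fractions of the length of t.
    """
    # Keep only the pairs where s matches t (optionally ignoring indels).
    matches = [
        c for c in candidates
        if c[2] >= min_identity and (not check_indel or c[3] <= max_indel)
    ]
    # Assumed order: highest identity first, then lowest indel fraction.
    matches.sort(key=lambda c: (-c[2], c[3]))

    seen_s, seen_t = set(), set()
    kept = 0
    for s, t, _identity, _indel in matches:
        if s not in seen_s and t not in seen_t:
            kept += 1
        # Literal reading: a contig counts as "seen" once it appears in any
        # earlier pair, whether or not that pair was kept.
        seen_s.add(s)
        seen_t.add(t)
    return kept
```

Passing check_indel=False gives the "without checking insdel" variant, and dividing the result by the number of contigs in B gives the corresponding frac column.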

num.oracleset.in.assembly	num oracleset in assembly
frac.oracleset.in.assembly	(num oracleset in assembly)/|oracleset|
num.assembly.in.oracleset	num assembly in oracleset
frac.assembly.in.oracleset	(num assembly in oracleset)/|assembly|
num.oracleset.in.assembly.without.check.insdel	num oracleset in assembly without checking insdel
frac.oracleset.in.assembly.without.check.insdel	(num oracleset in assembly without checking insdel)/|oracleset|
num.assembly.in.oracleset.without.check.insdel	num assembly in oracleset without checking insdel
frac.assembly.in.oracleset.without.check.insdel	(num assembly in oracleset without checking insdel)/|assembly|

Columns relating oracleset to assembly, based on "allmatches"

Let "s matches t" be as defined in the previous section. Let

C = \{t \in B : there exists s \in A such that s matches t\}.

We define "num B in A via allmatches" to be the cardinality of C.

Let "s matches t without checking insdel" be as defined in the previous section. We define "num B in A via allmatches without checking insdel" as above (for "num B in A via allmatches"), but replacing "s matches t" with "s matches t without checking insdel".
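
In contrast to the one-to-one enumeration of the previous section, the allmatches count only asks whether each t has at least one matching s. A minimal sketch, assuming the same hypothetical input layout of (s, t, identity, indel) tuples with values given as fractions of the length of t. The function name is also hypothetical.

```python
def num_b_in_a_allmatches(candidates, check_indel=True,
                          min_identity=0.95, max_indel=0.05):
    """Count the contigs t in B that are matched by at least one s in A."""
    matched_t = {
        t
        for _s, t, identity, indel in candidates
        if identity >= min_identity and (not check_indel or indel <= max_indel)
    }
    return len(matched_t)
```

Because several t may share the same matching s here, this count is at least as large as the "num B in A" count of the previous section on the same input.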

allmatches.num.oracleset.in.assembly	num oracleset in assembly via allmatches
allmatches.frac.oracleset.in.assembly	(num oracleset in assembly via allmatches)/|oracleset|
allmatches.num.assembly.in.oracleset	num assembly in oracleset via allmatches
allmatches.frac.assembly.in.oracleset	(num assembly in oracleset via allmatches)/|assembly|
allmatches.num.oracleset.in.assembly.without.check.insdel	num oracleset in assembly via allmatches without checking insdel
allmatches.frac.oracleset.in.assembly.without.check.insdel	(num oracleset in assembly via allmatches without checking insdel)/|oracleset|
allmatches.num.assembly.in.oracleset.without.check.insdel	num assembly in oracleset via allmatches without checking insdel
allmatches.frac.assembly.in.oracleset.without.check.insdel	(num assembly in oracleset via allmatches without checking insdel)/|assembly|

RSEM approx columns

These columns come from expression.approx in RSEM's output.

rsem.approx.approx	I believe that this is the log model evidence, log P(D), computed using a convex approximation.
rsem.approx.bic	Log model evidence, log P(D), computed using BIC.
rsem.approx.loglikelihood	I believe that this is the log likelihood, log P(D|\theta), computed at the MAP \theta.
rsem.approx.loglikelihood.penalty	This is the BIC penalty.

It should be the case that rsem.approx.bic = rsem.approx.loglikelihood - rsem.approx.loglikelihood.penalty.
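
The relation above can be used as a sanity check on a row of the table. A small sketch, assuming a row is available as a dict keyed by column name. The function name and the tolerances are hypothetical choices.

```python
import math


def bic_is_consistent(row, rel_tol=1e-9, abs_tol=1e-6):
    """Check that bic equals loglikelihood minus penalty, up to rounding."""
    expected = (row["rsem.approx.loglikelihood"]
                - row["rsem.approx.loglikelihood.penalty"])
    return math.isclose(row["rsem.approx.bic"], expected,
                        rel_tol=rel_tol, abs_tol=abs_tol)
```

A tolerance is used rather than exact equality because the values are presumably written out with limited precision.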

RSEM eval columns

These columns come from expression.eval in RSEM's output.

rsem.eval.lognumer.minus.logdenom	I believe that this is \log P(D|\theta') + \log P(\theta') - \log P(\theta'|D), where \theta' is a posterior mean estimate.
rsem.eval.logprior	I believe that this is \log P(\theta').
rsem.eval.loglikelihood	I believe that this is \log P(D|\theta').
rsem.eval.logdenom	I believe that this is \log P(\theta'|D).
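
If the interpretations above are right, the first column is Bayes' rule rearranged for the model evidence. The identity holds for any parameter value, not only for \theta'. A short derivation, using only the symbols already defined in this section:

```latex
P(\theta' \mid D) = \frac{P(D \mid \theta')\, P(\theta')}{P(D)}
\quad\Longrightarrow\quad
\log P(D) = \log P(D \mid \theta') + \log P(\theta') - \log P(\theta' \mid D)
```

In column terms this would read rsem.eval.lognumer.minus.logdenom = rsem.eval.loglikelihood + rsem.eval.logprior - rsem.eval.logdenom. That relation is inferred from the descriptions above and has not been checked against RSEM's output.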

RSEM prior columns

These columns come from expression.prior in RSEM's output.

rsem.prior.log.prob.M	This is \log P(M).
rsem.prior.log.prob.L.given.M	This is \log P(L|M).
rsem.prior.log.prob.Sequences.given.L.and.M	This is \log P(Sequences|L, M).
rsem.prior.log.prob.A	This is \log P(A) = \log P(M) + \log P(L|M) + \log P(Sequences|L, M).
rsem.eval.loglikelihood.plus.rsem.prior.log.prob.A	This is \log P(A) + rsem.eval.loglikelihood
rsem.approx.loglikelihood.plus.rsem.prior.log.prob.A	This is \log P(A) + rsem.approx.loglikelihood
rsem.approx.approx.plus.rsem.prior.log.prob.A	This is \log P(A) + rsem.approx.approx
rsem.approx.bic.plus.rsem.prior.log.prob.A	This is \log P(A) + rsem.approx.bic
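
The derived columns in this section are plain sums of other columns. A minimal sketch of how they could be recomputed, assuming a row is available as a dict keyed by column name. The function name add_prior_columns is hypothetical.

```python
def add_prior_columns(row):
    """Return a copy of row with log P(A) and the four '.plus.' columns."""
    out = dict(row)
    # log P(A) is the sum of the three prior terms.
    log_prob_a = (row["rsem.prior.log.prob.M"]
                  + row["rsem.prior.log.prob.L.given.M"]
                  + row["rsem.prior.log.prob.Sequences.given.L.and.M"])
    out["rsem.prior.log.prob.A"] = log_prob_a
    # Each combined column adds log P(A) to one of the score columns.
    for base in ("rsem.eval.loglikelihood", "rsem.approx.loglikelihood",
                 "rsem.approx.approx", "rsem.approx.bic"):
        out[base + ".plus.rsem.prior.log.prob.A"] = row[base] + log_prob_a
    return out
```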

RSEM ss columns

These columns come from expression.ss in RSEM's output.

rsem.ss.mean.num.reads.per.transcript	Mean number of reads aligning to each transcript (based on countvs).
rsem.ss.median.num.reads.per.transcript	Median number of reads aligning to each transcript (based on countvs).
rsem.ss.num.transcripts.with.zero.reads	Number of transcripts with no reads aligning to them (based on countvs).
rsem.ss.num.matching.bases	Number of matching bases, based on the qpro profiles.
rsem.ss.num.mismatching.bases	Number of mismatching bases, based on the qpro profiles.