Experimental procedure

Document recovery is a difficult problem, and results are sensitive to document length. To compare our methods more effectively, we created synthetic documents of varying lengths from each original document and tested on these synthetic documents. Each synthetic document consists of contiguous real sentences from one original document, starting at a random sentence in the original. If the original document had fewer than sentences, we took the entire document; for , this occurred 65 out of 100 times.
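The sampling step described above can be sketched as follows. This is a minimal illustration, not our actual code: the function name `make_synthetic_document` and its parameters are hypothetical, and we assume here that the random start is restricted so that the full window of sentences fits inside the original, which the description above does not state.

```python
import random


def make_synthetic_document(sentences, length, rng):
    """Return `length` contiguous sentences from `sentences`.

    If the original has no more than `length` sentences, the whole
    document is returned unchanged.
    """
    if len(sentences) <= length:
        return list(sentences)
    # Assumption: choose the start so that the window fits entirely
    # inside the original document.
    start = rng.randrange(len(sentences) - length + 1)
    return sentences[start:start + length]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sample is reproducible
    original = ["Sentence %d." % i for i in range(10)]
    print(make_synthetic_document(original, 4, rng))
```

Passing the random generator explicitly keeps the sampled sets reproducible across runs, which matters when the same synthetic documents are used to compare several methods.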

To evaluate the results of document recovery on each set of synthetic documents, we use BLEU 4 [Papineni et al. 2001], comparing the recovered documents to the synthetic documents. All reported BLEU scores are averages within a set of synthetic documents for a single and domain. In all our experiments, we set $\alpha=0.4$ and $\lambda=1000$. We use the small stopword list in [Manning and Schütze 1999], which has 114 word types, 57 uppercase and 57 lowercase.
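For concreteness, the scoring and averaging can be sketched as below. This is an illustrative implementation under stated assumptions, not the scorer used for the reported numbers: it computes unsmoothed, single-reference BLEU with n-grams up to length 4 over pre-tokenized word lists, and the names `bleu4` and `mean_bleu` are hypothetical. Tokenization and any smoothing are not specified above, so results from this sketch need not match the reported scores.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Unsmoothed single-reference BLEU with n-grams up to length 4."""
    if not candidate:
        return 0.0
    log_precision = 0.0
    for n in range(1, 5):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: any zero precision gives score 0
        log_precision += math.log(clipped / total) / 4
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision)


def mean_bleu(pairs):
    """Average per-document BLEU over (recovered, synthetic) pairs."""
    scores = [bleu4(recovered, synthetic) for recovered, synthetic in pairs]
    return sum(scores) / len(scores)
```

Here each pair holds a recovered document and the synthetic document it is compared against, and `mean_bleu` expects a non-empty list of such pairs from one set.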

Nathanael Fillmore 2008-07-18