Examples of a few recovered documents are shown in Table 1. Tables 2 and 3 show the results of our experiments, measured by BLEU score, across different combinations of index types, heuristics, pruning strategies, domains, and document lengths. We make the following observations (illustrative sketches of the index construction, the search loop, and the BLEU evaluation follow the list):
1. The results show two conditions under which we can recover documents with good success: (i) if the original document is short (the short-document rows of Table 2), or (ii) if the index is a bigram count vector (Table 2, ``bigram'' column). Long documents indexed by a unigram BOW are much more difficult to recover. This makes intuitive sense: when the original document is short or the index preserves ordering constraints, the feasible set is small, which makes recovery easier.
2. Among unigram BOWs, the index type affects the recovery rate. It is easiest to recover documents from count BOWs, somewhat harder to recover from indicator BOWs, and hardest of all from stopwords-removed count BOWs (Table 2, ``counts,'' ``stopwords,'' and ``indicator'' columns); the first sketch after this list illustrates these index types. The fact that we must infer the document length from the BOW contributes to the difficulty of the latter two index types: when we artificially substitute the true original document length for our estimated length, recovery improves, especially for short documents, where each word is relatively more important (Table 2, estimated-length vs. true-length columns).
3. The domains vary in recovery difficulty. The medical and stock domains appear to be the easiest (corresponding rows in Table 2). This may be because both are more similar to general Web text than, for example, Switchboard, and our language model is trained on Web text.
4. Finally, Table 3 shows that our choice of heuristic and pruning strategy affects recovery. The empirical heuristic performs consistently better than the admissible heuristic. As for pruning strategies, one is substantially worse than the other two, which yield roughly equally good results. However, it is worth noting that A* search is much faster under the pruning strategy of Table 3's third column than under that of its second column: producing the third column took about an hour, while producing the second took over a day. (A schematic version of the search loop is sketched below.)
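To make the index types compared above concrete, the following minimal sketch builds each of them from a tokenized document. The tokenization and stopword list here are placeholder assumptions for illustration and are not the ones used in our experiments.

\begin{verbatim}
from collections import Counter

# Placeholder stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "was"}

def build_indices(tokens):
    """Construct the four index types from a list of word tokens."""
    counts = Counter(tokens)                    # unigram count BOW
    indicator = {w: 1 for w in counts}          # indicator BOW (counts discarded)
    no_stop = Counter(w for w in tokens         # stopwords-removed count BOW
                      if w not in STOPWORDS)
    bigrams = Counter(zip(tokens, tokens[1:]))  # bigram count vector (keeps ordering cues)
    return counts, indicator, no_stop, bigrams

tokens = "the patient was given the prescribed dose".split()
counts, indicator, no_stop, bigrams = build_indices(tokens)

# Only the full count BOW reveals the true document length directly;
# from the indicator or stopwords-removed BOWs it must be estimated.
assert sum(counts.values()) == len(tokens)
\end{verbatim}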
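The next sketch shows, in schematic form, how an A* reconstruction search over a unigram count BOW can be organized. The step cost, heuristic, and pruning rule (a simple cap on expansions per prefix length) are illustrative stand-ins, not the empirical or admissible heuristics or the pruning strategies compared in Table 3.

\begin{verbatim}
import heapq

def reconstruct(bow, step_cost, heuristic, beam=100):
    """Illustrative A* search for a word order consistent with a count BOW.

    bow: dict mapping word -> count.
    step_cost(prefix, word): non-negative cost of appending word to prefix.
    heuristic(remaining): estimated cost of placing the remaining words.
    """
    total = sum(bow.values())
    frontier = [(heuristic(bow), 0.0, (), tuple(sorted(bow.items())))]
    expanded = [0] * total                 # expansions performed at each prefix length
    while frontier:
        f, g, prefix, remaining = heapq.heappop(frontier)
        if not remaining:
            return list(prefix)            # all words placed: a complete candidate
        if expanded[len(prefix)] >= beam:
            continue                       # pruning: this depth is saturated
        expanded[len(prefix)] += 1
        rem = dict(remaining)
        for word in rem:
            new_rem = dict(rem)
            new_rem[word] -= 1
            if new_rem[word] == 0:
                del new_rem[word]
            new_g = g + step_cost(prefix, word)
            heapq.heappush(frontier, (new_g + heuristic(new_rem), new_g,
                                      prefix + (word,), tuple(sorted(new_rem.items()))))
    return None                            # frontier exhausted

# Toy usage with a uniform cost model, purely to exercise the search.
print(reconstruct({"hello": 1, "world": 1},
                  step_cost=lambda prefix, w: 1.0,
                  heuristic=lambda rem: float(sum(rem.values()))))
\end{verbatim}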
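Finally, the snippet below shows one way to compute a BLEU score between an original and a recovered document, using NLTK's sentence-level BLEU; the exact BLEU configuration behind Tables 2 and 3 (n-gram order, smoothing, corpus- vs. sentence-level scoring) may differ.

\begin{verbatim}
# Requires NLTK (pip install nltk).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def recovery_bleu(original, recovered):
    """Sentence-level BLEU of a recovered document against the original."""
    smooth = SmoothingFunction().method1   # avoid zero scores on short documents
    return sentence_bleu([original.split()], recovered.split(),
                         smoothing_function=smooth)

print(recovery_bleu("the patient was given the prescribed dose",
                    "the patient was given a prescribed dose"))
\end{verbatim}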