Examples of a few recovered documents are shown in Table 1. Tables 2 and 3 show the results of our experiments, measured by BLEU score, across different combinations of index types, heuristics, pruning strategies, domains, and document lengths. We make the following observations (illustrative sketches of the index construction, the search loop, and the BLEU evaluation follow the list):
1. The results show two conditions under which we can recover documents with good success: (i) if the original document is short (the short-document rows of Table 2), or (ii) if the index is a bigram count vector (Table 2, ``bigram'' column). Long documents indexed by a unigram BOW are much more difficult to recover. This makes intuitive sense: when the original document is short or the index preserves ordering constraints, the feasible set is small, which makes recovery easier.
2. Among unigram BOWs, the index type affects the recovery rate. It is easiest to recover documents from count BOWs, somewhat harder to recover from indicator BOWs, and hardest of all from stopwords-removed count BOWs (Table 2, ``counts,'' ``stopwords,'' and ``indicator'' columns); the first sketch after this list illustrates these index types. The fact that we must infer the document length from the BOW contributes to the difficulty of the latter two index types: when we artificially substitute the true original document length for our estimated length, recovery improves, especially for short documents, where each word is relatively more important (Table 2, estimated-length vs. true-length columns).
3. The domains vary in recovery difficulty. The medical and stock domains appear to be the easiest (corresponding rows in Table 2). This may be because both are more similar to general Web text than, for example, Switchboard, and our language model is trained on Web text.
4. Finally, Table 3 shows that our choice of heuristic and pruning strategy affects recovery. The empirical heuristic performs consistently better than the admissible heuristic. As for pruning strategies, one is substantially worse than the other two, which yield roughly equally good results. However, it is worth noting that A* search is much faster under the pruning strategy of Table 3's third column than under that of its second column: producing the third column took about an hour, while producing the second took over a day. (A schematic version of the search loop is sketched below.)
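To make the index types compared above concrete, the following minimal sketch builds each of them from a tokenized document. The tokenization and stopword list here are placeholder assumptions for illustration and are not the ones used in our experiments.

\begin{verbatim}
from collections import Counter

# Placeholder stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "was"}

def build_indices(tokens):
    """Construct the four index types from a list of word tokens."""
    counts = Counter(tokens)                    # unigram count BOW
    indicator = {w: 1 for w in counts}          # indicator BOW (counts discarded)
    no_stop = Counter(w for w in tokens         # stopwords-removed count BOW
                      if w not in STOPWORDS)
    bigrams = Counter(zip(tokens, tokens[1:]))  # bigram count vector (keeps ordering cues)
    return counts, indicator, no_stop, bigrams

tokens = "the patient was given the prescribed dose".split()
counts, indicator, no_stop, bigrams = build_indices(tokens)

# Only the full count BOW reveals the true document length directly;
# from the indicator or stopwords-removed BOWs it must be estimated.
assert sum(counts.values()) == len(tokens)
\end{verbatim}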
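The next sketch shows, in schematic form, how an A* reconstruction search over a unigram count BOW can be organized. The step cost, heuristic, and pruning rule (a simple cap on expansions per prefix length) are illustrative stand-ins, not the empirical or admissible heuristics or the pruning strategies compared in Table 3.

\begin{verbatim}
import heapq

def reconstruct(bow, step_cost, heuristic, beam=100):
    """Illustrative A* search for a word order consistent with a count BOW.

    bow: dict mapping word -> count.
    step_cost(prefix, word): non-negative cost of appending word to prefix.
    heuristic(remaining): estimated cost of placing the remaining words.
    """
    total = sum(bow.values())
    frontier = [(heuristic(bow), 0.0, (), tuple(sorted(bow.items())))]
    expanded = [0] * total                 # expansions performed at each prefix length
    while frontier:
        f, g, prefix, remaining = heapq.heappop(frontier)
        if not remaining:
            return list(prefix)            # all words placed: a complete candidate
        if expanded[len(prefix)] >= beam:
            continue                       # pruning: this depth is saturated
        expanded[len(prefix)] += 1
        rem = dict(remaining)
        for word in rem:
            new_rem = dict(rem)
            new_rem[word] -= 1
            if new_rem[word] == 0:
                del new_rem[word]
            new_g = g + step_cost(prefix, word)
            heapq.heappush(frontier, (new_g + heuristic(new_rem), new_g,
                                      prefix + (word,), tuple(sorted(new_rem.items()))))
    return None                            # frontier exhausted

# Toy usage with a uniform cost model, purely to exercise the search.
print(reconstruct({"hello": 1, "world": 1},
                  step_cost=lambda prefix, w: 1.0,
                  heuristic=lambda rem: float(sum(rem.values()))))
\end{verbatim}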
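Finally, the snippet below shows one way to compute a BLEU score between an original and a recovered document, using NLTK's sentence-level BLEU; the exact BLEU configuration behind Tables 2 and 3 (n-gram order, smoothing, corpus- vs. sentence-level scoring) may differ.

\begin{verbatim}
# Requires NLTK (pip install nltk).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def recovery_bleu(original, recovered):
    """Sentence-level BLEU of a recovered document against the original."""
    smooth = SmoothingFunction().method1   # avoid zero scores on short documents
    return sentence_bleu([original.split()], recovered.split(),
                         smoothing_function=smooth)

print(recovery_bleu("the patient was given the prescribed dose",
                    "the patient was given a prescribed dose"))
\end{verbatim}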