Results

Examples of a few recovered $N=1$ documents are shown in Table 1. Tables 2 and 3 report BLEU scores for our experiments across combinations of index types, heuristics, pruning strategies, domains, and document lengths. We make the following observations:

1. The results show two conditions under which we can recover documents with good success: (i) the original document is short (Table 2, ``$N=1$'' row), or (ii) the index is a bigram count vector (Table 2, ``bigram'' column). Long documents indexed by a unigram BOW are much more difficult to recover. This makes intuitive sense: when the original document is short, or when the index preserves ordering constraints as a bigram count vector does, the feasible set of candidate documents is small, which makes recovery easier.

2. Among unigram BOWs, the index type affects the recovery rate. It is easiest to recover documents from count BOWs, somewhat harder to recover from indicator BOWs, and hardest of all from stopwords-removed count BOWs (Table 2, ``counts,'' ``stopwords,'' and ``indicator'' columns). Part of the difficulty of the latter two index types is that we must infer the document length from the BOW: when we artificially substitute the true original document length for our estimated length, recovery improves, especially for short documents, where each word is relatively more important (Table 2, ``$n$'' vs. ``$\hat{n}$'' columns).

3. The domains vary in recovery difficulty. The medical and stock domains appear to be the easiest (rows in Table 2). This may be because both are more similar to general Web text than, for example, Switchboard, and our language model is trained on Web text.

4. Finally, Table 3 shows that the choice of heuristic and pruning strategy affects recovery. The empirical heuristic performs consistently better than the admissible heuristic. Among pruning strategies, $g+h$ is substantially the worst, while the other two, $\lambda l+g+h$ and $(g+h)/l$, yield roughly equally good results. However, A$^*$ search is much faster with $\lambda l+g+h$ than with $(g+h)/l$: producing the third column of Table 3 took about an hour, while the second column took over a day.
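Observation 2 hinges on estimating the document length $\hat{n}$ from the index itself. A minimal sketch of how such an estimate might be made, assuming a BOW stored as a word-to-count (or word-to-1) dictionary; the function name and scaling constants here are illustrative assumptions, not the paper's actual length model:

```python
def n_hat(bow, index_type, stopword_rate=0.4, tokens_per_type=1.3):
    """Estimate document length n-hat from a BOW index (illustrative sketch)."""
    if index_type == "counts":
        # full count vector: the length is known exactly
        return sum(bow.values())
    if index_type == "stopwords":
        # counts with stopwords removed: scale up by the assumed
        # fraction of tokens that are content words
        return round(sum(bow.values()) / (1.0 - stopword_rate))
    if index_type == "indicator":
        # 0/1 vector: multiply the number of distinct word types by an
        # assumed average number of tokens per type
        return round(len(bow) * tokens_per_type)
    raise ValueError("unknown index type: %s" % index_type)
```

With a count BOW the estimate is exact, which is consistent with the observation that count BOWs are the easiest unigram index to invert.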
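The three pruning strategies in observation 4 differ only in how a partial hypothesis is ranked in the A$^*$ priority queue, with $g$ the cost incurred so far, $h$ the heuristic estimate of the remaining cost, and $l$ the hypothesis length in words. A hypothetical sketch of the three priority functions (lower priority is expanded first; the function names and the default $\lambda$ are assumptions for illustration):

```python
def priority_plain(g, h, l):
    # plain A* priority: f = g + h
    return g + h

def priority_normalized(g, h, l):
    # length-normalized priority: (g + h) / l, so hypotheses of
    # different lengths are compared by per-word cost
    return (g + h) / l

def priority_length_reward(g, h, l, lam=-1.0):
    # length-rewarded priority: lambda * l + g + h; a negative lambda
    # gives a per-word bonus, pushing longer hypotheses forward
    return lam * l + g + h
```

Both non-plain strategies favor longer partial hypotheses, which plausibly explains why they prune the search space far more effectively than plain $g+h$.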


Table: BLEU 4 scores for synthetic subsets of all domains and for each index type. In all cases, the heuristic $h^{\mathrm{emp}}$ and the pruning strategy $\lambda l+g+h$ are used.
$N$  domain    counts   baseline  stopwords, $n$  stopwords, $\hat n$  indicator, $n$  indicator, $\hat n$  bigram
 1   medical   0.5774   0.0821    0.3998          0.3427               0.5590          0.3971               0.9294
     CIA       0.3704   0.1338    0.2525          0.2178               0.3541          0.2584
     email     0.4845   0.1340    0.3639          0.2575               0.4242          0.3393
     stock     0.5329   0.1023    0.4279          0.4282               0.4099          0.3329
     SW        0.4793   0.1280    0.2866          0.2314               0.4341          0.3534

     medical   0.4389   0.0972    0.2957          0.2521               0.3683          0.2885               0.7336
     CIA       0.3348   0.0867    0.2334          0.2148               0.2689          0.2187
     email     0.4366   0.1949    0.3595          0.3304               0.3634          0.3216
     stock     0.4768   0.1125    0.3522          0.3240               0.3671          0.3313
     SW        0.3882   0.0700    0.2163          0.1923               0.2900          0.2751

     medical   0.3363   0.0876    0.2413          0.2254               0.1997          0.2001               0.6902
     CIA       0.2381   0.0630    0.1803          0.1845               0.1631          0.1644
     email     0.2822   0.0934    0.1894          0.1742               0.1982          0.2021
     stock     0.3570   0.0787    0.2826          0.2624               0.2002          0.1896
     SW        0.1904   0.0815    0.1293          0.1135               0.0983          0.0965

     medical   0.2852   0.0835    0.2216          0.2038               0.1514          0.1543               0.6739
     CIA       0.1835   0.0682    0.1570          0.1427               0.1048          0.1050
     email     0.2239   0.1104    0.1738          0.1621               0.1501          0.1513
     stock     0.2713   0.0891    0.2190          0.2129               0.1291          0.1192
     SW        0.1352   0.0617    0.1075          0.0997               0.0493          0.0449

     medical   0.2823   0.0839    0.2200          0.2003               0.1531          0.1479               0.6687
     CIA       0.1737   0.0771    0.1515          0.1355               0.0866          0.0736
     email     0.1928   0.1005    0.1556          0.1458               0.1155          0.1181
     stock     0.2533   0.0878    0.2015          0.2090               0.1024          0.1148
     SW        0.1239   0.0602    0.0989          0.0937               0.0285          0.0189
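All of the scores above are BLEU 4. For concreteness, a minimal single-reference BLEU 4 sketch in pure Python, assuming tokenized input and no smoothing; this is an illustrative sketch, not necessarily the exact scorer used in these experiments:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # multiset of n-grams in a token sequence
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate, reference):
    """Unsmoothed single-reference BLEU 4: geometric mean of modified
    1..4-gram precisions, times a brevity penalty."""
    precisions = []
    for n in range(1, 5):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
    log_avg = sum(math.log(p) for p in precisions) / 4
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_avg)
```

A perfectly recovered document scores 1.0, while any candidate sharing no 4-gram with the original scores 0.0 under this unsmoothed variant.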



Table: BLEU 4 scores for synthetic subsets of the medical domain, with count BOW, for each heuristic and pruning strategy presented in Section 4. These synthetic subsets are a different random sample than those in Table 2.
$N$   Admissible $h$                            Empirical $h$
      $g+h$    $(g+h)/l$  $\lambda l+g+h$      $g+h$    $(g+h)/l$  $\lambda l+g+h$
 1    0.3302   0.5714     0.5971               0.4120   0.5975     0.5936
 2    0.2149   0.4267     0.3861               0.2708   0.4673     0.4804
 4    0.2099   0.2575     0.2264               0.2099   0.3586     0.3369
16    0.1510   0.1934     0.1858               0.1847   0.2729     0.2970



Nathanael Fillmore 2008-07-18