
Baseline

We compare our A$^*$ search procedure with a greedy baseline. The baseline constructs a document by repeatedly selecting a word, without replacement, from the count BOW and appending it to the partial document. At each iteration, it selects the remaining word that is most likely given the four preceding words of the current partial document. Formally, each word $w_i$ in the recovered document $\mathbf{d}$ is chosen such that $w_i = \arg\max_{v \in \mathbf{x} \setminus w_1^{i-1}} \log p(v \mid w_{i-4}^{i-1})$, where $\mathbf{x}$ is the count BOW, $w_1^{i-1}$ is the current partial document, and $\mathbf{x} \setminus w_1^{i-1}$ is the multiset of words in $\mathbf{x}$ that have not yet been used.
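The greedy procedure can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments: the function name `greedy_recover`, the `log_prob` callback, the `order` parameter, alphabetical tie-breaking, and the toy scorer `toy_log_prob` with its probability table are all assumptions introduced here. Any language model that returns $\log p(v \mid \text{context})$ could be plugged in instead.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical interface: log_prob(word, context) returns log p(word | context).
LogProb = Callable[[str, Sequence[str]], float]


def greedy_recover(bow: Counter, log_prob: LogProb, order: int = 5) -> List[str]:
    """Greedily order the words of a count BOW into a document.

    At each step, the remaining word with the highest log-probability given
    the previous (order - 1) words is appended and its count is decremented.
    """
    remaining = +Counter(bow)  # copy, dropping any non-positive counts
    doc: List[str] = []
    while remaining:
        # Condition on at most the last (order - 1) words of the partial document.
        context = tuple(doc[-(order - 1):]) if order > 1 else ()
        # Iterating in sorted order makes ties resolve alphabetically (an assumption).
        best = max(sorted(remaining), key=lambda v: log_prob(v, context))
        doc.append(best)
        remaining[best] -= 1  # draw without replacement
        if remaining[best] == 0:
            del remaining[best]
    return doc


def toy_log_prob(word: str, context: Sequence[str]) -> float:
    """Toy scorer for illustration only: it looks at just the last context word."""
    table = {
        ("<s>", "the"): 0.6,
        ("the", "cat"): 0.5,
        ("the", "sat"): 0.1,
        ("cat", "sat"): 0.7,
    }
    prev = context[-1] if context else "<s>"
    return math.log(table.get((prev, word), 0.01))


recovered = greedy_recover(Counter({"the": 1, "cat": 1, "sat": 1}), toy_log_prob)
print(" ".join(recovered))
```

Because each step commits to the locally best word and never backtracks, the baseline evaluates at most one candidate per distinct remaining word at each position, but it gives no guarantee of finding the most likely document overall.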



Nathanael Fillmore 2008-07-18