Let be a document with words. We are interested in four types of indices:
1. Count BOW. The index is a vector , where is the number of times word occurs in and is the size of vocabulary .
2. Stopwords-removed count BOW. The index , where is 0 if , and the number of times occurs in otherwise.
3. Indicator BOW. The index , where is if occurs in , and 0 otherwise.
4. Bigram count vector. The index , where is the number of times the ordered pair occurs in .
For any index , the feasible set is defined as the set of documents that are consistent with , that is, the set of documents each having index : . The document recovery problem is to find the best document in the feasible set given an index: for some score function. For large , may be impractical to compute exactly. In what follows, we define the feasible sets, score functions, and methods of approximating for each index type.