Modeling User Preferences via Theory Refinement


Ben Geisler*, Vu Ha*


*Decision Systems and Artificial Intelligence Lab

Dept. of EE&CS

University of Wisconsin-Milwaukee

{bgeisler, vu}



Peter Haddawy=*


=Computer Science & Information Management

School of Advanced Technologies

Asian Institute of Technology

Bangkok, Thailand



The task of preference elicitation is characterized by the conflicting goals of eliciting a model as accurately as possible and asking as few questions of the user as possible.  To reduce the complexity of elicitation, traditional approaches involve making independence assumptions and then performing elicitation within the constraints of those assumptions.  Inaccurate assumptions can result in inaccurate elicitation.  Ideally, we would like to make assumptions that approximately apply to a large segment of the population and correct for inaccuracies in those assumptions when we encounter an individual to whom they do not apply.  We present an approach to encoding soft assumptions using the KBANN technique.  We show how to encode assumptions concerning preferential independence and monotonicity as Horn-clauses and how to encode these in a KBANN network.  We empirically compare the KBANN network with an unbiased ANN in terms of learning rate and accuracy for preference structures consistent and inconsistent with the domain theory.  Finally, we demonstrate how we can learn a fine-grained preference structure from simple binary classification data.



User modeling, Product brokering, Neural networks, Decision theory



1. Introduction

Intelligent user interfaces increasingly require accurate representation of user interests and preferences.  Applications range from adaptable software and operating systems to Internet-based product brokering.  But accurately eliciting a preference model can be highly time consuming.  Thus we are faced with the conflicting goals of eliciting a preference model as accurately as possible and asking as few questions of the user as possible. 


Representation and elicitation of preferences have long been studied in Decision Theory.  In Multi-Attribute Utility Theory (Keeney & Raiffa 1976) a set of items is typically described in terms of a set of attributes, with an item being a complete assignment of values to all the attributes.  In domains containing no uncertainty, user preferences over a set of items can be represented by a value function that assigns a real number to each item, resulting in a total ordering over the items.  Given a set of items described in terms of a set of attributes {X1, …, Xn}, a function v, which associates a real number v(x) with each point x ∈ X, is said to be a value function representing the decision maker's preferences provided that:

                (x1′, x2′, …, xn′) ~ (x1″, x2″, …, xn″) ⇔ v(x1′, x2′, …, xn′) = v(x1″, x2″, …, xn″)

and         (x1′, x2′, …, xn′) ≺ (x1″, x2″, …, xn″) ⇔ v(x1′, x2′, …, xn′) < v(x1″, x2″, …, xn″),


where x′ ~ x″ means that the decision maker is indifferent between the two outcomes x′ and x″, and x′ ≺ x″ means that the decision maker prefers outcome x″ to outcome x′.  When the number of items is large, eliciting a high-dimensional value function can be impractical, so practitioners attempt to identify assumptions that permit a value function to be represented in terms of a combination of lower-dimensional sub-value functions.  The most common assumption is preferential independence.  Given a set of attributes {X1, …, Xn}, a subset of attributes Y is preferentially independent of its complement set Z if and only if the conditional preferences in the y space given a fixed value z′ for the attribute set Z do not depend on z′.  The attributes X1, …, Xn are mutually preferentially independent if every subset Y is preferentially independent of its complementary set of attributes.  Given a set of attributes {X1, …, Xn}, n ≥ 3, an additive value function v(x1, …, xn) = λ1v1(x1) + … + λnvn(xn) exists if and only if the attributes are mutually preferentially independent.  Elicitation of an additive value function typically proceeds by determining the sub-value functions vi and then the scaling constants λi.  The scaling constants can be interpreted as representing the tradeoffs between levels of the different attributes, e.g. cost vs. time.  In cases of continuous attributes, a user will often have monotonic preferences over the values of the attributes, making the sub-value functions easy to define.  But determining the tradeoff coefficients can require significant introspection.


As a running example, we will use the domain of airline flight selection.  Suppose we have a user who wishes to find a flight from Milwaukee to San Francisco.  We might describe flights by three attributes: time (T), cost (C), and whether or not there is a layover involved (L).  Time and cost are continuous, while layover is binary.  We can assume that the user prefers shorter flights, cheaper flights, and flights without layovers.  Thus we need only assess the tradeoff coefficients that indicate how much the decision maker is willing to trade off the level of one attribute for a more desirable level of another.  A resulting value function might be v(t,c,l) = 100t + c + 50l.
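As a concrete illustration, this value function can be coded directly.  Since the user prefers shorter, cheaper, layover-free flights, we read v here as a cost-style score, with the lower-scoring flight preferred; the helper names below are ours, not part of the paper's formulation.

```python
def v(t, c, l):
    """The running example's value function v(t,c,l) = 100t + c + 50l.
    Read as a cost score: lower is better, since the user prefers
    shorter (t), cheaper (c), layover-free (l = 0) flights."""
    return 100 * t + c + 50 * l

def prefers(flight1, flight2):
    # True if flight1 is preferred to flight2 under v (lower score wins)
    return v(*flight1) < v(*flight2)

# A 4-hour nonstop for $300 beats a 6-hour one-stop for $250:
assert prefers((4, 300, 0), (6, 250, 1))  # 700 < 900
```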


As we have outlined the traditional approach to preference elicitation, the decision analyst (or agent) makes a set of assumptions concerning a decision maker's preferences and then performs the elicitation within the constraints of those assumptions.  If the assumptions are incorrect, the elicited model will be inaccurate.  While such assumptions are clearly useful, it would be helpful to be able to detect and correct errors in our assumptions.  In particular, when designing product brokering systems, we may wish to make assumptions that at least approximately apply to a large segment of the user population and then correct for inaccuracies in those assumptions when we encounter an individual to whom they do not apply.  The problem of starting with an approximately correct domain theory and then correcting and refining it in the light of data has been studied in the area of theory refinement.  In this paper we examine the use of one technique for theory refinement called Knowledge-Based Artificial Neural Networks (KBANN) (Shavlik & Towell 1989).  We show how assumptions concerning independence and monotonicity of preferences can be encoded as a KBANN domain theory and how examples of user preferences can then be used to refine the theory.  We empirically compare elicitation speed and accuracy of a KBANN network containing this domain theory with a neural network containing no domain theory.  We use Ha and Haddawy's similarity measure on preference structures (Ha & Haddawy 1998) to define a measure of the degree to which a preference structure violates the domain theory.  We use this to evaluate how well the KBANN network performs as the preference structure from which the training examples are drawn increasingly violates the domain theory.  Our results show rather robust and somewhat surprising behavior.  Finally, we demonstrate how we can learn a fine-grained preference structure from simple binary classification data.




2. Knowledge-Based Artificial Neural Networks

The KBANN technique (Shavlik & Towell 1989) permits a domain theory expressed as an acyclic set of propositional Horn-clauses to be encoded in a back-propagation neural network.  The propositions become nodes and the logical relations between them become links.  By training the network on data, an incomplete domain theory can be supplemented and an inaccurate domain theory can be corrected.  Given an approximately correct domain theory, KBANN has been shown to require fewer examples to correctly classify a test set than an unbiased network (Towell, Shavlik & Noordewier 1990).  When using an unbiased network, the weights are initialized randomly, so that the network starts out at a randomly chosen point in function space.  In contrast, initializing the network with an approximately correct domain theory starts it at a point that is close to the target function.
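A minimal sketch of the KBANN weight-initialization idea: each antecedent of a Horn clause gets a large positive initial weight (a negated antecedent a large negative one), and the unit's bias is set so the sigmoid fires only when the whole conjunction holds.  This follows the general scheme of Towell and Shavlik; the particular weight value and helper function are illustrative assumptions, not their exact translation.

```python
import math

W = 4.0  # large initial weight for links encoding the domain theory (illustrative)

def horn_unit(pos_antecedents, neg_antecedents=()):
    """One hidden unit encoding a Horn-clause body as a soft AND:
    +W per positive antecedent, -W per negated one, and a bias of
    W * (n_pos - 0.5), so the sigmoid exceeds 0.5 only when all positive
    antecedents are (near) 1 and all negated ones are (near) 0."""
    net = (W * sum(pos_antecedents)
           - W * sum(neg_antecedents)
           - W * (len(pos_antecedents) - 0.5))
    return 1.0 / (1.0 + math.exp(-net))

assert horn_unit([1, 1, 1]) > 0.8   # whole conjunction true: unit fires
assert horn_unit([1, 0, 1]) < 0.2   # one antecedent false: unit stays off
```

Because the weights are merely initialized this way rather than fixed, back-propagation can later soften or override any clause the data contradicts.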


3. Representing Preferences

In deciding how to represent user preferences, several alternatives present themselves: discrete numeric ratings, a simple binary classification into interesting/uninteresting, a value function, or pairwise preferences among items.  We will evaluate these from the standpoint of representational adequacy, ease of gathering training examples, and ease of expressing and encoding assumptions concerning preference.  Numeric ratings are the predominant representation used in building recommender systems.  Note that the simple binary classification is a degenerate case of a numeric rating scheme.  Use of numeric ratings requires training instances to consist of individual items with an associated rating.  But notice that numeric ratings have the disadvantage of fixed granularity.  Either we choose a coarse granularity and cannot make fine distinctions in preference, or we choose a fine granularity and are faced with deciding exactly what rating to assign to an item.  For example, a rating scheme with only five values can easily have the top rating filled with items among which we have strong distinctions of preference.  On the other hand, if we use, say, 100 rating values then we may very well end up assigning ratings that have little meaning to us, e.g. a 47 vs. a 48.  The direct use of a value function to represent preference has this latter problem of deciding exactly what value to assign to an item as well.

Training examples consisting of items with numeric scores would require a neural network architecture in which we have the attributes for one instance as the input nodes and a single output node that provides the score.  This would then dictate that the assumptions encoded in the form of a domain theory say something about the relation of the score to the item attributes.  Notice how difficult it would be to come up with principled statements of that form.

The alternative is to represent a user’s preferences in terms of pairwise preferences among items.  Training instances consist of pairs of items and an indication of whether the first is preferred to the second.  So using pairwise preferences would require the network to have input nodes for the attributes of two items and a binary output node that indicated preference or no preference.  If two items are input to the network in one order and then the other and the output indicates no preference for both cases, then the user is considered to be indifferent between the two items.  Notice that with this representation the domain theory can simply be written to state under what conditions the user would prefer one item to another. 
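The pairwise representation just described can be sketched as follows: a training instance concatenates the two items' attribute vectors and attaches a binary label, and indifference is detected by querying the network in both orders.  The helper names are ours.

```python
def pairwise_instance(item_a, item_b, a_preferred):
    """A pairwise-preference training instance: the input nodes take the
    attributes of both items; the binary label says whether the first
    item is preferred to the second."""
    return list(item_a) + list(item_b), 1 if a_preferred else 0

def indifferent(network, a, b):
    # The convention described in the text: present the pair in both
    # orders; "no preference" both times is read as indifference.
    return network(a, b) == 0 and network(b, a) == 0

x, y = pairwise_instance((4, 300, 0), (6, 250, 1), True)
assert x == [4, 300, 0, 6, 250, 1] and y == 1
```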



4. Encoding the Assumptions and Network Architecture

We can conveniently encode assumptions concerning preferential independence and monotonicity of preference through statements of dominance.  Consider our flight domain.  It is plausible that the user prefers cheaper flights to more expensive flights, shorter flights to longer flights, and flights without a layover to those with one.  These assumptions can be encoded with the following set of Horn-clauses:

                C1 < C2 ∧ T1 ≤ T2 ∧ L1 ≤ L2 → F1 ≻ F2

                C1 ≤ C2 ∧ T1 < T2 ∧ L1 ≤ L2 → F1 ≻ F2

                C1 ≤ C2 ∧ T1 ≤ T2 ∧ L1 < L2 → F1 ≻ F2,

Figure 1: Structure of preference elicitation network.  Every node hn in the hidden layer is connected to the output node, every compare node is connected to every hidden-layer node, and every input node th, ci, lj, dk is connected to every compare node.  All connections have low random weights except for those encoding the domain theory.  Sigmoid activations are used on every node except the compare layer, which uses a specialized activation based on the individual logical operator.
where Ci and Ti are the cost and time of flight i, respectively, and Li is 0 if flight i has no layover and 1 otherwise.  F1 ≻ F2 indicates that flight 1 is preferred to flight 2.  Notice that each of these sentences indicates that the attribute over which it is expressing directionality of preference is also preferentially independent of the other attributes.
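Taken together, the three clauses say that flight 1 is preferred whenever it is at least as good on every attribute and strictly better on at least one.  A direct coding of the rules, assuming a flight is a (cost, time, layover) triple:

```python
def dominates(f1, f2):
    """The three Horn clauses as one test: f1 = (c1, t1, l1) is preferred
    to f2 when it is at least as good on every attribute and strictly
    better on at least one (strict dominance)."""
    c1, t1, l1 = f1
    c2, t2, l2 = f2
    return (c1 <= c2 and t1 <= t2 and l1 <= l2
            and (c1 < c2 or t1 < t2 or l1 < l2))

assert dominates((250, 4, 0), (300, 4, 0))      # cheaper, otherwise equal
assert not dominates((250, 6, 0), (300, 4, 0))  # cheaper but longer: no clause fires
```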


Figure 2: Mean of learning curves over all three value functions.

To accommodate this domain theory we include an extra hidden layer that contains the logic for our comparison operators.  This layer encodes the following operators: <, >, ≤, ≥, =, ≠.  The layer is located between the input layer and the initial hidden layer.  The tails of the Horn-clauses in the domain theory are represented in the hidden layer and the heads are represented by the output node.  Figure 1 shows the general network architecture.  While every node at each layer is connected to every node at the next with low randomly chosen weights, the Horn-clauses are encoded by setting some weights to high values.  The figure shows the links that encode the first Horn-clause in the domain theory above.  While it would be possible to compute the comparisons outside the network and include them as separate inputs, representing them as nodes in the network allows for the possibility that they can be modified during training.  For example, it might be the case that the user only has a preference between two different costs when they differ by at least some minimal amount.  Training could modify the weights to capture this.


5. Empirical Evaluation

Figure 3: Accuracy of training on examples randomly generated from preference structures of varying DOVI.  The graph shows results for training the KBANN network and the unbiased network (ANN) on 30 examples and 50 examples each.

Figure 4: Comparison of accuracy in training on examples generated from a binary categorization vs. those generated randomly.  Dom w/bin and ANN w/bin represent the networks trained on the examples generated from the binary categorization, for the biased and unbiased networks, respectively.  Dominance and ANN represent the networks previously trained on individually obtained preferences.


We ran a series of experiments to evaluate how well the network biased by the assumptions performed against one with no bias, i.e. random initial weights.  We created two networks: one with no domain theory (ANN), and one with the domain theory above (dominance).  We represented the preferences of a user with an additive value function over the three attributes T, C, L.  We tested the two networks by randomly generating examples of pairwise preferences from three different value functions: v(t,c,l) = 100t + 1c + 50l, v(t,c,l) = 50t + 1c + 100l, and v(t,c,l) = 1t + 100c + 50l.  Each value function experiment was run 10 times, with early stopping used on each run.  The test size was kept constant at 50 items.  After the 10 runs for each training-set size, we computed a mean for each value function.  Similar results were observed with all three value functions.  Figure 2 shows the mean of the experiments with the three different value functions.  The network trained with the dominance relation theory consistently outperforms the unbiased network.  It achieves approximately 84% accuracy with only 30 examples.  The unbiased network requires between 50 and 75 examples to reach 84% accuracy.  This is a significant difference since the examples are generated by directly querying a user.





Robustness Analysis

One of the main advantages of the KBANN technique is its robustness, specifically, its ability to correct for inaccuracies in the domain theory.  To evaluate how well this works in our domain, we examine the performance of the networks for a number of value functions that violate the independence assumptions to various degrees. It is expected that the more a value function violates the domain theory, the worse the performance. The best we can hope for is that this decrease does not occur in a precipitous manner.


The first issue we have to address in this experiment is to define a measure of domain theory violation.  Since we are dealing with a domain theory composed of independence assumptions, we will call this measure the degree of violating the independences, or DOVI.  Given a domain theory D and a value function v, what would be a good measure of how much v violates D?  Note that we can view D as the set of value functions that satisfy it.  If we have a measure of distance d between two value functions, then we can define a violation measure as the distance between v and the member of D that is closest to v.  (Imagine this approach as an analogy to the definition of the distance from a point to a body of points in Euclidean geometry.)  To define the distance between two value functions (which represent two preference orders), we use the probabilistic distance proposed by Ha and Haddawy (1998).  According to this measure, the distance between two preference orders is the probability that they disagree in their relative rankings of two randomly chosen outcomes.  We picked this measure because it is both analytically and computationally tractable for a wide range of representations of preference orders.
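The probabilistic distance can be estimated by sampling: draw random pairs of outcomes and count how often the two orders disagree on their relative ranking.  A Monte-Carlo sketch, with preference orders given as scoring functions; the function names and sampling parameters are our own choices.

```python
import random

def prob_distance(score1, score2, outcomes, n_pairs=5000, seed=0):
    """Estimate Ha & Haddawy's probabilistic distance: the probability
    that two preference orders disagree on the relative ranking of two
    randomly chosen outcomes."""
    rng = random.Random(seed)
    disagree = 0
    for _ in range(n_pairs):
        a, b = rng.sample(outcomes, 2)
        r1 = (score1(a) > score1(b)) - (score1(a) < score1(b))  # -1, 0, or +1
        r2 = (score2(a) > score2(b)) - (score2(a) < score2(b))
        disagree += (r1 != r2)
    return disagree / n_pairs

flights = [(t, c, 0) for t in range(5) for c in range(5)]
v = lambda x: 100 * x[0] + x[1]
assert prob_distance(v, v, flights) == 0.0                # identical orders
assert prob_distance(v, lambda x: -v(x), flights) == 1.0  # fully reversed orders
```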


The second issue we have to address is how to generate preference orders of varying degrees of domain theory violation.  To this end, we start with value functions consistent with the domain theory, and then randomly permute the ranks of k randomly picked outcomes, for different values of k.  When k is small the violation is close to zero, and as we increase k the perturbed value function is likely to increasingly violate the domain theory.


The last issue, which is mainly technical, is the computation of the minimum distance.  Since there are infinitely many value functions that satisfy the domain theory D, we can only approximate the violation measure by sampling from D.  We achieve this by sampling additive value functions with randomly generated coefficients.  Our experiments show that this estimate converges to a stable value after about 2000 samples.
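The DOVI approximation can then be sketched as follows: sample additive value functions with random nonnegative coefficients and keep the smallest estimated distance to v.  The uniform coefficient distribution and the sample sizes below are our assumptions; the text only reports that the estimate stabilizes after about 2000 samples.

```python
import random

def dovi(v, outcomes, n_samples=2000, n_pairs=500, seed=0):
    """Approximate the degree of violating the independences: the minimum
    estimated probabilistic distance between v and additive value
    functions sampled with uniform random coefficients."""
    rng = random.Random(seed)

    def dist(u):
        # Monte-Carlo estimate of the probabilistic distance between v and u
        disagree = 0
        for _ in range(n_pairs):
            a, b = rng.sample(outcomes, 2)
            disagree += (v(a) < v(b)) != (u(a) < u(b))
        return disagree / n_pairs

    best = 1.0
    for _ in range(n_samples):
        l1, l2, l3 = rng.random(), rng.random(), rng.random()
        u = lambda x, l1=l1, l2=l2, l3=l3: l1 * x[0] + l2 * x[1] + l3 * x[2]
        best = min(best, dist(u))
    return best
```

An additive v should come out near zero, while heavily perturbed preference orders should score higher.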


Figure 3 shows the results of our robustness analysis.  Using 2000 iterations, we identify several preference orders with DOVI varying from 0.04 to 0.5.  Notice that the performance of the biased network remains essentially the same (85% and 92% accuracy for KBANN-30 and KBANN-50, respectively) even for a preference order whose DOVI is 0.19.  Only when the DOVI increases to 0.36 does the accuracy drop to a still-respectable 76% and 78%, respectively.  When the DOVI goes up to 0.5 (which is about as high as it can go, due to the fact that value functions consistent with the domain theory intuitively "cover" the space of preference orders very well), the accuracy of the biased networks becomes roughly the same as that of the unbiased one.  The performance of the unbiased network is dominated in all cases, and not surprisingly stays almost constant over different DOVIs.



6. Reducing the User Input

The network with the domain theory requires about 30 examples to reach 84% accuracy, which requires the user to provide pairwise comparisons among 60 flights.  This is a large number of flights for a user to evaluate.  One way to easily obtain a large number of examples of pairwise preferences is to have the user indicate preference between classes of items rather than just individual items.  If we have n items in each class, we then have n² pairwise preferences implicitly represented.  But it is not clear whether examples generated in this way will have as much information content as those generated randomly, since each item will appear in n examples.  So we empirically evaluated this approach on one value function.

We simulated a user's binary categorization of flights into "like" and "dislike" groups.  We scored flights according to the value function 1t + 100c + 50l and selected the top one-third of our flights as the "like" set and the bottom one-third as the "dislike" set.  We then used these sets to generate examples of pairwise preferences.
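This construction is easy to state in code: score the flights, split off the top and bottom thirds, and take every ("like", "dislike") pair as a training example with the "like" flight preferred.  A sketch under the cost-score reading (lower score = better); the helper names are ours.

```python
def split_by_score(flights, score):
    """Rank flights by a cost-style score (lower = better) and return the
    top third as the 'like' set and the bottom third as the 'dislike' set."""
    ranked = sorted(flights, key=score)
    k = len(ranked) // 3
    return ranked[:k], ranked[-k:]

def examples_from_categories(like, dislike):
    # Every 'like' flight is taken to be preferred to every 'dislike'
    # flight, so n flights per class yield n * n pairwise examples.
    return [(a, b) for a in like for b in dislike]

flights = [(t, 50 * t, t % 2) for t in range(9)]  # toy flight list (t, c, l)
like, dislike = split_by_score(flights, lambda f: 1 * f[0] + 100 * f[1] + 50 * f[2])
assert len(examples_from_categories(like, dislike)) == 9  # 3 * 3 pairs
```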


The results are shown in Figure 4.  Both the biased and unbiased networks are able to achieve significantly higher accuracy using fewer flights than when using randomly generated examples.  Using only 14 flights, the KBANN network achieves an accuracy of 89%, while the unbiased ANN achieves an accuracy of 72%.  14 flights yield 49 training examples.  With that number of randomly generated examples, the KBANN network achieved an accuracy of 94%.  So generating the training examples from the binary categorization yields an accuracy about 5% worse, but the number of flights required is cut from 96 to 14, a reduction of 85%.

Figure 5: Comparison of the number of example flights needed for 90% accuracy on testing examples vs. DOVI ratings.


We next evaluated how the technique of generating examples from a binary categorization would perform as a function of the degree to which the preference structures from which the examples are drawn violate the domain theory.  We generated training examples using the preference structures from the experiments shown in Figure 3 and plotted the number of flights required to reach 90% accuracy.  The results are shown in Figure 5.  With a preference structure that conforms to the domain theory, 17 flights are required.  With a preference structure that violates the domain theory to degree 0.5, 28 flights are required.  The number of flights required increases smoothly as a function of the degree of violation of the domain theory.


7.  Related Work

The Wisconsin Adaptive Web Assistant (WAWA) (Shavlik et al. 1999) is a system that finds and recommends web pages to a user.  It uses KBANN to determine whether a user would like to visit a given web page.  The attribute set is a localized bag-of-words in which words are grouped according to the part of the page they occur in: the title, the URL, the window, etc.  WAWA can learn from bias and advice given to the system as well as from example classifications.  The ScorePage neural network outputs a rating measure.  This rating is used to determine the top n scoring pages, which are then presented to the user.  Note, however, that pages are not compared within the neural net.


In WAWA, instances are classified as '+' or '-', representing like and dislike, respectively.  The domain theory is specified in the form of rules providing advice.  One such rule is

WHEN consecutiveWordsInHypertext

                                (intelligent user interface)


This advice leads to a highly weighted edge between a hidden node and the input node for the value "intelligent user interface" within the WordsInHyperText attribute.


Other uses of KBANN have involved categorizing examples.  For these applications, the domain theory is typically a set of rules that indicate under what conditions an example falls into a particular class.  Examples of such applications include splice-junction determination in DNA sequences (Noordewier 1990), determining whether a DNA sequence is a promoter (Towell 1998), and action selection for PID water tank controllers (Scott 1992).


The Decision-Theoretic Interactive Video Advisor (DIVA) (Nguyen & Haddawy 1999) uses pairwise preferences to represent user preferences in the domain of movie recommendation.  Pairwise preferences are obtained by having the user classify movies into like, ok, and dislike categories.  Every movie in a higher category is considered preferred to every movie in a lower category.  DIVA is a collaborative filtering system that maintains a database of user preference structures.  The system makes recommendations by computing similarity between preference structures.  The approach in the current paper combines the representational ideas from DIVA with the KBANN theory-refinement learning technique.



8.  Conclusions and Future Research

In this paper we have presented a technique for encoding assumptions concerning preference in a neural network.  We experimented with one encoding and showed that in our experiments it clearly outperformed an unbiased ANN.  Much work remains to be done. The examples of pairwise preferences were generated from a very simple form of preference structure.  We need to determine how well this technique works with more complex preference structures.  One of the strong points of this approach is the ability to correct for inaccuracies in the domain theory.  Our preliminary experiments on this are encouraging but we need to perform more exhaustive experimentation.  We intend to explore this in the domain of movies, since preferences over attributes in that domain are typically not independent. 


The approach taken in this paper assumed that we have one domain theory that applies to a large segment of the population.  But this may not be true in all domains.  Furthermore, we may have some individuals whose preferences highly conflict with the preferences of the majority.  In such domains, we may wish to work with multiple KBANN networks that encode a variety of domain theories.  A weight could be assigned to each network, depending on the frequency of occurrence of that preference pattern in the population.



Acknowledgments

We thank Jude Shavlik and David Page for helpful discussions.




References

Ha, V. & Haddawy, P. (1998). Toward case-based preference elicitation: Similarity measures on preference structures. Proc. UAI-98, pp. 193-201.


Keeney, R. L., and Raiffa, H. (1976). Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Wiley, New York.


Mitchell, T.M. (1997).  Machine Learning.  McGraw Hill, New York. (pp 340-344).


Nguyen, H. & Haddawy, P. (1999). The decision-theoretic interactive video advisor, Proc. UAI99, pp 494-501.


Scott, G.M., Shavlik, J.W. & Ray, W.H. (1992). Refining PID controllers using neural networks. Advances in Neural Information Processing Systems, pp. 555-562, Denver, CO. Morgan Kaufmann.


Shavlik, J. & Towell, G. (1989). An approach to combining explanation-based and neural learning algorithms. Connection Science, 1(3): 233-255.


Shavlik, J., Calcari, S., Eliassi-Rad, T. & Solock, J. (1999). An instructable, adaptive interface for discovering and monitoring information on the World-Wide Web. Proceedings of the 1999 International Conference on Intelligent User Interfaces, Redondo Beach, CA.


Towell, G.G., Shavlik, J.W. & Noordewier, M.O. (1990). Refinement of approximate domain theories by knowledge-based neural networks. Proceedings of the Eighth National Conference on Artificial Intelligence, pp. 861-866, Boston, MA.