
# Lecture

📗 The lecture is in person, but you can join on Zoom: 8:50-9:40 or 11:00-11:50. Zoom recordings can be viewed on Canvas -> Zoom -> Cloud Recordings. They will be moved to Kaltura over the weekend.
📗 The in-class (participation) quizzes should be submitted on TopHat (Code: 741565), but you can also submit your answers through the Form at the end of the lectures.
📗 The Python notebooks used during the lectures can also be found on GitHub. They will be updated weekly.


# Lecture Notes

📗 AB Testing
➭ Two or more versions of the same page can be displayed to the visitor randomly to compare which one is better.
➭ The comparison is often based on click-through rates (CTR), which is computed as the number of clicks on a specific link on the page divided by the number of times the page is displayed to the visitor.
➭ CTR is also used for advertisement, computed as the number of clicks on the ad divided by the number of times the ad is shown to the visitor.

Click Through Rate Example ➭ Suppose the numbers of clicks are summarized in the following table. What are the click-through rates, and is A statistically significantly better than B?
| Version | Click | No Click |
|---------|-------|----------|
| A       | 30    | 60       |
| B       | 25    | 75       |

➭ The CTR for A is \(\dfrac{30}{30 + 60} = \dfrac{1}{3}\).
➭ The CTR for B is \(\dfrac{25}{25 + 75} = \dfrac{1}{4}\).
➭ Whether A is better can be determined by the p-value, which can be computed using scipy.stats.fisher_exact([[30, 60], [25, 75]]).pvalue. The output is 0.1075, which means that if A and B were the same, the probability of obtaining data at least as extreme as this would be \(0.1075\). If a 5 percent threshold is used, the hypothesis that A and B have the same CTR is not rejected, meaning A is not statistically significantly better than B; if a 20 percent threshold is used (not a commonly used threshold), then A is statistically significantly better than B at the 20 percent level.
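A runnable version of this computation (a minimal sketch; it assumes scipy >= 1.9, where fisher_exact returns a result object with a pvalue attribute, while older versions return a (statistic, pvalue) tuple):

```python
from scipy import stats

# Contingency table: rows are versions A and B, columns are click / no click.
table = [[30, 60], [25, 75]]

# Fisher's exact test for the null hypothesis that A and B have the same CTR.
result = stats.fisher_exact(table)
print(result.pvalue)  # approximately 0.1075
```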

📗 Other Metrics
➭ Instead of clicks, actions such as scrolls, subscriptions, purchases, hovers, shares, likes, and comments can be measured to compare the effectiveness of the two versions of the same page.
➭ In statistics, the two versions are called the control (version A) and the treatment (version B), and they can differ in wording, speed, font, size, color, icons, or graphic design. These differences are called treatment factors.
➭ Treatment factors can be introduced one at a time or all at once, i.e. the two versions of the page can differ in one or more factors.



📗 Displaying Pages
➭ The choice of which page to display can be random, for example, return "Version A" if random.random() < 0.5 and return "Version B" otherwise (see the sketch after this list).
➭ It can also follow a fixed ordering, for example, if count is a global variable keeping track of the number of visitors, then incrementing count on each visit and returning "Version A" when count % 2 == 0 and "Version B" otherwise would alternate between the two versions.
➭ This can be done a fixed number of times, and after that, only the version with the higher click-through rate would be displayed to all visitors.
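A minimal Flask sketch of both strategies (the route paths and the global counter are illustrative assumptions, not from the notebooks):

```python
import random
import flask

app = flask.Flask(__name__)
count = 0  # global counter keeping track of the number of visitors

@app.route("/random_version")
def random_version():
    # Display each version with probability 0.5.
    if random.random() < 0.5:
        return "Version A"
    return "Version B"

@app.route("/alternating_version")
def alternating_version():
    # Increment the counter first, then alternate based on its parity.
    global count
    count = count + 1
    if count % 2 == 0:
        return "Version A"
    return "Version B"
```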

📗 Tracking Visitors
➭ IP address or user agent information can be used to figure out which version of the page was displayed to which visitor.
➭ Cookies can be used too: use flask.request.cookies.get(key, default) to get a cookie (returning default if no such cookie exists), and flask.Response.set_cookie(key, value) to set a cookie (see the sketch after this list). Cookies are small pieces of text stored on the visitor's computer.
➭ Query strings can be used to figure out which version of a page the visitor came from; the query string is the part after "?" at the end of a URL: Link.
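A minimal sketch of reading and setting a cookie in Flask (the cookie name "version" and the route path are illustrative assumptions):

```python
import random
import flask

app = flask.Flask(__name__)

@app.route("/page")
def page():
    # Reuse the version stored in the cookie, or assign one at random
    # on the first visit and remember it.
    version = flask.request.cookies.get("version", None)
    if version is None:
        version = random.choice(["A", "B"])
    response = flask.Response("Version " + version)
    response.set_cookie("version", version)
    return response
```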

📗 Query Strings
"index?x=1&y=2" is a URL specifying the path "index" with the query string x=1 and y=2.
➭ Use flask.request.args to get a dictionary of key-value pairs of the query string.
➭ To perform AB testing of a page with two versions that both contain a link to index: on version A, the link can be <a href="index?from=A">Link</a>, and on version B, the same link can be <a href="index?from=B">Link</a>.
➭ If the version A URL is used, request.args["from"] would be "A", and if the version B URL is used, request.args["from"] would be "B" (see the sketch after this list).
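A minimal sketch of counting clicks by source version using the query string (the clicks dictionary is an illustrative assumption):

```python
import flask

app = flask.Flask(__name__)
clicks = {"A": 0, "B": 0}  # number of clicks coming from each version

@app.route("/index")
def index():
    # Visiting "index?from=A" makes flask.request.args["from"] equal to "A".
    source = flask.request.args.get("from", None)
    if source in clicks:
        clicks[source] = clicks[source] + 1
    return "clicks so far: " + str(clicks)
```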

AB Testing Example ➭ Build a website for AB testing and display the CTR table with the p-value.
➭ Use a cookie to make sure the visitor came from either page A or page B.
➭ Code to create the website: Notebook.
➭ Code to scrape that website: Notebook.

Two Armed Bandit Example





📗 Multi-Armed Bandit
➭ The previous design experiments with the alternatives a fixed number of times, then exploits the best alternative.
➭ This design can lead to a large "regret", for example, when displaying a bad version is costly. In other settings, such as drug or vaccine trials and stock selection, experimenting with bad alternatives can be costly too.
➭ The problem of optimal experimentation vs exploitation is studied in reinforcement learning as the multi-armed bandit problem: Link.

📗 Upper Confidence Bound
➭ Upper Confidence Bound (UCB) is a no-regret algorithm that minimizes the loss from experimentation.
➭ After every version is displayed once, the algorithm keeps track of the average value (for example, the click-through rate) of each version, say \(\hat{\mu}_{A}, \hat{\mu}_{B}\), and computes the UCBs \(\hat{\mu}_{A} + c \sqrt{\dfrac{2 \log\left(n\right)}{n_{A}}}\) and \(\hat{\mu}_{B} + c \sqrt{\dfrac{2 \log\left(n\right)}{n_{B}}}\), where \(n\) is the total number of visitors, \(n_{A}\) and \(n_{B}\) are the numbers of visitors who saw versions A and B, and \(c\) is a constant. The algorithm always picks the version with the higher UCB (see the sketch after this list).
➭ The UCB algorithm uses the principle of optimism under uncertainty, and the UCBs are optimistic guesses: with high probability (the probability can be determined by \(c\)), the actual average is less than the UCB.
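A minimal sketch of the UCB rule described above (the click data and helper names are illustrative assumptions):

```python
import math

def ucb(mean, n_version, n_total, c=1.0):
    # Optimistic estimate: sample average plus an exploration bonus that
    # shrinks as the version is displayed more often.
    return mean + c * math.sqrt(2 * math.log(n_total) / n_version)

def choose_version(counts, c=1.0):
    # counts maps version -> (clicks, visitors); every version is assumed
    # to have been displayed at least once already.
    n_total = sum(n for _, n in counts.values())
    return max(counts, key=lambda v: ucb(counts[v][0] / counts[v][1],
                                         counts[v][1], n_total, c))

counts = {"A": (10, 30), "B": (5, 10)}  # hypothetical click counts
print(choose_version(counts))  # "B": fewer visitors, so a larger bonus
```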

UCB Testing Example ➭ Build a website that uses the UCB algorithm and display the mean estimates and the UCB.
➭ Use a cookie to make sure the visitor came from either page A or page B.
➭ Code to create the website: Notebook.
➭ Code to scrape that website: Notebook.

Additional Example ➭ Given the following table, which version of the page should be displayed next if the UCB algorithm is used? Suppose \(c = 1\).
| Version | Click | No Click | Average | UCB |
|---------|-------|----------|---------|-----|
| A | 15 | 15 | \(\dfrac{1}{2}\) | \(\dfrac{1}{2} + \sqrt{\dfrac{2 \log\left(60\right)}{30}}\) |
| B | 10 | 10 | \(\dfrac{1}{2}\) | \(\dfrac{1}{2} + \sqrt{\dfrac{2 \log\left(60\right)}{20}}\) |
| C | 5 | 5 | \(\dfrac{1}{2}\) | \(\dfrac{1}{2} + \sqrt{\dfrac{2 \log\left(60\right)}{10}}\) |

➭ The version with the highest UCB (the optimistic guess of the average) will be displayed; here it would be version C.
➭ In this example, the averages (estimated means) are the same, so the version that has been explored the least (version C) is chosen, as the numeric check below confirms.
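A quick numeric check of the table above with \(c = 1\) (the visitor counts 30, 20, 10 are the clicks plus no clicks for each version):

```python
import math

n = 60  # total number of visitors across all versions
for version, n_v in [("A", 30), ("B", 20), ("C", 10)]:
    bound = 0.5 + math.sqrt(2 * math.log(n) / n_v)
    print(version, round(bound, 3))
# A 1.022, B 1.14, C 1.405 -> version C has the highest UCB
```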


➭ Interactive demo: the pages, represented by boxes, have fixed but possibly different average values between 0 and 1. Clicking on a box shows a random realization of its value (a random number from a distribution with that fixed average). The goal is to maximize the total value (or minimize the regret) given a fixed number of clicks. Which box has the largest mean reward?







📗 Notes and code adapted from the course taught by Yiyin Shen Link and Tyler Caraza-Harter Link






Last Updated: April 29, 2024 at 1:10 AM