📗 Flask will be used to create or modify web pages. It is useful for collecting data from visitors as they interact with the web pages and for displaying that data on the web pages.
➩ app = flask.Flask(...) creates a web app.
➩ @app.route("/") binds a function to the root URL (front page of the website).
➩ @app.route("/abc") binds a function to a specific URL path on the site (one page on the website, or a file).
➩ app.run(host="0.0.0.0", debug=False, threaded=False) to run the app. host="0.0.0.0" makes the server externally visible.
📗 The decorator @app.route("/index") followed by def index(): return "Hello World!" binds the index function to the page IP address/index, meaning it will display a web page that says "Hello World!".
➩ "Hello World!" can be replaced by any text or HTML string, which can be read from an HTML file and modified in the index() function.
➩ An HTML string can be read from an existing HTML file and then modified, for example, with open("index.html") as f: return f.read().
➩ It can also be generated by packages such as pandas, for example, pandas.read_csv("data.csv").to_html().
📗 To bind multiple paths, variable rules can be added: @app.route("/index/<x>") followed by def index(x): return f"Hello {x}" will display a web page that says "Hello World" when the path IP address/index/World is used.
➩ The variable x can also be converted to another type before it is passed to the index(x) function, for example, "/index/<int:x>" passes x as an int.
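The routing rules above can be put together in a minimal sketch. The route "/square/<int:n>" and the function names home and square are made up for illustration and are not part of the course code. The app is exercised with Flask's built-in test client instead of starting a server.

```python
import flask

app = flask.Flask(__name__)

@app.route("/")
def home():
    # Front page of the website.
    return "<h1>Front page</h1>"

@app.route("/index/<x>")
def index(x):
    # x is the part of the path after /index/, passed in as a string.
    return f"Hello {x}"

@app.route("/square/<int:n>")  # hypothetical route, for illustration only
def square(n):
    # The int converter turns the path segment into an int before the call.
    return str(n * n)

# Try the routes without starting a server.
client = app.test_client()
print(client.get("/index/World").get_data(as_text=True))
print(client.get("/square/7").get_data(as_text=True))

# To serve the pages for real, one would call:
# app.run(host="0.0.0.0", debug=False, threaded=False)
```

A path such as /square/abc does not match the int converter, so Flask answers it with a 404 page instead of calling square.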
📗 One use of such visitor information is for rate limiting: preventing visitors from loading the pages too often, for example, to prevent web scraping.
➩ In this case, the visitor's IP address and visit time can be stored in a list: if the next visit time is too close to the previous one, the visitor can be redirected to another page or, more commonly, sent an error message. For example, return flask.Response("...", status = 429, headers = {"Retry-After": "60"}) tells the visitor to retry after 60 seconds.
➩ HTTP defines a list of response status codes and header fields; here status = 429 means "Too Many Requests".
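The rate-limiting idea above could be sketched as follows. This is only one possible arrangement: it keeps the last visit time per IP address in a dictionary rather than a list, and the names MIN_INTERVAL and last_visit, as well as the 60-second interval, are assumptions made for the example.

```python
import time
import flask

app = flask.Flask(__name__)

MIN_INTERVAL = 60  # assumed minimum number of seconds between visits
last_visit = {}    # hypothetical store: IP address -> time of last accepted visit

@app.route("/")
def home():
    ip = flask.request.remote_addr
    now = time.time()
    previous = last_visit.get(ip)
    if previous is not None and now - previous < MIN_INTERVAL:
        # Too soon: tell the visitor how many seconds to wait.
        wait = int(MIN_INTERVAL - (now - previous)) + 1
        return flask.Response("Too many requests, please retry later.",
                              status=429,
                              headers={"Retry-After": str(wait)})
    last_visit[ip] = now
    return "Welcome!"

# Two quick visits from the same (test) client: the second is refused.
client = app.test_client()
print(client.get("/").status_code)
print(client.get("/").status_code)
```

A dictionary makes the lookup by IP address direct; a list of (IP, time) pairs, as described above, would work too but needs a search on every visit.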
📗 Two or more versions of the same page can be displayed to the visitor randomly to compare which one is better.
➩ The comparison is often based on the click-through rate (CTR), which is computed as the number of clicks on a specific link on the page divided by the number of times the page is displayed to visitors.
➩ CTR is also used for advertisement, computed as the number of clicks on the ad divided by the number of times the ad is shown to the visitor.
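As a rough sketch of the bookkeeping behind such a comparison, the snippet below picks a version at random for each display and tracks displays and clicks per version. The names counts, pick_version, record_click and ctr are hypothetical helpers, not part of Flask or the course code.

```python
import random

# Hypothetical tallies: how often each version was shown and clicked.
counts = {"A": {"shown": 0, "clicked": 0},
          "B": {"shown": 0, "clicked": 0}}

def pick_version(rng=random):
    """Choose a version uniformly at random and record that it was shown."""
    version = rng.choice(["A", "B"])
    counts[version]["shown"] += 1
    return version

def record_click(version):
    """Record one click on the tracked link of the given version."""
    counts[version]["clicked"] += 1

def ctr(version):
    """Click-through rate: clicks divided by displays (0.0 if never shown)."""
    shown = counts[version]["shown"]
    return counts[version]["clicked"] / shown if shown else 0.0

# Example: show a page 10 times and pretend the first visitor clicked.
versions = [pick_version() for _ in range(10)]
record_click(versions[0])
print(counts, ctr("A"), ctr("B"))
```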
Click Through Rate Example
📗 Suppose the numbers of clicks are summarized in the following table. What are the click-through rates, and is A statistically significantly better than B?
| Version | click | no click |
|---------|-------|----------|
| A       | 30    | 60       |
| B       | 25    | 75       |
➩ The CTR for A is \(\dfrac{30}{30 + 60} = \dfrac{1}{3}\).
➩ The CTR for B is \(\dfrac{25}{25 + 75} = \dfrac{1}{4}\).
➩ Whether A is better can be assessed with a p-value, which can be computed using scipy.stats.fisher_exact([[30, 60], [25, 75]]).pvalue. The reported output is 0.1075, which is the probability, if A and B had the same CTR, of obtaining data at least as extreme as this data set. If a 5 percent threshold is used, the hypothesis that A and B have the same CTR is not rejected, meaning A is not statistically significantly better than B; if a 20 percent threshold is used (not a commonly used threshold), the hypothesis is rejected and A is statistically significantly better than B at the 20 percent level.
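To make the computation concrete, here is a small standard-library sketch of what a one-sided Fisher exact test measures for this table: the probability that A would receive at least 30 of the 55 clicks if the clicks were spread over the 190 displays without regard to version. The helper names ctr and fisher_one_sided are made up for this example. It should correspond to calling scipy.stats.fisher_exact with alternative="greater", whereas the default there is a two-sided test, so the printed number depends on which alternative is chosen.

```python
from math import comb

def ctr(clicks, no_clicks):
    """Click-through rate from a (click, no click) row of the table."""
    return clicks / (clicks + no_clicks)

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Returns P(top-left cell >= a) under the hypergeometric distribution,
    i.e. with all row and column totals held fixed.
    """
    row_a = a + b           # displays of version A
    clicks = a + c          # total clicks across both versions
    total = a + b + c + d   # total displays
    upper = min(row_a, clicks)
    favourable = sum(comb(clicks, k) * comb(total - clicks, row_a - k)
                     for k in range(a, upper + 1))
    return favourable / comb(total, row_a)

print(ctr(30, 60), ctr(25, 75))
print(fisher_one_sided(30, 60, 25, 75))
```

The p-value is then compared with the chosen threshold: if it is above the threshold, the hypothesis that both versions have the same CTR is not rejected.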
📗 IP address or user agent information can be used to figure out which version of the page is displayed to which visitors.
➩ Cookies can be used too: flask.request.cookies.get(key, default) gets a cookie (and returns default if no such cookie exists), and set_cookie(key, value) on a flask.Response object sets a cookie. Cookies are stored on the visitor's computer as text.
➩ Query strings can be used to figure out which version of a page the visitor came from: a query string is the part of a URL after "?".
➩ Interactive demo: the pages are represented by boxes that have fixed but possibly different average values between 0 and 1. Clicking on a box shows a random realization of the value from the box (i.e. a random number from a distribution with the fixed average). The goal is to maximize the total value (or minimize the regret) given a fixed number of clicks, by finding which box has the largest mean reward.
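The cookie and query-string techniques mentioned above might be combined as in the following sketch. The cookie name "version", the "/donate" route, the "from" query parameter and the clicks dictionary are all assumptions for illustration; flask.request.args is Flask's accessor for the parsed query string.

```python
import random
import flask

app = flask.Flask(__name__)
clicks = {"A": 0, "B": 0}  # hypothetical click tally per version

@app.route("/")
def home():
    # Reuse the version stored in the visitor's cookie, if there is one.
    version = flask.request.cookies.get("version", None)
    if version not in ("A", "B"):
        version = random.choice(["A", "B"])
    # The link carries the version in its query string.
    html = (f"<p>Version {version}</p>"
            f'<a href="/donate?from={version}">Donate</a>')
    response = flask.Response(html)
    response.set_cookie("version", version)
    return response

@app.route("/donate")
def donate():
    # Read the part after "?" to learn which version the visitor came from.
    source = flask.request.args.get("from", "unknown")
    if source in clicks:
        clicks[source] += 1
    return "Thank you!"

client = app.test_client()  # the test client keeps cookies between requests
print(client.get("/").get_data(as_text=True))
print(client.get("/").get_data(as_text=True))  # same version as the first visit
```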
📗 Displaying the versions at random, as in the previous design, can lead to a large "regret", for example, if displaying a bad version is costly. In other settings such as drug or vaccine trials and stock selection, experimenting with bad alternatives can be costly too.
➩ Upper Confidence Bound (UCB) is a no-regret algorithm (its average regret per visitor goes to zero as the number of visitors grows) that limits the loss from experimentation.
➩ After every version is displayed once, the algorithm keeps track of the average value (for example click-through rate) from each version, say \(\hat{\mu}_{A}, \hat{\mu}_{B}\) and computes the UCBs \(\hat{\mu}_{A} + c \sqrt{\dfrac{2 \log\left(n\right)}{n_{A}}}\) and \(\hat{\mu}_{B} + c \sqrt{\dfrac{2 \log\left(n\right)}{n_{B}}}\), where \(n\) is the total number of visitors, \(n_{A}\) is the number of visitors of version A, \(n_{B}\) is the number of visitors of version B, and \(c\) is a constant, and always picks the version with a higher UCB.
➩ The UCB algorithm uses the principle of optimism under uncertainty and the UCBs are optimistic guesses: with high probability (the probability can be determined by \(c\)), the actual average is less than the UCB.
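The selection rule above can be written as a short sketch. The function names ucb_choose and simulate, the default c = 1.0, and the simulated click probabilities are illustrative assumptions; the formula follows the one given above, with \(n\) taken as the number of displays so far.

```python
import math
import random

def ucb_choose(totals, counts, c=1.0):
    """Pick the version with the highest upper confidence bound.

    totals[v] is the sum of values observed for version v,
    counts[v] is the number of times version v was displayed.
    """
    # Every version is displayed once before the formula is used.
    for version, n_v in counts.items():
        if n_v == 0:
            return version
    n = sum(counts.values())

    def ucb(version):
        mean = totals[version] / counts[version]
        return mean + c * math.sqrt(2 * math.log(n) / counts[version])

    return max(counts, key=ucb)

def simulate(true_ctr, rounds, seed=0):
    """Simulate visitors who click with the (assumed) probabilities in true_ctr."""
    rng = random.Random(seed)
    totals = {v: 0.0 for v in true_ctr}
    counts = {v: 0 for v in true_ctr}
    for _ in range(rounds):
        v = ucb_choose(totals, counts)
        totals[v] += 1.0 if rng.random() < true_ctr[v] else 0.0
        counts[v] += 1
    return counts

print(simulate({"A": 0.3, "B": 0.1}, rounds=1000))
```

A version that has been shown only a few times gets a large bonus term, so it keeps being tried until its estimate is reliable; a larger c makes the algorithm explore more.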
📗 Notes and code adapted from the course taught by Professors Gurmail Singh, Yiyin Shen, Tyler Caraza-Harter.