Prev:
W2 , Next:
W4
Zoom:
Link , TopHat:
Link (936525), GoogleForm:
Link , Piazza:
Link , Feedback:
Link , GitHub:
Link , Sec1&2:
Link
# Small Datasets
📗 Small files can fit in memory all at once.
➩ Open the file: f = open("?.csv").
➩ Read the file: d = pandas.read_csv(f).
➩ Remember to close: f.close().
# Large Dataset
📗 Large files cannot fit in memory all at once.
➩ Open a zip file: z = zipfile.ZipFile("?.zip") and f = z.open("?.csv") (but f is in bytes, not text).
➩ Convert to text: t = io.TextIOWrapper(f).
➩ Read the file line by line: r = csv.DictReader(t) and for row in r: ...
➩ Remember to close everything, innermost first: f.close() and then z.close().
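The steps above can be sketched as follows. This is a minimal sketch, not the course's own code: the archive is built in memory so the example runs on its own, and the file name data.csv and the columns name and score are made up for illustration. The with statements take the place of the explicit close() calls.

```python
import csv
import io
import zipfile

# Build a tiny zip archive in memory so the sketch is self-contained
# (hypothetical file name and columns); normally the zip is already on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as z:
    z.writestr("data.csv", "name,score\nada,3\nbob,5\n")
buffer.seek(0)

scores = []
# Each with block closes its object on exit, innermost first.
with zipfile.ZipFile(buffer) as z:
    with z.open("data.csv") as f:  # f produces bytes, not text
        with io.TextIOWrapper(f, encoding="utf-8", newline="") as t:
            for row in csv.DictReader(t):  # one row (a dict) at a time
                scores.append(row["score"])
```

Only one row is held in memory per loop iteration; the list scores is kept here just to show what was read.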
# Generator
📗 A generator is an iterator that produces its values one at a time, so the values are never all stored in memory at the same time.
➩ Define the generator function def gen():
➩ Output one value at a time: for row in r: yield row[0] (if r is a csv.DictReader, index by column name instead of 0).
➩ Store the iterator in a generator object: g = gen().
➩ Call next(g) to produce the next value.
📗 For loops can also loop over a generator object: for v in g.
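A small sketch of the generator pattern described above. It assumes a csv.reader (so that row[0] works) and uses io.StringIO with made-up data as a stand-in for an open text file; the parameter text_file is an addition for illustration.

```python
import csv
import io

def gen(text_file):
    """Yield the first column of each data row, one value at a time."""
    r = csv.reader(text_file)
    next(r, None)          # skip the header row, if there is one
    for row in r:
        yield row[0]       # pauses here until the next value is requested

# io.StringIO stands in for an open text file (hypothetical data).
g = gen(io.StringIO("name,score\nada,3\nbob,5\ncat,4\n"))

first = next(g)            # asks the generator for one value
rest = [v for v in g]      # a loop consumes whatever is left
```

Calling gen(...) does not run the function body; the body only advances when next(g) or a loop asks for a value.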
# Streaming Algorithms
📗 Mean:
➩ Start with s = 0 and n = 0.
➩ Loop over the generator: for v in g: s = s + v and n = n + 1 (convert v to a number first if it was read from a text file).
➩ Return the mean s / n.
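The mean computation above might look like this in code. The function name streaming_mean is an assumption, as is the choice to return None for an empty stream; values are passed through float() because values read from a CSV file arrive as strings.

```python
def streaming_mean(values):
    """Compute the mean in one pass, keeping only a running sum and a count."""
    s = 0.0
    n = 0
    for v in values:
        s += float(v)      # values read from text files are strings
        n += 1
    return s / n if n > 0 else None   # avoid dividing by zero on an empty stream

# A generator expression stands in for the generator object g.
mean = streaming_mean(str(x) for x in [3, 5, 4])
```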
📗 Median:
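➩ Exact: sort the data in external memory (on disk), then read off the middle element.
➩ Approximation methods: median of medians; counting sort (count the values per bin, then scan the counts).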
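One way to read the counting idea is sketched below, assuming the number of distinct values is small enough for the counts to fit in memory. In that case the result is exact; it would only become an approximation if the values were first rounded into coarser bins. The function name counting_median is hypothetical, and for an even number of values the sketch returns the lower of the two middle values.

```python
from collections import Counter

def counting_median(values):
    """Median from a histogram of values instead of a full sorted copy."""
    counts = Counter()
    n = 0
    for v in values:           # one pass over the stream
        counts[v] += 1
        n += 1
    if n == 0:
        return None
    target = (n - 1) // 2      # position of the (lower) middle element
    seen = 0
    for value in sorted(counts):   # sort only the distinct values
        seen += counts[value]
        if seen > target:
            return value

median = counting_median(iter([5, 1, 3, 3, 9]))
```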
📗 Mode:
➩ If the number of bins (unique elements) is small, keep track of the histogram.
➩ If there are too many bins to fit in memory, drop the low-frequency items whenever the histogram gets too large.
➩ Try to find the names that come up the most in the Epstein files: Link.
➩ Use pypdf or PyPDF2 (Doc or Doc) to extract text from the PDF files.
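The pruning idea for the mode can be sketched as below. This is a simple heuristic written for illustration, not a specific named algorithm: the function name approximate_mode, the max_bins parameter, and the rule of keeping the most frequent half are all assumptions. Because dropped items lose their counts, the answer is approximate.

```python
from collections import Counter

def approximate_mode(values, max_bins=1000):
    """Track a histogram, pruning rare items whenever it grows past max_bins."""
    counts = Counter()
    keep = max(1, max_bins // 2)
    for v in values:
        counts[v] += 1
        if len(counts) > max_bins:
            # Keep only the most frequent items; the rest lose their counts,
            # which is why the result is only approximate.
            counts = Counter(dict(counts.most_common(keep)))
    return counts.most_common(1)[0][0] if counts else None

mode = approximate_mode(iter(["a", "b", "a", "c", "a", "d", "a", "e"]), max_bins=3)
```

An item that appears often but is spread thinly across the stream can be pruned before it accumulates, so a larger max_bins generally gives a more reliable answer.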
# Slides and Notes
📗 From sections 1 and 2:
➩ Inheritance notes:
Link .
# Dictionaries vs Classes
📗 Dictionaries and classes have similar uses, but classes can represent more complex structures.
📗 Special methods for the class of obj:
➩ __init__: for obj = Class(...).
➩ __str__, __repr__, _repr_html_: for print(obj), str(obj) or display obj in Jupyter notebook.
➩ __eq__, __lt__: for obj == obj, obj < obj, or [obj, obj].sort().
➩ __len__, __getitem__: for len(obj), obj[0], obj[0.1] or for o in obj:.
➩ __enter__, __exit__: for with obj: ... (runs before and after the block).
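A compact sketch that exercises most of the special methods listed above. The Playlist class and its attributes are invented for illustration; comparing playlists by length in __lt__ is an arbitrary choice made for the example.

```python
class Playlist:
    def __init__(self, name, songs):           # Playlist(...)
        self.name = name
        self.songs = list(songs)
        self.open = False

    def __repr__(self):                        # repr(obj), and print(obj) when __str__ is absent
        return f"Playlist({self.name!r}, {self.songs!r})"

    def __eq__(self, other):                   # obj == obj
        return isinstance(other, Playlist) and (self.name, self.songs) == (other.name, other.songs)

    def __lt__(self, other):                   # obj < obj, also used by sorting
        return len(self) < len(other)

    def __len__(self):                         # len(obj)
        return len(self.songs)

    def __getitem__(self, index):              # obj[0]; a for loop falls back to this
        return self.songs[index]

    def __enter__(self):                       # runs when the with block starts
        self.open = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):   # runs when it ends
        self.open = False
        return False                           # do not suppress exceptions

a = Playlist("short", ["x"])
b = Playlist("long", ["p", "q", "r"])
ordered = sorted([b, a])                       # relies on __lt__
items = [song for song in b]                   # calls __getitem__ with 0, 1, 2, ...
with a as p:
    inside = p.open
after = a.open
```

The loop over b works without __iter__ because Python keeps calling __getitem__ with increasing integers until it raises IndexError.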
# Inheritance
📗 Define class Child(Parent)::
➩ The class Child inherits all variables and methods from Parent.
➩ The class Child can override (replace) methods in Parent.
📗 (In Python, unlike Java) There can be more than one parent: class Child(Parent1, Parent2, Parent3)::
➩ Child.__mro__ specifies the "method resolution order" in case multiple parents have the same method.
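A minimal sketch of overriding and of the method resolution order. The method names hello and only1 are made up for the example.

```python
class Parent1:
    def hello(self):
        return "hello from Parent1"

    def only1(self):
        return "only in Parent1"

class Parent2:
    def hello(self):
        return "hello from Parent2"

class Child(Parent1, Parent2):
    def only1(self):                   # overrides Parent1.only1
        return "overridden in Child"

c = Child()
greeting = c.hello()                   # both parents define hello; the MRO decides
replaced = c.only1()
mro_names = [cls.__name__ for cls in Child.__mro__]
```

Here Parent1 is listed first in the class statement, so it comes before Parent2 in Child.__mro__ and its hello is the one that runs.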
# Questions?
📗 Notes and code adapted from the course taught by Professors Gurmail Singh, Yiyin Shen, Tyler Caraza-Harter.
📗 If there is an issue with TopHat during the lectures, please submit your answers on paper (include your Wisc ID and answers) or through this Google form at the end of the lecture: Link.
📗 Anonymous feedback can be submitted to: Form. Non-anonymous feedback and questions can be posted on Piazza: Link.
Last Updated: February 10, 2026 at 10:03 PM