# Schedule


Welcome to Data Science Programming II! In this course, we will use object-oriented programming to build tree and graph data structures that represent hierarchical data, and we will implement algorithms for efficiently searching these structures.

We'll often create our own datasets, using techniques like logging, benchmarking, web scraping, and A/B testing.

In the last third of the semester we'll explore some basic machine learning techniques, including regression, classification, clustering, and decomposition.

| Week | Date | Lectures | Labs | Quiz | Project |
|---|---|---|---|---|---|
| 1 | Sep 4 | - (I1) | - | - | - |
| - | Sep 6 | Reproducibility 1: Overview (I2) | - | - | - |
| - | Sep 8 | Reproducibility 2: Versioning (I3) | - | - | - |
| 2 | Sep 11 | Reproducibility 3: Git (I4) | L2 | - | - |
| - | Sep 13 | Performance 1: Steps (I5) | - | - | - |
| - | Sep 15 | Performance 2: Complexity Analysis (I6) | - | - | - |
| 3 | Sep 18 | Performance 3: Large Data (I7) | L3 | - | - |
| - | Sep 20 | OOP 1: Classes (I8) | - | - | - |
| - | Sep 22 | OOP 2: Special Methods (I9) | - | - | - |
| 4 | Sep 25 | OOP 3: Inheritance (I10) | L4 | - | P1 |
| - | Sep 27 | Recursion (I11) | - | - | - |
| - | Sep 29 | Trees and Graphs Intro (I12) | - | - | - |
| 5 | Oct 2 | Trees 1: BST (I13) | L5 | - | - |
| - | Oct 4 | Trees 2: DFS (I14) | - | - | - |
| - | Oct 6 | Graph Search 1: BFS (I15) | - | - | - |
| 6 | Oct 9 | Graph Search 2: Data Structures (I16) | L6 | - | P2 |
| - | Oct 11 | Section 1 Wrap-up (I17) | - | - | - |
| - | Oct 13 | Exam 1 (I18) | - | - | - |
| 7 | Oct 16 | Web 1: Selenium (I19) | L7 | - | - |
| - | Oct 18 | Web 2: Recursive Crawl (I20) | - | Q6 | - |
| - | Oct 20 | Web 3: Flask (I21) | - | - | - |
| 8 | Oct 23 | Web 4: AB Testing and Dashboard (I22) | L8 | - | P3 |
| - | Oct 25 | Visualization 1: Basic Plots (I23) | - | Q7 | - |
| - | Oct 27 | Visualization 2: Trees and Graphs (I24) | - | - | - |
| 9 | Oct 30 | Visualization 3: Shapes and Maps (I25) | L9 | - | - |
| - | Nov 1 | Data Preprocessing 1: Regex (I26) | - | Q8 | - |
| - | Nov 3 | Data Preprocessing 2: Text (I27) | - | - | - |
| 10 | Nov 6 | Data Preprocessing 3: Image (I28) | L10 | - | - |
| - | Nov 8 | Section 2 Wrap-up (I29) | - | - | - |
| - | Nov 10 | Exam 2 (I30) | - | X1, X2 | - |
| 11 | Nov 13 | Supervised Learning 1: Classification (I31) | L11 | - | P4 |
| - | Nov 15 | Supervised Learning 2: Linear Classifiers (I32) | - | Q9 | - |
| - | Nov 17 | Supervised Learning 3: Nonlinear Classifiers (I33) | - | - | - |
| 12 | Nov 20 | Supervised Learning 4: Linear Algebra (I34) | L12 | - | - |
| - | Nov 22 | Supervised Learning 5: Regression (I35) | - | - | - |
| - | Nov 24 | - (I36) | - | - | - |
| 13 | Nov 27 | Optimization 1: Gradient Methods (I37) | L13 | - | P5 |
| - | Nov 29 | Optimization 2: Linear Programming (I38) | - | Q10 | - |
| - | Dec 1 | Unsupervised Learning 1: Hierarchical Clustering (I39) | - | - | - |
| 14 | Dec 4 | Unsupervised Learning 2: K Means Clustering (I40) | L14 | - | - |
| - | Dec 6 | Unsupervised Learning 3: Dimensionality Reduction (I41) | - | Q11 | - |
| - | Dec 8 | - (I42) | - | - | - |
| 15 | Dec 11 | Reproducibility 4: Stochastic Processes (I43) | L15 | - | P6 |
| - | Dec 13 | Reproducibility 5: Simulation and Parallelism (I44) | - | - | - |
| - | Dec 15-17 | Section 3 Wrap-up (I45) | - | - | - |
| 16 | Dec 19 | Exam (I46) | - | X3 | - |


# Course Websites and Forms


📗 This webpage (for lecture notes).
📗 Canvas (for grades): Link.
📗 TopHat (for quizzes): Link. Code: 741565.
📗 Piazza (for discussion): Link.
📗 Regrade requests: Projects Form, Labs Form, In-class Quizzes Form.
📗 Exam conflicts: 1 Form, 2 Form, 3 Form.
📗 Feedback: Form, Thank you Form.
📗 Professor Yiyin Shen: 2023.
📗 Professor Tyler Caraza-Harter: 2010-2022.

# Grading Scheme


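| Component | Frequency | Number | Drop Lowest | Points Each | Total |
|---|---|---|---|---|---|
| (Q) In-class Quizzes | Every Lecture | 38 | 12 | - | 5 |
| (L) Labs | Every Lab | 14 | 5 | - | 7 |
| (P) Projects | - | 6 | 0 | 8 | 48 |
| (Z) Quizzes | Weekly | 11 | 1 | 1 | 10 |
| (X) Exam | - | 3 | 0 | 10 | 30 |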


| From | To | Letter Grade |
|---|---|---|
| 93 | 100 | A |
| 88 | 93 | AB |
| 80 | 88 | B |
| 75 | 80 | BC |
| 70 | 75 | C |
| 60 | 70 | D |


📗 We will NOT be rounding up scores at the end of the semester.
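
The grading scheme and letter cutoffs above can be sketched in a few lines of Python. This is only an illustration, not the official grade calculation: the names `COMPONENT_TOTALS`, `GRADE_CUTOFFS`, and `letter_grade` are made up for this sketch, each range is assumed to include its lower bound (so exactly 93 counts as an A), and scores below 60 return `None` because the table does not list a letter for them.

```python
# Total points per component, copied from the grading table.
COMPONENT_TOTALS = {"Q": 5, "L": 7, "P": 48, "Z": 10, "X": 30}

# (lower bound, letter) pairs from the letter grade table, highest first.
# Assumption: each range includes its lower bound.
GRADE_CUTOFFS = [(93, "A"), (88, "AB"), (80, "B"), (75, "BC"), (70, "C"), (60, "D")]


def letter_grade(score):
    """Map a final score (0 to 100) to a letter grade without rounding."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, letter in GRADE_CUTOFFS:
        if score >= lower_bound:
            return letter
    # The table lists no letter below 60, so this sketch returns None.
    return None


if __name__ == "__main__":
    print(sum(COMPONENT_TOTALS.values()))  # component totals add up to 100
    print(letter_grade(92.99))             # not rounded up to 93
```

Because scores are not rounded, a 92.99 stays in the 88 to 93 band in this sketch.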

# Admin


📗 Instructor: Yiyin Shen
📗 Lectures (Section 001): MWF 8:50-9:40
📗 Lectures (Section 002): MWF 11:00-11:50
📗 Office hours: Calendar

📗 Instructor: Young Wu
📗 Lectures (Section 001): MWF 8:50-9:40
📗 Lectures (Section 002): MWF 11:00-11:50
📗 Lab: T 4:00-5:15 (CS1370)
📗 Office hours: MTWRF 12:30-1:30

📗 Head TA: Jinlang Wang
📗 Lab: T 4:00-5:15 (CS1370)
📗 Office hours: Calendar

(TAs Listed in Alphabetical Order)

📗 TA: Rahul Chunduru
📗 Lab: M 4:00-5:15 (CS1370)
📗 Office hours: Calendar

📗 TA: Luke Dotson
📗 Lab: M 11:00-12:15 (CS1370)
📗 Office hours: Calendar

📗 TA: Mario Ivan Jaen Marquez
📗 Lab: M 2:30-3:45 (CS1370)
📗 Office hours: Calendar

📗 TA: Daniel McNeela
📗 Lab: T 11:00-12:15 (CS1370), T 2:30-3:45 (CS1370)
📗 Office hours: Calendar

📗 TA: Elliot Pickens
📗 Lab: T 9:30-10:45 (CS1370)
📗 Office hours: Calendar

📗 TA: Karthik Suresh
📗 Lab: M 1:00-2:15 (CS1370)
📗 Office hours: Calendar

📗 TA: Leitian Tao
📗 Lab: T 1:00-2:15 (CS1370)
📗 Office hours: Calendar

(PMs Listed in Alphabetical Order)

📗 PM: Zaid Albazian
📗 Lab: T 11:00-12:15 (CS1370)
📗 Office hours: Calendar

📗 PM: Khai Bui
📗 Lab: M 11:00-12:15 (CS1370)
📗 Office hours: Calendar

📗 PM: Amber Deng
📗 Lab: T 8:00-9:15 (CS1370)
📗 Office hours: Calendar

📗 PM: Rutuja Gupte
📗 Lab: T 9:30-10:45 (CS1370)
📗 Office hours: Calendar

📗 PM: Hai La
📗 Lab: M 4:00-5:15 (CS1370)
📗 Office hours: Calendar

📗 PM: Trishika Mukkavall
📗 Lab: T 2:30-3:45 (CS1370)
📗 Office hours: Calendar 

📗 PM: Vrishank Paladugu
📗 Lab: T 4:00-5:15 (CS1370)
📗 Office hours: Calendar

📗 PM: Aryan Permalla
📗 Lab: -
📗 Office hours: Calendar

📗 PM: Sydney Scalzo
📗 Lab: -
📗 Office hours: Calendar

📗 PM: Garrison Waugh
📗 Lab: T 1:00-2:15 (CS1370)
📗 Office hours: Calendar

📗 PM: Luke Welsh
📗 Lab: M 1:00-2:15 (CS1370)
📗 Office hours: Calendar

📗 PM: Tianyi Xu
📗 Lab: M 2:30-3:45 (CS1370)
📗 Office hours: Calendar

📗 PM: David Zhou
📗 Lab: -
📗 Office hours: Calendar

# Project Details

📗 Submission: Everyone individually uploads a .py file, a .ipynb file, or a zip file (as specified) for each project using the submission tool.
📗 Late Day System: 
➭ You have a bank of 12 late days without penalty for the semester.
➭ For a given project, you may use 3 late days without penalty. After that, a 10% deduction will be applied per late day for the next 2 days. Projects that are late by more than 5 days will not be accepted.
➭ After the 12-day bank runs out, a 10% deduction will be applied per late day.
➭ You may not use any late days on the last project P6.
➭ Late days are calculated as whole days. That is, even if your project is late by 1 hour, that counts as 1 whole late day.
➭ For calculating late days, we will always use your latest submission. We will not accept requests to grade an earlier submission for the same project.
➭ Late days are automatically applied and do not need to be requested.
📗 Collaboration: Even though everybody will make their individual submission, every project will have (1) a group part to be optionally done with your assigned study group and (2) an individual part. For the group part, any form of help from anybody in your group is allowed; we recommend you find times for everybody in the group to work at the same time so you can help each other through coding difficulties in this part. You're also welcome to do the "group" part individually, or with a subset of your assigned study group. For the individual part, you may only receive help from course staff (instructors/TAs/peer mentors); you may not discuss this part with anybody else (in the class or otherwise) or get help from them.
📗 Code Review: TAs will give you comments on specific parts of your assignment. This feedback process is called a "code review", and is a common requirement in industry before a programmer is allowed to add her code changes to the main codebase. TAs will also include reasons for deductions in the comments. Read your code reviews carefully; even if you receive 100% on your work, we'll often give you tips to save effort in the future. 
📗 Project Grading: Grades will be largely based on automatic tests that we run. We'll share the tests with you before the due date, so you should rarely be too surprised by your grade. Though it shouldn't be common, we may deduct points for serious hardcoding, not following directions, or other issues. All possible deduction cases will be listed in the grading guidelines we provide within each project. Some bugs (called non-deterministic bugs) don't show up every time code is run -- if you have such an issue, the grade from our tester may differ from what you expected based on your own runs. Finally, our tests aren't very good at evaluating whether plots and other visualizations look how they should (a human usually needs to evaluate that).
📗 Auto-grader: The autograder will run hourly after the release of a project. Because of this, we expect you to try submitting your project early and make sure nothing crashes. However, this should not be a substitute for running tester.py locally. You should only try submitting after you pass the tests locally.
➭ Clearing the auto-grader is a mandatory part of the project submission process. Regular project deadlines apply to auto-grader failures as well. That is, your project submission must clear the auto-grader before the hard deadline for the project. If it does not, we are unable to grade your project submission.
➭ If your project fails the auto-grader, it is your responsibility to use office hours and make an appropriate resubmission. The resubmission will also count towards late day usage.
📗 Allowed Packages: anything that comes pre-installed with Python and any packages used during the lectures and listed in the projects are allowed. Using unapproved packages may result in a score of zero when submitted for grading because the autograder won't be able to run your code without those packages.
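
The Late Day System rules above can be sketched as a small Python function. This is one possible reading of the rules, not the official calculation, and `late_penalty` and its parameters are hypothetical names. The sketch assumes that the free days used on a project are limited by both the 3-day per-project cap and the days left in your bank, that every remaining late day costs 10%, and that a late submission of the last project (P6) is simply not accepted; the rules only say no late days may be used on P6, so that last point is an assumption.

```python
import math

FREE_DAYS_PER_PROJECT = 3   # late days without penalty per project
MAX_DAYS_LATE = 5           # later than this is not accepted
DEDUCTION_PER_DAY = 0.10    # 10% per penalized late day


def late_penalty(hours_late, bank_remaining, is_last_project=False):
    """Return (accepted, deduction_fraction, bank_days_used).

    deduction_fraction is None when the submission is not accepted.
    """
    # Late days are whole days: 1 hour late counts as 1 full day.
    days_late = math.ceil(hours_late / 24) if hours_late > 0 else 0
    if days_late == 0:
        return True, 0.0, 0
    # Assumption: a late P6 is rejected outright.
    if is_last_project or days_late > MAX_DAYS_LATE:
        return False, None, 0
    free_days = min(days_late, FREE_DAYS_PER_PROJECT, bank_remaining)
    penalized_days = days_late - free_days
    return True, round(penalized_days * DEDUCTION_PER_DAY, 2), free_days


if __name__ == "__main__":
    print(late_penalty(1, 12))       # 1 hour late uses one whole late day
    print(late_penalty(4 * 24, 12))  # fourth day is penalized
```

Keeping track of `bank_remaining` across projects is left to the caller: subtract the returned `bank_days_used` after each project.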

# Lab Details

📗 We'll post a weekly lab activities document. You can work through it individually, or with your assigned study group. TAs and peer mentors will walk around to answer questions and check your progress in finishing the lab activities. If you have extra time at the lab after completing the lab document, you can work on projects with your assigned study group.
📗 To obtain the point for a lab, you need to submit screenshots of the work (code and/or running results) you have done so far to Canvas within five minutes after the lab ends. You don't have to finish every lab activity, but you must show sufficient progress.

# Quiz Details

📗 There will be a short Canvas quiz due at the end of most Fridays. Make sure you know the rules regarding what is allowed and what is not.
📗 Allowed
➭ however much time you need
➭ discussing answers with members of your assigned study group who are taking the quiz at the same time
➭ referencing texts, notes, or provided course materials
➭ searching online for general information
➭ running code
📗 NOT allowed
➭ discussing answers with anybody outside of your group
➭ discussing with members of your group who have already completed the quiz when you haven't completed it yourself yet
➭ posting anything online about the quizzes
➭ using such material potentially posted by other 320 students who broke the preceding rule

# Exam Details

📗 These will be multiple choice or short answer exams taken through Canvas - Quizzes with HonorLock.
➭ Exam 1: Friday, Oct 13th, your lecture time
➭ Exam 2: Friday, Nov 10th, your lecture time
➭ Exam 3: Tuesday, Dec 19th, from 7:45 AM - 9:45 AM (we will probably only use 8:55 to 9:45).

# Readings

📗 We'll sometimes assign readings from the following sources (all free):
➭ Think Python 2nd Edition by Allen B. Downey: Link.
➭ Automate the Boring Stuff with Python by Al Sweigart: Link.
➭ Principles and Techniques of Data Science by Sam Lau, Joey Gonzalez, and Deb Nolan: Link.
➭ Scipy Lecture Notes by many contributors: Read Online Link.

# Cheating

📗 Yeah, of course you shouldn't cheat, but what is cheating? The most common form of academic misconduct in these classes involves copying/sharing code for programming projects. Here's an overview of what you can and cannot do:
📗 Acceptable
➭ any collaboration with your assigned study group members on the group part of a project
➭ copying code examples from online that are NOT specific to your project (if project solutions are leaked online, you may not use them). If you copy code, you must cite it in your code with a comment (think of it like citing a quote in an essay -- without the citation, you're plagiarizing). Here are some code citation templates:
➭ # copied/adapted from ... (website name) ... (link to the post) ...
e.g., # copied/adapted from Stackoverflow: https://stackoverflow.com/questions/24101524/finding-median-of-list-in-python
➭ # copied/adapted from ... (large language model name) ... (prompt used) ...
e.g., # copied/adapted from GPT4: "write a Python function to find the median of a list."
📗 NOT Acceptable
➭ getting project help of any kind for the group part from anybody who is not either (a) in your assigned study group or (b) 320 staff
➭ getting project help of any kind for the individual part from anybody who is not 320 staff
➭ using part or all of project solutions found online
➭ Copying from a nearly complete project (that accomplishes what you're trying to do for your project) is not OK. When in doubt, ask us! The best way to stay out of trouble is to be completely transparent about what you're doing.
➭ breaking any of the rules listed under the "Quiz Details" section
➭ helping somebody else cheat
📗 Similarity Detection: We will use automated tools to look for similarities across submissions. We take cheating detection seriously to make the course fair to students who put in the honest effort.
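
As a rough illustration of the citation template from the Acceptable list, the comment can sit directly above the code it covers. The `median` function below is a plain stand-in written for this sketch, not the code from the linked post; only the comment format is the point.

```python
# copied/adapted from Stackoverflow: https://stackoverflow.com/questions/24101524/finding-median-of-list-in-python
# (Illustration only: the body below is a stand-in, not the post's code.)
def median(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd length: the middle element is the median.
        return ordered[middle]
    # Even length: average the two middle elements.
    return (ordered[middle - 1] + ordered[middle]) / 2


if __name__ == "__main__":
    print(median([3, 1, 2]))
```

Placing the citation next to the copied lines, rather than only at the top of the file, makes it clear which part of your submission it covers.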





Last Updated: December 24, 2023 at 1:42 AM