Schedule
- Readings in normal font should be completed and annotated ahead of lecture.
- Readings in italic provide optional additional depth on the material.
- Assignments are listed on the day when I suggest you begin working on them.
Reading sources:
- PSC: Lecture notes I’ve written for this course, hosted here.
- PDSH: The Python Data Science Handbook by Vanderplas (2016).
- BHN: Fairness and Machine Learning: Limitations and Opportunities by Barocas, Hardt, and Narayanan (2023).
Week 1
Mon Feb. 10: Welcome!
We introduce our topic and discuss how the course works.
- Learning Objectives: Getting Oriented
- Reading: Course syllabus
- Notes: Welcome slides; Data, Patterns, and Models
- Warmup: Set up your software.
- Assignments: Math pre-assessment.
Wed Feb. 12: The Classification Workflow in Python
We work through a simple, complete example of training and evaluating a classification model on a small data set.
- Learning Objectives: Navigation, Experimentation
- Reading: PDSH: Data Manipulation with Pandas (through "Aggregation and Grouping")
- Notes: Lecture notes; Live notes
- Warmup: Manual linear prediction
- Assignments: Blog Post: Penguins
Week 2
Mon Feb. 17: Linear Score-Based Classification
We study a fundamental method for binary classification in which data points are assigned scores. Scores above a certain threshold are assigned to one class; scores below are assigned to the other.
- Learning Objectives: Theory, Experimentation
- Reading: Linear Classifiers from MITx
- Notes: Lecture notes; Live notes
- Warmup: Decision Boundaries
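A minimal sketch of this scheme, with invented weights, threshold, and data points:

```python
# Score-based binary classification: a linear score s(x) = w.x + b,
# thresholded at t, assigns class 1 when s(x) >= t and class 0 otherwise.

def score(x, w, b):
    """Linear score of a feature vector x under weights w and bias b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(x, w, b, t=0.0):
    """Classify x by comparing its score to the threshold t."""
    return 1 if score(x, w, b) >= t else 0

# Toy example with made-up weights:
w, b = [2.0, -1.0], 0.5
print(predict([1.0, 1.0], w, b))   # score = 1.5 >= 0, so class 1
print(predict([-1.0, 2.0], w, b))  # score = -3.5 < 0, so class 0
```

The threshold t is a free choice; moving it trades one kind of error for another, which is the subject of the next lecture.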
Wed Feb. 19: Statistical Decision Theory and Automated Decision-Making
We discuss the theory of making automated decisions based on a score function, going into detail on thresholding, error rates, and cost-based optimization.
- Learning Objectives: Theory, Experimentation
- Reading: PDSH: Introduction to NumPy
- Notes: Lecture notes; Live notes
- Warmup: Choosing a Threshold
- Assignments: Blog Post: Design and Impact of Automated Decision Systems
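A hedged sketch of cost-based threshold selection, with made-up scores, labels, and error costs: sweep candidate thresholds, compute false positive and false negative rates, and pick the threshold that minimizes a weighted cost.

```python
# Choosing a threshold by minimizing expected cost. All data and the
# relative costs c_fp, c_fn below are invented for illustration.

def error_rates(scores, labels, t):
    """(False positive rate, false negative rate) at threshold t."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
    return fp / labels.count(0), fn / labels.count(1)

def best_threshold(scores, labels, c_fp=1.0, c_fn=5.0):
    """Candidate threshold with the lowest weighted error cost."""
    def cost(t):
        fpr, fnr = error_rates(scores, labels, t)
        return c_fp * fpr + c_fn * fnr
    return min(sorted(set(scores)), key=cost)

scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
labels = [0,   0,   1,    1,   1,   0  ]
t = best_threshold(scores, labels)  # here, 0.35: no false negatives, one false positive
```

Because false negatives are five times as costly here, the chosen threshold tolerates a false positive to avoid missing any positives.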
Week 3
Mon Feb. 24: Auditing Fairness
We introduce the topics of fairness and disparity in automated decision systems using a famous case study.
- Learning Objectives: Social Responsibility, Experimentation
- Reading: BHN: Introduction; Machine Bias by Julia Angwin et al. for ProPublica
- Notes: Lecture notes; Live notes
- Warmup: Experiencing (Un)Fairness
Wed Feb. 26: Statistical Definitions of Fairness in Automated Decision-Making
We offer formal mathematical definitions of several natural intuitions of fairness, review how to assess them empirically on data in Python, and prove that two major definitions are incompatible with each other.
- Learning Objectives: Social Responsibility, Theory
- Reading: BHN: Classification (ok to skip "Relationships between criteria" and below)
- Notes: Lecture notes; Live notes
- Warmup: BHN Reading Check
- Assignments: Blog Post: Auditing Bias OR Blog Post: Bias Replication Study
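One family of such definitions compares error rates across groups. A toy sketch of the empirical check, with invented predictions, labels, and group memberships:

```python
# Auditing error-rate parity: compute the false positive rate separately
# for each group and compare. All data below is made up for illustration.

def fpr_by_group(preds, labels, groups):
    """False positive rate of binary predictions, per group label."""
    rates = {}
    for g in set(groups):
        fp = sum(1 for p, y, gr in zip(preds, labels, groups)
                 if gr == g and p == 1 and y == 0)
        neg = sum(1 for y, gr in zip(labels, groups) if gr == g and y == 0)
        rates[g] = fp / neg
    return rates

preds  = [1, 0, 1, 1, 0, 0, 1, 0]
labels = [1, 0, 0, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "b", "b", "b", "b", "b"]
print(fpr_by_group(preds, labels, groups))  # unequal rates across groups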
Week 4
Mon Mar. 03: Normative Theory of Fairness
We discuss some of the broad philosophical and political positions that underlie the theory of fairness, and connect these positions to statistical definitions.
- Learning Objectives: Social Responsibility
- Reading: BHN: Relative Notions of Fairness
- Notes: Discussion guide shared on Canvas
- Warmup: COMPAS and Equality of Opportunity
Wed Mar. 05: Critical Perspectives
We discuss several critical views that seek to move our attention beyond the fairness of algorithms and toward their role in sociotechnical systems. We center two questions: who benefits from a given data science task, and what tasks could we approach instead if our aims were to uplift the oppressed?
- Learning Objectives: Social Responsibility
- Reading: Data Feminism: The Power Chapter by Catherine D'Ignazio and Lauren Klein; "The Digital Poorhouse" by Virginia Eubanks; "Studying Up: Reorienting the study of algorithmic fairness around issues of power" by Barabas et al.
- Notes: Discussion guide shared on Canvas
- Warmup: Power, Data, and Studying Up
- Assignments: Blog Post: Limitations of the Quantitative Approach
Week 5
Mon Mar. 10: No class
Phil is giving a talk at Michigan State.
Wed Mar. 12: Introduction to Model Training: The Perceptron
We study the perceptron as an example of a linear model with a training algorithm. Our understanding of this algorithm and its shortcomings will form the foundation of our future explorations in empirical risk minimization.
- Learning Objectives: Theory
- Reading: No reading today, but please be ready to put some extra time into the warmup. It may be useful to review our lecture notes on score-based classification and decision theory.
- Notes: Lecture notes; Live notes
- Warmup: Linear Models, Perceptron, and Torch
- Assignments: Blog Post: Implementing Perceptron
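A minimal sketch of the perceptron training loop in plain Python (toy data; labels in {-1, +1}): when a point is misclassified, nudge the weights toward or away from it.

```python
# Perceptron training: repeat passes over the data, updating the weights
# whenever a point is misclassified. Toy data invented for illustration.

def perceptron(X, y, epochs=100):
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        updated = False
        for x, yi in zip(X, y):
            s = sum(wi * xi for wi, xi in zip(w, x)) + b
            if yi * s <= 0:                       # misclassified (or on boundary)
                w = [wi + yi * xi for wi, xi in zip(w, x)]
                b += yi
                updated = True
        if not updated:                           # all points correct: converged
            break
    return w, b

# Linearly separable toy data:
X = [[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w, b = perceptron(X, y)
```

On linearly separable data like this, the loop terminates with every point correctly classified; its failure on non-separable data motivates empirical risk minimization.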
Break
Mon Mar. 17: Spring Break!
Wed Mar. 19: Spring Break!
Week 6
Mon Mar. 24: Convex Empirical Risk Minimization
We introduce the framework of convex empirical risk minimization, which offers a principled approach to overcoming the many limitations of the perceptron algorithm.
- Learning Objectives: Theory
- Reading: Convexity Examples by Stephen D. Boyles, pages 1-7 (ok to stop when we start talking about gradients and Hessians)
- Notes: Lecture notes; Live notes
- Warmup: Practice with Convex Functions
Wed Mar. 26: Gradient Descent
We study a method for finding the minima of convex functions using techniques from calculus and linear algebra.
- Learning Objectives: Theory
- Reading: No reading today, but please budget some extra time for the warmup.
- Notes: Lecture notes; Live notes
- Warmup: A First Look at Gradient Descent
- Assignments: Blog Post: Implementing Logistic Regression
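A minimal sketch of the method on a one-dimensional convex function (learning rate and step count chosen arbitrarily): repeatedly step opposite the gradient.

```python
# Gradient descent on the convex function f(w) = (w - 3)^2,
# whose gradient is f'(w) = 2(w - 3) and whose minimizer is w = 3.

def gradient_descent(grad, w0, lr=0.1, steps=200):
    """Iterate w <- w - lr * grad(w) and return the final iterate."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_star)  # approaches the minimizer w = 3
```

For convex functions, a suitably small learning rate guarantees convergence toward a global minimizer; too large a rate can overshoot and diverge.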
Week 7
Mon Mar. 31: Feature Maps and Regularization
We re-introduce feature maps as a method for learning nonlinear decision boundaries, and add regularization to the empirical risk minimization problem in order to control the complexity of our learned models.
- Learning Objectives: Theory, Experimentation
- Reading: No reading today; please think hard about your project pitches!
- Notes: Lecture notes; Live notes
- Warmup: Project Pitches
Wed Apr. 02: Linear Regression
We introduce regression (prediction of numerical outcomes) and study ridge regression, a regularized approach to linear regression.
- Learning Objectives: Theory, Experimentation
- Reading: No reading today.
- Notes: Lecture notes; Live notes
- Warmup: Ordinary Least-Squares Linear Regression
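A hedged one-dimensional sketch of ridge regression (no intercept, toy data): the penalized least-squares problem has a simple closed form.

```python
# Ridge regression in one dimension (no intercept): minimizing
#   sum_i (y_i - w x_i)^2 + lam * w^2
# over w gives the closed form  w = (sum_i x_i y_i) / (sum_i x_i^2 + lam).

def ridge_1d(x, y, lam=1.0):
    """Closed-form ridge solution for a single feature with no intercept."""
    return sum(xi * yi for xi, yi in zip(x, y)) / (sum(xi ** 2 for xi in x) + lam)

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]              # exactly y = 2x
print(ridge_1d(x, y, lam=0.0))   # ordinary least squares recovers the slope 2
print(ridge_1d(x, y, lam=1.0))   # regularization shrinks the slope toward 0
```

Setting lam = 0 recovers ordinary least squares; larger lam trades fit for smaller (lower-complexity) coefficients, previewing the bias-variance discussion.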
Week 8
Mon Apr. 07: Bias-Variance Tradeoff
We explore the bias-variance tradeoff in regression and connect it to the phenomenon of overfitting.
- Learning Objectives: Theory, Experimentation
- Notes: Lecture notes; Live notes
- Warmup: Variance of a Random Variable and Prediction
- Assignments: Blog Post: Double Descent
Wed Apr. 09: Vectorization and Feature Engineering
We illustrate the interplay of vectorization and feature engineering on image data.
- Learning Objectives: Experimentation, Implementation
- Reading: Image Kernels Explained Visually by Victor Powell
- Notes: Lecture notes; Live notes
- Warmup: Small break today: no warmup.
Week 9
Mon Apr. 14: Kernel Methods
We introduce kernel methods for using high-dimensional feature maps in linear empirical risk minimization without the need to explicitly form feature vectors.
- Learning Objectives: Theory, Experimentation
- Notes: Lecture notes; Live notes
- Warmup: Project Update
- Assignments: Blog Post: Kernelized Logistic Regression
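A toy illustration of the idea with an RBF kernel (the choice of kernel and data here is purely illustrative): the kernel evaluates an inner product in a high-dimensional feature space without ever constructing the feature vectors.

```python
# The kernel trick: k(x, x') = exp(-gamma * ||x - x'||^2) acts as an inner
# product in an (infinite-dimensional) feature space, computed directly
# from the original coordinates.
import math

def rbf_kernel(x1, x2, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * sq_dist)

def kernel_matrix(X, gamma=1.0):
    """Gram matrix K[i][j] = k(X[i], X[j]) used in kernelized ERM."""
    return [[rbf_kernel(xi, xj, gamma) for xj in X] for xi in X]

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = kernel_matrix(X)
# K is symmetric with ones on the diagonal, since k(x, x) = exp(0) = 1.
```

Kernelized methods work entirely with this Gram matrix, which is why they scale with the number of data points rather than the dimension of the feature space.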
Wed Apr. 16: The Problem of Features and Deep Learning
We motivate deep learning as an approach to the problem of learning complex nonlinear features in data.
- Learning Objectives: Theory, Experimentation
- Notes: Lecture notes; Live notes
- Warmup: Nonlinear Fitting and Convexity
Week 10
Mon Apr. 21: Contemporary Optimization
We briefly introduce two concepts in optimization that have enabled large-scale deep learning: stochastic first-order optimization techniques and automatic differentiation.
- Learning Objectives: Theory, Experimentation
- Notes: Lecture notes; Live notes
- Warmup: Project Update
- Assignments: Blog Post: Advanced Optimization
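As a small taste of the second idea, here is a toy forward-mode automatic differentiation sketch using dual numbers (libraries such as PyTorch primarily use reverse mode, but both propagate exact derivatives through arithmetic rules rather than using symbolic math or finite differences):

```python
# Forward-mode automatic differentiation with dual numbers: each value
# carries (value, derivative), and the arithmetic rules below propagate
# derivatives exactly through any composition of + and *.

class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):  # product rule
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate f'(x) by seeding the dual part of the input with 1."""
    return f(Dual(x, 1.0)).dot

# f(w) = w^2 + 3w has f'(w) = 2w + 3, so f'(2) = 7:
print(derivative(lambda w: w * w + 3 * w, 2.0))  # 7.0
```

Because the derivative is computed alongside the value, no separate gradient code is needed; this is the core convenience that makes large-scale model training practical.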
Wed Apr. 23: Deep Image Classification
We return to the image classification problem, using deep learning and large-scale optimization to optimize convolutional kernels as part of the training process.
- Learning Objectives: Theory, Experimentation
- Reading: Convolutional Neural Networks from MIT's course 6.036: Introduction to Machine Learning
- Notes: Lecture notes; Live notes
- Warmup: What needs to be learned?
Week 11
Mon Apr. 28: Text Classification and Word Embedding
We briefly study the use of word embeddings for text classification.
- Learning Objectives: Theory, Experimentation
- Reading: Efficient Estimation of Word Representations in Vector Space by Mikolov et al. (sections 1, 4, 5)
- Notes: Lecture notes; Live notes
- Warmup: Project Update
- Assignments: Deep Music Classification
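A toy illustration of how embedding geometry can encode similarity (the vectors below are invented; real embeddings such as word2vec are learned from large corpora):

```python
# Word embeddings map words to vectors so that geometric similarity
# (here, cosine similarity) tracks semantic similarity.
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Made-up 3-dimensional "embeddings" for illustration only:
emb = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.9],
}
# "cat" should be closer to "dog" than to "car":
print(cosine_similarity(emb["cat"], emb["dog"]) >
      cosine_similarity(emb["cat"], emb["car"]))  # True
```

A text classifier can then average or pool the embeddings of a document's words and feed the result to any of the linear models from earlier in the course.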
Wed Apr. 30: Unsupervised Learning and Autoencoders
We introduce unsupervised learning through the framework of autoencoders.
- Learning Objectives: Theory, Experimentation
- Reading: K-Means Clustering from PDSH
- Notes: Lecture notes; Live notes
- Warmup: Compression factor of k-means
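One way to see k-means as compression: store one center per cluster plus a cluster index per point, instead of every coordinate of every point. A minimal sketch of Lloyd's algorithm on one-dimensional toy data:

```python
# Lloyd's algorithm for k-means: alternate between assigning each point
# to its nearest center and moving each center to its cluster's mean.
# Data and initial centers below are invented for illustration.

def kmeans_1d(xs, centers, iters=20):
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for x in xs:
            i = min(range(len(centers)), key=lambda j: (x - centers[j]) ** 2)
            clusters[i].append(x)
        # Update step: each center moves to the mean of its cluster.
        centers = [sum(c) / len(c) if c else m
                   for c, m in zip(clusters, centers)]
    return centers

xs = [0.1, 0.2, 0.15, 5.0, 5.1, 4.9]
print(sorted(kmeans_1d(xs, centers=[0.0, 1.0])))  # roughly [0.15, 5.0]
```

The six stored numbers compress to two centers plus six small cluster indices; k-means is in this sense a (lossy) encoder, which is the bridge to autoencoders.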
Week 12
Mon May. 05: Neural Autoencoders and Dimensionality Reduction
We use neural autoencoders to learn low-dimensional structure in more complex data sets.
- Learning Objectives: Theory, Experimentation
- Notes: Lecture notes; Live notes
- Warmup: Project Update
Wed May. 07: Project presentations!
We celebrate your projects and learn about what you've done!
- Learning Objectives: Project
© Phil Chodrow, 2025
References
Barocas, Solon, Moritz Hardt, and Arvind Narayanan. 2023. Fairness and Machine Learning: Limitations and Opportunities. Cambridge, Massachusetts: The MIT Press.
Vanderplas, Jacob T. 2016. Python Data Science Handbook: Essential Tools for Working with Data. First edition. Sebastopol, CA: O’Reilly Media, Inc.