Welcome to CSCI 0451!

Prof. Phil Chodrow
Department of Computer Science
Middlebury College








Machine learning is the theory and practice of algorithmically learning patterns in data.







Machine learning is used for…

…automated consumer recommendations for content and shopping.












Machine learning is used for…

…generating realistic synthetic text, images, and code.







Machine learning is used for…

…predictions and recommendations for life-changing decisions: housing, healthcare, criminal justice.







Machine learning is used for…

…search engines, smart homes, computer vision, speech-to-text, scientific discovery, driver assistance systems…








Can you list the times you interacted with a machine learning system yesterday?

Big Messages



This class is about something that is already impacting your life, and is likely to do so more in the future.

We are going to grow in math, coding, technical writing, and critical awareness.

This class works by giving you opportunities to push yourself.

This class is fun and rewarding but not easy.









What are we going to learn in this class?

CSCI 0451 is…

Coding
  • Numerical array programming
  • Object-oriented interfaces
  • Experiments and visualization
Math
  • Linear algebra
  • Optimization (\(\implies\) calculus)
  • A bit of probability
Reading, writing, discussion
  • Technical methods
  • Bias, fairness, and impact of ML




NYT, 1957



What We Are Actually Talking About

\[\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} + \mathbb{1}(y_i \langle \mathbf{w}^{(t)}, \mathbf{x}_i \rangle < 0)y_i \mathbf{x}_i\]
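
As a rough illustration, the update above can be sketched in NumPy. This is a minimal sketch, not the course's implementation: the function name `perceptron_step` is hypothetical, and it assumes labels are coded as -1 or +1 and that a single data point \((\mathbf{x}_i, y_i)\) is supplied at each step.

```python
import numpy as np

def perceptron_step(w, x_i, y_i):
    """Apply one update of the rule above (hypothetical helper).

    Assumes y_i is -1 or +1 and that w and x_i are 1-D arrays of equal length.
    """
    # Indicator term: 1.0 if the point is on the wrong side of w, else 0.0.
    indicator = float(y_i * np.dot(w, x_i) < 0)
    # Move w toward y_i * x_i only when the indicator is 1.
    return w + indicator * y_i * x_i

w = np.array([1.0, 0.0])
x = np.array([1.0, 1.0])

w_wrong = perceptron_step(w, x, -1)  # y * <w, x> = -1 < 0, so w changes
w_right = perceptron_step(w, x, +1)  # y * <w, x> = +1, so w is unchanged
```

Because the inequality is strict, a weight vector of all zeros would never be updated by this rule, which is why the sketch starts from a nonzero `w`.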

NYT, 2022

What We Are Actually Talking About


xkcd

My Approach

I want you to learn stuff in this class that is hard to learn from the internet.

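from sklearn.linear_model import LogisticRegression

LR = LogisticRegression()
LR.fit(predictors, target)      # learn from labeled data
LR.predict(new_predictors)      # predict on unseen data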

We are going to learn this workflow in a day, then do more interesting things.
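
For a version of this workflow that runs end to end, here is a minimal sketch using scikit-learn. The synthetic data from `make_classification` and the 150/50 split are assumptions made purely for illustration; they stand in for whatever `predictors`, `target`, and `new_predictors` would be in a real problem.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption for illustration only).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# First 150 rows play the role of labeled training data,
# the remaining 50 rows play the role of new, unseen data.
predictors, target = X[:150], y[:150]
new_predictors = X[150:]

LR = LogisticRegression()
LR.fit(predictors, target)
predictions = LR.predict(new_predictors)  # one predicted label per new row
```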

Special Focus: Disparity, Fairness, and Impact

Automated decision systems have a history of reproducing structural privilege and oppression, especially in relation to race, gender, class, and sexuality.

What does it mean for an automated decision system to be fair? This is a hard question which we will discuss from multiple perspectives.








Rough, tentative plan for the semester

Fundamentals of Prediction (~2 weeks)
  • Data science workflow.
  • Score-based prediction, linear models, decision theory.
Fairness in Machine Learning (~2 weeks)
  • Legitimacy of automated decision systems.
  • Formal definitions of bias and fairness.
  • Limitations of formal methods.
Theory and Algorithms (~5 weeks)
  • Empirical risk minimization, convexity, optimization.
  • Controlling features: regularization and kernels.
Deep Learning (~2 weeks)
  • Image classification, text classification, word embedding.
  • We are not doing generative language models – take 457.









Ok…so what are we going to do?








Most Days



Warmup Activity
  • Complete ahead of time and submit on Canvas for effort.
  • Reinforces content from readings and connects them to lecture.
  • Present in groups of 5-6.
  • Randomly-selected presenter presents to the group.
Lecture
  • Math
  • Live-coding + experiments
  • Your questions and ideas!








Activities and assignments

Blog Posts
  • Substantial projects! Usually require >5 hours.
  • Involve implementation, experiments, and discussion.
  • Published on your blog.
Daily Warmup Activities
  • Relatively quick when you’ve done the readings.
  • One (random) person each day will present to your team.
  • Connects readings to lecture.
Project
  • In groups of your choosing.
  • Work on it throughout the semester, presentations in last week.
  • We’ll have activities etc. to help you pick a path.



Blog Posts

  • Perform experiments in Jupyter notebooks.
  • Create figures, add expository prose, etc.
  • (Sometimes) Implement algorithms in source (.py) files.
  • Render your notebooks into a blog using the Quarto publishing engine.
  • Host source code and rendered blog on GitHub.

Blog Post Feedback

  • E: You have demonstrated excellent and thorough learning in this blog post. You should definitely move on.
  • M: You have demonstrated learning in this blog post, but may have missed some opportunities. You could learn either by moving on or by revising this post.
  • R: You have demonstrated some learning, but have missed some important ideas or techniques. I recommend that you focus your learning on revising this assignment.
  • N: You didn’t really demonstrate learning with the material you submitted. I recommend that you focus your learning on revising this assignment.



Readings and Warmups

Do them!

Readings should be completed ahead of time.

Notes are for in-class.

Let’s practice a warmup activity

Your Affinity Vegetable



1. Split into teams

2. Go around and share your name and:

If you were a vegetable, which vegetable would you be and why?

Your Affinity Vegetable



3. Team leader: lead your team in finding a delicious dish that incorporates all of your vegetables.

Be ready to share!









Grading





35 Possible Points

  • A: At least 30 points.
  • B: At least 26 points.
  • C: At least 22 points.
  • D: At least 18 points.
  • F: Fewer than 18 points.
Blog Posts (up to 21 points)

E = 3 points, M = 2 points, R = 1 point, N = 0 points

You can earn up to 21 points with any combination of Es, Ms, and Rs.

Warmups (up to 5 points)

Start with 5 points. You can miss 3 warmups with no penalty. After you miss 3, you lose 1 point for each additional missed warmup.

Project (up to 9 points)
  • 3 points: technical achievement of project.
  • 3 points: quality of deliverables (report, GitHub repo).
  • 3 points: individual contribution to project (commit history).
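
The point arithmetic above can be sketched in a few lines of Python. This is an informal calculator, not an official grading tool: the function names are hypothetical, and flooring the warmup score at 0 is an assumption, since the slides do not say what happens after more than 8 missed warmups.

```python
# Points per blog post mark, as stated on the slide.
BLOG_POINTS = {"E": 3, "M": 2, "R": 1, "N": 0}

def blog_score(marks):
    """Sum blog post points, capped at the 21-point maximum."""
    return min(21, sum(BLOG_POINTS[m] for m in marks))

def warmup_score(missed):
    """Start at 5; the first 3 misses are free, then lose 1 point per miss.

    Flooring at 0 is an assumption not stated on the slides.
    """
    return max(0, 5 - max(0, missed - 3))

def letter_grade(total):
    """Map a point total (out of 35) to a letter using the stated cutoffs."""
    for letter, cutoff in [("A", 30), ("B", 26), ("C", 22), ("D", 18)]:
        if total >= cutoff:
            return letter
    return "F"

# Hypothetical student: five Es and two Ms, four missed warmups, 8 project points.
total = blog_score(["E"] * 5 + ["M"] * 2) + warmup_score(4) + 8
grade = letter_grade(total)
```

For this hypothetical student, the blog posts contribute 19 points and the warmups 4, so the total is 31 of 35.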

The Wisdom Of Those Who Have Gone Before

Stay on top of the blog posts and do the daily warmups. Also go to office hours if you are confused; Phil is helpful, and there will likely be CS0451 peers there to talk through assignments with.

Review after each class using lecture notes so that you have a solid understanding of the concepts taught in class.

Get to know Quarto blogs and watch 3Blue1Brown's Essence of Linear Algebra on YouTube to review some ideas.

Be realistic in your goal setting and focus on what you want to get out of the course.

Focus on learning and growing instead of the grade. Be curious and think hard.






What is something that makes you feel excited or empowered about what you know of the class so far?

What is something that makes you feel nervous or confused about what you know of the class so far?