Project

The course project for CSCI 0451 is an opportunity for you to demonstrate your learning against one or more of the course’s six learning objectives on a topic of your choosing. Here’s the big picture:

Deliverables

There are four deliverables associated with your project:

A project proposal. The purpose of the proposal is for you and your group to carefully outline what you want to work on and explain why it’s feasible. Your proposal will be in the form of a README.md file in a shared GitHub repository that will house your software.
The project software (i.e. the GitHub repository itself), due at the end of finals week.
A project report in the form of an extended Google Colab notebook in which you explain what you did, relate it to existing work, and show your experiments or other findings. The report is due at the end of finals week.
A project presentation. The presentation will be 7-8 minutes and executed as a group. It should involve a visual aid, usually slides.

Additionally, you’ll share regular updates on your progress during the warmup period throughout the remainder of the semester.

I’ll share more detailed information on each of these deliverables as we go.

Group Work

I expect that most students will complete their projects in teams of 2-3 students. If you plan to work individually or in a group of 4, please seek my permission before submitting your project proposal and explain the reason for such a small (or large) group.

Learning Objectives

Remember that we have six learning objectives in this course. The project is actually its own objective—that is, part of the course goal is for you to have the experience of initiating and pursuing an idea that you design. The other five objectives are:

Theory
Implementation
Navigation
Experimentation
Social Responsibility

In general, I expect most projects to address at least two of these learning objectives. For example, a project in which you implement and test a new algorithm would address Theory, Implementation, and Experimentation. A project in which you work with a data set that you care about on a learning task using existing tools could address Navigation and Experimentation. A project in which you replicate the findings of a recent study on algorithmic bias could address Experimentation and Social Responsibility. There are lots of valid possibilities. Your project proposal will state which of these learning objectives your project addresses, and your final report will describe what you learned under each objective.

What Makes a Good Project?

Big Picture

There’s a lot more detail on this topic below, but there are two simple questions that you should ask yourselves when envisioning your project:

Will I learn something by completing this project?
Will I be proud of this project once it’s done?

If the answer to both questions is “yes,” then your overall project idea is likely pretty good. Feel free to approach me early if you want to talk over whether your project idea is suitable for the course.

A Few Examples of Good Project Prompts

Train an image classification model to detect whether an image of a tumor is malignant or benign. Implement a custom model and training loop which reflects the fact that false negatives in this setting are much more costly than false positives. Compare different levels of model complexity and training strategies.
Implement from scratch a model which is completely different from any of the ones we’ve discussed in class, such as a decision tree, random forest, or support vector machine. Compare the performance of your model to a linear or logistic regression model across several data sets.
Gather a novel data set with interesting or unusual structure (e.g. multimedia data, data with a lot of missing values, data with a lot of categorical variables, etc.), prepare it, and compare the performance of algorithms with different levels of complexity on the data.
Use machine learning tools from this class and beyond to construct an agent which achieves some kind of task, such as playing a game or navigating a virtual environment.
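
To make the first prompt above more concrete, here is a minimal sketch of one way a loss function could reflect the idea that false negatives are more costly than false positives. It is written in plain Python for readability and is only an illustration: the function name `weighted_log_loss` and the weight of 5 are hypothetical choices, not part of the project requirements. In PyTorch, a similar effect is available through the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`.

```python
import math


def weighted_log_loss(scores, labels, fn_weight=5.0):
    """Mean binary cross-entropy in which the loss on positive (e.g. malignant)
    examples is multiplied by fn_weight.

    scores: raw model scores (logits), one per example
    labels: 1 for positive, 0 for negative
    fn_weight: illustrative assumption about how much worse a missed positive is
    """
    total = 0.0
    for s, y in zip(scores, labels):
        p = 1.0 / (1.0 + math.exp(-s))  # sigmoid: predicted probability of class 1
        if y == 1:
            total += -fn_weight * math.log(p)  # confident misses on positives cost more
        else:
            total += -math.log(1.0 - p)
    return total / len(scores)


# Two equally confident mistakes, one in each direction.
missed_positive = weighted_log_loss([-2.0], [1])  # true positive scored as negative
false_alarm = weighted_log_loss([2.0], [0])       # true negative scored as positive
print(missed_positive > false_alarm)
```

With this weighting, a model trained to minimize the loss is pushed toward predicting the positive class in borderline cases. How large the weight should be is a modeling decision that you would need to justify in your report.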

Banned and High-Scrutiny Projects

Stock price prediction is banned as a project in this course. It’s formulaic and never beats the market, I promise.

Some projects are eugenics-adjacent, in the sense that they involve predicting something about a person based on immutable characteristics such as race, gender, or sexual orientation. These projects are not necessarily banned, but they will be subject to high scrutiny and must include a critical discussion of the ethical implications of the work. I’ll let you know if your idea falls into this category when you submit your proposal.

Critical Discussion

Both your proposal and your project writeup should incorporate a critical discussion of incentives and impacts in your work. Questions to consider include:

What is the impact that you would like to have on the world by doing this project? What are the incentives that you have for doing this project?
If someone were paying you to develop this model, who would be paying and why? Why might someone want this model to be built? Are you comfortable with that?
Who are the users of your work? Who could be affected by your work? Are these populations the same?
Are there risks of substantial bias or harm associated with the work you produce?

Other Requirements

There are a few restrictions on your project, designed to challenge you to think creatively about data and algorithms.

Your project data set may not come from Kaggle or from the UCI Machine Learning Repository. You’ll be required to show a link to a data source or provide code for your own custom data generation in your project submission.
Your project must use at least one machine learning model which you implement yourself, in PyTorch. This model:
- May not come from scikit-learn, statsmodels, or any other library that provides pre-implemented models.
- May not be simply least-squares linear or logistic regression – if you do use these models, you must compare them to at least one other, more complex model that you implement yourself.

It’s OK if your data set also appears on Kaggle or the UCI Repository, but you must provide a link to the original source and use the data as it appeared in its original source (sometimes Kaggle data sets are modified or simplified).
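
As a rough illustration of the model requirement above, here is a minimal sketch of a small classifier implemented directly with PyTorch tensors and automatic differentiation, without any pre-implemented model class. Everything in it is a hypothetical example: the class name `TinyNet`, the synthetic data, and the hyperparameters are assumptions chosen for illustration, and your own model will likely look quite different.

```python
import torch

# Hypothetical synthetic data: the label is 1 when the two features share a sign,
# a pattern that a purely linear model cannot separate.
gen = torch.Generator().manual_seed(0)
X = torch.randn(200, 2, generator=gen)
y = (X[:, 0] * X[:, 1] > 0).float()


class TinyNet:
    """One-hidden-layer binary classifier built from raw tensors (illustrative only)."""

    def __init__(self, n_features, n_hidden, generator):
        self.W1 = (0.5 * torch.randn(n_features, n_hidden, generator=generator)).requires_grad_()
        self.b1 = torch.zeros(n_hidden, requires_grad=True)
        self.w2 = (0.5 * torch.randn(n_hidden, generator=generator)).requires_grad_()
        self.b2 = torch.zeros(1, requires_grad=True)

    def parameters(self):
        return [self.W1, self.b1, self.w2, self.b2]

    def score(self, X):
        hidden = torch.tanh(X @ self.W1 + self.b1)  # nonlinear hidden features
        return hidden @ self.w2 + self.b2           # one raw score (logit) per row


def train(model, X, y, lr=0.1, epochs=300):
    """Full-batch gradient descent with a hand-written update step."""
    losses = []
    for _ in range(epochs):
        loss = torch.nn.functional.binary_cross_entropy_with_logits(model.score(X), y)
        loss.backward()
        with torch.no_grad():
            for p in model.parameters():
                p -= lr * p.grad  # gradient step
                p.grad.zero_()    # reset gradients for the next iteration
        losses.append(loss.item())
    return losses


model = TinyNet(n_features=2, n_hidden=8, generator=gen)
losses = train(model, X, y)
accuracy = ((model.score(X) > 0).float() == y).float().mean().item()
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}, training accuracy: {accuracy:.2f}")
```

A sketch like this also gives you a natural comparison point: if you use logistic regression as a baseline, you can train it on the same data and report how the more complex model performs relative to it.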

Alternative Projects

I expect that most of you will pursue applied projects that involve studying the performance of machine learning models you implement on one or more data sets. That said, I’m open to proposals for creative alternatives, such as:

Write and present a research essay on a topic related to algorithmic bias or the social impacts of automated decision systems.
Do a theoretical/mathematical study of a particular machine learning algorithm or class of algorithms.
Other?…