Course Project

In the final project you will apply what you have learned about data analysis, machine learning and generative AI to implement a model or analysis of moderate scope. The project will culminate in a poster session at the end of the semester. You will ideally work in teams of 2-3.

Timeline

Project pitches, 2026-01-20 to 2026-01-23 8:00AM
Project proposal, due 2026-01-23 8:00AM
Poster session, 2026-01-28 10:00AM-Noon (class time)
Final due date: 2026-01-30 4:15PM (end of semester)

Picking a project

Your project must:

Incorporate data in some way, either by collecting your own data or using an existing dataset.
Use or create at least one machine learning model, which could include using a pre-trained generative model.
Incorporate quantitative evaluation of a relevant metric, e.g., measuring the accuracy of a classifier.
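As an illustration of the third requirement, here is a minimal sketch of a quantitative evaluation in Python. The labels and predictions are made up for the example, and the function name `accuracy` is our own; your project may call for a different metric entirely. Comparing against a simple baseline (here, always predicting the most common label) helps show whether a model has learned anything.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical held-out labels and model predictions, for illustration only.
y_true = ["spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "spam", "spam", "ham"]

# Baseline: always predict the most common label. For simplicity the label is
# taken from y_true here; in a real project, take it from the training data.
majority_label = Counter(y_true).most_common(1)[0][0]
baseline_pred = [majority_label] * len(y_true)

print(f"model accuracy    = {accuracy(y_true, y_pred):.2f}")
print(f"baseline accuracy = {accuracy(y_true, baseline_pred):.2f}")
```

Accuracy can be misleading when classes are imbalanced, so consider whether another metric (e.g., precision and recall) better fits your problem.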

Within those very broad requirements there are many potential projects. Some ideas are listed at the end of the assignment, but you are encouraged to explore whatever topics are of interest to you. We are happy and eager to help you brainstorm ideas!

You are not expected to tackle a “big” dataset or achieve state-of-the-art (or even “positive”) results. We have limited time and computational resources. A successful project satisfies the above requirements and demonstrates your understanding of the course material. It does not need to be “groundbreaking”, novel (in the research sense of the word), or even “work” at its intended goal. Replicating previous work with different data or a slightly different approach is OK. Clearly documenting your results and why you hypothesize the model didn’t work as intended is a successful project. Your project does need to be independent work, i.e., simply rehashing an existing tutorial or blog post is not sufficient.

Deliverables:

Project proposal

A brief description of the goals of your project and your plan to achieve those goals. See the details below. This proposal is not a contract; you are permitted (and expected) to deviate from it as your project evolves.

Project diary

We expect you to work approximately 4 hours each workday on this project. For each hour you spend working, write exactly two sentences: one describing what you set out to do in that hour and a second describing what you accomplished in that hour. These are individual, not team, progress reports; every student will submit their own. The purpose of these reports is to encourage you to make steady and substantive progress and to practice breaking your project down into small, manageable tasks. That doesn’t mean your effort has to be evenly distributed (I recognize you have other responsibilities), and you may have extended debugging periods where it doesn’t feel like you are making any progress. Both are OK, and the latter is normal!

Project code

Either submit a link to a publicly viewable GitHub repository, or zip up your code and submit it to the relevant assignment. Your code should include a README file with instructions on how to run your code and reproduce your results. Other artifacts, such as data/model cards, should also be included here.

Poster

There is no formal final report. Instead, you will prepare and present a poster at a class-wide poster session. Check out the poster template for additional instructions about size, fonts, etc. You will likely want to use Google Slides/PowerPoint or Illustrator to make the poster. Whatever software you use, please submit a PDF of your poster to the relevant assignment.

Your poster should describe your project and results in a story that traces from the upper left to lower right. The necessary components are:

Goal: In just a sentence or two describe the goal of your project
Background: In a few paragraphs (and a figure if relevant), provide the necessary background information so a classmate could understand your project
Methods: What did you do to solve your problem? Describe the data you used, including how you collected/generated it (if relevant), the ways you prepared the data, and the modeling or analysis techniques you used to achieve your goal.
Results: Present your results with relevant figures (typically at least 2 figures) and quantitative metrics. This section should include a synthesis of results, i.e., some discussion of what you could conclude from those results, not just the results alone.
Ethical, Social, and Legal Implications: Consider, and if relevant, address any ethical, societal, or environmental implications of your project.

Your poster only has so much space, so you will need to be concise and make deliberate choices about what is most important to include. Use figures and bullet points where possible (a poster is not a report and so does not need to be written in prose). In many cases, what appears on the poster will be a distillation of more detailed explanations found in your notebooks, data/model cards, etc.

Some additional notes about your poster:

Make sure your poster clearly communicates your results. As a practical matter, multiple groups will be presenting at one time during the poster session, thus we can’t meaningfully take in your quantitative results orally, in the moment. Instead we will review your poster and other materials in depth afterwards.
Aim for a generic scientific audience (i.e., don’t reference our class) with a similar level of background knowledge as you and your classmates. Recall that while your audience is familiar with the topics from class (e.g., you don’t need to explain what a neural network is), they likely don’t have any specific knowledge about your project topic (i.e., the specific algorithm or data you used).
Watch out for and eliminate weasel words (like “interestingly” and other beholder words) that sound quantitative without actually conveying information. Concision is key. Wherever relevant, aim for a “just the facts” style.
Use inline citations, i.e., numbers indexing into a references list in the corner of your poster.

For inspiration check out the posters in 75SHS.

Evaluation

Your project will be evaluated based on the following attributes using the “EMRN” rubric (Exemplary, Meets expectations, Revision needed, Not assessable). An exemplary project will have the following attributes:

Methodology: Approach is methodologically sound and sufficiently comprehensive to solve the intended problem (within resource constraints). Any software artifacts, such as notebooks, are high-quality with appropriate use of libraries, clear and mechanistically solid text, effective visualizations, and appropriate citations.
Results: Obtained specific results relevant to the intended problem. Derived thoughtful, insightful and carefully qualified conclusions from those results.
Poster: Clearly and effectively visually presents problem, data, methods, results and implications for responsible computing. Poster is high quality with effective figures, clear and mechanistically solid text and appropriate citations.
Responsible computing: Thoughtfully considered and, where possible, acted to mitigate ethical, societal, and/or environmental implications of the project. Any data/model cards are high quality.

The evaluation will be weighted in part by difficulty. When choosing a project you should aim to balance ambition with the likelihood of successfully achieving your goals. Perfect execution of a very limited project would not meet expectations, while imperfect execution of a more ambitious project could meet expectations or even be considered exemplary. But an impossible problem will be impossible to execute. Meeting your proposed goals is not a requirement - we can’t always predict the obstacles we will face - but you are expected to make an appropriate effort to achieve realistic goals within the constraints of the course.

You will receive feedback on your initial project submission, and have the opportunity to revise and resubmit before the final deadline.

Project ideas

Here are some potential project ideas to get you started. These are just suggestions; we encourage you to pursue whatever topics interest you! Please come talk to us about your ideas. We are eager to help you brainstorm and appropriately scope your project.

Identify interesting or relevant Middlebury data sources such as campus environmental data, the course catalog, IRS filings, etc. The specific analysis would depend on the data, but could include building predictive models of quantitative data, embedding-based analysis or search tools for text sources, etc. For example, predict future energy demand, build a course recommendation system, compare Middlebury IRS filings to peer institutions, etc.
Create and quantitatively optimize prompts for LLMs to automate or augment an existing “manual” workflow, e.g., interpreting/classifying historical documents, performing content moderation, making auditing decisions, etc.
Train simple generative models on small datasets of interest, e.g., plays, then quantitatively evaluate the impact of different modeling choices (e.g., tokenization, architectures) on performance.
Generate synthetic data to protect privacy. Train a model to create synthetic data in a domain of interest (e.g., images, text, tabular data), then evaluate how predictive models trained on synthetic data perform on real data.
Identify how problems from other domains, e.g., genomics, can be mapped to use “off-the-shelf” text or image analysis techniques. Quantitatively evaluate how well these techniques perform compared to domain-specific approaches.
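To make "quantitatively optimize prompts" from the ideas above more concrete, here is a minimal sketch in Python of scoring candidate prompts against a small labeled set. Everything in it is hypothetical: `fake_model` is a keyword-matching stand-in for a real LLM call, and the prompts and examples are invented for illustration.

```python
from typing import Callable


def evaluate_prompt(
    prompt: str,
    examples: list[tuple[str, str]],
    run_model: Callable[[str, str], str],
) -> float:
    """Fraction of labeled examples the model gets right under this prompt."""
    correct = sum(run_model(prompt, text) == label for text, label in examples)
    return correct / len(examples)


def fake_model(prompt: str, text: str) -> str:
    # Stand-in for a real LLM call (hypothetical). It flags a post only if a
    # keyword listed after the colon in the prompt appears in the text.
    keywords = [w.strip() for w in prompt.split(":", 1)[1].split(",")]
    return "flag" if any(k in text.lower() for k in keywords) else "allow"


# Hypothetical labeled examples for a content moderation task.
examples = [
    ("You are an idiot", "flag"),
    ("Great game last night", "allow"),
    ("I hate you", "flag"),
    ("See you at noon", "allow"),
]

# Hypothetical candidate prompts to compare.
prompts = {
    "v1": "Flag posts containing: idiot",
    "v2": "Flag posts containing: idiot, hate",
}

scores = {name: evaluate_prompt(p, examples, fake_model) for name, p in prompts.items()}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

In a real project you would replace `fake_model` with a call to the model you are studying, and report the chosen prompt's score on a separate held-out set rather than the set used to pick it.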

Project Pitch

The first “mini-assignment” for your project is a project pitch, which is designed to help you get thinking and ultimately connected with similarly interested teammates. Your project pitch does not necessarily represent the project you’ll ultimately work on; we expect many of you will fuse your project ideas or may join in with someone else’s pitch. That’s OK! We expect everyone to submit a pitch regardless.

Prompt

Please pitch a project idea with a post in the “Project Pitches” category on Campuswire. Your pitch should include:

One or two sentences about the big picture: what problem would you like to address? How does data science, machine learning, or generative AI fit into your approach to tackle this problem?
A list of required resources. If your project requires data (almost all will), please give a link to a data set to which you have access or state a plan for gathering the data you’ll need. If your project requires working with a model or API, please link to documentation demonstrating that you have access.
- Note: we do not intend any student to spend money for enhanced access to APIs or models. If you are having trouble accessing a resource you need for free, please come talk to us.
State the kind of task or problem your pitch involves. Is it a classification or regression problem? Does it involve generating text or images? Etc.
Describe how you will judge whether your project is successful. What are you hoping to produce or achieve by the time the project is complete?
Close your pitch by letting us all know: why are you excited about this topic?

Project Proposal

Your project proposal is a written description of your project as you and your group currently envision it. This proposal is not a binding contract. The purpose is to set you up for success by encouraging you to think through your project in advance and create an opportunity for feedback about the feasibility of your idea.

To work on your project proposal, please fill out the template which contains all the required sections. You can download the notebook as a .ipynb file and upload it on Gradescope. Only one member of a group needs to submit, but they must add their group members to the submission. Everyone is responsible for making sure they are linked to a submission.

Proposal Format

Your project proposal should include approximately one (brief) paragraph under each of the following headers:

Team Members

No paragraph needed here, just a list of team members.

Abstract

A one-paragraph overview of the entire proposal. What are you going to do, why is it important, and what will be the result?

Motivation

Please describe your motivation for your project. Why are you (and your team) excited to do this project?

Planned Deliverables

What’s the thing that your project aims to produce? A written data analysis, a small app, a custom-trained model, etc? Please be specific about what your ideal deliverable will accomplish and how users might interact with it.

Resources Required

Your project is very likely to require a data set, and may also require computational power, a model or API, or access to other resources. Please describe what resources you will need to complete your project, and how you will get access to them. Include links or other documentation to demonstrate that you have the access you’ll need.

What Skills You’ll Demonstrate

Please briefly describe what skills from this class you expect to demonstrate in your project. Some examples might include data wrangling and visualization, predictive machine learning, interacting with LLMs via APIs, etc.

Risk Statement

Every project entails some risk of something not working out exactly as planned. Maybe your model isn’t as accurate as you hoped; your data set is missing a key feature; or your code doesn’t run fast enough to power a usable app. Please describe what you see as the biggest risks to your project succeeding, and how you plan to mitigate those risks.

Ethics Statement

In this section, please reflect on a simple question: if your project were successful and its outputs widely shared, would the world become a better place? Better for whom? Are there any potential negative consequences?

Tentative Timeline

Our timeline in this course is incredibly compressed. Please outline a rough, tentative timeline for your project, day-by-day, assuming that you and your group have approximately 5 hours each weekday to work on it. You should account for:

Thursday 1/22
Friday 1/23
Monday 1/26
Tuesday 1/27
Wednesday 1/28
Thursday 1/29

For each day, please describe your best guess for what you and your group will focus on and accomplish that day. This timeline is not binding, but it should help you and your group plan your work and set expectations.

(Please keep in mind that whatever your first guess is for time to finish a task, the actual amount is likely to be at least double that. Plan accordingly!)