import yaml
with open("_data/schedule.yml") as f:
    schedule_data = yaml.safe_load(f)
ojs_define(scheduleData=schedule_data)
CS 457: Natural Language Processing
Tuesday/Thursday, 12:45PM-2:00PM
All class meetings take place in 75 Shannon Street room 202
Instructor: Professor Laura Biester
You can call me Laura, Professor Laura, or Professor Biester, whichever you are more comfortable with
Email: lbiester@middlebury.edu
Office: 75 Shannon Street Room 214
Drop-In Hours: Tuesday 2-3
Wednesday 10-11
Thursday 2-3
One-on-one Appointments: please email me to schedule a one-on-one meeting
Lunch Appointments: book here at least 24 hours in advance
purpose: casual discussion in the dining hall of topics related to CS but not directly related to the course, including but not limited to NLP research, the CS major, working in tech, and graduate school
Anonymous course feedback form
Course Description
From the course catalog: In this course we will explore computational models for processing natural (human) language. We will introduce statistical and algorithmic techniques that are used to classify, generate, and understand language at the syntactic and semantic levels. We will explore applications such as parsing, information extraction, language modeling, and sentiment analysis. Assignments will involve constructing and modifying systems and will incorporate a variety of large corpora. We will also discuss the ethical concerns associated with current methods for collecting and labeling large corpora, and how language technologies might reflect and reinforce social hierarchies. This course fulfills the Responsible Computing requirement for the Computer Science major.
Prerequisites
The prerequisites for this course are CS 200 and CS 201. To be successful in this course, you should:
- Be a self-sufficient programmer. We will focus on learning new NLP algorithms, and will not spend significant time reviewing basic Python programming.
- Have some knowledge of conditional probability and Bayes' rule.
Additional mathematics, computer science, and linguistics courses might supplement your knowledge in ways that are helpful in CS 457, but knowledge of topics covered in those courses is not assumed.
If you are concerned about your preparation for this course, please come talk to me!
Learning Objectives
By the end of this course, you will:
- Be familiar with NLP methods in three key areas: text classification, text generation, and language understanding
- Be able to effectively use Python libraries that are part of the large ecosystem of tools for NLP
- Explore various NLP applications, including applications with positive societal impact
- Be able to identify pros and cons of various data collection and labeling practices that are commonly used in NLP, including data scraping and crowdsourcing
- Be exposed to ways in which language technology can perpetuate stereotypes and biases related to minoritized groups, and inequity among different languages
- Demonstrate your ability to engage with recent research papers in NLP
Course Work and Grading
Your grade will be determined based on your performance on assignments that evaluate your skills in three core areas of the course:
- Methods: Applying, implementing, and evaluating core NLP methods on structured assignments
- Research: Engaging with NLP research papers and completing a final project
- Responsible Computing: Reflecting on the ethics of NLP models, datasets, and data collection practices
There are four types of assignments throughout the course:
- Homework (21 points available): the homework will generally involve implementing NLP algorithms discussed in the first 7 weeks of class and reflecting on your implementation
- Exam (8 points available): the exam will cover content related to the first 7 weeks of class, similar to what you have seen on homework assignments and worksheets
- Supplemental Reading (5 points available): you will submit reading responses on NLP papers and/or present in class
- Final Project (17 points available): you will complete a final project of your choosing applying NLP techniques and research methods
The table below shows how your performance in each of the three core areas will be mapped to a final grade:
| | Methods | Research | Responsible Computing |
|---|---|---|---|
| Set of Assignments | homework and exams | supplemental reading and final project | points from any assignment that are pre-designated with an RC tag |
| To earn an A | 25 | 20 | 6 |
| To earn a B | 21 | 16 | 4 |
| To earn a C | 17 | 13 | 2 |
| To earn a D | 13 | 10 | 0 |
You must complete the required number of points in each area to earn a given grade. The “+” and “-” modifiers will be applied by the instructor to the base grade above when the work completed falls in between bundles, e.g., an “A-” would be assigned for work that is close to but does not meet all the requirements for the “A” bundle, and a “B+” would be assigned for work that meaningfully exceeds the “B” bundle requirements but is not close to the “A” bundle.
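As a rough illustration of how the table reads, the sketch below looks up the base grade for a set of point totals. It is not an official grade calculator: the function name `base_grade` and the `"below D"` label are made up for this example, Methods points are taken to be homework plus exam points, and the “+” and “-” modifiers are left out because they are assigned by the instructor.

```python
# Minimum points per area for each base grade, copied from the table above.
BUNDLES = {
    "A": {"methods": 25, "research": 20, "rc": 6},
    "B": {"methods": 21, "research": 16, "rc": 4},
    "C": {"methods": 17, "research": 13, "rc": 2},
    "D": {"methods": 13, "research": 10, "rc": 0},
}


def base_grade(methods: int, research: int, rc: int) -> str:
    """Return the highest bundle whose minimum is met in every area."""
    earned = {"methods": methods, "research": research, "rc": rc}
    # Dicts preserve insertion order, so bundles are checked from A down to D.
    for grade, required in BUNDLES.items():
        if all(earned[area] >= required[area] for area in required):
            return grade
    # ASSUMPTION: the syllabus does not name the result below the D bundle.
    return "below D"


# 25 Methods and 20 Research points meet the A bundle, but 4 RC points do not.
print(base_grade(methods=25, research=20, rc=4))  # B
```

The example shows why all three areas matter: falling short in a single area caps the base grade at the highest bundle that is fully satisfied.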
While some flexibility is afforded, I expect the average A student to earn 18 homework points, 7 exam points, 5 supplemental reading points and 15 final project points, with 7 of those designated RC.[1] Details on the grading for each assignment type are available below.
Homework
Homework will be assigned most weeks in weeks 1-8 and due one week later on Thursday evening. The purpose of the homework assignments is to give you experience implementing the algorithms that we discuss during class and evaluating NLP techniques. Most homework assignments will include both a programming component and a written reflection.
Grading
All homework assignments will have both an autograded component and a manually graded component. The autograder might do something as simple as confirming that you submitted the correct files, but it may also test your code on unseen data in cases where the algorithm you are implementing is fully deterministic. Each homework is worth 3 points, with points typically allocated as follows:
- 1 point: pass all autograder tests
- 1 point: pass all manual checks of code
- 1 point: satisfactory written reflection
Partial points are not given; if you do not initially meet the requirements to earn credit, you can revise your work before the final deadline.
Leaderboard
A small number of homework assignments will include an option to compete with your classmates on a leaderboard to see who can create the best model! The purpose of this competition is primarily to deepen your understanding of the topic and to gain bragging rights. However, the winner of each competition[2] will receive one extra credit point on the homework assignment. The leaderboard competition always has a built-in tie-breaker!
Deadlines and Revision Policy
Each homework assignment has three deadlines:
- Initial feedback deadline: Submissions received by this date will receive manual feedback within a week. Between this deadline and the next deadline, you will be able to resubmit to the autograder as frequently as you would like. After receiving manual feedback, you will have a week to incorporate that feedback into your submission to receive one more round of manual feedback.
- Final autograder deadline (2 weeks later): This is the final date to make revisions to your code to pass the autograder.
- Final revision deadline (2 weeks later): No work on the assignment will be accepted after this deadline, and this final deadline is intended exclusively to address manual feedback given after the final autograder deadline. This feedback may necessitate changes to your code and/or your written report.
You should always attempt to complete the assignment by the initial feedback deadline. The feedback you receive will give you guidance on how to improve your work so that you eventually receive full credit.
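The three deadlines above can be laid out with simple date arithmetic. This is only a sketch: it assumes that each “2 weeks later” is counted from the deadline listed just before it, and the starting date is a made-up example, so always check the actual dates posted for each assignment.

```python
from datetime import date, timedelta


def homework_deadlines(initial_feedback: date) -> dict[str, date]:
    """Lay out the three deadlines for one homework assignment.

    ASSUMPTION: each "2 weeks later" is measured from the preceding
    deadline, not from the initial feedback deadline.
    """
    two_weeks = timedelta(weeks=2)
    final_autograder = initial_feedback + two_weeks
    final_revision = final_autograder + two_weeks
    return {
        "initial feedback": initial_feedback,
        "final autograder": final_autograder,
        "final revision": final_revision,
    }


# Hypothetical initial feedback deadline, used for illustration only.
for name, day in homework_deadlines(date(2025, 2, 20)).items():
    print(f"{name}: {day.isoformat()}")
```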
Extensions
There are no extensions for homework assignments barring extenuating circumstances. While I strongly encourage you to attempt to complete the full assignment to the best of your ability prior to the initial feedback deadline to stay on track, there is no penalty to your grade for failing to do so; however, you lose the ability to work with a partner and the opportunity to receive multiple rounds of feedback, both of which will likely increase your chance of receiving the full 3/3 points on the assignment!
Working with a Partner
You must pass all autograder tests on at least two assignments to unlock the opportunity to complete future homework assignments with a partner of your choice. Typically, this will be homeworks 1 and 2, which are individual assignments for all students, as is homework 7. The purpose of this policy is to ensure that you have sufficient programming skills and understanding of some basic concepts in the course before you work with a partner. Partnerships must be established and shared with me by the initial feedback deadline; this means that even if you haven’t written any code, you must at least submit a report indicating who you will partner with by the initial feedback deadline if you want to work with a partner.
Any work submitted with a partner that violates this policy will receive 0/3 points.
Exam
The exam is an opportunity for you to demonstrate your general knowledge about training and evaluating NLP models, and in some cases to apply algorithms that we have learned in class in a paper-and-pencil setting. There will be two exams covering the material from weeks 1-7 of the course: a midterm in week 9 and a re-test in week 11.
Final Project
To synthesize your knowledge, you will complete an NLP project with a group during the second half of the semester. There are two main categories of projects you can complete: a text classification-based project using an existing dataset, and a more open-ended research-oriented project. There are pre-set guidelines for a classification-based project, while I’ll work with your team to develop appropriate guidelines for a research-oriented project. Your team will submit both your project code and a written report, and you will give a presentation of your project (which may be either a live demo or a poster presentation) during our final exam slot.
Deadlines and Revision Policy
There will be deadlines for the project proposal, a draft of your literature review, and drafts of your full report on Thursdays starting in the middle of the semester. This is to ensure that you are on the right track to finish your project by the end of the semester; it will also give you an opportunity to receive feedback and revise your work if necessary.
The final version of your project is due during the finals period on May 16th. There will be an earlier feedback deadline, which is the latest day you can submit your final project for feedback and the opportunity to revise your work.
Reading Assignments
Required Reading
Reading assignments are available on the course schedule page. You should complete the reading each day; I expect most students will benefit from completing the reading prior to class, but you may choose to read it after class to review material from the lecture if you find that doing so is most beneficial for your learning.
Supplemental Reading
Most days in weeks 2-8 will have assigned supplemental reading in addition to required reading. Supplemental reading assignments are typically recent papers published in NLP or adjacent fields. You are expected to engage with this reading by writing reading responses throughout the semester.
You may submit up to three reading responses, each of which is graded credit/no-credit and worth one point. To earn the full five points available for supplemental reading, you must also present a supplemental reading to the class, which you will do with a partner. You can earn up to two points for the presentation.
Revision Policy
A reading response with a small error may be revised for credit; if there are large errors, you will be asked to submit a response for a different paper.
If your presentation is not satisfactory, you will be asked to discuss points of confusion with me and complete a write-up prior to receiving credit.
Expectations of Students
You should expect to spend up to 10 hours per week on work outside of class to be successful in this course. If you find that you are regularly spending more time than 10 hours per week on the class, send me an email or stop by drop-in hours to chat.
You are expected to complete all required reading prior to the class in which we will discuss each topic. This will allow us to spend more of our class time on activities related to the topic instead of lecture.
Course Materials
Textbook
The main textbook for this course is the draft of the 3rd edition of Speech and Language Processing by Dan Jurafsky and James H. Martin.
Additional Materials
All additional materials assigned, which may include PDFs, blogs, audio recordings, or videos, will be freely available.
Python Environment and Computational Resources
All programming assignments for this course should be completed in Python. We are using Python in this course due to the large ecosystem of Python libraries available for NLP and machine learning.
The first few homework assignments for this course will not require external libraries; any Python workflow that you are comfortable with is appropriate, although I recommend that you use VS Code.
Later assignments may require (a) training models for a long period of time, (b) external libraries, or (c) GPUs. I will provide guidance on how to complete these assignments using Middlebury’s Ada HPC cluster and/or GitHub Codespaces. You may also use the cluster for your final project; however, please remember that the cluster is a shared resource. I will ask you to reconsider project ideas that require a week of GPU time for training, for instance.
Course Policies
Resources Available to You
We have many resources that can make the learning process easier throughout the course:
- Professor Drop-In Hours: My drop-in hours are a great place to ask questions! You can ask questions about your homework, the lecture, the CS major, CS research, working in tech… even your general experience at Midd!
- Course Assistant Drop-In Hours: We have a course assistant who will also be available to help students with their coursework. Their schedule will be available early in the semester.
- ASI Drop-In Hours: Smith (our Laboratory Instructor) and Noah (our ASI) are available to help with Python questions during their drop-in hours. They are not able to help with domain-specific questions. See their hours on their personal websites (Smith, Noah).
- CampusWire Message Board: Ask questions about course content and assignments on the CampusWire message board. Asking questions here allows your classmates to see answers to frequently asked questions. Do not share code for any assignments publicly on the message board.
- Email: If you have a question that cannot be asked on a public message board, please send an email to lbiester@middlebury.edu. I will commit to responding to emails from students within 1 business day; I will not respond to emails on the weekend.
Laptops
You are expected to bring a charged laptop to all class sessions. If you don’t have access to a laptop (even if for just a single class period), please contact me to ask about the availability of the department’s loaner laptops. The CS Department maintains a set of loaner laptops, preinstalled with relevant course tools, for both short-term and longer-term use. Given the small number of machines available (approximately 10), if you anticipate needing a laptop for a longer period (e.g., the entire semester or more), I encourage you to also inquire with the library about loaner equipment and/or Elaine Orozco Hammond about an Opportunity Grant, which can help you to purchase a laptop. Our department commits to meeting the needs of every student, so please don’t hesitate to contact Smith (our ASI) if you need a computer (in any way) for this course.
Collaboration and Outside Resources
Some assignments may or must be completed with a partner/group. You may also discuss your general approach with other classmates, but the code that you write is expected to be written by yourself and your partner.
With proper attribution, you are allowed to use online resources such as StackOverflow and ChatGPT to answer basic Python questions that lead to a few lines of code, for instance “how do you get the key corresponding to the largest value in a dictionary.”[3] You may not use online resources to solve the main problem posed by any assignment. ChatGPT, StackOverflow, and other online resources are not allowed on the exam.
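To give a sense of scale, the dictionary question quoted above has a one-line answer, which is roughly the size of snippet this policy has in mind. The sketch below is illustrative only: `word_counts` is made-up data, `key_of_largest_value` is a hypothetical hand-written alternative, and the attribution comment is a placeholder for whatever source you actually consulted.

```python
# Made-up data for illustration.
word_counts = {"the": 12, "language": 5, "model": 7}

# Attribution placeholder: cite the actual resource you used here.
# max() iterates over the keys and compares them by their values.
most_common = max(word_counts, key=word_counts.get)


def key_of_largest_value(counts: dict[str, int]) -> str:
    """Hand-written equivalent: track the best key seen so far."""
    best_key = None
    for key, value in counts.items():
        if best_key is None or value > counts[best_key]:
            best_key = key
    return best_key


print(most_common)                        # the
print(key_of_largest_value(word_counts))  # the
```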
Disability Access and Accommodation
Every class is made up of learners with different access needs. My goal is for each student in our class to succeed, and to create an accessible learning environment for everyone. Students who have Letters of Accommodation in this class are encouraged to contact me as early in the semester as possible to ensure that such accommodations are implemented in a timely fashion.
For those without Letters of Accommodation, assistance is available to eligible students through the Disability Resource Center (formerly called Student Accessibility Services). All discussions will remain confidential.
Please contact one of the ADA Coordinators at ada@middlebury.edu for more information.
Academic Integrity
As an academic community devoted to the life of the mind, Middlebury requires of every student complete intellectual honesty in the preparation and submission of all academic work. Details of our Academic Honesty, Honor Code, and Related Disciplinary Policies are available in Middlebury’s handbook.
Honor Code Pledge
The Honor Code pledge reads as follows: “I have neither given nor received unauthorized aid on this assignment.” It is the responsibility of the student to write out in full, adhere to, and sign the Honor Code pledge on all examinations, research papers, and laboratory reports. Faculty members reserve the right to require the signed Honor Code pledge on other kinds of academic work.
Tentative Schedule and Topics
marked = require("marked");
dateFns = import("https://cdn.jsdelivr.net/npm/date-fns@4.1.0/+esm")
// Convert inline Markdown to a DOM fragment.
function md2html(text) {
  const template = document.createElement("template");
  template.innerHTML = marked.parseInline(text);
  return template.content.cloneNode(true);
}
// Render one week as a single line listing the Tuesday and Thursday topics.
function buildWeek(week) {
  let tuesday = week.days[0];
  let thursday = week.days[1];
  return htl.html`<strong>Week ${week.week}:</strong> ${tuesday.topic} (<strong>Tu</strong>), ${thursday.topic} (<strong>Th</strong>)<br>`
}
function buildSchedule(scheduleData) {
  return htl.html`${scheduleData.map(buildWeek)}`;
}
buildSchedule(scheduleData);
Note that some topics for weeks 10-12 are intentionally TBD! The topics for these class periods will be determined by student interests and needs, and may include:
- Overflow from weeks 1-9
- NLP topics that we didn’t have time to cover in weeks 1-9 (e.g., topic modeling, parsing)
- Exploring topics in greater depth (e.g., transformers, RLHF)
- Guest lectures on special topics
- Time to work on final projects in class
- Interactive demos of tools that may be useful for your final projects
See the detailed schedule for more!
Footnotes
[1] RC points are available on designated homework assignments, supplemental reading, and on the final project. You do not need to submit any extra assignments to earn RC points, but you do need to ensure that the set of assignments that you complete has enough RC tags to achieve your desired grade.
[2] In most cases, the winner will be determined exclusively by scores on the leaderboard. Any evidence of an honor code violation related to a leaderboard submission, including evidence that a student has searched for the test set online, will result in disqualification.
[3] Of course, you could pretty easily write your own function to do this!