Homework 7: Responsible Text Classification

Text classification is a prevalent technique in NLP that can be applied in many settings. Some that we have seen so far include sentiment analysis, hate speech detection, and language identification. This assignment focuses on two applications: hate speech detection and LLM detection. Most would argue that neither task is innately harmful; in theory, hate speech detection models can be used to remove harmful content from the internet, and (perhaps more controversially) LLM detection models can be used to enforce course policies and ensure that students submit their own work without forbidden use of LLMs.

As you have seen throughout the semester, all text classification models produce some false negatives and false positives. When text classification is applied to real-world problems, these errors are important to consider. Furthermore, it is of the utmost importance to ensure that models do not discriminate based on protected attributes. However, research in NLP has shown that many do.
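To make this concern concrete, the sketch below shows one way to audit a trained classifier for disparate error rates across groups. It is only an illustration: the helper function, the dialect tags, and the evaluation data are all invented for this example.

```python
# Hedged sketch: comparing a classifier's false positive and false
# negative rates per group. All data below is invented for illustration.

def error_rates(labels, preds):
    """Return (false positive rate, false negative rate)."""
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    negatives = sum(1 for y in labels if y == 0)
    positives = sum(1 for y in labels if y == 1)
    return fp / max(negatives, 1), fn / max(positives, 1)

# Hypothetical gold labels (1 = offensive, 0 = not offensive) and model
# predictions, each tagged with the author's dialect group.
examples = [
    {"dialect": "AAE",           "label": 0, "pred": 1},
    {"dialect": "AAE",           "label": 0, "pred": 0},
    {"dialect": "White-aligned", "label": 0, "pred": 0},
    {"dialect": "White-aligned", "label": 1, "pred": 1},
]

for group in ("AAE", "White-aligned"):
    subset = [e for e in examples if e["dialect"] == group]
    fpr, fnr = error_rates([e["label"] for e in subset],
                           [e["pred"] for e in subset])
    print(f"{group}: false positive rate={fpr:.2f}, false negative rate={fnr:.2f}")
```

A large gap between groups in either rate is exactly the kind of disparity documented in the readings for this assignment.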

In parts 1 and 2, you will reflect on multiple questions related to our two topics. The minimum requirements for each of the first two parts are that you write at least 300 words,¹ answer each question, and write in full, coherent sentences. Work that does not meet these minimum requirements will need to be revised; you will also, of course, be graded on the content of your answers.

To prepare to complete this assignment, please read the following two papers closely:

  1. The Risk of Racial Bias in Hate Speech Detection (Sap et al., 2019)
  2. GPT detectors are biased against non-native English writers (Liang et al., 2023)

Finally, in part 3 of the assignment, you will think about what you might want to explore next in relation to this topic by listing a few papers you might want to read. Therefore, I suggest keeping track of some of the citations as you read the papers listed above!

Part 1: Hate Speech Detection Reflection (1 point)

Sap et al. demonstrated that hate speech detection models exhibit racial bias. While these models have clear real-world utility for automating content moderation, such results might cause you to think twice about deploying them at a large scale. In this part of the assignment, you will reflect on how hate speech detection models could be built more responsibly and how they can be used effectively in conjunction with human content moderators.

Question 1:

How do Sap et al. suggest mitigating bias in annotations of hate speech? What are some pros and cons of their method?


Question 2:

Imagine you wanted to build a model for hate speech detection that is inclusive of different language varieties, including African American English (AAE) and English dialects outside of the US. How would you go about doing so? Which stakeholders would you involve? How would you collect and annotate data?


Question 3:

Imagine you work for a company that is grappling with the following concerns with respect to classification systems that are used in their content moderation pipeline:

  1. False positives that flag non-offensive speech as offensive, particularly speech written in dialects that differ from White-aligned English.
  2. False negatives that expose users to offensive content.
  3. The harms caused to annotators who must read offensive content to create training data for models.

How would you suggest the company balance these concerns?


Part 2: LLM Detection Reflection (1 point)

You are all well aware that language models like ChatGPT have recently become prevalent, and their use extends to educational settings. Courses often disallow the use of LLMs to generate text or code, as doing so may conflict with the course’s learning objectives. However, enforcement of this policy is difficult, as humans can have a hard time distinguishing text written by an LLM from text written by a human. This has led to the creation of numerous LLM detection models like GPTZero, which are simply text classification models that predict whether text was written by a human or a large language model.
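To make the idea concrete, here is a minimal sketch of such a classifier using scikit-learn. This is an illustration only: the training data is invented, and real detectors are trained on far larger corpora and often rely on signals beyond surface word features.

```python
# Hedged sketch of a binary human-vs-LLM text classifier.
# The two training examples are invented placeholders; a usable
# detector would require a large, representative labeled corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "In conclusion, there are several key factors to consider.",
    "ugh my essay is due at midnight and i haven't even started",
]
labels = [1, 0]  # 1 = LLM-written, 0 = human-written

# TF-IDF features over word unigrams and bigrams feeding a logistic
# regression: a standard baseline text classification pipeline.
detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                         LogisticRegression())
detector.fit(texts, labels)

# predict_proba yields a score that can be thresholded, rather than
# a hard human/LLM verdict.
print(detector.predict_proba(["Certainly! Here is a short essay."]))
```

Note that the output is a probability, not a verdict; where the decision threshold is set determines the trade-off between false positives and false negatives raised in the questions below.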

Question 1

With the knowledge that you have gained in this course, how might you implement an LLM detection model?


Question 2

Imagine an LLM detection model does not have any disparate performance across social groups. In that case, what precautions should instructors take when applying an LLM detection model to student work? What thresholds for precision and recall should be required? What type of data should these models be tested on before use in an educational setting?


Question 3

The 2023 paper “GPT detectors are biased against non-native English writers” shows that GPT detection models (including commercially available ones) frequently misclassify TOEFL essays written by non-native English speakers as GPT-generated (61.3% of the time), while they misclassify essays written by US 8th graders only 5.1% of the time.² How do the findings in the paper affect your thinking on how these tools should be used?


Part 3: Further Exploration (1 point)

After completing the previous parts of this assignment, you will have read at least two papers related to responsible computing in NLP.³ Please list three papers that you would be interested in exploring in the future that are related to this topic (they may or may not come from the citations of the papers you have read).

Please include the title, author, and year for each paper, as well as a URL to access the paper. For each paper, write at least two sentences about why it interests you. While it is not a requirement, you might want to look for papers that seem relevant to your final project!

Footnotes

  1. The 300 words can be distributed as you see fit across the questions. If a certain question speaks to you, feel free to write a longer answer, as long as you answer each question with at least a full sentence.↩︎

  2. It is worth noting that GPTZero has considered this issue and written a blog post in response.↩︎

  3. Many of you have read additional papers as supplemental reading assignments.↩︎