Homework Five

Published

March 13, 2025

Due 2025-03-13 at the start of class

Download Starter Files Submit on Gradescope

Objectives

  • Demonstrate your ability to use lists
  • Demonstrate your ability to use dictionaries
  • Demonstrate your expanding ability to use loops

Getting started

For this assignment, we have another four functions for you to write. Please pay attention to all of these instructions. Even this front matter, which may look the same (and many of you skip over) has changed.

For this assignment, we have provided a starter file so that we can give you a extra helper function. Please do all of your work in homework05.py. This is the only file you need to submit to Gradescope.

Feel free to work through the functions in any order. Problems 3 and 4 are linked and it makes sense to do them in order, but it isn’t required. Whatever order you use, we suggest that you take the time to test each one as you go rather than trying to write all four out like they were part of an essay and then testing at the very end. Treat these are four totally separate and distinct problems (which they are).

Submit your solution on Gradescope using the button above. Feel free to resubmit as your complete each problem to check your progress. To repeat – you only need to submit homework05.py.

Python subset

For this assignment, the allowed subset varies from problem to problem. Be sure to read each problem carefully.

Satisfactory vs. Excellence

A solution for these questions that is excellent will have the all of the following qualities

Style

An excellent function will have a docstring that is formatted in the way shown in the lectures. It should include:

  • the purpose of the function
  • the type and purpose of each parameter (if any)
  • the type and meaning of the output (if any)

In addition, you should follow some of the PEP8 guidelines on whitespace. The ones we will be looking at are:

  • no whitespace between names and ( or [ (e.g., f (5) should be f(5) and s [3:5] should be s[3:5])
  • there should be a single space around operators (e.g., x=4+1 should be x = 4 + 1 and y = 3 -2 should be y = 3 - 2)
  • there should be a space after commas, but not before (e.g., f(4 , 5) or f(4,5) should be f(4, 5))

Special cases and requirements

For some of the problems, we have identified special cases or requirements that we have deemed potentially more challenging and not essential to a satisfactory solution. An excellent solution will cover all cases. These cases will be identified in the autograder by tests with * at the end of the title (so make sure you submit frequently as you are working).

Problem 1: min_max

Write a function called min_max which takes in a list (values) of one or more integers and returns the difference between the largest and smallest values in the list. The only external functions you are allowed (but not required) to call are len, range, and the type conversion functions.

min_max
Parameters
values list
Return type int
>>> min_max([1, 2, 3])
2
>>> min_max([5])
0
>>> min_max([7, 3, 5, 9])
6

Problem 2: Finding primes

Write a function call sieve that takes in a positive integer n and returns a list of the prime numbers up to an including n.

To do this, you should use the sieve of Eratosthenes, which is a very old algorithm for finding all of the prime numbers less than some target number n. There are a number of variations, but here is the basic idea.

We start with a list of all of the numbers from 2 to n (inclusive). We then visit each number in turn starting at 2. If the number has not been cleared, it is a prime, so we save it. Then we clear all of the multiples of that number by setting them to zero. Continue this way through the list until we hit n.

So, for example, if we want all of the primes up to and including 10, we would start with the list [2,3,4,5,6,7,8,9,10]. The first number we encounter is a 2, so we save that as a prime. Then, stepping by 2s we clear out all of the multiples ([2,3,0,5,0,7,0,9,0]). The next number is 3. We save that as a prime and then clear the multiples ([2,3,0,5,0,7,0,0,0]). The next number is cleared, so we move on. The next number is 5, so we clear the multiples (which were already cleared). Th next is cleared, so we skip it. Then we have the 7, so we save that as a prime and clear the multiples (which we don’t have in the list anyway). The final three are all cleared already so we skip them all. Then we return [2,3,5,7].

sieve
Parameters
n int
Return type list
>>> sieve(5)
[2, 3, 5]
>>> sieve(15)
[2, 3, 5, 7, 11, 13]
>>> sieve(42)
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41]
>>> sieve(1)
[]

Problem 3: Word Frequency

A common component of text analysis is looking at word frequency (how many times words appears in the text). For this problem you are to write a function called count_words that takes in a string containing the name of a text file and returns a dictionary containing the frequency of every word in the file. The keys of the dictionary should be the words and the associated values should be the frequencies. This structure is sometimes called a concordance (though this is a simplified form since concordances usually also provide information about where the word occurs).

To help you out, we have provided you with the load_words(path) function which will read in the file at path, strips it of all punctuation, lowercases it and returns a list of individual words. You should use this to read in the file.

We have provided you with two text files, “herecomes.txt” “something_new.txt”. The first contains the lyrics of the Beatles’ “Here Comes the Sun”, which has the benefit of being short but with a lot of repetition. The second is a novel by P.G. Wodehouse, which will give you a lot more text to work with. You are welcome to use your own text files for testing. You can create new ones (make sure you are saving as plain text files), or you can find content online. Project Gutenberg is a great source of literature that has gone out of copyright (just make sure the download the plain text version).

To be eligible for a score of excellent, your function should not use any external functions other than load_words and the dictionary’s get method.

count_words
Parameters
path str
Return type dictionary

The second output has been shortened for clarity. In addition, dictionaries are unordered, so while the key value pairs should be the same, the ordering may differ.

>>> count_words("herecomes.txt")
{'here': 17, 'comes': 15, 'the': 12, 'sun': 25, 'doodoodoodoo': 5, 'and': 4, 'i': 5, 'say': 4, 'its': 10, 'all': 6, 'right': 6, 'little': 6, 'darling': 6, 'been': 4, 'a': 1, 'long': 1, 'cold': 1, 'lonely': 1, 'winter': 1, 'it': 8, 'feels': 1, 'like': 3, 'years': 3, 'since': 3, 'smiles': 1, 'returning': 1, 'to': 1, 'faces': 1, 'seems': 2, 'feel': 1, 'that': 1, 'ice': 1, 'is': 1, 'slowly': 1, 'melting': 1, 'clear': 1}
>>> >>> count_words("something_new.txt")
{'something': 96, 'new': 37, 'by': 221, 'pelham': 1, 'grenville': 1, 'wodehouse': 1, 'chapter': 13, 'i': 1060, 'the': 3987, 'sunshine': 4, 'of': 2171, 'a': 1942, 'fair': 13, 'spring': 16, 'morning': 57, 'fell': 20, 'graciously': 1, 'on': 577, 'london': 40, 'town': 14, 'out': 155, 'in': 1233, 'piccadilly': 8, 'its': 166, 'heartening': 2, 'warmth': 2, 'seemed': 50, 'to': 2149, 'infuse': 1, 'into': 143, 'traffic': 1, 'and': 1685, 'pedestrians': 1, 'alike': 8, 'novel': 9, 'jauntiness': 2, ... }

Problem 4: Top N words

Write a function called top_n_words(concordance, n) which will return a list of the n most frequently used words found in a concordance (such as the one you produced in the pervious problem). The output should be a list of tuples, containing the word and the count in that order. The list should contain the n most frequent words in descending order. If n is greater than the number of available words the function should just return all available words

To be eligible for a score of excellent, your solution should handle two words with the same frequency by ranking them in reverse alphabetical order. In other words, (‘a’, 5) should be placed higher in the list than (‘z’:5) (this is “reverse” because ‘a’ < ‘z’).

top_n_words
Parameters
concordance dict mapping word to frequency
Return type list of tuples of the form (word, count)
>>> top_n_words({"a": 5, "b": 3, "c": 1, "d": 4, "e": 50}, 3)
[('e', 50), ('a', 5), ('d', 4)]
>>> hcts_concordance = count_words("herecomes.txt")
>>> top_n_words(hcts_concordance, 4)
[('sun', 25), ('here', 17), ('comes', 15), ('the', 12)]
>>> sn_concordance = count_words("something_new.txt")
>>> top_n_words(sn_concordance, 10)
[('the', 3987), ('of', 2171), ('to', 2149), ('a', 1942), ('and', 1685), ('he', 1603), ('was', 1250), ('in', 1233), ('it', 1133), ('that', 1091)]
>>> top_n_words({"a": 5, "m": 5, "g": 6, "z": 5}, 3)
[('g', 6), ('a', 5), ('m', 5)]