Homework Eight
Due 2025-04-24 at the start of class
Download Starter Files Submit on Gradescope
Objectives
- Demonstrate your ability to read text files
- Demonstrate your ability to perform string manipulation for data cleaning
- Demonstrate your ability to work with CSV and JSON file formats
- Demonstrate your ability to work with two-dimensional data and nested loops
Getting started
For this assignment, we have another four functions for you to write. Please pay attention to all of these instructions. Even this front matter, which may look the same (and which many of you skip over), has changed.
For this assignment, we have provided starter code. This primarily consists of the data files used in the examples below. These do not need to be submitted with your code to Gradescope.
Feel free to work through the functions in any order. Whatever order you use, we suggest that you take the time to test each one as you go rather than trying to write all four out like they were part of an essay and then testing at the very end. Treat these as four totally separate and distinct problems (which they are).
Submit your solution on Gradescope using the button above. Feel free to resubmit as you complete each problem to check your progress. To repeat: you only need to submit homework08.py.
Images
One of these problems involves images. Make sure you have read through Working with Images and have the midd_media package installed.
Python subset
For this assignment, we have not placed any restrictions on what you may use.
Satisfactory vs. Excellence
A solution for these questions that is excellent will have all of the following qualities:
Style
An excellent function will have a docstring that is formatted in the way shown in the lectures. It should include:
- the purpose of the function
- the type and purpose of each parameter (if any)
- the type and meaning of the output (if any)
In addition, you should follow some of the PEP8 guidelines on whitespace. The ones we will be looking at are:
- no whitespace between names and ( or [ (e.g., f (5) should be f(5) and s [3:5] should be s[3:5])
- there should be a single space around operators (e.g., x=4+1 should be x = 4 + 1 and y = 3 -2 should be y = 3 - 2)
- there should be a space after commas, but not before (e.g., f(4 , 5) or f(4,5) should be f(4, 5))
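As an illustration of the whitespace rules above, here is a small made-up function. The docstring layout is only an assumption about what a purpose/parameters/output docstring might look like; the format shown in the lectures takes precedence.

```python
def scale(values, factor):
    """Multiply every number in a list by a constant.

    Args:
        values (list of float): the numbers to scale
        factor (float): the multiplier applied to each number

    Returns:
        list of float: a new list holding the scaled numbers
    """
    result = []
    for v in values:
        result.append(v * factor)  # single spaces around the operator
    return result


# No space before ( or [, and one space after each comma.
first_two = scale([1.0, 2.0, 3.0], 2)[0:2]
```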
Special cases and requirements
For some of the problems, we have identified special cases or requirements that we have deemed potentially more challenging and not essential to a satisfactory solution. An excellent solution will cover all cases. These cases will be identified in the autograder by tests with * at the end of the title (so make sure you submit frequently as you are working).
Problem 1: temp_extremes
Write a function called temp_extremes. It should take in a string parameter (filename) containing the path to a valid CSV file.
The CSV file will have columns for “date”, “min_temp”, “max_temp”, and “rainfall”. The date will be in the form “YYYY-MM-DD”. The other three columns will contain float data. You can expect the file to contain a header row. The file is expected to contain data for every day for a year, though it is not guaranteed that this is the case. It is also not guaranteed that every row will have every value filled in.
Your function should return a dictionary with four keys: “coldest month”, “coldest temp”, “warmest month”, and “warmest temp”. The values should be the month that was, on average, the coldest, the average temperature for that month, the month with the highest average temperature, and the average temperature for that month. If there is a tie, the first month should be returned. Since you have minimum and maximum temperatures for each day, you should average together the extremes of the day to get the average for the day. If you don’t have both, then you should use the one you have.
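The per-day rule can be sketched as follows. This is a minimal illustration rather than a full solution: the helper name daily_average and the sample rows are made up, and it assumes that a missing value shows up as an empty string in its column.

```python
import csv
import io


def daily_average(min_text, max_text):
    """Average the day's extremes, using whichever readings are present.

    Returns None when both readings are missing.
    """
    readings = [float(t) for t in (min_text, max_text) if t.strip() != ""]
    if len(readings) == 0:
        return None
    return sum(readings) / len(readings)


# Made-up sample rows; a real file would be opened with open(filename).
sample = io.StringIO(
    "date,min_temp,max_temp,rainfall\n"
    "2022-01-01,10.0,30.0,0.0\n"
    "2022-01-02,,28.0,0.1\n"
    "2022-02-01,,,0.0\n"
)

rows = []
for row in csv.DictReader(sample):
    month = row["date"][5:7]  # "YYYY-MM-DD" -> "MM"
    rows.append((month, daily_average(row["min_temp"], row["max_temp"])))
```

Days with no temperature reading at all come back as None here, so they can be left out of the monthly averages.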
temp_extremes | |
---|---|
Parameters | filename (str) |
Return type | dict (keys: “coldest month”, “coldest temp”, “warmest month”, “warmest temp”) |
Examples
We have provided a file that produces the following result. You are encouraged to make your own examples.
>>> temp_extremes("mbh_daily_data_2022.csv")
{'warmest month': '07', 'warmest temp': 73.55645161290325, 'coldest month': '01', 'coldest temp': 13.32419354838709}
Problem 2: toc
Write a function called toc that takes a filename html_filename as input and returns a string representing a table of contents for the file. As you may have guessed from the name, the input file will be an HTML (Hypertext Markup Language) file. HTML is the markup language that is used to create webpages.
Your job is to read the HTML file and find all of the header tags, which are the tags <h1>, <h2>, …, <h6>. You will then extract the text between the opening and closing tags. For instance, you would extract the text “Course Description” from the HTML <h1>Course Description</h1>.
The input document will be an arbitrary HTML file, which may contain other tags as well; your goal is exclusively to find the header tags, extract their contents, and make a table of contents.
Imagine the following HTML file:
<h1>This is an example file</h1>
<p>Your function should create a <i>table of contents</i></p>
<h2>There is some nesting</h2>
<h3>And maybe even more nesting</h3>
<h2>The basic idea is you need to find all of the headers</h2>
<h1>And return a string with your TOC</h1>
Your function should read the file and return the following string.[1] Each header should be separated by a newline, and headers should be indented with tabs to show nesting: text extracted from <h1> tags should not be indented, text extracted from <h2> tags should be indented with one tab, text extracted from <h3> tags should be indented with two tabs, etc.
This is an example file
	There is some nesting
		And maybe even more nesting
	The basic idea is you need to find all of the headers
And return a string with your TOC
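The indentation rule amounts to one tab per level below <h1>. A minimal sketch, using a hypothetical helper indent_entry and made-up header text:

```python
def indent_entry(level, text):
    """Prefix text with one tab for each nesting level below <h1>."""
    return "\t" * (level - 1) + text


# (level, text) pairs as they might be collected from a file.
entries = [(1, "Intro"), (2, "Details"), (3, "Fine print"), (1, "Wrap-up")]
lines = [indent_entry(level, text) for level, text in entries]
outline = "\n".join(lines)  # newline between entries, none at the end
```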
You can assume that the document is a well-formed HTML file; all opening tags should have a corresponding closing tag. You can also assume that there is only text between the opening and closing header tags, and that there are no other tags nested within them.
An excellent solution will correctly process header tags with attributes, e.g., <h1 class="title">, which should be treated as a <h1> tag. You can assume that the < and > characters always indicate the start and end of a tag.
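One way to handle attributes is to look only at the first word inside the angle brackets. The sketch below assumes you have already isolated a single tag, from its < to its >; the helper name header_level is hypothetical, and lowercasing the tag name is an extra precaution rather than something the assignment asks for.

```python
def header_level(tag):
    """Return 1-6 for an opening header tag such as '<h2 class="x">', else None."""
    inside = tag[1:-1]        # drop the surrounding < and >
    parts = inside.split()    # the tag name comes first, attributes after it
    if len(parts) == 0:
        return None
    name = parts[0].lower()
    if len(name) == 2 and name[0] == "h" and name[1] in "123456":
        return int(name[1])
    return None               # closing tags and non-header tags land here
```

For example, header_level('<h1 class="title">') gives 1, while header_level("</h1>") and header_level("<p>") both give None.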
toc | |
---|---|
Parameters | html_filename (str) |
Return type | str |
Examples
The examples below use the files provided in the starter code; one is the example given in this assignment, while the other is a slightly edited version of the syllabus page from our course website.
>>> toc("example_toc.html")
'This is an example file\n\tThere is some nesting\n\t\tAnd maybe even more nesting\n\tThe basic idea is you need to find all of the headers\nAnd return a string with your TOC'
>>> toc("syllabus.html") # behavior of an excellent solution that handles attributes
'Syllabus\nCourse Description\nLearning Objectives\nAssessment\n\tDeliverables\n\t\tLabs\n\t\tHomework\n\t\tChallenges\n\t\tExams\n\tGrading\nCourse Policies\n\tExtensions\n\tAttendance\n\tGetting help\n\tHonor code and collaboration\n\t\tPolicies:\n\t\tGenerative AI\n\tLaptops\n\tFostering an inclusive environment\n\tDisability Access and Accommodation'
Problem 3: clean_up
Write a function called clean_up that takes two string parameters: input_file and output_file. The function reads in a file in a semi-structured format and then outputs it as a structured JSON file.
The data file holds book records that look like this:
Title The Hitchhiker's Guide to the Galaxy
Author Douglas Adams
Genre Science Fiction, Humor
Year 1979
ISBN 978-0-330-25864-7
Summary Ford Prefect saves Arthur from Earth’s destruction.\nDon’t panic.
Notes First edition\tSigned by author
- Each line holds a key, value pair separated by a tab
- Each record should have values Title, Author, Genre, Year, and ISBN (but it is not guaranteed)
- The Author and Genre values can be single values, or comma separated lists
- Records can have additional fields (like Notes)
- Records are separated in the file by one or more blank lines
Your function should read in this data, clean it up and output it in a more structured JSON file.
- The JSON file should hold a list of dictionaries, one per book
- Each dictionary must have fields for Title, Author, Genre, Year, and ISBN
- If there is no value for a required field, the value should be set to UNKNOWN
- The Author and Genre fields should be strings for single values and lists for multiples
- If there are extra fields, the record should have an Extended field that holds a dictionary of the extra fields and their values
[{"Title": "The Hitchhiker's Guide to the Galaxy",
"Author": "Douglas Adams",
"Genre": ["Science Fiction", "Humor"],
"Year": "1979",
"ISBN": "978-0-330-25864-7",
"Extended": {
"Summary": "Ford Prefect saves Arthur from Earth\u2019s destruction.\\nDon\u2019t panic.",
"Notes": "First edition\\tSigned by author"
}
}]
This example was formatted to make it easier to read – json.dump() will leave it in a more compact format.
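To see the difference between the two layouts, the sketch below writes the same made-up record twice, once with the default settings and once with the optional indent argument. It writes to in-memory io.StringIO buffers so it runs without touching any files; your function would write to the file named by output_file instead.

```python
import io
import json

# A made-up record, not taken from the starter files.
books = [{"Title": "Example Book", "Author": ["A. Writer", "B. Writer"], "Year": "UNKNOWN"}]

compact = io.StringIO()
json.dump(books, compact)           # default: everything on one line

pretty = io.StringIO()
json.dump(books, pretty, indent=2)  # indent spreads the output over many lines
```

Both buffers hold the same data; only the whitespace differs.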
Hint: you might want to look at the help() entry on split().
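As one possible reading of the hint, split() accepts an optional second argument that limits how many splits are made, so a value that itself contains a tab stays in one piece. The helper names below are hypothetical, and stripping the spaces around list items is an assumption based on the example output.

```python
def split_field(line):
    """Split a 'key<TAB>value' line at the first tab only."""
    key, value = line.split("\t", 1)
    return key, value


def single_or_list(value):
    """Return a plain string for one item, or a list for comma-separated items."""
    parts = [p.strip() for p in value.split(",")]
    if len(parts) == 1:
        return parts[0]
    return parts
```

For example, split_field("Genre\tScience Fiction, Humor") gives ("Genre", "Science Fiction, Humor"), and single_or_list("Science Fiction, Humor") gives ["Science Fiction", "Humor"].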
clean_up | |
---|---|
Parameters | input_file (str), output_file (str) |
Return type | None |
Examples
In the starter packet, you will find two files: clean_up_example_input.txt and clean_up_example_output.json. You can look at these to get a sense of how the code runs. You are encouraged to make your own examples as well.
Problem 4: img_to_ascii
Write a function called img_to_ascii. It should take two parameters. The first parameter (img) should be an image. The second parameter should be a string (filename), which will be the name of a file to write the output into.
The purpose of the function is to read in an image and convert it to a piece of “ASCII art”, a text file where the greyscale levels have been converted to ASCII characters that approximate the luminance of the original pixels. For our version, we are going to use a 10 character scale: “@%#*+=-:. ”, where “@” is the darkest and “ ” (a space) is the lightest (this is backwards if you use light text/dark background, but we will target black text/white background).
Your function should look at each pixel, convert it to a luminance value and then pick the appropriate character from our scale above. You then will write the result out to a file.
Since characters tend to be a little taller than they are wide, you should skip every other row of pixels.
As a side note, you should use quite small images (around 150 pixels wide) for this. If you don’t, the resulting text file will be hard to display.
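The character selection and row skipping can be sketched without any image library. The code below makes several assumptions: luminance values are integers from 0 to 255, they are already available as a list of rows, and the ten characters divide that range into equal buckets. How you obtain luminance from a midd_media image, and the exact mapping the autograder expects, are not shown here.

```python
SCALE = "@%#*+=-:. "  # darkest to lightest, ten characters


def luminance_to_char(lum):
    """Map a luminance in 0..255 to one character of SCALE (equal buckets assumed)."""
    index = int(lum) * len(SCALE) // 256
    return SCALE[index]


def rows_to_ascii(lum_rows):
    """Build the text for a grid of luminance values, using every other row."""
    lines = []
    for y in range(0, len(lum_rows), 2):  # step of 2 skips alternate rows
        lines.append("".join(luminance_to_char(v) for v in lum_rows[y]))
    return "\n".join(lines)
```

With these assumptions, a luminance of 0 maps to “@” and 255 maps to a space.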
img_to_ascii | |
---|---|
Parameters | img (image), filename (str) |
Return type | None |
Examples
***********#################################***********+++=---::-:::::::::::::::
**************#########%#=+@%#%%%%%%@%%%#####**++===..==------------------------
-----==++*******#######@+--:-@@@@@@@@@%#%%%+=====+=-=+===========---------------
--------------==*@%@@@%@*+=--+#@@@@@@@##%#%*+++=#=:-=*+================-----====
----------------=@%%@@@@%#**+=*##@@@@@%#%%%*=++%+:::-*+========+++++++++++=*-=*+
----------------=@@@@@@@@%%#+-:+@#@%%@##%#%*=+%---:=********++++++++***++++#-=++
---------------==*@%%@%%####+===*%#%#%%#*###@%*=-=-=+#######************+++*:.-=
-------------===-+@%%%%###%%@@%%#***%%%#+*%#%**###*+=#%@%*+++++++++***###%@#:.--
---------------=-=%%@####*%*%@%%@#*+*%%*=*##*+*%@%##*##@@@@@@@@@@@@@@@@@@@@*:.--
-----------------=%%#***++*@%@@@@%%#@%@=.==#@@@@%#%%@##@@@@@@@@@@@@@@@@@@@@*:.--
-----------------=%##%%@@@@@@@@@@@@@@%@: .=*@%@@@%%@@@@%%@@@@@@@@@@@@@@@@@+..--
-----------------=#@@@@@@@@@@@@@@#@%*@@*. -*-@:-@%@%@++@@@@@@@@%%@@@@@@@%@+..=:
=--------------===*@%%%%%#%@@@@@@#%%%@#=.. =++%@@@%%@++@@@@@@@@#%@@@@@@@%@=.:=:
+------------=====*@%%%%%%%@@@@@@%#+-=*+:.. :+*###*%@@@@@@@@@@@%@@@@@@%%@=:-=
+------------=====+@%@@@@@@@@@%%%@*+++*%=::. .++*#*%@%@@@@@@@#%%%@%%%@%%@#%@@
*-------------====+@%%@@@@@%#%%%#*++++*-:+: :+=:-*#@@@@@@@@@@%@@@@@@@@=+@%#
#=------------=====%@@@@%%@#******+++*+=-+-.. =###@@@@@@@@%@@@@@@@@@+%@*+
#=------------=====#%#%@@###*++++***++**=-:... =**#@%#%@%%%@@@%@@@@@%=-+@@
%+------------=====#@@%##%##++***+++++===-.... .%#**######@#%%%%#%%%%@@@@#
%*------------=====*@#%#*##+***++++++==-:.... -##@%@@@@@@@@@@@@@@@@@%@@%
@#=-----------=====*#***%*#*++++++=-==::.... +@#@@@@@@@@@@@@@@@@%@@@%#
@%=-----------=====+@@@@%@*+=====+-:=:.::.... =##@@@@@@@@@@@@@@@@@@@@@%
@%+------------=====@@@@@%*+=-:-::--::..:... -@%@@@@%@@@@@@@@@@@@#%@@@
@%*------------=====%@@@@%*+=--:::::........ .=%#%@@@@@@@@@@@@@@%@@##@@
@@#=-------------==+@@@@@@#+=-:::::..:....... .=+#####@@@@@@@@@@@@@@#%%%@
@@%=----------=*#**+++==+*#+++=--=:::::..... -++##%###%@@@@@@@@@@@*=**%@
@@%+---------==+%@%@@%@%%@%*+=+==---::::...... .-++*%%%%%##@@@@@@@@@@@@@@@@
#@%*--------=@%%%%%%@@@@@%%#++===----=-::...........=++**#%@@@%#%%%@%%@%%%%@%%%#
#@@%=-------=*#@%@@@@@@@@@@%*++++++==--:::::....... .=*+##%@@@@##@%%%%%%%%%%%%%@
+@@%=--------#@%%%%@@@@@@@@%##*++==----:::::::.....-+**+*#%@@@@%*%@@@%%%%%%%%%@@
@@@%+--------*@%@@@@@@@@**@%@%#*++++=-::--::.::::-+#%###*##@@@%@##@%%%%%%%##%%@@
@@@@*----=---+@%%@@@@@@%%%@@%@%**++==-::----::---+%##%%%%##@@%%%%##@%%%######%%%
%@%%*+==**+***+#@%@@@@@%%@@@%%#++++==--::-==---++*%%%%%####@@#+#@##@%%#####%%@@@
#%%#%+-*#+%%%@*%@@@@@@@@@@%%%%@%*+++++=-:-==+==#@%@%%%@##%##*-=%%%#%%%%%###%%%@@
+++#*%@@%@@%@*++#@*+--::::+*#@@@@%*+*++=--=++++*%@@@@@@%%%*#::**#%#@@@@@@@@@@@@@
*=*#*+=-*@@*.==-------:--=+**%@%@@%*+++=--=++***%%%@@%%%%#*-:=+%%##@@@@@@@@@@@@@
++##**+*+==========-=----=+**#%%%%@@#+++===++***%@@@%%@@%#.:+#%%#%#@%%%%%%%@@@@@
+==+====+================++*+**#%%%%++++==++***##%%%#%##+.=@%@@@@%#@@@@@@@@@@@@=
==+==++++++++++++++++++++++*****+-::-=++++++***+*=++**+: -##@%%%%##@%%@@@@@@%#=-
+++++++++++*+**************+++**+-::--=*+++****=-:.. .#***+***-:-=+##@@%#%**#
++++++++++************#*####****+=:::-=*#****#+=:.. :::::... .::::::::.:-+*@
==+=++++++++*****#*#####%@@@#****=:::-=+####*#+=:.. .:.... .:-::::::::::::
=++==+++++++*****######%@@@@##**#*=-:--+######+-.. :-.... .::::::::-:::::-
========+=+++++***####%@@@@@@@@##%*=---+#@###*=:.. :=-.. .::::::-:::-:::-:
-=--==--=====++++***#%%@@@@@%+=+*%#=---=*@###+-:. .:+=-.... .:.::::::-:-::::::
-----=---======+++**#%@@@@@@*-:-+*#=---=+#@%#=:..:+*=:.. ..:::.::::::::::-::-:-
-:--==-=----=+==++*##%@@@@@@%:.-**+=---=+#@%+-:..=**=. .:::::::::::-:::--::-::
-----::----==--==+**#%@@@@@@@%*%%+:::::-+#%+-:. .**. -::::-::::-::::-=-:-:::
---:----:---:--==++*#%%@@@@@@@%#%*:..::+#-:-:. :=. -+::-:::---::--:---::-::-
-:-::::------:---==*##%@@@@@@@@@++*==#%#=:... . .=--::::::-::-:-:-:-=::--:-:-:
------:-:-:-----=---=*##@@@@@@@@@#==**%#-.. . --=--------::----=---:=--::--=
:-=-=----:-:-::--:::--=+*#%@@@@@@%@#-=**=+=:::..:::--:::::---:-:------:-:-:::-:-
=--=:-:-::::-:::::-::-:--*=*%@@@@@@@*-=:-:::::-:::-:::-::+::=-:::----:---:::--::
==--=--:--:::.::.:::::::-::--===+++---:-:-:::-::::::.-=------==-:--:----::-::=-:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@%###**********###%@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@%##*=----------------------=*##%@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@%#*----------------------------------*#%@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@%#=----------------------------------------=#%@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@%#=----------------------------------------------=#%@@@@@@@@@@@@@@
@@@@@@@@@@@@%*----------------------------------------------------*%@@@@@@@@@@@@
@@@@@@@@@@%*--------------------------------------------------------*%@@@@@@@@@@
@@@@@@@@%#------------------------------------------------------------#%@@@@@@@@
@@@@@@@#=--------------------------------------------------------------=#@@@@@@@
@@@@@%#----------------+===+=---------------------=+===+-----------------#%@@@@@
@@@@%*---------------+ :+-----------------== ==---------------*%@@@@
@@@%*--------------=- +---------------+. .+---------------*%@@@
@@%*---------------= +-------------=: :=---------------*%@@
@@#---------------+ =%@@# =-------------+ +@@@* +----------------#@@
@#=---------------= + #@@@% .=-----------=: - @@@@# -----------------=#@
@*----------------= :@@@@@@@* +-----------=. -@@@@@@@= :=----------------*@
#=----------------= +@@@@@@@# .+-----------=: *@@@@@@@* -=----------------=#
#-----------------+ =@@@@@@@# --------------= +@@@@@@@+ +------------------#
*-----------------=- %@@@@@@- +-------------+. %@@@@@%. .+------------------*
*------------------+. %@@@@= +---------------+ .%@@@%: +-------------------*
*-------------------=- . +-----------------+. . :+--------------------*
*---------------------=+: .=+---------------------+=. .++----------------------*
*------------------------------------------------------------------------------*
#------------------------------------------------------------------------------#
%+----------------------------------------------------------------------------+%
@#----------------------------------------------------------------------------#@
@%*--------------------------------------------------------------------------*%@
@@#=---------#@@@@@@@@@@@@@%%%%%%%%#########*#****+++=====------------------=#@@
@@@#---------=@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@+---------#@@@
@@@@#---------=@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@+---------#@@@@
@@@@@#=---------#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@%---------=#@@@@@
@@@@@@%*----------#@@@@@@@@@@@@%%%%#########%%%%@@@@@@@@@@@@@%=---------*%@@@@@@
@@@@@@@%#-----------+%@@@@@%***********************%@@@@@@%*-----------#%@@@@@@@
@@@@@@@@@%*-------------*%@%#**********************%@@%#=------------*%@@@@@@@@@
@@@@@@@@@@@#*----------------+#%%%%%%#####%%%%%%%#+----------------*#@@@@@@@@@@@
@@@@@@@@@@@@@%#--------------------------------------------------#%@@@@@@@@@@@@@
@@@@@@@@@@@@@@@%#*--------------------------------------------*#%@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@%#*--------------------------------------*#%@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@%#*=----------------------------=*#%@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@%##*+=--------------=+*##%@@@@@@@@@@@@@@@@@@@@@@@@@@@
Footnotes
[1] This example is what would show if the string was printed.