Lab 8
Objectives
- Learn to read data from a file
- Practice working with complex data structures
- Write data to csv and json files
Part 0: Getting started
To begin, download the file lab08.zip by clicking the blue “Download Starter Files” button at the top of the lab, and extract the files from the zip file into a directory called lab08 inside your labs directory. The starter code contains two data files: example.yaml and state_population.yaml. The examples on this page refer to example.yaml because it is shorter; however, your code should all work when tested on state_population.yaml as well! The data was downloaded from the US Census Bureau and reformatted for this assignment.
Part 1: Reading a YAML file
In the starter code, we have provided you with a .yaml file. YAML is a text file format that is commonly used for configuration files. As in Python, indentation is important in a YAML file, because it is used to indicate nesting. Take the following .yaml file as an example, which is example.yaml in the starter code:
- name: North Dakota
  pop_2020: 779563
  pop_2024: 796568
- name: Vermont
  pop_2020: 642977
  pop_2024: 648493
The indentation indicates to us that the 2024 population specified on line 3 of the file is the population of North Dakota rather than the population of Vermont.
Your task for the first part of the assignment is to read the YAML file example.yaml and return a dictionary structure that looks like this:
{
    "North Dakota": {
        "pop_2020": 779563,
        "pop_2024": 796568
    },
    "Vermont": {
        "pop_2020": 642977,
        "pop_2024": 648493
    }
}
You should also test your code on state_population.yaml. The goal of this assignment is not to write a complete YAML parser, but to accurately parse a file with the format shown above.
Steps
- Step 1: Define the function read_yaml and make sure it takes a string argument yaml_filename. Don’t forget your docstring!
- Step 2: Following the process we demonstrated in class, use with to open a file and print each of the lines in the file using a loop (for line in f).
- Step 3: Create an empty dictionary before your loop; you will add data to this dictionary as you go.
- Step 4: We’ll start by thinking about what the keys of our dictionary should be: the state names. In your loop, identify lines that start with "- name: " using a conditional. Then, use slicing to print out just the state names. Remember to remove the newline at the end if necessary.
- Step 5: Add a variable current_state that keeps track of the most recent state read from the file. When you find a new state, add it as a key in your dictionary, with an empty dictionary as the value.
- Step 6: You’ve written an if block that handles when the lines are states; now, write an else block for the other case. If a line doesn’t refer to a state, we can assume that it’s a population.
  - First, use .strip() and .split(sep) to split the line into a key (e.g., pop_2020) and a value (e.g., 642977). It’s left as an exercise for you to figure out what the sep should be.
  - Then, add the key-value pair to the inner dictionary. Remember that the population should be an integer, so make sure to complete any necessary type conversions!
- Step 7: Return your dictionary. Test your function and ensure that it works as expected before moving on!
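The pattern in the steps above (remember the most recent header line, then attach the lines that follow to it) can be sketched on a different, made-up file format. Everything in this sketch is an assumption for illustration: the "# name" and "key=value" format, the function name read_groups, and the sample data are hypothetical and are not the lab's YAML format or its solution.

```python
import os
import tempfile

# Hypothetical toy format (NOT the lab's YAML): a "# name" line starts a
# new group, and every other line is "key=value" belonging to that group.
SAMPLE = "# apples\nred=3\ngreen=5\n# pears\nyellow=2\n"


def read_groups(filename):
    """Return a nested dict {group: {key: int_value}} read from filename."""
    groups = {}
    current_group = None  # most recent header seen so far
    with open(filename) as f:
        for line in f:
            line = line.strip()  # drop the trailing newline
            if line.startswith("# "):
                current_group = line[2:]  # slice off the "# " prefix
                groups[current_group] = {}
            else:
                key, value = line.split("=")
                groups[current_group][key] = int(value)  # text -> integer
    return groups


# Write the sample to a temporary file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(SAMPLE)

result = read_groups(tmp.name)
os.remove(tmp.name)
print(result)
```

Your read_yaml will differ in its prefix and its separator, but the shape of the loop should be similar.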
Part 2: Computing Population Change
Your next goal will be to compute the change in population between the two years specified in the yaml file. Your function will take in the dictionary that you created in the last part as input; remember a smaller version of that dictionary looks like this:
{
    "North Dakota": {
        "pop_2020": 779563,
        "pop_2024": 796568
    },
    "Vermont": {
        "pop_2020": 642977,
        "pop_2024": 648493
    }
}
You should return the following dictionary:
{
    "North Dakota": 17005,
    "Vermont": 5516
}
Steps
- Step 1: Define the function compute_population_change and make sure it takes a dictionary argument yearly_populations. Don’t forget your docstring!
- Step 2: Create a new empty dictionary that will contain the population changes.
- Step 3: Write a loop that goes through the key-value pairs in your dictionary. Set the value for each state to be the difference between pop_2024 and pop_2020 for that state (that is, pop_2024 minus pop_2020).
- Step 4: Return your dictionary. Test your function and ensure that it works as expected before moving on!
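As a reminder of how looping over key-value pairs in a nested dictionary works, here is a minimal sketch on unrelated, made-up data. The names scores and improvement and the quiz scores themselves are hypothetical; only the looping technique carries over to the lab.

```python
# Hypothetical data: two quiz attempts per student.
scores = {
    "Ada": {"first": 70, "second": 85},
    "Grace": {"first": 90, "second": 88},
}

improvement = {}
# .items() yields (key, value) pairs; here each value is an inner dictionary.
for student, attempts in scores.items():
    improvement[student] = attempts["second"] - attempts["first"]

print(improvement)  # {'Ada': 15, 'Grace': -2}
```

Note that the result can be negative when the later value is smaller than the earlier one.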
Part 3: Outputting a CSV file
We talked about the CSV (comma separated values) file format in class; remember that a CSV is a tabular data format. A CSV file has a header row specifying columns, and records are separated by newlines, with fields separated by commas.
For this part of the lab, you are going to take the dictionary that you produced in the last part as input, and output a CSV file that looks like this:
state,pop_change
North Dakota,17005
Vermont,5516
You could probably do this by just writing to the file directly, but we’d like you to practice using the csv module. You must use csv.writer as described below to complete this part of the lab (rather than csv.DictWriter).
Steps
- Step 1: Import the csv module at the top of your script.
- Step 2: Define the function write_population_csv and make sure it takes a dictionary argument population_changes and a string argument csv_filename. Don’t forget your docstring!
- Step 3: Remember that csv.writer allows us to write data to a CSV file line by line by specifying records in lists, e.g., ["Vermont", 5516]. Unfortunately, that’s not the format our data is in: we have a dictionary. So, we’ll start by massaging our data into a format that’s more like what’s expected by csv.writer.
  - First, create a list that will contain our CSV data. Add the header row (["state", "pop_change"]) to the list.
  - Then, add each of the key-value pairs to your list as lists of the format [STATE, POPULATION].
  - Finally, print out your list and confirm that it has the expected contents before continuing. For our small example, it should look like this: [["state", "pop_change"], ["North Dakota", 17005], ["Vermont", 5516]]
- Step 4: Open the file specified by the argument csv_filename in write mode, and create a csv.writer to write to that file.
- Step 5: Iterate through each row in the list that you created in Step 3, and write it using writer.writerow.
- Step 6: Test your function by calling it in the shell. Your function shouldn’t return anything, but you should find an output file in the same directory as lab08.py with the correct data.
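For reference, here is a minimal sketch of csv.writer on unrelated, made-up data. The rows list, the file name fruit.csv, and the temporary directory are assumptions for illustration; the calls themselves (csv.writer and writer.writerow) are the ones named in the steps above.

```python
import csv
import os
import tempfile

# Hypothetical data: a header row followed by one list per record.
rows = [["fruit", "count"], ["apple", 3], ["pear", 2]]

# Write into a temporary directory so the sketch does not touch your lab files.
path = os.path.join(tempfile.mkdtemp(), "fruit.csv")

# The csv module documentation recommends opening the file with newline=""
# so the writer controls line endings itself.
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)  # integers are converted to text automatically

# Read the file back to check what was written.
with open(path, newline="") as f:
    contents = f.read()
print(contents)
```

Each call to writer.writerow produces one line of the file, with the list's items joined by commas.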
Part 4: Outputting a JSON file
The other file format we talked about in class was JSON (JavaScript Object Notation); remember that the JSON format looks much more like Python dictionaries or lists. We’re going to write a JSON file, and because of the similarities to Python data structures, we’ll have to do less work than we did to create our CSV file. In the end, your output should be a JSON file that looks like this (newlines provided only for readability):
{
    "North Dakota": 17005,
    "Vermont": 5516
}
Steps
- Step 1: Import the json module at the top of your script.
- Step 2: Define the function write_population_json and make sure it takes a dictionary argument population_changes and a string argument json_filename. Don’t forget your docstring!
- Step 3: Open the file specified by the argument json_filename in write mode, and use json.dump to write population_changes to that file.
- Step 4: That’s it! Test your function by calling it in the shell. Your function shouldn’t return anything, but you should find an output file in the same directory as lab08.py with the correct data.
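For reference, here is a minimal sketch of json.dump on unrelated, made-up data. The inventory dictionary, the file name inventory.json, and the temporary directory are assumptions for illustration only.

```python
import json
import os
import tempfile

# Hypothetical data: a flat dictionary with string keys and integer values.
inventory = {"apple": 3, "pear": 2}

# Write into a temporary directory so the sketch does not touch your lab files.
path = os.path.join(tempfile.mkdtemp(), "inventory.json")

with open(path, "w") as f:
    json.dump(inventory, f)  # first the data, then the open file object

# Read the raw text back, then parse it again to confirm the round trip.
with open(path) as f:
    text = f.read()
loaded = json.loads(text)

print(text)    # {"apple": 3, "pear": 2}
print(loaded)  # {'apple': 3, 'pear': 2}
```

By default json.dump writes everything on a single line, which is why the example output earlier in this part notes that its newlines are only for readability.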
Turning in your work
Once you are done, submit lab08.py to Gradescope and ensure that you have passed all of the tests. Your file should contain four functions: read_yaml, compute_population_change, write_population_csv, and write_population_json. Each function should have a docstring.