Lab 8

Published

April 18, 2025

Download Starter Files Submit on Gradescope

Objectives

  • Learn to read data from a file
  • Practice working with complex data structures
  • Write data to csv and json files

Part 0: Getting started

To begin, download the file lab08.zip by clicking the blue “Download Starter Files” button at the top of the lab, and extract its contents into a directory called lab08 inside your labs directory. The starter code contains two data files: example.yaml and state_population.yaml. The examples on this page refer to example.yaml because it is shorter; however, all of your code should also work when tested on state_population.yaml! The data was downloaded from the US Census Bureau and reformatted for this assignment.

Part 1: Reading a YAML file

In the starter code, we have provided you with a .yaml file. YAML is a text file format that is commonly used for configuration files. As in Python, indentation matters in a YAML file because it indicates nesting. Take the following file as an example, which is example.yaml in the starter code:

- name: North Dakota
  pop_2020: 779563
  pop_2024: 796568
- name: Vermont
  pop_2020: 642977
  pop_2024: 648493

The indentation indicates to us that the 2024 population specified on line 3 of the file is the population of North Dakota rather than the population of Vermont.

Your task for the first part of the assignment is to read the YAML file example.yaml and return a dictionary structure that looks like this:

{
    "North Dakota": {
        "pop_2020": 779563,
        "pop_2024": 796568
    },
    "Vermont": {
        "pop_2020": 642977,
        "pop_2024": 648493
    }
}

You should also test your code on state_population.yaml. The goal of this assignment is not to write a complete yaml parser, but to accurately parse a file with the format shown above.

External Libraries

pyyaml is an external library1 for parsing yaml files. You may not use it on this assignment.

Steps

  • Step 1: Define the function read_yaml and make sure it takes a string argument yaml_filename. Don’t forget your docstring!
  • Step 2: Following the process we demonstrated in class, use with to open a file and print each of the lines in the file using a loop (for line in f).
  • Step 3: Create an empty dictionary before your loop; you will add data to this dictionary as you go.
  • Step 4: We’ll start by thinking about what the keys of our dictionary should be: the state names. In your loop, identify lines that start with "- name: " using a conditional. Then, use slicing to print out just the state names. Remember to remove the newline at the end if necessary.
  • Step 5: Add a variable current_state that keeps track of the most recent state read from the file. When you find a new state, add it as a key in your dictionary, with an empty dictionary as the value.
  • Step 6: You’ve written an if block that handles when the lines are states; now, write an else block for the other case. If a line doesn’t refer to a state, we can assume that it’s a population2.
    • First, use .strip() and .split(sep) to split the line into a key (e.g., pop_2020) and a value (e.g., 642977). It’s left as an exercise for you to figure out what the sep should be.
    • Then, add the key-value pair to the inner dictionary. Remember that the population should be an integer, so make sure to complete any necessary type conversions!
  • Step 7: Return your dictionary. Test your function and ensure that it works as expected before moving on!
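
If you get stuck, the steps above can be sketched roughly as follows. This is only one possible approach, not the required solution: the names populations and EXAMPLE are our own, the temporary file merely stands in for example.yaml so the snippet runs on its own, and we use elif to skip blank lines where the steps describe a plain else.

```python
import os
import tempfile


def read_yaml(yaml_filename):
    """Read a simplified YAML file of state populations into a nested dictionary."""
    populations = {}      # Step 3: filled in as we read the file
    current_state = None  # Step 5: most recent state seen
    with open(yaml_filename) as f:
        for line in f:
            if line.startswith("- name: "):
                # Step 4: slice off the prefix and strip the trailing newline.
                current_state = line[len("- name: "):].strip()
                populations[current_state] = {}
            elif line.strip():
                # Step 6: assumes every other non-blank line is "key: value".
                key, value = line.strip().split(": ")
                populations[current_state][key] = int(value)
    return populations


# Stand-in for example.yaml from the starter code (same contents).
EXAMPLE = """\
- name: North Dakota
  pop_2020: 779563
  pop_2024: 796568
- name: Vermont
  pop_2020: 642977
  pop_2024: 648493
"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.yaml")
    with open(path, "w") as f:
        f.write(EXAMPLE)
    result = read_yaml(path)

print(result)
```

In your own lab08.py you would call read_yaml("example.yaml") directly rather than writing a temporary file.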

Part 2: Computing Population Change

Your next goal will be to compute the change in population between the two years specified in the yaml file. Your function will take in the dictionary that you created in the last part as input; remember a smaller version of that dictionary looks like this:

{
    "North Dakota": {
        "pop_2020": 779563,
        "pop_2024": 796568
    },
    "Vermont": {
        "pop_2020": 642977,
        "pop_2024": 648493
    }
}

You should return the following dictionary:

{
    "North Dakota": 17005,
    "Vermont": 5516
}

Steps

  • Step 1: Define the function compute_population_change and make sure it takes a dictionary argument yearly_populations. Don’t forget your docstring!
  • Step 2: Create a new empty dictionary that will contain the population changes.
  • Step 3: Write a loop that goes through the key-value pairs in yearly_populations. For each state, set its value in the new dictionary to pop_2024 minus pop_2020 for that state.
  • Step 4: Return your dictionary. Test your function and ensure that it works as expected before moving on!
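
A minimal sketch of these steps might look like the following. The variable names changes, state, and pops are our own choices, and the input dictionary is the small example from above.

```python
def compute_population_change(yearly_populations):
    """Return a dictionary mapping each state to pop_2024 minus pop_2020."""
    changes = {}
    # .items() yields (state name, inner dictionary) pairs.
    for state, pops in yearly_populations.items():
        changes[state] = pops["pop_2024"] - pops["pop_2020"]
    return changes


example = {
    "North Dakota": {"pop_2020": 779563, "pop_2024": 796568},
    "Vermont": {"pop_2020": 642977, "pop_2024": 648493},
}
changes = compute_population_change(example)
print(changes)  # {'North Dakota': 17005, 'Vermont': 5516}
```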

Part 3: Outputting a CSV file

We talked about the CSV (comma separated values) file format in class; remember that a CSV is a tabular data format. A CSV file has a header row specifying columns, and records are separated by newlines, with fields separated by commas.
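
To make the format concrete, here is a small sketch that parses a CSV string with csv.reader. The string csv_text is our own stand-in for a file on disk (io.StringIO lets us treat it like an open file). Notice that every field comes back as a string, including the numbers.

```python
import csv
import io

# A header row followed by two records, one per line.
csv_text = "state,pop_change\nNorth Dakota,17005\nVermont,5516\n"

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)   # first row: the column names
records = list(reader)  # remaining rows: one list of fields per record

print(header)   # ['state', 'pop_change']
print(records)  # [['North Dakota', '17005'], ['Vermont', '5516']]
```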

For this part of the lab, you are going to take the dictionary that you produced in the last part as input, and output a CSV file that looks like this:

state,pop_change
North Dakota,17005
Vermont,5516

The CSV module

You could probably do this by just writing to the file directly, but we’d like you to practice using the csv module. You must use csv.writer as described below to complete this part of the lab (rather than csv.DictWriter).

Steps

  • Step 1: Import the csv module at the top of your script.
  • Step 2: Define the function write_population_csv and make sure it takes a dictionary argument population_changes and a string argument csv_filename. Don’t forget your docstring!
  • Step 3: Remember that csv.writer allows us to write data to a CSV file line by line by specifying records as lists, e.g., ["Vermont", 5516]. Unfortunately, that’s not the format our data is in - we have a dictionary.3 So, we’ll start by massaging our data into a format that’s more like what’s expected by csv.writer.
    • First, create a list that will contain our CSV data. Add the header row (["state", "pop_change"]) to the list.
    • Then, add each of the key-value pairs to your list as lists of the format [STATE, POPULATION]
    • Finally, print out your list - confirm that it has the expected contents before continuing. For our small example, it should look like this:
    [['state', 'pop_change'], ['North Dakota', 17005], ['Vermont', 5516]]
  • Step 4: Open the file specified by the argument csv_filename in write mode, and create a csv.writer to write to that file.
  • Step 5: Iterate through each row in the list that you created in Step 3, and write it using writer.writerow.
  • Step 6: Test your function by calling it in the shell. Your function shouldn’t return anything, but you should find an output file in the same directory as lab08.py with the correct data.
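
One way these steps could fit together is sketched below; treat it as an illustration rather than the expected solution. The name rows and the temporary output path are our own, and passing newline="" to open is a recommendation from the csv module documentation (it avoids extra blank lines on some platforms), not something these steps require.

```python
import csv
import os
import tempfile


def write_population_csv(population_changes, csv_filename):
    """Write state population changes to a CSV file with a header row."""
    # Step 3: build a list of rows, starting with the header.
    rows = [["state", "pop_change"]]
    for state, change in population_changes.items():
        rows.append([state, change])

    # Steps 4 and 5: open the file and write one row at a time.
    with open(csv_filename, "w", newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(row)


changes = {"North Dakota": 17005, "Vermont": 5516}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pop_change.csv")
    write_population_csv(changes, path)
    # Read the file back to check what was written.
    with open(path, newline="") as f:
        csv_lines = f.read().splitlines()

print(csv_lines)
```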

Part 4: Outputting a JSON file

The other file format we talked about in class was JSON (JavaScript Object Notation); remember that the JSON format looks much more like Python dictionaries or lists. We’re going to write a JSON file, and because of the similarities to Python data structures, we’ll have to do less work than we did to create our CSV file. In the end, your output should be a JSON file that looks like this (newlines provided only for readability):4

{
    "North Dakota": 17005,
    "Vermont": 5516
}
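
The note about newlines can be seen with json.dumps, which returns the JSON text as a string instead of writing it to a file. In this short sketch (the variable names are our own), the default call produces everything on one line, while the optional indent argument produces the more readable layout shown above. Both are the same JSON data.

```python
import json

changes = {"North Dakota": 17005, "Vermont": 5516}

compact = json.dumps(changes)           # default: a single line
pretty = json.dumps(changes, indent=4)  # optional: indented for readability

print(compact)
print(pretty)
```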

Steps

  • Step 1: Import the json module at the top of your script.
  • Step 2: Define the function write_population_json and make sure it takes a dictionary argument population_changes and a string argument json_filename. Don’t forget your docstring!
  • Step 3: Open the file specified by the argument json_filename in write mode, and use json.dump to write population_changes to that file.
  • Step 4: That’s it! Test your function by calling it in the shell. Your function shouldn’t return anything, but you should find an output file in the same directory as lab08.py with the correct data.
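
As a rough sketch of the steps above (one possible version, with a temporary output path of our own so the snippet runs anywhere), the function can be quite short. We read the file back with json.load only to confirm that the data survived the round trip.

```python
import json
import os
import tempfile


def write_population_json(population_changes, json_filename):
    """Write state population changes to a JSON file."""
    with open(json_filename, "w") as f:
        # json.dump writes the dictionary directly to the open file.
        json.dump(population_changes, f)


changes = {"North Dakota": 17005, "Vermont": 5516}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pop_change.json")
    write_population_json(changes, path)
    with open(path) as f:
        loaded = json.load(f)

print(loaded)  # {'North Dakota': 17005, 'Vermont': 5516}
```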

Turning in your work

Once you are done, submit lab08.py to Gradescope and ensure that you have passed all of the tests. Your file should contain four functions: read_yaml, compute_population_change, write_population_csv, and write_population_json. Each function should have a docstring.

Footnotes

  1. You’d need to install it, just like midd_media.

  2. In the simplified YAML subset that we are working with.

  3. And it differs from the format expected by csv.DictWriter too.

  4. Yes, this does look exactly like the dictionary from the previous part.