Homework 8

due: Thursday April 24 at 11:59pm (submit on Gradescope here).

Goals:

In this assignment you'll start by using Java's internal TreeMap to solve a problem (Part 1), and then replace the use of the TreeMap with your own binary search tree called DIYTreeMap (Part 2). Finally, you'll balance your own DIYTreeMap so that it is an AVL Tree (Part 3).

Note that solving Part 1 is worth 10 points and Part 2 is worth 8 points. Part 3 is challenging and is only worth 2 points. Be sure to test your code after each part.

The starter code can be downloaded here.

Part 1: Practice using TreeMap [10 points]

Your main task is to find the words in some text that occur more than twice. Plural words are considered distinct from their singular versions. For example, shipping ship shipping shipping ships has a single word that occurs 3 times, and two other words that each appear once. Complete the getRepeatedWords (static) method to return an ArrayList<String> with the words that appear more than twice (i.e. at least three times). In this first example, the result would be [shipping].

Assume the input text is all in lower-case.

You can split text into individual words by saving the result of text.split(" ") to an array of Strings called words.

Start by using Java's TreeMap<String, Integer> to find the frequency of each word. Then find all words that occur more than twice, returned as an ArrayList<String> of words. These words don't need to be in any particular order. Hint: recall in Lecture 3W (slide 14) we used a HashMap to construct a frequency map. You can do something very similar here.

You can use the importText method to load the pale-blue-dot.txt file to test your algorithm. This file contains some text from Carl Sagan's book called "Pale Blue Dot". You can also try your method on other files! Maybe song lyrics? (though this might require generalizing how punctuation is processed).

Part 2: Replace any use of TreeMap with DIYTreeMap [8 points]

Replace all use of Java's TreeMap with your own binary search tree (BST) called DIYTreeMap (not necessarily balanced). You'll need to implement any BST methods your algorithm from Part 1 needs. At the very least, you should provide the following methods (so your DIYTreeMap can be tested on Gradescope):

One of the test cases uses the following tree (the nodes are printed as (key:value)):

└──(10:0)
│    └──(5:1)
│    │    └──(2:3)
│    │    │    └──(3:7)
│    │    └──(9:4)
│    └──(15:2)
│    │    └──(12:5)
│    │    └──(20:6)
│    │    │    └──(17:8)
│    │    │    │    └──(18:10)
│    │    │    └──(30:9)
│    │    │    │    └──(25:11)

In this example, the value corresponds to the order in which a key-value pair is added. This tree is not balanced (by the balancing requirements of an AVL tree), but the tree structure doesn't matter for the tests (it will only test if a key-value pair exists in the tree).

Part 3: Balance your DIYTreeMap [2 points]

Balance your tree map as necessary whenever an insertion is made (via put) to maintain an AVL tree. Review the notes from Lecture 9W for how this should be done. It would also be a good idea to review the worksheet from Lab 7, and also try the optional RotateRight puzzle. This might help with implementing the rotateRight and rotateLeft methods.

Use the steps from slide 14 in the 9W notes to determine which type of rotation should be performed (if any) after inserting a key-value pair into the map (via put).

One of the tests will add the nodes of the tree described in Part 2 in the order of each node value. Note that this should produce the same tree structure obtained from the insertions in Lab 7 (before removing the node with key 9).

Submission

Upload DIYTreeMap.java to Gradescope. Pre-submission check-list: