In this assignment you'll start by using Java's internal TreeMap to solve a problem (Part 1, 10 points), and then replace the use of the TreeMap with your own binary search tree called DIYTreeMap (Part 2, 8 points). Finally, you'll balance your own DIYTreeMap so that it is an AVL Tree (Part 3, 2 points).
Part 1: Practice usingTreeMap [10 points]
Your main task is to find the words in some text that occur more than twice. Plural words are considered distinct from their singular versions. For example, shipping ship shipping shipping ships has a single word that occurs 3 times, and two other words that each appear once. Complete the getRepeatedWords (static) method to return an ArrayList<String> with the words that appear more than twice (i.e. at least three times). In this first example, the result would be [shipping].
Assume the input text is all in lower-case.
You can split text into individual words by saving the result of text.split(" ") to an array of Strings called words.
Start by using Java’s TreeMap<String, Integer> to find the frequency of each word. Then find all words that occur more than twice, returned as an ArrayList<String> of words. These words don’t need to be in any particular order. Hint: recall in Lecture 3R (page 30) we used a HashMap to construct a frequency map. You can do something very similar here.
You can use the importText method to load the pale-blue-dot.txt file to test your algorithm. This file contains some text from Carl Sagan’s book called “Pale Blue Dot”. You can also try your method on other files! Maybe song lyrics? (though this might require generalizing how punctuation is processed).
Part 2: Replace any use of TreeMap with DIYTreeMap [8 points]
Replace all use of Java’s TreeMap with your own binary search tree (BST) called DIYTreeMap (not necessarily balanced). You’ll need to implement any BST methods your algorithm from Part 1 needs. At the very least, you should provide the following methods (so your DIYTreeMap can be tested on Gradescope):
public void put(K key, V value): adds a key-value pair to the tree. You can modify the signature to return the previously mapped value (if any) similar to what Oracle says the TreeMapput method should be, but void is okay for our purposes.
public V get(K key): retrieves the value at the Node with a particular key. Returns null if the key is not in the tree.
One of the test cases uses the following tree (the nodes are printed as (key:value)):
In this example, the value corresponds to the order in which a key-value pair is added. This tree is not balanced (by the balancing requirements of an AVL tree), but the tree structure doesn’t matter for the tests (it will only test if a key-value pair exists in the tree).
Part 3: Balance your DIYTreeMap [2 points]
Balance your tree map as necessary whenever an insertion is made (via put) to maintain an AVL tree. Review the notes from Lecture 9R for how this should be done. It would also be a good idea to review the worksheet from Lab 7.
Use the steps from Lecture 9R to determine which type of rotation should be performed (if any) after inserting a key-value pair into the map (via put).
One of the tests will add the nodes of the tree described in Part 2 in the order of each node value. Note that this should produce the same tree structure obtained from the insertions in Lab 7 (before removing the node with key 9).
Note that part 3 is challenging, and only worth a couple of points. Be sure to test your code after each part.
Test Cases
I strongly recommend that you write comprehensive test cases for your code in main. The test cases should cover part 1 (using a TreeMap) and part 2 (implementing your own DIYTreeMap). There are no automated tests to evaluate your test cases for this assignment, but I suggest that your tests include at least the following:
getRepeatedWords is called on a string that has at least one repeated word and one non-repeated word
a DIYTreeMap is initialized
get is called on a DIYTreeMap
put is called at least twice on a DIYTreeMap. One of the calls should be putting an key that is already present with a new value. You likely want to call put more than twice to ensure that your items are being inserted in the correct order.
Documentation, Style, and Design
Documentation and Style
Checkstyle will run automatically when you submit to gradescope; up to 10% will be deducted by the autograder from the programming portion of your grade if you have > 2 types of checkstyle problems. Remember you can see more details on checkstyle on this page, including how to run it in vscode.
Note that because starter code is not provided for this assignment, you will need to write all of your javadoc comments from scratch!
Design
I will do a manual review for design. Up to 5% may be deducted from your final score due to design problems. This will also include checking the quality (rather than just the form) of your javadoc comments!