Hash Set as Hanging Folders

Programming Language

imperative / Java

Form

Analogy

Attribution — Origin / Source

Collected by Colleen Lewis — Own practice

Conceptual Advantage

Explains need for hashCode() and equals() methods when using HashSet

Mapping

PLNM
HashSet Box of hanging folders
bucket Single file folder
key Piece of paper in a file folder
value N/A (HashSet, not HashMap)
add(key) method Find correct folder, check if the folder already contains the key by comparing the key to all contents of the folder

Draws Attention To

Need for .hashCode() and .equals(Object o) methods

Use When

Teaching students to use the built-in HashSet Java library class

Cost

Used in class for ~10 minutes, but requires having a file folder bin

Details

Instructional Context: I teach a post-secondary course in which students learn about data structures and learn to program in Java and Racket. My lectures are twice a week for 75 minutes. My enrollment ranges from 50 to 150 students in each lecture section. I typically lecture for 5 to 10 minutes and then have students work in pairs to try to apply the newly introduce idea.

Use Why: I find that students think Hash Tables are baffling. It isn't that they have trouble using them, but they report not understanding them. I don't have a perfect explaination for what they think they don't understand, but since I've started using this technique, I've stopped getting the complaints that "Hash Tables make no sense!" Maybe the problem was just that I wasn't explaining them very well before!

Learning Goal: This explanation is focused on helping students to understand why we need to provide a hashCode() and equals(Object o) when they work with the library class HashSet in Java. In getting at why we need these it also introduces how a HashSet in Java works (assuming chaining), but I don't test students on how a HashSet works.

Why HashSet and not HashMap?: I just start with HashSet because it is simpler. I find that students don't have trouble translating what they learned to understand HashMap in Java.

Instructional Sequence Overview: I trace through what would happen with the following code that attempts to add 7 Strings to a HashSet. First, the code will add 3 Strings ("Bear" "Zebra" "Tiger") that each go in different buckets (i.e., folders). Second, we'll try to add something that's already in the HashSet. Finally, then we'll add three Strings ("Orange" "Dog" "Zoo") that all go in the same bucket (i.e., folder). This allows students to see a collision where the item is .equals and two where they are not.

import java.util.HashSet;
public class FileFolderHashSet {
  public static void main(String[] args) {
    HashSet folders = new HashSet();
    folders.add("Bear");
    folders.add("Zebra");
    folders.add("Tiger");
    folders.add("Bear");
    folders.add("Orange");
    folders.add("Dog");
    folders.add("Zoo");
  }
}

Tracing one call to the add method: This involves going through the following numbered steps. The first time I trace add I say all of the bullets under each step. Tracing through the add method after the first time, I still do each step, but I don't say all of the bullets.

  1. Read a line of code from the class FileFolderHashSet above (e.g., folders.add("Bear");).
    • I'm a HashSet named folder and someone wants to add the String "Bear".
  2. Write the String on a piece of paper.
    • Here's the String that I need to store, but I need a reliable way to remember where I put it.
    • First, I need to know what hashCode() returns for the String "Bear".
  3. Modify the code in the class HashCodeTest below (e.g., String s = "Bear";).
    • The add method in the HashSet class always calls hashCode() on the object it was passed.
    • We'll use the HashCodeTest class to figure out what hashCode() would return for the String "Bear".
  4. Run the code in the class HashCodeTest to get a value for the call to hashCode().
    • The program printed: "hashCode of "Bear" is 2066388"
    • Remember, I want to store the String "Bear", but I need to remember where I put it.
    • I have 10 folders, so I'll use the last digit of the hashCode to decide what folder to put it in.
    • I could have used a different scheme as long as I was consistent!
  5. Pull out the relevant folder (e.g., folder 8 for "Bear").
    • I want to store the String "Bear" here, but I need to make sure I haven't already stored the String "Bear".
  6. If the folder is empty, add the piece of paper with the String written on it.
    • Cool - I don't have anything in this folder. I can just add it.
    • What's most important is that I have a repeatable process so I coudl find it again.
  7. If the folder is not empty, pull out all of the papers from the relevant folder.
    • Okay - I need look at each paper and call .equals with it to see if I have already added it. If it’s no already there, I add the new paper.

public class HashCodeTest {
	public static void main(String[] args) {
		String s = "Bear";
		System.out.println("hashCode of \"" + s + "\" is " + s.hashCode());
	}
}

HashCode values Here are the values returned for each String as printed by the class HashCodeTest:

Comments or Feedback?

Do you have feedback on this notional machine? Did you find a mistake, or do you have a request for improvement? You can create an Issue on GitHub, where the description is hosted. This way we can see your feedback and address it.

For this, you need a GitHub account. Then follow this link to see the source file of this page. In there, click the ... left of the highlighted line, then pick "Reference in a new issue".

Create an Issue on GitHub