A Guide to HashSet in Java

1. Overview

In this article, we’ll dive into HashSet. It’s one of the most popular Set implementations as well as an integral part of the Java Collections Framework.

2. Intro to HashSet

HashSet is one of the fundamental data structures in the Java Collections API.

Let’s recall the most important aspects of this implementation:

  • It stores unique elements and permits nulls
  • It’s backed by a HashMap
  • It doesn’t maintain insertion order
  • It’s not thread-safe

Note that this internal HashMap gets initialized when an instance of the HashSet is created:

public HashSet() {
    map = new HashMap<>();
}

If you want to go deeper into how HashMap works, the article dedicated to it is a good next step.
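The properties listed at the start of this section are easy to check. The following is a minimal sketch; the class name HashSetBasics and its method are made up for illustration:

```java
import java.util.HashSet;
import java.util.Set;

public class HashSetBasics {

    // Adds a duplicate string and two nulls, then reports the resulting size.
    static int sizeAfterDuplicateAndNull() {
        Set<String> set = new HashSet<>();
        set.add("a");
        set.add("a");   // duplicate, ignored
        set.add(null);  // a single null element is permitted
        set.add(null);  // the second null is a duplicate as well
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterDuplicateAndNull()); // prints 2
    }
}
```

Only one "a" and one null survive, so the set ends up with two elements.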

3. The API

In this section, we’re going to review the most commonly used methods and have a look at some simple examples.

3.1. add()

The add() method can be used for adding elements to a set. The method contract states that an element will be added only when it isn’t already present in the set. The method returns true if the element was added and false otherwise.

We can add an element to a HashSet like this:

@Test
public void whenAddingElement_shouldAddElement() {
    Set<String> hashset = new HashSet<>();
 
    assertTrue(hashset.add("String Added"));
}

From an implementation perspective, the add method is an extremely important one. Implementation details illustrate how the HashSet works internally and leverages the HashMap’s put method:

public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}

The map variable is a reference to the internal, backing HashMap, and PRESENT is a shared dummy object that is stored as the value for every key:

private transient HashMap<E, Object> map;
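To make the delegation more tangible, here is a simplified sketch of a set built on top of a HashMap. MapBackedSet is a hypothetical class written for this article, not the JDK source; it only mirrors the idea of storing elements as keys with a dummy value:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified set for illustration; not the JDK implementation.
public class MapBackedSet<E> {

    // Shared dummy value stored against every key.
    private static final Object PRESENT = new Object();

    private final Map<E, Object> map = new HashMap<>();

    // put() returns the previous value, or null if the key was new.
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    // remove() returns the stored dummy value only if the key existed.
    public boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }

    public int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MapBackedSet<String> set = new MapBackedSet<>();
        System.out.println(set.add("x")); // true, "x" was new
        System.out.println(set.add("x")); // false, "x" already present
        System.out.println(set.size());   // 1
    }
}
```

Because put() returns null only for a new key, comparing its result with null is enough to tell whether the element was actually added.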

It’d be a good idea to get familiar with the hashcode first to get a detailed understanding of how the elements are organized in hash-based data structures.

Summarizing:

  • HashMap is backed by an array of buckets with a default capacity of 16; the bucket index of each element is derived from its hashcode value
  • If various objects have the same hashcode value, they get stored in a single bucket
  • If the number of elements exceeds the capacity multiplied by the load factor, a new array twice the size of the previous one gets created and all elements get redistributed among the new buckets
  • To retrieve a value, we hash the key, reduce the hash to a bucket index, and then go to the corresponding bucket and search through its linked list in case there’s more than one object
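The "hash a key, mod it" step from the summary above can be sketched as follows. This is an illustration that assumes a power-of-two table size and follows the bit-mixing approach used by OpenJDK’s HashMap since Java 8; the class and method names are made up for this example:

```java
public class BucketIndexSketch {

    // Mixes the high bits of the hash code into the low bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Maps a key to a bucket; tableSize is assumed to be a power of two,
    // so the bit mask behaves like a modulo operation.
    static int bucketIndex(Object key, int tableSize) {
        return spread(key) & (tableSize - 1);
    }

    public static void main(String[] args) {
        // "Aa" and "BB" share the same hash code, so they land in the same bucket.
        System.out.println(bucketIndex("Aa", 16));
        System.out.println(bucketIndex("BB", 16));
    }
}
```

Two different strings with equal hash codes always resolve to the same bucket, which is why the equals() check inside the bucket is still needed.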

3.2. contains()

The purpose of the contains method is to check if an element is present in a given HashSet. It returns true if the element is found, otherwise false.

We can check for an element in the HashSet:

@Test
public void whenCheckingForElement_shouldSearchForElement() {
    Set<String> hashsetContains = new HashSet<>();
    hashsetContains.add("String Added");
 
    assertTrue(hashsetContains.contains("String Added"));
}

Whenever an object is passed to this method, the hash value gets calculated. Then, the corresponding bucket location gets resolved and traversed.

3.3. remove()

The method removes the specified element from the set if it’s present. This method returns true if a set contained the specified element.

Let’s see a working example:

@Test
public void whenRemovingElement_shouldRemoveElement() {
    Set<String> removeFromHashSet = new HashSet<>();
    removeFromHashSet.add("String Added");
 
    assertTrue(removeFromHashSet.remove("String Added"));
}
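The return value also tells us when there was nothing to remove. A small sketch (the class name RemoveReturnValue is made up for illustration):

```java
import java.util.HashSet;
import java.util.Set;

public class RemoveReturnValue {

    // Removes the same element twice and reports the second result.
    static boolean secondRemove() {
        Set<String> set = new HashSet<>();
        set.add("String Added");
        set.remove("String Added");        // true: the element was present
        return set.remove("String Added"); // false: nothing left to remove
    }

    public static void main(String[] args) {
        System.out.println(secondRemove()); // prints false
    }
}
```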

3.4. clear()

We use this method when we intend to remove all the items from a set. The underlying implementation simply clears all elements from the underlying HashMap.

Let’s see that in action:

@Test
public void whenClearingHashSet_shouldClearHashSet() {
    Set<String> clearHashSet = new HashSet<>();
    clearHashSet.add("String Added");
    clearHashSet.clear();
    
    assertTrue(clearHashSet.isEmpty());
}

3.5. size()

This method returns the number of elements present in the HashSet. The underlying implementation simply delegates the calculation to the HashMap’s size() method.

Let’s see that in action:

@Test
public void whenCheckingTheSizeOfHashSet_shouldReturnTheSize() {
    Set<String> hashSetSize = new HashSet<>();
    hashSetSize.add("String Added");
    
    assertEquals(1, hashSetSize.size());
}

3.6. isEmpty()

We can use this method to figure out if a given instance of a HashSet is empty or not. This method returns true if the set contains no elements:

@Test
public void whenCheckingForEmptyHashSet_shouldCheckForEmpty() {
    Set<String> emptyHashSet = new HashSet<>();
    
    assertTrue(emptyHashSet.isEmpty());
}

3.7. iterator()

The method returns an iterator over the elements in the Set. The elements are visited in no particular order and iterators are fail-fast.

The iteration order is unspecified, so the elements here may be printed in a different order from the one in which they were added:

@Test
public void whenIteratingHashSet_shouldIterateHashSet() {
    Set<String> hashset = new HashSet<>();
    hashset.add("First");
    hashset.add("Second");
    hashset.add("Third");
    Iterator<String> itr = hashset.iterator();
    while (itr.hasNext()) {
        System.out.println(itr.next());
    }
}

If the set is modified at any time after the iterator is created in any way except through the iterator’s own remove method, the Iterator throws a ConcurrentModificationException.

Let’s see that in action:

@Test(expected = ConcurrentModificationException.class)
public void whenModifyingHashSetWhileIterating_shouldThrowException() {
 
    Set<String> hashset = new HashSet<>();
    hashset.add("First");
    hashset.add("Second");
    hashset.add("Third");
    Iterator<String> itr = hashset.iterator();
    while (itr.hasNext()) {
        itr.next();
        hashset.remove("Second");
    }
}

Alternatively, had we used the iterator’s remove method, then we wouldn’t have encountered the exception:

@Test
public void whenRemovingElementUsingIterator_shouldRemoveElement() {
 
    Set<String> hashset = new HashSet<>();
    hashset.add("First");
    hashset.add("Second");
    hashset.add("Third");
    Iterator<String> itr = hashset.iterator();
    while (itr.hasNext()) {
        String element = itr.next();
        if (element.equals("Second"))
            itr.remove();
    }
 
    assertEquals(2, hashset.size());
}

The fail-fast behavior of an iterator cannot be guaranteed as it’s impossible to make any hard guarantees in the presence of unsynchronized concurrent modification.

Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it’d be wrong to write a program that depended on this exception for its correctness.
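When the goal is simply to drop matching elements during traversal, one alternative to handling the iterator by hand is the removeIf() method that Collection has offered since Java 8. A minimal sketch, with a class name chosen for illustration:

```java
import java.util.HashSet;
import java.util.Set;

public class RemoveIfExample {

    // Builds the three-element set used above and removes "Second" via removeIf.
    static Set<String> withoutSecond() {
        Set<String> set = new HashSet<>();
        set.add("First");
        set.add("Second");
        set.add("Third");
        // The collection performs the removal itself while it traverses,
        // so no ConcurrentModificationException is raised here.
        set.removeIf(element -> element.equals("Second"));
        return set;
    }

    public static void main(String[] args) {
        System.out.println(withoutSecond().size()); // prints 2
    }
}
```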

4. How HashSet Maintains Uniqueness

When we put an object into a HashSet, it uses the object’s hashCode value to find the bucket where an equal element would already be stored.

Each bucket location can contain several elements whose hash values map to it. But two objects with the same hashCode might not be equal.

So, objects within the same bucket will be compared using the equals() method.
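This is why a class used as a HashSet element should override both equals() and hashCode(). The sketch below contrasts two hypothetical classes, Point and RawPoint, that exist only for this illustration:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class UniquenessExample {

    // Hypothetical value class that defines equality by its fields.
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // Hypothetical class that keeps the identity-based equals/hashCode of Object.
    static final class RawPoint {
        final int x;
        final int y;

        RawPoint(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    static int sizeWithOverrides() {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        set.add(new Point(1, 2)); // equal to the first one, so it is rejected
        return set.size();
    }

    static int sizeWithoutOverrides() {
        Set<RawPoint> set = new HashSet<>();
        set.add(new RawPoint(1, 2));
        set.add(new RawPoint(1, 2)); // a distinct object as far as the set can tell
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeWithOverrides());    // prints 1
        System.out.println(sizeWithoutOverrides()); // prints 2
    }
}
```

Without the overrides, two objects holding the same data are treated as different elements, and the set silently stores both.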

5. Performance of HashSet

The performance of a HashSet is affected mainly by two parameters – its Initial Capacity and the Load Factor.

The expected time complexity of adding an element to a set is O(1), which can degrade to O(n) in the worst case (all elements ending up in a single bucket). Therefore, it’s essential to give the HashSet the right capacity.

An important note: since JDK 8, the worst-case time complexity improves to O(log n), as heavily populated buckets are converted from linked lists into balanced trees.

The load factor describes the maximum fill level above which a set will need to be resized.

We can also create a HashSet with custom values for initial capacity and load factor:

Set<String> defaultSet = new HashSet<>();
Set<String> sizedSet = new HashSet<>(20);
Set<String> tunedSet = new HashSet<>(20, 0.5f);

In the first case, the default values are used – the initial capacity of 16 and the load factor of 0.75. In the second, we override the default capacity and in the third one, we override both.
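To see what these numbers mean in practice, we can work out when a resize happens. The following sketch assumes the behavior of OpenJDK’s HashMap, which rounds the requested capacity up to the next power of two; the helper names are made up and only reproduce the arithmetic, not the JDK code:

```java
public class ResizeThresholdSketch {

    // Smallest power of two that is >= requested (for requested >= 1).
    static int tableSizeFor(int requested) {
        int size = 1;
        while (size < requested) {
            size <<= 1;
        }
        return size;
    }

    // Number of elements the table can hold; exceeding it triggers a resize.
    static int threshold(int tableSize, float loadFactor) {
        return (int) (tableSize * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));              // prints 12
        System.out.println(tableSizeFor(20));                  // prints 32
        System.out.println(threshold(tableSizeFor(20), 0.5f)); // prints 16
    }
}
```

With the defaults, adding a 13th element exceeds the threshold of 12 and the table grows from 16 to 32 buckets.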

A low initial capacity reduces memory use but increases the frequency of rehashing, which is an expensive process.

On the other hand, a high initial capacity increases the cost of iteration and the initial memory consumption.

As a rule of thumb:

  • A high initial capacity is good for a large number of entries coupled with little to no iteration
  • A low initial capacity is good for few entries with a lot of iteration

It’s, therefore, very important to strike the correct balance between the two. Usually, the default implementation is optimized and works just fine. Should we feel the need to tune these parameters to suit our requirements, we need to do so judiciously.
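If we do know roughly how many elements a set will hold, one way to avoid rehashing is to derive the initial capacity from the expected size. This is a sketch under the assumption that the default load factor of 0.75 is used; PresizedSet and its methods are hypothetical helpers, not part of the JDK:

```java
import java.util.HashSet;
import java.util.Set;

public class PresizedSet {

    // Capacity large enough that expectedSize elements fit without a resize,
    // assuming the default load factor of 0.75.
    static int capacityFor(int expectedSize) {
        return (int) Math.ceil(expectedSize / 0.75);
    }

    // Creates an empty HashSet sized for the expected number of elements.
    static <T> Set<T> newPresizedSet(int expectedSize) {
        return new HashSet<>(capacityFor(expectedSize));
    }

    public static void main(String[] args) {
        Set<Integer> numbers = newPresizedSet(100);
        for (int i = 0; i < 100; i++) {
            numbers.add(i);
        }
        System.out.println(capacityFor(100)); // prints 134
        System.out.println(numbers.size());   // prints 100
    }
}
```

Whether this is worth doing depends on the workload; for small or short-lived sets the defaults are usually good enough.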

6. Conclusion

In this article, we outlined the utility of a HashSet, its purpose as well as its underlying working. We saw how efficient it is in terms of usability given its constant time performance and ability to avoid duplicates.

We studied some of the important methods from the API and how they can help us as developers use a HashSet to its full potential.

As always, code snippets can be found over on GitHub.
