Introduction to Spliterator in Java

1. Overview

The Spliterator interface, introduced in Java 8, can be used for traversing and partitioning sequences. It’s a base utility for Streams, especially parallel ones.

In this article, we’ll cover its usage, characteristics, methods and how to create our own custom implementations.

2. Spliterator API

2.1. tryAdvance

This is the main method used for stepping through a sequence. The method takes a Consumer that’s used to consume elements of the Spliterator one by one sequentially and returns false if there’re no elements to be traversed.

Here, we’ll take a look at how to use it to traverse and partition elements.

First, let’s assume that we’ve got an ArrayList with 35000 articles and that Article class is defined as:

public class Article {
    private List<Author> listOfAuthors;
    private int id;
    private String name;
    
    // standard constructors/getters/setters
}

Now, let’s implement a task that processes the list of articles and adds a suffix of “– published by Baeldung” to each article name:

public String call() {
    int current = 0;
    while (spliterator.tryAdvance(a -> a.setName(article.getName()
      .concat("- published by Baeldung")))) {
        current++;
    }
    
    return Thread.currentThread().getName() + ":" + current;
}

Notice that this task outputs the number of articles processed when it finishes the execution.

Another key point is that we used tryAdvance() method to process the next element.

2.2. trySplit

Next, let’s split Spliterators (hence the name) and process partitions independently.

The trySplit method tries to split it into two parts. Then the caller process elements, and finally, the returned instance will process the others, allowing the two to be processed in parallel.

Let’s generate our list first:

public static List<Article> generateElements() {
    return Stream.generate(() -> new Article("Java"))
      .limit(35000)
      .collect(Collectors.toList());
}

Next, we obtain our Spliterator instance using the spliterator() method. Then we apply our trySplit() method:

@Test
public void givenSpliterator_whenAppliedToAListOfArticle_thenSplittedInHalf() {
    Spliterator<Article> split1 = Executor.generateElements().spliterator(); 
    Spliterator<Article> split2 = split1.trySplit(); 
    
    assertThat(new Task(split1).call()) 
      .containsSequence(Executor.generateElements().size() / 2 + ""); 
    assertThat(new Task(split2).call()) 
      .containsSequence(Executor.generateElements().size() / 2 + ""); 
}

The splitting process worked as intended and divided the records equally.

2.3. estimatedSize

The estimatedSize method gives us an estimated number of elements:

LOG.info("Size: " + split1.estimateSize());

This will output:

Size: 17500

2.4. hasCharacteristics

This API checks if the given characteristics match the properties of the Spliterator. Then if we invoke the method above, the output will be an int representation of those characteristics:

LOG.info("Characteristics: " + split1.characteristics());
Characteristics: 16464

3. Spliterator Characteristics

It has eight different characteristics that describe its behavior. Those can be used as hints for external tools:

  • SIZED  if it’s capable of returning an exact number of elements with the estimateSize() method
  • SORTED – if it’s iterating through a sorted source
  • SUBSIZED – if we split the instance using a trySplit() method and obtain Spliterators that are SIZED as well
  • CONCURRENT – if source can be safely modified concurrently
  • DISTINCT – if for each pair of encountered elements x, y, !x.equals(y)
  • IMMUTABLE – if elements held by source can’t be structurally modified
  • NONNULL – if source holds nulls or not
  • ORDERED – if iterating over an ordered sequence

4. A Custom Spliterator

4.1. When to Customize

First, let’s assume the following scenario:

We’ve got an article class with a list of authors, and the article that can have more than one author. Furthermore, we consider an author related to the article if his related article’s id matches article id.

Our Author class will look like the this:

public class Author {
    private String name;
    private int relatedArticleId;

    // standard getters, setters & constructors
}

Next, we’ll implement a class to count authors while traversing a stream of authors. Then the class will perform a reduction on the stream.

Let’s have a look at the class implementation:

public class RelatedAuthorCounter {
    private int counter;
    private boolean isRelated;
 
    // standard constructors/getters
 
    public RelatedAuthorCounter accumulate(Author author) {
        if (author.getRelatedArticleId() == 0) {
            return isRelated ? this : new RelatedAuthorCounter( counter, true);
        } else {
            return isRelated ? new RelatedAuthorCounter(counter + 1, false) : this;
        }
    }

    public RelatedAuthorCounter combine(RelatedAuthorCounter RelatedAuthorCounter) {
        return new RelatedAuthorCounter(
          counter + RelatedAuthorCounter.counter, 
          RelatedAuthorCounter.isRelated);
    }
}

Each method in the above class performs a specific operation to count while traversing.

First, the accumulate() method traverse the authors one by one in an iterative way, then combine() sums two counters using their values. Finally, the getCounter() returns the counter.

Now, to test what we’ve done so far. Let’s convert our article’s list of authors to a stream of authors:

Stream<Author> stream = article.getListOfAuthors().stream();

And implement a countAuthor() method to perform the reduction on the stream using RelatedAuthorCounter:

private int countAutors(Stream<Author> stream) {
    RelatedAuthorCounter wordCounter = stream.reduce(
      new RelatedAuthorCounter(0, true), 
      RelatedAuthorCounter::accumulate, 
      RelatedAuthorCounter::combine);
    return wordCounter.getCounter();
}

If we used a sequential stream the output will be as expected “count = 9”, however, the problem arises when we try to parallelize the operation.

Let’s take a look at the following test case:

@Test
void 
  givenAStreamOfAuthors_whenProcessedInParallel_countProducesWrongOutput() {
    assertThat(Executor.countAutors(stream.parallel())).isGreaterThan(9);
}

Apparently, something has gone wrong – splitting the stream at a random position caused an author to be counted twice.

4.2. How to Customize

To solve this, we need to implement a Spliterator that splits authors only when related id and articleId matches. Here’s the implementation of our custom Spliterator:

public class RelatedAuthorSpliterator implements Spliterator<Author> {
    private final List<Author> list;
    AtomicInteger current = new AtomicInteger();
    // standard constructor/getters

    @Override
    public boolean tryAdvance(Consumer<? super Author> action) {
        action.accept(list.get(current.getAndIncrement()));
        return current.get() < list.size();
    }

    @Override
    public Spliterator<Author> trySplit() {
        int currentSize = list.size() - current.get();
        if (currentSize < 10) {
            return null;
        }
        for (int splitPos = currentSize / 2 + current.intValue();
          splitPos < list.size(); splitPos++) {
            if (list.get(splitPos).getRelatedArticleId() == 0) {
                Spliterator<Author> spliterator
                  = new RelatedAuthorSpliterator(
                  list.subList(current.get(), splitPos));
                current.set(splitPos);
                return spliterator;
            }
        }
        return null;
   }

   @Override
   public long estimateSize() {
       return list.size() - current.get();
   }
 
   @Override
   public int characteristics() {
       return CONCURRENT;
   }
}

Now applying countAuthors() method will give the correct output. The following code demonstrates that:

@Test
public void
  givenAStreamOfAuthors_whenProcessedInParallel_countProducesRightOutput() {
    Stream<Author> stream2 = StreamSupport.stream(spliterator, true);
 
    assertThat(Executor.countAutors(stream2.parallel())).isEqualTo(9);
}

Also, the custom Spliterator is created from a list of authors and traverses through it by holding the current position.

Let’s discuss in more details the implementation of each method:

  • tryAdvance – passes authors to the Consumer at the current index position and increments its position
  • trySplit – defines the splitting mechanism, in our case, the RelatedAuthorSpliterator is created when ids matched, and the splitting divides the list into two parts
  • estimatedSize – is the difference between the list size and the position of currently iterated author
  • characteristics – returns the Spliterator characteristics, in our case SIZED as the value returned by the estimatedSize() method is exact; moreover, CONCURRENT indicates that the source of this Spliterator may be safely modified by other threads

5. Support for Primitive Values

The Spliterator API supports primitive values including doubleint and long.

The only difference between using a generic and a primitive dedicated Spliterator is the given Consumer and the type of the Spliterator.

For example, when we need it for an int value we need to pass an intConsumer. Furthermore, here’s a list of primitive dedicated Spliterators:

  • OfPrimitive<T, T_CONS, T_SPLITR extends Spliterator.OfPrimitive<T, T_CONS, T_SPLITR>>: parent interface for other primitives
  • OfInt: A Spliterator specialized for int
  • OfDouble: A Spliterator dedicated for double
  • OfLong: A Spliterator dedicated for long

6. Conclusion

In this article, we covered Java 8 Spliterator usage, methods, characteristics, splitting process, primitive support and how to customize it.

As always, the full implementation of this article can be found over on Github.

Related posts:

Java Program to Perform Deletion in a BST
Spring Boot - Cloud Configuration Server
Java Program to Use Boruvka’s Algorithm to Find the Minimum Spanning Tree
The Basics of Java Security
Spring Security 5 for Reactive Applications
Java Program to Perform Postorder Non-Recursive Traversal of a Given Binary Tree
Java Program to Implement Merge Sort Algorithm on Linked List
Adding Shutdown Hooks for JVM Applications
The Difference Between Collection.stream().forEach() and Collection.forEach()
Set Interface trong Java
Hướng dẫn Java Design Pattern – Interpreter
Binary Numbers in Java
Converting between an Array and a List in Java
Từ khóa throw và throws trong Java
Validate email address exists or not by Java Code
Java Program to Check Whether a Directed Graph Contains a Eulerian Cycle
Redirect to Different Pages after Login with Spring Security
Java Program to Implement Segment Tree
Using Spring ResponseEntity to Manipulate the HTTP Response
Configure a RestTemplate with RestTemplateBuilder
Java Program to Implement ConcurrentLinkedQueue API
ArrayList trong java
Simplify the DAO with Spring and Java Generics
Creating Docker Images with Spring Boot
Rest Web service: Filter và Interceptor với Jersey 2.x (P2)
Java Program to Perform Searching in a 2-Dimension K-D Tree
HTTP Authentification and CGI/Servlet
Java Program to Implement AttributeList API
Java Program to Implement Bubble Sort
Java Convenience Factory Methods for Collections
Filtering and Transforming Collections in Guava
Java – Write a Reader to File