Introduction to Apache Commons Text

1. Overview

Simply put, the Apache Commons Text library contains a number of useful utility methods for working with Strings, beyond what core Java offers.

In this quick introduction, we’ll see what Apache Commons Text is, and what it is used for, as well as some practical examples of using the library.

2. Maven Dependency

Let’s start by adding the following Maven dependency to our pom.xml:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-text</artifactId>
    <version>1.1</version>
</dependency>

You can find the latest version of the library at the Maven Central Repository.

3. Package Overview

The root package org.apache.commons.text is divided into different sub-packages:

  • org.apache.commons.text.diff – diffs between Strings
  • org.apache.commons.text.similarity – similarities and distances between Strings
  • org.apache.commons.text.translate – translating text

Let’s look at what each package can be used for in more detail.

3. Handling Text

The org.apache.commons.text package contains multiple tools for working with Strings.

For instance, WordUtils has APIs capable of capitalizing the first letter of each word in a String, swapping the case of a String, and checking if a String contains all words in a given array.

Let’s see how we can capitalize the first letter of each word in a String:

@Test
public void whenCapitalized_thenCorrect() {
    String toBeCapitalized = "to be capitalized!";
    String result = WordUtils.capitalize(toBeCapitalized);
    
    assertEquals("To Be Capitalized!", result);
}

Here is how we can check if a string contains all words in an array:

@Test
public void whenContainsWords_thenCorrect() {
    boolean containsWords = WordUtils
      .containsAllWords("String to search", "to", "search");
    
    assertTrue(containsWords);
}

StrSubstitutor provides a convenient way to build Strings from templates (note that from Commons Text 1.3 onward this class is deprecated in favor of StringSubstitutor):

@Test
public void whenSubstituted_thenCorrect() {
    Map<String, String> substitutes = new HashMap<>();
    substitutes.put("name", "John");
    substitutes.put("college", "University of Stanford");
    String templateString = "My name is ${name} and I am a student at the ${college}.";
    StrSubstitutor sub = new StrSubstitutor(substitutes);
    String result = sub.replace(templateString);
    
    assertEquals("My name is John and I am a student at the University of Stanford.", result);
}
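To make the substitution step itself concrete, here is a minimal plain-JDK sketch of the same idea. It is not how StrSubstitutor is implemented; the class TemplateSketch and its substitute() method are hypothetical names used only for illustration, and the sketch assumes the simple ${key} syntax with no nesting, escaping, or default values:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Commons Text.
final class TemplateSketch {

    // Matches ${key}, capturing the key between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    static String substitute(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (matcher.find()) {
            // Copy the literal text that precedes the placeholder.
            out.append(template, last, matcher.start());
            String value = values.get(matcher.group(1));
            // Assumption of this sketch: unknown keys are left untouched.
            out.append(value != null ? value : matcher.group());
            last = matcher.end();
        }
        // Copy whatever follows the final placeholder.
        out.append(template, last, template.length());
        return out.toString();
    }
}
```

Calling TemplateSketch.substitute("My name is ${name}.", values) with a map that contains name=John yields "My name is John.".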

StrBuilder is an alternative to java.lang.StringBuilder that provides some features StringBuilder lacks (from Commons Text 1.3 onward it is deprecated in favor of TextStringBuilder).

For example, we can replace all occurrences of a String within the builder, or clear the builder's contents without creating a new object.

Here’s a quick example to replace part of a String:

@Test
public void whenReplaced_thenCorrect() {
    StrBuilder strBuilder = new StrBuilder("example StrBuilder!");
    strBuilder.replaceAll("example", "new");
   
    assertEquals(new StrBuilder("new StrBuilder!"), strBuilder);
}

To clear the contents, we simply call the clear() method on the builder:

strBuilder.clear();

4. Calculating the Diff Between Strings

The package org.apache.commons.text.diff implements Myers' algorithm for calculating diffs between two Strings.

The diff between two Strings is defined as a sequence of modifications that converts one String into the other.

Three types of commands can appear in that sequence: InsertCommand, KeepCommand, and DeleteCommand.

An EditScript object holds the script that should be run in order to convert one String into the other. Let’s calculate the number of single-character modifications, that is, the insertions and deletions (keeps are not counted), needed for the conversion:

@Test
public void whenEditScript_thenCorrect() {
    StringsComparator cmp = new StringsComparator("ABCFGH", "BCDEFG");
    EditScript<Character> script = cmp.getScript();
    int mod = script.getModifications();
    
    assertEquals(4, mod);
}
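To see where the 4 comes from, it helps to write the script out. The following plain-JDK sketch builds one possible keep/insert/delete sequence from a longest-common-subsequence table. It is only an illustration: it uses simple dynamic programming rather than Myers' algorithm, the class EditScriptSketch and its "-", "=", "+" notation are hypothetical, and the order of commands the library produces may differ:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Commons Text.
final class EditScriptSketch {

    // Returns commands such as "-A" (delete), "=B" (keep), "+D" (insert).
    static List<String> script(String from, String to) {
        int n = from.length();
        int m = to.length();
        // lcs[i][j] = LCS length of from.substring(i) and to.substring(j)
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = from.charAt(i) == to.charAt(j)
                  ? lcs[i + 1][j + 1] + 1
                  : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> commands = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (from.charAt(i) == to.charAt(j)) {
                commands.add("=" + from.charAt(i));   // keep
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                commands.add("-" + from.charAt(i));   // delete
                i++;
            } else {
                commands.add("+" + to.charAt(j));     // insert
                j++;
            }
        }
        while (i < n) {
            commands.add("-" + from.charAt(i++));
        }
        while (j < m) {
            commands.add("+" + to.charAt(j++));
        }
        return commands;
    }

    // Counts every command that is not a keep.
    static long modifications(List<String> commands) {
        return commands.stream().filter(c -> !c.startsWith("=")).count();
    }
}
```

For "ABCFGH" and "BCDEFG" this sketch yields -A, =B, =C, +D, +E, =F, =G, -H: two deletions and two insertions, which matches the four modifications reported above.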

5. Similarities and Distances Between Strings

The org.apache.commons.text.similarity package contains algorithms useful for finding similarities and distances between Strings.

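For example, LongestCommonSubsequence returns the length of the longest common subsequence of two Strings, that is, the largest number of characters that appear in both Strings in the same order (not necessarily contiguously):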

@Test
public void whenCompare_thenCorrect() {
    LongestCommonSubsequence lcs = new LongestCommonSubsequence();
    int countLcs = lcs.apply("New York", "New Hampshire");
    
    assertEquals(5, countLcs);
}

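Similarly, LongestCommonSubsequenceDistance measures how far apart two Strings are: it returns the sum of their lengths minus twice the length of their longest common subsequence. For "New York" (8 characters) and "New Hampshire" (13 characters), that is 8 + 13 - 2 * 5 = 11: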

@Test
public void whenCalculateDistance_thenCorrect() {
    LongestCommonSubsequenceDistance lcsd = new LongestCommonSubsequenceDistance();
    int countLcsd = lcsd.apply("New York", "New Hampshire");
    
    assertEquals(11, countLcsd);
}
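The two values are closely related, as a short plain-JDK sketch shows. This is an illustrative dynamic-programming version under our own assumptions, not the library's code; LcsSketch, lcsLength(), and lcsDistance() are hypothetical names:

```java
// Hypothetical helper, not part of Commons Text.
final class LcsSketch {

    // Length of the longest common subsequence, using two rolling rows.
    static int lcsLength(CharSequence left, CharSequence right) {
        int[] prev = new int[right.length() + 1];
        int[] curr = new int[right.length() + 1];
        for (int i = 1; i <= left.length(); i++) {
            for (int j = 1; j <= right.length(); j++) {
                curr[j] = left.charAt(i - 1) == right.charAt(j - 1)
                  ? prev[j - 1] + 1
                  : Math.max(prev[j], curr[j - 1]);
            }
            // Swap rows; index 0 is never written, so it stays 0.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[right.length()];
    }

    // Characters of both Strings that are not part of the common subsequence.
    static int lcsDistance(CharSequence left, CharSequence right) {
        return left.length() + right.length() - 2 * lcsLength(left, right);
    }
}
```

For "New York" and "New Hampshire", the common subsequence is "New " followed by "r", so lcsLength() returns 5 and lcsDistance() returns 11, the same values as in the two tests above.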

6. Text Translation

The org.apache.commons.text.translate package was initially created to allow us to customize the rules provided by StringEscapeUtils.

The package has a set of classes responsible for translating text into escaped forms, such as Unicode escapes and numeric character references. We can also create our own customized translation routines.

Let’s see how we can convert a String to its equivalent Unicode escape sequences:

@Test
public void whenTranslate_thenCorrect() {
    UnicodeEscaper ue = UnicodeEscaper.above(0);
    String result = ue.translate("ABCD");
    
    assertEquals("\\u0041\\u0042\\u0043\\u0044", result);
}

Here, the argument to above() is a code point threshold: the escaper translates only characters whose code point is greater than that value. Passing 0 therefore escapes every character in "ABCD".
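The effect of the threshold can be sketched in plain Java. This is only an approximation for illustration: UnicodeEscapeSketch and escapeAbove() are hypothetical names, and the sketch works on individual char values, so it makes no attempt to treat supplementary characters as single code points:

```java
// Hypothetical helper, not part of Commons Text.
final class UnicodeEscapeSketch {

    // Escapes every char whose value is greater than the threshold as \\uXXXX.
    static String escapeAbove(String input, int threshold) {
        StringBuilder out = new StringBuilder();
        for (int k = 0; k < input.length(); k++) {
            char c = input.charAt(k);
            if (c > threshold) {
                // Four uppercase hex digits, e.g. 'A' becomes \u0041.
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With a threshold of 0, every letter of "ABCD" is escaped. With a threshold of 127, plain ASCII text passes through unchanged and only characters beyond it are escaped.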

LookupTranslator enables us to define our own lookup table, where each key (a character sequence) maps to a replacement value, and then translate any text by replacing every occurrence of a key with its value.
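The idea behind a lookup-table translation can be sketched without the library. The following is a rough illustration under our own assumptions, not LookupTranslator's implementation; LookupSketch and its translate() method are hypothetical, and the sketch simply prefers the longest key that matches at each position:

```java
import java.util.Map;

// Hypothetical helper, not part of Commons Text.
final class LookupSketch {

    static String translate(String input, Map<String, String> table) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            // Find the longest key that matches at position i.
            String bestKey = null;
            for (String key : table.keySet()) {
                if (!key.isEmpty() && input.startsWith(key, i)
                  && (bestKey == null || key.length() > bestKey.length())) {
                    bestKey = key;
                }
            }
            if (bestKey != null) {
                out.append(table.get(bestKey));
                i += bestKey.length();
            } else {
                // No key matches here, so copy the character unchanged.
                out.append(input.charAt(i));
                i++;
            }
        }
        return out.toString();
    }
}
```

With a table that maps "<" to "&lt;", ">" to "&gt;", and "&" to "&amp;", the input "a < b && c > d" becomes "a &lt; b &amp;&amp; c &gt; d".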

7. Conclusion

In this quick tutorial, we’ve seen an overview of what Apache Commons Text is all about and some of its common features.

The code samples can be found over on GitHub.
