Encode a String to UTF-8 in Java

1. Overview

When dealing with Strings in Java, sometimes we need to encode them into a specific charset.

This tutorial is a practical guide showing different ways to encode a String to the UTF-8 charset; for a more technical deep-dive see our Guide to Character Encoding.

2. Defining the Problem

To showcase the Java encoding, we’ll work with the German String “Entwickeln Sie mit Vergnügen”.

String germanString = "Entwickeln Sie mit Vergnügen";
byte[] germanBytes = germanString.getBytes();

String asciiEncodedString = new String(germanBytes, StandardCharsets.US_ASCII);

assertNotEquals(asciiEncodedString, germanString);

This String encoded using US_ASCII gives us the value “Entwickeln Sie mit Vergn?gen” when printed, because it doesn’t understand the non-ASCII ü character. But when we convert an ASCII-encoded String that uses all English characters to UTF-8, we get the same string.

String englishString = "Develop with pleasure";
byte[] englishBytes = englishString.getBytes();

String asciiEncondedEnglishString = new String(englishBytes, StandardCharsets.US_ASCII);

assertEquals(asciiEncondedEnglishString, englishString);

Let’s see what happens when we use the UTF-8 encoding.

3. Encoding With Core Java

Let’s start with the core library.

Strings are immutable in Java, which means we cannot change a String character encoding. To achieve what we want, we need to copy the bytes of the String and then create a new one with the desired encoding.

First, we get the String bytes and, after that, create a new one using the retrieved bytes and the desired charset:

String rawString = "Entwickeln Sie mit Vergnügen";
byte[] bytes = rawString.getBytes(StandardCharsets.UTF_8);

String utf8EncodedString = new String(bytes, StandardCharsets.UTF_8);

assertEquals(rawString, utf8EncodedString);

4. Encoding With Java 7 StandardCharsets

Alternatively, we can use the StandardCharsets class introduced in Java 7 to encode the String.

First, we’ll decode the String into bytes and, secondly, encode the String to UTF-8:

String rawString = "Entwickeln Sie mit Vergnügen";
ByteBuffer buffer = StandardCharsets.UTF_8.encode(rawString); 

String utf8EncodedString = StandardCharsets.UTF_8.decode(buffer).toString();

assertEquals(rawString, utf8EncodedString);

5. Encoding With Commons-Codec

Besides using core Java, we can alternatively use Apache Commons Codec to achieve the same results.

Apache Commons Codec is a handy package containing simple encoders and decoders for various formats.

First, let’s start with the project configuration. When using Maven, we have to add the commons-codec dependency to our pom.xml:

<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.14</version>
</dependency>

Then, in our case, the most interesting class is StringUtils, which provides methods to encode Strings. Using this class, getting a UTF-8 encoded String is pretty straightforward:

String rawString = "Entwickeln Sie mit Vergnügen"; 
byte[] bytes = StringUtils.getBytesUtf8(rawString);
 
String utf8EncodedString = StringUtils.newStringUtf8(bytes);

assertEquals(rawString, utf8EncodedString);

6. Conclusion

Encoding a String into UTF-8 isn’t difficult, but it’s not that intuitive. This tutorial presents three ways of doing it, either using core Java or using Apache Commons Codec.

As always, the code samples can be found over on GitHub.

Related posts:

Overflow and Underflow in Java
Spring Security – Reset Your Password
Introduction to the Java NIO2 File API
The Difference Between map() and flatMap()
Java Program to Implement the String Search Algorithm for Short Text Sizes
Optional trong Java 8
Spring Boot Gradle Plugin
Java Program to Implement wheel Sieve to Generate Prime Numbers Between Given Range
Java Program to Generate Random Numbers Using Middle Square Method
Java Program to Implement Cubic convergence 1/pi Algorithm
Java – Reader to String
Removing all Nulls from a List in Java
Java Program to Implement Iterative Deepening
Injecting Prototype Beans into a Singleton Instance in Spring
Java Program to Describe the Representation of Graph using Incidence List
Create a Custom Auto-Configuration with Spring Boot
Split a String in Java
Implementing a Runnable vs Extending a Thread
Java Program to Generate Random Partition out of a Given Set of Numbers or Characters
Một số ký tự đặc biệt trong Java
Python String capitalize()
What is a POJO Class?
Java Program to Generate Random Numbers Using Multiply with Carry Method
A Guide to Java 9 Modularity
Hướng dẫn sử dụng Java Annotation
Các chương trình minh họa sử dụng Cấu trúc điều khiển trong Java
Java Program to Delete a Particular Node in a Tree Without Using Recursion
Python String upper()
Java Program to Implement Dijkstra’s Algorithm using Queue
Hướng dẫn Java Design Pattern – Template Method
Java Program to Implement Stack
Finding Max/Min of a List or Collection