Why isn’t String’s .length() accurate?

VietMX Staff asked 3 years ago

It isn’t accurate because it counts char values, that is, UTF-16 code units, rather than Unicode characters. Code points outside of what is called the BMP (Basic Multilingual Plane), that is, code points with a value of U+10000 or greater, are each stored as a pair of chars (a surrogate pair), so .length() counts each of them as two.
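A minimal sketch of the effect, assuming U+1F600 as the sample character outside the BMP. The class name SurrogateDemo and its helper methods are made up for illustration; only the String and Character calls come from the JDK.

```java
final class SurrogateDemo {
    // String.length() reports the number of char values (UTF-16 code units).
    static int codeUnits(String s) {
        return s.length();
    }

    // True if the string contains at least one high-surrogate char,
    // which is how a code point at U+10000 or above begins in UTF-16.
    static boolean hasHighSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isHighSurrogate(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String bmp = "A";              // U+0041, inside the BMP
        String emoji = "\uD83D\uDE00"; // U+1F600, written as its surrogate pair

        System.out.println(codeUnits(bmp));               // 1
        System.out.println(codeUnits(emoji));             // 2, although it is one code point
        System.out.println(hasHighSurrogate(emoji));      // true
        System.out.println(Character.charCount(0x1F600)); // 2 chars needed for this code point
    }
}
```

The string literal uses \u escapes so the example does not depend on the source file's encoding.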

The reason is historical: when Java was first defined, one of its goals was to treat all text as Unicode; but at that time, Unicode did not define code points outside of the BMP, so a 16-bit char could hold any character. By the time Unicode defined such code points, it was too late for char to be changed.

The correct way to count the real number of characters within a String, i.e. the number of code points, is either:

someString.codePointCount(0, someString.length())
// or, with Java 8 or later:
someString.codePoints().count()
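
For a runnable comparison, here is a small sketch that wraps both calls. The class name CodePointCounter, the method names, and the sample string are assumptions made for this example; codePointCount and codePoints are the real String methods shown above.

```java
final class CodePointCounter {
    // Code point count using codePointCount, available since Java 5.
    static int countClassic(String s) {
        return s.codePointCount(0, s.length());
    }

    // Same count via the stream of code points, available since Java 8.
    static long countStream(String s) {
        return s.codePoints().count();
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b'

        System.out.println(s.length());      // 4 chars (UTF-16 code units)
        System.out.println(countClassic(s)); // 3 code points
        System.out.println(countStream(s));  // 3 code points
    }
}
```

Note that the two variants return different types: codePointCount gives an int, while codePoints().count() gives a long.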