Accepted
Terminal UI applications need to correctly measure text width for layout purposes. This requires splitting strings into grapheme clusters—what users perceive as single characters, even when composed of multiple Unicode code points (e.g., emoji sequences like "👨👩👧" which use Zero-Width Joiners).
Two options exist for grapheme cluster detection in the JVM:
- ICU4J (
com.ibm.icu.text.BreakIterator) - JDK (
java.text.BreakIterator)
Use the JDK's built-in java.text.BreakIterator.
- No external dependency - reduces JAR size by ~13MB
- Faster startup - no additional classes to load
- Simpler native-image compilation - no reflection configuration needed for ICU4J internals
- Sufficient for modern JDKs - JDK 17+ handles most grapheme clusters correctly, including emoji ZWJ sequences
- Unicode version tied to JDK - ICU4J is updated more frequently with latest Unicode standard
- Edge cases on older JDKs - JDK 11 and earlier may not handle all emoji sequences correctly
- Less control - cannot upgrade Unicode support independently of JDK version
- Latest Unicode support - frequently updated, independent of JDK releases
- Consistent behavior - same results across all JDK versions
- Additional features - locale-specific break iteration, more granular control
- Large dependency - ~13MB for icu4j alone
- Native-image complexity - requires reflection and resource configuration for GraalVM
- Overkill - most features unused when only grapheme clustering is needed
The decision can be revisited if:
- Users report grapheme clustering issues on supported JDK versions
- A need arises for locale-specific text segmentation
- Unicode sequences emerge that JDK handles incorrectly