"The usual methods to find a string length seem to fail."
They don't fail: they report the string length as the number of Unicode characters [*]. If you need a different behavior, you need to clearly define what you mean by "string length".
If you are interested in string lengths for display purposes, then you usually want to count pixels (or some other logical or physical unit), and that is the responsibility of the display layer (for starters, different characters may have different widths if the font is not monospaced).
But if you're just interested in counting the number of graphemes ("a minimally distinctive unit of writing in the context of a particular writing system"), here is a good reference with code and examples. Copying, trimming, and pasting the relevant code from there, we get something like this:
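    import java.text.BreakIterator;

    public static int getGraphemeCount(String text) {
        int graphemeCount = 0;
        // Character instance for the default locale; it iterates over grapheme boundaries.
        BreakIterator graphemeCounter = BreakIterator.getCharacterInstance();
        graphemeCounter.setText(text);
        // Each successful call to next() moves past one more grapheme.
        while (graphemeCounter.next() != BreakIterator.DONE) {
            graphemeCount++;
        }
        return graphemeCount;
    }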
Bear in mind: the example above uses the default locale. A more flexible and robust method would, for example, take an explicit locale as an argument and call BreakIterator.getCharacterInstance(locale) instead.
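As a minimal sketch of that locale-aware variant (the class name GraphemeCounter and the two-argument signature are my own, chosen only for illustration), something along these lines should work:

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper class; the name is an assumption made for this example.
final class GraphemeCounter {

    // Counts the boundaries reported by a character-instance BreakIterator
    // created for the locale passed in by the caller.
    static int getGraphemeCount(String text, Locale locale) {
        BreakIterator graphemeCounter = BreakIterator.getCharacterInstance(locale);
        graphemeCounter.setText(text);
        int graphemeCount = 0;
        while (graphemeCounter.next() != BreakIterator.DONE) {
            graphemeCount++;
        }
        return graphemeCount;
    }
}
```

For instance, GraphemeCounter.getGraphemeCount("e\u0301", Locale.ROOT) should treat the letter and its combining accent as a single unit, whereas String.length() reports 2 for the same string.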
[*] To be precise, as pointed out in the comments, String.length() counts Java chars, which are actually UTF-16 code units. This is equivalent to counting Unicode characters only if the whole string lies within the BMP (Basic Multilingual Plane).
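To make the footnote concrete, here is a small sketch comparing the two counts. The class and method names are made up for the example; String.length() and String.codePointCount(int, int) are standard methods.

```java
// Hypothetical demo class; the names are assumptions made for this example.
final class LengthDemo {

    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }
}
```

For "abc" both methods return 3. For "\uD83D\uDE00" (U+1F600, which lies outside the BMP), codeUnits returns 2 while codePoints returns 1.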