I am trying to use libunibreak ( https://github.com/adah1972/libunibreak ) to mark possible line breaks in some specific Unicode text.
For each code unit of the input text, libunibreak returns one of four values:
LINEBREAK_MUSTBREAK
LINEBREAK_ALLOWBREAK
LINEBREAK_NOBREAK
LINEBREAK_INSIDEACHAR
These should be self-explanatory. I would expect MUSTBREAK to correspond to hard line breaks such as LF. However, for any text, libunibreak always marks the last character as MUSTBREAK.
For example, for the string "abc" the output is [NOBREAK, NOBREAK, MUSTBREAK], and for "abc\n" it is [NOBREAK, NOBREAK, NOBREAK, MUSTBREAK]. I use MUSTBREAK to start a new line when drawing text, so in the first case ("abc") I get an extra line that should not be there.
Is this behavior required by the Unicode specification, or is it a quirk of the library's implementation?