UTF-8 string array using graphemes vs. split

Is there any advantage to using graphemes over split to create an array from a UTF-8 string?

For example, consider the following:

# Define a UTF-8 string with a bunch of multibyte characters
s = "{(-n↑⍵÷⊃⊖⍵),⍨⍉1↓⍉∘.=⍨⍳n←1-⍨≢⍵}"

# Create an array using split
split(s, "")

# Create an array using graphemes (v0.4+; on Julia 1.0 and later, run `using Unicode` first)
collect(graphemes(s))

Both approaches give the expected result. And indeed

split(s, "") == collect(graphemes(s))

returns true.

Both approaches seem to consistently produce equivalent results. Is one approach usually preferable to another, whether for performance, style, or otherwise?

(Note that graphemes returns an iterator, not an array, hence the collect.)

1 answer

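Yes, there is a difference. graphemes() iterates over grapheme clusters, that is, over what a reader perceives as single characters, each of which may consist of several code points; for example, a base letter followed by a combining mark. split() with an empty delimiter does not do this grouping and breaks the string at every code point.

Take a followed by a combining mark (a + ◌). split() returns two elements, the letter and the mark, whereas graphemes() returns a single one.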
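The difference can be checked directly. The following is a minimal sketch, assuming Julia 1.0 or later (where graphemes lives in the Unicode standard library) and using U+0300 COMBINING GRAVE ACCENT as the combining mark; that particular mark is chosen here for illustration and is not specified in the answer.

```julia
using Unicode  # standard library; provides graphemes on Julia 1.0 and later

# "a" followed by U+0300 COMBINING GRAVE ACCENT (an assumed example mark):
# two code points that render as one visible character
s = "a\u0300"

by_split = split(s, "")              # breaks the string at every code point
by_grapheme = collect(graphemes(s))  # groups code points into grapheme clusters

println(length(by_split))            # 2
println(length(by_grapheme))         # 1
println(by_split == by_grapheme)     # false
```

The string in the question contains no combining sequences, which is why the two approaches agree there.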


Source: https://habr.com/ru/post/1610654/

