Finding the Unicode Script of a Char in Haskell

I wanted to write a function that checks whether a Char represents a letter of the Cyrillic alphabet, solely for pedagogical reasons. A simple approximation for Russian is

import Data.Char (toLower)

isCyrillic c =
    let lc = toLower c
    in 'а' <= lc && lc <= 'я'

but I don't like it because it doesn't handle other languages that use Cyrillic. I could hard-code the ranges:

U+0400–U+04FF Cyrillic
U+0500–U+052F Cyrillic Supplement
U+2DE0–U+2DFF Cyrillic Extended-A
U+A640–U+A69F Cyrillic Extended-B
U+1C80–U+1C8F Cyrillic Extended-C

but this is not good practice either.
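
For reference, a minimal sketch of what the hard-coded version might look like, using only base. The names cyrillicRanges and isCyrillicByRange are made up for illustration, and the table contains just the five blocks listed above, so it would need manual updating if further Cyrillic blocks matter:

```haskell
import Data.Char (ord)

-- Inclusive code point ranges of the five Cyrillic blocks listed above.
-- (Hypothetical name; the table is maintained by hand.)
cyrillicRanges :: [(Int, Int)]
cyrillicRanges =
  [ (0x0400, 0x04FF) -- Cyrillic
  , (0x0500, 0x052F) -- Cyrillic Supplement
  , (0x2DE0, 0x2DFF) -- Cyrillic Extended-A
  , (0xA640, 0xA69F) -- Cyrillic Extended-B
  , (0x1C80, 0x1C8F) -- Cyrillic Extended-C
  ]

-- True when the character's code point falls inside any listed range.
isCyrillicByRange :: Char -> Bool
isCyrillicByRange c = any inRange cyrillicRanges
  where
    n = ord c
    inRange (lo, hi) = lo <= n && n <= hi
```

Unlike the Russian-only approximation, this needs no toLower call, since both cases of each letter sit inside the same blocks.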

Ideally, the function would be as simple as

isCyrillic c = unicodeScript c == Cyrillic

but this assumes the existence of an enumerated type of Unicode scripts (Unicode blocks would also work). Does such a thing exist anywhere?

1 answer

The property function from Data.Text.ICU.Char in the text-icu package seems to fit the bill:

import Data.Text.ICU.Char

isCyrillic c = property Block c == Cyrillic

Source: https://habr.com/ru/post/1694549/

