R - regular expression error (PCRE version)

I am trying to use koRpus in R to perform lemmatization on a Linux server running RHEL6. Last week, with MRO (Microsoft R Open) 3.2.3 installed, the code below worked just fine:

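library(koRpus)
# Words to lemmatize
lw <- c("dancing", "flying", "flew")
# TT.tknz = FALSE makes koRpus tokenize the input itself instead of using TreeTagger's tokenizer
res <- treetag(lw, treetagger = "manual", format = "obj", TT.tknz = FALSE, lang = "en",
               TT.options = list(path = "/usr/local/bin/TreeTagger", preset = "en"))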

Now when I run MRO 3.3.0, I get the following error:

Error in grepl("(^\\p{P}*\\p{L}\\p{M}*\\.)", tkn, perl = TRUE) :
  invalid regular expression '(^\p{P}*\p{L}\p{M}*\.)'
In addition: Warning message:
In grepl("(^\\p{P}*\\p{L}\\p{M}*\\.)", tkn, perl = TRUE) :
  PCRE pattern compilation error
        'support for \P, \p, and \X has not been compiled'
        at 'p{P}*\p{L}\p{M}*\.)'

OK, so my PCRE needs to be recompiled with Unicode property support. Running the code below confirms that this is exactly the problem: pcre_config() reports Unicode properties as FALSE. I also see that R is using PCRE version 8.37.

pcre_config()
#>         UTF-8 Unicode properties                JIT
#>          TRUE              FALSE              FALSE

extSoftVersion()
#>                 zlib                     bzlib                        xz
#>              "1.2.8"      "1.0.6, 6-Sept-2010"                   "5.2.2"
#>                 PCRE                       ICU                       TRE
#>    "8.37 2015-04-28"                    "57.1" "TRE 0.8.0 R_fixes (BSD)"
#>                iconv
#>         "glibc 2.12"

So I went ahead and installed PCRE 8.39, (hopefully) setting the necessary flags:

./configure --enable-utf8 --enable-unicode-properties
make
make install

So now, when I run pcretest -C, I get

PCRE version 8.39 2016-06-14
Compiled with
  8-bit support
  UTF-8 support
  Unicode properties support
  No just-in-time compiler support
  Newline sequence is LF
  \R matches all Unicode newlines
  Internal link size = 2
  POSIX malloc threshold = 10
  Parentheses nest limit = 250
  Default match limit = 10000000
  Default recursion depth limit = 10000000
  Match recursion uses stack
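
The same check can be scripted rather than read by eye. This is a minimal sketch, assuming pcretest prints either "Unicode properties support" or "No Unicode properties support" as in the output above; the function name has_unicode_props is hypothetical and not part of PCRE.

```shell
#!/bin/sh
# Hypothetical helper: reads `pcretest -C` output on stdin and succeeds
# only if the build reports Unicode property support.
has_unicode_props() {
  # The positive line starts with whitespace followed directly by "Unicode";
  # the negative line starts with "No", so the anchored pattern rejects it.
  grep -q '^[[:space:]]*Unicode properties support'
}

# Usage (requires pcretest on PATH):
#   pcretest -C | has_unicode_props && echo "Unicode properties: yes"
```

Note that this only describes the PCRE that pcretest belongs to, which is not necessarily the PCRE that R uses.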

But when I run R again, pcre_config() gives the same results, the treetag call fails, and extSoftVersion() still reports 8.37.

How do I get R to use the new version of PCRE?

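It looks as though R is not picking up the PCRE you installed. See the notes for 3.3.0 (https://mran.microsoft.com/news/#r330). Even after you install a new PCRE with Unicode property support, R still reports PCRE 8.37 2015-04-28, which suggests that MRO 3.3.0 on RHEL 6 uses its own build of PCRE, compiled without that support. You can confirm this with the grepl call and with extSoftVersion.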
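
One way to check whether R could pick up a separately installed PCRE at all is to see whether its shared library is dynamically linked against libpcre. The sketch below is only illustrative: the function name pcre_linkage is hypothetical, and the usage line assumes R was built as a shared library so that libR.so exists under the directory printed by R RHOME.

```shell
#!/bin/sh
# Hypothetical helper: reads `ldd` output on stdin and reports how PCRE
# appears to be linked into the inspected binary.
pcre_linkage() {
  if grep -q 'libpcre\.so'; then
    # A libpcre shared object is listed, so the dynamic loader picks the copy.
    echo "dynamic"
  else
    # No libpcre entry: PCRE is presumably compiled in or bundled.
    echo "static-or-bundled"
  fi
}

# Usage (assumption: libR.so exists for this R build):
#   ldd "$(R RHOME)/lib/libR.so" | pcre_linkage
```

If this prints static-or-bundled, installing another PCRE system-wide would not be expected to change what extSoftVersion() reports.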


Source: https://habr.com/ru/post/1650357/

