Is there a word recognition function?

Is there a way to evaluate a string and see whether it is an English word? Here is what I'm looking for:

is.word("hello world")
[1] FALSE

is.word(c("hello", "world"))
[1] TRUE TRUE

The above does not work, since there is no logical function is.word.

2 answers

As noted in the comments, you need an English dictionary to match against. The object GradyAugmented in the package qdapDictionaries is one such dictionary:

A dataset containing a vector of Grady Ward's English words augmented with DICTIONARY, Mark Kantrowitz's names list, other proper nouns, and contractions.

library(qdapDictionaries)
is.word  <- function(x) x %in% GradyAugmented
is.word(c("hello world"))
## [1] FALSE
is.word(c("hello", "world"))
## [1] TRUE TRUE
is.word(c("asfasdf"))
## [1] FALSE
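
The %in% check is an exact match, so "Hello" or " world " would fail, and "hello world" fails because the whole string is compared at once. A minimal sketch of a more tolerant check follows, assuming the dictionary stores lowercase entries. The names is_word_normalized, all_words and the small stand-in vector dict are hypothetical and only keep the example runnable without the package:

```r
# Small stand-in dictionary (assumption); swap in GradyAugmented or any
# other character vector of lowercase words.
dict <- c("hello", "world", "knight", "stack")

# Hypothetical helper: TRUE for each element that is a single dictionary
# word, ignoring case and surrounding whitespace.
is_word_normalized <- function(x, dictionary = dict) {
  tolower(trimws(x)) %in% dictionary
}

# Hypothetical helper: split one string on whitespace and require that
# every token is a dictionary word (an empty string gives FALSE).
all_words <- function(x, dictionary = dict) {
  tokens <- strsplit(trimws(x), "[[:space:]]+")[[1]]
  length(tokens) > 0 && all(is_word_normalized(tokens, dictionary))
}

is_word_normalized(c("Hello", " world ", "asfasdf"))
## [1]  TRUE  TRUE FALSE
all_words("hello world")
## [1] TRUE
all_words("hello asfasdf")
## [1] FALSE
```

Whether a multi-word string should count as TRUE depends on the use case; the question expects FALSE for "hello world", which is what the plain %in% version returns.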

No, there is no such function in R.

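You need a word list to match against. One freely available option is SCOWL (And Friends), the list behind the GNU Aspell English dictionaries.

The extracted package contains many word files; you can select the ones you want with the pattern argument of list.files() or with grepl().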

# set path to extracted package
words.dir <- '/tmp/scowl-2015.08.24/final/'
words <- unlist(sapply(list.files(words.dir, pattern='[1-6][05]$', full.names=TRUE), readLines, USE.NAMES=FALSE))
# For some reason most frequent words are not in "final" dir…
words <- c(words, readLines(paste0(words.dir, '../r/special/frequent')))
length(words)
# [1] 143681
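
To see what the pattern '[1-6][05]$' selects, here is a small self-contained sketch that builds a throwaway directory and applies the same list.files() call. The directory name, file names and file contents are made up for illustration and are not taken from the real package:

```r
# Hypothetical stand-in for an extracted word-list directory.
demo.dir <- file.path(tempdir(), "wordlist-demo")
dir.create(demo.dir, showWarnings = FALSE)
writeLines(c("hello", "world"), file.path(demo.dir, "english-words.10"))
writeLines(c("knight", "stack"), file.path(demo.dir, "english-words.35"))
writeLines("echinuliform", file.path(demo.dir, "english-words.95"))

# The pattern keeps files whose names end in a digit 1-6 followed by 0 or 5,
# so the suffixes 10 and 35 match and 95 does not.
demo.files <- list.files(demo.dir, pattern = "[1-6][05]$", full.names = TRUE)
basename(demo.files)
## [1] "english-words.10" "english-words.35"

# Read every selected file and flatten the result into one character vector.
demo.words <- unlist(lapply(demo.files, readLines), use.names = FALSE)
"echinuliform" %in% demo.words
## [1] FALSE
```

Loosening the pattern pulls in more files and therefore more words, at the cost of accepting more obscure ones.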

Once the list is loaded, %in% checks whether each string is in it:

c("knight", "stack", "selfie", "l8er", "googling", "echinuliform") %in% words
# [1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE

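This leads to the question "what is a word?". Is "googling" a word? It is in this list, though it likely would not have been 15 years ago. Is "echinuliform"? This list does not include it.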


Source: https://habr.com/ru/post/1622112/