Loop Vectorization in R

I have two vectors:

  • Vector texts: c('abc', 'asdf', 'werd', 'ffssd')
  • Vector patterns: c('ab', 'd', 'w')

I would like to vectorize the following for-loop:

count <- 0
for (p in patterns) {
    # count the occurrences of the current pattern in every text
    count <- count + str_count(texts, p)
}

I tried the following commands, but neither of them works:

> str_count(texts, patterns)
[1] 1 1 1 0
Warning message:
In stri_count_regex(string, pattern, opts_regex = attr(pattern,  :
  longer object length is not a multiple of shorter object length

> str_count(texts, t(patterns))
[1] 1 1 1 0
Warning message:
In stri_count_regex(string, pattern, opts_regex = attr(pattern,  :
  longer object length is not a multiple of shorter object length

I need a 2d matrix:

       |  patterns
 ------+--------
       |   1 0 0
 texts |   0 1 0
       |   0 1 1
       |   0 1 0
2 answers

You can use outer. I assume that you are using str_count from the stringr package.

library(stringr)

texts <- c('abc', 'asdf', 'werd', 'ffssd')
patterns <- c('ab', 'd', 'w')

matches <- outer(texts, patterns, str_count)

# set dim names
colnames(matches) <- patterns
rownames(matches) <- texts
matches
      ab d w
abc    1 0 0
asdf   0 1 0
werd   0 1 1
ffssd  0 1 0

EDIT

# or set names directly within 'outer' as noted by @RichardScriven
outer(setNames(nm = texts), setNames(nm = patterns), str_count)
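
To see why outer succeeds where the direct call only recycled the shorter vector, it may help to spell out roughly what outer does: it repeats both inputs so that every text is paired with every pattern, calls the function once on those long vectors, and reshapes the result. The following is a minimal sketch of that idea, not outer's actual source; the names x, y and counts are made up for illustration.

```r
library(stringr)

texts <- c('abc', 'asdf', 'werd', 'ffssd')
patterns <- c('ab', 'd', 'w')

# Pair every text with every pattern (4 * 3 = 12 pairs).
x <- rep(texts, times = length(patterns))  # texts cycle fastest
y <- rep(patterns, each = length(texts))   # each pattern fills one column

# One vectorized call over all pairs, then reshape column by column.
counts <- str_count(x, y)
matches <- matrix(counts, nrow = length(texts),
                  dimnames = list(texts, patterns))

# Row sums give the per-text total that the original loop accumulated.
rowSums(matches)
```

This also shows why the function passed to outer has to be vectorized: it is called once on the expanded vectors, not once per pair.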

Using dplyr and tidyr (and stringr):

library(dplyr)
library(tidyr)
library(stringr)
# stringsAsFactors = FALSE keeps both columns as character,
# so no separate conversion step is needed
expand.grid(texts, patterns, stringsAsFactors = FALSE) %>%
   mutate(matches = str_count(Var1, Var2)) %>%
   spread(Var2, matches)
   Var1 ab d w
1   abc  1 0 0
2  asdf  0 1 0
3 ffssd  0 1 0
4  werd  0 1 1
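
For readers on a recent tidyr, the same reshaping can probably be written with expand_grid and pivot_wider. This is a sketch that assumes tidyr 1.0.0 or later is installed; the column names text and pattern and the variable result are chosen here for illustration and do not come from the answer above.

```r
library(dplyr)
library(tidyr)
library(stringr)

texts <- c('abc', 'asdf', 'werd', 'ffssd')
patterns <- c('ab', 'd', 'w')

result <- expand_grid(text = texts, pattern = patterns) %>%
  # count each pattern in each text, one row per (text, pattern) pair
  mutate(matches = str_count(text, pattern)) %>%
  # turn the patterns into columns, one row per text
  pivot_wider(names_from = pattern, values_from = matches)

result
```

Unlike spread, pivot_wider should keep the rows in their order of first appearance, so the texts are not re-sorted alphabetically.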

Source: https://habr.com/ru/post/1619564/

