match() vs %in% operator
From what I read in ?match:

"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0
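A quick way to see what that definition means in practice is to run it next to the built-in operator. This base-R sketch uses a hypothetical my_in that copies the definition from ?match. Note that the result has one element per element of x (the first argument), not per element of table.

```r
# Hypothetical copy of the definition quoted from ?match
my_in <- function(x, table) match(x, table, nomatch = 0) > 0

words <- c("love", "hate", "pandas", "monkeys")
x <- c("I", "love", "pandas")

match(x, words, nomatch = 0)              # 0 1 3: position in `words`, 0 = no match
my_in(x, words)                           # FALSE TRUE TRUE: one value per element of x
identical(my_in(x, words), x %in% words)  # TRUE: same as the built-in operator
```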
Why do I get a different result with match(x, dict[["word"]], 0L):

vapply(strsplit(df$text, " "), function(x) sum(dict[["score"]][match(x, dict[["word"]], 0L)]), 1)
#[1]  2 -2  3 -2

than with dict[["word"]] %in% x?

vapply(strsplit(df$text, " "), function(x) sum(dict[["score"]][dict[["word"]] %in% x]), 1)
#[1]  2 -2  1 -1

Data
library(dplyr)
df <- tibble(text = c("I love pandas", "I hate monkeys", "pandas pandas pandas", "monkeys monkeys"))
dict <- tibble(word = c("love", "hate", "pandas", "monkeys"), score = c(1, -1, 1, -1))

Update
After Richard’s explanation, I now understand my initial mistake. The %in% operator returns a logical vector:
> sapply(strsplit(df$text, " "), function(x) dict[["word"]] %in% x)
      [,1]  [,2]  [,3]  [,4]
[1,]  TRUE FALSE FALSE FALSE
[2,] FALSE  TRUE FALSE FALSE
[3,]  TRUE FALSE  TRUE FALSE
[4,] FALSE  TRUE FALSE  TRUE

And match() returns location numbers:
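> sapply(strsplit(df$text, " "), function(x) match(x, dict[["word"]], 0L))
[[1]]
[1] 0 1 3

[[2]]
[1] 0 2 4

[[3]]
[1] 3 3 3

[[4]]
[1] 4 4

match(x, table) returns an integer vector with one element per element of x: the position of its first match in table, or nomatch (here 0) when there is none. Because x is the split text, a word that occurs several times yields its position several times, so "pandas pandas pandas" gives c(3, 3, 3), and dict[["score"]][c(3, 3, 3)] picks the pandas score three times.

dict[["word"]] %in% x instead returns a logical vector with one element per dictionary word, TRUE if that word occurs anywhere in x. Used as an index, it selects each matching dictionary entry exactly once, however often the word is repeated in the text.

Therefore the sums differ for the third and fourth texts, which repeat words, and agree for the first two, which do not.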
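To reproduce both behaviours without dplyr, here is a minimal base-R sketch with the same text and dictionary, built with data.frame() instead of data_frame(). The helper names score_match, score_in and score_match_unique are hypothetical. The only thing that changes between them is which vector drives the lookup: match() over the tokens (repeats included), %in% over the dictionary words, or match() over unique(x), which counts each distinct token once and, for a dictionary without duplicate words, should agree with the %in% version.

```r
dict <- data.frame(word  = c("love", "hate", "pandas", "monkeys"),
                   score = c(1, -1, 1, -1),
                   stringsAsFactors = FALSE)
text <- c("I love pandas", "I hate monkeys", "pandas pandas pandas", "monkeys monkeys")
tokens <- strsplit(text, " ")

# One index per token: "pandas pandas pandas" -> c(3, 3, 3), so score[3] is summed three times
score_match <- function(x) sum(dict$score[match(x, dict$word, nomatch = 0L)])

# One logical per dictionary word: each word contributes at most once
score_in <- function(x) sum(dict$score[dict$word %in% x])

# Hypothetical variant: look up each distinct token only once
score_match_unique <- function(x) sum(dict$score[match(unique(x), dict$word, nomatch = 0L)])

vapply(tokens, score_match, numeric(1))         # 2 -2  3 -2 (repeats counted)
vapply(tokens, score_in, numeric(1))            # 2 -2  1 -1 (repeats ignored)
vapply(tokens, score_match_unique, numeric(1))  # 2 -2  1 -1 (repeats ignored)
```

Which version is appropriate depends on whether a word repeated in one text should add to its score each time; if it should, the plain match() version is the one to keep.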