A simple dplyr mutate() call fails with a "wrong result size" error

My data table df has a column subject (for example, "SubjectA", "SubjectB", ...). Each subject answers many questions, and the table is in long format, so each subject has many rows. The subject column is a factor. I want to create a new column, call it subject.id, that is just a numeric version of subject: for all rows with "SubjectA" it will be 1, for all rows with "SubjectB" it will be 2, and so on.

I know that a simple way to do this with dplyr would be to call df %>% mutate(subject.id = as.numeric(subject)). But I tried to do it like this:

subj.list <- unique(as.character(df$subject))
df %>% mutate(subject.id = which(as.character(subject) == subj.list))

And I get this error:

Error: wrong result size (12), expected 72 or 1

Why is this happening? I am not interested in other ways to solve this particular problem. Rather, I worry that my inability to understand this error reflects a deep misunderstanding of dplyr or mutate. My understanding is that the call should be conceptually equivalent to:

df$subject.id <- NULL
for (i in 1:nrow(df)) {
  df$subject.id[i] <- which(as.character(df$subject[i]) == subj.list)
}

But the loop works and the mutate() call does not. Why?

Reproducible example:

library(dplyr)

df <- InsectSprays %>% rename(subject = spray)
subj.list <- unique(as.character(df$subject))

# this works
df$subject.id <- NULL
for (i in 1:nrow(df)) {
   df$subject.id[i] <- which(as.character(df$subject[i]) == subj.list)
}

# but this doesn't
df %>% mutate(subject.id = which(as.character(subject) == subj.list))
2 answers

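The problem is that mutate() evaluates its expressions on whole column vectors, not row by row. So which() is applied to the entire logical vector as.character(df$subject) == subj.list, not to each row separately (as in your loop). In that comparison the 6-element subj.list is recycled along the 72-element column, so only 12 positions happen to match; which() therefore returns 12 indices, and mutate() rejects a result whose length is neither 72 nor 1.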
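A minimal sketch reproducing the failing expression outside mutate(), assuming the InsectSprays setup from the question (the intermediate names hits and idx are just for illustration). It shows where the 12 in the error message comes from:

```r
library(dplyr)

df <- InsectSprays %>% rename(subject = spray)   # 72 rows, 12 per spray
subj.list <- unique(as.character(df$subject))     # "A" "B" "C" "D" "E" "F"

# Inside mutate(), `subject` is the whole 72-element column, so `==`
# compares it element by element against subj.list, recycling the
# 6-element subj.list 12 times along the column.
hits <- as.character(df$subject) == subj.list
length(hits)         # 72

# which() returns the positions of the TRUE values in that one long vector
idx <- which(hits)
length(idx)          # 12: neither 72 nor 1, hence the mutate() error
```

Only rows where the recycled label happens to line up with the row's own spray are TRUE (rows 1 and 7 for "A", 14 and 20 for "B", and so on), which is why the count is 12 rather than 72.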

Using rowwise(), as described here, solves the problem: fooobar.com/questions/54686 / ...

Like this:

df %>% 
  rowwise() %>%
  mutate(subject.id = which(as.character(subject) == subj.list))
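Why this works: after rowwise(), each row is evaluated on its own, so subject is a length-1 vector and which() returns a single index, exactly like one iteration of the loop. The sketch below, assuming the same InsectSprays setup, compares the rowwise() result with the same per-element logic written with base R's sapply(), and with match(), which performs that "position of the first match" lookup in one vectorised call. The sapply() and match() variants are this illustration's additions, not part of the answer.

```r
library(dplyr)

df <- InsectSprays %>% rename(subject = spray)
subj.list <- unique(as.character(df$subject))

# rowwise(): which() sees a length-1 comparison for every row
by_row <- df %>%
  rowwise() %>%
  mutate(subject.id = which(as.character(subject) == subj.list)) %>%
  ungroup()

# The same per-element logic, written explicitly with sapply()
per_elem <- sapply(as.character(df$subject),
                   function(s) which(s == subj.list),
                   USE.NAMES = FALSE)

# match() returns, for each element, the position of its first match in subj.list
vectorised <- match(as.character(df$subject), subj.list)
```

All three give one id per row (72 values); match() avoids evaluating which() once per row.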

Since df$subject is a factor, you can simply do:

df %>% mutate(subj.id=as.numeric(subject))
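One caveat, shown in a small sketch: as.numeric() on a factor returns the factor's integer codes, which follow the order of levels() (alphabetical by default for factor()), not the order in which values first appear. For InsectSprays the two orders coincide, so the result matches the unique()-based numbering; the vector s below is a hypothetical example where they differ, and match() is used here only to express the first-appearance numbering.

```r
library(dplyr)

df <- InsectSprays %>% rename(subject = spray)
subj.list <- unique(as.character(df$subject))

# Factor codes follow levels(df$subject), which here is "A".."F",
# the same order as subj.list, so both numberings agree
codes  <- df %>% mutate(subj.id = as.numeric(subject)) %>% pull(subj.id)
lookup <- match(as.character(df$subject), subj.list)

# Hypothetical labels whose first-appearance order differs from level order
s <- factor(c("SubjectB", "SubjectA", "SubjectB"))
as.numeric(s)                                    # 2 1 2 (level order)
match(as.character(s), unique(as.character(s)))  # 1 2 1 (appearance order)
```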

Alternatively, build a lookup table and join it:

# One row per distinct subject, numbered in order of first appearance
subj.df <- tibble(value = unique(df$subject)) %>% 
    mutate(subj.id = row_number())

df %>% left_join(subj.df, by = c("subject" = "value"))

Source: https://habr.com/ru/post/1671179/

