dplyr: Percentage of factor levels by school, grouping not being applied

Question


I have a long data set with one row per person, grouped by school. Each row has an ordered factor, "cats", with levels {1, 2, 3, 4}. I want the percentage of 1s, 2s, 3s and 4s within each school. The data set looks like this:

       school_number cats
    1          10505    3
    2          10505    3
    3          10502    1
    4          10502    1
    5          10502    2
    6          10502    1
    7          10502    1
    8          10502    2
    10         10503    3
    11         10505    2
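For anyone who wants to run the code in this thread, here is one way to rebuild the sample data as an R data frame. It is a reconstruction, assuming `cats` is an ordered factor with levels 1 to 4 as described above; the original row names (which skip 9) are not preserved.

```r
# Reconstruction of the sample data (assumption: `cats` is an ordered factor, levels 1-4)
df <- data.frame(
  school_number = c(10505, 10505, 10502, 10502, 10502, 10502, 10502, 10502, 10503, 10505),
  cats = factor(c(3, 3, 1, 1, 2, 1, 1, 2, 3, 2), levels = 1:4, ordered = TRUE)
)

str(df)  # 10 rows; cats is an ordered factor with 4 levels
```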

I tried something like this:

    df_pcts <- df %>%
      group_by(school_number) %>%
      mutate(total = sum(table(cats))) %>%
      summarize(cat_pct = table(cats) / total)

but the `total` variable created in the mutate() step contains the total number of rows of the whole data set on every row, not the per-school total. I can't even get to the final step. I'm confused.
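For comparison, here is a minimal sketch of the attempted approach written so that every step stays inside the school grouping. It assumes only dplyr (not plyr) is attached, dplyr 1.0 or later for the `.groups` argument, and the reconstructed sample data; `n()` counts the rows of the current group, so the school total is taken before regrouping by `cats`.

```r
library(dplyr)

# Reconstructed sample data (assumption: `cats` is an ordered factor, levels 1-4)
df <- data.frame(
  school_number = c(10505, 10505, 10502, 10502, 10502, 10502, 10502, 10502, 10503, 10505),
  cats = factor(c(3, 3, 1, 1, 2, 1, 1, 2, 3, 2), levels = 1:4, ordered = TRUE)
)

df_pcts <- df %>%
  group_by(school_number) %>%
  mutate(total = n()) %>%                        # rows per school, repeated on each row
  group_by(school_number, cats) %>%              # regroup by school and category
  summarise(cat_pct = 100 * n() / first(total),  # category count as a share of the school
            .groups = "drop")

print(df_pcts)
```

Only school/category combinations that actually occur in the data appear in `df_pcts`.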

P.S. In some other posts I saw lines like this:

 n = n()

When I do this, I get the error:

 Error in n() : This function should not be called directly

Where does this come from?

TIA


r dplyr

Stuart Sep 17 '14 at 2:37


3 answers

jalapic · Answer 1 · 2014-09-17T02:56:46+0000

Maybe this helps a little, although I'm not 100% sure what you need.

This uses tally to count the rows for each school_number/cats combination in your df. It then regroups by school_number only and calculates the percentage of each "cats" value within each school_number.

    df %>%
      group_by(school_number, cats) %>%
      tally %>%
      group_by(school_number) %>%
      mutate(pct = (100 * n) / sum(n))

This gives the following:

    #   school_number cats n       pct
    # 1         10502    1 4  66.66667
    # 2         10502    2 2  33.33333
    # 3         10503    3 1 100.00000
    # 4         10505    2 1  33.33333
    # 5         10505    3 2  66.66667
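On dplyr versions that have `count()`, which is roughly equivalent to `group_by()` followed by `tally()`, the counting step can be written more compactly. This is a sketch using the reconstructed sample data (an assumption about the column types), not a change to the answer above.

```r
library(dplyr)

# Reconstructed sample data (assumption: `cats` is an ordered factor, levels 1-4)
df <- data.frame(
  school_number = c(10505, 10505, 10502, 10502, 10502, 10502, 10502, 10502, 10503, 10505),
  cats = factor(c(3, 3, 1, 1, 2, 1, 1, 2, 3, 2), levels = 1:4, ordered = TRUE)
)

pcts <- df %>%
  count(school_number, cats) %>%     # one row per school/cats combination, column n
  group_by(school_number) %>%        # percentages are computed within each school
  mutate(pct = 100 * n / sum(n)) %>%
  ungroup()

print(pcts)
```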

EDIT:

To add the rows with 0% that are missing from your sample data, you can do the following. Bind the result above to a data frame that contains 0% for every school_number/cats combination, then keep only the first instance of each combination after binding (the first instance always holds the value above 0% when there is one). Finally, I arranged it by school_number and cats for readability:

    y <- df %>%
      group_by(school_number, cats) %>%
      tally %>%
      group_by(school_number) %>%
      mutate(pct = (100 * n) / sum(n)) %>%
      select(-n)

    x <- data.frame(school_number = rep(unique(df$school_number), each = 4),
                    cats = 1:4, pct = 0)

    rbind(y, x) %>%
      group_by(school_number, cats) %>%
      filter(row_number() == 1) %>%
      arrange(school_number, cats)

which gives:

    #    school_number cats       pct
    # 1          10502    1  66.66667
    # 2          10502    2  33.33333
    # 3          10502    3   0.00000
    # 4          10502    4   0.00000
    # 5          10503    1   0.00000
    # 6          10503    2   0.00000
    # 7          10503    3 100.00000
    # 8          10503    4   0.00000
    # 9          10505    1   0.00000
    # 10         10505    2  33.33333
    # 11         10505    3  66.66667
    # 12         10505    4   0.00000
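A different technique for the same goal is tidyr's `complete()`, which adds the missing combinations and fills them in, instead of binding a data frame of zeros and deduplicating. This is a sketch assuming tidyr is installed and using the reconstructed sample data; because `cats` is a factor, `complete()` expands it to all of its levels, so level 4 shows up even though no row has it.

```r
library(dplyr)
library(tidyr)

# Reconstructed sample data (assumption: `cats` is an ordered factor, levels 1-4)
df <- data.frame(
  school_number = c(10505, 10505, 10502, 10502, 10502, 10502, 10502, 10502, 10503, 10505),
  cats = factor(c(3, 3, 1, 1, 2, 1, 1, 2, 3, 2), levels = 1:4, ordered = TRUE)
)

pcts_all <- df %>%
  count(school_number, cats) %>%
  group_by(school_number) %>%
  mutate(pct = 100 * n / sum(n)) %>%
  ungroup() %>%
  # add every school_number/cats combination; new rows get n = 0 and pct = 0
  complete(school_number, cats, fill = list(n = 0L, pct = 0)) %>%
  arrange(school_number, cats)

print(pcts_all)
```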

user69 · Answer 2 · 2017-02-21T11:55:02+0000

Build all combinations of school number and cats level, then left-join the counts onto them to calculate pct. Where pct is NA (the combination does not occur in the data), set it to 0:

    expand.grid(school_number = unique(df$school_number),
                cats = levels(df$cats)) %>%
      left_join(df %>%
                  group_by(school_number, cats) %>%
                  tally %>%
                  mutate(pct = (n / sum(n) * 100))) %>%
      select(-n) %>%
      mutate(pct = ifelse(is.na(pct), 0, pct)) %>%
      arrange(school_number)

which gives:

       school_number cats       pct
    1          10502    1  66.66667
    2          10502    2  33.33333
    3          10502    3   0.00000
    4          10502    4   0.00000
    5          10503    1   0.00000
    6          10503    2   0.00000
    7          10503    3 100.00000
    8          10503    4   0.00000
    9          10505    1   0.00000
    10         10505    2  33.33333
    11         10505    3  66.66667
    12         10505    4   0.00000
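Outside dplyr, base R's `table()` and `prop.table()` give the same percentages, zeros included, because `table()` tabulates every level of a factor. This is a swapped-in base R technique, sketched with the reconstructed sample data; `as.data.frame()` turns the result back into the long format used above.

```r
# Reconstructed sample data (assumption: `cats` is an ordered factor, levels 1-4)
df <- data.frame(
  school_number = c(10505, 10505, 10502, 10502, 10502, 10502, 10502, 10502, 10503, 10505),
  cats = factor(c(3, 3, 1, 1, 2, 1, 1, 2, 3, 2), levels = 1:4, ordered = TRUE)
)

# Cross-tabulate: one row per school, one column per cats level (unused levels give 0)
counts <- table(school_number = df$school_number, cats = df$cats)

# Divide each row by its row total (margin = 1) and scale to percent
pct <- prop.table(counts, margin = 1) * 100
print(round(pct, 2))

# Long format: columns school_number, cats, pct
pct_long <- as.data.frame(pct, responseName = "pct")
print(pct_long)
```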

WANNISA RITMAHAN · Answer 3 · 2018-03-23T14:27:42+0000

As @akrun suggested, you have probably loaded both plyr and dplyr. Since summarise (and summarize) exists in both packages, you can specify which one you mean by putting the package name in front of the function name, i.e. dplyr::fun(argument...).
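A minimal sketch of that fix, assuming the problem is plyr (attached after dplyr) masking dplyr's verbs, and using the reconstructed sample data. With the `dplyr::` prefix, `n()` is evaluated inside dplyr's own `summarise()`, which is the only context where it works; calling it on its own raises an error like the one in the question (the exact wording depends on the dplyr version).

```r
library(dplyr)

# Reconstructed sample data (assumption: `cats` is an ordered factor, levels 1-4)
df <- data.frame(
  school_number = c(10505, 10505, 10502, 10502, 10502, 10502, 10502, 10502, 10503, 10505),
  cats = factor(c(3, 3, 1, 1, 2, 1, 1, 2, 3, 2), levels = 1:4, ordered = TRUE)
)

# Explicit dplyr:: prefixes select dplyr's verbs even if another package masks them
school_sizes <- df %>%
  dplyr::group_by(school_number) %>%
  dplyr::summarise(n = dplyr::n())
print(school_sizes)

# n() outside a dplyr verb fails; capture the message instead of stopping
msg <- tryCatch(dplyr::n(), error = function(e) conditionMessage(e))
print(msg)
```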
