Maybe this helps a little, although I'm not 100% sure what you need.
This counts the number of rows for each school_number / cats combination in your df using tally. It then regroups by school_number only and calculates the percentage of each "cats" value within each school_number.
df %>%
  group_by(school_number, cats) %>%
  tally %>%
  group_by(school_number) %>%
  mutate(pct = (100 * n) / sum(n))
This gives the following:
#   school_number cats n       pct
# 1         10502    1 4  66.66667
# 2         10502    2 2  33.33333
# 3         10503    3 1 100.00000
# 4         10505    2 1  33.33333
# 5         10505    3 2  66.66667
EDIT:
To add the rows with 0% that are missing from your sample data, you can do the following. Bind the result above to a data frame that contains 0% for every school_number / cats combination. Keep only the first row of each combination (the first row always holds the value > 0%, if one exists). Then arrange the result by school_number and cats for readability:
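# percentages for the combinations that exist in df
y <- df %>%
  group_by(school_number, cats) %>%
  tally %>%
  group_by(school_number) %>%
  mutate(pct = (100 * n) / sum(n)) %>%
  select(-n)

# one row with 0% for every school_number / cats combination
x <- data.frame(school_number = rep(unique(df$school_number), each = 4),
                cats = 1:4, pct = 0)

# stack both, keep the first row per combination, then sort
rbind(y, x) %>%
  group_by(school_number, cats) %>%
  filter(row_number() == 1) %>%
  arrange(school_number, cats)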
which gives:
#    school_number cats       pct
# 1          10502    1  66.66667
# 2          10502    2  33.33333
# 3          10502    3   0.00000
# 4          10502    4   0.00000
# 5          10503    1   0.00000
# 6          10503    2   0.00000
# 7          10503    3 100.00000
# 8          10503    4   0.00000
# 9          10505    1   0.00000
# 10         10505    2  33.33333
# 11         10505    3  66.66667
# 12         10505    4   0.00000
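If you also have tidyr available, the rbind-and-filter step could probably be replaced by complete(), which fills in the missing combinations directly. This is only a sketch under assumptions: the df below is hypothetical sample data that I reconstructed to match the counts shown above (it is not your real data), and it assumes the categories are always 1 to 4, as in the code above.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data, reconstructed from the counts in the output above
df <- data.frame(
  school_number = c(rep(10502, 6), 10503, rep(10505, 3)),
  cats          = as.integer(c(1, 1, 1, 1, 2, 2, 3, 2, 3, 3))
)

result <- df %>%
  count(school_number, cats) %>%        # shorthand for group_by() + tally()
  group_by(school_number) %>%
  mutate(pct = 100 * n / sum(n)) %>%    # share of each cats value per school
  ungroup() %>%
  select(-n) %>%
  # add every school_number / cats combination; missing ones get pct = 0
  complete(school_number, cats = 1:4, fill = list(pct = 0))

result
```

The ungroup() before complete() matters here: on a grouped data frame, complete() works within each group, so it is simpler to drop the grouping first and let it expand over all school numbers at once.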