Using dplyr and reshape2, this is one way to do it. Since you want to summarize the data by ZIP, you can use that variable to group the data. It is unclear whether the MEDIAN values are the same within each ZIP. Here I assumed that they may differ, so I used median(). Using n(), you can count how many fast food stores exist in each ZIP.
library(dplyr)
summarize(group_by(mydf, ZIP), mid = median(MEDIAN), total = n())
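Whether MEDIAN really is constant within each ZIP can be checked directly. Here is a minimal base R sketch, using the ZIP and MEDIAN columns copied from the DATA section of this answer. The name medians_per_zip is chosen here for illustration only.

```r
# ZIP and MEDIAN columns copied from the DATA section of this answer
mydf <- data.frame(
  ZIP = c(1001, 1007, rep(1008, 4), 1011, 1013, rep(1020, 5), rep(1027, 2)),
  MEDIAN = c(56663, 79076, rep(63980, 4), 63476, 36578,
             rep(50058, 5), rep(58573, 2))
)

# Number of distinct MEDIAN values within each ZIP (illustrative name)
medians_per_zip <- tapply(mydf$MEDIAN, mydf$ZIP, function(x) length(unique(x)))

# TRUE means every ZIP carries exactly one MEDIAN value
all(medians_per_zip == 1)
```

If this returns TRUE for your full data, taking the first MEDIAN value per ZIP gives the same result as median().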
In the second part, you can use dcast(). You want to see how many fast food stores there are by type of store. With ZIP and MEDIAN on the left-hand side of the formula and RESTAURANT on the right, R counts how many rows (stores) there are for each RESTAURANT within each ZIP and MEDIAN combination.
library(reshape2)
dcast(mydf, ZIP + MEDIAN ~ RESTAURANT, length, value.var = "RESTAURANT")
#   ZIP MEDIAN BurgerKing KFC McDonald TacoBell Wendy's
#1 1001  56663          0   0        1        0       0
#2 1007  79076          0   0        1        0       0
#3 1008  63980          0   0        3        1       0
#4 1011  63476          0   0        1        0       0
#5 1013  36578          0   0        1        0       0
#6 1020  50058          0   1        2        1       1
#7 1027  58573          1   0        1        0       0
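As a cross-check on the counts above, base R's table() produces the same ZIP by RESTAURANT tally without any packages. This is only a sketch: the data frame is rebuilt from the DATA section of this answer, and the name counts is chosen here for illustration.

```r
# Columns copied from the DATA section of this answer
labels <- c("BurgerKing", "KFC", "McDonald's", "TacoBell", "Wendy's")
mydf <- data.frame(
  ZIP = c(1001, 1007, rep(1008, 4), 1011, 1013, rep(1020, 5), rep(1027, 2)),
  RESTAURANT = factor(
    labels[c(3, 3, 3, 3, 4, 3, 3, 3, 2, 3, 3, 5, 4, 1, 3)],
    levels = labels
  )
)

# Cross-tabulate: one row per ZIP, one column per restaurant type
counts <- with(mydf, table(ZIP, RESTAURANT))
counts

# Individual cells are addressed by their ZIP and restaurant names
counts["1008", "McDonald's"]
```

Unlike dcast(), table() returns a contingency table rather than a data frame, and it does not carry the MEDIAN column along.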
If you are using data.table, you can do the following.
library(data.table)
setDT(mydf)[, list(mid = first(MEDIAN), total = .N), by = ZIP][]
# If you calculate median
setDT(mydf)[, list(mid = as.double(median(MEDIAN)), total = .N), by = ZIP][]
dcast(setDT(mydf), ZIP + MEDIAN ~ RESTAURANT, fun = length, value.var = "RESTAURANT")
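The answer does not say why median() is wrapped in as.double(). One plausible reason (my reading) is that median() on an integer column returns an integer for groups with an odd number of rows and a double for groups with an even number, while data.table expects each result column to have the same type in every group. Depending on the data.table version, the unwrapped call may therefore stop with a type error. The base R sketch below only demonstrates the type difference; odd_group and even_group are made-up vectors.

```r
odd_group  <- c(1L, 2L, 3L)  # odd number of integer values
even_group <- c(1L, 3L)      # even number of integer values

typeof(median(odd_group))             # "integer": the middle element is returned
typeof(median(even_group))            # "double": the mean of the two middle elements
typeof(as.double(median(odd_group)))  # "double": the wrapper makes the type uniform
```

In the sample data the ZIP groups contain 1, 2, 4 and 5 rows, so both cases occur.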
DATA
mydf <-structure(list(Row_NUM = c(26800L, 33161L, 23706L, 23709L, 30007L, 30008L, 30009L, 24429L, 15323L, 29196L, 33127L, 39362L, 44914L, 2542L, 35242L), ZIP = c(1001L, 1007L, 1008L, 1008L, 1008L, 1008L, 1011L, 1013L, 1020L, 1020L, 1020L, 1020L, 1020L, 1027L, 1027L ), MEDIAN = c(56663L, 79076L, 63980L, 63980L, 63980L, 63980L, 63476L, 36578L, 50058L, 50058L, 50058L, 50058L, 50058L, 58573L, 58573L), RESTAURANT = structure(c(3L, 3L, 3L, 3L, 4L, 3L, 3L, 3L, 2L, 3L, 3L, 5L, 4L, 1L, 3L), .Label = c("BurgerKing", "KFC", "McDonald's", "TacoBell", "Wendy's"), class = "factor")), .Names = c("Row_NUM", "ZIP", "MEDIAN", "RESTAURANT"), class = "data.frame", row.names = c(NA, -15L))