I am having a problem with dummy coding the following dataset.
Sample data, in a data frame called mydata:

| ID | NAMES          |
| -- | -------------- |
| 1  | 4444, 333, 456 |
| 2  | 333            |
| 3  | 456, 765       |

I would like to turn each unique value in NAMES into its own column and code, for each row, whether that value is present (1) or not (0).
Output Required:

| ID | NAMES          | 4444 | 333 | 456 | 765 |
| -- | -------------- | ---- | --- | --- | --- |
| 1  | 4444, 333, 456 | 1    | 1   | 1   | 0   |
| 2  | 333            | 0    | 1   | 0   | 0   |
| 3  | 456, 765       | 0    | 0   | 1   | 1   |

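What I have done so far is create a vector of the unique values:

    library(stringr)   # str_split(), str_trim()
    library(reshape2)  # dcast() (assumed to come from reshape2)

    # Split each NAMES entry on commas, then trim and de-duplicate the pieces
    split <- str_split(string = mydata$NAMES, pattern = ",")
    vec <- unique(str_trim(unlist(split)))

    # Drop empty strings and keep the result as a one-column data frame
    remove <- ""
    vec <- as.data.frame(vec[!vec %in% remove])
    colnames(vec) <- "var"
    vecRef <- as.vector(vec$var)

    # Cast to wide format, one column per unique value
    namesCast <- dcast(data = vec, formula = . ~ var)
    namesCast <- namesCast[, 2:ncol(namesCast)]
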
This gives a vector of the unique values in NAMES (vecRef), with surrounding whitespace and empty entries removed. From there, I have no idea how to do the mapping / dummy coding, so any help would be greatly appreciated!
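
One possible way to do the mapping, as a minimal sketch in base R rather than a definitive solution: split each row, then test every unique value for membership in that row's pieces. The sample data frame is rebuilt from the table above; the names parts, dummies and result are hypothetical, and the sketch assumes NAMES is a character column with comma-separated values.

```r
# Sample data reconstructed from the table in the question
mydata <- data.frame(
  ID = 1:3,
  NAMES = c("4444, 333, 456", "333", "456, 765"),
  stringsAsFactors = FALSE
)

# Split each row on commas and trim surrounding whitespace
parts <- lapply(strsplit(mydata$NAMES, ",", fixed = TRUE), trimws)

# Unique non-empty values, in order of first appearance
vecRef <- unique(unlist(parts))
vecRef <- vecRef[vecRef != ""]

# One row per record, one column per unique value: 1 if present, 0 if not
dummies <- do.call(
  rbind,
  lapply(parts, function(p) as.integer(vecRef %in% p))
)
colnames(dummies) <- vecRef

# Attach the indicator columns to the original data
result <- cbind(mydata, dummies)
print(result)
```

Because the columns are named after the values themselves, they have to be accessed with quotes or backticks, for example result[["4444"]]. Building the matrix with rbind over a list keeps one row per record even when there is only a single unique value.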