If I have a column in a dataset that takes several categorical values, how do I go about creating dummy variables from it?
Example: Suppose I have a column called color whose possible values are Red, Green, Yellow, Blue, Pink and Gray, the options for the color of the car.
What is the best way to turn such a column into a factor without manually creating an empty variable for each value?
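To make the goal concrete, here is a minimal sketch of what is being asked for, assuming a toy data frame. The names `cars` and `price` and all the values are hypothetical; only the `color` column comes from the example above. It relies on base R's `factor()`, `model.matrix()` and `lm()`.

```r
# Hypothetical toy data; only the `color` column mirrors the example above.
cars <- data.frame(
  color = c("Red", "Green", "Yellow", "Blue", "Pink", "Gray", "Red", "Blue"),
  price = c(10, 12, 9, 14, 8, 11, 10.5, 13.5)
)

# Store the column as a factor. lm() expands a factor into dummy
# variables on its own, so no dummy columns need to be built by hand.
cars$color <- factor(cars$color)

# model.matrix() shows the dummy columns that lm() would construct.
# With the default treatment contrasts, the first factor level is
# dropped and serves as the baseline.
dummies <- model.matrix(~ color, data = cars)
print(colnames(dummies))

# The factor can be used directly in the model formula.
fit <- lm(price ~ color, data = cars)
print(coef(fit))
```

With six colors this yields an intercept plus five dummy columns, one per non-baseline level.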
Edit: So I did what Greg recommended, and this is what I have. I am puzzled by the NAs in the output; I don't know why they are there.
> data$Trim <- factor(data$Trim)
> data$Model <- factor(data$Model)
> data$Type <- factor(data$Type)
> data = cbind(Price, Mileage, Buick, Cadillac, Chevrolet, Pontiac, SAAB, Saturn, Model, Trim, Type, Cylinder, Liter, Doors, Cruise, Sound, Leather)
> fit <- lm(Price ~ Mileage + Buick + Cadillac + Chevrolet + Pontiac + SAAB + Saturn + Model + Trim + Type + Cylinder + Liter + Doors + Cruise + Sound + Leather, x = TRUE)
> summary(fit)
Then I get the message "Coefficients: (21 not defined because of singularities)", and the coefficients for some variables are NA.
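One common cause of NA coefficients in `lm()` output is perfect collinearity among the predictors, for instance when a make dummy is fully determined by a model factor. The sketch below reproduces that situation on invented data; the data frame `toy`, its values, and the assignment of models to makes are hypothetical and are not taken from the dataset above, so this may or may not be the cause here.

```r
# Hypothetical toy data: each model belongs to exactly one make, so the
# `buick` dummy is a linear combination of the `model` dummy columns.
toy <- data.frame(
  price = c(20, 21, 30, 31, 25, 26, 35, 36),
  buick = c(1, 1, 1, 1, 0, 0, 0, 0),
  model = factor(c("Century", "Century", "Lacrosse", "Lacrosse",
                   "Aveo", "Aveo", "Cobalt", "Cobalt"))
)

fit_toy <- lm(price ~ buick + model, data = toy)

# The design matrix is rank deficient, so lm() cannot estimate every
# coefficient and reports the redundant one as NA.
print(coef(fit_toy))

# alias() lists which terms are linear combinations of the others.
print(alias(fit_toy))
```

Dropping either the redundant dummies or the nested factor from the formula removes the singularity in this toy case.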