R - model with many dummy variables

If I have a column in a dataset that takes several values, how do I start creating dummy variables from it?

Example: Suppose I have a column called color that has: Red, Green, Yellow, Blue, Pink and Gray as options for the color of the car.

What is the best way to turn these values into factors without manually creating an empty variable for each one?
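A minimal sketch of the usual approach: convert the column to a factor with `factor()`, and `model.matrix()` will show the 0/1 dummy columns that R builds from it. The data frame `cars_df` and its values are hypothetical toy data, not from the question. For fitting with `lm()` you normally only need the `factor()` step; `model.matrix()` is shown just to make the dummies visible.

```r
# Hypothetical toy data: a character column holding the six colors from the question
cars_df <- data.frame(
  color = c("Red", "Green", "Yellow", "Blue", "Pink", "Gray", "Red", "Blue")
)

# Convert the column to a factor; levels are sorted alphabetically by default
cars_df$color <- factor(cars_df$color)
levels(cars_df$color)   # "Blue" "Gray" "Green" "Pink" "Red" "Yellow"

# Dummy coding as lm() would do it: intercept plus one 0/1 column per
# non-reference level ("Blue", the first level, is the reference)
X <- model.matrix(~ color, data = cars_df)
colnames(X)

# One 0/1 column for every level (no intercept), if all six indicators are needed
X_full <- model.matrix(~ color - 1, data = cars_df)
colnames(X_full)        # "colorBlue" "colorGray" "colorGreen" "colorPink" "colorRed" "colorYellow"
```

Dropping the reference level in the first matrix is deliberate: with an intercept, all six indicators together would be perfectly collinear.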

Edit: I did what Greg recommended, and this is what I have. I am puzzled by the NAs that come out, because I don't know why they are there.

```r
> data$Trim<-factor(data$Trim)
> data$Model<-factor(data$Model)
> data$Type<-factor(data$Type)
> data=cbind(Price,Mileage,Buick,Cadillac,Chevrolet,Pontiac,SAAB,Saturn,Model,Trim,Type,Cylinder,Liter,Doors,Cruise,Sound,Leather)
> fit <- lm( Price ~ Mileage+Buick+Cadillac+Chevrolet+Pontiac+SAAB+Saturn+Model+Trim+Type+Cylinder+Liter+Doors+Cruise+Sound+Leather, x=TRUE )
> summary(fit)
```

Then I get the message "Coefficients: (21 not defined because of singularities)", and for some variables the estimate is NA.
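A minimal sketch of one common cause of that message, perfect collinearity among the predictors. It is an assumption that this applies to the data above: if Buick, Cadillac, Chevrolet, Pontiac, SAAB and Saturn are 0/1 indicators that together cover every make, they sum to 1 in every row and duplicate the intercept, and a Model factor nested inside the make would add further exact dependencies. The sketch uses the built-in `mtcars` data instead, and the columns `manual` and `automatic` are made up for illustration.

```r
# Built-in mtcars data: am is 0 for automatic and 1 for manual transmission
d <- mtcars
d$manual    <- d$am        # hypothetical dummy: 1 if manual
d$automatic <- 1 - d$am    # hypothetical dummy: 1 if automatic, so manual + automatic == 1

# Together with the intercept these two dummies are perfectly collinear,
# so lm() cannot estimate both and reports the later one as NA
fit_bad <- lm(mpg ~ wt + manual + automatic, data = d)
coef(fit_bad)
summary(fit_bad)   # prints "Coefficients: (1 not defined because of singularities)"

# alias() shows which term is an exact linear combination of the others
alias(fit_bad)

# Fix: use a single factor and let R choose the reference level
fit_ok <- lm(mpg ~ wt + factor(am), data = d)
coef(fit_ok)
```

The same idea applies to the car data: either drop one make indicator or replace the six indicators with a single make factor, and check with `alias()` whether Model is determined by the make.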


R creates dummy variables automatically when a predictor is a factor. Here is an example:

```r
> mycars <- mtcars
> mycars$cyl <- factor(mycars$cyl)
> fit <- lm( mpg ~ wt+cyl, data=mycars, x=TRUE )
> summary(fit)

Call:
lm(formula = mpg ~ wt + cyl, data = mycars, x = TRUE)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.5890 -1.2357 -0.5159  1.3845  5.7915 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  33.9908     1.8878  18.006  < 2e-16 ***
wt           -3.2056     0.7539  -4.252 0.000213 ***
cyl6         -4.2556     1.3861  -3.070 0.004718 ** 
cyl8         -6.0709     1.6523  -3.674 0.000999 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 2.557 on 28 degrees of freedom
Multiple R-squared:  0.8374,  Adjusted R-squared:   0.82 
F-statistic: 48.08 on 3 and 28 DF,  p-value: 3.594e-11 

> head(fit$x)
                  (Intercept)    wt cyl6 cyl8
Mazda RX4                   1 2.620    1    0
Mazda RX4 Wag               1 2.875    1    0
Datsun 710                  1 2.320    0    0
Hornet 4 Drive              1 3.215    1    0
Hornet Sportabout           1 3.440    0    1
Valiant                     1 3.460    1    0
> 
```

x=TRUE in the call to lm tells it to return the model matrix x that was actually used, which includes the dummy variables. If you do not need to look at the generated dummy variables, you can leave it out. See ?contrasts for more details if you want to change the way the dummy variables are created.
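As a sketch reusing the `mycars` example: `model.matrix()` builds the same matrix that `x=TRUE` returns without fitting a model, and the dummy coding can be changed either by choosing another reference level with `relevel()` or by passing a different contrast function. The copy `mycars8` and the choice of `contr.sum` are illustrative assumptions, not the only options described in ?contrasts.

```r
mycars <- mtcars
mycars$cyl <- factor(mycars$cyl)

# The matrix returned by lm(..., x = TRUE) can be built directly with model.matrix()
fit <- lm(mpg ~ wt + cyl, data = mycars, x = TRUE)
X <- model.matrix(mpg ~ wt + cyl, data = mycars)
all.equal(fit$x, X, check.attributes = FALSE)   # TRUE: same values

# Default treatment contrasts: level "4" is the reference, so there is no cyl4 column
colnames(X)                  # "(Intercept)" "wt" "cyl6" "cyl8"

# Change the reference level with relevel()
mycars8 <- mycars
mycars8$cyl <- relevel(mycars8$cyl, ref = "8")
colnames(model.matrix(~ wt + cyl, data = mycars8))   # "(Intercept)" "wt" "cyl4" "cyl6"

# Or switch the coding scheme, e.g. sum-to-zero contrasts
X_sum <- model.matrix(~ wt + cyl, data = mycars,
                      contrasts.arg = list(cyl = "contr.sum"))
colnames(X_sum)              # "(Intercept)" "wt" "cyl1" "cyl2"
```

With treatment contrasts each dummy coefficient is a difference from the reference level; with `contr.sum` the coefficients are deviations from the average over levels, so the interpretation of the estimates changes even though the fitted values do not.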


Source: https://habr.com/ru/post/1447051/
