Sorting a subset of data in R

I am learning R (version 3.1.2), so this may come across as a beginner question, but I am having trouble ordering a subset of a data frame. If I use the mtcars data frame with attach(mtcars), I can easily order it with ord.cars <- mtcars[order(hp),]. The problem is that if I take a subset, say sub.cars <- subset(mtcars, hp > 120), and try to order it with ord.sub <- sub.cars[order(mpg),], the result is as follows:

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
NA                    NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
NA.1                  NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
NA.2                  NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
NA.3                  NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
NA.4                  NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
NA.5                  NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
NA.6                  NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
NA.7                  NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
NA.8                  NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
NA.9                  NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
NA.10                 NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
NA.11                 NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
NA.12                 NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
NA.13                 NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
NA.14                 NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA

Why does R return NA for all the rows that were left out of the subset?

Thanks in advance!

+6
1 answer

This is a consequence of using attach(), which is discouraged in R for exactly this reason. With mtcars attached, your code is ambiguous, or at least it does something other than what you expected.

How to resolve this?

  • detach the dataset, and
  • do not use attach() again. Use [ and/or $ instead, and, if you like, with() when working with a subset of your data.

Here is how you could do it for your example:

 detach(mtcars)
 ord.cars <- mtcars[order(mtcars$hp), ]
 sub.cars <- subset(mtcars, hp > 120)
 # the subset could also be written as:
 sub.cars <- mtcars[mtcars$hp > 120, ]
 ord.sub <- sub.cars[order(sub.cars$mpg), ]
 head(ord.sub) # only show the first 6 rows

                      mpg cyl disp  hp drat   wt qsec vs am gear carb
 Cadillac Fleetwood  10.4   8  472 205 2.93 5.25 18.0  0  0    3    4
 Lincoln Continental 10.4   8  460 215 3.00 5.42 17.8  0  0    3    4
 Camaro Z28          13.3   8  350 245 3.73 3.84 15.4  0  0    3    4
 Duster 360          14.3   8  360 245 3.21 3.57 15.8  0  0    3    4
 Chrysler Imperial   14.7   8  440 230 3.23 5.34 17.4  0  0    3    4
 Maserati Bora       15.0   8  301 335 3.54 3.57 14.6  0  1    5    8
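
The list above mentions with() as an alternative; a minimal sketch of that variant could look like the following. It assumes mtcars is the built-in dataset and is not attached. Inside with(), a bare mpg is looked up in sub.cars first, so there is no ambiguity about which data frame it comes from.

```r
# Build the subset from the built-in mtcars dataset.
sub.cars <- subset(mtcars, hp > 120)

# with() evaluates the expression using sub.cars' columns,
# so order(mpg) here means order(sub.cars$mpg).
ord.sub <- with(sub.cars, sub.cars[order(mpg), ])

# Check that this matches the explicit $ form.
identical(ord.sub, sub.cars[order(sub.cars$mpg), ])
```

Whether to prefer with() or the explicit $ form is mostly a matter of taste; both name the data frame that the column should come from.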

What exactly caused the problem in your code?

After you attached the mtcars data, any bare column name such as mpg refers to the attached data set (the original mtcars data). You then subsetted the data and saved the result in a new object (sub.cars), which is not attached, while mtcars still is. When you tried to order sub.cars with sub.cars[order(mpg),], the mpg inside order() was therefore interpreted by R as the column of the attached (original) mtcars data, which has more rows than the subset. So order(mpg) returns row positions for the full data set, and every position that is larger than the number of rows in sub.cars points at a row that does not exist there. R fills each of those in as a row of NA, one for every row that the subset excluded.
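
To see the mechanism without attaching anything, here is a small sketch that writes out explicitly what the bare mpg resolved to. The names idx and bad are mine and only used for illustration; the counts in the comments are for the mtcars dataset that ships with R.

```r
sub.cars <- subset(mtcars, hp > 120)

# With mtcars attached, order(mpg) is effectively order(mtcars$mpg):
idx <- order(mtcars$mpg)

length(idx)                 # 32: one position per row of the full mtcars
nrow(sub.cars)              # 17: the subset is shorter
sum(idx > nrow(sub.cars))   # 15: positions with no matching row in sub.cars

# Indexing sub.cars with those positions reproduces the problem.
bad <- sub.cars[idx, ]
nrow(bad)                   # 32
sum(is.na(bad$mpg))         # 15: one all-NA row per out-of-range position
```

Note that the non-NA rows of bad are not sorted by mpg either, because the ordering was computed on a different data frame.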

Lesson: do not use attach().

+7

Source: https://habr.com/ru/post/978546/

