R: Sum of complete.cases in one column, grouped (or sorted) by the value in another column

I am using the airquality dataset that ships with R and am trying to count the number of rows that do not contain any NAs, aggregated by Month.

The data is as follows:

head(airquality)
#   Ozone Solar.R Wind Temp Month Day
# 1    41     190  7.4   67     5   1
# 2    36     118  8.0   72     5   2
# 3    12     149 12.6   74     5   3
# 4    18     313 11.5   62     5   4
# 5    NA      NA 14.3   56     5   5
# 6    28      NA 14.9   66     5   6

As you can see, there are NAs in the columns Ozone and Solar.R. I used the function complete.cases as follows:

x  <- airquality[,1] # for the Ozone
y  <- airquality[,2] # for the Solar.R
ok <- complete.cases(x,y)

And then to check:

nrow(airquality)
# [1] 153
sum(!ok)
# [1] 42
sum(ok)
# [1] 111

which is great.

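But now I would like to break this count down by Month (column 5), and this is where I ran into problems, both when trying aggregate and when trying to sort by the value in column 5 (Month).

I did get the following to run, although it does not group by Month (it only confirms that I am passing the inputs in a valid form):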

aggregate(x = sum(complete.cases(airquality)), by= list(nrow(airquality)), FUN = sum)
#   Group.1   x
# 1     153 111

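So far so good, but I think the problem is in how I specify the by argument. I assume it has to be a list containing column 5 of airquality, which I tried to pass as: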

- airquality[,5]
- airquality[,"Month"]

These are the results:

aggregate(x = sum(complete.cases(airquality)), by= list(airquality[,5]), FUN = sum)
# Error in aggregate.data.frame(as.data.frame(x), ...) : 
#   arguments must have same length

aggregate(x = sum(complete.cases(airquality)), by= 
      list(sum(complete.cases(airquality)),airquality[,5]), FUN = sum)
# Error in aggregate.data.frame(as.data.frame(x), ...) : 
#   arguments must have same length

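I tried to make sense of the documentation in ?aggregate(x, ...), in particular the description of by:

by - a list of grouping elements, each as long as the variables in the data frame x. The elements are coerced to factors before use.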

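I also looked at ?factor, but could not see how to apply it here, and trying a break = argument did not help either.

What I am after is "group by"-style functionality, the way I would write it in, say, SQL.

The expected output would be: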

Count  Month
  24       5
   9       6
  26       7
  23       8
  29       9

This is straightforward with data.table, using its by argument:

require( data.table )
dt <- data.table( airquality )
dt[ , list( Count = sum( complete.cases( Ozone , Solar.R ) ) ), by = Month ]

#   Month Count
#1:     5 24
#2:     6  9
#3:     7 26
#4:     8 23
#5:     9 29

Or in base R, using the formula interface of aggregate:

airquality$ok <- complete.cases( airquality$Ozone , airquality$Solar.R )
aggregate( ok ~ Month , data = airquality , FUN = sum )
#  Month ok
#1     5 24
#2     6  9
#3     7 26
#4     8 23
#5     9 29
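
For what it is worth, the calls in the question appear to fail because x had already been collapsed to a single number by sum(), while by had 153 elements, hence "arguments must have same length". A minimal sketch of the non-formula interface, assuming the logical vector is passed uncollapsed and FUN = sum does the counting per group (the names ok and res are only illustrative):

```r
# Logical vector with one element per row of airquality (length 153)
ok <- complete.cases(airquality$Ozone, airquality$Solar.R)

# x and every element of by must have the same length, so pass the
# uncollapsed vector and let FUN = sum count the TRUEs within each Month
res <- aggregate(x = ok, by = list(Month = airquality$Month), FUN = sum)
res
```

Naming the list element (Month = ...) replaces the default Group.1 column name seen in the question's output.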

A variation on @Simon's data.table solution:

dt[complete.cases(Ozone, Solar.R), list(count = .N), by=Month]
#    Month count
# 1:     5    24
# 2:     6     9
# 3:     7    26
# 4:     8    23
# 5:     9    29

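Here, we first keep only the rows in which neither Ozone nor Solar.R is NA, and then count the remaining rows by Month.

Note: .N is a special data.table variable, an integer of length 1 that holds the number of rows in the current group.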
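
To illustrate what .N holds, here is a small sketch that compares the unfiltered and filtered counts. It assumes data.table is installed; dt is rebuilt here only so that the snippet runs on its own.

```r
library(data.table)

dt <- as.data.table(airquality)

# Without a filter in i, .N is simply the number of rows in each Month group
all_rows <- dt[, .N, by = Month]

# With the filter in i, .N counts only the rows that survived it
complete_rows <- dt[complete.cases(Ozone, Solar.R), .N, by = Month]

all_rows
complete_rows
```

When .N is used unnamed in j, the resulting column is called N.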


Here is a dplyr solution.

library(dplyr)

# %>% replaces the old %.% operator, which is no longer part of dplyr
airquality %>%
  group_by(Month) %>%
  summarize(incomplete = sum(!complete.cases(Ozone, Solar.R)),
            complete = sum(complete.cases(Ozone, Solar.R)))

#  Month incomplete complete
#1     5          7       24
#2     6         21        9
#3     7          5       26
#4     8          8       23
#5     9          1       29
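
If only the complete count is needed, a shorter variant is possible. This is a sketch that assumes a dplyr version in which count() accepts the name argument; res is an illustrative name.

```r
library(dplyr)

# Keep the rows where both columns are non-missing, then count rows per Month
res <- airquality %>%
  filter(complete.cases(Ozone, Solar.R)) %>%
  count(Month, name = "complete")
res
```

One difference from the summarize() version: a month with no complete rows at all would be dropped from the result rather than reported as 0.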

This seems to be what you are looking for:

> foo <- table(airquality[!ok,"Month"])
> data.frame(Month=names(foo),Count=as.vector(foo))
  Month Count
1     5     7
2     6    21
3     7     5
4     8     8
5     9     1

(This differs a bit from the output in your edit. Is it possible that there is some confusion between ok and !ok?)
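
To see both counts side by side and rule out such a mix-up, one option is to cross-tabulate Month against the flag. A minimal sketch, with ok recomputed as in the question and tab as an illustrative name:

```r
ok <- complete.cases(airquality$Ozone, airquality$Solar.R)

# Cross-tabulate Month against the logical flag: the FALSE column holds the
# incomplete rows per month, the TRUE column the complete ones
tab <- table(Month = airquality$Month, complete = ok)
tab
```

The TRUE column should match the expected output in the question, and the FALSE column the counts shown above.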


Source: https://habr.com/ru/post/1540383/

