I have a dataset containing salary data, and I want to combine TotalPay with JobTitle using aggregate(). Not all cells have values, so I passed na.action = na.pass and na.rm = TRUE, but the call still gives me warnings. Is that because JobTitle is a factor?
So far I have developed the code below:
aggregate(salaries$JobTitle,
          list(pay = salaries$TotalPay),
          FUN = mean,
          na.action = na.pass,
          na.rm = TRUE)
My test data has the following columns:
'data.frame': 104 obs. of 36 variables:
$ Id : int 1 2 3 4 5 6 7 8 9 10 ...
$ EmployeeName : Factor w/ 11 levels "","ALBERT PARDINI",..: 10 7 2 4 11 6 3 5 9 8 ...
$ JobTitle : Factor w/ 9 levels "","ASSISTANT DEPUTY CHIEF II",..: 8 4 4 9 6 2 3 7 3 5 ...
$ BasePay : num 167411 155966 212739 77916 134402 ...
$ OvertimePay : num 0 245132 106088 56121 9737 ...
$ OtherPay : num 400184 137811 16453 198307 182235 ...
$ Benefits : logi NA NA NA NA NA NA ...
$ TotalPay : num 567595 538909 335280 332344 326373 ...
$ TotalPayBenefits: num 567595 538909 335280 332344 326373 ...
$ Year : int 2011 2011 2011 2011 2011 2011 2011 2011 2011 2011 ...
$ Notes : logi NA NA NA NA NA NA ...
$ Agency : Factor w/ 2 levels "","San Francisco": 2 2 2 2 2 2 2 2 2 2 ..
The messages that appear are warnings rather than errors:
Warning messages:
1: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
2: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
etc...
I tried the same call with salaries$Id in place of salaries$JobTitle and it works without warnings, so I assume the code itself is correct. Do I need to change the data type of JobTitle?
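One thing that may be worth checking, sketched here on made-up data: in aggregate(x, by, FUN), x holds the values that FUN is applied to and by holds the grouping, so mean() in the call above receives the JobTitle factor rather than the numeric pay. The sketch below assumes the goal is the mean TotalPay per JobTitle. The `toy` data frame and its values are hypothetical stand-ins, not the real dataset. Note also that na.action is an argument of the formula method of aggregate(), so it is shown only in the second variant.

```r
# Hypothetical toy data standing in for `salaries` (values are invented)
toy <- data.frame(
  JobTitle = factor(c("CAPTAIN", "CAPTAIN", "CHIEF", "CHIEF")),
  TotalPay = c(100, 300, 500, NA)
)

# Variant 1: x is the numeric column to average, by is the grouping factor.
# na.rm = TRUE is passed through to mean(), so the NA pay value is skipped.
mean_pay <- aggregate(toy$TotalPay,
                      by = list(JobTitle = toy$JobTitle),
                      FUN = mean,
                      na.rm = TRUE)

# Variant 2: formula interface. na.action = na.pass keeps rows containing NA
# so that mean(..., na.rm = TRUE) handles them instead of dropping them first.
mean_pay_formula <- aggregate(TotalPay ~ JobTitle,
                              data = toy,
                              FUN = mean,
                              na.action = na.pass,
                              na.rm = TRUE)

print(mean_pay)
print(mean_pay_formula)
```

If this matches the intent, the same pattern would apply with salaries$TotalPay and salaries$JobTitle, and JobTitle could stay a factor since it is only used for grouping.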