mean(, na.rm = TRUE) still returns NA

I am very new to R (moving over from SPSS). I am using RStudio on a Mac running Mavericks. Please answer in words of two syllables, as this is my first real attempt at anything like this. I have worked through some basic tutorials and can make everything work on the sample data.

I have a dataset with 64,000-odd rows and about 20 columns. I want the mean of the variable "hold_time", but everything I try returns either NA, or NA together with a warning message.

I tried all of the following:

 > summary(data_Apr_Jun$hold_time,na.rm=TRUE)
        5       6       7       4       8       2       1       3      10
     9596    9191    3192    1346    1145     977     940     655     534
       11       9      12       0      13      15      14      16      17
      490     444     249     128     106      86      73      68      40
       98     118     121     128     125      97     101     188      86
       31      29      28      28      27      27      26      26      26
      102     105     113      81     119     139     127     134     152
       25      25      25      25      24      24      23      23      23
       18      69      96     106     110     111     120     190      76
       23      23      23      22      22      22      22      22      22
       82     132     135     156     166      94     115     116     117
       22      21      21      21      21      21      20      20      20
      142     153     165      19      93     100     104     112     126
       20      20      20      20      20      19      19      19      19
      131     138     143     157     177     189      61      87     103
       19      19      19      19      19      19      19      19      18
      108     148     176     212      54      56      64      74      79
       18      18      18      18      18      18      18      18      18
       99     107     129     163     168     171     178     226     236
       18      17      17      17      17      17      17      17      17
       59      71      78      95     114     122     123     130 (Other)
       17      17      17      17      16      16      16      16    2739
       NA
    29807
 > mean(as.numeric(data_Apr_Jun$hold_time,NA.rm=TRUE))
 [1] NA
 > data_Apr_Jun$hold_time[data_Apr_Jun$hold_time=="NA"]<-0
 > mean(as.numeric(data_Apr_Jun$hold_time))
 [1] NA
 > mean(data_Apr_Jun$hold_time)
 [1] NA
 Warning message:
 In mean.default(data_Apr_Jun$hold_time) :
   argument is not numeric or logical: returning NA
 > mean(as.numeric(data_Apr_Jun$hold_time,na.rm=TRUE))
 [1] NA
 > colMeans(data_Apr_Jun$hold_time)
 Error in colMeans(data_Apr_Jun$hold_time) :
   'x' must be an array of at least two dimensions
 > colMeans(data_Apr_Jun)
 Error in colMeans(data_Apr_Jun) : 'x' must be numeric
 > mean(data_Apr_Jun$hold_time,na.omit)
 [1] NA
 Warning message:
 In mean.default(data_Apr_Jun$hold_time, na.omit) :
   argument is not numeric or logical: returning NA

So although I am removing the NAs, they do not seem to be removed. I'm confused.

+5
2 answers

Hello Rnovice, unfortunately there are several errors here. Let's go through them one at a time:

 > mean(as.numeric(data_Apr_Jun$hold_time,NA.rm=TRUE))
 [1] NA

This is caused by a misplaced and misspelled na.rm. It should be

 mean(as.numeric(data_Apr_Jun$hold_time),na.rm=TRUE) 
  • na.rm is an argument of mean, not of as.numeric (watch the brackets)
  • R is case sensitive: the argument is na.rm, not NA.rm
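
To make the bracket point concrete, here is a minimal sketch. The vector hold_time below is made-up toy data, not the asker's column, and it is already numeric so that only the placement of na.rm is being shown.

```r
# Hypothetical toy vector standing in for hold_time (already numeric here)
hold_time <- c(5, 6, NA, 7)

# na.rm handed to as.numeric(): it has no effect there, so mean() still sees the NA
wrong <- mean(as.numeric(hold_time, na.rm = TRUE))

# na.rm handed to mean(): the NA is dropped before averaging
right <- mean(as.numeric(hold_time), na.rm = TRUE)

print(wrong)  # NA
print(right)  # 6
```

The only difference between the two calls is which closing bracket comes before na.rm.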

======================================================================================

 > data_Apr_Jun$hold_time[data_Apr_Jun$hold_time=="NA"]<-0 

Comparing against NA with == does not work in R, as I pointed out here: "Something strange regarding the return of NAs"

What you meant is

 data_Apr_Jun$hold_time[which(is.na(data_Apr_Jun$hold_time))] <- 0 

One more note: =="NA" compares against the string "NA", not against the missing value NA. Try is.na("NA") and is.na(NA) to see the difference.
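
A small sketch of that difference, using a made-up vector x rather than the real data:

```r
x <- c(5, NA, 7)      # hypothetical toy data with one missing value

cmp <- x == "NA"      # the missing element compares to NA, never to TRUE
print(cmp)            # FALSE NA FALSE

print(is.na("NA"))    # FALSE: "NA" is an ordinary string
print(is.na(NA))      # TRUE: NA is the missing-value marker

x[is.na(x)] <- 0      # is.na() is the reliable way to find missing values
print(x)              # 5 0 7
```

Whether replacing missing values with 0 is sensible for a mean is a separate question; passing na.rm = TRUE to mean leaves the data untouched.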

======================================================================================

 > colMeans(data_Apr_Jun$hold_time)
 Error in colMeans(data_Apr_Jun$hold_time) :
   'x' must be an array of at least two dimensions

Try data_Apr_Jun$hold_time on its own and you will see that it returns a vector. A vector has no columns, which is why a column mean (what colMeans computes) makes no sense here.
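
As a rough illustration, assuming a made-up two-column data frame df in place of data_Apr_Jun:

```r
# Hypothetical toy data frame; both columns are numeric
df <- data.frame(hold_time = c(5, 6, NA, 7), other = c(1, 2, 3, 4))

print(dim(df$hold_time))   # NULL: a column pulled out with $ is a plain vector

# For a single vector, use mean()
vec_mean <- mean(df$hold_time, na.rm = TRUE)

# colMeans() expects something two-dimensional whose columns are all numeric
col_means <- colMeans(df, na.rm = TRUE)

print(vec_mean)
print(col_means)
```

The second error in the question, 'x' must be numeric, is what colMeans reports when the data frame contains non-numeric columns.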

I hope the rest is clear, or at least within reach, with these hints. One very important thing you have already understood:
Use R! You are on the right track!

+11

Unfortunately, as.numeric applied to a factor returns the factor's internal integer codes rather than the original values, which leads to wrong answers. Don't apply it directly to factors.
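
A minimal sketch of the problem and the usual workaround, assuming hold_time was read in as a factor (the summary output and the "argument is not numeric or logical" warning suggest so, but that is an inference). The factor f below is made-up toy data.

```r
# Hypothetical factor mimicking a numeric column that was read in as a factor
f <- factor(c("10", "5", "5", NA, "20"))

codes  <- as.numeric(f)                 # internal level codes, not the values
values <- as.numeric(as.character(f))   # go through character to recover the numbers

print(codes)                       # 1 3 3 NA 2
print(values)                      # 10 5 5 NA 20

print(mean(codes,  na.rm = TRUE))  # 2.25, meaningless
print(mean(values, na.rm = TRUE))  # 10
```

Applied to the question, this would read mean(as.numeric(as.character(data_Apr_Jun$hold_time)), na.rm = TRUE), again under the assumption that the column really is a factor.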

+2

Source: https://habr.com/ru/post/1200280/