Conditional Average Statement

I have a dataset called bwght that contains the variable cigs (cigarettes smoked per day).

When I calculate the average of cigs in the bwght dataset using mean(bwght$cigs), I get 2.08.

Only 212 of the 1388 women in the sample smoke (the other 1176 do not):

summary(bwght$cigs>0) gives the result:

    Mode   FALSE    TRUE      NA
 logical    1176     212       0

I am asked to find the average number of cigs among women who smoke (212).

I am struggling to find the correct syntax to exclude the non-smokers (cigs equal to 0). I tried:

  • mean(bwght$cigs| bwght$cigs>0)

  • mean(bwght$cigs>0 | bwght$cigs=TRUE)

  • if (bwght$cigs > 0){ sum(bwght$cigs) }

  • x <-as.numeric(bwght$cigs, rm="0"); mean(x)

But nothing works! Can anyone help me out?

2 answers

If you want to exclude non-smokers, you have several options. The easiest way is:

 mean(bwght[bwght$cigs>0,"cigs"]) 

With a data frame, the first index is the row and the second is the column. Thus, you can subset using dataframe[1,2] to get the first row, second column. You can also use a logical condition in the row position. Using bwght$cigs>0 as the first index, you select only the rows in which cigs is non-zero.
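As a minimal sketch of the row/column indexing described above, here is a toy data frame standing in for bwght. The data frame `toy` and all of its values are invented for illustration; only the column name cigs comes from the question.

```r
# Toy stand-in for bwght; every value here is made up for illustration.
toy <- data.frame(
  cigs   = c(0, 0, 10, 0, 20, 5),
  weight = c(110, 120, 100, 115, 95, 105)
)

first_cell <- toy[1, 2]                 # first row, second column

# Logical condition in the row position, column name in the column position.
smokers <- toy[toy$cigs > 0, "cigs"]    # keeps only the rows where cigs > 0

smoker_mean  <- mean(smokers)           # average among smokers only
overall_mean <- mean(toy$cigs)          # average over everyone, zeros included
```

With a plain data.frame, selecting a single column this way returns an ordinary numeric vector, which is why mean() can be applied to it directly.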

Your other attempts did not work for the following reasons:

 mean(bwght$cigs| bwght$cigs>0) 

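This is actually a logical comparison. The | operator asks for the TRUE / FALSE result of bwght$cigs OR bwght$cigs>0, and mean() is then taken over those logical values. R does accept logical input to mean(): TRUE counts as 1 and FALSE as 0, so what you get back is the proportion of rows that are TRUE (here the share of smokers, 212/1388), not the average number of cigarettes.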
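To make the point about | concrete, here is a small sketch on an invented vector (the values are not from the bwght data). It shows that the | expression yields logical values and that mean() of a logical vector is the share of TRUE entries.

```r
# Invented values, for illustration only.
cigs <- c(0, 0, 10, 0, 20, 5)

# cigs is coerced to logical (non-zero becomes TRUE), then OR-ed with cigs > 0.
either <- cigs | cigs > 0

is_logical <- is.logical(either)   # the result is a logical vector
share_true <- mean(either)         # proportion of TRUE, not a mean of cigs
```

Three of the six invented values are non-zero, so share_true is 0.5 even though the average of the non-zero values is much larger.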

 mean(bwght$cigs>0 | bwght$cigs=TRUE) 

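Same problem. You use the | operator, which returns logical values, and R then tries to take the mean of those logicals. In addition, a single = is not a comparison in R (that is ==), so the expression is not valid as written.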

 if(bwght$cigs > 0){sum(bwght$cigs)} 

Were you a SAS programmer originally? This is similar to what I typed at first. Basically, if() does not work the same way as in SAS. In this example, you use bwght$cigs > 0 as the if condition, which will not work, because if() expects a single TRUE or FALSE: R will only look at the first element of the vector resulting from bwght$cigs > 0 (and R 4.2.0 and later raise an error instead). R handles looping differently from SAS: look at vectorized operations and functions like lapply, tapply, etc.
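As a rough illustration of the vectorized style mentioned above, the sketch below replaces the if() attempt with logical indexing and tapply(). The vector is invented for illustration, and the names is_smoker, total_smoked and by_group are hypothetical.

```r
# Invented values, for illustration only.
cigs <- c(0, 0, 10, 0, 20, 5)

is_smoker <- cigs > 0                   # one TRUE/FALSE per element

# Instead of if (cigs > 0) { sum(cigs) }: index with the logical vector.
total_smoked <- sum(cigs[is_smoker])

# tapply() applies mean() separately within each group of is_smoker.
by_group <- tapply(cigs, is_smoker, mean)
smoker_mean <- by_group[["TRUE"]]
nonsmoker_mean <- by_group[["FALSE"]]
```

The condition is evaluated for the whole vector at once, so no explicit loop or if() is needed.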

 x <-as.numeric(bwght$cigs, rm="0"); mean(x)

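I honestly don't know what this will do. It might work if rm="0" did not have quotes ...? As far as I can tell, though, as.numeric() has no rm argument, so it would not drop the zeros either way.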

 mean(bwght[bwght$cigs>0,"cigs"]) 

I found that this statement failed, returning NA with the warning "argument is not numeric or logical: returning NA".

Converting to a matrix resolved this:

 mean(data.matrix(bwght[bwght$cigs>0,"cigs"])) 
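One possible cause of that warning, which this answer does not confirm, is that bwght was not a plain data.frame (a tibble, for example), so the single-bracket subset came back as a one-column data frame rather than a numeric vector. The sketch below mimics that situation on an invented data frame by using drop = FALSE, and shows an alternative that extracts the column as a vector first.

```r
# Invented values, for illustration only.
toy <- data.frame(cigs = c(0, 0, 10, 0, 20, 5))

# drop = FALSE keeps the result as a one-column data frame,
# which is what some data-frame-like classes return by default.
as_frame <- toy[toy$cigs > 0, "cigs", drop = FALSE]
frame_kept <- is.data.frame(as_frame)

# mean() on a data frame warns and returns NA; the warning is silenced here.
res_frame <- suppressWarnings(mean(as_frame))

# Extracting the column with $ first always gives a plain vector.
as_vector <- toy$cigs[toy$cigs > 0]
res_vector <- mean(as_vector)
```

Subsetting the column itself, as in bwght$cigs[bwght$cigs > 0], avoids the conversion to a matrix.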

Source: https://habr.com/ru/post/1435792/

