I have a data.table dat with 4 columns (col1, col2, col3, col4).
Input data:
    structure(list(col1 = c(5.1, 5.1, 4.7, 4.6, 5, 5.1, 5.1, 4.7, 4.6, 5),
        col2 = c(3.5, 3.5, 3.2, 3.1, 3.6, 3.5, 3.5, 3.2, 3.1, 3.6),
        col3 = c(1.4, 1.4, 1.3, 1.5, 1.4, 3.4, 3.4, 1.3, 1.5, 1.4),
        col4 = structure(c(1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L),
            .Label = c("setosa", "versicolor", "virginica", "eer"),
            class = "factor")),
        .Names = c("col1", "col2", "col3", "col4"),
        row.names = c(NA, -10L), class = c("data.table", "data.frame"))

        col1 col2 col3   col4
     1:  5.1  3.5  1.4 setosa
     2:  5.1  3.5  1.4 setosa
     3:  4.7  3.2  1.3 setosa
     4:  4.6  3.1  1.5 setosa
     5:  5.0  3.6  1.4 setosa
     6:  5.1  3.5  3.4    eer
     7:  5.1  3.5  3.4    eer
     8:  4.7  3.2  1.3    eer
     9:  4.6  3.1  1.5    eer
    10:  5.0  3.6  1.4    eer
I perform the following operation on col3 for each unique col4 value:
    dat[, r_new := sum(col3, na.rm = TRUE), by = .(col4)]  # syntax 1
Syntax 1 above creates a new column r_new holding the sum of the col3 values that share the same col4. Each unique col4 value therefore gets a single value in the r_new column.
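For reference, here is a minimal sketch of what syntax 1 gives on the input data above. The table is rebuilt with data.table() purely for brevity (that construction is an assumption of this sketch; the values are the ones from the dput). The group totals come out as 7.0 for setosa and 11.0 for eer, so the repeated rows are counted twice.

```r
library(data.table)

# Same values as the input data above, rebuilt with data.table() for brevity
dat <- data.table(
  col1 = c(5.1, 5.1, 4.7, 4.6, 5, 5.1, 5.1, 4.7, 4.6, 5),
  col2 = c(3.5, 3.5, 3.2, 3.1, 3.6, 3.5, 3.5, 3.2, 3.1, 3.6),
  col3 = c(1.4, 1.4, 1.3, 1.5, 1.4, 3.4, 3.4, 1.3, 1.5, 1.4),
  col4 = factor(rep(c("setosa", "eer"), each = 5),
                levels = c("setosa", "versicolor", "virginica", "eer"))
)

# Syntax 1: every row of a col4 group receives that group's total of col3
dat[, r_new := sum(col3, na.rm = TRUE), by = col4]

# One row per group, to show the two totals
unique(dat[, .(col4, r_new)])
```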
Now I want to do the same as above, but without counting a row again when its col1 and col2 values both repeat those of another row in the same group (something like the pseudocode below):
dat[col1 is different OR col2 is different , r_new:= sum(col3, na.rm = T), .(col4)]
In other words, when sum is evaluated for a group, rows that share the same col1 and col2 values should contribute to the total only once.
How can I include this condition in syntax 1?
Expected Result:
        col1 col2 col3   col4 r_new
     1:  5.1  3.5  1.4 setosa   5.6
     2:  5.1  3.5  1.4 setosa   5.6
     3:  4.7  3.2  1.3 setosa   5.6
     4:  4.6  3.1  1.5 setosa   5.6
     5:  5.0  3.6  1.4 setosa   5.6
     6:  5.1  3.5  3.4    eer   7.6
     7:  5.1  3.5  3.4    eer   7.6
     8:  4.7  3.2  1.3    eer   7.6
     9:  4.6  3.1  1.5    eer   7.6
    10:  5.0  3.6  1.4    eer   7.6
As you can see in the expected output, rows 1 and 2 (setosa) have the same values for col1 and col2, and so do rows 6 and 7 (eer), so each of these pairs is added only once. Do not worry about col3: it takes the same value whenever col1 and col2 take the same values.
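One possible way to express this rule, offered as a sketch rather than a definitive answer, is to drop the repeated (col1, col2) pairs inside each group before summing. It relies on data.table's duplicated() method for tables; rebuilding the input with data.table() is an assumption made here for brevity.

```r
library(data.table)

# Same values as the first input data set, rebuilt with data.table() for brevity
dat <- data.table(
  col1 = c(5.1, 5.1, 4.7, 4.6, 5, 5.1, 5.1, 4.7, 4.6, 5),
  col2 = c(3.5, 3.5, 3.2, 3.1, 3.6, 3.5, 3.5, 3.2, 3.1, 3.6),
  col3 = c(1.4, 1.4, 1.3, 1.5, 1.4, 3.4, 3.4, 1.3, 1.5, 1.4),
  col4 = factor(rep(c("setosa", "eer"), each = 5),
                levels = c("setosa", "versicolor", "virginica", "eer"))
)

# Within each col4 group, keep only the first occurrence of every
# (col1, col2) pair, then sum the col3 values of the rows that remain.
# The single total is recycled to every row of the group.
dat[, r_new := sum(col3[!duplicated(data.table(col1, col2))], na.rm = TRUE),
    by = col4]

print(dat)
```

On this data the sketch yields 5.6 for setosa and 7.6 for eer, matching the expected result. Because the comparison is on exact values, col1 or col2 values that differ only by floating-point noise would be treated as different rows.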
EDIT: second dput:
    structure(list(col1 = c(5.1, 5.1, 4.7, 4.6, 5, 5.1, 5.1, 4.7, 4.6, 5.1),
        col2 = c(3.5, 3.5, 3.2, 3.1, 3.6, 3.5, 3.5, 3.2, 3.1, 3.4),
        col3 = c(1.4, 1.4, 1.3, 1.5, 1.4, 3.4, 3.4, 1.3, 1.5, 3.4),
        col4 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
        count = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
        r_new = c(5.6, 5.6, 5.6, 5.6, 5.6, 9.6, 9.6, 9.6, 9.6, 9.6)),
        .Names = c("col1", "col2", "col3", "col4", "count", "r_new"),
        row.names = c(NA, -10L), class = c("data.table", "data.frame"))

        col1 col2 col3 col4 count r_new
     1:  5.1  3.5  1.4    A     1   5.6
     2:  5.1  3.5  1.4    A     1   5.6
     3:  4.7  3.2  1.3    A     1   5.6
     4:  4.6  3.1  1.5    A     1   5.6
     5:  5.0  3.6  1.4    A     1   5.6
     6:  5.1  3.5  3.4    B     1   9.6
     7:  5.1  3.5  3.4    B     1   9.6
     8:  4.7  3.2  1.3    B     1   9.6
     9:  4.6  3.1  1.5    B     1   9.6
    10:  5.1  3.4  3.4    B     1   9.6
EDIT 2: Third dput
        col1 col2 col3 col4 count r_new
     1:  5.1  3.5  1.4    A     1   5.6
     2:  5.1  3.5  1.4    A     1   5.6
     3:  4.7  3.2  1.3    A     1   5.6
     4:  4.6  3.1  1.5    A     1   5.6
     5:  5.0  3.6  1.4    A     1   5.6
     6:  5.1  3.5  3.4    B     1   6.2
     7:  5.1  3.5  3.4    B     1   6.2
     8:  4.7  3.2  1.3    B     1   6.2
     9:  4.6  3.1  1.5    B     1   6.2
    10:  5.1  3.5  3.4    B     1   6.2

    structure(list(col1 = c(5.1, 5.1, 4.7, 4.6, 5, 5.1, 5.1, 4.7, 4.6, 5.1),
        col2 = c(3.5, 3.5, 3.2, 3.1, 3.6, 3.5, 3.5, 3.2, 3.1, 3.5),
        col3 = c(1.4, 1.4, 1.3, 1.5, 1.4, 3.4, 3.4, 1.3, 1.5, 3.4),
        col4 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
        count = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
        r_new = c(5.6, 5.6, 5.6, 5.6, 5.6, 6.2, 6.2, 6.2, 6.2, 6.2)),
        .Names = c("col1", "col2", "col3", "col4", "count", "r_new"),
        row.names = c(NA, -10L), class = c("data.table", "data.frame"))