Referencing the next or previous row in an R data.table

Say I have the following sample data:

library(data.table)
iris <- data.table(iris)[c(1:5,51:55,101:105), list(ID=.I, Species,Sepal.Length)]

Then say that I want to calculate the absolute difference between consecutive rows within each group (in this case, Species).

 iris[ , SL.Diff := c(NA,abs(diff(Sepal.Length))) , by = Species] 

At this point, I have a dataset that looks like this:

        ID    Species Sepal.Length SL.Diff
     1:  1     setosa          5.1      NA
     2:  2     setosa          4.9     0.2
     3:  3     setosa          4.7     0.2
     4:  4     setosa          4.6     0.1
     5:  5     setosa          5.0     0.4
     6:  6 versicolor          7.0      NA

Now I want to compute a new variable, Sepal.Length2, which takes the value of the next row if SL.Diff is below a threshold of 0.3.

 iris[ , Sepal.Length2 := ifelse(SL.Diff < 0.3, iris[ID+1]$Sepal.Length, Sepal.Length)] 

That works the way I want. But what if I want to do the same comparison, taking the value of the previous row instead of the next one?

 iris[ , Sepal.Length3 := ifelse(SL.Diff < 0.3, iris[ID-1]$Sepal.Length, Sepal.Length)] 

Sepal.Length3 does not give the result I expected. Does anyone know what I am doing wrong here?

        ID    Species Sepal.Length SL.Diff Sepal.Length2 Sepal.Length3
     1:  1     setosa          5.1      NA            NA            NA
     2:  2     setosa          4.9     0.2           4.7           4.9
     3:  3     setosa          4.7     0.2           4.6           4.7
     4:  4     setosa          4.6     0.1           5.0           4.6
     5:  5     setosa          5.0     0.4           5.0           5.0
     6:  6 versicolor          7.0      NA            NA            NA
     7:  7 versicolor          6.4     0.6           6.4           6.4
     8:  8 versicolor          6.9     0.5           6.9           6.9
     9:  9 versicolor          5.5     1.4           5.5           5.5
    10: 10 versicolor          6.5     1.0           6.5           6.5
    11: 11  virginica          6.3      NA            NA            NA
    12: 12  virginica          5.8     0.5           5.8           5.8
    13: 13  virginica          7.1     1.3           7.1           7.1
    14: 14  virginica          6.3     0.8           6.3           6.3
    15: 15  virginica          6.5     0.2            NA           5.1
3 answers

Inside `j`, `iris[ID+1]$Sepal.Length` evaluates `ID` within `iris` a second time, so it subsets the whole table by the vector of row positions `ID+1`.

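Your problem arises because `ID-1` produces an index of 0, which R silently drops. `iris[ID-1]` therefore returns only 14 rows, which line up with rows 1 to 14 instead of being shifted down by one, and `ifelse()` recycles that vector to length 15. That is why row 15 gets 5.1, the value from row 1. Zero indices are dropped like this: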

    a <- c('a','b')
    a[0:1]
    # [1] "a"
    a[1]
    # [1] "a"
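To make the misalignment concrete, here is a minimal base-R sketch of what happens to the `ID-1` lookup. It rebuilds the 15 sample values from the built-in `iris` data; the names `sl`, `id`, `prev_wrong` and `prev_ok` are illustrative only and do not appear in the question.

```r
# Sepal.Length for the 15 sample rows, taken from the built-in iris data
sl <- datasets::iris$Sepal.Length[c(1:5, 51:55, 101:105)]
id <- seq_along(sl)

# Index 0 is dropped, so this has 14 values and element i is still row i
prev_wrong <- sl[id - 1]
length(prev_wrong)
# [1] 14

# ifelse() recycles a short `yes` argument to the length of the test,
# so position 15 wraps around to the value of row 1
rep(prev_wrong, length.out = length(sl))[15]
# [1] 5.1

# Padding with NA keeps "previous row" aligned with its own row
prev_ok <- c(NA, sl[-length(sl)])
```

The `ID+1` lookup only appears to work because index 16 is out of range and returns NA rather than being dropped, so the vector keeps its full length.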

So you need to handle both the NA values you already know about and the values that should become NA.

Here is one approach:

    # calculate the "threshold" column
    iris[, thresh := SL.Diff < 0.3]
    # where does it need to go "up", and which row index should it use
    iris[!is.na(thresh), up := ifelse(thresh, ID+1L, ID)]
    # create the column
    iris[, S2 := Sepal.Length[up]]
    # the same for "down"
    iris[!is.na(thresh), down := ifelse(thresh, ID-1L, ID)]
    iris[, S3 := Sepal.Length[down]]
    iris
    #     ID    Species Sepal.Length SL.Diff thresh up  S2 down  S3
    #  1:  1     setosa          5.1      NA     NA NA  NA   NA  NA
    #  2:  2     setosa          4.9     0.2   TRUE  3 4.7    1 5.1
    #  3:  3     setosa          4.7     0.2   TRUE  4 4.6    2 4.9
    #  4:  4     setosa          4.6     0.1   TRUE  5 5.0    3 4.7
    #  5:  5     setosa          5.0     0.4  FALSE  5 5.0    5 5.0
    #  6:  6 versicolor          7.0      NA     NA NA  NA   NA  NA
    #  7:  7 versicolor          6.4     0.6  FALSE  7 6.4    7 6.4
    #  8:  8 versicolor          6.9     0.5  FALSE  8 6.9    8 6.9
    #  9:  9 versicolor          5.5     1.4  FALSE  9 5.5    9 5.5
    # 10: 10 versicolor          6.5     1.0  FALSE 10 6.5   10 6.5
    # 11: 11  virginica          6.3      NA     NA NA  NA   NA  NA
    # 12: 12  virginica          5.8     0.5  FALSE 12 5.8   12 5.8
    # 13: 13  virginica          7.1     1.3  FALSE 13 7.1   13 7.1
    # 14: 14  virginica          6.3     0.8  FALSE 14 6.3   14 6.3
    # 15: 15  virginica          6.5     0.2   TRUE 16  NA   14 6.3

Not sure about the speed effects of this, but here's another try:

    # make a column of the previous values within each group using head()
    iris[, S3 := c(NA, head(Sepal.Length, -1)), by=Species]
    # overwrite those values not meeting your criteria with the original values
    iris[ !(SL.Diff < 0.3), S3 := Sepal.Length]
    iris
    #     ID    Species Sepal.Length SL.Diff  S3
    #  1:  1     setosa          5.1      NA  NA
    #  2:  2     setosa          4.9     0.2 5.1
    #  3:  3     setosa          4.7     0.2 4.9
    #  4:  4     setosa          4.6     0.1 4.7
    #  5:  5     setosa          5.0     0.4 5.0
    #  6:  6 versicolor          7.0      NA  NA
    #  7:  7 versicolor          6.4     0.6 6.4
    #  8:  8 versicolor          6.9     0.5 6.9
    #  9:  9 versicolor          5.5     1.4 5.5
    # 10: 10 versicolor          6.5     1.0 6.5
    # 11: 11  virginica          6.3      NA  NA
    # 12: 12  virginica          5.8     0.5 5.8
    # 13: 13  virginica          7.1     1.3 7.1
    # 14: 14  virginica          6.3     0.8 6.3
    # 15: 15  virginica          6.5     0.2 6.3
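The same per-group lag and lead can probably be written more directly with `shift()`, which is not used anywhere in this thread; the following is only a sketch, assuming an installed data.table version that provides `shift()`. The table `dt` and the helper columns `prev_sl` and `next_sl` are hypothetical names introduced for illustration, and the table is rebuilt from the built-in `iris` data so the example does not depend on the `iris` object modified above.

```r
library(data.table)

# Rebuild the 15-row sample; ID is created explicitly as 1..15
rows <- c(1:5, 51:55, 101:105)
dt <- data.table(
  ID           = seq_along(rows),
  Species      = datasets::iris$Species[rows],
  Sepal.Length = datasets::iris$Sepal.Length[rows]
)

# Previous and next value within each Species; group edges are filled with NA
dt[, `:=`(
  prev_sl = shift(Sepal.Length, type = "lag"),
  next_sl = shift(Sepal.Length, type = "lead")
), by = Species]

# Same threshold logic as in the question, now using the aligned helper columns
dt[, SL.Diff := abs(Sepal.Length - prev_sl)]
dt[, Sepal.Length2 := ifelse(SL.Diff < 0.3, next_sl, Sepal.Length)]
dt[, Sepal.Length3 := ifelse(SL.Diff < 0.3, prev_sl, Sepal.Length)]
```

Because the shifted columns always have one value per row, there is no zero index to drop and nothing for `ifelse()` to recycle. Note that, unlike `iris[ID+1]`, this version never reaches across a group boundary.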

I think dplyr makes this a little easier to express by providing lead() and lag() functions:

    library(dplyr)

    iris2 <- iris[c(1:5, 51:55, 101:105), c("Species", "Sepal.Length")]
    names(iris2) <- c("species", "sepal")
    iris2$id <- 1:15

    iris2 %>%
      group_by(species) %>%
      mutate(
        thres = abs(sepal - lag(sepal)),
        up = ifelse(thres < 0.3, lead(sepal), sepal),
        down = ifelse(thres < 0.3, lag(sepal), sepal)
      )

Source: https://habr.com/ru/post/973204/

