Loop calculation as it goes to r

I find it difficult to perform a calculation that is defined iteratively. The following data is given as an example (the actual data set is much larger):

## DATA ## # Columns Individual<-c("A","B","C","D","E","F","G","H1","H2","H3","H4","H5","K1","K2","K3","K4","K5") P1<-c(0,0,"A",0,"C","C",0, rep("E",5),"H1","H2","H3","H4","H5") P2<-c(0,0,"B",0,"D", "E",0,rep("G",5),"H1","H2","H3","H4","H5") # Dataframe myd<-data.frame(Individual,P1,P2,stringsAsFactors=FALSE) Individual P1 P2 1 A 0 0 2 B 0 0 3 CAB 4 D 0 0 5 ECD 6 FCE 7 G 0 0 8 H1 EG 9 H2 EG 10 H3 EG 11 H4 EG 12 H5 EG 13 K1 H1 H1 14 K2 H2 H2 15 K3 H3 H3 16 K4 H4 H4 17 K5 H5 H5 

The data represent the relationship between an Individual and two parents, P1 , P2 .

The required calculation, indicated by relationA , shows how much each person is associated with A.

By definition, the relationship between A and A is set to 1. The values ​​for all other entities should be calculated based on the information in the table as follows:

 The value of relationA for an individual should be equal to 1/2 (the value of relationA of P1 of the individual) + 1/2 (the value of relationA of P2 of the individual) 

FOR EXAMPLE

  Individual P1 P2 relationA 1 A 0 0 1 2 B 0 0 0 3 CAB (A = 1 + B = 0)/2 = 0.5 4 D 0 0 0 5 ECD (C= 0.5 + D = 0)/2 = 0.25 6 FCE (C = 0.5 + E = 0.25)/2 = 0.375 

The expected output is as follows:

  Individual P1 P2 relationA 1 A 0 0 1 2 B 0 0 0 3 CAB 0.5 4 D 0 0 0 5 ECD 0.25 6 FCE 0.375 7 G 0 0 0 8 H1 EG 0.125 9 H2 EG 0.125 10 H3 EG 0.125 11 H4 EG 0.125 12 H5 EG 0.125 13 K1 H1 H1 0.125 14 K2 H2 H2 0.125 15 K3 H3 H3 0.125 16 K4 H4 H4 0.125 17 K5 H5 H5 0.125 

My difficulty is to correctly express this in R Any help would be greatly appreciated.

+4
source share
2 answers

Edit:

more concisely, you can use sapply and rowSums to end the for-loop in one line of code:

 # Initialize values of relationA myd$relationA <- 0 myd$relationA[myd$Individual=="A"] <- 1 # Calculate relationA myd$relationA <- myd$relationA + rowSums(sapply(myd$Individual, function(indiv) myd$relationA[myd$Individual==indiv]/2 * ((myd$P1==indiv) + (myd$P2==indiv)))) 



Is this what you are looking for?

 # Initialize values of relationA myd$relationA <- 0 myd$relationA[myd$Individual=="A"] <- 1 # Iterate over all Individuals for (indiv in myd$Individual) { indiVal <- myd$relationA[myd$Individual==indiv] # all columns handled at once, thanks to vectorization; no need for myd$P1[i] myd$relationA <- myd$relationA + indiVal/2 * ((myd$P1==indiv) + (myd$P2==indiv)) } 

Output

 myd Individual P1 P2 relationA 1 A 0 0 1.000 2 B 0 0 0.000 3 CAB 0.500 4 D 0 0 0.000 5 ECD 0.250 6 FCE 0.375 7 G 0 0 0.000 8 H1 EG 0.125 9 H2 EG 0.125 ... 
+3
source

You can write a function to calculate the value given by individual and (implicitly) relationships as a simple recursive function.

 relationA <- function(ind) { if(ind == "A") { 1 } else if (ind == "0") { 0 } else { pts <- myd[myd$Individual == ind,] (relationA(pts[["P1"]]) + relationA(pts[["P2"]])) / 2 } } 

Simply, if individual A is equal to 1; if an individual is 0, then he is 0; for any other, recursively call relationA for each parent ( P1 and P2 ) corresponding to the individual, and add them together and divide by 2. This only works for the individual person at a time:

 > relationA("A") [1] 1 > relationA("F") [1] 0.375 > relationA("K5") [1] 0.125 

but you can easily digitize it relative to all faces:

 > sapply(myd$Individual, relationA) ABCDEFG H1 H2 H3 H4 H5 K1 1.000 0.000 0.500 0.000 0.250 0.375 0.000 0.125 0.125 0.125 0.125 0.125 0.125 K2 K3 K4 K5 0.125 0.125 0.125 0.125 

and this can be returned to myd with

 myd$relationA <- sapply(myd$Individual, relationA) 

This is not particularly effective, because it must again and again calculate relationA for each case. When it comes to "K5", it calls reationA("H5") twice, each of which calls relationA("E") and relationA("G") , and those that call relationA("C") , relationA("D") , relationA("0") and relationA("0") , etc. no, the results are not cached, but recalculated every time. For this small dataset, this does not matter, because even inefficient is still very fast.

If you need / need to cache the results and use this cache, you can modify relationA to do this.

 relationAc <- function(ind) { pts <- myd[myd$Individual == ind,] if(nrow(pts) == 0 | any(is.na(pts[["relationA"]]))) { relationA <- if(ind == "A") { 1 } else if (ind == "0") { 0 } else { (relationAc(pts[["P1"]]) + relationAc(pts[["P2"]])) / 2 } myd[myd$Individual == ind, "relationA"] <<- relationA relationA } else { pts[["relationA"]] } } 

Then you need to initialize the cache:

 myd$relationA <- NA_real_ 

One call will fill in the required values, and a call to the entire set of persons will lead to filling all the values.

 > myd Individual P1 P2 relationA 1 A 0 0 NA 2 B 0 0 NA 3 CAB NA 4 D 0 0 NA 5 ECD NA 6 FCE NA 7 G 0 0 NA 8 H1 EG NA 9 H2 EG NA 10 H3 EG NA 11 H4 EG NA 12 H5 EG NA 13 K1 H1 H1 NA 14 K2 H2 H2 NA 15 K3 H3 H3 NA 16 K4 H4 H4 NA 17 K5 H5 H5 NA > relationAc("K5") [1] 0.125 > myd Individual P1 P2 relationA 1 A 0 0 1.000 2 B 0 0 0.000 3 CAB 0.500 4 D 0 0 0.000 5 ECD 0.250 6 FCE NA 7 G 0 0 0.000 8 H1 EG NA 9 H2 EG NA 10 H3 EG NA 11 H4 EG NA 12 H5 EG 0.125 13 K1 H1 H1 NA 14 K2 H2 H2 NA 15 K3 H3 H3 NA 16 K4 H4 H4 NA 17 K5 H5 H5 0.125 > sapply(myd$Individual, relationAc) ABCDEFG H1 H2 H3 H4 H5 K1 1.000 0.000 0.500 0.000 0.250 0.375 0.000 0.125 0.125 0.125 0.125 0.125 0.125 K2 K3 K4 K5 0.125 0.125 0.125 0.125 > myd Individual P1 P2 relationA 1 A 0 0 1.000 2 B 0 0 0.000 3 CAB 0.500 4 D 0 0 0.000 5 ECD 0.250 6 FCE 0.375 7 G 0 0 0.000 8 H1 EG 0.125 9 H2 EG 0.125 10 H3 EG 0.125 11 H4 EG 0.125 12 H5 EG 0.125 13 K1 H1 H1 0.125 14 K2 H2 H2 0.125 15 K3 H3 H3 0.125 16 K4 H4 H4 0.125 17 K5 H5 H5 0.125 
+4
source

Source: https://habr.com/ru/post/1446497/


All Articles