I analyze large tables (300,000 to 500,000 rows) that store output from a disease simulation model. In the model, animals on a landscape infect other animals. In the example below, the infection starts with animal a1 and passes from animal to animal, branching off into separate "chains" of infection.
I want to take the table that stores information about every animal (allanimals in the example below) and extract only the rows for the animals in one infection chain (the chain ending at d2, highlighted in green in the diagram), so that I can calculate the mean habitat value for that chain.
Although my while loop works, it is slow as molasses when the table holds hundreds of thousands of rows and the chain has 40-100 members.
Any ideas on how to speed this up? I'm hoping for a tidyverse solution. I know the loop looks fast enough with my sample data set, but with my real data it is very slow.
Diagram:

Desired output from the example below:
   AnimalID InfectingAnimal habitat
1        d2              d1       1
2        d1              c3       1
3        c3              c2       3
4        c2              c1       2
5        c1              b3       3
6        b3              b2       6
7        b2              b1       5
8        b1              a2       4
9        a2              a1       2
10       a1               x       1
Code example:
library(tidyverse)
allanimals <- structure(list(AnimalID = c("a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8",
"b1", "b2", "b3", "b4", "b5", "c1", "c2", "c3", "c4", "d1", "d2", "e1", "e2",
"e3", "e4", "e5", "e6", "f1", "f2", "f3", "f4", "f5", "f6", "f7"),
InfectingAnimal = c("x", "a1", "a2", "a3", "a4", "a5", "a6", "a7", "a2", "b1",
"b2", "b3", "b4", "b3", "c1", "c2", "c3", "c3", "d1", "b1", "e1", "e2", "e3",
"e4", "e5", "e1", "f1", "f2", "f3", "f4", "f5", "f6"), habitat = c(1L, 2L, 1L,
2L, 2L, 1L, 3L, 2L, 4L, 5L, 6L, 1L, 2L, 3L, 2L, 3L, 2L, 1L, 1L, 2L, 5L, 4L,
1L, 1L, 1L, 1L, 4L, 5L, 4L, 5L, 4L, 3L)), .Names = c("AnimalID",
"InfectingAnimal", "habitat"), class = "data.frame", row.names = c(NA, -32L))
head(allanimals)
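# ID of the animal whose infection chain we want to trace
Focal.Animal <- "d2"
# Replace the ID with that animal's full row from the table
Focal.Animal <- allanimals %>%
  filter(AnimalID == Focal.Animal)
Focal.Animal

# The chain starts with the focal animal's row
Chain <- Focal.Animal
InfectingAnimalInTable <- TRUE

ptm <- proc.time()
while (InfectingAnimalInTable) {
  # Infector of the most recently added animal
  NextAnimal <- Chain %>%
    slice(n()) %>%
    pull(InfectingAnimal)
  # Look up the infector's row (a full-table scan on every iteration)
  NextRow <- allanimals %>%
    filter(AnimalID == NextAnimal)
  if (nrow(NextRow) > 0) {
    Chain[nrow(Chain) + 1, ] <- NextRow
  } else {
    # Infector (e.g. "x") is not in the table, so the chain is complete
    InfectingAnimalInTable <- FALSE
  }
}
proc.time() - ptm
Chain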
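One direction that might help, sketched here under assumptions rather than benchmarked on the real data: the loop above filters the whole table once per chain member, so the lookup could instead be computed once with base R's match() and the chain followed by row index. The function name trace_chain and the small demo table are hypothetical (the demo rows reuse IDs and habitat values from allanimals), and the sketch assumes every AnimalID is unique. It uses base R rather than tidyverse verbs, although the result is an ordinary data frame that can go straight back into a dplyr pipeline.

```r
# Hypothetical helper (not part of the code above): follow InfectingAnimal
# pointers by row index instead of filtering the whole table at each step.
# Assumes AnimalID values are unique.
trace_chain <- function(animals, focal) {
  # One vectorised pass: for every row, the row number of its infector
  # (NA when the infector, such as "x", does not appear in the table).
  parent <- match(animals$InfectingAnimal, animals$AnimalID)

  path <- integer(0)
  i <- match(focal, animals$AnimalID)
  # Stop at the end of the chain; the %in% check guards against cycles.
  while (!is.na(i) && !(i %in% path)) {
    path <- c(path, i)
    i <- parent[i]
  }
  # Rows of the chain, ordered from the focal animal back to its origin.
  animals[path, , drop = FALSE]
}

# Small demo table: a subset of the IDs used in allanimals.
demo <- data.frame(
  AnimalID        = c("a1", "a2", "b1", "b2", "e1"),
  InfectingAnimal = c("x",  "a1", "a2", "b1", "b1"),
  habitat         = c(1L, 2L, 4L, 5L, 2L),
  stringsAsFactors = FALSE
)

chain <- trace_chain(demo, "b2")
chain$AnimalID       # "b2" "b1" "a2" "a1"
mean(chain$habitat)  # 3
```

With the full table, the equivalent call would be trace_chain(allanimals, "d2"). The per-step cost drops from a scan of every row to a single vector index, so the one-off match() call should dominate the running time.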