Conditional mutate with cumsum in dplyr

I have cities (A to D) with different populations, located at different distances from one another. The goal is to sum the total population living within a circle of radius (distance XY), where X is the city at the center of the circle and Y is any other city.

Here is my code:

    library(dplyr)

    # Pairwise distances between towns (each pair listed once)
    Df <- structure(list(Town_From = c("A", "A", "A", "B", "B", "C"),
                         Town_To = c("B", "C", "D", "C", "D", "D"),
                         Distance = c(10, 5, 18, 17, 20, 21)),
                    .Names = c("Town_From", "Town_To", "Distance"),
                    row.names = c(NA, -6L), class = "data.frame")

    # Population of each town
    Df2 <- structure(list(Town = c("A", "B", "C", "D"),
                          Population = c(1000, 800, 500, 200)),
                     .Names = c("Town", "Population"),
                     row.names = c(NA, -4L), class = "data.frame")

    # Attach the populations of both endpoints, then group and sort
    Df <- Df %>%
      left_join(Df2, by = c("Town_From" = "Town")) %>%
      left_join(Df2, by = c("Town_To" = "Town")) %>%
      group_by(Town_From) %>%
      arrange(Distance)
    colnames(Df)[4] <- c("pop_TF")
    colnames(Df)[5] <- c("pop_TT")

    Source: local data frame [6 x 5]
    Groups: Town_From [3]

      Town_From Town_To Distance pop_TF pop_TT
          <chr>   <chr>    <dbl>  <dbl>  <dbl>
    1         A       C        5   1000    500
    2         A       B       10   1000    800
    3         B       C       17    800    500
    4         A       D       18   1000    200
    5         B       D       20    800    200
    6         C       D       21    500    200

The cities are grouped (by Town_From) and sorted (by Distance).

A circle of radius 5 km (the distance from A to C) contains 1000 (in A) + 500 (in C) = 1500 people; the next circle contains 1500 + 800 (in B) = 2300. The third circle still contains 2300 people, because cities A, B, and C all lie within the radius given by the distance from B to C = 17 km. The circle whose radius is the distance from A to D = 18 km contains 2300 + 200 (in D) = 2500 people.
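
The arithmetic above can be checked with a few lines of base R. This is only a sketch for circles centered on A, using the distances and populations from the question; the names `pop`, `dist_from_A`, `radii`, and `pop_within_A` are made up for the illustration.

```r
# Populations and distances from A, copied from the question's data
pop <- c(A = 1000, B = 800, C = 500, D = 200)
dist_from_A <- c(B = 10, C = 5, D = 18)

# Every pairwise distance is a candidate radius
radii <- c(5, 10, 17, 18, 20, 21)

# For each radius: A's own population plus that of every town within reach
pop_within_A <- sapply(radii, function(r) {
  inside <- names(dist_from_A)[dist_from_A <= r]
  pop[["A"]] + sum(pop[inside])
})
names(pop_within_A) <- paste(radii, "km")

pop_within_A
# values: 1500 2300 2300 2500 2500 2500
```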

Here is a visualization of the circles in question. In theory, the circles can expand to any arbitrary radius. In practice, I only need to check them at the distances between pairs of cities (the points where the counts change).

[Image: visualization of the circles]

2 answers

This is easier if you put your data into a format where each city appears at each "end" of the distance (both from and to). So I replaced the transformation you applied to Df at the end with the one below. Note that it uses complete from tidyr.

    library(dplyr)
    library(tidyr)

    # Df is the original three-column distance table (before the joins)
    Df_full <- Df %>%
      bind_rows(
        select(Df, Town_From = Town_To, Town_To = Town_From, Distance)
      ) %>%
      complete(Town_From, Town_To, fill = list(Distance = 0)) %>%
      left_join(Df2, c("Town_To" = "Town"))

This reverses the to-from relationship and appends the reversed rows to the bottom of the table. It then uses complete to add each city as its own "To" (for example, A to A). Finally, it joins in the populations, which now only need to be added once. Here is the new data:

    # A tibble: 16 × 4
       Town_From Town_To Distance Population
           <chr>   <chr>    <dbl>      <dbl>
     1         A       A        0       1000
     2         A       B       10        800
     3         A       C        5        500
     4         A       D       18        200
     5         B       A       10       1000
     6         B       B        0        800
     7         B       C       17        500
     8         B       D       20        200
     9         C       A        5       1000
    10         C       B       17        800
    11         C       C        0        500
    12         C       D       21        200
    13         D       A       18       1000
    14         D       B       20        800
    15         D       C       21        500
    16         D       D        0        200

Next, we set the thresholds we want to explore. Your question implies that you want to use each of the unique pairwise distances. If you prefer a different set for your production use, just enter them here.

    radiusCuts <- Df_full$Distance %>% unique %>% sort
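
The thresholds do not have to be the pairwise distances. As a purely hypothetical alternative (the name `radiusCuts_even` and the 5 km spacing are assumptions for illustration, not part of the answer), an evenly spaced set could be built like this:

```r
# Hypothetical alternative thresholds: every 5 km from 0 to 25 km
radiusCuts_even <- seq(0, 25, by = 5)

radiusCuts_even
# values: 0 5 10 15 20 25
```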

Next, we build the sum expressions, each of which sums only the paired cities within the given radius, setting names along the way to make the summarise_ call easier in a moment.

    forPops <- radiusCuts %>%
      setNames(paste("Pop within", ., "km")) %>%
      lapply(function(x) {
        paste("sum(Population[Distance <=", x, "])")
      })
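
To see what this produces, here is a small base R sketch that builds the same kind of list for three example thresholds and inspects it. The three values are taken from the distances in the question; the construction is written without the pipe so that it runs without any packages.

```r
# Three example thresholds (in km)
radiusCuts <- c(0, 5, 10)

# Same construction as above, written without the pipe
forPops <- lapply(
  setNames(radiusCuts, paste("Pop within", radiusCuts, "km")),
  function(x) paste("sum(Population[Distance <=", x, "])")
)

names(forPops)
# "Pop within 0 km"  "Pop within 5 km"  "Pop within 10 km"

forPops[["Pop within 5 km"]]
# "sum(Population[Distance <= 5 ])"
```

Each element is a string holding one R expression, and the element's name becomes the column name in the summary.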

Finally, we group_by Town_From and pass these constructed arguments to the standard-evaluation function summarise_, which creates each of the columns in forPops:

    Df_full %>%
      group_by(Town_From) %>%
      summarise_(.dots = forPops)

gives:

    # A tibble: 4 × 8
      Town_From `Pop within 0 km` `Pop within 5 km` `Pop within 10 km` `Pop within 17 km` `Pop within 18 km` `Pop within 20 km` `Pop within 21 km`
          <chr>             <dbl>             <dbl>              <dbl>              <dbl>              <dbl>              <dbl>              <dbl>
    1         A              1000              1500               2300               2300               2500               2500               2500
    2         B               800               800               1800               2300               2300               2500               2500
    3         C               500              1500               1500               2300               2300               2300               2500
    4         D               200               200                200                200               1200               2000               2500

Which should give you all the thresholds you want.
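
Note that the underscored verbs such as summarise_ have been deprecated since dplyr 0.7.0. One possible way to write the same step with tidy evaluation is sketched below. It is not the answer's original method: the names `dist_mat`, `pop`, `pop_within`, and `res_modern` are invented for the example, and the long-format table is rebuilt from a distance matrix only so that the snippet is self-contained.

```r
library(dplyr)
library(rlang)

# Rebuild the long-format table (same content as Df_full) from the question's data
towns <- c("A", "B", "C", "D")
dist_mat <- matrix(
  c( 0, 10,  5, 18,
    10,  0, 17, 20,
     5, 17,  0, 21,
    18, 20, 21,  0),
  nrow = 4, byrow = TRUE, dimnames = list(towns, towns)
)
pop <- c(A = 1000, B = 800, C = 500, D = 200)

Df_full <- expand.grid(Town_From = towns, Town_To = towns,
                       stringsAsFactors = FALSE)
Df_full$Distance   <- dist_mat[cbind(Df_full$Town_From, Df_full$Town_To)]
Df_full$Population <- unname(pop[Df_full$Town_To])

radiusCuts <- sort(unique(Df_full$Distance))

# One unevaluated sum() call per radius; !! inlines the numeric cutoff
pop_within <- lapply(
  setNames(radiusCuts, paste("Pop within", radiusCuts, "km")),
  function(x) expr(sum(Population[Distance <= !!x]))
)

# !!! splices the named list of calls in as separate summary columns
res_modern <- Df_full %>%
  group_by(Town_From) %>%
  summarise(!!!pop_within)
```

Building calls with expr() instead of pasting strings avoids parsing text at run time, but the resulting columns should match the summarise_ output above.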


If your goal is to calculate the cumulative population as a function of increasing distance from each city (the center of the circle), then we can (i) group by Town_From, (ii) sort each of these groups by Distance, and then (iii) compute the cumsum. Using dplyr:

    library(dplyr)
    res <- Df %>%
      group_by(Town_From) %>%
      arrange(Distance) %>%
      mutate(sumPop = pop_TF + cumsum(pop_TT))

Using your data, the result:

    print(res)
    ##Source: local data frame [6 x 6]
    ##Groups: Town_From [3]
    ##
    ##  Town_From Town_To Distance pop_TF pop_TT sumPop
    ##      <chr>   <chr>    <dbl>  <dbl>  <dbl>  <dbl>
    ##1         A       C        5   1000    500   1500
    ##2         A       B       10   1000    800   2300
    ##3         B       C       17    800    500   1300
    ##4         A       D       18   1000    200   2500
    ##5         B       D       20    800    200   1500
    ##6         C       D       21    500    200    700

Source: https://habr.com/ru/post/1262992/

