I use the data.tree structure to summarize various information between file folders. In each folder, I have several files (value), and what I need to do for each folder is to sum how many files the folder + all subfolders contains.
Sample data:
library(data.tree)
data <- data.frame(pathString = c("MainFolder",
"MainFolder/Folder1",
"MainFolder/Folder2",
"MainFolder/Folder3",
"MainFolder/Folder1/Subfolder1",
"MainFolder/Folder1/Subfolder2"),
Value = c(1,1,5,2,4,10))
tree <- as.Node(data, Value)
print(tree, "Value")
levelName Value
1 MainFolder 1
2 ¦--Folder1 1
3 ¦ ¦--Subfolder1 4
4 ¦ °--Subfolder2 10
5 ¦--Folder2 5
6 °--Folder3 2
My current and VERY SLOW solution to the problem:
total_count <- function(node) {
results <- sum(as.data.frame(print(node, "Value"))$Value)
return(results)
}
tree$Do(function(node) node$Value_by_folder <- total_count(node))
print(tree, "Value", "Value_by_folder")
levelName Value Value_by_folder
1 MainFolder 1 23
2 ¦--Folder1 1 15
3 ¦ ¦--Subfolder1 4 4
4 ¦ °--Subfolder2 10 10
5 ¦--Folder2 5 5
6 °--Folder3 2 2
Do you have any suggestion on how to do this more efficiently? I am trying to create a recursive method and also use the "isLeaf" and "children" functions on the nodes, but could not get it working.
source
share