I have a data frame with results for several instruments, and I want to create a new column that contains the total for each row. Because the number of instruments differs every time I start analyzing new data, I need a function that calculates this row-total column dynamically.
To illustrate the problem, this is what my data frame looks like:
       Type Value
    1     A    10
    2     A    15
    3     A    20
    4     A    25
    5     B    30
    6     B    40
    7     B    50
    8     B    60
    9     B    70
    10    B    80
    11    B    90
My goal is to achieve the following results:
       A  B Total
    1 10 30    40
    2 15 40    55
    3 20 50    70
    4 25 60    85
    5    70    70
    6    80    80
    7    90    90
I've tried several approaches, and this one came closest:
    myList <- list(a = c(10, 15, 20, 25), b = c(30, 40, 50, 60, 70, 80, 90))
    tmpDF <- data.frame(sapply(myList, '[', 1:max(sapply(myList, length))))
    > tmpDF
       a  b
    1 10 30
    2 15 40
    3 20 50
    4 25 60
    5 NA 70
    6 NA 80
    7 NA 90
    totalSum <- rowSums(tmpDF)
    totalSum <- data.frame(totalSum)
    tmpDF <- cbind(tmpDF, totalSum)
    > tmpDF
       a  b totalSum
    1 10 30       40
    2 15 40       55
    3 20 50       70
    4 25 60       85
    5 NA 70       NA
    6 NA 80       NA
    7 NA 90       NA
Although this method was able to combine two vectors of different lengths into one data frame, the rowSums function gives incorrect values in this example: the total is NA for every row in which a is missing. In addition, my source data is not in list format, so I cannot apply such a "solution".
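The NA totals come from the missing values in column a: rowSums returns NA for any row that contains an NA unless na.rm = TRUE is set. A minimal sketch, assuming that a missing value should simply be left out of the row total (the column name Total is my own choice here):

```r
# Same list-based example as above
myList <- list(a = c(10, 15, 20, 25), b = c(30, 40, 50, 60, 70, 80, 90))

# Pad every vector to the length of the longest one (short ones get NA)
tmpDF <- data.frame(sapply(myList, '[', 1:max(sapply(myList, length))))

# na.rm = TRUE drops the NA entries before summing each row
tmpDF$Total <- rowSums(tmpDF, na.rm = TRUE)

print(tmpDF)
```

This only addresses the NA totals; it still assumes the data already sits in a list.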
I think I'm overcomplicating this problem, so I was wondering how I can:
- Subset the data from the data frame based on Type,
- Insert these individual subsets of different lengths into a new data frame,
- Add a "Total" column to this data frame that holds the correct sum across the individual subsets.
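The three steps above can be sketched roughly as follows. This is only one possible way to do it, assuming the columns are called Type and Value as in the example; the function name wideWithTotal is hypothetical.

```r
# Example data from the question
DF <- data.frame(
  Type  = c(rep("A", 4), rep("B", 7)),
  Value = c(10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90)
)

# Hypothetical helper: one column per Type plus a row total
wideWithTotal <- function(x) {
  # Step 1: split Value into one vector per Type
  pieces <- split(x$Value, x$Type)

  # Step 2: pad every vector with NA up to the longest length,
  # then combine them into a data frame (one column per Type)
  n <- max(lengths(pieces))
  padded <- lapply(pieces, function(v) v[seq_len(n)])
  wide <- as.data.frame(padded)

  # Step 3: row totals, ignoring the NA padding
  wide$Total <- rowSums(wide, na.rm = TRUE)
  wide
}

result <- wideWithTotal(DF)
print(result)
```

Because split works on whatever values Type contains, nothing has to be listed by hand when the number of Types changes. Note that as.data.frame may adjust Type values that are not valid column names.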
An additional complication is that this needs to be done in a function, or otherwise dynamically, so that I don't have to manually subset the dozens of Types (A, B, C, etc.) in my data frame.
Here is what I have so far. It doesn't work, but it illustrates the lines along which I'm thinking:
    TotalDf <- function(x){
        tmpNumberOfTypes <- c(levels(x$Type))
        for( i in tmpNumberOfTypes){
            subSetofData <- subset(x, Type = i, select = Value)
            if( i == 1) {
                totalDf <- subSetOfData
            } else{
                totalDf <- cbind(totalDf, subSetofData)}
        }
        return(totalDf)
    }
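As far as I can tell, the loop above fails for several reasons: `Type = i` is passed as a named argument rather than the comparison `Type == i`, so nothing is filtered; `i == 1` compares a level name such as "A" with 1 and is never TRUE; the names subSetofData and subSetOfData differ in case; and cbind cannot combine data frames with different numbers of rows. A corrected sketch that keeps the loop structure, under the name TotalDfFixed (hypothetical), could look like this:

```r
# Example data from the question
DF <- data.frame(
  Type  = c(rep("A", 4), rep("B", 7)),
  Value = c(10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90)
)

TotalDfFixed <- function(x) {
  # unique() works whether Type is a factor or a character column;
  # levels() would return NULL for a character column
  types <- unique(as.character(x$Type))

  # Number of rows needed: size of the largest Type group
  n <- max(table(x$Type))

  totalDf <- NULL
  for (i in types) {
    values <- x$Value[x$Type == i]   # '==' filters; '=' would not
    length(values) <- n              # pad with NA up to n elements
    column <- setNames(data.frame(values), i)
    totalDf <- if (is.null(totalDf)) column else cbind(totalDf, column)
  }

  # Row totals, ignoring the NA padding
  totalDf$Total <- rowSums(totalDf, na.rm = TRUE)
  totalDf
}

fixed <- TotalDfFixed(DF)
print(fixed)
```

Starting from NULL and testing is.null avoids having to know which Type comes first.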
Thanks in advance for any thoughts or ideas on this.
Regards,
EDIT:
Thanks to Joris's comment (see below), I am now headed in the right direction. However, when I try to translate his solution to my own data frame, I run into additional problems. His suggested answer works and gives me the following (correct) sums of the values of A and B:
    > tmp78 <- tapply(DF$value,DF$id,sum)
    > tmp78
     1  2  3  4  5  6
     6  8 10 12  9 10
    > data.frame(tmp78)
      tmp78
    1     6
    2     8
    3    10
    4    12
    5     9
    6    10
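For comparison, here is a small sketch of the same tapply idea applied to the Type/Value example from the top of the question. The id column numbers the rows within each Type; building it with ave is my own choice for this illustration and not necessarily how the suggested answer does it.

```r
# Example data from the question
DF <- data.frame(
  Type  = c(rep("A", 4), rep("B", 7)),
  Value = c(10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90)
)

# Position of each row within its own Type: 1..4 for A, 1..7 for B
DF$id <- ave(seq_along(DF$Type), DF$Type, FUN = seq_along)

# Sum Value over all rows that share the same within-Type position
totals <- tapply(DF$Value, DF$id, sum)

print(data.frame(Total = totals))
```

Positions that exist only in the longer group (5 to 7 here) are summed over a single value, so no NA handling is needed.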
However, when I try to execute this solution on my data frame, it does not work:
    > subSetOfData <- copyOfTradesList[c(1:3,11:13),c(1,10)]
    > subSetOfData
       Instrument AccountValue
    1         JPM         6997
    2         JPM         7261
    3         JPM         7545
    11        KFT         6992
    12        KFT         6944
    13        KFT         7069
    > unlist(sapply(rle(subSetOfData$Instrument)$lengths,function(x) 1:x))
    Error in rle(subSetOfData$Instrument) : 'x' must be an atomic vector
    > subSetOfData$InstrumentNumeric <- as.numeric(subSetOfData$Instrument)
    > unlist(sapply(rle(subSetOfData$InstrumentNumeric)$lengths,function(x) 1:x))
         [,1] [,2]
    [1,]    1    1
    [2,]    2    2
    [3,]    3    3
    > subSetOfData$id <- unlist(sapply(rle(subSetOfData$InstrumentNumeric)$lengths,function(x) 1:x))
    Error in `$<-.data.frame`(`*tmp*`, "id", value = c(1L, 2L, 3L, 1L, 2L, : replacement has 3 rows, data has 6
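Two things seem to be going on in the output above. The first error suggests that Instrument is a factor, which rle does not accept. The second arises because both runs have the same length (3), so sapply simplifies its result to a 3 x 2 matrix; unlist leaves a matrix unchanged, and the assignment then sees 3 rows instead of 6. A sketch that avoids both, assuming Instrument is indeed a factor and rebuilding the six rows from the printed values (so the row names differ from the original):

```r
# Six rows rebuilt from the printed output; Instrument assumed to be a factor
subSetOfData <- data.frame(
  Instrument   = factor(c("JPM", "JPM", "JPM", "KFT", "KFT", "KFT")),
  AccountValue = c(6997, 7261, 7545, 6992, 6944, 7069)
)

# rle works on a character vector, so convert the factor first
runs <- rle(as.character(subSetOfData$Instrument))

# sequence() always returns a plain vector: 1..n for each run length
subSetOfData$id <- sequence(runs$lengths)

# Sum AccountValue across instruments for each within-run position
totals <- tapply(subSetOfData$AccountValue, subSetOfData$id, sum)

print(subSetOfData)
print(totals)
```

Like the original rle call, this relies on the rows of each instrument being contiguous.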
I have a nagging feeling that I'm going in circles ...