Subsetting inside a function to calculate row totals

I have a data frame with results for a number of instruments, and I want to create a new column that contains the total for each row. Since the number of instruments is different every time I analyze new data, I need a function that calculates the row totals dynamically.

To simplify the problem, here is what my data frame looks like:

       Type Value
    1     A    10
    2     A    15
    3     A    20
    4     A    25
    5     B    30
    6     B    40
    7     B    50
    8     B    60
    9     B    70
    10    B    80
    11    B    90

My goal is to achieve the following results:

       A  B Total
    1 10 30    40
    2 15 40    55
    3 20 50    70
    4 25 60    85
    5    70    70
    6    80    80
    7    90    90

I've tried several methods, and this one came closest:

    myList <- list(a = c(10, 15, 20, 25), b = c(30, 40, 50, 60, 70, 80, 90))
    tmpDF <- data.frame(sapply(myList, '[', 1:max(sapply(myList, length))))
    > tmpDF
       a  b
    1 10 30
    2 15 40
    3 20 50
    4 25 60
    5 NA 70
    6 NA 80
    7 NA 90
    totalSum <- rowSums(tmpDF)
    totalSum <- data.frame(totalSum)
    tmpDF <- cbind(tmpDF, totalSum)
    > tmpDF
       a  b totalSum
    1 10 30       40
    2 15 40       55
    3 20 50       70
    4 25 60       85
    5 NA 70       NA
    6 NA 80       NA
    7 NA 90       NA

Although this method manages to combine two vectors of different lengths into one data frame, the rowSums function gives incorrect values in this example: every row that contains an NA gets a total of NA. In addition, my source data is not in list format, so I cannot apply such a "solution".

I think I'm making a mess of this problem, so I was wondering how I can ...

  • Subset the data from a data frame based on Type,
  • Insert these individual subsets of different lengths into a new data frame,
  • Add a "Total" column to this data frame that holds the correct sum of the individual subsets.

An additional complication is that this needs to be done in a function, or otherwise dynamically, so that I don't have to manually subset the dozens of types (A, B, C, etc.) in my data frame.

Here is what I have so far. It doesn't work, but it illustrates the lines along which I am thinking:

    TotalDf <- function(x){
      tmpNumberOfTypes <- c(levels(x$Type))
      for( i in tmpNumberOfTypes){
        subSetofData <- subset(x, Type = i, select = Value)
        if( i == 1) {
          totalDf <- subSetOfData
        } else{
          totalDf <- cbind(totalDf, subSetofData)}
      }
      return(totalDf)
    }

Thanks in advance for any thoughts or ideas about this,

Regards,

 EDIT: 

Thanks to Joris's comment (see below), I was pointed in the right direction. However, while trying to translate his solution to my own data frame, I ran into additional problems. His suggested answer works and gives me the following (correct) sums of the values of A and B:

    > tmp78 <- tapply(DF$value,DF$id,sum)
    > tmp78
     1  2  3  4  5  6
     6  8 10 12  9 10
    > data.frame(tmp78)
      tmp78
    1     6
    2     8
    3    10
    4    12
    5     9
    6    10

However, when I try to execute this solution on my data frame, it does not work:

    > subSetOfData <- copyOfTradesList[c(1:3,11:13),c(1,10)]
    > subSetOfData
       Instrument AccountValue
    1         JPM         6997
    2         JPM         7261
    3         JPM         7545
    11        KFT         6992
    12        KFT         6944
    13        KFT         7069
    > unlist(sapply(rle(subSetOfData$Instrument)$lengths,function(x) 1:x))
    Error in rle(subSetOfData$Instrument) : 'x' must be an atomic vector
    > subSetOfData$InstrumentNumeric <- as.numeric(subSetOfData$Instrument)
    > unlist(sapply(rle(subSetOfData$InstrumentNumeric)$lengths,function(x) 1:x))
         [,1] [,2]
    [1,]    1    1
    [2,]    2    2
    [3,]    3    3
    > subSetOfData$id <- unlist(sapply(rle(subSetOfData$InstrumentNumeric)$lengths,function(x) 1:x))
    Error in `$<-.data.frame`(`*tmp*`, "id", value = c(1L, 2L, 3L, 1L, 2L, :
      replacement has 3 rows, data has 6

I have a nagging feeling that I'm going around in circles ...

2 answers

Two thoughts:

1) You can use na.rm = TRUE in rowSums, so that NA values are skipped instead of turning the whole row total into NA.

2) How do you know which value goes with which? You can add some indexing.

E.g.:

    DF <- data.frame(
      type=c(rep("A",4),rep("B",6)),
      value = 1:10,
      stringsAsFactors=F
    )
    DF$id <- unlist(lapply(rle(DF$type)$lengths,function(x) 1:x))

This index now allows an easy tapply to sum over the original data frame:

    tapply(DF$value,DF$id,sum)
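The rle-based index expects a character column whose rows are already grouped by type. With a factor column, rle stops with an error, and sapply collapses equal-length groups into a matrix, which is what the question's EDIT runs into. A minimal base R sketch of one way around both, assuming ave is acceptable here; it reuses the six rows shown in the question's EDIT, and the result name totals is only illustrative:

```r
# The six rows from the question's EDIT; Instrument is deliberately a factor
subSetOfData <- data.frame(
  Instrument   = factor(c("JPM", "JPM", "JPM", "KFT", "KFT", "KFT")),
  AccountValue = c(6997, 7261, 7545, 6992, 6944, 7069)
)

# Running index within each Instrument group (works for factors and characters)
subSetOfData$id <- ave(seq_along(subSetOfData$Instrument),
                       subSetOfData$Instrument,
                       FUN = seq_along)

# Sum AccountValue over rows that share the same within-group index
totals <- tapply(subSetOfData$AccountValue, subSetOfData$id, sum)
```

Because ave always returns a vector as long as its input, the assignment to the id column cannot fail with a row-count mismatch.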

And, more importantly, it lets you get your data frame into the correct form:

    > DF
       type value id
    1     A     1  1
    2     A     2  2
    3     A     3  3
    4     A     4  4
    5     B     5  1
    6     B     6  2
    7     B     7  3
    8     B     8  4
    9     B     9  5
    10    B    10  6
    > library(reshape)
    > cast(DF,id~type)
      id  A  B
    1  1  1  5
    2  2  2  6
    3  3  3  7
    4  4  4  8
    5  5 NA  9
    6  6 NA 10
    TV <- data.frame(Type = c("A","A","A","A","B","B","B","B","B","B","B")
                     , Value = c(10,15,20,25,30,40,50,60,70,80,90)
                     , stringsAsFactors = FALSE)

    # Added Type C for testing
    # TV <- data.frame(Type = c("A","A","A","A","B","B","B","B","B","B","B", "C", "C", "C")
    #                  , Value = c(10,15,20,25,30,40,50,60,70,80,90, 100, 150, 130)
    #                  , stringsAsFactors = FALSE)

    # Number of rows per Type
    lnType <- with(TV, tapply(Value, Type, length))
    lnType <- as.integer(lnType)
    lnType

    # Running index within each Type (assumes rows are already ordered by Type)
    id <- unlist(mapply(FUN = rep_len, length.out = lnType, x = list(1:max(lnType))))
    (TV <- cbind(id, TV))

    # Reshape to one column per Type
    require(reshape2)
    tvWide <- dcast(TV, id ~ Type)

    # Alternatively
    # tvWide <- reshape(data = TV, direction = "wide", timevar = "Type", ids = c(id, Type))

    tvWide <- subset(tvWide, select = -id)

    # If you want something neat without the <NA>
    # for(i in 1:ncol(tvWide)){
    #   for(j in 1:nrow(tvWide)){
    #     if (is.na(tvWide[j,i])){
    #       tvWide[j,i] = 0
    #     }
    #   }
    # }

    tvWide
    transform(tvWide, rowSum=rowSums(tvWide, na.rm = TRUE))
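Since the question asks for a function that works for any number of types, the same idea can be sketched in base R without a reshaping package. This is only a sketch under stated assumptions: the helper name rowTotalsByType and its typeCol and valueCol arguments are hypothetical, and the type labels are assumed to be valid column names other than "Total".

```r
# Hypothetical helper: one column per type, padded with NA, plus a Total column
rowTotalsByType <- function(x, typeCol = "Type", valueCol = "Value") {
  # Split the value column into one vector per type
  groups <- split(x[[valueCol]], x[[typeCol]])
  # Pad every vector to the longest length (indexing past the end yields NA)
  maxLen <- max(lengths(groups))
  wide <- as.data.frame(lapply(groups, function(v) v[seq_len(maxLen)]))
  # Sum across types, skipping the NA padding
  wide$Total <- rowSums(wide, na.rm = TRUE)
  wide
}

# The data frame from the question
TV <- data.frame(Type = c(rep("A", 4), rep("B", 7)),
                 Value = c(10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90))
result <- rowTotalsByType(TV)
```

The padding step is the same '[' trick the question already uses on a list, applied here to the output of split, so the source data can stay in long format.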

Source: https://habr.com/ru/post/1334052/

