How to compute totals, check them against a threshold, and repeat the step

I need to check the value of 'peso' (see the replication code below) for each factor. If a factor reaches 50% of the overall total of 'peso', the values of each factor should be stored in a new results object; otherwise R should determine which factor has the smallest aggregate 'peso' and, for that factor, use the factor in the next column when aggregating 'peso' again. In effect, each round replaces the lowest-scoring factor with the next one. The process should repeat until a factor exceeds the 50% threshold. So my question is: where do I start?

    set.seed(51)
    Data <- sapply(1:100, function(x) sample(1:10, size=5))
    Data <- data.frame(t(Data))
    names(Data) <- letters[1:5]
    Data$peso <- sample(0:3.5, 100, rep=TRUE)

It should work like this.

If your first four rows are:

     a  b  c  d  e  peso
     8  2  3  7  9  1
     8  3  4  5  7  3
     9  7  4 10  1  2
    10  3  4  5  7  3

what would you like for the totals?

    Totals_08 = 4
    Totals_09 = 2
    Totals_10 = 3

etc.?

So factor 8 received the largest share, 4 / (4 + 2 + 3) = 0.4444444, but did not reach the 50% threshold in the first round (column "a"). What I need next is to repeat the aggregation, this time considering factor 7 in column "b" instead of factor 9 in column "a", since 9 received the smallest aggregate value in the first round.

1 answer

It is unclear whether you already have your list of factors. If you don't and need to derive it from the data set, you can do so in a couple of ways:

    # Get a list of all the factors
    myFactors <- levels(Data[[1]])    # if the columns are actual factors
    # Otherwise something like this (peso is excluded so its values are not picked up as factors)
    myFactors <- sort(unique(unlist(subset(Data, select = -peso))))


Then, to calculate the totals per factor, you can do the following:

    Totals <- colSums(sapply(myFactors, function(fctr)
        # calculate totals per fctr
        as.integer(Data$peso) * rowSums(fctr == subset(Data, select = -peso))
    ))
    names(Totals) <- myFactors

Which gives

    Totals
    #   1   2   3   4   5   6   7   8   9  10
    # 132 153 142 122 103 135 118 144 148 128
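Note that this sums peso over every column in which a factor appears, whereas the first round described in the question looks only at column "a". As a small check, here is a sketch on the four example rows from the question; `ex`, `choices`, `fctrs`, `all_cols` and `first_round` are names made up for this illustration, not part of the original code.

```r
# Four example rows quoted in the question (columns a-e plus peso).
ex <- data.frame(
  a = c(8, 8, 9, 10), b = c(2, 3, 7, 3), c = c(3, 4, 4, 4),
  d = c(7, 5, 10, 5), e = c(9, 7, 1, 7), peso = c(1, 3, 2, 3)
)

# Totals across all five columns, same idea as the Totals code above.
choices <- subset(ex, select = -peso)
fctrs <- sort(unique(unlist(choices)))
all_cols <- colSums(sapply(fctrs, function(f) ex$peso * rowSums(choices == f)))
names(all_cols) <- fctrs

# First-round totals as described in the question: column "a" only.
first_round <- tapply(ex$peso, ex$a, sum)

all_cols     # factor 8 totals 4 here as well, but 9 and 10 pick up extra weight
first_round  # 8 -> 4, 9 -> 2, 10 -> 3, matching Totals_08, Totals_09, Totals_10
```

Which of the two you want depends on whether a row should count toward all of its factors at once or only toward the one it currently points at.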



Next: I'm not sure whether you then want to compare against the sum of peso or the sum of the totals. Here are both options, broken down into steps:

    # Calculate the total of all the Totals:
    TotalSum <- sum(Totals)

    # See percentage for each:
    Totals / TotalSum
    Totals / sum(as.integer(Data$peso))

    # See which, if any, is greater than 50%
    Totals / TotalSum > 0.50
    Totals / sum(as.integer(Data$peso)) > 0.50

    # Using which() to identify the ones you are looking for
    which(Totals / TotalSum > 0.50)
    which(Totals / sum(as.integer(Data$peso)) > 0.50)
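For the "repeat until something passes 50%" part of the question, one possible reading can be sketched as a loop. This is only a sketch under assumptions that the question does not settle: each row counts toward one factor at a time, starting in column "a"; rows holding the lowest-scoring factor move one column to the right; ties for the lowest total go to the first factor found; factors that lost an earlier round are not skipped; and the loop gives up when a row has no column left. `run_rounds` is a hypothetical helper name.

```r
# Four example rows quoted in the question (columns a-e plus peso).
ex <- data.frame(
  a = c(8, 8, 9, 10), b = c(2, 3, 7, 3), c = c(3, 4, 4, 4),
  d = c(7, 5, 10, 5), e = c(9, 7, 1, 7), peso = c(1, 3, 2, 3)
)

# Hypothetical helper, not from the original post.
run_rounds <- function(df, cols = letters[1:5], threshold = 0.5) {
  m <- as.matrix(df[cols])
  pos <- rep(1L, nrow(m))   # column each row currently contributes from
  rounds <- 0L
  repeat {
    rounds <- rounds + 1L
    current <- m[cbind(seq_len(nrow(m)), pos)]   # factor held by each row
    totals <- tapply(df$peso, current, sum)      # aggregate peso per factor
    shares <- totals / sum(totals)
    if (max(shares) > threshold) {
      return(list(winner = names(shares)[which.max(shares)],
                  share = max(shares), rounds = rounds))
    }
    # Assumption: ties for the lowest total are broken by taking the first.
    lowest <- names(totals)[which.min(totals)]
    move <- as.character(current) == lowest
    # Assumption: stop without a winner once a row has no further column.
    if (any(pos[move] >= ncol(m))) {
      return(list(winner = NA_character_, share = max(shares), rounds = rounds))
    }
    pos[move] <- pos[move] + 1L
  }
}

res <- run_rounds(ex)
res$winner
res$share
```

On these four rows the row that starts on factor 9 moves through 7 and 4 before landing on 10, so the loop should stop in the fourth round with factor 10 holding 5/9 of the total. If factors that have already lost should be skipped, the loop would need to keep a list of them and advance past those columns.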



A note on sampling peso

You sampled from 0:3.5, but the x:y operator steps by 1 starting at x, so 0:3.5 yields only 0, 1, 2, 3. If you want fractions, you can use seq(), or you can build a longer integer sequence and divide it accordingly:

    option1 <- (0:7) / 2
    option2 <- seq(from=0, to=3.5, by=0.5)

If you want the integers 0:3 as well as the value 3.5, use c():

    option3 <- c(0:3, 3.5)
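A quick way to see the difference, and to resample peso with half steps, might look like the sketch below. The names `ints`, `halves` and `peso_frac` are illustrative only, and which set of values peso should actually take is for you to decide.

```r
# The colon operator steps by 1 from its left operand, so 3.5 is never reached.
ints <- 0:3.5                                  # same values as 0:3
halves <- seq(from = 0, to = 3.5, by = 0.5)    # 0, 0.5, 1, ..., 3.5

set.seed(51)
# replace = TRUE is spelled out rather than relying on partial matching of rep=.
peso_frac <- sample(halves, 100, replace = TRUE)

ints
range(peso_frac)
```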

Source: https://habr.com/ru/post/1446379/

