How to calculate the number of occurrences of a certain value in a row in R

I have a rather complicated problem that I just cannot solve.

I have a large dataset (23277 rows, 151 columns). Each column has values from 0 to 100 (inclusive), representing the probabilities assigned to world events.

As part of calculating the score for each person, I need to count the occurrences of each of the values in the data set.

I tried apply first, but I need to ignore NA values and count only a subset, so I tried the following:

apply(ans.samp, 1, sum(ans.samp[ans==0]), na.rm=TRUE) 

I got an error: 'sum(ans.samp[ans == 0])' is not a function, character or symbol

I repeated this process with sapply, vapply, tapply and do.call to no avail.

Having abandoned the vectorized solution, I wrote the following for loop.

 RespCount <- function (x) {
   for (i in (1:nrow(x))) {
     res <- vector(mode="numeric", length=nrow(x))
     ans.tmp <- x[i,]
     res[i] <- length(ans.tmp[ans.tmp==0])
     print(res)
   }
   return(res)
 }

However, once I got this working, it only returned the total number of 0s in the sample.

I would appreciate some help with this, as I am under some time pressure, and I would like to be able to solve these problems in R in the future.

Examples of data for reproducibility:

 structure(list(X = 1:6, X100 = c(70L, NA, 80L, 0L, 40L, NA), X10 = c(30L, NA, NA, NA, NA, NA), X1 = c(50L, NA, NA, NA, NA, NA), X11 = c(50L, NA, NA, NA, NA, NA), X12 = c(30L, NA, NA, NA, NA, NA), X13 = c(50L, NA, NA, NA, NA, NA), X14 = c(70L, NA, NA, NA, NA, NA), X15 = c(60L, NA, NA, NA, NA, NA), X158 = c(30L, NA, NA, NA, NA, NA), X159 = c(50L, NA, NA, NA, NA, NA), X160 = c(80L, NA, NA, NA, NA, NA), X16 = c(50L, NA, NA, NA, NA, NA), X161 = c(40L, NA, NA, NA, NA, NA), X162 = c(100L, NA, NA, NA, NA, NA), X163 = c(50L, NA, NA, NA, NA, NA), X164 = c(0L, NA, NA, NA, NA, NA), X165 = c(0L, NA, NA, NA, NA, NA), X166 = c(20L, NA, NA, NA, NA, NA), X167 = c(0L, NA, NA, NA, NA, NA), X168 = c(30L, NA, NA, NA, NA, NA), X169 = c(100L, NA, NA, NA, NA, NA), X170 = c(30L, NA, NA, NA, NA, NA), X17 = c(40L, NA, NA, NA, NA, NA), X171 = c(50L, NA, NA, NA, NA, NA), X172 = c(20L, NA, NA, NA, NA, NA), X173 = c(30L, NA, NA, NA, NA, NA), X174 = c(20L, NA, NA, NA, NA, NA), X175 = c(30L, NA, NA, NA, NA, NA), X176 = c(10L, NA, NA, NA, NA, NA), X177 = c(70L, NA, NA, NA, NA, NA), X178 = c(40L, NA, NA, NA, NA, NA), X179 = c(70L, NA, NA, NA, NA, NA), X180 = c(0L, NA, NA, NA, NA, NA), X18 = c(30L, NA, NA, NA, NA, NA), X181 = c(100L, NA, NA, NA, NA, NA), X182 = c(100L, NA, NA, NA, NA, NA), X183 = c(20L, NA, NA, NA, NA, NA), X184 = c(80L, NA, NA, NA, NA, NA), X185 = c(90L, NA, NA, NA, NA, NA), X186 = c(0L, NA, NA, NA, NA, NA), X187 = c(10L, NA, NA, NA, NA, NA), X188 = c(100L, NA, NA, NA, NA, NA), X189 = c(100L, NA, NA, NA, NA, NA), X190 = c(0L, NA, NA, NA, NA, NA), X19 = c(100L, NA, NA, NA, NA, NA), X191 = c(0L, NA, NA, NA, NA, NA), X192 = c(90L, NA, NA, NA, NA, NA), X193 = c(50L, NA, NA, NA, NA, NA), X194 = c(100L, NA, NA, NA, NA, NA), X195 = c(10L, NA, NA, NA, NA, NA), X196 = c(100L, NA, NA, NA, NA, NA), X197 = c(20L, NA, NA, NA, NA, NA), X198 = c(40L, NA, NA, NA, NA, NA), X199 = c(20L, NA, NA, NA, NA, NA), X200 = c(0L, NA, NA, NA, NA, NA), X20 = c(0L, NA, NA, NA, NA, NA), X201 = c(0L, NA, NA, NA, 
NA, NA), X202 = c(20L, NA, NA, NA, NA, NA), X203 = c(20L, NA, NA, NA, NA, NA), X204 = c(80L, NA, NA, NA, NA, NA), X205 = c(0L, NA, NA, NA, NA, NA), X206 = c(80L, NA, NA, NA, NA, NA), X207 = c(0L, NA, NA, NA, NA, NA), X2 = c(10L, NA, NA, NA, NA, NA), X21 = c(0L, NA, NA, NA, NA, NA), X22 = c(100L, NA, NA, NA, NA, NA), X23 = c(50L, NA, NA, NA, NA, NA), X24 = c(50L, NA, NA, NA, NA, NA), X25 = c(70L, NA, NA, NA, NA, NA), X26 = c(60L, NA, NA, NA, NA, NA), X27 = c(40L, NA, NA, NA, NA, NA), X28 = c(20L, NA, NA, NA, NA, NA), X29 = c(0L, NA, NA, NA, NA, NA), X30 = c(90L, NA, NA, NA, NA, NA), X3 = c(0L, NA, NA, NA, NA, NA), X31 = c(50L, NA, NA, NA, NA, NA), X32 = c(50L, NA, NA, NA, NA, NA), X33 = c(0L, NA, NA, NA, NA, NA), X34 = c(50L, NA, NA, NA, NA, NA), X35 = c(90L, NA, NA, NA, NA, NA), X36 = c(50L, NA, NA, NA, NA, NA), X37 = c(60L, NA, NA, NA, NA, NA), X38 = c(40L, NA, NA, NA, NA, NA), X39 = c(50L, NA, NA, NA, NA, NA), X40 = c(0L, NA, NA, NA, NA, NA), X4 = c(50L, NA, NA, NA, NA, NA), X41 = c(90L, NA, NA, NA, NA, NA), X42 = c(80L, NA, NA, NA, NA, NA), X43 = c(50L, NA, NA, NA, NA, NA), X44 = c(80L, NA, NA, NA, NA, NA), X45 = c(80L, NA, NA, NA, NA, NA), X46 = c(0L, NA, NA, NA, NA, NA), X47 = c(80L, NA, NA, NA, NA, NA), X48 = c(20L, NA, NA, NA, NA, NA), X49 = c(100L, NA, NA, NA, NA, NA), X50 = c(0L, NA, NA, NA, NA, NA), X5 = c(0L, NA, NA, NA, NA, NA), X51 = c(80L, 100L, 70L, 100L, 0L, 60L ), X52 = c(10L, 0L, 0L, 0L, 0L, 20L), X53 = c(40L, 40L, 70L, 20L, 90L, 50L), X54 = c(0L, 10L, 0L, 50L, 50L, 0L), X55 = c(20L, 80L, 90L, 80L, 30L, 0L), X56 = c(100L, 100L, 50L, 100L, 80L, 100L), X57 = c(60L, 0L, 100L, 70L, 100L, 80L), X58 = c(100L, 100L, 100L, 50L, 100L, 100L), X59 = c(80L, 50L, 80L, 0L, 30L, 50L), X60 = c(70L, 50L, 60L, 50L, 100L, 100L), X6 = c(100L, NA, NA, NA, NA, NA), X61 = c(50L, 50L, 50L, 30L, 70L, 50L ), X62 = c(20L, 50L, 40L, 40L, 50L, 100L), X63 = c(50L, 0L, 100L, 10L, 50L, 100L), X64 = c(60L, 30L, 0L, 50L, 50L, 50L ), X65 = c(50L, 50L, 70L, 80L, 50L, 50L), X66 = 
c(70L, 40L, 10L, 90L, 60L, 50L), X67 = c(30L, 50L, 50L, 0L, 50L, 60L), X68 = c(30L, 0L, 0L, 40L, 70L, 80L), X69 = c(30L, NA, 70L, 10L, 0L, 20L), X70 = c(80L, NA, 50L, 50L, 70L, 100L), X7 = c(100L, NA, NA, NA, NA, NA), X71 = c(70L, NA, 50L, 100L, 100L, 100L ), X72 = c(60L, NA, 70L, 50L, 80L, 50L), X73 = c(80L, NA, 80L, 80L, 80L, NA), X74 = c(50L, NA, 50L, 0L, 50L, NA), X75 = c(30L, NA, 70L, 10L, 80L, NA), X76 = c(70L, NA, 40L, 80L, 100L, NA), X77 = c(80L, NA, 50L, 100L, 40L, NA), X78 = c(80L, NA, 0L, 0L, 0L, NA), X79 = c(80L, NA, 50L, 50L, 50L, NA), X80 = c(40L, NA, 90L, 70L, 60L, NA), X8 = c(50L, NA, NA, NA, NA, NA), X81 = c(70L, NA, 60L, 40L, 80L, NA), X82 = c(80L, NA, 100L, 60L, 60L, NA), X83 = c(30L, NA, 100L, 30L, 0L, NA), X84 = c(80L, NA, 0L, 60L, 100L, NA), X85 = c(80L, NA, 50L, 40L, 30L, NA ), X86 = c(50L, NA, 90L, 50L, 50L, NA), X87 = c(80L, NA, 50L, 70L, 20L, NA), X88 = c(40L, NA, 70L, 30L, 90L, NA), X89 = c(50L, NA, 50L, 80L, 80L, NA), X90 = c(90L, NA, 100L, 60L, 100L, NA), X91 = c(0L, NA, 0L, 0L, 0L, NA), X9 = c(100L, NA, NA, NA, NA, NA), X92 = c(50L, NA, 70L, 90L, 80L, NA), X93 = c(40L, NA, 50L, 50L, 50L, NA), X94 = c(40L, NA, 0L, 60L, 40L, NA), X95 = c(90L, NA, 100L, 40L, 50L, NA), X96 = c(50L, NA, 50L, 50L, 50L, NA), X97 = c(60L, NA, 60L, 100L, 50L, NA), X98 = c(40L, NA, 40L, 0L, 0L, NA), X99 = c(30L, NA, 0L, 50L, 70L, NA)), .Names = c("X", "X100", "X10", "X1", "X11", "X12", "X13", "X14", "X15", "X158", "X159", "X160", "X16", "X161", "X162", "X163", "X164", "X165", "X166", "X167", "X168", "X169", "X170", "X17", "X171", "X172", "X173", "X174", "X175", "X176", "X177", "X178", "X179", "X180", "X18", "X181", "X182", "X183", "X184", "X185", "X186", "X187", "X188", "X189", "X190", "X19", "X191", "X192", "X193", "X194", "X195", "X196", "X197", "X198", "X199", "X200", "X20", "X201", "X202", "X203", "X204", "X205", "X206", "X207", "X2", "X21", "X22", "X23", "X24", "X25", "X26", "X27", "X28", "X29", "X30", "X3", "X31", "X32", "X33", "X34", "X35", "X36", "X37", 
"X38", "X39", "X40", "X4", "X41", "X42", "X43", "X44", "X45", "X46", "X47", "X48", "X49", "X50", "X5", "X51", "X52", "X53", "X54", "X55", "X56", "X57", "X58", "X59", "X60", "X6", "X61", "X62", "X63", "X64", "X65", "X66", "X67", "X68", "X69", "X70", "X7", "X71", "X72", "X73", "X74", "X75", "X76", "X77", "X78", "X79", "X80", "X8", "X81", "X82", "X83", "X84", "X85", "X86", "X87", "X88", "X89", "X90", "X91", "X9", "X92", "X93", "X94", "X95", "X96", "X97", "X98", "X99"), row.names = c(NA, 6L), class = "data.frame") 

Any insight would be very helpful.

From some attempts on the small dataset above, it seems that a count is calculated for each row, but when I return the res object, it only gives me the final value. How can I fix this?

4 answers

There are two ways to use the apply family of functions. Either you do

 apply(mat, 1, sum, na.rm=TRUE) 

if you want to apply the sum() function to each row, passing additional parameters such as na.rm=TRUE. Or you can do

 apply(mat, 1, foo) 

where foo() is a function of your own, defined separately, e.g.

 foo <- function(x) sum(x==0, na.rm=TRUE) 

Note that NA handling can also be controlled by a parameter of the function itself, with its default value set to TRUE, as in

 foo2 <- function(x, no.na=TRUE) sum(x==0, na.rm=no.na) 

and you can call it as apply(mat, 1, foo2, no.na=FALSE), although this makes little sense with the sum() function (unless you want to check whether there are NA values, but in that case it is better to use is.na() :-).
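As a small illustration of how the extra argument is forwarded by apply(), here is a sketch on a made-up 3 x 3 matrix (the matrix and the variable names with_na_removed and with_na_kept are assumptions for the example, not part of the original data):

```r
# Toy matrix (made up for illustration): 3 rows, some NAs
mat <- matrix(c( 0, 50,  NA,
                 0,  0, 100,
                NA, NA,  30),
              nrow = 3, byrow = TRUE)

foo2 <- function(x, no.na = TRUE) sum(x == 0, na.rm = no.na)

# Default: NAs are dropped before summing
with_na_removed <- apply(mat, 1, foo2)

# Extra arguments after the function are passed on to foo2;
# here any row containing an NA yields NA
with_na_kept <- apply(mat, 1, foo2, no.na = FALSE)

print(with_na_removed)
print(with_na_kept)
```

With no.na = FALSE the NA propagates through sum(), which is why that variant is mostly useful for spotting rows with missing values rather than for counting.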

Finally, you can define your function inline, directly in the call

 apply(mat, 1, function(x) sum(x==0, na.rm=TRUE)) 

In your case, it gives me

 > apply(mat, 1, function(x) sum(x==0, na.rm=TRUE))
  1  2  3  4  5  6 
 22  4  9  8  7  2 

which is equivalent to apply(mat, 1, foo).
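As for the loop in the question, it appears to return only the last value because res is re-created inside the loop on every iteration, which wipes the counts stored so far. A minimal sketch of a corrected version follows, assuming the goal is a count of zeros per row; the toy data frame dat, the value argument and the comparison with rowSums() are additions for illustration only:

```r
# Corrected loop: allocate `res` once, before looping
RespCount <- function(x, value = 0) {
  res <- numeric(nrow(x))                 # one slot per row
  for (i in seq_len(nrow(x))) {
    row_i <- unlist(x[i, ])               # plain vector from a data frame or matrix row
    res[i] <- sum(row_i == value, na.rm = TRUE)
  }
  res
}

# Toy data frame (made up), with NAs
dat <- data.frame(a = c(0, NA, 10), b = c(0, 0, NA), c = c(50, 0, 0))

loop_counts  <- RespCount(dat)
apply_counts <- apply(dat, 1, function(x) sum(x == 0, na.rm = TRUE))
fast_counts  <- rowSums(dat == 0, na.rm = TRUE)   # vectorized alternative

print(loop_counts)
```

All three approaches should agree on the per-row counts; on a data set with tens of thousands of rows the rowSums() form is likely the quickest of the three.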


Call your dataset dat. You can use table() to get a frequency table for each value in your dataset. If you want to apply this to all the data in your data frame, coerce the data to a single vector and use table() on the resulting vector:

 table(do.call('c', dat)) 

This gives you:

 > table(do.call('c', dat))
   0   1   2   3   4   5   6  10  20  30  40  50  60  70  80  90 100 
  52   1   1   1   1   1   1  10  16  21  25  76  19  25  37  14  45 

If you want to check the frequencies for individual rows, simply do:

 apply(dat, 1, table) 
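
Because apply(dat, 1, table) works row by row and each row may contain a different set of values, the result is generally a list of tables of different lengths. One way to get a regular matrix instead is to fix the set of levels with factor(); the sketch below assumes a made-up toy data frame and only three possible values to keep the output small (for the data in the question, vals would be 0:100):

```r
# Toy data (made up): values restricted to 0, 50, 100 for brevity
dat <- data.frame(a = c(0, NA, 100), b = c(0, 50, NA), c = c(50, 50, 0))
vals <- c(0, 50, 100)   # in the question this would be 0:100

# factor() with fixed levels makes every row's table the same length,
# so apply() returns a matrix; t() puts one data row per matrix row
row_counts <- t(apply(dat, 1, function(x) table(factor(x, levels = vals))))

print(row_counts)
```

Each row of row_counts then holds the counts for one row of dat, with NA values ignored by table(), and the first column holds the number of zeros per row.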

For data in a data.frame named df,

 sapply(df + 1, tabulate, 101) 

creates a 101 x 151 matrix, where the rows correspond to the values 0, 1, ..., 100 and the columns to the 151 samples; the matrix may be convenient for subsequent calculations, and tabulate is faster than table.
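
To make the shape of that result concrete, here is a sketch on a made-up data frame with three columns instead of 151. The per-row variant at the end is an assumption about what the question needs, not part of this answer; it applies the same tabulate() idea over rows of the matrix form:

```r
# Toy data frame (made up); the real data would have 151 columns
df <- data.frame(a = c(0, NA, 100), b = c(0, 50, NA), c = c(50, 50, 0))

# Per column, as in the answer: shifting by 1 maps values 0..100 to bins 1..101
col_counts <- sapply(df + 1, tabulate, 101)
print(dim(col_counts))          # 101 rows (values), one column per df column

# Per row: tabulate each row of the matrix form; result is 101 x nrow(df)
row_counts <- apply(as.matrix(df) + 1, 1, tabulate, nbins = 101)

# Row 1 of either matrix corresponds to the value 0
zeros_per_row <- row_counts[1, ]
print(zeros_per_row)
```

tabulate() silently ignores NA values and anything outside 1..nbins, so no explicit NA handling is needed here.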


I am trying to address the problem as described, rather than the code in what appeared to be an initial partial attempt. To count the number of occurrences in a row, use apply with table:

 > apply(dfrm, 1, table)
 $`1`

   0   1  10  20  30  40  50  60  70  80  90 100 
  22   1   5  12  14  12  26   7  10  19   7  16 

 $`2`

   0   2  10  30  40  50  80 100 
   4   1   1   1   2   6   1   3 

 $`3`

   0   3  10  40  50  60  70  80  90 100 
   9   1   1   3  13   3   8   3   3   7 

 $`4`

   0   4  10  20  30  40  50  60  70  80  90 100 
   8   1   3   1   3   5  11   4   3   5   2   5 

 $`5`

   0   5  20  30  40  50  60  70  80  90 100 
   7   1   1   3   3  13   3   4   7   2   7 

 $`6`

   0   6  20  50  60  80 100 
   2   1   2   7   2   2   7 

And note that this result includes the case x == 0 as a subset:

 > sapply( apply(dfrm, 1, table), function(x) x['0'])
 1.0 2.0 3.0 4.0 5.0 6.0 
  22   4   9   8   7   2 

Source: https://habr.com/ru/post/888159/

