I have a data frame whose first 5 rows look like this:
```
Sample CCT6       GAT1                  IMD3         PDR3         RIM15
001    0000000000 111111111111111111111 010001000011 0N100111NNNN 01111111111NNNNNN
002    1111111111 111111111111111111000 000000000000 0N100111NNNN 00000000000000000
003    0NNNN00000 000000000000000000000 010001000011 000000000000 11111111111111111
004    000000NNN0 11100111111N111111111 010001000011 111111111111 01111111111000000
005    0111100000 111111111111111111111 111111111111 0N100111NNNN 00000000000000000
```
The complete data set contains 2000 samples. For each of the 5 gene columns, I want to determine, for every sample, whether the string is uniform, i.e. made up only of 1s or only of 0s. Ideally, I would also like to distinguish between all-1 and all-0 strings in the cases where the answer is TRUE. For my example, the expected results are:
```
Sample CCT6     GAT1     IMD3     PDR3     RIM15
001    TRUE (0) TRUE (1) FALSE    FALSE    FALSE
002    TRUE (1) FALSE    TRUE (0) FALSE    TRUE (0)
003    FALSE    TRUE (0) FALSE    TRUE (0) TRUE (1)
004    FALSE    FALSE    FALSE    TRUE (1) FALSE
005    FALSE    TRUE (1) TRUE (1) FALSE    TRUE (0)
```
I'm not attached to using logical values; other symbols are fine as long as they distinguish the different classes. Ideally, I'd like the results returned in a data frame with the same layout as the input.
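A minimal sketch of the overall goal in base R, assuming the gene columns are read as character strings (`colClasses = "character"`, so leading zeros in values like `001` and `0000000000` are not lost to numeric conversion) and that only a string made entirely of `0`s or entirely of `1`s counts as uniform. The helper name `classify_string` and the `"TRUE (0)"` / `"TRUE (1)"` / `"FALSE"` labels are illustrative choices, not part of any package. It uses `grepl()` with anchored patterns and `lapply()` over the columns, so `ddply` is not needed.

```r
# Example data from the question; colClasses = "character" keeps
# leading zeros such as "001" and "0000000000" intact.
df <- read.table(text = "
Sample CCT6 GAT1 IMD3 PDR3 RIM15
001 0000000000 111111111111111111111 010001000011 0N100111NNNN 01111111111NNNNNN
002 1111111111 111111111111111111000 000000000000 0N100111NNNN 00000000000000000
003 0NNNN00000 000000000000000000000 010001000011 000000000000 11111111111111111
004 000000NNN0 11100111111N111111111 010001000011 111111111111 01111111111000000
005 0111100000 111111111111111111111 111111111111 0N100111NNNN 00000000000000000
", header = TRUE, colClasses = "character")

# Hypothetical helper: label a string "TRUE (0)" if it is all 0s,
# "TRUE (1)" if it is all 1s, and "FALSE" otherwise. Vectorised over x.
classify_string <- function(x) {
  ifelse(grepl("^0+$", x), "TRUE (0)",
         ifelse(grepl("^1+$", x), "TRUE (1)", "FALSE"))
}

# Apply the helper to every column except Sample; the result keeps the
# same rows, column names and layout as the input data frame.
result <- df
gene_cols <- setdiff(names(df), "Sample")
result[gene_cols] <- lapply(df[gene_cols], classify_string)
print(result)
```

Because `grepl()` and `ifelse()` are vectorised, each column is classified in one call, which should scale to 2000 samples without an explicit loop.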
I'm having trouble with the very first step, which is testing in R whether a string consists of a single repeated value. I've tried various expressions with `grep` and `regexpr`, but couldn't get a result that I could then apply across the whole data frame with `ddply` or something similar. Here are some examples of what I tried for this step:
```
> a = as.character("111111111111")
> b = as.character("000000000000")
> c = as.character("000000011110")
> grep("1",a)
[1] 1
> grep("1",c)
[1] 1
> regexpr("1",a)
[1] 1
attr(,"match.length")
[1] 1
> regexpr("1",c)
[1] 8
attr(,"match.length")
[1] 1
```
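These attempts match `c` because `grep("1", c)` and `regexpr("1", c)` only look for a `1` anywhere in the string. Anchoring the pattern with `^` (start of string) and `$` (end of string) and using `+` (one or more) makes it match only when the whole string is a run of one character. A minimal sketch using base R's `grepl()`, which returns a logical vector instead of indices; the function names `is_uniform` and `uniform_digit` are hypothetical.

```r
# The three example strings from the question, as a named vector.
strings <- c(a = "111111111111", b = "000000000000", c = "000000011110")

# TRUE only when the entire string is one or more 0s or one or more 1s.
is_uniform <- function(x) grepl("^(0+|1+)$", x)

# The digit a uniform string is made of; NA for non-uniform strings.
uniform_digit <- function(x) ifelse(is_uniform(x), substr(x, 1, 1), NA)

is_uniform(strings)     # TRUE TRUE FALSE
uniform_digit(strings)  # "1" "0" NA
```

The `(0+|1+)` alternation restricts the test to 0/1 strings, so a run of `N`s is not counted as uniform; a backreference pattern such as `grepl("^(.)\\1*$", x, perl = TRUE)` would instead accept any single repeated character.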
I'd really appreciate any help getting started with this problem, or with reaching my overall goal.