I have a dataframe as follows:
chr leftPos Sample1 X.DD 3_samples MyStuff
1 324 -1 1 1 1
1 4565 -1 0 0 0
1 6887 -1 1 0 0
1 12098 1 -1 1 1
2 12 -1 1 0 1
2 43 -1 1 1 1
5 1 -1 1 1 0
5 43 0 1 -1 0
5 6554 1 1 1 1
5 7654 -1 0 0 0
5 8765 1 1 1 0
5 9833 1 1 1 -1
6 12 1 1 0 0
6 43 0 0 0 0
6 56 1 0 0 0
6 79 1 0 -1 0
6 767 1 0 -1 0
6 3233 1 0 -1 0
I would like to convert it according to the following rules: For each chromosome:
a. If there are three or more 1 or -1 in a column, then the value remains as it is.
b. If the column consistently has less than 1 or -1 s, then the value 1 or -1 changes to 0
Lines in a column must have the same sign (+ or -ve), which will be called sequential.
The result of the above data frame should be:
chr leftPos Sample1 X.DD 3_samples MyStuff
1 324 -1 0 0 0
1 4565 -1 0 0 0
1 6887 -1 0 0 0
1 12098 0 0 0 0
2 12 0 0 0 0
2 43 0 0 0 0
5 1 0 1 0 0
5 43 0 1 0 0
5 6554 0 1 0 0
5 7654 0 0 0 0
5 8765 0 0 0 0
5 9833 0 0 0 0
6 12 0 0 0 0
6 43 0 0 0 0
6 56 1 0 0 0
6 79 1 0 -1 0
6 767 1 0 -1 0
6 3233 1 0 -1 0
I managed to do this for two consecutive lines, but I'm not sure how to change this for three or more lines.
DAT_list2res <-cbind(DAT_list2[1:2],DAT_list2res)
colnames(DAT_list2res)[1:2]<-c("chr","leftPos")
DAT_list2res$chr<-as.numeric(gsub("chr","",DAT_list2res$chr))
DAT_list2res<-as.data.frame(DAT_list2res)
dx<-DAT_list2res
f0 <- function( colNr, dx)
{
col <- dx[,colNr]
n1 <- which(col == 1| col == -1)
d0 <- which( diff(col) == 0)
dc0 <- which( diff(dx[,1]) == 0)
m <- intersect( n1-1, intersect( d0, dc0 ) )
return ( setdiff( 1:nrow(dx), union(m,m+1) ) )
}
g <- function( dx )
{
for ( i in 3:ncol(dx) ) { dx[f0(i,dx),i] <- 0 }
return ( dx )
}
dx<-g(dx)