Convert factor to integer in a data frame

I have the following code:

    anna.table <- data.frame(anna1, anna2)
    write.table(anna.table, file="anna.file.txt", sep='\t', quote=FALSE)

My table at the end contains numbers like:

    chr   start     end       score
    chr2  41237927  41238801  151
    chr1  36976262  36977889  226
    chr8  83023623  83025129  185

etc.

After that I try to get only the values that meet some criteria, such as a score less than or equal to a certain value.

So I do the following:

    anna3<-"data/anna/anna.file.txt"
    anna.total<-read.table(anna3,header=TRUE)
    significant.anna<-subset(anna.total,score <=0.001)

    Error: In Ops.factor(score, 0.001) <= not meaningful for factors

So I think the problem is that my table has factors, not integers.

I assume that anna.total$score is a factor, and that I have to make it an integer.

If I read correctly, then as.numeric can solve my problem.

I have been reading about the as.numeric function, but I can't work out how to use it.

So could you give me some advice?

Thanks in advance.

Best regards, Anna

PS: I tried the following:

    anna3<-"data/anna/anna.file.txt"
    anna.total<-read.table(anna3,header=TRUE)
    anna.total$score.new<-as.numeric (as.character(anna.total$score))
    write.table(anna.total,file="peak.list.numeric.v3.txt",append = FALSE ,quote = FALSE,col.names =TRUE,row.names=FALSE, sep="\t")
    anna.peaks<-subset(anna.total,fdr.new <=0.001)

    Warning messages:
    1: In Ops.factor(score, 0.001) : <= not meaningful for factors

I have the same problem again ......

2 answers

With anna.table (by the way, this is a data frame; a table is something else!), the easiest way is simply:

 anna.table2 <- data.matrix(anna.table) 

as data.matrix() converts factors to their underlying numeric (integer) level codes. This works for a data frame that contains only numeric, integer, factor, or other variables that can be coerced to numeric. Be careful with character columns, though: data.matrix() first converts them to factors and then to integer codes, so the original text is lost (as.matrix(), by contrast, would return a character matrix).

If you want anna.table2 to be a data frame, not a matrix, you can subsequently do:

 anna.table2 <- data.frame(anna.table2) 
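As a small illustration of what these two steps do, here is a sketch on a made-up data frame (the names df, m and df2 and all values are hypothetical, not taken from the question):

```r
## small illustrative data frame (names and values are made up)
df <- data.frame(grp = factor(c("b", "a", "b")), x = c(1.5, 2.5, 3.5))

m <- data.matrix(df)   # the factor column becomes its integer level codes
m[, "grp"]             # "a" is level 1 and "b" is level 2

df2 <- data.frame(m)   # back to a data frame; every column is now numeric
```

Note that the letters themselves are gone after the conversion; only the level codes remain.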

Another option is to coerce all factor variables to their integer levels. Here is an example of this:

    ## dummy data
    set.seed(1)
    dat <- data.frame(a = factor(sample(letters[1:3], 10, replace = TRUE)),
                      b = runif(10))

    ## sapply over `dat`, converting factor to numeric
    dat2 <- sapply(dat, function(x) if(is.factor(x)) {
        as.numeric(x)
    } else {
        x
    })
    dat2 <- data.frame(dat2) ## convert to a data frame

Which gives:

    > str(dat)
    'data.frame': 10 obs. of 2 variables:
     $ a: Factor w/ 3 levels "a","b","c": 1 2 2 3 1 3 3 2 2 1
     $ b: num 0.206 0.177 0.687 0.384 0.77 ...
    > str(dat2)
    'data.frame': 10 obs. of 2 variables:
     $ a: num 1 2 2 3 1 3 3 2 2 1
     $ b: num 0.206 0.177 0.687 0.384 0.77 ...

However, note that the above only works if what you want is the underlying numeric representation. If your factor has levels that are themselves numeric, we need to be a little smarter about how we convert the factor to numeric, preserving the “numerical” information encoded in the levels. Here is an example:

    ## dummy data
    set.seed(1)
    dat3 <- data.frame(a = factor(sample(1:3, 10, replace = TRUE), levels = 3:1),
                       b = runif(10))

    ## sapply over `dat3`, converting factor to numeric
    dat4 <- sapply(dat3, function(x) if(is.factor(x)) {
        as.numeric(as.character(x))
    } else {
        x
    })
    dat4 <- data.frame(dat4) ## convert to a data frame

Note that we need to call as.character(x) before we call as.numeric(). The extra call recovers the level labels as text before we convert them to numeric. To see why this matters, look at dat3$a:

    > dat3$a
     [1] 1 2 2 3 1 3 3 2 2 1
    Levels: 3 2 1

If we simply convert this to numeric, we get the wrong data, because R converts the underlying level codes:

    > as.numeric(dat3$a)
     [1] 3 2 2 1 3 1 1 2 2 3

If we first coerce the factor to a character vector, and then to numeric, we preserve the original information rather than R's internal representation:

    > as.numeric(as.character(dat3$a))
     [1] 1 2 2 3 1 3 3 2 2 1

If your data is similar to this second example, you cannot use the simple data.matrix() trick, since it is the same as applying as.numeric() directly to the factor, which, as this second example shows, does not preserve the original information.
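Applied to the data frame in the question, the conversion might look like the sketch below. The toy anna.total stands in for the real file, its score column is deliberately created as a factor to mimic the situation described, and the threshold of 200 is an arbitrary illustrative value (a cutoff of 0.001 would select nothing from scores such as 151).

```r
## toy stand-in for anna.total (illustrative values); score is made a
## factor on purpose to reproduce the problem from the question
anna.total <- data.frame(chr   = c("chr2", "chr1", "chr8"),
                         start = c(41237927, 36976262, 83023623),
                         end   = c(41238801, 36977889, 83025129),
                         score = factor(c("151", "226", "185")))

## go through character so the level labels, not the level codes, are used
anna.total$score <- as.numeric(as.character(anna.total$score))

## the comparison is now meaningful; 200 is an arbitrary example threshold
significant.anna <- subset(anna.total, score <= 200)
```

Overwriting the score column in place, rather than adding a new column, means that later calls such as subset() pick up the converted values without any change to the column name.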


I know this is an older question, but I had the same problem, and maybe this helps:

In this case, your score column looks as if it should never have become a factor column. This usually happens when read.table treats it as a text column. Depending on which country you are from, decimals may be written with "," rather than ".". R then thinks it is a character column and makes it a factor. In that case Gavin's answer will not work, because R will not convert "123,456" to 123.456. You can easily fix this in a text editor by replacing "," with ".", though.
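If editing the file by hand is inconvenient, the decimal mark can also be handled in R itself. The sketch below assumes a tab-separated file with a header; the temporary file, the object names peaks and fixed, and the values are all made up for illustration. The dec argument is part of read.table.

```r
## write a small tab-separated file that uses "," as the decimal mark
## (file and values are invented for this example)
tmp <- tempfile(fileext = ".txt")
writeLines(c("chr\tscore", "chr2\t0,0005", "chr1\t0,02"), tmp)

## dec = "," tells read.table how decimals are written in the file
peaks <- read.table(tmp, header = TRUE, sep = "\t", dec = ",")
str(peaks$score)

## alternative for values already read in as text or as a factor:
## replace the comma with a point, then convert
raw   <- factor(c("0,0005", "0,02"))
fixed <- as.numeric(sub(",", ".", as.character(raw), fixed = TRUE))
```

With dec = "," the column is numeric straight away, so no factor conversion is needed afterwards.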


Source: https://habr.com/ru/post/1398791/

