Import exponential values as numeric in R

I need to import many datasets automatically. In each one the first column holds names (a character vector) and the second holds numbers (a numeric vector), so I pass this specification to read.table: colClasses = c("character", "numeric").

This works fine when the file df_file contains data like this:

    df <- data.frame(V1 = c("s1", "s2", "s3", "s4"), V2 = c("1e-04", "1e-04", "1e-04", "1e-04"))
    read.table(df_file, header = FALSE, comment.char = "", colClasses = c("character", "numeric"), stringsAsFactors = FALSE)

The problem is that in some cases the second column holds numbers written as powers of ten, and then the import fails because read.table does not recognize the column as numeric (if I don't specify colClasses, it imports the column as character instead). So my question is: how can I make that column import as numeric even when the values are written as exponents?

For instance:

    df <- data.frame(V1 = c("s1", "s2", "s3", "s4"), V2 = c("10^(-4)", "10^(-4)", "10^(-4)", "10^(-4)"))

I want all of these exponential values imported as numeric, but even when I convert the column from character to numeric after import, I get all NAs: as.numeric(as.character(df$V2)) gives "Warning message: NAs introduced by coercion".

I tried "real" and "complex" in colClasses, but the exponents are still not read as numbers.

Please help, thanks!

3 answers

I think the problem is that your exponents are not written in a form R's number parser accepts: R reads 1e-4, but "10^(-4)" is an arithmetic expression, not a numeric literal. If you read the column as a character vector and you know every entry has this form, you can convert it yourself. Use gsub to cut "10^(" and ")", which leaves "-4", convert that to a number, and then raise 10 to it. This may not be the fastest way, but it works.
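To see the difference in notation concretely, here is a minimal base-R check of how as.numeric() treats each form. It accepts R's own exponent notation and plain decimals, but not an expression such as 10^(-4):

```r
# R's own exponent notation and a plain decimal both coerce cleanly
ok_sci   <- as.numeric("1e-04")
ok_fixed <- as.numeric("0.0001")

# "10^(-4)" is an arithmetic expression, not a numeric literal, so coercion
# returns NA (and normally warns "NAs introduced by coercion")
bad <- suppressWarnings(as.numeric("10^(-4)"))

print(c(ok_sci, ok_fixed, bad))
```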

In your example:

    df <- data.frame(V1 = c("s1", "s2", "s3", "s4"), V2 = c("10^(-4)", "10^(-4)", "10^(-4)", "10^(-4)"))

    > df$V2 <- 10^(as.numeric(gsub("10\\^\\(|\\)", "", df$V2)))
    > df
      V1    V2
    1 s1 1e-04
    2 s2 1e-04
    3 s3 1e-04
    4 s4 1e-04

What happens in detail: gsub("10\\^\\(|\\)", "", df$V2) removes every "10^(" and ")" from the strings (the caret and the parentheses are regular-expression metacharacters, so they must be escaped), leaving "-4"; as.numeric() converts the string "-4" to the number -4; and 10^ is then applied to each element of that numeric vector, because ^ is vectorized.
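The gsub approach assumes every entry has the 10^(k) form. Since the question says only some files use that notation, a column could plausibly mix styles. A minimal sketch of a helper that converts only entries matching 10^(k), with k an integer, and coerces the rest normally (the name pow10_to_numeric is hypothetical, not part of any package):

```r
# Hypothetical helper: convert a character vector in which some entries are
# written as "10^(k)" and others are ordinary numbers ("0.5", "1e-04").
pow10_to_numeric <- function(x) {
  x <- trimws(as.character(x))
  # Entries of the exact form 10^(k), with k an optionally signed integer
  is_pow <- grepl("^10\\^\\(-?[0-9]+\\)$", x)
  out <- rep(NA_real_, length(x))
  # Strip "10^(" and ")" to keep the exponent, then raise 10 to it
  out[is_pow] <- 10^as.numeric(gsub("^10\\^\\(|\\)$", "", x[is_pow]))
  # Everything else goes through the normal numeric coercion
  out[!is_pow] <- as.numeric(x[!is_pow])
  out
}

df <- data.frame(V1 = c("s1", "s2", "s3"), V2 = c("10^(-4)", "0.5", "1e-03"),
                 stringsAsFactors = FALSE)
df$V2 <- pow10_to_numeric(df$V2)
str(df)
```

Entries in any other form (for example 2.5*10^(-3)) fall through to as.numeric() and become NA with a warning, which at least makes unexpected formats visible.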


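If you read in your data.frame with stringsAsFactors=FALSE, the column in question should be a character vector, in which case you can evaluate each string as an R expression. Evaluate the strings one at a time: a single eval(parse(text = V2)) returns only the value of the last row, which transform() would then recycle down the whole column.

    # Evaluate each string separately so every row keeps its own value
    transform(df, V2 = vapply(V2, function(x) eval(parse(text = x)), numeric(1), USE.NAMES = FALSE))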
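One detail matters when using eval(parse()) on a whole column: parse(text = ...) yields one expression per string, but eval() applied to that expression vector returns only the value of the last one. The quick base-R check below (the sample strings are made up for illustration) contrasts evaluating the vector at once with evaluating each string separately:

```r
v <- c("10^(-4)", "10^(-2)", "1e-03")

# parse() yields one expression per string, but eval() on the whole
# expression vector evaluates them in turn and returns only the last value
last_only <- eval(parse(text = v))
print(last_only)

# Evaluating each string on its own gives one value per element
vals <- vapply(v, function(s) eval(parse(text = s)), numeric(1), USE.NAMES = FALSE)
print(vals)
```

Note that eval(parse()) runs whatever code the strings contain, so it is only appropriate for input you trust.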

You can use readLines to load the raw lines first, rewrite the 10^(k) values into R's 1ek notation with gsub, and then pass the result to read.table through textConnection:

    > tt <- readLines("~/tmp.txt")
    > tt <- gsub("10\\^\\((.*)\\)$", "1e\\1", tt)
    > read.table(textConnection(tt), sep="\t", header=TRUE, stringsAsFactors=FALSE)
      V1    V2
    1 s1 1e-04
    2 s2 1e-04
    3 s3 1e-04
    4 s4 1e-04
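Because ~/tmp.txt is not shown, the sketch below substitutes an in-memory character vector for readLines(). Its layout (tab-separated, with a V1/V2 header row) and its values are assumptions. The lines go to read.table() through its text argument, which reads them via a text connection much like textConnection(tt). Once the values are rewritten, the colClasses from the question can be applied directly:

```r
# Stand-in for readLines("~/tmp.txt"): tab-separated lines with a header row
tt <- c("V1\tV2", "s1\t10^(-4)", "s2\t10^(-2)", "s3\t0.5")

# Rewrite a trailing 10^(k) as R's 1ek notation, e.g. "10^(-4)" -> "1e-4"
tt <- gsub("10\\^\\((.*)\\)$", "1e\\1", tt)

# The second column now parses, so the question's colClasses can be used
df <- read.table(text = tt, sep = "\t", header = TRUE,
                 colClasses = c("character", "numeric"),
                 stringsAsFactors = FALSE)
str(df)
```

Rows that were already ordinary numbers (here 0.5) do not match the pattern and pass through unchanged.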

Source: https://habr.com/ru/post/1487867/

