Replacing ranges with their calculated means

I am collecting data; in particular, I opened this PDF, http://pubs.acs.org/doi/suppl/10.1021/ja105035r/suppl_file/ja105035r_si_001.pdf, and scraped the data from Table S4,

 1a 1b 1a 1b
 1 5.27 4.76 5.09 4.75
 2 2.47 2.74 2.77 2.80
 4 1.14 1.38 1.12 1.02
 6 7.43 7.35 7.22-7.35a 7.25-7.36a
 7 7.38 7.34 7.22-7.35a 7.25-7.36a
 8 7.23 7.20 7.22-7.35a 7.25-7.36a
 9(R) 4.16 3.89 4.12b 4.18b
 9(S) 4.16 3.92 4.12b 4.18b
 10 1.19 0.91 1.21 1.25

pasted it into notepad and saved it as a txt file.

 s4 <- read.table("s4.txt", header=TRUE, stringsAsFactors=FALSE) 

gives

    X1a  X1b      X1a.1      X1b.1
 1 5.27 4.76       5.09       4.75
 2 2.47 2.74       2.77       2.80
 4 1.14 1.38       1.12       1.02
 6 7.43 7.35 7.22-7.35a 7.25-7.36a
 7 7.38 7.34 7.22-7.35a 7.25-7.36a
 8 7.23 7.20 7.22-7.35a 7.25-7.36a
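For anyone who wants to reproduce this without the text file, the same data frame can probably be built inline. This is only a sketch: it assumes the file held just the six rows printed above, and the `text =` argument of read.table stands in for s4.txt.

```r
# Assumption: s4.txt contained only the six rows shown in the printed output.
txt <- "1a 1b 1a 1b
1 5.27 4.76 5.09 4.75
2 2.47 2.74 2.77 2.80
4 1.14 1.38 1.12 1.02
6 7.43 7.35 7.22-7.35a 7.25-7.36a
7 7.38 7.34 7.22-7.35a 7.25-7.36a
8 7.23 7.20 7.22-7.35a 7.25-7.36a"

# The header has one field fewer than the data rows, so read.table
# uses the first column as row names.
s4 <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# The first two columns come back numeric; the last two stay character
# because of the ranges and the footnote letters.
str(s4)
```

The X prefix and the .1 suffix appear because read.table makes the column names syntactically valid and unique by default.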

To use the data I need to convert everything to numeric and delete the letters. Thanks to this link, "R regex gsub separate letters and numbers", with the following code,

 gsub("([[:alpha:]])","",s4[,3]) 

I can get rid of extraneous letters.

What I want to do now, and the point of the question, is to replace the ranges,

 "7.22-7.35" "7.22-7.35" "7.22-7.35" 

with their means,

 "7.29" 

Can I use gsub for this? (Or do I need to strsplit on the hyphen, combine the pieces into a vector, and return the mean?)

4 answers

For this task a single regular expression in strsplit is enough; it both drops the letters and splits on the hyphen:

 s4[] <- lapply(s4, function(x) {
   if (is.numeric(x)) {
     x
   } else {
     sapply(strsplit(as.character(x), "-|[[:alpha:]]"),
            function(y) mean(as.numeric(y)))
   }
 })

Result:

 > s4
    X1a  X1b X1a.1 X1b.1
 1 5.27 4.76 5.090 4.750
 2 2.47 2.74 2.770 2.800
 4 1.14 1.38 1.120 1.020
 6 7.43 7.35 7.285 7.305
 7 7.38 7.34 7.285 7.305
 8 7.23 7.20 7.285 7.305
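To see why one pattern covers both jobs, it may help to look at what strsplit returns for a few individual values. The vector x below is made up from entries in the question's table and is only an illustration.

```r
# Illustrative values taken from the question's table.
x <- c("5.09", "7.22-7.35a", "4.12b")

# Split on a hyphen or on any letter. A match at the very end of a string
# (the footnote letter) leaves no trailing empty piece.
parts <- strsplit(x, "-|[[:alpha:]]")

# Average the numeric pieces of each element.
means <- sapply(parts, function(p) mean(as.numeric(p)))
print(means)
```

Note that a letter at the start of a value would leave an empty first piece, which as.numeric turns into NA, so this relies on the letters being trailing footnote marks as in the table.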

Here's an approach that seems to work correctly with sample data:

 df[] <- lapply(df, function(col) {
   col <- gsub("([[:alpha:]])", "", col)
   col <- ifelse(grepl("-", col),
                 mean(as.numeric(unlist(strsplit(col[grepl("-", col)], "-")))),
                 col)
   as.numeric(col)
 })
 > df
 #   X1a  X1b X1a.1 X1b.1
 #1 5.27 4.76 5.090 4.750
 #2 2.47 2.74 2.770 2.800
 #4 1.14 1.38 1.120 1.020
 #6 7.43 7.35 7.285 7.305
 #7 7.38 7.34 7.285 7.305
 #8 7.23 7.20 7.285 7.305

Disclaimer: this only works if all the ranges within a column are the same (as in the example data), because the mean is taken over every range in the column at once.
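If the ranges within a column can differ, one possible way around this limitation is to average each element on its own instead of pooling every range in the column. A minimal sketch, where range_mean is a hypothetical helper name and df is a made-up three-row example rather than the question's data:

```r
# Hypothetical helper: strip letters, then average the pieces of each element.
range_mean <- function(col) {
  col <- gsub("[[:alpha:]]", "", as.character(col))
  vapply(strsplit(col, "-", fixed = TRUE),
         function(p) mean(as.numeric(p)),
         numeric(1))
}

# Made-up data with two different ranges in the same column.
df <- data.frame(a = c("7.22-7.35a", "7.25-7.36a", "4.12b"),
                 b = c(1.5, 2.5, 3.5),
                 stringsAsFactors = FALSE)
df[] <- lapply(df, range_mean)
print(df)
```

This sketch treats every hyphen as a range separator, so it would not cope with negative values.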


Something like this:

 mean(as.numeric(unlist(strsplit("7.22-7.35","-")))) 

should work (and matches what you had in mind, I think).

Or you can do:

 eval(parse(text=paste0("mean(c(",gsub("-",",","7.22-7.35"),"))"))) 

but I'm not sure this is any easier.

To apply it to a vector:

 vec <- c("7.22-7.35", "7.22-7.35")
 # 1st solution:
 sapply(vec, function(x) mean(as.numeric(unlist(strsplit(x, "-")))))
 # 2nd solution:
 sapply(vec, function(x) eval(parse(text = paste0("mean(c(", gsub("-", ",", x), "))"))))

In both cases, you will receive:

 7.22-7.35 7.22-7.35
     7.285     7.285

Another option, using the gsubfn package:

 library(gsubfn)
 indx <- !sapply(s4, is.numeric)
 s4[indx] <- lapply(s4[indx], function(x)
   sapply(strapply(x, '([0-9.]+)', ~as.numeric(x)), mean))
 s4
 #   X1a  X1b X1a.1 X1b.1
 #1 5.27 4.76 5.090 4.750
 #2 2.47 2.74 2.770 2.800
 #4 1.14 1.38 1.120 1.020
 #6 7.43 7.35 7.285 7.305
 #7 7.38 7.34 7.285 7.305
 #8 7.23 7.20 7.285 7.305

Source: https://habr.com/ru/post/979620/

