R-tonic replacement for simple for loops containing a condition

Question

I use R and I am new to it. I have two large lists (30K elements each). One is called descriptions, where each element is (possibly) a tokenized string. The other is called probes, where each element is a number. I need to build a dictionary that maps each probe to something in descriptions, if that something is there. Here is how I do it:

probe2gene <- list()
for (i in 1:length(probes)){
 strings <- strsplit(descriptions[i], '//')
 if (length(strings[[1]]) > 1){
  probe2gene[probes[i]] = strings[[1]][2]
 }
}

Which works fine, but seems slow, much slower than roughly equivalent python:

probe2gene = {}
for p,d in zip(probes, descriptions):
    try:
        probe2gene[p] = d.split('//')[1]
    except IndexError:
        pass

My question is: is there an "R-tonic" way of doing what I'm trying to do? The manual entry on for loops suggests that such loops are rare. Is there a better solution?

Edit: A typical good “description” looks like this:

"NM_009826 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421 /// AB070619 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421 /// ENSMUST00000027040 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421"

A bad one looks like this:

"-----"

The probes and descriptions are related by index, i.e. probe[i] corresponds to description[i].

+3

for-loop r

Mike Dewar 10 . '10 19:50

4

a <- list("a","b","c")
b <- list(c("a","b"),c("DEF","ABC"),c("Z"))

names(b) <- a
matches <- which(lapply(b, length)>1) #several ways to do this
b <- lapply(b[matches], function(x) x[2]) #keeps the second element only

+2

Jay 10 . '10 20:21

probe<-c(4,3,1)
gene<-c('red//hair','strange','blue//blood')
probe2gene<-character()
probe2gene[probe]<-sapply(strsplit(gene,'//'),'[',2)
probe2gene
[1] "blood" NA      NA      "hair"

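sapply works here because the subsetting operator in R is itself a function named '[', so it can be applied to each split result with 2 as the index. When a description has no second piece, the result is NA rather than an error.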
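
As a minimal sketch of the two behaviors this answer relies on, the following uses made-up strings and illustrative variable names (second_a, missing_piece and so on are not from the original). It also swaps in vapply, a stricter relative of sapply, and drops the NA entries at the end; both are assumptions about what a reader might want rather than part of the answer.

```r
# Hypothetical example data, mirroring the shape used in the answer.
probe <- c(4, 3, 1)
gene <- c("red//hair", "strange", "blue//blood")
parts <- strsplit(gene, "//", fixed = TRUE)

# `[` is an ordinary function, so these two calls are equivalent.
second_a <- parts[[1]][2]
second_b <- `[`(parts[[1]], 2)

# Indexing past the end of a vector yields NA instead of an error.
missing_piece <- parts[[2]][2]

# vapply also checks that every result is a single string.
second <- vapply(parts, `[`, character(1), 2)

# Keep only the probes whose description had a second piece.
keep <- !is.na(second)
result <- setNames(second[keep], probe[keep])
```

Naming the result by probe, instead of indexing by probe position, avoids the NA padding seen in the output above when probe numbers are sparse.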

0

Jyotirmoy Bhattacharya 11 . '10 5:59

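Another approach: split every description with a single vectorized strsplit call, then regroup the pieces by probe with split. Note that placeholder descriptions such as "----" are kept rather than dropped; they could be filtered out afterwards, for example with lapply.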

make_desc <- function(n)
{
    word <- function(x) paste(sample(letters, 5, replace=TRUE), collapse = "")
    if (runif(1) < 0.70)
        paste(sapply(seq_len(n), word), collapse = "//")
    else
        "----"
}

description <- sapply(seq_len(10), make_desc)
probes <- seq_len(length(description))

desc_parts <- strsplit(description, "//", fixed=TRUE, useBytes=TRUE)
lens <- sapply(desc_parts, length)
probes_expand <- rep(probes, lens)
ans <- split(unlist(desc_parts), probes_expand)


> description
 [1] "fmbec"                                                               
 [2] "----"                                                                
 [3] "----"                                                                
 [4] "frrii//yjxsa//wvkce//xbpkc"                                          
 [5] "kazzp//ifrlz//ztnkh//dtwow//aqvcm"                                   
 [6] "stupm//ncqhx//zaakn//kjymf//swvsr//zsexu"                            
 [7] "wajit//sajgr//cttzf//uagwy//qtuyh//iyiue//xelrq"                     
 [8] "nirex//awvnw//bvexw//mmzdp//lvetr//xvahy//qhgym//ggdax"              
 [9] "----"                                                                
[10] "ubabx//tvqrd//vcxsp//rjshu//gbmvj//fbkea//smrgm//qfmpy//tpudu//qpjbu"


> ans[[3]]
[1] "----"
> ans[[4]]
[1] "frrii" "yjxsa" "wvkce" "xbpkc"
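
One possible post-processing step, sketched under the assumption that only the second piece of each description is wanted and that descriptions without one should be dropped. The fixed input and the names ans_multi and second are illustrative stand-ins, since the descriptions above are randomly generated.

```r
# Fixed stand-in data so the result is reproducible.
description <- c("aa//bb//cc", "----", "dd//ee")
probes <- seq_along(description)

# Same split-and-regroup steps as in the answer above.
desc_parts <- strsplit(description, "//", fixed = TRUE)
lens <- lengths(desc_parts)
ans <- split(unlist(desc_parts), rep(probes, lens))

# Drop probes whose description did not split into at least two pieces.
ans_multi <- ans[lengths(ans) > 1]

# Second piece for each remaining probe, as a character vector named by probe.
second <- vapply(ans_multi, `[`, character(1), 2)
```

Because split names its groups by probe, the names carry through to second, so second[["3"]] looks up the probe labelled 3.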

0

seth 11 . '10 21:51

Johann Hibschman · Accepted Answer · 2010-02-10T20:08:23+0000

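In R it is generally better to use the apply family of functions rather than an explicit loop. The following should solve the problem; the only drawback is that the keys have to be strings.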

> descriptions <- c("foo//bar", "")
> probes <- c(10, 20)
> probe2gene <- lapply(strsplit(descriptions, "//"), function (x) x[2])
> names(probe2gene) <- probes
> probe2gene <- probe2gene[!is.na(probe2gene)]
> probe2gene[["10"]]
[1] "bar"

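Unfortunately, R does not have a good dictionary/map type. The closest equivalent is a named list used as a map from string keys to values.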
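
The answer above looks values up with a string key (probe2gene[["10"]]). A minimal sketch of that string-keyed pattern follows, using hypothetical probe IDs and descriptions. The environment variant at the end is not part of the answer; it is included only as one commonly used alternative for hashed lookups.

```r
# Hypothetical probe IDs and descriptions, for illustration only.
probes <- c(10, 20, 30)
descriptions <- c("foo//bar", "-----", "baz//qux//quux")

# Second piece of each description, NA where there is none.
second <- vapply(strsplit(descriptions, "//", fixed = TRUE),
                 `[`, character(1), 2)
keep <- !is.na(second)

# Named list as a map: numeric probe IDs are coerced to string keys.
probe2gene <- as.list(setNames(second[keep], probes[keep]))
hit <- probe2gene[["10"]]    # lookup by string key
miss <- probe2gene[["20"]]   # an absent key gives NULL, not an error

# Alternative (an assumption, not from the answer): a hashed environment.
probe_env <- list2env(probe2gene, envir = new.env(hash = TRUE))
hit_env <- get("30", envir = probe_env)
has_20 <- exists("20", envir = probe_env, inherits = FALSE)
```

With a named list, a missing key can be detected with is.null; with an environment, exists (with inherits = FALSE) serves the same purpose.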
