I'm new to R. I have two large lists (30K items each). One is called descriptions, where each element is (possibly) a tokenized string. The other is called probes, where each element is a number. I need to build a dictionary that maps each probe to something in its description, if that something is there. Here is how I do it:
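probe2gene <- list()
for (i in 1:length(probes)) {
  # split the i-th description on the '//' delimiter
  strings <- strsplit(descriptions[i], '//')
  # keep the second field only when the description has one
  if (length(strings[[1]]) > 1) {
    probe2gene[probes[i]] = strings[[1]][2]
  }
}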
This works fine but seems slow, much slower than the roughly equivalent Python:
probe2gene = {}
for p, d in zip(probes, descriptions):
    try:
        probe2gene[p] = d.split('//')[1]
    except IndexError:
        pass
My question is: is there an “R-thonic” way of doing what I'm trying to do? The R manual's entry on for loops suggests that such loops are rare. Is there a better solution?
Edit: A typical good “description” looks like this:
"NM_009826 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421 /// AB070619 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421 /// ENSMUST00000027040 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421"
A bad one looks like this:
"-----"
The probe and the description are associated by index, i.e. probe[i] goes with description[i].
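To make the desired result concrete, here is a minimal sketch of the mapping on the two example descriptions above. The probe numbers 1001 and 1002 are made up for illustration; the real inputs have about 30K entries each.

```python
# Hypothetical sample data: the probe numbers are invented for illustration,
# the descriptions are the "good" and "bad" examples shown above.
probes = [1001, 1002]
descriptions = [
    "NM_009826 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421 /// "
    "AB070619 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421 /// "
    "ENSMUST00000027040 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421",
    "-----",
]

probe2gene = {}
for p, d in zip(probes, descriptions):
    fields = d.split("//")
    # Only descriptions with at least two '//'-separated fields get an entry.
    if len(fields) > 1:
        probe2gene[p] = fields[1]

print(probe2gene)
```

Only the first probe ends up in the dictionary, and the stored value keeps the spaces that surrounded it in the description (it is not trimmed).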