How to avoid list looping after reading from JSON with R

Question

I have a vector of JSON data in R, and I parse each element with lapply:

list <- lapply(temp, fromJSON)

The structure of the first element of this list is as follows:

 str(list[[1]])
 List of 4
  $ boundedBy:List of 2
   ..$ type       : chr "Polygon"
   ..$ coordinates:List of 1
   .. ..$ :List of 5
   .. .. ..$ : num [1:2] 89328 208707
   .. .. ..$ : num [1:2] 89333 208707
   .. .. ..$ : num [1:2] 89333 208713
   .. .. ..$ : num [1:2] 89328 208713
   .. .. ..$ : num [1:2] 89328 208707
  $ hnrlbl   : NULL
  $ opndatum : chr "2011-05-30"
  $ oidn     : chr "2954841"

This works for the first element: list[[1]]$hnrlbl. But how can I do it for the whole list at once? Something like list[[.]]$hnrlbl.

json r

Kasper Van Lombeek Aug 26 '14 at 10:22

4 answers

jdharrison · Answer 1 · 2014-08-26T11:09:04+0000

In this case, you can simply use list.map from the rlist package:

 mylist <- lapply(temp, fromJSON)
 library(rlist)
 list.map(mylist, hnrlbl)

http://cran.r-project.org/web/packages/rlist/vignettes/Mapping.html

hadley · Answer 2 · 2014-08-26T12:26:53+0000

I have a helper function that is useful for these scenarios:

 pluck <- function(x, name, type) {
   if (missing(type)) {
     lapply(x, .subset2, name)
   } else {
     vapply(x, .subset2, name, FUN.VALUE = type)
   }
 }

(This was inspired by underscore and Winston Chang. .subset2() is an internal version of [[: it is faster, but it does not do S3 dispatch, which means x should be a plain list.)

Using this function, solving your problem is very simple:

 x <- list(
   a = list(x = rnorm(10), y = letters[1:10], z = "OK"),
   b = list(x = rnorm(10), y = letters[11:20], z = "notOK")
 )

 # List of results
 str(pluck(x, "z"))
 #> List of 2
 #>  $ a: chr "OK"
 #>  $ b: chr "notOK"

 # Vector of results
 str(pluck(x, "z", character(1)))
 #>  Named chr [1:2] "OK" "notOK"
 #>  - attr(*, "names")= chr [1:2] "a" "b"

(You can also select by position: pluck(x, 2, character(10)) )
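The type argument also acts as a check: vapply() raises an error when a plucked value does not match the template. A minimal sketch of that behaviour, assuming the pluck() helper exactly as defined above (the "type mismatch" marker string is only an illustration):

```r
# pluck() as defined above, repeated so the block runs on its own
pluck <- function(x, name, type) {
  if (missing(type)) {
    lapply(x, .subset2, name)
  } else {
    vapply(x, .subset2, name, FUN.VALUE = type)
  }
}

x <- list(
  a = list(y = letters[1:10], z = "OK"),
  b = list(y = letters[11:20], z = "notOK")
)

# Matching template: every z is a character vector of length 1
z_ok <- pluck(x, "z", character(1))

# Wrong template: vapply() stops with an error, which is caught here
# and replaced by an illustrative marker string
z_bad <- tryCatch(
  pluck(x, "z", numeric(1)),
  error = function(e) "type mismatch"
)
```

Without the type argument, pluck() falls back to lapply() and returns a list, so no such check happens.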

Benchmarking

This method is also pretty quick:

 x_big <- rep(x, 1000)

 myselect <- function(x, name) {
   tmp <- unlist(x, recursive = FALSE)
   id <- grep(paste0("\\.", name, "$"), names(tmp))
   tmp[id]
 }

 library(microbenchmark)
 options(digits = 2)
 microbenchmark(
   sapply(x_big, function(i) i$z),
   myselect(x_big, "z"),
   pluck(x_big, "z", character(1))
 )
 #> Unit: microseconds
 #>                             expr  min   lq median   uq  max neval
 #>   sapply(x_big, function(i) i$z) 2771 2886   2972 3124 5903   100
 #>             myselect(x_big, "z") 2250 2330   2366 2401 3551   100
 #>  pluck(x_big, "z", character(1))  717  786    825  889 1731   100

Kasper Van Lombeek · Answer 3 · 2014-08-26T10:42:24+0000

After a couple of hours spent looking for the cleanest method, we ended up with:

  kadaster_building_temp$hnrlbl <- sapply(list,function(x){x$hnrlbl} )
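One caveat, given that hnrlbl is NULL in the str() output from the question: when some elements are NULL, sapply() cannot simplify to a vector and returns a list instead. A minimal sketch of one way to handle that, using made-up records (the value "12A" and the second oidn are hypothetical):

```r
# Hypothetical records shaped like the question's data; the first has no hnrlbl
records <- list(
  list(hnrlbl = NULL, oidn = "2954841"),
  list(hnrlbl = "12A", oidn = "2954842")
)

# Because of the NULL, sapply() returns a list here, not a character vector
raw <- sapply(records, function(x) x$hnrlbl)
was_list <- is.list(raw)

# Replace NULL entries with NA so every entry has length 1, then flatten
is_missing <- vapply(raw, is.null, logical(1))
raw[is_missing] <- list(NA_character_)
hnrlbl <- unlist(raw)
```

The result has one entry per record, so it can be assigned to a data frame column of the same length.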

Joris Meys · Answer 4 · 2014-08-26T11:57:58+0000

Warning: Because it relies on regular expressions, this solution may fail under certain conditions (depending on the names you use in your lists). If speed is not an issue, either list.map or a solution using sapply is more reliable.

You can gain quite a bit of speed by using unlist() here and then searching the names. Take the following function myselect:

 myselect <- function(x, name) {
   tmp <- unlist(x, recursive = FALSE)
   id <- grep(paste0("(^|\\.)", name, "$"), names(tmp))
   tmp[id]
 }

This does essentially the same thing, but in a vectorized way. With the recursive=FALSE argument, unlist() flattens the nested list by one level into a flat list (all elements become part of the same list). You then rely on the naming convention unlist() uses to find all elements whose name is exactly the one you want to select. Hence the call to paste0, which builds a regular expression that avoids matching partial names. A simple subset then gives you back a list with the desired elements. If you want a vector instead, you can simply call unlist() on the result.

Please note that I assume you have a list of lists, so you only want to flatten one level. For more complex nesting, this obviously will not work in its current form.
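To make the naming convention and the warning at the top of this answer concrete, here is a small sketch. The inner name foo.z is a hypothetical example chosen only to trigger the problem:

```r
# myselect() as defined above, repeated so the block runs on its own
myselect <- function(x, name) {
  tmp <- unlist(x, recursive = FALSE)
  id <- grep(paste0("(^|\\.)", name, "$"), names(tmp))
  tmp[id]
}

aList <- list(
  a = list(x = 1:2, z = "OK"),
  b = list(x = 3:4, z = "notOK")
)

# Flattening one level joins the outer and inner names with a dot
flat_names <- names(unlist(aList, recursive = FALSE))
# flat_names is c("a.x", "a.z", "b.x", "b.z")

# Select every "z" and collapse the resulting list to a vector
selected <- unlist(myselect(aList, "z"))

# Hypothetical pitfall: an inner name ending in ".z" matches as well
tricky <- list(a = list(z = "OK", foo.z = "other"))
n_matches <- length(myselect(tricky, "z"))
# n_matches is 2, although only one element is actually named "z"
```

Names containing dots or regular-expression metacharacters are therefore the cases where sapply or list.map is the safer choice.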

Example and benchmarking

The gain probably depends on the structure of the list, but can reach a 50x or greater increase in speed.

Take the following (very simple) example:

 aList <- list(
   a = list(x = rnorm(10), y = letters[1:10], z = "OK"),
   b = list(x = rnorm(10), y = letters[11:20], z = "notOK")
 )

Benchmarking gives:

 require(rbenchmark)
 benchmark(
   sapply(aList, function(i) i$z),
   myselect(aList, "z"),
   columns = c("test", "elapsed", "relative"),
   replications = 10000
 )
                             test elapsed relative
 2           myselect(aList, "z")    0.24    1.000
 1 sapply(aList, function(i) i$z)    0.39    1.625

With larger objects, improvement can be significant. Using this on the list that I happened to have in my workspace (dput is not an option here ...):

 > benchmark(
 +   sapply(StatN0_1, function(i) i$SP),
 +   myselect(StatN0_1, "SP"),
 +   columns = c("test", "elapsed", "relative"),
 +   replications = 100
 + )
                                 test elapsed relative
 2           myselect(StatN0_1, "SP")    0.02      1.0
 1 sapply(StatN0_1, function(i) i$SP)    1.13     56.5
