Vector subset: name and index

Question

Vector subset: name and index

If I have a vector v with names:

 John Murray Lisa Mike Joe Ann 0.0832090 0.0475580 -0.2797860 0.1086225 0.0104590 -0.0028250

What is the time complexity of v['Joe'] compared to v[4] ? I assume the former will accept O (log n), as it should include a binary search, but I'm still not sure if the latter is O (1) or not.

In addition, is the result generalized to the case when v is a list / data frame and not an atomic vector?

+6

r

chanp Dec 02 '13 at 9:58

source share

1 answer

Simon O'Hanlon · Accepted Answer · 2013-12-02T10:16:40+0000

It seems that approximately O (n), i.e. a vector scan, in the case of a name search. Your O (1) hypothesis for searching using indexes seems sound ...

 # Unique names for longish vector nms <- apply( expand.grid( letters , letters , letters , letters ) , 1 , paste , collapse = "" ) length(nms) #[1] 456976 length(unique(nms)) #[1] 456976 # Start of names head(nms) #[1] "aaaa" "baaa" "caaa" "daaa" "eaaa" "faaa" # End of names tail(nms) #[1] "uzzz" "vzzz" "wzzz" "xzzz" "yzzz" "zzzz" # Large named vector x <- setNames( runif( 456976 ) , nms ) # Small named vector y <- setNames( runif(26) , letters ) # Timing information require( microbenchmark ) bm <- microbenchmark( x['daaa'] , x[4] , x['vzzz'] , x[456972] , y['d'] , y[4] ) print( bm , order = 'median' , unit = 'relative' , digits = 3 ) #Unit: relative # expr min lq median uq max neval # x[456972] NaN 1.00e+00 1.00 1.00 1.000 100 # x[4] Inf 1.00e+00 1.33 1.07 0.957 100 # y[4] NaN 5.01e-01 1.33 1.14 0.191 100 # y["d"] Inf 1.00e+00 2.00 1.25 0.265 100 # x["vzzz"] Inf 6.57e+04 44412.24 9969.64 3439.154 100 # x["daaa"] Inf 6.59e+04 44582.73 10049.63 1207.337 100

Vector subset: name and index

More articles: