How can you efficiently check the values of large vectors in R?

One thing I want to do all the time in my R code is check whether certain conditions hold for a vector, for example whether any or all of its values equal some given value. The R-ish way to do this is to create a logical vector and use any or all, for example:

any(is.na(my_big_vector))
all(my_big_vector == my_big_vector[[1]])
...

It seems inefficient to allocate a large vector and fill it with values just to throw it away (especially since the call to any() or all() could short-circuit after testing only a few values). Is there a better way to do this, or should I just give up my desire to write code that is both efficient and concise when working in R?

3 answers

“Cheap, fast, reliable: choose any two” is a dry way of saying that you sometimes need to rank your priorities when building or designing systems.

It is quite similar here: the cost of a concise expression is that memory is allocated behind the scenes. If this is really a problem, you can always write (compiled?) routines that run (quickly) along the vector and use only a couple of values at a time.

You can trade memory usage against performance against expressiveness, but it is hard to hit all three at once.
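As a rough illustration of "use only a couple of values at a time", here is a minimal pure-R sketch that tests a vector chunk by chunk and stops at the first hit. The helper names any_chunked and all_chunked and the chunk_size parameter are hypothetical, not part of base R. NA results from the predicate are simply ignored, which is an assumption of this sketch and differs from plain all().

```r
# Hypothetical helper: apply `pred` to successive chunks of `x` and stop
# as soon as one chunk contains a TRUE value.
any_chunked <- function(x, pred, chunk_size = 10000L) {
  n <- length(x)
  start <- 1L
  while (start <= n) {
    end <- min(start + chunk_size - 1L, n)
    # Only temporaries of at most chunk_size elements are allocated here.
    # NA results from pred are ignored (assumption of this sketch).
    if (any(pred(x[start:end]), na.rm = TRUE)) return(TRUE)
    start <- end + 1L
  }
  FALSE
}

# all(cond) is the same as !any(!cond), so one helper covers both cases.
all_chunked <- function(x, pred, chunk_size = 10000L) {
  !any_chunked(x, function(v) !pred(v), chunk_size)
}

my_big_vector <- c(NA, rep(1, 1e6))
any_chunked(my_big_vector, is.na)   # returns after the first chunk
all_chunked(my_big_vector, function(v) v == my_big_vector[[1]])
```

This trades brevity for bounded temporary memory. Each chunk is still materialized, and the R-level loop adds overhead, so it mainly pays off when a match is likely to appear early in a very large vector.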

which(is.na(my_big_vector))
which(my_big_vector == 5)
which(my_big_vector < 3)

And if you want to count them:

length(which(is.na(my_big_vector)))
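For comparison, the count can also be taken with sum() on the logical vector, which skips the intermediate index vector that which() builds. The small my_big_vector below is made-up illustrative data. Note that both variants still allocate the full logical vector first, so neither addresses the allocation concern raised in the question.

```r
my_big_vector <- c(4, NA, 5, 2, NA, 5)   # illustrative data only

which(is.na(my_big_vector))              # positions of the missing values
length(which(is.na(my_big_vector)))      # count via the index vector
sum(is.na(my_big_vector))                # same count, no index vector

# which() silently drops NA comparisons; sum() needs na.rm = TRUE for that.
length(which(my_big_vector == 5))
sum(my_big_vector == 5, na.rm = TRUE)
```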

In R, an expression such as

any(is.na(a))

could be recognized and executed as something like

.Internal(is_any_na,a)
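For this particular case, base R (since version 3.1.0) offers something close to that idea: anyNA(), which its documentation describes as implementing any(is.na(x)) in a possibly faster way. A minimal sketch, with a as made-up illustrative data:

```r
a <- c(1, 2, NA, 4)   # illustrative data only

anyNA(a)              # checks for NA without building is.na(a) first
any(is.na(a))         # same answer, via a temporary logical vector

# The two forms agree on a vector without missing values as well.
b <- c(1, 2, 3)
identical(anyNA(b), any(is.na(b)))
```

This covers only the missing-value check. Other conditions, such as comparing every element with the first one, still go through an ordinary logical vector.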

Source: https://habr.com/ru/post/1752699/

