Minimal identifier with non-repeating elements

I am stuck in a complicated problem in R and cannot solve it. The problem is this.

x and y are two vectors as follows:

x<- c(1,2,3,4,5)
y<- c(12,4,2,5,7,18,9,10)

I want to create a new vector p, where length (p) = length (x), as follows:

  • For each id in x, find the id in y that has the minimum absolute distance in terms of values. For example, for id = 1 in x, value_x (id = 1) = 1, min_value_y = 2, and id_y (value == 2) = 3. Thus, the answer to id 1 in x is 3. Thus, we create a new vector p, which will have the following meanings: p = (3,3,3,2,4);

Now we must update p as follows:

  • Since 3 was an id corresponding to id_x = 1, it cannot be id_x = 2. Therefore, we must discard id_y = 3 with a value of 2 in order to calculate the next minimum distance for id_x = 2. The next best minimum distance for id_x = 2 is id_y = 2 with a value of 4. Therefore, the updated p is (3,2,3,2,4).
  • Since 3 was an id corresponding to id_x = 1, it cannot be id for id_x = 3. Therefore, we must discard id_y = 3 with a value of 2 in order to calculate the next minimum distance for id_x = 3. The next best minimum distance for id_x = 3 is equal to 2. Therefore, the updated p is equal to (3,2,4,2,4).

p 2 4, , . , x y id x id of y, . , p .

.

- , :

minID <- function(x,y) {return(which(abs(x-y)==min(abs(x-y))))};
p1 <- sapply(x,minID,y=y);
#Calculates the list of all minimum elements -no where close to actual solution :(

x y 1 , . .

+4
3

y, p. set stl ++, Rcpp, R:

library(Rcpp)
getVals = cppFunction(
'NumericVector getVals(NumericVector x, NumericVector y) {
    NumericVector p(x.size());
    std::vector<std::pair<double, int> > init;
    for (int j=0; j < y.size(); ++j) {
        init.push_back(std::pair<double, int>(y[j], j));
    }
    std::set<std::pair<double, int> > s(init.begin(), init.end());
    for (int i=0; i < x.size(); ++i) {
        std::set<std::pair<double, int> >::iterator p1, p2, selected;
        p1 = s.lower_bound(std::pair<double, int>(x[i], 0));
        p2 = p1;
        --p2;
        if (p1 == s.end()) {
            selected = p2;
        } else if (p2 == s.begin()) {
            selected = p1;
        } else if (fabs(x[i] - p1->first) < fabs(x[i] - p2->first)) {
            selected = p1;
        } else {
            selected = p2;
        }
        p[i] = selected->second+1;  // 1-indexed
        s.erase(selected);
    }
    return p;
}')

pure-R, , - 1 :

# Pure-R posted solution
getVals2 = function(x, y) {
  n <- length(x)
  p <- rep(NA, n)
  for(i in 1:n) {
    id <- which.min(abs(y - x[i]))
    y[id] <- Inf
    p[i] <- id
  }
  return(p)
}

# Test with medium-sized vectors
set.seed(144)
x = rnorm(10000)
y = rnorm(20000)
system.time(res1 <- getVals(x, y))
#    user  system elapsed 
#   0.008   0.000   0.008 
system.time(res2 <- getVals2(x, y))
#    user  system elapsed 
#   1.284   2.919   4.211 
all.equal(res1, res2)
# [1] TRUE

# Test with large vectors
set.seed(144)
x = rnorm(1000000)
y = rnorm(2000000)
system.time(res3 <- getVals(x, y))
#    user  system elapsed 
#   4.402   0.097   4.467 

, - x n, y m, O((n+m)log(m)) time - O(m log(m)) BST O(n log(m)) p - which.min O(nm) .

+9
n <- length(x)
p <- rep(NA, n)
for(i in 1:n) {
  id <- which.min(abs(y - x[i]))
  y[id] <- Inf
  p[i] <- id
}
+3

I tried to develop the code in R and got an improvement of 20 times per loop. A piece of code is as follows:

Generalized.getMinId <- function(a,b)
{
     sapply(a, FUN = function(x) which.min(abs(x-b)))
}

Generalized.getAbsDiff <- function(a,b)
{
     lapply(a, FUN = function(x) abs(x-b))
}

min_id = Generalized.getMinId(tlist,clist);
dup = which(duplicated(min_id));

while(length(dup) > 0)
{

absdiff = Generalized.getAbsDiff(tlist[dup],clist);
infind = lapply(dup, function(x,y)
                    {l <- head(y,x-1); l[l>0]}, y = min_id);

absdiff = Map(`[<-`, absdiff, infind, Inf);
dupind = sapply(absdiff, which.min);
min_id[dup] = dupind;

dup = which(duplicated(min_id));
}

In case someone can improve this piece of code, it will be awesome.

0
source

Source: https://habr.com/ru/post/1530590/


All Articles