Efficient algorithm for finding the number of elements less than a query

Question


I have two unsorted arrays a and b. For each element a[i] I need to find the number of elements b[j] such that b[j] < a[i]. In addition, b may contain duplicates, which should not be counted (each distinct value counts only once). Both arrays can be very large.

My attempt (for a single query value x):

    import java.util.Arrays;

    public class CountSmaller {
        public static void main(String arg[]) {
            int x = 5;
            int b[] = {2, 4, 3, 4, 11, 13, 17};
            Arrays.sort(b);
            int count = 0;
            for (int i = 0; i < b.length; ++i) {
                if (b[i] < x) {
                    if (i == 0) {
                        ++count;
                    } else if (b[i - 1] != b[i]) { // this check avoids counting duplicates
                        ++count;
                    }
                } else {
                    break;
                }
            }
            System.out.println(count);
        }
    }

My problem is that this is too slow when I repeat it for every element of a. What can I do to speed this up?


java algorithm dynamic-programming

vks Nov 12 '16 at 10:30


4 answers

Ghostcat · Answer 1 · 2016-11-12T10:39:27+0000

EDIT: given the later comments, I have put the updates at the beginning and left my original text below.

So, the main points are:

You came here with some problem X, but later told us that you actually have to solve problem Y. That is something you should try to avoid: when you come here (or when you work on problems yourself!), you should be able to describe clearly the problem that you have or are going to solve. I am not pointing fingers here; I am just saying that you have to work hard to understand what your real problem is.
This also shows in the fact that you are asking us what to do with duplicate numbers in your data. Well, sir: that is your problem. We do not know why you want to count these numbers, we do not know where your data comes from, and we do not know how the final solution should deal with duplicate entries. In that sense, I am simply restating the first paragraph: you must clarify your requirements. We cannot help with that part at all. And note: you only mentioned duplicates in the second array. What about those in the first?!

Ok, so about your problem. The fact is that this is really just "work". There is no magic. Since you have two very large arrays, working with unsorted data is an absolute no-go.

So you start by sorting both arrays.

Then you iterate over the first array, and in doing so, you also look at the second array:

    // a and b must both be sorted before this loop
    int indexWithinB = 0;
    int counterForCurrentA = 0; // and actually ALL values from a before
    for (int i = 0; i < a.length; i++) {
        int currentA = a[i];
        while (indexWithinB < b.length && b[indexWithinB] < currentA) {
            // count the first element, then only values that differ from their predecessor
            if (indexWithinB == 0 || b[indexWithinB - 1] != b[indexWithinB]) { // avoid counting duplicates!
                counterForCurrentA++;
            }
            indexWithinB++;
        }
        // while loop ended, this means: b is exhausted, or b[indexWithinB] == or > currentA
        // this also means: counterForCurrentA holds the number of distinct b values below currentA
    }

The above is obviously a sketch. It is meant to keep you moving, and there may well be subtle mistakes in it. For example, as Andreas pointed out, it has to check b.length as well (the while condition above does that). Turning it into complete, tested code remains an exercise for the reader.
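The loop above assumes both arrays are sorted and produces the counts in sorted order of a. A self-contained sketch of the same merge idea follows. It assumes you want the counts in the original order of a (the question does not say so explicitly), so it sorts an array of indices instead of a itself. The class and method names (MergeCount, countDistinctBelow) are made up for illustration.

```java
import java.util.Arrays;

class MergeCount {
    // For each a[i], count the DISTINCT values of b that are strictly less than a[i].
    // Sorts a copy of b and an index permutation of a, then walks both in one merge-like pass.
    static int[] countDistinctBelow(int[] a, int[] b) {
        int[] sortedB = b.clone();
        Arrays.sort(sortedB);

        // Visit the elements of a in ascending order without losing their original positions.
        Integer[] order = new Integer[a.length];
        for (int i = 0; i < a.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (p, q) -> Integer.compare(a[p], a[q]));

        int[] result = new int[a.length];
        int j = 0;        // next unprocessed position in sortedB
        int distinct = 0; // number of distinct values among sortedB[0..j-1]
        for (int idx : order) {
            while (j < sortedB.length && sortedB[j] < a[idx]) {
                if (j == 0 || sortedB[j - 1] != sortedB[j]) {
                    distinct++; // a new value, not a duplicate of the previous one
                }
                j++;
            }
            result[idx] = distinct; // stored at the original position of this element
        }
        return result;
    }

    public static void main(String[] args) {
        int[] a = {5, 1, 4, 8, 17, 12, 22};
        int[] b = {2, 4, 3, 4, 11, 13, 17};
        System.out.println(Arrays.toString(countDistinctBelow(a, b))); // [3, 0, 2, 3, 5, 4, 6]
    }
}
```

Sorting dominates the cost, O(n log n + m log m) overall; the merge pass itself touches every element of both arrays only once, because j never moves backwards.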

This is what I meant by "just work": you simply have to sit down, write test cases and refine my draft algorithm until it does the job for you. If you find it hard to program at first, take a piece of paper, write down two arrays of numbers ... and do it by hand.

A final hint: I suggest writing a lot of unit tests for your algorithm (this kind of code is perfect for unit tests), and make sure those tests cover all your corner cases. You want to be 100% sure that your algorithm is correct before running it on arrays with 10^5 elements!

And now, as promised, the original answer:

Simply put: iterating and counting is the most efficient way to solve this problem. So, in the case above, skipping the sort can lead to a faster overall runtime.

The logic is very simple: to find out how many numbers are less than x, you have to look at all of them. So you have to iterate over the full array (as long as that array is not sorted).

Thus, given your original problem statement, there is nothing else to do but iterate and count.

Of course, if you need this count several times ... maybe you should sort the data first. Because then you can use binary search and get the count you are looking for without iterating over all the data again.
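To make the binary-search idea concrete, here is a minimal sketch that sorts b once, drops duplicate values (as the question requires), and then answers each query with java.util.Arrays.binarySearch. The class name SortedLookup is hypothetical.

```java
import java.util.Arrays;

class SortedLookup {
    private final int[] distinctSorted; // values of b, ascending, duplicates removed

    SortedLookup(int[] b) {
        // O(m log m), done once: sort b and keep each value only once.
        this.distinctSorted = Arrays.stream(b).sorted().distinct().toArray();
    }

    // O(log m) per query: number of distinct values of b strictly less than x.
    int countBelow(int x) {
        int pos = Arrays.binarySearch(distinctSorted, x);
        // Found: exactly pos values precede x.
        // Not found: binarySearch returns -(insertionPoint) - 1, and insertionPoint values are smaller.
        return pos >= 0 ? pos : -pos - 1;
    }

    public static void main(String[] args) {
        SortedLookup lookup = new SortedLookup(new int[] {2, 4, 3, 4, 11, 13, 17});
        for (int x : new int[] {5, 1, 4, 8, 17, 12, 22}) {
            System.out.println(x + " -> " + lookup.countBelow(x));
        }
    }
}
```

With n queries this costs O(m log m + n log m), and unlike the merge approach it does not require the queries to be sorted or known in advance.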

And: what makes you think that iterating through an array with 10^5 elements is a problem? In other words: are you just worried about a potential performance problem, or do you have a real one? You see, at some point you probably had to create and populate this array. That took more time (and resources) than a simple loop that counts entries. And honestly: unless we are talking about a small embedded device ... 10^5 elements is almost nothing, even on somewhat outdated hardware.

Finally: if you are worried about runtime, the simple answer is to split the input and use 2, 4, 8, ... threads to count each chunk in parallel! But, as said: before writing that code, I would do some profiling to make sure you really need to spend valuable development time on this. Do not solve hypothetical performance issues; focus on those that really matter to you or your users!
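A minimal sketch of the parallel idea, swapping hand-written threads for Java parallel streams (which split the array into chunks and process them on the common ForkJoinPool). It answers a single query x and ignores duplicate values. Whether it beats the plain sequential loop depends on the array size and the hardware, so profile first, as the answer says. The class name ParallelCount is hypothetical.

```java
import java.util.Arrays;

class ParallelCount {
    // Single query: number of distinct values of b strictly less than x,
    // letting the parallel stream split b into chunks processed concurrently.
    static long countDistinctBelow(int[] b, int x) {
        return Arrays.stream(b)
                .parallel()
                .filter(v -> v < x) // keep only values below the query
                .distinct()         // count each value once, as the question requires
                .count();
    }

    public static void main(String[] args) {
        int[] b = {2, 4, 3, 4, 11, 13, 17};
        System.out.println(countDistinctBelow(b, 5)); // 3 (the values 2, 3, 4)
    }
}
```

The distinct() step has to merge results across chunks, so most of the speed-up comes from the filtering; without the duplicate requirement, dropping distinct() makes the parallel version cheaper still.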

TDG · Answer 2 · 2016-11-12T10:34:58+0000

Comparing each element of the array with x takes O(n) time. Sorting the array takes O(n log n), and after that a binary search takes O(log n), so the total is O(n log n). So, for a single query x, the most efficient way is also the trivial one: just go through the array and compare each element with x.

    public static void main(String arg[]) {
        int b[] = {2, 4, 3, 4, 11, 13, 17};
        int x = 5;
        int count = 0;
        for (int i = 0; i < b.length; i++) {
            if (b[i] < x) {
                count++; // note: duplicate values are counted as well
            }
        }
        System.out.println(count);
    }
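The loop above counts duplicate values of b (for x = 5 it prints 4, because 4 occurs twice), while the question asks to ignore duplicates. A sketch of a single-pass variant that still needs no sorting, assuming the extra memory for a HashSet is acceptable; the class name DistinctLinearCount is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class DistinctLinearCount {
    // One pass over unsorted b, O(n) expected time: collect the values below x in a set,
    // so that repeated values are counted only once.
    static int countDistinctBelow(int[] b, int x) {
        Set<Integer> below = new HashSet<>();
        for (int v : b) {
            if (v < x) {
                below.add(v);
            }
        }
        return below.size();
    }

    public static void main(String[] args) {
        int[] b = {2, 4, 3, 4, 11, 13, 17};
        System.out.println(countDistinctBelow(b, 5)); // 3 (the values 2, 3, 4)
    }
}
```

This keeps the O(n) per-query argument of the answer; for many queries, sorting once and using binary search is still the better trade-off.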

user6904265 · Answer 3 · 2016-11-12T22:41:01+0000

I suggest the following approach, but it only works if array b contains non-negative numbers. The algorithm works even if the input arrays (both a and b) are not sorted.

Pseudo code

1. Get the max element of array b.
2. Create a new array c of size max + 1 and set c[b[i]] = 1 for every element b[i].
3. Create a new array d of size max + 1 and fill it as follows:
   d[0] = c[0];
   d[i] = d[i-1] + c[i];
4. Create a new array e of size n (the length of a) and fill it as follows:
   if (a[i] <= 0) then e[i] = 0
   else if (a[i] > max) then e[i] = last(d)
   otherwise e[i] = d[a[i]-1];

The array e is the solution: its i-th position holds the number of distinct elements of b that are below the i-th element of a. This example should be more explanatory than the pseudo code:

    a = [5, 1, 4, 8, 17, 12, 22]
    b = [2, 4, 3, 4, 11, 13, 17]
    c = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1]
    d = [0, 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 6]
    e = [3, 0, 2, 3, 5, 4, 6]

Complexity

Steps 1, 2 and 4 are O(n). Step 3 is O(max(b)).

If the input array b contains only "small" numbers (max(b) is of the same order of magnitude as n), then the algorithm runs in O(n). The algorithm can be optimized by creating an array of size max-min+1 and using a count of 0 for all elements of a that are not above min(b).
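A sketch of the max-min+1 optimization mentioned above. Shifting every index by min(b) has a useful side effect: negative values in b work too. It assumes b is non-empty and that max - min + 1 fits in an int and in memory; the class and method names are hypothetical.

```java
import java.util.Arrays;

class OffsetCountingTable {
    // Counting table shifted by min(b): slot k stands for the value min + k.
    // Assumes b is non-empty and the range max - min + 1 is small enough to allocate.
    static int[] countDistinctBelow(int[] a, int[] b) {
        int min = Arrays.stream(b).min().getAsInt();
        int max = Arrays.stream(b).max().getAsInt();
        int size = max - min + 1;

        boolean[] present = new boolean[size]; // present[k]: value min + k occurs in b
        for (int v : b) {
            present[v - min] = true;
        }
        // atOrBelow[k] = number of distinct values of b that are <= min + k
        int[] atOrBelow = new int[size];
        atOrBelow[0] = 1; // min itself always occurs in b
        for (int k = 1; k < size; k++) {
            atOrBelow[k] = atOrBelow[k - 1] + (present[k] ? 1 : 0);
        }

        int[] result = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            if (a[i] <= min) {
                result[i] = 0;                         // nothing in b is below a[i]
            } else if (a[i] > max) {
                result[i] = atOrBelow[size - 1];       // every distinct value is below a[i]
            } else {
                result[i] = atOrBelow[a[i] - min - 1]; // distinct values <= a[i] - 1
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(countDistinctBelow(
                new int[] {5, 1, 4, 8, 17, 12, 22}, new int[] {2, 4, 3, 4, 11, 13, 17})));
        System.out.println(Arrays.toString(countDistinctBelow(
                new int[] {0, -3, 10}, new int[] {-5, -3, -3, 2, 7})));
    }
}
```

The table now costs O(max(b) - min(b)) instead of O(max(b)), which helps when the values of b are clustered far away from zero.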

Simple Java implementation:

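    import java.util.Arrays;

    public class CountingTable {
        public static void main(String[] args) {
            int a[] = {5, 1, 4, 8, 17, 12, 22};
            int b[] = {2, 4, 3, 4, 11, 13, 17}; // values must be non-negative
            int max = Arrays.stream(b).max().getAsInt();
            int c[] = new int[max + 1];  // c[v] == 1 if v occurs in b (duplicates collapse to 1)
            int d[] = new int[max + 1];  // d[v] == number of distinct values of b that are <= v
            int e[] = new int[a.length]; // e[i] == number of distinct values of b that are < a[i]
            for (int i = 0; i < b.length; i++) {
                c[b[i]] = 1;
            }
            d[0] = c[0]; // needed when b contains 0
            for (int i = 1; i < c.length; i++) {
                d[i] = d[i - 1] + c[i];
            }
            for (int i = 0; i < a.length; i++) {
                if (a[i] <= 0) {
                    e[i] = 0; // no non-negative value is below a[i]; also avoids reading d[-1]
                } else {
                    e[i] = (a[i] > max) ? d[d.length - 1] : d[a[i] - 1];
                }
            }
            System.out.println(Arrays.toString(a));
            System.out.println(Arrays.toString(b));
            System.out.println(Arrays.toString(c));
            System.out.println(Arrays.toString(d));
            System.out.println(Arrays.toString(e));
        }
    }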

Repi · Answer 4 · 2016-11-12T13:10:31+0000

This should be a possible solution. The "expensive" task is sorting the lists. Both lists must be sorted before the for loop, so make sure you use a fast sorting mechanism. As explained, sorting an array / array list is a costly operation, especially when many values need to be sorted.

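    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Iterator;
    import java.util.List;

    public class SmallerValues {
        public static void main(String[] args) {
            int a[] = { 1, 2, 3, 4, 5 };
            int b[] = { 2, 4, 3, 4, 11, 13, 17 };
            List<Integer> listA = new ArrayList<>();
            for (int i : a) {
                listA.add(i);
            }
            List<Integer> listB = new ArrayList<>();
            for (int i : b) {
                listB.add(i);
            }
            Collections.sort(listA);
            Collections.sort(listB);
            int smallerValues = 0;
            Integer lastCounted = null; // last distinct value of listB that was counted
            Iterator<Integer> iterator = listB.iterator();
            Integer nextValue = iterator.hasNext() ? iterator.next() : null; // smallest unprocessed value of listB
            for (Integer x : listA) {
                // consume every value of listB below x, counting each distinct value once
                while (nextValue != null && nextValue < x) {
                    if (!nextValue.equals(lastCounted)) {
                        smallerValues++;
                        lastCounted = nextValue;
                    }
                    nextValue = iterator.hasNext() ? iterator.next() : null;
                }
                System.out.println(x + " - " + smallerValues);
            }
        }
    }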

