Working with large amounts of data in java: speed

Question


I want to work with 10k-100k data points stored as 16-tuples (x_1, ..., x_16). Most tuple elements are floats in [0,1], along with one String and a few ints.

I want to be able to run lightning-fast (preferably under 10 ms) mathematical operations on selected data points. For example: calculate the average value of x_15 over all points that satisfy x_3 in [0.3, 0.4] and x_5 > x_2.

My naive approach would be to create a class for each tuple and then do my math on instances of that class. For storage, I would simply write all the tuples to a text file when the program finishes and load them from there when the program starts.
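As a baseline, the object-per-tuple approach described above might look like the sketch below, running the example query. The `Tuple` class and its field names are hypothetical, and only the four float fields the query touches are modelled; the real tuple would also carry the other floats, the String and the ints.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveTupleQuery {
    // Hypothetical tuple class: only the fields used by the example query.
    static final class Tuple {
        final float x2, x3, x5, x15;

        Tuple(float x2, float x3, float x5, float x15) {
            this.x2 = x2;
            this.x3 = x3;
            this.x5 = x5;
            this.x15 = x15;
        }
    }

    // Average of x15 over tuples with 0.3 <= x3 <= 0.4 and x5 > x2.
    // Returns NaN when no tuple matches.
    static double averageX15(List<Tuple> tuples) {
        double sum = 0;
        int count = 0;
        for (Tuple t : tuples) { // each element is a separate heap object
            if (t.x3 >= 0.3f && t.x3 <= 0.4f && t.x5 > t.x2) {
                sum += t.x15;
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        List<Tuple> tuples = new ArrayList<>();
        tuples.add(new Tuple(0.1f, 0.35f, 0.2f, 0.5f)); // matches
        tuples.add(new Tuple(0.3f, 0.35f, 0.2f, 0.9f)); // x5 <= x2, skipped
        tuples.add(new Tuple(0.1f, 0.50f, 0.2f, 0.7f)); // x3 out of range
        System.out.println(averageX15(tuples)); // prints 0.5
    }
}
```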

Is this possible, and will this approach be lightning fast?


java performance algorithm

Oliver Jan 25 '11 at 13:32


4 answers

tim_yates · Answer 1 · 2011-01-25T13:38:02+0000

It would probably be faster to load the tuples into a 2-dimensional array of floats rather than into a one-dimensional array of class instances. It seems you want to do a lot of comparisons between individual tuples, so with objects you would have to access the class fields 100k+ times per query while walking the 1D array.
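A minimal sketch of the 2D-array layout, assuming each row `data[i]` holds the 16 values of tuple i with x_k at index k - 1 (the non-float fields are left out). The constant names are illustrative. Note that in Java a `float[][]` is an array of row arrays, so this removes the per-tuple class but each row is still its own small object.

```java
import java.util.Random;

public class RowArrayQuery {
    // x_k (1-based in the question) is stored at index k - 1 of each row.
    static final int X2 = 1, X3 = 2, X5 = 4, X15 = 14;

    // Average of x15 over rows with lo <= x3 <= hi and x5 > x2; NaN if none match.
    static double averageX15(float[][] data, float lo, float hi) {
        double sum = 0;
        int count = 0;
        for (float[] row : data) {
            if (row[X3] >= lo && row[X3] <= hi && row[X5] > row[X2]) {
                sum += row[X15];
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        int n = 100_000;
        float[][] data = new float[n][16];
        Random rnd = new Random(42);
        for (float[] row : data) {
            for (int k = 0; k < row.length; k++) {
                row[k] = rnd.nextFloat(); // uniform in [0, 1)
            }
        }
        long start = System.nanoTime();
        double avg = averageX15(data, 0.3f, 0.4f);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println("avg=" + avg + " in " + micros + " us");
    }
}
```

A single timed call like this includes JIT warm-up, so repeat the query several times before trusting the number.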

Peter Lawrey · Answer 2 · 2011-01-25T13:38:48+0000

If you want a fast column-based scan, I suggest you keep each column separately. For example, scanning a float[] is much faster than scanning many objects that each hold a float. (Your CPU cache prefers it, for a start.)
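A sketch of the column-per-array layout, assuming one `float[]` per float column. Only the four columns used by the example query are shown; the class and field names are hypothetical. The scan walks contiguous primitive arrays, which is the cache-friendly access pattern the answer refers to.

```java
import java.util.Random;

public class ColumnQuery {
    // Hypothetical column store: one float[] per column, all of length n.
    final float[] x2, x3, x5, x15;

    ColumnQuery(int n) {
        x2 = new float[n];
        x3 = new float[n];
        x5 = new float[n];
        x15 = new float[n];
    }

    // Average of x15 over rows with lo <= x3 <= hi and x5 > x2; NaN if none match.
    double averageX15(float lo, float hi) {
        double sum = 0;
        int count = 0;
        for (int i = 0; i < x3.length; i++) {
            float v = x3[i];
            if (v >= lo && v <= hi && x5[i] > x2[i]) {
                sum += x15[i];
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        int n = 100_000;
        ColumnQuery q = new ColumnQuery(n);
        Random rnd = new Random(42);
        for (int i = 0; i < n; i++) {
            q.x2[i] = rnd.nextFloat();
            q.x3[i] = rnd.nextFloat();
            q.x5[i] = rnd.nextFloat();
            q.x15[i] = rnd.nextFloat();
        }
        long start = System.nanoTime();
        double avg = q.averageX15(0.3f, 0.4f);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println("avg=" + avg + " in " + micros + " us");
    }
}
```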

Another approach is to index the data, but you would need to determine whether that is actually faster for your queries.
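The answer does not say what kind of index; one possible interpretation is a sorted index on x_3, sketched below. The rows are sorted once by x_3, and a range query uses two binary searches to find the candidate rows, so only those are checked for x_5 > x_2. All names here are hypothetical. Whether this beats a plain column scan depends on how selective the x_3 range is, which is exactly the measurement the answer recommends.

```java
import java.util.Arrays;

public class SortedIndexQuery {
    final float[] x2, x5, x15;
    final float[] sortedX3;  // x3 values in ascending order
    final int[] rowOfSorted; // rowOfSorted[j] = original row of sortedX3[j]

    SortedIndexQuery(float[] x2, float[] x3, float[] x5, float[] x15) {
        this.x2 = x2;
        this.x5 = x5;
        this.x15 = x15;
        int n = x3.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // One-time cost: sort row numbers by their x3 value.
        Arrays.sort(order, (a, b) -> Float.compare(x3[a], x3[b]));
        sortedX3 = new float[n];
        rowOfSorted = new int[n];
        for (int j = 0; j < n; j++) {
            rowOfSorted[j] = order[j];
            sortedX3[j] = x3[order[j]];
        }
    }

    // First position j with sortedX3[j] >= key.
    private int lowerBound(float key) {
        int lo = 0, hi = sortedX3.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedX3[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // First position j with sortedX3[j] > key.
    private int upperBound(float key) {
        int lo = 0, hi = sortedX3.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedX3[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Average of x15 over rows with lo <= x3 <= hi and x5 > x2; NaN if none match.
    double averageX15(float lo, float hi) {
        double sum = 0;
        int count = 0;
        for (int j = lowerBound(lo), end = upperBound(hi); j < end; j++) {
            int i = rowOfSorted[j]; // only rows already inside the x3 range
            if (x5[i] > x2[i]) {
                sum += x15[i];
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        float[] x2 = {0.1f, 0.3f, 0.1f, 0.0f};
        float[] x3 = {0.35f, 0.35f, 0.50f, 0.30f};
        float[] x5 = {0.2f, 0.2f, 0.2f, 0.9f};
        float[] x15 = {0.5f, 0.9f, 0.7f, 0.25f};
        SortedIndexQuery q = new SortedIndexQuery(x2, x3, x5, x15);
        System.out.println(q.averageX15(0.3f, 0.4f)); // prints 0.375
    }
}
```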

drekka · Answer 3 · 2011-01-25T13:39:23+0000


Dwb · Answer 4 · 2011-01-25T13:56:37+0000

If the float values are really fixed-point values, I believe you will get a speedup by storing them as ints (or longs) and manipulating them with integer arithmetic. For example, with six decimal places you can represent the value 0.000001 as 1 and the value 0.123456 as 123456.
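A sketch of the fixed-point idea, assuming a scale of 1,000,000 (six decimal places, matching the example above) and one `int[]` per column. `SCALE`, `toFixed` and the class name are illustrative choices. Query bounds are converted once, the scan uses only integer comparisons and a `long` sum, and the result is converted back at the end. Whether this is measurably faster than a float scan on a given JVM is something to benchmark rather than assume.

```java
public class FixedPointQuery {
    // Hypothetical scale: six decimal places, so 0.123456 is stored as 123456.
    static final int SCALE = 1_000_000;

    static int toFixed(double v) {
        return (int) Math.round(v * SCALE);
    }

    // Average of x15 over rows with lo <= x3 <= hi and x5 > x2; NaN if none match.
    static double averageX15(int[] x2, int[] x3, int[] x5, int[] x15, double lo, double hi) {
        int loFixed = toFixed(lo), hiFixed = toFixed(hi); // convert bounds once
        long sum = 0; // long: 100k values of up to 1e6 would overflow an int
        int count = 0;
        for (int i = 0; i < x3.length; i++) {
            if (x3[i] >= loFixed && x3[i] <= hiFixed && x5[i] > x2[i]) {
                sum += x15[i];
                count++;
            }
        }
        return count == 0 ? Double.NaN : (double) sum / count / SCALE;
    }

    public static void main(String[] args) {
        int[] x2 = {toFixed(0.1), toFixed(0.3), toFixed(0.0)};
        int[] x3 = {toFixed(0.35), toFixed(0.35), toFixed(0.3)};
        int[] x5 = {toFixed(0.2), toFixed(0.2), toFixed(0.9)};
        int[] x15 = {toFixed(0.5), toFixed(0.9), toFixed(0.25)};
        System.out.println(averageX15(x2, x3, x5, x15, 0.3, 0.4)); // prints 0.375
    }
}
```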

Memory

As mentioned in at least one other answer, when you load your values, storing them in arrays of primitive values uses less memory than an array of objects (at least one fewer reference per tuple). For instance:

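public class MathTupple
{
    // One primitive array per column: the values are stored contiguously,
    // with no per-tuple object header and no per-tuple reference.
    private final long[] valueBlah;

    public MathTupple(int tuppleCount)
    {
        valueBlah = new long[tuppleCount];
    }

    public long getValueBlah(int index)
    {
        return valueBlah[index];
    }
}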
