I have a large dataset of 10 billion data points (doubles) that I need to display on a chart. Since displaying all of the data at once is not useful, I am looking for an algorithm that will help me choose the best N points from the entire set.
I am currently using systematic sampling to reduce the data set. Any suggestions on how to improve on it? Thanks.
Update: The data are 16-bit numbers representing the amplitude of a waveform, so they can range from -32,768 to 32,767. I want to capture the peaks and valleys, so that the N points selected for display give a good approximation of the shape of the whole set.
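To make the current approach concrete, here is a minimal sketch of what I mean by systematic sampling: keep every k-th point. The function name `systematic_sample` and the toy waveform are made up for illustration and are not my real code or data. It shows the problem: a peak or valley that falls between two sampled indices is dropped.

```python
def systematic_sample(data, n):
    """Pick up to n evenly spaced points from data (every k-th sample)."""
    if n <= 0:
        raise ValueError("n must be positive")
    step = max(1, len(data) // n)  # sampling interval k
    return data[::step][:n]


# Toy waveform (hypothetical): flat except for one peak and one valley.
wave = [0] * 100
wave[37] = 32767    # peak
wave[73] = -32768   # valley

picked = systematic_sample(wave, 10)  # takes indices 0, 10, 20, ..., 90
print(max(picked), min(picked))       # prints "0 0": both extremes were skipped
```

With N = 10 the sampled indices are all multiples of 10, so neither index 37 nor index 73 is ever looked at, and the chart would show a flat line.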
rahul