I intended this to be just a comment, but it turned out too long. No worries if you'd rather not accept this as the answer.
Firstly, it's commendable that you're thinking about how to map this onto the GPU; more scientists should take the time to consider questions like this for their algorithms. However, after reading the description, I'm not convinced the GPU is the ideal way to parallelize this particular method. The reason is that GPGPU programming tends to pay off for "batch processing", that is, for algorithms that naturally decompose into elementary operations applied uniformly across many elements (the stream level).
It's not clear that this algorithm separates usefully along those lines, except at the summation/ranking level that others have already mentioned (and those kinds of sub-operations are already well understood on the GPU). But if you zoom out and try to use the GPU to speed up the column-versus-column comparisons themselves, there isn't much to exploit: knowing which records fall below a given cut point for one pair of columns does not let you avoid redoing the same calculations when either column changes.
In short, you will have to perform N(N+1)/2 separate top-level iterations of the algorithm, and there is no way around that. Within each of those iterations, though, there is plenty of room to ship the column arrays to the GPU and do the comparisons at the thread level to compute the various rank statistics quickly.
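To make that top-level structure concrete, here is a minimal NumPy sketch of the unavoidable pair loop. `hoeffding_d` is a hypothetical placeholder for whatever per-pair kernel (Fortran, GPU, or otherwise) you plug in:

```python
import numpy as np
from itertools import combinations_with_replacement

def hoeffding_d(x, y):
    # Placeholder: your real Fortran/GPU per-pair kernel goes here.
    return 0.0

def all_pairs(data):
    """data: (n_records, n_cols) array -> upper-triangle matrix of D values."""
    n_cols = data.shape[1]
    d = np.full((n_cols, n_cols), np.nan)
    # N*(N+1)/2 independent top-level iterations, one per column pair.
    for i, j in combinations_with_replacement(range(n_cols), 2):
        d[i, j] = hoeffding_d(data[:, i], data[:, j])
    return d
```

The point of the sketch is that the iterations are completely independent of one another, which is exactly what makes the next suggestion work.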
The better approach, I think, would be to use MPI on a multi-processor setup, so that each high-level iteration is handled by a different processor. A master/worker pattern (the traditional name, master/slave, is an unfortunate one) seems appropriate if you don't have enough processors, and/or if you want each processor to use its GPU within a single high-level iteration: as long as there are remaining entries in the upper triangle of the Hoeffding "covariance" matrix you're trying to compute, you keep farming tasks out to the available processors.
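For illustration, here is a hedged mpi4py sketch of that master/worker task farm. `compute_pair` and `N_COLS` are assumptions for the example; the real per-pair work (possibly on each worker's GPU) would happen inside `compute_pair`:

```python
from itertools import combinations_with_replacement
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
N_COLS = 100  # assumption: number of columns in your data set

def compute_pair(task):
    i, j = task
    return (i, j, 0.0)  # stand-in for the real Hoeffding computation

if rank == 0:
    # Master: one task per upper-triangle entry, handed out on demand.
    tasks = list(combinations_with_replacement(range(N_COLS), 2))
    results, n_active = [], size - 1
    status = MPI.Status()
    while n_active > 0:
        msg = comm.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)
        if msg is not None:           # None is just a worker announcing itself
            results.append(msg)
        if tasks:
            comm.send(tasks.pop(), dest=status.Get_source())
        else:
            comm.send(None, dest=status.Get_source())  # shut the worker down
            n_active -= 1
else:
    # Worker: announce availability, then loop until told to stop.
    comm.send(None, dest=0)
    while True:
        task = comm.recv(source=0)
        if task is None:
            break
        comm.send(compute_pair(task), dest=0)
```

Run it with something like `mpiexec -n 8 python pairs.py`; idle processors automatically pick up the next pair, so load balancing comes for free.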
Secondly, I think Matlab and Jacket are more of a liability here than you might believe. Yes, Matlab is optimized for some linear-algebra operations, but for almost everything else it is unequivocally slower than any "real" programming language. The trade-off is that you get a lot of convenience features and commercial-grade documentation, which is sometimes useful.
The alternative I'd suggest is to use PyCUDA together with mpi4py in Python. With NumPy and Cython, Python is strictly better and faster than Matlab, even in terms of usability, and if you use the PyDev plugin for Eclipse, even the working environment is basically identical to Matlab's. On top of that, you don't have to pay for a license, nor for an add-on like Jacket.
You'll need to do a bit of work to get PyCUDA and mpi4py cooperating, but in the end I think the open-source-ness will make your effort much more valuable to more people, and the code will probably end up much faster.
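To give a flavor of the PyCUDA side, here is a minimal sketch of a brute-force O(n²) rank computation on the GPU, the kind of thread-level comparison work mentioned above. The kernel is an assumed illustration, not the actual Hoeffding kernel, and in practice you'd likely use a sort-based ranking instead:

```python
import numpy as np
import pycuda.autoinit                 # creates a CUDA context on import
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

# One thread per record: rank[i] = #{ j : x[j] < x[i] }.
mod = SourceModule("""
__global__ void ranks(const float *x, int *r, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int count = 0;
    for (int j = 0; j < n; ++j)
        count += (x[j] < x[i]);
    r[i] = count;
}
""")
ranks = mod.get_function("ranks")

x = np.random.rand(4096).astype(np.float32)   # one column of records
x_gpu = gpuarray.to_gpu(x)
r_gpu = gpuarray.empty(x.shape, dtype=np.int32)
ranks(x_gpu, r_gpu, np.int32(x.size),
      block=(256, 1, 1), grid=((x.size + 255) // 256, 1))
print(r_gpu.get()[:10])
```

Each MPI worker in the earlier sketch could run something like this on its assigned column pair.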
Also, if you really are attached to Matlab: have you benchmarked the existing Fortran code on a single pair of columns, with thousands of records but just the two columns? If that's fast enough, you could write a wrapper for that Fortran as a mex file, in which case a handful of plain Matlab sessions running in parallel may be all you need.