Computing the norm of a vector expression || aW + bX + cY ||

I am a PhD student. In the introduction of my thesis, I discuss the trade-off between the expressiveness and the performance of linear algebra tools.

As a simple example, I use the computation of the norm of a vector expression. The C code for my example:

    #include <math.h>

    float normExpression3(float a, float *W, float b, float *X,
                          float c, float *Y, int n)
    {
        double norm = 0;
        for (int i = 0; i < n; ++i)   /* n in [3e6; 2e8] */
        {
            float tmp = a*W[i] + b*X[i] + c*Y[i];
            norm += tmp * tmp;
        }
        return sqrtf(norm);
    }

I compare the results achieved with different approaches. Since the vectors are large (several million elements), performance is limited by memory bandwidth. Even so, there are huge differences between the various approaches.

The optimized C version I wrote is not expressive (adding a fourth vector would require writing a new function) and quite ugly (streamed and vectorized), but it reaches 6.4 GFlops. The MATLAB code, on the other hand, is very nice:

 result = norm(a*W+b*X+c*Y) 

but it reaches only 0.28 GFlops.

C++ expression templates à la Blitz++ provide both expressiveness and performance (6.5 GFlops).

As part of my analysis, I would like to know how functional languages compare with these approaches. I am thinking of presenting an example in either Haskell or OCaml (AFAIK, both are well suited to this kind of operation).

I know neither of these languages. I could learn one of them to write my example, but that would not be a fair comparison: I am not sure I could produce an implementation that achieves both expressiveness and speed.

So my two questions are: 1) which language is best suited? 2) how can one efficiently compute the norm of vector expressions without compromising the generality of the implementation?

Thank you in advance!

Wilfried K.


Edit: changed the accumulator type of norm from float to double

+4
4 answers

For what it's worth, here is an OCaml version of your function:

    let normExpression3 a w b x c y =
      let n = Array.length w in
      if not (n = Array.length x && n = Array.length y) then
        invalid_arg "normExpression3";
      let ( @ ) = (Array.unsafe_get : float array -> int -> float) in
      let rec accum a w b x c y n i norm =
        if i = n then sqrt norm
        else
          let t = a *. (w @ i) +. b *. (x @ i) +. c *. (y @ i) in
          accum a w b x c y n (i + 1) (norm +. t *. t)
      in
      accum a w b x c y n 0 0.

It applies a number of performance optimizations, namely:

  • Unchecked array accesses (or rather, array bounds checks manually hoisted out of the loop)
  • Monomorphic array accesses
  • A tail-recursive inner loop to avoid boxing and unboxing the floating-point accumulator
  • A lambda-lifted inner loop to avoid references to closed-over variables

The last optimization should be benchmarked against a closure-based inner loop, since with this many parameters the cost of passing them all may well exceed the cost of accessing closed-over variables.

Note that usually nobody would bother with these optimizations unless competing in a benchmark ;-). Also note that you will need to test this with a 64-bit OCaml, because otherwise arrays are limited to 4 mega-elements.

+5
source

1) which language is best suited?

Either is suited to this kind of task. The main concerns would be the availability of the necessary libraries (e.g., for vectors or matrices) and whether you need parallelism.

Libraries such as vector and repa in Haskell work well. And in the case of repa, you also get parallelism.

2) how can we efficiently calculate the norm of vector expressions without compromising the generality of implementation?

One approach would be to use metaprogramming techniques to generate specialized implementations of compute kernels from high-level descriptions. In functional languages this is a relatively common technique, based on small domain-specific languages with custom code generators.

See "Specialising Simulator Generators for High-Performance Monte-Carlo Methods" or the OCaml work on FFTW.

+2
source

Not so much an answer about functional languages as such, but note that your implementation for computing the norm (in C) differs from how MATLAB actually computes it.

And yes, there are really good reasons for that. Most likely, your approximation of the norm (as currently implemented) is pretty much useless for any real use. Please do not underestimate the difficulty of computing numerically sound approximations of a norm.

+1
source

As dons said, consider Repa. Here is some simple code to get you started.

    import Data.Array.Repa as R

    len :: Int
    len = 50000

    main = do
      let ws = R.fromList (Z :. len) [0 .. len - 1]
          xs = R.fromList (Z :. len) [10498 .. 10498 + len - 1]
          ys = R.fromList (Z :. len) [8422 .. 8422 + len - 1]
      print (multSum 52 73 81 ws xs ys)

    multSum a b c ws xs ys =
      R.map (a *) ws +^ R.map (b *) xs +^ R.map (c *) ys

This still leaves you to find a good way of getting the data off disk and into a Repa array. I am thinking of reading it all in as a lazy ByteString and using Repa.fromFunction, but maybe someone will come along with a smarter way.
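For comparison, the equivalent data-loading step on the C side is a single bulk read of a flat binary file into a heap buffer. This is a hypothetical helper (not from the thread), but it shows the baseline any language binding has to match.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read exactly n single-precision floats from a
   flat binary file into a freshly allocated buffer.  Returns NULL on
   open failure or short read; the caller owns (and must free) the
   returned buffer. */
float *read_floats(const char *path, size_t n)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    float *buf = malloc(n * sizeof *buf);
    if (buf && fread(buf, sizeof *buf, n, f) != n) {
        free(buf);
        buf = NULL;   /* short read: fail rather than return partial data */
    }
    fclose(f);
    return buf;
}
```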

0
source

Source: https://habr.com/ru/post/1347136/

