F # / "Accelerator v2" implementation of the DFT algorithm is probably incorrect

Question

F # / "Accelerator v2" implementation of the DFT algorithm is probably incorrect

I am trying to experiment with radio concepts. From this article, I tried to implement the discrete Fourier transform of GPU-parallelism.

I'm sure I can pre-calculate the 90 degrees of sin (i) cos (i), and then just flip and repeat, not what I'm doing in this code, and that it will speed it up. But so far I don’t even think that I am getting the right answers. Entering all zeros gives the result 0, as expected, but all 0.5 gives 78.9985886f as input (in this case, I would expect 0 results). In principle, they just confuse me. I do not have good input, and I do not know what to do with the result or how to check it.

This question is related to my other post here

open Microsoft.ParallelArrays
open System

 // X64MulticoreTarget is faster on my machine, unexpectedly
let target = new DX9Target() // new X64MulticoreTarget()

ignore(target.ToArray1D(new FloatParallelArray([| 0.0f |]))) // Dummy operation to warm up the GPU

let stopwatch = new System.Diagnostics.Stopwatch() // For benchmarking

let Hz = 50.0f
let fStep = (2.0f * float32(Math.PI)) / Hz
let shift = 0.0f // offset, once we have to adjust for the last batch of samples of a stream

// If I knew that the periodic function is periodic 
// at whole-number intervals, I think I could keep 
// shift within a smaller range to support streams 
// without overflowing shift - but I haven't 
// figured that out

//let elements = 8192 // maximum for a 1D array - makes sense as 2^13
//let elements = 7240 // maximum on my machine for a 2D array, but why?
let elements = 7240

// need good data!!
let buffer : float32[,] = Array2D.init<float32> elements elements (fun i j -> 0.5f) //(float32(i * elements) + float32(j))) 

let input = new FloatParallelArray(buffer)
let seqN : float32[,] = Array2D.init<float32> elements elements (fun i j -> (float32(i * elements) + float32(j)))
let steps = new FloatParallelArray(seqN)
let shiftedSteps = ParallelArrays.Add(shift, steps)
let increments = ParallelArrays.Multiply(fStep, steps)
let cos_i = ParallelArrays.Cos(increments) // Real component series
let sin_i = ParallelArrays.Sin(increments) // Imaginary component series

stopwatch.Start()
// From the documentation, I think ParallelArrays.Multiply does standard element by 
// element multiplication, not matrix multiplication
// Then we sum each element for each complex component (I don't understand the relationship 
// of this, or the importance of the generalization to complex numbers)
let real = target.ToArray1D(ParallelArrays.Sum(ParallelArrays.Multiply(input, cos_i))).[0]
let imag = target.ToArray1D(ParallelArrays.Sum(ParallelArrays.Multiply(input, sin_i))).[0]
printf "%A in " ((real * real) + (imag * imag)) // sum the squares for the presence of the frequency
stopwatch.Stop()

printfn "%A" stopwatch.ElapsedMilliseconds

ignore (System.Console.ReadKey ())

+2

f # fft radio gpu-programming accelerator

Jason kleban Jun 27 '10 at 18:38

source share

2 answers

:

, - .
sans- parallelism F # parallelism

( F #,

let a : float[] = ...

" ",

let aShift = a |> (fun x -> async { return x + shift }) 
               |> Async.Parallel |> Async.RunSynchronously

( , , ).

+2

Brian 27 . '10 18:55

kvb · Accepted Answer · 2010-06-27T20:50:18+0000

I share your surprise that your answer is no closer to zero. I would suggest writing naive code to execute your DFT in F # and see if you can track the source of the mismatch.

Here is what I think you are trying to do:

let N = 7240
let F = 1.0f/50.0f
let pi = single System.Math.PI

let signal = [| for i in 1 .. N*N -> 0.5f |]

let real = 
  seq { for i in 0 .. N*N-1 -> signal.[i] * (cos (2.0f * pi * F * (single i))) }
  |> Seq.sum

let img = 
  seq { for i in 0 .. N*N-1 -> signal.[i] * (sin (2.0f * pi * F * (single i))) }
  |> Seq.sum

let power = real*real + img*img

, , , , . , - ~ 52 , 79 . FWIW, ~ 0.05 , ~ 4e-18 .

F # / "Accelerator v2" implementation of the DFT algorithm is probably incorrect

More articles: