Optimized FFT and math for AT91SAM9 ARM processor Linux user program

Question

I am developing C/C++ software for an embedded Linux system with an Atmel AT91SAM9G20 processor. I need to compute FFTs quickly using fixed-point math (or possibly floating point) in a Linux user-space program. I understand that assembly language might be involved in the implementation, and that an additional -mcpu switch may be required when compiling with gcc. What is the best way to get started with this implementation, and are there any good links, books, or optimized FOSS libraries?

I need to implement some algorithms that also require small FFT lengths (e.g. 1024 points), and I am wondering whether some libraries (e.g. kissfft) will work. I am also interested in long FFT lengths, so FFTW, as suggested in the answer below, will work well too.

Related to this question, I am also interested in how integer division is handled in an ARM9 Linux user program. If I divide two integers (e.g. 25/4), is the division done using floating-point numbers? I also need to implement some large number-crunching algorithms, and I wonder whether fixed-point math is a better choice here than floating-point math, and how the gcc compiler actually handles things.
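To make the integer-division and fixed-point part of the question concrete, here is a minimal sketch in C. In C, dividing two int operands is always integer division that truncates toward zero (C99 and later), whatever the hardware. As far as I know, the ARM926EJ-S core in the AT91SAM9G20 has neither a hardware divide instruction nor an FPU, so gcc typically emits a call to a libgcc helper such as __aeabi_idiv for the division, and floating-point math is emulated in software. The names q15_t, q15_mul, int_div and int_rem are made up for this illustration, and Q15 is just one common fixed-point format, not something the question prescribes.

```c
#include <stdint.h>

/* Q15 fixed point (hypothetical helper type): value = raw / 32768,
 * so the representable range is [-1, 1). */
typedef int16_t q15_t;

/* Multiply two Q15 values through a 32-bit intermediate.
 * Note: >> on a negative int32_t is implementation-defined in C;
 * gcc performs an arithmetic shift, which is what this sketch assumes. */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;  /* Q30 product */
    p += 1 << 14;                          /* round to nearest */
    p >>= 15;                              /* back to Q15 */
    if (p > INT16_MAX)                     /* only (-1) * (-1) overflows */
        p = INT16_MAX;
    return (q15_t)p;
}

/* Plain integer division: no floating point is involved. */
int int_div(int a, int b) { return a / b; }

/* Remainder has the sign of the dividend in C99 and later. */
int int_rem(int a, int b) { return a % b; }
```

With these definitions, int_div(25, 4) yields 6 and int_rem(25, 4) yields 1, and q15_mul(16384, 16384) (0.5 times 0.5) yields 8192 (0.25). Compiling with something like -mcpu=arm926ej-s lets gcc schedule for this core, but it does not add a divide instruction.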

c++ floating-point linux arm fixed-point

Nicholas Kinar Mar 26 '12 at 15:14

1 answer

sehe · Accepted Answer · 2012-03-26T15:17:50+0000

FFTW contains optimizations for specific processors (and can also tune itself at compile time and at run time).

Version 3.3.1 introduced support for the ARM NEON extensions:

http://www.fftw.org/#features

And from the FAQ: Question 4.2. Why is FFTW so fast?

This is a complex question, and there is no simple answer. In fact, the authors do not fully know the answer, either. In addition to many small performance hacks throughout FFTW, there are three general reasons for FFTW's speed.
FFTW uses a variety of FFT algorithms and implementation styles that can be arbitrarily composed to adapt itself to a machine. See Q4.1 "How does FFTW work?".
FFTW uses a code generator to produce highly optimized routines for computing small transforms.
FFTW uses explicit divide-and-conquer to take advantage of the memory hierarchy.
