We're after an implementation of an optimised 256-point real FFT operating on 16-bit integers. The routine needs to run 8 FFTs in parallel using the SSE registers. Input data will be presented ordered for 128-bit SSE registers, i.e. for streams a, b, ..., h it will be laid out in memory as:
a0 b0 c0 d0 e0 f0 g0 h0
a1 b1 c1 d1 e1 f1 g1 h1
etc. It can be presented as required in terms of scale, sign, etc.
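To make the layout concrete, here is a minimal sketch in C with SSE2 intrinsics. It assumes sample n of stream s sits at index n*8 + s, so one 128-bit register holds the same sample index for all 8 streams. The names `sample_index` and `butterfly_rows` are made up for illustration, and the butterfly is the twiddle-free add/subtract only, with no scaling or overflow handling. It is not the requested routine, just a picture of how one instruction touches all 8 FFTs at once.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 integer intrinsics */
#include <stddef.h>
#include <stdint.h>

#define STREAMS 8    /* streams a..h, one 16-bit lane each */
#define FFT_LEN 256  /* points per FFT */

/* Assumed layout: sample n of stream s lives at index n*STREAMS + s. */
static inline size_t sample_index(size_t n, size_t s)
{
    return n * STREAMS + s;
}

/* Twiddle-free radix-2 butterfly on rows i and j for all 8 streams:
 *   row i <- row i + row j
 *   row j <- row i - row j
 * Arithmetic wraps on overflow; a real routine would scale between stages.
 * Unaligned loads/stores keep the sketch safe for any buffer; an optimised
 * version would use a 16-byte aligned buffer and the aligned variants. */
static inline void butterfly_rows(int16_t *data, size_t i, size_t j)
{
    assert(i < FFT_LEN && j < FFT_LEN);

    __m128i *pi = (__m128i *)(data + i * STREAMS);
    __m128i *pj = (__m128i *)(data + j * STREAMS);

    __m128i x = _mm_loadu_si128(pi);  /* a_i b_i ... h_i */
    __m128i y = _mm_loadu_si128(pj);  /* a_j b_j ... h_j */

    _mm_storeu_si128(pi, _mm_add_epi16(x, y));
    _mm_storeu_si128(pj, _mm_sub_epi16(x, y));
}
```

Calling `butterfly_rows(buf, 0, 128)` on a buffer of `FFT_LEN * STREAMS` values combines samples 0 and 128 of every stream in two loads, one add, one subtract and two stores.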
The target is 4 us per batch of 8 FFTs, running single-threaded on the T2500 in my laptop.
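As a rough sanity check on that target, the time can be turned into a cycle budget. The sketch below assumes a nominal 2.0 GHz clock for the T2500, which is my assumption and worth checking against the actual machine; `cycle_budget` is a hypothetical helper. Under that assumption, 4 us is about 8000 cycles for the whole batch, or about 1000 cycles per 256-point FFT.

```c
#include <stdint.h>

/* Cycles available in a time window.
 * time_ns * clock_mhz gives cycles * 1000, so divide by 1000.
 * Integer arithmetic avoids floating-point rounding. */
static inline uint64_t cycle_budget(uint64_t time_ns, uint64_t clock_mhz)
{
    return time_ns * clock_mhz / 1000u;
}

/* Budget for a single FFT when `ffts` transforms share the window. */
static inline uint64_t cycle_budget_per_fft(uint64_t time_ns,
                                            uint64_t clock_mhz,
                                            uint64_t ffts)
{
    return cycle_budget(time_ns, clock_mhz) / ffts;
}
```

With the assumed figures, `cycle_budget(4000, 2000)` covers the batch and `cycle_budget_per_fft(4000, 2000, 8)` covers one transform.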
Any questions, please PM me.
I have participated in an embedded systems project in which the FFT was used to calculate electrical harmonics. I wrote the FFT algorithm routine in C++, along with an emulator on the Windows platform.