I want to get a simple 3 layer (Input-Hidden-Output) layer neural network implemented on an FPGA. The network I wish to implement is a wide network with hidden neurons ~1000-2000. I want this to be implemented for highest data throughput with optimized resource utilization. Also want to software to be written for the implemented hardware.