This project has two parts. The first part is essentially done; I need help with the second part.
Part I: Analyze the size of the caches on the machine, then answer a set of given questions in a written report.
Part II:
Use the information you gathered in Part I to evaluate the performance of the six possible loop orderings for matrix-matrix multiplication. You should write a function, MM, that performs matrix multiplication on *N*x*N* matrices. The inputs to your matrix multiplication function are two matrices, *A* of dimension *N*x*N* and *B* of dimension *N*x*N*, along with *N*. The output of the function is the *N*x*N* matrix *C = A · B*. Implement all six variants in the C programming language, with three levels of manual loop unrolling for each variant: none, 2, and 4 (loop unrolling at level *i* means *i* iterations of the loop are merged into 1 iteration of the unrolled loop). You can assume that *N* is divisible by 4. Evaluate the performance of all the programs (18 in total) for values of *N* up to 512, for *N* a power of 2, starting at 16 (i.e. *N* = 16, 32, 64, ..., 512). Run each variant multiple times to be sure you get consistent timings (i.e. throw out outliers, and average the run times for a given variant). The matrix entries should have type **double**. You can fill in the matrix entries however you want (but don't make them all the same). The overall idea is to determine which variations work best for which values of *N*, and why.
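To make the task concrete, here is a rough sketch of two of the 18 variants: the i-j-k ordering with no unrolling, and the i-k-j ordering with the innermost loop unrolled by 2. The function names `mm_ijk` and `mm_ikj_unroll2`, the row-major flat-array layout, and the choice to unroll the innermost loop are my own assumptions for illustration; the assignment only fixes the name MM, the inputs *A*, *B*, *N*, and the type **double**.

```c
#include <stddef.h>

/* i-j-k ordering, no unrolling.
   Assumption: matrices are N x N, stored row-major in flat arrays,
   so element (r, c) lives at index r * N + c. */
void mm_ijk(const double *A, const double *B, double *C, int N)
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            /* Dot product of row i of A with column j of B. */
            for (int k = 0; k < N; k++)
                sum += A[i * N + k] * B[k * N + j];
            C[i * N + j] = sum;
        }
    }
}

/* i-k-j ordering, innermost (j) loop unrolled by 2.
   Assumption: N is even (the assignment guarantees N divisible by 4). */
void mm_ikj_unroll2(const double *A, const double *B, double *C, int N)
{
    /* This ordering accumulates into C, so C must start at zero. */
    for (int i = 0; i < N * N; i++)
        C[i] = 0.0;

    for (int i = 0; i < N; i++) {
        for (int k = 0; k < N; k++) {
            double a = A[i * N + k];
            /* Two iterations of the j loop merged into one. */
            for (int j = 0; j < N; j += 2) {
                C[i * N + j]     += a * B[k * N + j];
                C[i * N + j + 1] += a * B[k * N + j + 1];
            }
        }
    }
}
```

The other four orderings would follow the same pattern with the three loops permuted, and the level-4 variants would merge four iterations instead of two.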
Write up the results of your programs in a report that includes the specified details.
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Installation package that will install the software (in ready-to-run condition) on the platform(s) specified in this bid request.
3) Complete ownership and distribution copyrights to all work purchased.
You must run your code on the CSIC Linux cluster, available via ssh at [url removed, login to view] (that IP address corresponds to the remote login server machines), so that's where you should do your development work and run the programs. More information about the lab is available here. To get accurate timing results, we suggest going to the lab and sitting at one of the machines, so that you are likely to be the only user on the machine. Also, for Part II, the timing routines from Exercise 5.2 for Part I are not accurate enough to time your matrix multiply programs. For more accurate timing routines, look at the man page for the **gettimeofday**() C library function.
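For the timing side, something along these lines is what I have in mind: a wall-clock helper built on **gettimeofday**(), plus a simple way to discard outliers before averaging repeated runs. The helper names `wall_seconds`, `elapsed_seconds`, and `trimmed_mean` are hypothetical, and dropping only the single fastest and slowest run is just one possible reading of "throw out outliers", not something the assignment specifies.

```c
#include <stddef.h>
#include <sys/time.h>

/* Current wall-clock time in seconds, via gettimeofday(). */
double wall_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

/* Seconds between two gettimeofday() readings.
   The microsecond difference may be negative; the sum is still correct. */
double elapsed_seconds(struct timeval start, struct timeval end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) * 1e-6;
}

/* Mean of n run times after discarding the single smallest and the
   single largest sample. Assumption: n >= 3. */
double trimmed_mean(const double *t, int n)
{
    double sum = t[0], lo = t[0], hi = t[0];
    for (int i = 1; i < n; i++) {
        sum += t[i];
        if (t[i] < lo) lo = t[i];
        if (t[i] > hi) hi = t[i];
    }
    return (sum - lo - hi) / (n - 2);
}
```

A typical use would be to read `wall_seconds()` before and after each call to a matrix multiply variant, store the differences in an array, and pass that array to `trimmed_mean`. For small *N* a single multiply may finish in a few microseconds, so it is probably worth timing a batch of calls and dividing by the batch size.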
I can provide more details.