Write down the simplest algorithm and look carefully for race conditions. How would you handle them? A first attempt is to parallelize the loop and protect the update of the shared variables maxval and maxloc with a critical section:
#pragma omp parallel for
for (i=0; i<n; i++) {
    #pragma omp critical
    {
        if (x[i] > maxval) {
            maxval = x[i];
            maxloc = i;
        }
    }
}
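Because the critical section encloses the whole loop body, the threads perform the comparisons one at a time, so this version is race-free but gains little from running in parallel. One common alternative, sketched below under stated assumptions, lets each thread track its own maximum and enters the critical section only once per thread to merge the results. The function name find_max_loc, its signature, and the variables local_max and local_loc are illustrative and not part of the original text; the tie-breaking rule (smallest index wins) is also an assumption.

```c
/* Hypothetical helper: returns the index of the largest element of
   x[0..n-1] and stores its value in *maxval_out. Returns -1 if n <= 0.
   Assumption: ties are resolved in favour of the smallest index. */
int find_max_loc(const double *x, int n, double *maxval_out) {
    if (n <= 0) return -1;

    double maxval = x[0];   /* shared result, updated only in the critical section */
    int maxloc = 0;

    #pragma omp parallel
    {
        /* Private to each thread: no synchronization needed in the loop. */
        double local_max = x[0];
        int local_loc = 0;

        /* Static scheduling gives each thread its iterations in increasing
           order, so local_loc is the first maximum the thread sees. */
        #pragma omp for schedule(static) nowait
        for (int i = 1; i < n; i++) {
            if (x[i] > local_max) {
                local_max = x[i];
                local_loc = i;
            }
        }

        /* One merge per thread instead of one lock per iteration. */
        #pragma omp critical
        {
            if (local_max > maxval ||
                (local_max == maxval && local_loc < maxloc)) {
                maxval = local_max;
                maxloc = local_loc;
            }
        }
    }

    *maxval_out = maxval;
    return maxloc;
}
```

Compiled without OpenMP support the pragmas are ignored and the function runs serially with the same result, which makes it easy to check against a reference.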
Exercise: write a program that implements this and estimate its performance. Perform several runs: first time the serial code, compiled with and without vectorization, and then compare it with the version that uses OpenMP. Run on different architectures if you can.
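As a possible starting point for the timing part of the exercise, here is a minimal sketch of a serial baseline with a timing helper. The names wall_time, serial_max_loc and time_serial are hypothetical, and taking the best of several repeats is only one of several reasonable ways to reduce noise. The timer uses omp_get_wtime when the code is compiled with OpenMP and otherwise falls back to clock(), which reports CPU time rather than wall-clock time.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Time in seconds. Assumption: clock() is an acceptable fallback
   when the code is built without OpenMP (serial runs only). */
double wall_time(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Serial reference: index of the first maximum of x[0..n-1], n >= 1. */
int serial_max_loc(const double *x, int n) {
    int maxloc = 0;
    for (int i = 1; i < n; i++) {
        if (x[i] > x[maxloc]) {
            maxloc = i;
        }
    }
    return maxloc;
}

/* Runs the serial search `repeats` times and returns the shortest
   measured time; the index found is stored in *loc_out. */
double time_serial(const double *x, int n, int repeats, int *loc_out) {
    double best = -1.0;
    for (int r = 0; r < repeats; r++) {
        double t0 = wall_time();
        int loc = serial_max_loc(x, n);
        double t1 = wall_time();
        *loc_out = loc;
        if (best < 0.0 || t1 - t0 < best) {
            best = t1 - t0;
        }
    }
    return best;
}
```

With GCC, for instance, one could build the serial baseline once with -O3 and once with -O3 -fno-tree-vectorize, and the OpenMP version with -fopenmp added; other compilers use different flags. Use an array large enough that a single run takes well above the timer resolution.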