Computational Physics Lectures: How to optimize codes, from vectorization to parallelization

Automatic vectorization and vectorization inhibitors, data dependencies

Keep in mind that vectorization changes the order of operations inside a loop. A loop in which one iteration reads a value written by an earlier iteration, a so-called read-after-write or flow dependency, cannot be vectorized safely. The following code

  double b = 15.;
  for (int i = 1; i < n; i++) {
      a[i] = a[i-1] + b;
  }

is an example of a flow dependency and gives wrong numerical results if vectorized. In scalar mode, the value $a[i-1]$ computed during the previous iteration is loaded into the right-hand side and the results are correct. In vector mode, however, with a vector length of four, the values $a[0]$, $a[1]$, $a[2]$ and $a[3]$ are loaded into the right-hand side together, before any of the new values have been stored, and the results are wrong. That is, we have

   a[1] = a[0] + b;
   a[2] = a[1] + b;
   a[3] = a[2] + b;
   a[4] = a[3] + b;

and if the first two iterations are executed at the same time by a SIMD instruction, the value of, say, $a[1]$ could be read by the second iteration before it has been computed by the first, thereby leading to wrong results.
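
To make the effect concrete, the following is a minimal sketch that emulates in plain C++ what a naive four-lane vectorization of the loop would do. It is an illustration only, not what a real compiler emits (compilers normally detect this dependency and refuse to vectorize). The function names scalar_recurrence, naive_simd_recurrence and closed_form, and the width parameter, are hypothetical and introduced here for the example. The last function assumes the recurrence can be rewritten as $a[i] = a[0] + i\,b$, which removes the dependency between iterations; its rounding may in general differ slightly from that of the running sum.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Scalar reference: each iteration reads the value stored by the previous one.
std::vector<double> scalar_recurrence(std::vector<double> a, double b) {
    for (std::size_t i = 1; i < a.size(); i++) {
        a[i] = a[i - 1] + b;
    }
    return a;
}

// Hypothetical emulation of a naive vectorization with `width` lanes:
// all right-hand-side values of a chunk are loaded before any result is stored.
std::vector<double> naive_simd_recurrence(std::vector<double> a, double b,
                                          std::size_t width = 4) {
    const std::size_t n = a.size();
    std::vector<double> lanes(width);
    for (std::size_t i = 1; i < n; i += width) {
        const std::size_t len = std::min(width, n - i);
        for (std::size_t k = 0; k < len; k++) {
            lanes[k] = a[i - 1 + k];      // "vector load" of the old values
        }
        for (std::size_t k = 0; k < len; k++) {
            a[i + k] = lanes[k] + b;      // "vector add" followed by the store
        }
    }
    return a;
}

// Dependency-free rewrite (assumption: a[i] = a[0] + i*b is acceptable).
// Every iteration depends only on a[0], so the loop can be vectorized.
std::vector<double> closed_form(std::vector<double> a, double b) {
    const double a0 = a.empty() ? 0.0 : a[0];
    for (std::size_t i = 1; i < a.size(); i++) {
        a[i] = a0 + static_cast<double>(i) * b;
    }
    return a;
}
```

Calling the three functions with the input {1, 0, 0, 0, 0, 0} and b = 15, the scalar loop and the closed form both give 1, 16, 31, 46, 61, 76, whereas the emulated vector version gives 1, 16, 15, 15, 15, 30, since lanes two to four of the first chunk read the old zeros instead of the freshly computed values.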