Compiling with and without vectorization

We can compile and link without vectorization using the clang C++ compiler. With no optimization flag given, clang defaults to -O0 and does not vectorize loops.

clang++ -o novec.x vecexample.cpp

and with vectorization (and additional optimizations)

clang++ -O3 -Rpass=loop-vectorize -o vec.x vecexample.cpp
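As a rough illustration of the kind of loop these compiler flags act on, here is a minimal sketch of a timed norm computation. It is not the actual vecexample.cpp; the names NormResult and timed_norm, the fill values, and the use of std::chrono::steady_clock are assumptions made only for this example.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch (not part of vecexample.cpp).
struct NormResult {
    double norm2;    // sum over i of (a[i] + b[i])^2
    double seconds;  // wall-clock time spent in the two timed loops
};

// Hypothetical helper: fill two vectors, then time an elementwise sum
// followed by a sum-of-squares reduction.
NormResult timed_norm(std::size_t n)
{
    std::vector<double> a(n), b(n), c(n);
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = 0.5 * static_cast<double>(i);
        b[i] = 0.5 * static_cast<double>(i);
    }

    const auto start = std::chrono::steady_clock::now();

    // Elementwise loop with independent iterations: the kind of loop an
    // auto-vectorizer can typically handle at -O3.
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }

    // Floating-point reduction: vectorizing it reorders the additions, so
    // compilers generally do so only with relaxed FP flags such as -ffast-math.
    double norm2 = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        norm2 += c[i] * c[i];
    }

    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return {norm2, elapsed.count()};
}
```

Called from a small main and built once with and once without -O3, a function like this gives two timings that can be compared directly. With -Rpass=loop-vectorize, clang prints a remark for each loop it actually vectorized, which is a quick way to check that the flag had any effect.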

The speedup depends on the size of the vectors. The example here was run with \( 10^7 \) elements on an iMac17,1 running OS X El Capitan (10.11.4) with a 3.3 GHz Intel i5 CPU.

Compphys:~ hjensen$ ./vec.x 10000000
Time used  for norm computation=0.04720500000
Compphys:~ hjensen$ ./novec.x 10000000
Time used  for norm computation=0.03311700000

This particular C++ compiler speeds up the above loop operations by a factor of about 1.5. Performing the same operations for \( 10^9 \) elements results in a smaller speedup, since the data must then be read from main memory. In the timings listed here, however, the non-vectorized code is seemingly faster.

Compphys:~ hjensen$ ./vec.x 1000000000
Time used  for norm computation=58.41391100
Compphys:~ hjensen$ ./novec.x 1000000000
Time used  for norm computation=46.51295300
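To get a feeling for why memory traffic matters at this size, the storage needed can be estimated with a short sketch. It assumes that the elements are 8-byte doubles and treats the number of arrays as a free parameter, since neither is stated here; the function names footprint_bytes and bytes_to_gib are hypothetical.

```cpp
#include <cstddef>

// Bytes needed for n_arrays arrays of n_elements doubles each.
// n_arrays is a hypothetical parameter: the number of arrays the
// example program allocates is not stated in the text.
constexpr double footprint_bytes(double n_elements, int n_arrays)
{
    return n_elements * n_arrays * static_cast<double>(sizeof(double));
}

// Convert a byte count to GiB (1 GiB = 2^30 bytes).
constexpr double bytes_to_gib(double bytes)
{
    return bytes / (1024.0 * 1024.0 * 1024.0);
}
```

Under these assumptions a single array of \( 10^9 \) doubles takes 8e9 bytes, roughly 7.45 GiB, compared with 8e7 bytes for \( 10^7 \) elements. Data of that size is far larger than any CPU cache, so the loop is limited by how fast it can be streamed from memory rather than by the arithmetic.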

We will discuss these issues further in the next slides.