We can compile and link without vectorization using the clang c++ compiler
clang -o novec.x vecexample.cpp
and with vectorization (and additional optimizations)
clang++ -O3 -Rpass=loop-vectorize -o vec.x vecexample.cpp
The speedup depends on the size of the vectors. In the example here we have run with 107 elements. The example here was run on an IMac17.1 with OSX El Capitan (10.11.4) as operating system and an Intel i5 3.3 GHz CPU.
Compphys:~ hjensen$ ./vec.x 10000000
Time used for norm computation=0.04720500000
Compphys:~ hjensen$ ./novec.x 10000000
Time used for norm computation=0.03311700000
This particular C++ compiler speeds up the above loop operations with a factor of 1.5 Performing the same operations for 109 elements results in a smaller speedup since reading from main memory is required. The non-vectorized code is seemingly faster.
Compphys:~ hjensen$ ./vec.x 1000000000
Time used for norm computation=58.41391100
Compphys:~ hjensen$ ./novec.x 1000000000
Time used for norm computation=46.51295300
We will discuss these issues further in the next slides.