clang++ -o novec.x vecexample.cpp
and with vectorization (and additional optimizations)
clang++ -O3 -Rpass=loop-vectorize -o vec.x vecexample.cpp
The speedup depends on the size of the vectors. In the example here we have run with \( 10^7 \) elements on an iMac17,1 with OS X El Capitan (10.11.4) as operating system and an Intel i5 3.3 GHz CPU.
Compphys:~ hjensen$ ./vec.x 10000000
Time used for norm computation=0.04720500000
Compphys:~ hjensen$ ./novec.x 10000000
Time used for norm computation=0.03311700000
This particular C++ compiler changes the run time of the above loop operations by a factor of about 1.4. Performing the same operations for \( 10^9 \) elements results in a smaller factor, about 1.26, since reading from main memory is required. In the timings listed here, the non-vectorized code is seemingly faster.
Compphys:~ hjensen$ ./vec.x 1000000000
Time used for norm computation=58.41391100
Compphys:~ hjensen$ ./novec.x 1000000000
Time used for norm computation=46.51295300
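To experiment with timings of this kind, a kernel of the following form can be used. This is only a minimal sketch and not the actual vecexample.cpp: the names `NormResult` and `timed_norm2`, the way the vectors are filled, and the use of `std::chrono` for timing are all assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the timed kernel (not the original vecexample.cpp).
struct NormResult {
    double norm2;    // sum over i of c[i]*c[i], with c[i] = a[i] + b[i]
    double seconds;  // wall-clock time for allocation, fill, addition and reduction
};

NormResult timed_norm2(std::size_t n) {
    assert(n > 0);
    const double pi = std::acos(-1.0);
    const double s = 1.0 / std::sqrt(static_cast<double>(n));

    const auto start = std::chrono::steady_clock::now();

    std::vector<double> a(n), b(n), c(n);

    // Fill a and b with smooth test data (an arbitrary, assumed choice).
    for (std::size_t i = 0; i < n; ++i) {
        const double angle = 2.0 * pi * static_cast<double>(i) / static_cast<double>(n);
        a[i] = s * (std::sin(angle) + std::cos(angle));
        b[i] = s * std::sin(2.0 * angle);
    }

    // Element-wise vector addition: a simple loop with independent iterations.
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }

    // Sum of squares of c. This is a floating-point reduction, so the compiler
    // may decline to vectorize it unless reassociation is allowed.
    double norm2 = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        norm2 += c[i] * c[i];
    }

    const auto stop = std::chrono::steady_clock::now();
    return {norm2, std::chrono::duration<double>(stop - start).count()};
}
```

Calling `timed_norm2(10000000)` from `main` and printing the `seconds` field gives a number of the same kind as the timings listed above. Compiling the same source once without optimization flags and once with `-O3 -Rpass=loop-vectorize` shows which of the loops clang reports as vectorized.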
We will discuss these issues further in the next slides.