Compiling with and without vectorization

We can compile and link without vectorization using the Clang C++ compiler

clang++ -o novec.x vecexample.cpp

and with vectorization (and additional optimizations)

clang++ -O3 -Rpass=loop-vectorize -o vec.x vecexample.cpp
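The file vecexample.cpp is not reproduced on this slide. As a rough sketch of the kind of kernel being compiled, assuming it fills two vectors, adds them and accumulates a squared norm, one might write something like the following. The function name norm2_of_sum and the use of std::vector are illustrative choices, not taken from the original file. With -Rpass=loop-vectorize, clang prints a remark for each loop it manages to vectorize, so a file like this is a convenient way to see which of the three loops qualify.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the kernel in vecexample.cpp (not the original file).
// Fills a and b, forms c = a + b, and returns the squared 2-norm of c.
double norm2_of_sum(std::size_t n)
{
    assert(n > 0);
    const double pi = std::acos(-1.0);
    const double s = 1.0 / std::sqrt(static_cast<double>(n));
    std::vector<double> a(n), b(n), c(n);

    // Set-up loop: each iteration calls sin and cos.
    for (std::size_t i = 0; i < n; i++) {
        const double angle = 2.0 * pi * static_cast<double>(i) / static_cast<double>(n);
        a[i] = s * (std::sin(angle) + std::cos(angle));
        b[i] = s * std::sin(2.0 * angle);
    }

    // Element-wise addition: the iterations are independent of each other,
    // which makes this loop a natural candidate for vectorization.
    for (std::size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }

    // Reduction: every iteration updates the same accumulator. Vectorizing it
    // reorders the floating-point additions, so whether a compiler does so
    // depends on its floating-point settings (assumption about typical defaults).
    double norm2 = 0.0;
    for (std::size_t i = 0; i < n; i++) {
        norm2 += c[i] * c[i];
    }
    return norm2;
}
```

For this particular choice of a and b the result equals 3/2 analytically for n of 5 or more, which gives a simple check that the optimized and unoptimized builds compute the same quantity.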

The speedup depends on the size of the vectors. In the example here we have run with $10^7$ elements. The example was run on an iMac17.1 with OS X El Capitan (10.11.4) as operating system and an Intel i5 3.3 GHz CPU.

Compphys:~ hjensen$ ./vec.x 10000000
Time used  for norm computation=0.04720500000
Compphys:~ hjensen$ ./novec.x 10000000
Time used  for norm computation=0.03311700000
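The timings above are printed by the program itself. One possible way to obtain such a number, assuming wall-clock time from std::chrono::steady_clock (the original program may well use a different timer), is a small helper like the one sketched here. The names best_seconds and sum_of_squares are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs `work` `repeats` times and returns the
// shortest wall-clock time of a single run, in seconds.
template <typename Work>
double best_seconds(Work&& work, int repeats = 3)
{
    assert(repeats > 0);
    double best = -1.0;
    for (int r = 0; r < repeats; r++) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto finish = std::chrono::steady_clock::now();
        const double t = std::chrono::duration<double>(finish - start).count();
        if (best < 0.0 || t < best) {
            best = t;
        }
    }
    return best;
}

// Illustrative workload: squared 2-norm of a vector.
double sum_of_squares(const std::vector<double>& v)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < v.size(); i++) {
        sum += v[i] * v[i];
    }
    return sum;
}
```

Taking the best of a few repetitions reduces noise from other processes. The computed result should be stored and later printed, since an optimizing compiler may otherwise remove work whose result is never used.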

For $10^7$ elements the two run times differ by a factor of about 1.4. Performing the same operations for $10^9$ elements results in a smaller ratio (about 1.26), since reading from main memory is required. In the timings listed here, the non-vectorized code is seemingly faster.

Compphys:~ hjensen$ ./vec.x 1000000000
Time used  for norm computation=58.41391100
Compphys:~ hjensen$ ./novec.x 1000000000
Time used  for norm computation=46.51295300
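The ratios between the listed timings, and the amount of memory involved, can be checked with a few lines of arithmetic. The sketch below assumes the vectors are stored as double. How many such arrays vecexample.cpp allocates is not stated here, so only the size of a single array is computed. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Ratio of two run times; a value above 1 means the first run took longer.
double time_ratio(double t_first, double t_second)
{
    assert(t_second > 0.0);
    return t_first / t_second;
}

// Bytes occupied by one array of n doubles (8 bytes each on common platforms).
std::uint64_t array_bytes(std::uint64_t n)
{
    return n * sizeof(double);
}
```

With the numbers listed, time_ratio(0.047205, 0.033117) is about 1.43 and time_ratio(58.413911, 46.512953) is about 1.26. A single array grows from 80 MB at $10^7$ elements to 8 GB at $10^9$ elements, and the run time of vec.x grows by more than a factor of 1000 for a hundredfold larger input.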

We will discuss these issues further in the next slides.