Computational Physics Lectures: How to optimize codes, from vectorization to parallelization

Compiling with and without vectorization

c++ -o novec.x vecexample.cpp

and with vectorization (and additional optimizations)

c++ -O3 -o vec.x vecexample.cpp

The speedup depends on the size of the vectors. The example here was run with $10^7$ elements on a PC with Ubuntu 14.04 as operating system and an Intel i7-4790 CPU running at 3.60 GHz.

Compphys:~ hjensen$ ./vec.x 10000000
Time used  for vector addition = 0.0100000
Compphys:~ hjensen$ ./novec.x 10000000
Time used  for vector addition = 0.03000000000

This particular C++ compiler speeds up the above loop operations by a factor of 3. Performing the same operations for $10^8$ elements results in a factor of only $1.4$. The result will, however, vary from compiler to compiler. In general, with optimization flags like -O3 or -Ofast, we gain a considerable speedup if our code can be vectorized. Many of these optimizations are applied automatically, or nearly automatically, by your compiler, and they improve performance considerably.