for (i = 0; i < n; i++) {
    a[i] = b[i] + c[i];
}
If the code is not vectorized and a 128-bit register is used to hold a single 32-bit floating-point number, then \( 3\times 32 = 96 \) bits of that register are not used. For the first element (\( i = 0 \)) the registers look like this:
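| Register | Lane 0 | Lane 1   | Lane 2   | Lane 3   |
|----------|--------|----------|----------|----------|
| a        | a[0] = | not used | not used | not used |
| b        | b[0] + | not used | not used | not used |
| c        | c[0]   | not used | not used | not used |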
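Three quarters of each SIMD register are therefore unused. Each register could hold three additional 32-bit floating-point numbers (for example b[1], b[2] and b[3] next to b[0]), so that a single vector instruction could compute four elements of a at once.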