for (i = 0; i < n; i++) {
    a[i] = b[i] + c[i];
}
If the code is not vectorized and each 128-bit register stores a single 32-bit floating point number, then \( 3\times 32 \) bits of that register are not used. For the first element we have
| 0 | 1 | 2 | 3 |
| a[0]= | not used | not used | not used |
| b[0]+ | not used | not used | not used |
| c[0] | not used | not used | not used |
We thus have unused space in our SIMD registers. Each register could hold three additional 32-bit floating point numbers.
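To illustrate how the unused space could be filled, the following is a minimal sketch that processes four 32-bit floats per iteration. It assumes an x86 processor with SSE and uses the intrinsics `_mm_loadu_ps`, `_mm_add_ps` and `_mm_storeu_ps` from `<xmmintrin.h>`. The function name `add_arrays` is our own choice for this example, and in practice a compiler may generate similar code automatically when vectorization is enabled.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */
#endif

/* Hypothetical helper (not from the original text):
   computes a[i] = b[i] + c[i] for i = 0, ..., n-1. */
void add_arrays(float *a, const float *b, const float *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    size_t i = 0;
#if defined(__SSE__)
    /* One __m128 register holds four 32-bit floats, so all 128 bits are used. */
    for (; i + 4 <= n; i += 4) {
        __m128 vb = _mm_loadu_ps(&b[i]);          /* b[i], ..., b[i+3] */
        __m128 vc = _mm_loadu_ps(&c[i]);          /* c[i], ..., c[i+3] */
        _mm_storeu_ps(&a[i], _mm_add_ps(vb, vc)); /* four additions at once */
    }
#endif
    /* Scalar loop for the remaining elements when n is not a multiple of 4;
       it also handles the whole array if SSE is not available. */
    for (; i < n; i++) {
        a[i] = b[i] + c[i];
    }
}
```

With this layout the first register would hold `b[0]` to `b[3]` instead of `b[0]` alone, so that no part of the register is left unused. The unaligned load and store variants are used here so that the sketch makes no assumption about how the arrays are aligned in memory.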