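If we vectorize the code, a 128-bit register can hold four 32-bit values (for example single-precision floats or ints), so four additions can be performed simultaneously. Written out, the vectorized loop is effectively unrolled by four: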
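for (i = 0; i + 3 < n; i += 4) {
    /* these four independent additions map onto one 128-bit vector add */
    a[i]   = b[i]   + c[i];
    a[i+1] = b[i+1] + c[i+1];
    a[i+2] = b[i+2] + c[i+2];
    a[i+3] = b[i+3] + c[i+3];
}
for (; i < n; i++) {   /* scalar cleanup when n is not a multiple of 4 */
    a[i] = b[i] + c[i];
}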
The four additions in the loop body are now carried out by a single vector instruction instead of four scalar ones.
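To make the "single vector instruction" concrete, here is a minimal sketch using x86 SSE intrinsics. It assumes an x86 CPU with SSE (always present on x86-64) and `float` arrays. The function name `vec_add` is hypothetical and chosen only for illustration. Each `_mm_add_ps` call adds four packed floats held in one 128-bit `__m128` register.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical helper: a[i] = b[i] + c[i] for i = 0..n-1,
   processing four floats per 128-bit SSE register (assumes x86 with SSE). */
void vec_add(float *a, const float *b, const float *c, size_t n)
{
    size_t i = 0;
    /* Main loop: two 128-bit loads, one vector add, one 128-bit store. */
    for (; i + 4 <= n; i += 4) {
        __m128 vb = _mm_loadu_ps(b + i);  /* unaligned load of b[i..i+3] */
        __m128 vc = _mm_loadu_ps(c + i);  /* unaligned load of c[i..i+3] */
        __m128 va = _mm_add_ps(vb, vc);   /* four additions in one instruction */
        _mm_storeu_ps(a + i, va);         /* store result into a[i..i+3] */
    }
    /* Scalar cleanup for the remaining n % 4 elements. */
    for (; i < n; i++) {
        a[i] = b[i] + c[i];
    }
}
```

In practice, compilers such as GCC and Clang can often generate similar vector instructions from the plain scalar loop when optimization is enabled, so hand-written intrinsics are mainly useful for illustration or when automatic vectorization does not happen.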