The speedup on p processors can be greater than p (superlinear speedup) if the parallel run makes better use of the memory hierarchy!
Consider the case of a memory-bound computation with M words of memory:
- If M/p fits into cache while M does not, the time to access memory will be different in the two cases:
  - T_1 (time on one processor) is limited by the main memory bandwidth
  - T_p (time on p processors) is limited by the cache bandwidth, which is higher
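
To make the effect concrete, here is a minimal sketch of a two-level bandwidth model. It is an illustration only and not taken from the notes: the function names, the cache size, and the bandwidth figures are all hypothetical, and the model assumes each processor has its own private cache, no contention for bandwidth, and no communication cost. Under those assumptions, when M/p fits in cache and M does not, the speedup is p * bw_cache / bw_main, which exceeds p whenever cache bandwidth exceeds main memory bandwidth.

```python
def memory_bound_time(words, bandwidth_words_per_s):
    """Time to stream `words` words at the given bandwidth (memory-bound model)."""
    return words / bandwidth_words_per_s


def speedup(M, p, cache_words, bw_main, bw_cache):
    """Speedup T_1 / T_p under a simple two-level bandwidth model.

    Assumption (not from the notes): a working set runs at cache bandwidth
    if it fits in one processor's cache, and at main memory bandwidth otherwise.
    """
    # Serial run: one processor touches all M words.
    bw_serial = bw_cache if M <= cache_words else bw_main
    t1 = memory_bound_time(M, bw_serial)

    # Parallel run: each processor touches only its M/p share.
    words_per_proc = M / p
    bw_parallel = bw_cache if words_per_proc <= cache_words else bw_main
    tp = memory_bound_time(words_per_proc, bw_parallel)

    return t1 / tp


if __name__ == "__main__":
    # Hypothetical machine: 1M-word cache, cache 4x faster than main memory.
    p = 8
    s = speedup(M=8_000_000, p=p, cache_words=1_000_000,
                bw_main=1e9, bw_cache=4e9)
    print(f"p = {p}, modeled speedup = {s:.1f}")
```

With these made-up numbers M does not fit in cache but M/p does, so the modeled speedup is 4p rather than p. If M already fits in cache, or if M/p is still too large, the same model gives a speedup of exactly p.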