The speedup on \( p \) processors can be greater than \( p \) (superlinear speedup) if the parallel run makes better use of the memory hierarchy than the serial run!
Consider a memory-bound computation that works on \( M \) words of memory:
- If \( M/p \) fits into cache while \( M \) does not, memory access time differs between the two cases:
  - \( T_1 \) is limited by main-memory bandwidth
  - \( T_p \) is limited by the (much higher) cache bandwidth, since each processor's share of \( M/p \) words stays in cache
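A minimal sketch of this argument, assuming an idealized model in which runtime is simply words accessed divided by bandwidth, each processor has its own cache of `cache_words` words, and there is no parallel overhead. Then \( T_1 = M / B_{\text{mem}} \) and \( T_p = (M/p) / B_{\text{cache}} \), so \( S_p = T_1 / T_p = p \cdot B_{\text{cache}} / B_{\text{mem}} > p \). The function name and all parameters are hypothetical, chosen only for illustration.

```python
def modeled_speedup(M, p, cache_words, bw_mem, bw_cache):
    """Idealized model of a memory-bound computation (hypothetical parameters).

    M           -- total words of data touched
    p           -- number of processors
    cache_words -- per-processor cache capacity, in words
    bw_mem      -- main-memory bandwidth, words/second
    bw_cache    -- cache bandwidth, words/second
    """
    def time_for(words):
        # A working set that fits in cache is served at cache bandwidth;
        # otherwise every access goes to main memory.
        bw = bw_cache if words <= cache_words else bw_mem
        return words / bw

    t1 = time_for(M)       # serial run: all M words on one processor
    tp = time_for(M / p)   # parallel run: each processor handles M/p words
    return t1 / tp


if __name__ == "__main__":
    # M/p = 2e6 words fits in a 4e6-word cache, but M = 8e6 does not.
    print(modeled_speedup(8_000_000, 4, 4_000_000, 1e9, 4e9))
    # With a 1e6-word cache neither working set fits, so speedup is just p.
    print(modeled_speedup(8_000_000, 4, 1_000_000, 1e9, 4e9))
```

With a cache-to-memory bandwidth ratio of 4, the first call gives a speedup of about 16 on 4 processors; the second shows that without the cache effect the model falls back to the ideal linear speedup of 4.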