When I posted the cache results, it bugged me that the difference between cached and non-cached performance was so small compared to real-world results. That is, roughly 1,340,000 bytes/sec vs. 1,320,000 bytes/sec, only a 1.5% improvement when dealing with 1KB of data. Yet the real world shows 25% gains.
Two reasons:
1. My test loop is small. It easily fits in the 256-byte instruction cache of the 68030. Real-world programs bounce around and thus benefit from an L2 cache to quickly fetch that additional code.
2. My test reads a byte at a time and then loops. This is legitimate code (example: the strlen routine). However, behind the scenes, the 68030 on the IIci is actually reading 32 bits (four bytes) at a time. The first uncached byte does indeed incur a bus read from the memory chip, but the next 3 bytes are returned from the processor cache because they had been read as well. Thus, only 1 out of every 4 reads exercises the external cache card (or the lack thereof).
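To illustrate, here's a hypothetical sketch (not my actual test code) of a byte-at-a-time loop like the one described. On the IIci's 68030, four consecutive byte reads share one 32-bit bus fetch, so only every fourth iteration actually goes out to memory:

```c
#include <stddef.h>

/* Sketch of a strlen-style byte-at-a-time read loop.
 * Each iteration loads one byte, but the 68030 fetches a full
 * 32-bit longword from the bus, so 3 of every 4 byte reads are
 * satisfied from the processor's own cache. */
static unsigned sum_bytes(const unsigned char *buf, size_t len)
{
    unsigned sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];   /* one byte read per iteration */
    return sum;
}
```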
What happens if I modify my test loop to read a full 32-bit value at a time? Furthermore, what if it does this 8 times in a row per loop, to reduce the percentage of time taken by the loop check?
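Roughly speaking, the modified loop looks like this sketch (names and structure are illustrative, not my original code): each pass reads eight 32-bit values, so the loop-condition check is amortized over eight memory accesses, and every read exercises the bus:

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the 32-bit, 8x-unrolled read loop. Every load is a
 * full longword, so each one hits the bus (and the L2 cache card,
 * if present) rather than being satisfied by a prior byte fetch. */
static uint32_t sum_longs(const uint32_t *buf, size_t nlongs)
{
    uint32_t sum = 0;
    size_t i = 0;
    for (; i + 8 <= nlongs; i += 8) {   /* 8 reads per loop check */
        sum += buf[i]     + buf[i + 1] + buf[i + 2] + buf[i + 3];
        sum += buf[i + 4] + buf[i + 5] + buf[i + 6] + buf[i + 7];
    }
    for (; i < nlongs; i++)             /* handle any remainder */
        sum += buf[i];
    return sum;
}
```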
View attachment 66358
The IIci now reads over 6 times faster. More importantly, we see an 8,500,000 / 5,800,000 = 46% improvement due to caching. The real world is going to be a mix of memory access sizes, so averaging around 25% makes sense.
Other interesting notes:
* The L2 cache is no longer worse than no cache on constant cache misses (for example at 1 MB).
* I can't explain the dip between 4KB and 8KB.
* I can't explain the gradual improvement by the Micron board between 8KB and 64KB. I saw something similar with byte reads. I ran the tests multiple times and the results are consistent.
- David