A Quadra 650 should theoretically manage somewhere between 30 and 40 MB/s of memory bandwidth with line transfers. Scaling for clock speed, that would give BFG about 58 MB/s of bandwidth with the full-speed bus at 64 MHz, which seems to correspond nicely with the observed result of BFG managing ~2x the blockmove performance.
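As a rough sanity check on that scaling, here is a minimal sketch. It assumes the Quadra 650 figure was taken at its 33 MHz clock (my assumption, not stated above) and that bandwidth scales linearly with clock; `scale_bandwidth` is just an illustrative helper name.

```python
def scale_bandwidth(base_mb_s, base_mhz, target_mhz):
    """Scale a bandwidth figure linearly with clock speed."""
    return base_mb_s * target_mhz / base_mhz

QUADRA_650_MHZ = 33.0  # assumption: clock at which the 30-40 MB/s estimate applies
BFG_BUS_MHZ = 64.0     # full-speed bus figure from the post

low = scale_bandwidth(30.0, QUADRA_650_MHZ, BFG_BUS_MHZ)
high = scale_bandwidth(40.0, QUADRA_650_MHZ, BFG_BUS_MHZ)
print(f"{low:.0f} to {high:.0f} MB/s")  # prints "58 to 78 MB/s"
```

Under those assumptions the 58 MB/s figure falls at the low end of the scaled range.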
That "slow" half-speed L2 cache can supply data at 85 MB/s, with latency similar to what BFG's RAM timings probably are, so it makes sense that the cache would bring the read-heavy Tree, Sort and Search benchmarks into a similar range as the BFG. Real-world performance of the cache is going to depend on data locality, though; cache thrashing wrecks performance.
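For a feel of where a number like 85 MB/s can come from, here is a small sketch of burst bandwidth for a 16-byte line transfer. The 32 MHz cache clock (half of the 64 MHz bus) and the 3-1-1-1 burst pattern are assumptions picked for illustration, not measured timings for this cache.

```python
LINE_BYTES = 16  # a 68040 line transfer moves four 32-bit longwords

def burst_bandwidth_mb_s(clock_mhz, cycles_per_line):
    """Peak bandwidth in MB/s when every access is a full line burst."""
    return LINE_BYTES * clock_mhz / cycles_per_line

cache_clock_mhz = 64.0 / 2      # assumption: "half speed" relative to a 64 MHz bus
assumed_cycles = 3 + 1 + 1 + 1  # hypothetical 3-1-1-1 burst, 6 clocks per line

print(f"{burst_bandwidth_mb_s(cache_clock_mhz, assumed_cycles):.1f} MB/s")  # prints "85.3 MB/s"
```

Other clock and timing combinations can land on similar figures, so treat this as one plausible reading rather than the actual cache timing.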
If this page is to be believed, Cyberstorm MK3 uses a mix of interleaving and what PC manufacturers today call dual-channel RAM: two 32-bit modules supplying a 64-bit bus. Initial access latency is going to be limited by the technology of the time, but burst reads should be ridiculously quick, as each RAM access supplies two longwords and the interleaved access supplies the other two, making up the 4 longwords of each 68040/68060 RAM access. It'll be complex to sequence and shuffle the extra longwords around, but performance should be very good. This would be a practical implementation of what @eharmon was speculating about.
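To make the interleaving idea concrete, here is a toy model of which bank and module would supply each longword of a 16-byte line. The address-bit mapping (bit 3 selects the interleaved bank, bit 2 selects the 32-bit module within the 64-bit pair) is an assumption for illustration; the real Cyberstorm MK3 decode may differ.

```python
LONGWORD_BYTES = 4
LINE_BYTES = 16  # four longwords per 68040/68060 line

def line_sources(line_addr):
    """Return (address, bank, module) for each longword in one line.

    Hypothetical mapping: address bit 3 picks the interleaved bank,
    address bit 2 picks the 32-bit module inside the 64-bit pair.
    """
    if line_addr % LINE_BYTES != 0:
        raise ValueError("line address must be 16-byte aligned")
    sources = []
    for offset in range(0, LINE_BYTES, LONGWORD_BYTES):
        addr = line_addr + offset
        bank = (addr >> 3) & 1
        module = (addr >> 2) & 1
        sources.append((addr, bank, module))
    return sources

for addr, bank, module in line_sources(0x1000):
    print(f"{addr:#06x}: bank {bank}, module {module}")
```

In this model the first 64-bit access (bank 0) delivers longwords 0 and 1 and the interleaved access (bank 1) delivers longwords 2 and 3, so the second pair can be fetched while the first is being handed to the CPU.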


