Well, not all experiments are successful, but here's a quick write-up on my latest one. This feature probably won't make it to prime time, but it was an interesting experiment.
It started with a thought: modern ROM SIMMs have a drastically faster response time (~70ns) than the original ROM (~150ns). In theory, if I modify the booster logic, it should be possible to 'shortcut' ROM access and increase ROM read performance by 2x or more.
First a couple of findings about the GLUE:
- The ROMOE signal on GLUE does *not* use /AS or /DS as a qualifier. So the ROM may be activated at any time the upper address lines A31-A28 encode $4 for the ROM space, including between bus cycles. Eeek.
- The DSACK0/1 response to /AS + address $40000000 is driven by a state machine inside GLUE; it does not terminate any earlier if /AS goes away early.
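To make the first finding concrete, here is a minimal Python sketch of the decode as I observed it. It is only a behavioral model, not the GLUE's actual logic; the function name and the `as_asserted` parameter are made up for illustration.

```python
ROM_SPACE_NIBBLE = 0x4  # value of A31-A28 that selects ROM space, per the finding above


def rom_oe_asserted(address: int, as_asserted: bool) -> bool:
    """Behavioral model: ROMOE depends only on A31-A28, never on /AS.

    as_asserted is accepted purely to show that it is ignored.
    """
    top_nibble = (address >> 28) & 0xF
    return top_nibble == ROM_SPACE_NIBBLE


# ROM is selected even with /AS inactive (i.e. between bus cycles).
print(rom_oe_asserted(0x40000000, as_asserted=False))
# A RAM-range address never selects ROM, whatever /AS is doing.
print(rom_oe_asserted(0x00001000, as_asserted=True))
```

The point of the model is simply that whatever happens to be on the address bus decides ROM selection, which is what lets the accelerator skip the external cycle.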
But the first finding actually helps us. The booster has an internal /AS signal that follows the CPU's 47 MHz operation, and the ROM will already be active and starting to read data at that point, so we don't even need to do a proper "external" (off-accelerator) bus cycle. We can simply not issue /AS to the PDS (system bus) and wait for the ROM to return data anyway, since the GLUE does not qualify ROM access with /AS at all.
So instead I modified my Booster code to issue an internal bus cycle termination after ~70ns, which is a good deal faster than the 230ns the complete cycle normally takes. That's less than a third of the time, so we should have a lot more ROM bandwidth!
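For anyone who wants the arithmetic spelled out, here's a rough Python sketch using only the figures quoted above. Rounding each cycle up to a whole number of 47 MHz clocks is my own assumption about how the termination would line up with the CPU clock, not something I measured.

```python
import math

CPU_CLOCK_MHZ = 47       # accelerator CPU clock
FAST_CYCLE_NS = 70       # shortened internal termination
NORMAL_CYCLE_NS = 230    # complete ROM cycle via GLUE

clock_period_ns = 1000 / CPU_CLOCK_MHZ

# Fraction of the normal cycle time that the shortcut needs.
time_ratio = FAST_CYCLE_NS / NORMAL_CYCLE_NS
# Best-case ROM bandwidth gain if every ROM read took the shortcut.
bandwidth_gain = NORMAL_CYCLE_NS / FAST_CYCLE_NS

# Assumption: cycles end on a clock boundary, so round up to whole clocks.
fast_clocks = math.ceil(FAST_CYCLE_NS / clock_period_ns)
normal_clocks = math.ceil(NORMAL_CYCLE_NS / clock_period_ns)

print(f"clock period: {clock_period_ns:.1f} ns")
print(f"shortcut takes {time_ratio:.0%} of the normal cycle time")
print(f"best-case ROM bandwidth gain: {bandwidth_gain:.2f}x")
print(f"approx. clocks per ROM read: {fast_clocks} vs {normal_clocks}")
```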
Results?
Speedometer reported no performance change. Boo.
System info indicated 2% faster QuickDraw performance, mostly in the few algorithmically heavy tests (i.e. drawing shapes).
MacBench reported 10% faster QuickDraw performance. (It mostly uses algorithmically heavy tests rather than memory-constrained stuff like CopyBits.)
Individual MacBench subtests ranged from a minimum of 2% faster to a maximum of 41% (frame round rect), with most being 10-30%.
Interestingly, MacBench shows the stock CPU to be slightly faster at "copybits - copy" by 15%. I'll have to validate that...
A little underwhelming. I suppose it makes sense that QuickDraw would be the code in ROM that gets accessed most heavily. But most code is going to be running out of RAM, so it would not benefit from this unless it hit a ROM routine with a hotspot (CPU-intensive portion) just a bit too big to fit in the 68030's internal caches. And RAM access is as slow as ever, so fundamental performance elsewhere won't change... unless an L2 cache was in play, anyway.
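A back-of-the-envelope way to see why the gains are so small is Amdahl's law: only the slice of run time spent waiting on ROM reads gets faster. The sketch below is illustrative only. The ROM-time fractions are hypothetical values I picked, not measurements, and it assumes every ROM read gets the full 230ns to 70ns improvement.

```python
def overall_speedup(rom_fraction: float, rom_speedup: float) -> float:
    """Amdahl's law: speed up only the fraction of time spent on ROM reads."""
    return 1.0 / ((1.0 - rom_fraction) + rom_fraction / rom_speedup)


ROM_SPEEDUP = 230 / 70  # best-case per-read gain from the figures above

# Hypothetical fractions of run time stalled on ROM reads (not measured).
for rom_fraction in (0.02, 0.10, 0.30):
    gain = overall_speedup(rom_fraction, ROM_SPEEDUP)
    print(f"{rom_fraction:.0%} of time in ROM reads -> {gain:.2f}x overall")
```

With most code running from RAM or hitting in the 68030 caches, the ROM-bound fraction stays small, so even a 3x faster ROM read barely moves the overall number.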
Admittedly, these benchmarks aren't really accessing the Toolbox heavily outside of the QuickDraw tests anyway. ROM shadowing isn't a new idea, and supposedly it helps Quadras a bit; perhaps the IIsi ROM is largely bypassed by the newer System 7.5 I was testing with. Open to ideas if anyone knows a benchmark that'd really hammer Toolbox calls.