The last line reflects a recent modification of the gateware: writes are now put in a FIFO and immediately acknowledged to NuBus, which saves on write latency and boosts write bandwidth significantly. It doesn't change read behavior. That single modification pushes the Speed 4.02 number for unaccelerated 8-bit by 30%, to 0.486. It also helps the accelerated mode, which (with basic rectangular bitblit & solid fill) moves from 0.8 to 0.9; it helps all the lower-depth numbers too, but less significantly.
The 0.486 is close to what you get with Toby in an identical machine. From DCDMF3's description of Toby, I suspect the loss comes from the higher read latency (Toby has really fast VRAM!): I have to cross clock domains from NuBus to the system clock, wait for the DDR3, then cross back to NuBus, and all of that takes time.
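To illustrate why the write FIFO helps writes but leaves reads untouched, here is a rough toy model in Python. It is only a sketch, not the actual gateware: the function name `bus_busy_cycles`, the 8-cycle backend latency and the FIFO depth are all made-up values for illustration, and the memory backend is assumed to handle one access at a time.

```python
from collections import deque

def bus_busy_cycles(ops, backend_latency=8, fifo_depth=16, posted=True):
    """Toy model: bus cycles needed to issue a sequence of 'R'/'W' accesses.

    All numbers are hypothetical; this is not a model of the real hardware.
    """
    now = 0              # current bus cycle
    backend_free_at = 0  # cycle at which the memory backend becomes idle
    pending = deque()    # completion times of writes still sitting in the FIFO
    for op in ops:
        # Retire writes the backend has already finished.
        while pending and pending[0] <= now:
            pending.popleft()
        if op == "W" and posted:
            if len(pending) >= fifo_depth:
                now = pending.popleft()  # FIFO full: stall until a slot frees up
            start = max(now, backend_free_at)
            backend_free_at = start + backend_latency
            pending.append(backend_free_at)
            now += 1                     # write is acknowledged right away
        else:
            # Reads (and non-posted writes) hold the bus until the backend is done.
            start = max(now, backend_free_at)
            backend_free_at = start + backend_latency
            now = backend_free_at + 1
            pending.clear()              # everything queued earlier has drained
    return now

print(bus_busy_cycles("WWWW", posted=False))  # each write waits for the backend
print(bus_busy_cycles("WWWW", posted=True))   # writes are acknowledged immediately
print(bus_busy_cycles("RRRR", posted=True))   # reads still pay the full latency
```

In this model a burst of writes costs the bus about one cycle each as long as the FIFO has room, while every read still pays the full round trip, which matches the idea that the remaining gap is on the read side.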
You are welcome.
Pretty significant improvement indeed. 0.9 is kinda impressive to be fair, good job!
Toby's VRAM is also dual-ported, which means reads from the DAC and writes from NuBus don't interfere with each other. Read latency is 40-60 ns IIRC. All the chips are also more or less in parallel, so that's a 32-bit interface as well. Can BRAM be used in a similar way? (Sorry, my knowledge of FPGAs is limited.)