With some feline assistance, I've got a working NuBus prototype.
Unfortunately, it's proven (unsurprisingly) that NuBus is a bit crap, especially in the early machines whose system buses don't divide cleanly into 10 MHz. On the IIcx I've tested with, the maximum bus data rate available to my prototype is about 2.5 MB/s over NuBus. The theoretical maximum would be 5 MB/s if I were doing 32-bit transfers (currently 16-bit, due to the card design). So this is what an unaccelerated NuBus video card has to work with. Eeek! Compare that to ~21 MB/s on the 15.66 MHz PDS.
In practice, throughput tops out at about 2.1 MB/s. It still beats a SCSI emulator badly (3x) at random access and transfers under 16K, but past that point it's merely faster than SCSI rather than a massive improvement. The random/small-block access is responsible for much of the practical speedup, so there's still a point, but it's less exciting than the 3x and 4x across the board that the SE/30 and IIfx/IIsi manage. At a guess, a 32-bit card would be somewhere around 1.6x faster on sequential performance, so that'd be kind of the bare minimum for a viable design.
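The 1.6x figure is a guess, but one way to see why widening the bus wouldn't simply double sequential speed is Amdahl's law: only the fraction of time actually spent moving data over NuBus gets faster. The C sketch below is illustrative only. The function names are hypothetical, and the bus-bound fraction it solves for is not a measured value.

```c
/* Amdahl's law: overall speedup when a fraction p of the run time is
   accelerated by a factor s and the remaining (1 - p) is unchanged. */
static double amdahl_speedup(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}

/* Inverse: the fraction p of run time that must be bus-bound for a
   factor-s bus improvement (16-bit -> 32-bit gives s = 2) to produce
   the given overall speedup. */
static double bus_bound_fraction(double overall, double s)
{
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s);
}
```

With s = 2, an overall 1.6x corresponds to roughly 75% of sequential transfer time being bus-bound; a full 2x would require all of it to be.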
The NuBus cycle rate is essentially limited to about 1.25 MHz on the Mac II/IIx/IIcx, and that's assuming each NuBus transfer completes in the minimum two cycles.
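The 2.5 MB/s and 5 MB/s figures fall straight out of that cycle rate. Each transfer moves one bus word, so the ceiling is the transfer rate times the word size in bytes. Here is a minimal C sketch of that arithmetic; the function and macro names are made up for illustration.

```c
#include <stdint.h>

/* Peak NuBus data rate = transfers per second x bytes per transfer.
   This is a ceiling: driver, command and CF-card overheads are ignored. */
static uint64_t nubus_peak_bytes_per_sec(uint64_t transfers_per_sec,
                                         unsigned bus_width_bits)
{
    return transfers_per_sec * (bus_width_bits / 8u);
}

/* Best-case transfer rate on the Mac II / IIx / IIcx (1.25 MHz). */
#define MAC_II_TRANSFERS_PER_SEC 1250000u
```

Calling nubus_peak_bytes_per_sec(MAC_II_TRANSFERS_PER_SEC, 16) gives 2,500,000 bytes/s (2.5 MB/s), and passing 32 gives 5,000,000 bytes/s (5 MB/s).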
@Jockelill found an interesting document that confirms these findings. The Quadra 650 is a bit better: it can manage 3.33 MHz because it can sustain back-to-back three-cycle NuBus transfers indefinitely. That works out to 13.3 MB/s of hypothetical bandwidth without block transfers or 2x clock mode (both of which are unavailable on the earlier NuBus implementations). I've got someone set to measure what an IIci can manage, as I view that as kind of the median case for a NuBus application. Of course, the IIci could also take a hypothetical cache slot splitter with CF on it, which would eliminate the need for NuBus on that particular machine.
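For reference, the Quadra 650 figure is the 10 MHz NuBus clock divided by three clocks per transfer, multiplied by four bytes per 32-bit transfer:

```latex
f_{\text{xfer}} = \frac{10\ \text{MHz}}{3} \approx 3.33\ \text{MHz},
\qquad
B = f_{\text{xfer}} \times 4\ \text{bytes} \approx 13.3\ \text{MB/s}
```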
NuBus is also problematic to implement because it uses inverted (active-low) signalling. The multiplexing isn't a problem, but the inversion is, as almost all of the handy chips that would deal with it are obsolete. Practically, I'd need to use a CPLD to demultiplex, invert, and act as registers as well as control logic... and the ATF15xx CPLDs don't have enough drive strength to drive NuBus directly. Unless I do as the BlueSCSI v1 does and drive the chip out of spec, anyway (nope). So that makes the "minimum viable" implementation already a rather complicated affair, requiring around 8 TTL chips' worth of glue. I'll have to keep chewing on it, but a hypothetical NuBus version is on the back burner for now.
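To make the "demultiplex, invert, and act as registers" part concrete, here is a rough C model of what that CPLD logic would do on each clock. It is a hypothetical, heavily simplified sketch covering a single-word write only. It ignores arbitration, /TM0-/TM1, /ID and /ACK timing, and it does not describe any actual card design; all names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Active-low bus: a logical 1 is driven as a low level, so converting
   between wire levels and logical values is a bitwise complement. */
static uint32_t nubus_wire_to_logical(uint32_t wire) { return ~wire; }
/* Used when the card drives read data back onto /AD31../AD0. */
static uint32_t nubus_logical_to_wire(uint32_t value) { return ~value; }

typedef struct {
    uint32_t address;        /* latched during the start cycle */
    uint32_t data;           /* latched on the following cycle */
    bool     in_transaction;
} nubus_demux;

/* Sample the bus once per NuBus clock. start_wire is the raw level of
   /START (false = low = asserted). Returns true when a data word was latched. */
static bool nubus_demux_sample(nubus_demux *d, bool start_wire, uint32_t ad_wire)
{
    uint32_t logical = nubus_wire_to_logical(ad_wire);

    if (!start_wire) {             /* /START asserted: AD lines carry the address */
        d->address = logical;
        d->in_transaction = true;
        return false;
    }
    if (d->in_transaction) {       /* later cycle: AD lines carry write data */
        d->data = logical;
        d->in_transaction = false; /* single-word transfer in this model */
        return true;
    }
    return false;                  /* bus idle */
}
```

A real design would latch data in step with its own /ACK rather than on the very next clock, and would need output drivers strong enough for the bus, which is exactly where the ATF15xx falls short.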
The qualification batch of the PDS cards is due soon. Unfortunately, I don't know that I'll be able to get an actual production batch done before the criminal in chief's tariffs and de minimis removal take effect, so the price point may vary depending on how badly I get hit by tariffs.
On the programming front, don't use QuickDraw at SecondaryInit / second BootRec time; only System 7.5 seems to have a valid QuickDraw state at that point. Other versions of the OS will crash, understandably enough.
I took a video of what the "quick start" first-time setup for one of these cards looks like:
Some preliminary documentation here.