+1 for true color. That would be the heart of the matter for any high-end NuBus video card spec, and that's exactly what we're talking about here when you get right down to it...
A couple comments:
#1: If you look at the advertising flyers for Apple's *own products* (I looked up a few when I was checking what grayscale depths things like the Portrait card support), some of them actually list, as a feature, that the card lets you use fewer colors for greater performance. I *really* think you might need to take a step back and think about how fast these machines actually are. The Quadra 950 went to 24-bit color at 832x624 with PDS-speed VRAM, and even at the time I don't think anyone would have described that mode as "fast".
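To put rough numbers on the performance point, here's a back-of-the-envelope sketch of how much memory a full-screen redraw touches at 832x624 at each depth. It assumes the usual classic Mac storage of "millions of colors" as 32 bits per pixel and ignores any rowBytes padding the real hardware uses.

```python
# Framebuffer bytes for one 832x624 screen at each Mac pixel depth.
# Assumption: "millions of colors" is stored as 32 bits per pixel
# (8 bits each of R, G, B plus an unused byte), not packed 24-bit.
# Row padding (rowBytes) is ignored.
WIDTH, HEIGHT = 832, 624
DEPTHS_BPP = {"256 colors": 8, "thousands": 16, "millions": 32}


def frame_bytes(width: int, height: int, bpp: int) -> int:
    """Bytes QuickDraw has to touch to redraw the whole screen once."""
    return width * height * bpp // 8


if __name__ == "__main__":
    base = frame_bytes(WIDTH, HEIGHT, 8)
    for name, bpp in DEPTHS_BPP.items():
        size = frame_bytes(WIDTH, HEIGHT, bpp)
        print(f"{name:>10}: {size:>9,} bytes ({size // base}x the 8-bit case)")
```

Every full-screen scroll or redraw in millions of colors moves four times the data it would at 256 colors, on a CPU and bus that didn't get any faster.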
The counterargument is, of course, that cards like SuperMac's that went up to 1600x1200 did exist, and in fact existed on NuBus, but they *were* accelerated.
#2: Handwaving away games is great and all, but the fact is you *will* be sacrificing software compatibility if your main display is locked into True Color mode only. It may be fine for some people to say "oh, well, for that stuff I keep this old Multisync around and connect it to another card or the motherboard video", but that's not a very helpful suggestion for a lot of other use cases.
#3: Seriously, a CLUT isn't hard, and I wish I hadn't mentioned it. The sample driver code in the manual *does* lay out the boilerplate for the functions you need to support to handle the QuickDraw calls that write values to the CLUT. What was missing was the code that actually writes those updates to the Toby hardware, which strictly speaking only matters if the goal is a register-level-compatible clone of the Toby. The reason I bemoaned it is that, at least on my first skim, it was unclear to me exactly where the hardware registers (those that set the CLUT, but plenty of others too) are mapped in slot space, and, possibly more importantly, how the "RAM space" and "control space" are divided when both have to be crammed into a 1MB slice while running in 24-bit mode.
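To show how little the missing piece amounts to, here's a Python model of what a driver's SetEntries handler has to do once QuickDraw hands it a color table. The `ColorSpec` fields mirror QuickDraw's (an index plus 16-bit red/green/blue), and the "start == -1 means use each entry's own index" rule is my reading of the Designing Cards and Drivers convention; everything else, including the 24-bit packed CLUT word, is a hypothetical stand-in for whatever the real card's CLUT register write looks like.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColorSpec:
    value: int  # CLUT index (only used when start == -1)
    red: int    # QuickDraw components are 16-bit, 0..0xFFFF
    green: int
    blue: int


def set_entries(clut: List[int], table: List[ColorSpec], start: int) -> None:
    """Model of a SetEntries handler writing a 256-entry, 24-bit CLUT.

    start == -1: each ColorSpec's value field gives its own CLUT index.
    otherwise:   entries go to consecutive indices beginning at start.
    The list store below stands in for the (hypothetical) hardware write.
    """
    for i, spec in enumerate(table):
        index = spec.value if start == -1 else start + i
        if not 0 <= index < len(clut):
            raise IndexError(f"CLUT index {index} out of range")
        # Keep the high byte of each 16-bit component for an 8-bit-per-channel output.
        r, g, b = spec.red >> 8, spec.green >> 8, spec.blue >> 8
        clut[index] = (r << 16) | (g << 8) | b


clut = [0] * 256
set_entries(clut, [ColorSpec(0, 0xFFFF, 0xFFFF, 0xFFFF), ColorSpec(0, 0, 0, 0)], start=0)
```

After that call, entry 0 holds white (0xFFFFFF) and entry 1 holds black, which is the usual arrangement for 1-bit mode.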
This is something you're going to have to at least minimally figure out even if your card lacks a CLUT. I'm sure the information is in there somewhere; I just didn't really grok it the first time.
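As a sketch of the kind of decision involved, here's a Python model of splitting a 24-bit-mode 1MB slot slice between pixel RAM and control registers. It assumes the standard 24-bit mapping where slot $s (9 through E) appears at $s00000-$sFFFFF; the 768KB/256KB split and the names `VRAM_SIZE`, `CTRL_BASE`, `decode` and `fits` are made up for illustration and are *not* the Toby's actual map.

```python
# Hypothetical layout for the 1MB window a card gets in 24-bit mode.
# The boundaries below are invented for illustration, NOT the Toby's map.
SLICE_SIZE = 0x100000  # 1MB visible per slot in 24-bit mode
VRAM_BASE = 0x000000
VRAM_SIZE = 0x0C0000   # 768KB reserved for pixels
CTRL_BASE = 0x0C0000   # remaining 256KB decoded as control space


def split_24bit_address(addr: int) -> tuple:
    """In 24-bit mode slot $s sits at $s00000-$sFFFFF: top nibble = slot, low 20 bits = offset."""
    addr &= 0xFFFFFF
    return (addr >> 20) & 0xF, addr & 0xFFFFF


def decode(offset: int) -> tuple:
    """Classify an offset within the slot's 1MB slice as ('vram' | 'ctrl', local offset)."""
    offset &= SLICE_SIZE - 1  # only the low 20 bits select a location in the slice
    if VRAM_BASE <= offset < VRAM_BASE + VRAM_SIZE:
        return ("vram", offset - VRAM_BASE)
    return ("ctrl", offset - CTRL_BASE)


def fits(width: int, height: int, bpp: int) -> bool:
    """Does a width x height screen at bpp bits per pixel fit in the VRAM part of the slice?"""
    return width * height * bpp // 8 <= VRAM_SIZE
```

With that split, 832x624 at 8 bits per pixel (519,168 bytes) fits, but 640x480 at 32 bits per pixel (1,228,800 bytes) wouldn't fit even in the whole 1MB slice, which is one concrete way the 24-bit-mode squeeze bites a true-color-only card.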
But, *shrug*, whatever.
Here's an updated version of my earlier code, modified to assume the framebuffer is in external SRAM instead of internal FPGA memory. The CLUT is small and can stay in internal FPGA memory.
The dev board that's been bandied around for this has 32MB of (DDR?) SDRAM on a 16-bit bus, and the assumption so far has been that the framebuffer will live there. (That will of course require a read/write buffer so the Mac can reach it, but that shouldn't be a huge deal.) But, yeah, the CLUT should definitely live internally. It only needs to hold 256 24-bit words, i.e. 768 bytes, so it shouldn't be a big deal.
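Here's a rough Python model of that read/write buffer, assuming the Mac writes 32-bit longwords and the SDRAM is addressed in 16-bit words. The `WriteBuffer` class, its depth, and the high-half-first ordering (chosen to match the 68k's big-endian byte order) are all hypothetical; byte-lane and word-sized writes from the Mac are ignored to keep it short.

```python
from collections import deque


class WriteBuffer:
    """Hypothetical posted-write buffer between the NuBus side and a 16-bit-wide SDRAM.

    Each 32-bit longword write from the Mac becomes two 16-bit SDRAM writes,
    high half first. The SDRAM is modeled as a dict keyed by 16-bit word address.
    """

    def __init__(self, depth: int = 16):
        self.depth = depth
        self.queue = deque()

    def post_long(self, byte_addr: int, value: int) -> bool:
        """Accept a longword write. Returns False when full (the NuBus cycle would have to wait)."""
        if len(self.queue) + 2 > self.depth:
            return False
        word_addr = byte_addr >> 1  # SDRAM is addressed in 16-bit words
        self.queue.append((word_addr, (value >> 16) & 0xFFFF))
        self.queue.append((word_addr + 1, value & 0xFFFF))
        return True

    def drain_one(self, sdram: dict) -> None:
        """Retire one pending 16-bit write, e.g. in a slot where scanout isn't using the SDRAM."""
        if self.queue:
            addr, half = self.queue.popleft()
            sdram[addr] = half

    def read_long(self, byte_addr: int, sdram: dict) -> int:
        """Reads flush pending writes first so the Mac never sees stale pixels."""
        while self.queue:
            self.drain_one(sdram)
        word_addr = byte_addr >> 1
        return (sdram.get(word_addr, 0) << 16) | sdram.get(word_addr + 1, 0)
```

Flushing on every read is the simplest correct policy; a real design could instead check the queue for a matching address and forward the pending data.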
To elaborate on my thinking: I think true color is a natural target to start with because it's what the HDMI TMDS encoder takes as input.
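For reference, here's a Python model of the per-channel TMDS 8b/10b encoding as I understand it from the DVI 1.0 spec (video data period only; the sync/control symbols are omitted). Each of the three channels takes one 8-bit value per pixel clock, which is why a 24-bit RGB pixel is the natural input. This is a software sketch for checking the logic, not the encoder the dev board actually ships with, and `tmds_decode` is included only to show the encoding round-trips.

```python
def _popcount8(x: int) -> int:
    return bin(x & 0xFF).count("1")


def tmds_encode(d: int, cnt: int) -> tuple:
    """Encode one 8-bit value for one TMDS channel. Returns (10-bit symbol, new running disparity)."""
    n1_d = _popcount8(d)
    use_xnor = n1_d > 4 or (n1_d == 4 and (d & 1) == 0)
    # Stage 1: transition-minimizing 8 -> 9 bits (q_m); bit 8 flags XOR (1) vs XNOR (0).
    q_m = d & 1
    for i in range(1, 8):
        b = ((q_m >> (i - 1)) & 1) ^ ((d >> i) & 1)
        q_m |= (b ^ 1 if use_xnor else b) << i
    if not use_xnor:
        q_m |= 1 << 8
    # Stage 2: DC balancing 9 -> 10 bits; bit 9 flags an inverted data byte.
    q_m8, low = (q_m >> 8) & 1, q_m & 0xFF
    n1 = _popcount8(low)
    n0 = 8 - n1
    if cnt == 0 or n1 == n0:
        if q_m8:
            q_out = (1 << 8) | low
            cnt += n1 - n0
        else:
            q_out = (1 << 9) | (~low & 0xFF)
            cnt += n0 - n1
    elif (cnt > 0 and n1 > n0) or (cnt < 0 and n0 > n1):
        q_out = (1 << 9) | (q_m8 << 8) | (~low & 0xFF)
        cnt += 2 * q_m8 + (n0 - n1)
    else:
        q_out = (q_m8 << 8) | low
        cnt += -2 * (1 - q_m8) + (n1 - n0)
    return q_out, cnt


def tmds_decode(q: int) -> int:
    """Invert tmds_encode for a data symbol."""
    low = q & 0xFF
    if (q >> 9) & 1:
        low = ~low & 0xFF
    xor_mode = (q >> 8) & 1
    d = low & 1
    for i in range(1, 8):
        b = ((low >> i) & 1) ^ ((low >> (i - 1)) & 1)
        d |= (b if xor_mode else b ^ 1) << i
    return d


def encode_pixel(rgb: int, cnts: list) -> list:
    """Split a 24-bit RGB pixel across the three channels (0 = blue, 1 = green, 2 = red)."""
    channels = [rgb & 0xFF, (rgb >> 8) & 0xFF, (rgb >> 16) & 0xFF]
    symbols = []
    for ch, value in enumerate(channels):
        sym, cnts[ch] = tmds_encode(value, cnts[ch])
        symbols.append(sym)
    return symbols
```

The encoder keeps one small piece of state (the running disparity) per channel and otherwise works strictly one value per pixel clock, so whatever sits upstream of it only has to deliver 24 bits of RGB each clock.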
One question about that: does the TMDS encoder operate "a word at a time", or is there some kind of streaming function in it? I.e., are there primitives hard-coded into the FPGA board's hardware design that accelerate grabbing bytes straight off the DRAM? Unless something like that is in play, I don't see the problem with inserting a CLUT. As BMoW's pseudocode shows, you can basically think of the CLUT as a 256-pixel-long framebuffer, where the pixel value the output circuitry reads from this "clutbuffer" is selected by using the data value fetched from the actual framebuffer as its address.
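The "clutbuffer" indirection is small enough to model directly. This Python sketch (not BMoW's pseudocode, just the same idea) turns one row of indexed framebuffer bytes into 24-bit RGB by using each pixel value as a CLUT address. It assumes the classic Mac packing of 1/2/4-bit pixels with the leftmost pixel in the high bits of each byte, and shows the 32-bit direct mode bypassing the CLUT entirely (ignoring any gamma correction a real card might apply there).

```python
def scanline_to_rgb(row: bytes, bpp: int, clut: list, width: int) -> list:
    """Indexed modes: each pixel value is used as an address into the CLUT.

    Pixels are packed MSB-first within each byte (leftmost pixel in the high bits).
    """
    if bpp not in (1, 2, 4, 8):
        raise ValueError("indexed modes are 1, 2, 4 or 8 bits per pixel")
    per_byte = 8 // bpp
    mask = (1 << bpp) - 1
    out = []
    for x in range(width):
        byte = row[x // per_byte]
        shift = 8 - bpp * (x % per_byte + 1)
        index = (byte >> shift) & mask
        out.append(clut[index])  # the indirection: framebuffer data -> CLUT address
    return out


def direct_row(row: bytes, width: int) -> list:
    """32-bit direct mode: each pixel is X, R, G, B bytes (big-endian); no CLUT lookup."""
    return [int.from_bytes(row[4 * x + 1:4 * x + 4], "big") for x in range(width)]


clut = [0] * 256
clut[0], clut[1] = 0xFFFFFF, 0x000000   # typical 1-bit setup: 0 = white, 1 = black
pixels = scanline_to_rgb(bytes([0b10100000]), 1, clut, 4)
```

Here `pixels` comes out as black, white, black, white; either way the output side ends up with the same 24-bit values it would get from a true color framebuffer.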
Even if there is some kind of "streaming", where that means a FIFO of some size on the FPGA, it still shouldn't be a problem: you can load that FIFO with the results of the indirection described above, right?
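A toy Python model of that arrangement, assuming an 8-bit indexed framebuffer: the fetch side tops up a bounded FIFO with already-looked-up RGB values, and the output side (standing in for the TMDS encoders) drains one per "pixel clock". The function name `scan_out` and the FIFO depth are hypothetical.

```python
from collections import deque


def scan_out(framebuffer: bytes, clut: list, fifo_depth: int = 8) -> list:
    """Toy pixel pipeline with the CLUT lookup placed in front of the FIFO.

    Returns the RGB values in the order the output side consumed them.
    """
    if fifo_depth < 1:
        raise ValueError("FIFO needs at least one entry")
    fifo = deque()
    fetch_pos = 0
    shown = []
    while len(shown) < len(framebuffer):
        # Fetch side: refill while there's room (think: one SDRAM burst).
        while len(fifo) < fifo_depth and fetch_pos < len(framebuffer):
            fifo.append(clut[framebuffer[fetch_pos]])  # lookup happens before queuing
            fetch_pos += 1
        # Output side: exactly one pixel per clock.
        shown.append(fifo.popleft())
    return shown
```

Queuing post-CLUT values makes each FIFO entry 24 bits wide instead of 8; doing the lookup on the output side of the FIFO instead would keep it narrow. Either order produces the same pixels, so it comes down to which resource is scarcer on the FPGA.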