
NuBusFPGA: HDMI on NuBus Macs

Melkhior

Well-known member
Hello,
I've mentioned my crazy project of interfacing a FPGA (board) with a NuBus Mac in another thread.
I'm opening a new dedicated thread as there is a significant update now:

NuBusFPGA.jpg
This shows the naked PCB on the left, the populated version without the required daughterboard in the middle, and the ready-to-use version on the right (if you can live without a proper bracket; I need to 3D-print one).

As of earlier today, the board passes POST in my Quadra 650, and the embedded declaration ROM (in the FPGA bitstream) enables a single-resolution, 8-bit-only framebuffer at the desired resolution, up to 1920x1080, which is what I'm testing it with. This framebuffer works as a secondary screen, as the 'startup' screen (by moving the smiling Mac in the Monitors control panel), or as the only screen by unplugging the monitor from the onboard video of the Q650.

The framebuffer in the FPGA should be able to support 1/2/4/8 bits and even full color (it was tested using my other, more mature project, SBusFPGA), but that isn't supported yet.

Now for the disclaimers, warnings, and other caveats:
  • As mentioned previously, this is an expensive toy and nothing more - there's currently no plan to manufacture and/or sell them (if some established vendor wants to give a more cost-effective version a try, I'm open to supporting the project, but I don't have the time to do it myself)
  • They are not easy to make; the PCB is only 4 layers but has plenty of surface-mount chips, including the large Xilinx CPLD (mine were professionally assembled by SeeedStudio, which also made the PCB)
  • They are not easy to set up; the FPGA is simple enough, but the CPLD requires configuration via a dedicated JTAG programmer (which adds to the cost), and Xilinx tools have not been upgraded for years so I needed to hand over a USB controller via VFIO-PCI to a Windows 7 virtual machine just to program the d*mn thing...
  • They are wider than a 'proper' NuBus card, so they would block a neighboring NuBus slot; however, in the Q650 (or the IIci) they fit in the innermost slot ($C for the Q650) with no issue, leaving both other slots free
  • Most of the gateware & software is super-early and not tested beyond being able to start SimCity 2000 and it looking OK
  • Changing resolution dynamically would require changing the video clock dynamically in the FPGA, which is doable but difficult, so it's not on the TODO list (if you change monitor, changing the FPGA bitstream is easier and more efficient design-wise)
  • Changing depth should be a matter of ROM/software support
  • It's not fast. Quite usable in 8 bits, but noticeably slower than the Q650 internal video (even accounting for the larger area requiring more memory). Experience with the SBusFPGA shows that even on a faster bus, unaccelerated 32-bit truecolor is too slow to be usable.
  • Acceleration should be possible, but is going to require a *lot* of software work and hardware-software codesign
  • VGA output may have a color range limitation (of unknown hardware origin), so it's likely HDMI is the way to go
  • Dual-head with HDMI and VGA is theoretically possible, but not really planned
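For a rough feel of why depth matters so much on a slow bus, here is a minimal sketch that computes how many bytes one full 1920x1080 frame occupies at each depth mentioned above. It assumes tightly packed rows with no padding, which may not match the actual framebuffer layout, and the function name is made up for illustration.

```python
def framebuffer_bytes(width, height, depth_bits):
    """Bytes for one frame, assuming packed rows with no padding (an assumption)."""
    return width * height * depth_bits // 8


if __name__ == "__main__":
    # Depths listed in the post: 1/2/4/8 bits, plus 32-bit truecolor.
    for depth in (1, 2, 4, 8, 32):
        size = framebuffer_bytes(1920, 1080, depth)
        print(f"{depth:>2} bits/pixel: {size:>9} bytes ({size / 2**20:.2f} MiB)")
```

Every redraw of the full screen has to push that many bytes across NuBus, so 32-bit mode moves four times as much data as 8-bit mode for the same picture.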
And here's what it looks like in the Q650 (the bright debug LEDs aren't helping):
NuBusFPGAinQ650.jpg

Special mention to the awesome folks who made the Quadra 800 version of Qemu; without that simulation tool to implement the DeclRom, this might not have happened. Also to the entire Litex community, another FOSS/H project without which this would not have been possible.

The entire project is on GitHub (and is a bit of a mess, requiring files from various other projects including SBusFPGA and XiBus).
 

cy384

Well-known member
Special mention to the awesome folks who made the Quadra 800 version of Qemu; without that simulation tool to implement the DeclRom, this might not have happened. Also to the entire Litex community, another FOSS/H project without which this would not have been possible.
Thanks for sharing your code for this! (even using retro68 to build it! I really like to be able to use modern tools for mac dev). Fantastic! I'm very interested in hearing about your development process for the DeclROM, could you discuss that a bit?
 

Melkhior

Well-known member
@cy384 I started with the various examples from Apple in 68k assembly for MPW, which isn't pleasant, and quickly moved to 68k assembly using retro68, which was somewhat less unpleasant, all of that 'blind' (no hardware). That got me not very far not very quickly...

Then I discovered Qemu can actually emulate a Q800, and used the existing Mac framebuffer from the virtual Q800 as a basis to create a new virtual NuBus framebuffer that had virtual hardware registers 'matching' the hardware I was using in my FPGA design - so I could write the software on that 'digital twin' instead of the (then non-existent) hardware. This enabled me not just to check for 'basic' stuff, but to quickly cycle through multiple MacOS versions (7.1, 7.6, 8.1) to discover what was needed/used 'for real' in the DeclRom and the embedded driver, up to a functional DeclRom/driver in Qemu. I could basically dump any information I wanted (which functions were called, with which selector, values of various variables, ...) from inside my code to some 'fake' 'debug' hardware registers, and the virtual framebuffer in Qemu would just print those. It's a lot easier to get debug info out of C code via 'printf' than out of an FPGA... The DeclRom that worked on Qemu worked unmodified on the real hardware as well, to my delight (and surprise).
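As a toy illustration of the 'fake debug register' idea described above (not the actual Qemu device or driver code), the sketch below models an emulated card whose debug register simply logs whatever the driver writes to it. The class, the register offset, and the driver function are all hypothetical names invented for this example.

```python
class FakeDebugDevice:
    """Toy model of an emulated card exposing a write-only debug register."""

    DEBUG_REG = 0x0  # hypothetical register offset, not from the real design

    def __init__(self):
        self.log = []

    def write32(self, offset, value):
        # The emulator side: a write to the debug register becomes a log line.
        if offset == self.DEBUG_REG:
            self.log.append(f"debug: 0x{value & 0xFFFFFFFF:08x}")
        # Writes to any other offset are ignored in this toy model.


def driver_control(dev, selector):
    """Stand-in for a driver entry point that reports the selector it was given."""
    dev.write32(FakeDebugDevice.DEBUG_REG, selector)
    return 0


if __name__ == "__main__":
    dev = FakeDebugDevice()
    for selector in (3, 9):  # arbitrary example values
        driver_control(dev, selector)
    print("\n".join(dev.log))
```

The point of the pattern is that the driver needs no console of its own: a single store to a known address is enough for the emulator to print a trace.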

I also mostly moved away from assembly to C code; that last bit was greatly helped by the "Pickles" boards source code, whose existence I learned of on this forum (thanks folks!). They were doing mostly-C as well, and having an example helped figure things out.

An issue is that there are some apparent 'holes' in Apple's documentation; some of the selectors were added after DCDMF3 but were obsolete by the time PCI was introduced, and I couldn't find any in-between documentation. Also, shamefully, there is support for hardware cursors on PCI, but the selectors are never called on NuBus so we can't use one (I implemented a hardware cursor in the framebuffer as it's quite helpful under X11 for NetBSD/sparc in the SBusFPGA, so I hoped to leverage that in NuBusFPGA as well).

There's still a lot of 'potential' work to do in the DeclRom/driver, starting with depth change support.
 

Melkhior

Well-known member
and quickly moved to 68k assembly using retro68
... retro68, to which I forgot to give a shout-out, hereby corrected: that's another project that was super important.

There might be others that I forget; the strength of FOSS/H comes from the ability to build on each others' projects without having to reinvent the wheel every time. Thank you all :)
 

bdurbrow

Well-known member
FWIW, on my other machine (2015 i7 iMac) I've got a linux-based VM that runs Xilinx ISE... although I haven't tested programming an XC9500 series chip with it yet. It's based off of the VM that Xilinx was distributing for running ISE under Windows 10. IIRC, it runs in VirtualBox.
 

SuperSVGA

Well-known member
I'd recommend being careful about using HDMI if you don't have a license. While unlikely, you could theoretically get in trouble for it.
 

bdurbrow

Well-known member
I'm pretty sure that's only if you sell it; and mark it with the registered trademarks. However - I am not a lawyer, nor do I play one on TV, etc, etc...

But in this case; it's most definitely NOT an HDMI connection - that would require adherence to HDCP, etc - it's just a DVI signal being sent over an oddball connector. See how this works? ;)

Should probably change the thread name, come to think of it...
 

SuperSVGA

Well-known member
I don't know if selling it has anything to do with it, however I do know people have been sent papers from lawyers just from projects even thinking about using HDMI.

As far as I understand, if it's anything using the connector and carrying anything close to HDMI signals, then it counts. If you used it as a simple GPIO breakout, that's fine. But there's no easy way I know of to cheat the licensing other than avoiding it altogether.
And for whatever reason if you don't use the HDMI name, you actually pay more in licensing. Using the HDMI branding gets you a discount, and if you implement HDCP you get an additional discount.
 

Melkhior

Well-known member
I don't think any licensing is needed for my 'Highly Desirable Macintosh Interface (to the modern world)' though perhaps I should change the acronym ;-)

Anyway, I got 1/2/4 bits to work in the ROM/firmware, and Speedometer 4.02 confirms it's not fast in any mode.

Here's the Q650 internal video:

Code:
(...)
Color Quickdraw: 2.30 (32 Bit QD)
Display Manager: Present
Bit Depth: 8
Maximum Depth: 8
Primary Screen Size: 1152 x 870
Screen Resolution: 77 X 77
(...)
Black & White: 1.229
4 Colors: 1.217
16 Colors: 1.218
256 Colors: 1.215
32,767 Colors: 0.000
Color Test Average: 1.220

And here's the NuBusFPGA:

Code:
(...)
Color Quickdraw: 2.30 (32 Bit QD)
Display Manager: Present
Bit Depth: 8
Maximum Depth: 8
Primary Screen Size: 1920 x 1080
Screen Resolution: 72 X 72
(...)
Black & White: 0.747
4 Colors: 0.610
16 Colors: 0.470
256 Colors: 0.365
32,767 Colors: 0.000
Color Test Average: 0.548

Although to my surprise, it's faster than some older machines in the 4.02 database - I'm guessing some of the tests are CPU-bound rather than video-memory-bound, and the 68040 in the Q650 helps vs. the 68030 in a IIci or LCIII. Unfortunately I don't have a 'real' NuBus video card to compare with. I wonder if the higher latency of the internal DDR3 is a factor or not in the overall performance.
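To put the two Speedometer listings side by side, here is a small sketch that divides each NuBusFPGA score by the matching Q650 internal-video score. The numbers are copied from the two listings above; the function and variable names are made up for this example, and the two runs were on different screen sizes (1920x1080 vs. 1152x870), so the ratios are only a rough guide.

```python
# Scores copied from the two Speedometer 4.02 listings above.
Q650_INTERNAL = {
    "Black & White": 1.229,
    "4 Colors": 1.217,
    "16 Colors": 1.218,
    "256 Colors": 1.215,
}
NUBUSFPGA = {
    "Black & White": 0.747,
    "4 Colors": 0.610,
    "16 Colors": 0.470,
    "256 Colors": 0.365,
}


def relative_scores(card, reference):
    """Score of `card` as a fraction of `reference`, per test."""
    return {test: card[test] / reference[test] for test in card}


if __name__ == "__main__":
    for test, ratio in relative_scores(NUBUSFPGA, Q650_INTERNAL).items():
        print(f"{test:<14} {ratio:.0%} of Q650 internal video")
```

The internal video scores barely move with depth, while the NuBusFPGA scores drop steadily as depth goes up, which fits the idea that the card is limited by how many bytes have to cross the bus.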
 

demik

Well-known member
Changing resolution dynamically would require changing the video clock dynamically in the FPGA, which is doable but difficult, so it's not on the TODO list (if you change monitor, changing the FPGA bitstream is easier and more efficient design-wise)

That's how the VideoMacPacHack works. It sends a different bitstream to the FPGA (thread here). Not a bad idea at all.

Although to my surprise, it's faster than some older machines in the 4.02 database - I'm guessing some of the tests are CPU-bound rather than video-memory-bound, and the 68040 in the Q650 helps vs. the 68030 in a IIci or LCIII. Unfortunately I don't have a 'real' NuBus video card to compare with. I wonder if the higher latency of the internal DDR3 is a factor or not in the overall performance.

The Quadras' onboard video is fast for that era: it is directly connected to the 040 system bus. All unaccelerated NuBus cards are slower than the Quadras' onboard video. The IIci's onboard video is a slow 'vampire' design.

Thank you for providing the sources of that awesome project, that will make for an interesting read! I'm especially interested to know how you handled the simultaneous memory access from the NuBus and HDMI output. How much is the DDR3 latency? I know some people had trouble with that on other retro projects.
 

Melkhior

Well-known member
It sends a different bitstream to the FPGA (thread here). Not a bad idea at all.
During use? I was thinking more of "halt the computer, take the board out, load the new bitstream, replug the board and the new monitor"...

I'm especially interested to know how you handled the simultaneous memory access from the NuBus and HDMI Output.
It's handled for me by the Litex infrastructure. The DDR3 controller has a crossbar of ports; the current design uses two of them. One goes through Wishbone to the rest of the SoC and carries the host requests (NuBus -> NuBus/Wishbone bridge -> Wishbone bus -> Wishbone memory adapter -> Litex native DDR3 controller, and then all the way back). The other goes directly to the framebuffer via a DMA loop and a big FIFO to hide latency issues (refresh cycles, ...).

The solution is not perfect but it's simple. If another port/source of requests to the DDR3 controller is really pushing the memory, it can starve the framebuffer and the picture will shift because of the missing data... although I've only seen it happen when playing with dedicated hardware to manipulate the framebuffer memory in 32-bit mode. Lower depth means the fixed-size (in bytes) FIFO can buffer a lot more pixels and so has far fewer issues, and the dedicated hardware can really hammer the memory subsystem.
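A back-of-the-envelope sketch of that last point: for a FIFO of fixed size in bytes, the number of pixels it holds (and so how long a memory stall it can ride out) scales inversely with depth. The 8192-byte FIFO size is purely an assumed figure, not the real design's, and the 148.5 MHz pixel clock assumes standard 1080p60 timing, which the thread does not actually state.

```python
def pixels_buffered(fifo_bytes, depth_bits):
    """Whole pixels that fit in a FIFO of fifo_bytes at the given depth."""
    return fifo_bytes * 8 // depth_bits


def stall_tolerance_us(fifo_bytes, depth_bits, pixel_clock_hz):
    """Microseconds of scan-out a full FIFO can cover with no refill at all."""
    return pixels_buffered(fifo_bytes, depth_bits) / pixel_clock_hz * 1e6


if __name__ == "__main__":
    FIFO_BYTES = 8192        # assumed size, for illustration only
    PIXEL_CLOCK = 148.5e6    # assumed: standard 1080p60 pixel clock
    for depth in (1, 2, 4, 8, 32):
        px = pixels_buffered(FIFO_BYTES, depth)
        us = stall_tolerance_us(FIFO_BYTES, depth, PIXEL_CLOCK)
        print(f"{depth:>2} bits/pixel: {px:>6} pixels, about {us:.1f} us of cover")
```

Under these assumptions the same FIFO covers four times as long a stall at 8 bits as at 32 bits, which matches the observation that starvation only showed up in 32-bit mode.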

How much is the DDR3 latency? I know some people had trouble with that on other retro projects.
I have no idea of the number... I think the primary issue is really the upper bound rather than the average - refresh cycles can cause some requests to be delayed... that's why the FIFO is needed (and large). The original design of the framebuffer is from Litex, I just built on it to get the features I wanted (indexed mode with CLUT, hardware cursor, switchable bit depth, ...).

Edit: more link
 

Melkhior

Well-known member
Turns out, 32 bits is slow, but not as slow as I feared; X11 on NetBSD/sparc is probably slowed down by a lot of CPU-based XRender. MacOS 8.1 is probably a lot more frugal in its use of CPU and read-back bandwidth.

For kicks, the monitor control panel on the NuBusFPGA "goblin" framebuffer running in 1920x1080 and set to 'millions of colors', captured by "apple-caps-4" and then converted to JPG by GraphicConverter 3.9.1:
moniteur.jpg
I'm quite happy with that :)
 

Johnnya101

Well-known member
So here are the results of a Speedometer test with a Thunder/24 nubus card from LEM.

computer 1-bit 2-bit 4-bit 8-bit
IIcx, "Toby" 0.252 0.248
IIcx, ROPS 24XLi 0.245 0.242
IIcx, Thunder 24 0.296 0.283 0.276 0.398

I think higher is better. Going off this, looks like yours is right around there? Am I comparing them correctly?
 

Melkhior

Well-known member
@Johnnya101 Yes, higher is better. The IIcx is probably quite limited by the CPU for the Toby. Low End Mac has Speedometer 4.02 results for some systems; I suspect the IIfx is the closest comparable 'dumb NuBus video' system, so it seems the performance I get is fairly decent overall.
 

chue

Member
I don't think any licensing is needed for my 'Highly Desirable Macintosh Interface (to the modern world)' though perhaps I should change the acronym ;-)
You can always rearrange it a little... how about 'Macintosh Highly Desirable Interface' :)
 

Johnnya101

Well-known member
That's what I'm seeing too. Definitely not a rocket or anything, but it ain't a Toby either. Looks comparable to the mid/higher end Radius PrecisionColor Pro cards with acceleration on?

Might have to build one...
 