NuBusFPGA: HDMI on NuBus Macs

mcayland

Member
I'm the author of the Q800 MacOS patchset for QEMU and came across this thread by accident when searching for something else - @Melkhior this is an amazing project, and I am pleasantly surprised that the Declaration ROM support was good enough to prototype the real one :)

One of the main reasons for adding a proper Nubus implementation with Declaration ROM support was to allow for adding extra Nubus cards to QEMU, particularly a virtual VGA card with extended resolutions, and also a virtio-9pfs card to allow file-sharing with the host similar to BII/SheepShaver. Writing the virtual devices for QEMU is fairly easy for me, but alas I don't have much background in writing Classic MacOS drivers and Declaration ROMs (plus I have a day job!) which means it will take me some time to get around to this.

Would either yourself or anyone following the thread be interested to help with writing the MacOS drivers? I am happy to put together suitable virtual devices for QEMU if there is sufficient demand, plus provide advice and debugging tips from the QEMU perspective as needed.
 

Melkhior

Well-known member
@mcayland First, thanks for the Q800 support in QEMU; that's really nice to have and, as I mentioned, a big help to get NuBusFPGA off the ground :)

For the virtual NuBus card - I already have my virtual version (without acceleration) of the board; it's not public, but only because it's a somewhat ugly ad-hoc version to help me prototype the software on a more comfortable platform. I can share it if you're interested. It has the framebuffer (about 95% of which is just macfb copy/pasted!) and some super-minimalistic virtual disk support (which just maps the memory in the superslot area). [One thing that I didn't do properly is disabling the internal video; I just return '0' for the sense lines so MacOS doesn't use it, but QEMU still opens the window. The other is that there is no support for changing resolution, as my hardware doesn't support that.]

For the DeclRom, in my case it embeds the drivers, so everything should already be there, although building it may require Linux (I've never tried on any other OS) and definitely requires Retro68. The DeclRom itself has a lot of 68k assembly, but mostly for data and for stubs that go into C code. It should contain both the basic framebuffer driver and the basic RAM disk driver (which just block-copies the requested block between the superslot area and the MacOS buffer).
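The RAM disk driver described above ("block-copy the requested block from the superslot area") boils down to an offset calculation plus a memory copy. The Python sketch below models only that logic: it is not the DeclRom's C code, the class and function names are hypothetical, and a `bytearray` stands in for the memory-mapped window in the card's super slot space ($s0000000 for slot $s). The 512-byte block size is an assumption (the usual classic Mac OS block size).

```python
# Hypothetical model of a minimal NuBus RAM-disk driver's read/write path.
# The "disk" is a flat window of card memory; a bytearray stands in for it here.

BLOCK_SIZE = 512  # assumption: classic Mac OS 512-byte blocks


def superslot_base(slot: int) -> int:
    """Base address of NuBus super slot space for slots $9-$E ($s0000000)."""
    if not 0x9 <= slot <= 0xE:
        raise ValueError("super slot space only exists for slots $9-$E")
    return slot << 28


class RamDisk:
    def __init__(self, window: bytearray, block_size: int = BLOCK_SIZE) -> None:
        if len(window) % block_size:
            raise ValueError("window must hold a whole number of blocks")
        self.window = window
        self.block_size = block_size

    def _span(self, block: int, count: int) -> slice:
        # Translate (block, count) into a byte range inside the window.
        start = block * self.block_size
        end = start + count * self.block_size
        if block < 0 or count <= 0 or end > len(self.window):
            raise IndexError("request outside the disk")
        return slice(start, end)

    def read(self, block: int, count: int = 1) -> bytes:
        # Read: copy the requested blocks from card memory to the caller's buffer.
        return bytes(self.window[self._span(block, count)])

    def write(self, block: int, data: bytes) -> None:
        # Write: copy whole blocks from the caller's buffer into card memory.
        if not data or len(data) % self.block_size:
            raise ValueError("writes must be a non-zero number of whole blocks")
        self.window[self._span(block, len(data) // self.block_size)] = data
```

A real driver would do the same arithmetic, then copy straight to or from the mapped address instead of a Python buffer.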
 

mcayland

Member

Thanks for the link, that's very handy. My long-term plan is to take the current NDRV used for QEMU PPC at https://gitlab.com/qemu-project/QemuMacDrivers and split it so that both an NDRV and a 68k Declaration ROM can be built from the same project, then integrate them with QEMU. I've had a look at Retro68 and it looks really promising; however, I reached out to the author via email and didn't get a reply, so I wasn't too keen to go ahead and convert the existing driver from Metrowerks 😐
 
You can always rearrange it a little... how about 'Macintosh Highly Desirable Interface' :)
I think there is a way around the HDMI license by putting it into DVI mode. I think that's how the GBA Consolizer gets around it, and why not all TVs work with audio for that mod. A couple of other things out there are, I think, labeled "Digital Video" or "Digital Out" for the same reason.
 

Melkhior

Well-known member
I think there is a way around the HDMI license by putting it into DVI mode. I think that's how the GBA Consolizer gets around it, and why not all TVs work with audio for that mod. A couple of other things out there are, I think, labeled "Digital Video" or "Digital Out" for the same reason.
Yes, HDMI is backward compatible with DVI (same signaling), and what most FPGAs output (including NuBusFPGA) is typically DVI on an HDMI connector. It's a lot easier to output DVI (which is basically a VGA signal in digital form using TMDS signaling) than HDMI (which has packetized data).
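To show what "a VGA signal in digital form using TMDS" involves, here's a Python sketch of the TMDS data encoding as described in the DVI 1.0 specification. It is an illustration, not NuBusFPGA's gateware: each 8-bit channel value becomes a 10-bit symbol, first choosing an XOR or XNOR chain to minimize transitions, then optionally inverting the low 8 bits (flagged in bit 9) to keep the line DC-balanced. Control symbols sent during blanking are left out, as is everything HDMI adds on top (data islands for audio and InfoFrames).

```python
# Sketch of the TMDS 8b/10b data encoding used by DVI (one color channel).
# Control symbols sent during blanking (hsync/vsync) are not covered.


def _ones(x: int) -> int:
    return bin(x & 0xFF).count("1")


class TmdsEncoder:
    def __init__(self) -> None:
        self.cnt = 0  # running disparity (ones minus zeros sent so far)

    def encode(self, d: int) -> int:
        # Stage 1: transition minimization with an XOR or XNOR chain.
        n1 = _ones(d)
        use_xnor = n1 > 4 or (n1 == 4 and (d & 1) == 0)
        q_m = d & 1
        for i in range(1, 8):
            bit = ((q_m >> (i - 1)) ^ (d >> i)) & 1
            if use_xnor:
                bit ^= 1
            q_m |= bit << i
        q_m8 = 0 if use_xnor else 1

        # Stage 2: DC balancing, optionally inverting bits 0-7 (flagged in bit 9).
        n1q = _ones(q_m)
        n0q = 8 - n1q
        if self.cnt == 0 or n1q == n0q:
            invert = q_m8 == 0
            self.cnt += (n1q - n0q) if q_m8 else (n0q - n1q)
        elif (self.cnt > 0 and n1q > n0q) or (self.cnt < 0 and n0q > n1q):
            invert = True
            self.cnt += 2 * q_m8 + (n0q - n1q)
        else:
            invert = False
            self.cnt += -2 * (1 - q_m8) + (n1q - n0q)
        low = (~q_m & 0xFF) if invert else q_m
        return (int(invert) << 9) | (q_m8 << 8) | low


def tmds_decode(sym: int) -> int:
    """Inverse of the data encoding, useful for checking the encoder."""
    low = sym & 0xFF
    if (sym >> 9) & 1:
        low ^= 0xFF
    xor_mode = (sym >> 8) & 1
    d = low & 1
    for i in range(1, 8):
        bit = ((low >> i) ^ (low >> (i - 1))) & 1
        if not xor_mode:
            bit ^= 1
        d |= bit << i
    return d
```

Each of the three channels (R, G, B) gets its own encoder instance, since the running disparity is per lane.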

It is possible to output 'real' HDMI (including sound), but licensing is much more of a problem. There's a FOSS/H implementation available on GitHub, with extra comments on the licensing. Theoretically, NuBusFPGA could use that and, with an appropriate 'sound component' for the Sound Manager, output audio.
 

Jockelill

Well-known member
This is an absolutely wonderful project! Just amazed by it! I for one would love a Nubus HDMI (I’d settle for DVI also 😁) card.

I have one suggestion (although expensive… and maybe it has already been suggested?) to increase the performance of the whole thing: why not make use of two NuBus slots? This is, for example, how the early 3dfx Voodoo cards worked: the two cards are identical and are connected by a small flat cable (the SLI cable). Their work is then interleaved, each card taking care of every second scan line.

I know too little about NuBus to say whether it could also work here, but in theory it should double the bus performance (which I believe is the biggest bottleneck here). It would of course require both HW and SW mods to make it work, but I thought it is at least an interesting idea.
 

Melkhior

Well-known member
I know too little about the Nubus to say that it could also work here, but in theory it should double the bus performance
Unfortunately, no, it wouldn't. NuBus is a 'real' bus - that is, most of the signals are shared by all the devices (including the host, which is just another device from the point of view of NuBus). When two devices are talking to each other, the bus is occupied and everybody else has to wait. The available bandwidth is fixed and has to be shared by everybody, so using more than one board won't help, and bus bandwidth is the primary performance issue. Using block mode would help a bit (more _efficient_ use of the available bandwidth by lowering overheads), but the host won't do it, so it has to be initiated by the device, which is only useful in limited cases. Also, it's not supported on Macintosh II systems (only on Quadra and newer).
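The "fixed bandwidth, lower overheads" argument can be made concrete with a toy model. The cycle counts below are placeholders chosen for illustration, not NuBus timings from the specification or from measurements; only the 10 MHz clock and the 32-bit word width are NuBus facts. Whatever numbers you plug in, block mode raises the total by amortizing the per-transaction overhead, while extra boards only divide the same total.

```python
# Toy model of a shared bus: each transaction pays a fixed overhead, then
# moves one or more 32-bit words. All cycle counts are placeholders.

NUBUS_CLOCK_HZ = 10_000_000  # NuBus runs at 10 MHz


def effective_bandwidth(clock_hz: float, words_per_txn: int,
                        overhead_cycles: int, cycles_per_word: int,
                        bytes_per_word: int = 4) -> float:
    """Bytes per second the bus delivers at 100% utilization."""
    cycles_per_txn = overhead_cycles + words_per_txn * cycles_per_word
    return clock_hz / cycles_per_txn * words_per_txn * bytes_per_word


def per_device_share(total_bandwidth: float, busy_devices: int) -> float:
    # The bus is one shared resource: more boards split it, they don't add to it.
    return total_bandwidth / busy_devices


# Hypothetical 1-cycle overhead and 1 cycle per word:
single = effective_bandwidth(NUBUS_CLOCK_HZ, 1, overhead_cycles=1, cycles_per_word=1)
block = effective_bandwidth(NUBUS_CLOCK_HZ, 16, overhead_cycles=1, cycles_per_word=1)
```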

Multi-card setups were usually meant to double the computational performance rather than to improve bandwidth. PCI (the old parallel one, not modern PCI Express) is also a 'true' bus, so adding PCI boards doesn't help with overall bandwidth either.
 

Jockelill

Well-known member
Ok, bummer :(, and I assume that these days computational power isn't really the issue :).

Some high-end DVD players use an upscaler to convert 576i to 1080p; maybe that could be another solution: the computer feeds the signal at a lower resolution and the card does the upscaling. Of course it won’t be “true resolution” then, but it could at least make the picture look decent on a more modern LCD.
 

Melkhior

Well-known member
@Jockelill Indeed compute power is less of an issue these days; even using a soft-core inside an FPGA you can get pretty decent acceleration (some numbers earlier in the thread). Upscaling isn't going to look good for a computer display, in particular with the very crisp, non-antialiased graphics that were in use back in the day. You really want 1:1 or at least integer mapping (e.g. 960x540 could look good on 1920x1080 by using 4 LCD pixels for each Mac pixel, 2:1 in both directions). Even unaccelerated, a high resolution like 1920x1080 in 8-bit color is quite usable; it's about twice the pixels of Apple's highest resolution, so you need twice the time to change everything. But for localized updates (e.g. a window), it's basically the same speed as a vintage NuBus device. Adding acceleration makes scrolling much smoother, even for large windows.
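The numbers behind "about twice the pixels" and "integer mapping" are easy to check. This Python sketch assumes Apple's highest NuBus-era resolution is 1152x870 (the 21-inch display mode, which also comes up later in the thread); the helper names are illustrative.

```python
# Back-of-the-envelope numbers for full HD vs. a classic Apple resolution.

FULL_HD = (1920, 1080)
APPLE_21 = (1152, 870)  # assumption: the largest Apple NuBus-era display mode


def pixel_count(size: tuple[int, int]) -> int:
    w, h = size
    return w * h


def integer_scale(src: tuple[int, int], dst: tuple[int, int]) -> int:
    """Largest whole-number factor k with src * k fitting inside dst (0 if none)."""
    return min(dst[0] // src[0], dst[1] // src[1])


def fills_exactly(src: tuple[int, int], dst: tuple[int, int]) -> bool:
    # True when integer scaling covers the whole display with no border.
    k = integer_scale(src, dst)
    return k > 0 and (src[0] * k, src[1] * k) == dst


ratio = pixel_count(FULL_HD) / pixel_count(APPLE_21)
frame_bytes_8bpp = pixel_count(FULL_HD)  # one byte per pixel at 8 bits
```

The ratio comes out just above 2, and 960x540 is the kind of source that scales by exactly 2 onto full HD, while 1152x870 only fits at 1:1 (with a border).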

@uyjulian Compression cuts down the bandwidth requirement, but then you need some CPU time to compress the data if you don't have dedicated hardware on the CPU side... and on a 68k, it's going to kill any benefits from the reduced bandwidth. Adding more acceleration on the device side (which is mostly a software issue, figuring out how to implement it on the Mac side) is probably more efficient: the 68k is only used to give orders to the acceleration engine through NuBus, then everything happens on-device at 'modern FPGA' speed.
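To see why on-device acceleration beats compression on a 68k, compare how many bytes must cross NuBus for one scroll. The sketch below is hypothetical: the command layout, opcode, and function names are invented for illustration and do not describe NuBusFPGA's actual interface, and the CPU path is the simplest possible model (read every byte from VRAM, write it back).

```python
import struct

# Hypothetical accelerator interface: the opcode and word layout are
# illustrative only and do not describe NuBusFPGA's actual registers.
OP_BLIT = 1


def blit_command(src_x: int, src_y: int, dst_x: int, dst_y: int,
                 width: int, height: int) -> bytes:
    """Encode a screen-to-screen copy as seven big-endian 32-bit words."""
    return struct.pack(">7I", OP_BLIT, src_x, src_y, dst_x, dst_y, width, height)


def cpu_copy_traffic(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes crossing NuBus when the 68k copies the rectangle itself:
    every byte is read from VRAM and written back (alignment ignored)."""
    row_bytes = (width * bits_per_pixel + 7) // 8
    return 2 * row_bytes * height


# Scrolling a 1000x800 window up by 16 lines at 8 bits per pixel:
cpu_bytes = cpu_copy_traffic(1000, 800, 8)             # 1,600,000 bytes
cmd_bytes = len(blit_command(0, 16, 0, 0, 1000, 784))  # 28 bytes
```

The exact ratio depends on the real command format, but the shape of the result holds: the bus carries a few dozen bytes of orders instead of the pixels themselves.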
 

Trash80toHP_Mini

NIGHT STALKER
You really want 1:1 or at least integer mapping (i.e. 960x540 could look good on 1920x1080 by using 4 LCD pixels for each Mac pixel, 2:1 in both directions). Even unaccelerated at high resolution like 1920x1080, 8-bits color is quite usable
I've always wondered whether the Q630/PPC Performa Video Slot's native pixel doubling (320x240 to 640x480) involved any interpolation. The picture appears much too clear at 640x480 for 320x240 mapped directly onto four identical pixels by the process you've described. ISTR there being plenty of processing power inherent in the design? The CPU/QuickDraw isn't involved at all beyond providing a blank window for the video feed to be piped into; screen caps prove the window is empty as far as the system is concerned.

Since we would appear to have cycles to spare on the VidCard while it waits for NuBus arbitration during page buffer loads, could an interpolation routine be added to the mix for smoothing transitions and sharpening edges in the output from page buffer to display over HDMI? I wouldn't think there'd be any performance hit in the process, given the raw CPU capability downstream from NuBus?
 

Melkhior

Well-known member
@Trash80toHP_Mini Not sure what the "Q630/PPC Performa Video Slot" is. If it's for the video input of the machine, then you really want interpolation there; video benefits from interpolation when upscaled, whereas 'synthetic' images such as a vintage Mac's desktop will just look blurry.

As for spare cycles: there aren't any per se in the video output; it's all dedicated hardware:
  • The timing hardware generates the support signals for VGA/DVI at the pixel clock frequency (148.5 MHz for 1920x1080); it also signals the beginning and end of each displayable frame (the 1920x1080 visible part inside the 2200x1125 video frame) so that downstream 'video' hardware can do its job (optionally, it also signals where the hardware cursor is, but that's not supported in MacOS).
  • The address generator generates the addresses in the VRAM needed for displaying; it basically loops over the 2073600 bytes (in 8-bit mode) that represent a visible frame, in 128-bit chunks (the width of the DRAM controller port). The data from each address are enqueued in a (fairly large) FIFO so that data are always available even if the DRAM controller gets busy with something else and answers with high latency every once in a while.
  • Finally, the actual 'video' part takes the timing & synchronization signals and the FIFO and generates the appropriate VGA/DVI data signals (using the FIFO data along with the required CLUT look-up in 1/2/4/8 bits, value expansion in 16 bits and passthrough in 32 bits, or optionally the cursor data with its own CLUT) and outputs them synchronized with the other signals (hsync, vsync).
  • The PHY part (VGA or DVI) outputs the signals appropriately; it's fairly direct for VGA (the analog conversion is handled externally to the FPGA), but the TMDS signals in DVI require a serialization step running at 5x the pixel clock (so 742.5 MHz for full HD).
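The figures in this list fit together, and the 'video' stage's per-pixel work is simple to model. This Python sketch is an illustration, not the NuBusFPGA gateware: it checks the 60 Hz refresh implied by 148.5 MHz over a 2200x1125 frame, counts 128-bit fetches per 8-bit frame, and shows the CLUT and 16-bit expansion paths. The bit ordering inside a fetch, the 16-bit format (x-R5-G5-B5, the classic Mac 'thousands' layout), the bit-replication expansion, and the double-data-rate serializer (which is how a 5x clock moves one 10-bit TMDS symbol per pixel) are assumptions.

```python
# Numbers and data paths for the 1920x1080@60 pipeline described above.
# Assumptions: a 128-bit fetch holds pixels in address order with the first
# byte in the most significant bits, and indexed pixels are packed leftmost
# pixel first (as in classic Mac QuickDraw framebuffers).

PIXEL_CLOCK_HZ = 148_500_000
H_TOTAL, V_TOTAL = 2200, 1125     # full video frame including blanking
H_ACTIVE, V_ACTIVE = 1920, 1080   # visible part
FETCH_BITS = 128                  # width of the DRAM controller port

refresh_hz = PIXEL_CLOCK_HZ / (H_TOTAL * V_TOTAL)               # 60.0
fetches_per_frame_8bpp = H_ACTIVE * V_ACTIVE * 8 // FETCH_BITS  # 129,600
tmds_bits_per_second = PIXEL_CLOCK_HZ * 10  # one 10-bit symbol per pixel per lane
serializer_hz = PIXEL_CLOCK_HZ * 5          # assumption: DDR, 2 bits per clock


def unpack_indexed(word: int, depth: int) -> list[int]:
    """Split one 128-bit fetch into CLUT indices (1/2/4/8 bits per pixel)."""
    if depth not in (1, 2, 4, 8):
        raise ValueError("indexed depths are 1, 2, 4 or 8 bits")
    mask = (1 << depth) - 1
    count = FETCH_BITS // depth
    return [(word >> (FETCH_BITS - depth * (i + 1))) & mask for i in range(count)]


def lookup(word: int, depth: int,
           clut: list[tuple[int, int, int]]) -> list[tuple[int, int, int]]:
    # CLUT look-up: each index selects a 24-bit RGB entry.
    return [clut[i] for i in unpack_indexed(word, depth)]


def expand_16bpp(pixel: int) -> tuple[int, int, int]:
    """x-R5-G5-B5 to 8 bits per channel by replicating the top bits
    (one common choice of 'value expansion'; the gateware may differ)."""
    channels = ((pixel >> 10) & 0x1F, (pixel >> 5) & 0x1F, pixel & 0x1F)
    return tuple((c << 3) | (c >> 2) for c in channels)
```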
Doing a lower-resolution framebuffer on a higher-resolution display (e.g. a 1152x870 image letter/pillar/windowboxed on a 1280x1024 display) should not be very hard, as you just need a second set of signals from the timing hardware to tell the video hardware to output black (or whatever background color); that's basically a larger-scale version of what the hardware cursor already does.

However, doing interpolation would add a fairly complex step: you need a proper 2D view of the picture to be able to interpolate in both directions. The current hardware is completely linear, sending one line after another. It's theoretically doable, but I wouldn't know how to do it in the current design. Doing integer replication of pixels is somewhat easier, but not trivial either: horizontally you need to duplicate at the granularity of the pixel width, which may be 1/2/4/8/16/32 bits, and you also need to duplicate each line. The second (line) can be handled by the address generator, but the first (pixel) would probably need to be handled in the 'video' part.
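The split described above (lines in the address generator, pixels in the 'video' stage) is easy to see in software. This Python sketch, with hypothetical function names and assuming Mac-style packing (leftmost pixel in the most significant bits), duplicates pixels at 1 to 32 bits per pixel and maps display lines back to framebuffer lines; it models the data transformation, not the gateware.

```python
# Integer pixel replication for packed pixels (1/2/4/8/16/32 bits per pixel).
# Assumption: pixels are packed leftmost first, most significant bits first.


def replicate_row(row: bytes, width: int, depth: int, k: int) -> bytes:
    """Repeat each of the first `width` pixels of `row` k times horizontally."""
    if len(row) * 8 < width * depth:
        raise ValueError("row too short for width * depth bits")
    total_bits = len(row) * 8
    value = int.from_bytes(row, "big")
    mask = (1 << depth) - 1
    out = 0
    for x in range(width):
        pixel = (value >> (total_bits - depth * (x + 1))) & mask
        for _ in range(k):  # the hard part in hardware: sub-byte duplication
            out = (out << depth) | pixel
    out_bits = width * k * depth
    out_len = (out_bits + 7) // 8
    out <<= out_len * 8 - out_bits  # pad the last byte on the right
    return out.to_bytes(out_len, "big")


def source_line(display_line: int, k: int) -> int:
    # Vertical replication only changes address generation: fetch line y // k.
    return display_line // k
```

At 8 bits and above each pixel is whole bytes, so doubling is a byte shuffle; at 1/2/4 bits the duplicated pixels straddle byte boundaries, which is why that part is awkward.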

I don't really see a lot of value in interpolation for displaying a Mac desktop. Implementing windowboxing to support lower resolutions would be easier and probably more useful (enabling the use of 'standard' Mac resolutions for specific use cases like games on a higher-resolution LCD).
 

cheesestraws

Well-known member
Agreed. Interpolation is a bad idea. Windowboxing or, for high DPI displays, pixel doubling will produce far nicer looking results.
 

Trash80toHP_Mini

NIGHT STALKER
However doing interpolation would add a fairly complex step - you need a proper 2D-view of the picture to be able to interpolate in both direction. The current hardware is completely linear, sending one line after another.
'Nuf said, I didn't realize the linearity of the process comes into play. You'd need to hold a minimum of three to five lines in memory to have enough information to give each group of four pixels "better" distribution values than you get by doubling a single pixel's data across the four pixels.

I was thinking in terms of sub-pixel, enhanced image generation from a fully static image. We're talking research I did way back when the Q630 was brand new here, dunno where that stands now. At any rate, that's entirely tangential for linear hardware.

Agreed. Interpolation is a bad idea. Windowboxing or, for high DPI displays, pixel doubling will produce far nicer looking results.
Agreed, enlarging the frame buffer in even-numbered increments is the way to go. Though I wasn't talking about tweaking the LCD out of native resolution, the way image degradation occurs running non-native 4:3 resolutions at full screen on the likes of a PowerBook's display. Phosphor latency on a CRT image is forgiving.

That might explain what's likely the illusion of better-than-320x240 resolution for TV images in motion, as pixel-doubled onto the kiddo's nice big Trinitron?
___________________________________________________

Sorry for what turned out to be moot point detouring, just had to ask. Thanks for the detailed info in your reply @Melkhior :)
 

Melkhior

Well-known member
No update in a while; I mostly back-ported the acceleration engine to the SBusFPGA for NetBSD/sparc and expanded it to get some X11 acceleration there. Now I'm trying to figure out how to get that part to work in NetBSD/mac68k as well.
 

Jamieson

Well-known member
I picked up a SPARCstation IPX recently. Where can I find more info about your SBUS FPGA project?
 

Melkhior

Well-known member
I picked up a SPARCstation IPX recently. Where can I find more info about your SBUS FPGA project?
Only the GitHub repository at this time; it was my first project, so the PCB is a bit ugly and it doesn't have a 'proper' video output (I use a custom 'pmod' module to get a 2-bits-per-channel VGA output using only 8 pins). One of the goals of the NuBusFPGA was to test the video output using DVI-in-HDMI, and perhaps do another version of the SBusFPGA with a more capable FPGA board... (with the current availability and cost of chips, there's no ETA). It does have more peripherals than the NuBusFPGA, including an OHCI-compliant USB 1.1 host, a proper micro-SD card slot (bootable), and a DMA-based RAM disk that can be useful for e.g. faster swap. All are supported in NetBSD 9 only, not Solaris or SunOS.
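One plausible reading of "2 bits per channel using only 8 pins", as a small Python sketch: the pin split (six color pins plus hsync and vsync) and the ideal resistor-DAC levels are assumptions, not the documented pmod design. It shows what such an output can express: four levels per channel, 64 colors in total.

```python
# 2-bits-per-channel VGA through an 8-pin pmod.
# Assumption: 6 color pins (2 each for R, G, B) plus hsync and vsync, with a
# resistor DAC turning each 2-bit code into one of four analog levels.

PMOD_PINS = 3 * 2 + 2
VGA_FULL_SCALE_V = 0.7  # nominal VGA video level into 75 ohms


def rgb888_to_rgb222(r: int, g: int, b: int) -> int:
    """Keep the two most significant bits of each channel (64 colors)."""
    return ((r >> 6) << 4) | ((g >> 6) << 2) | (b >> 6)


def dac_level(code: int) -> float:
    """Ideal output voltage for a 2-bit code (0-3) on one channel."""
    return VGA_FULL_SCALE_V * code / 3
```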
So far it's only been tested in a SPARCstation 20; I really should get around to installing NetBSD on my own IPX and testing it there.
Edit: it's somewhat off-topic for 68kMLA; there's a thread on TD that could be useful.
 

Melkhior

Well-known member
@Trash80toHP_Mini Thanks for the link; some nice information, but the issue with the 8-24GC is the (massive) amount of work needed for its internal QD implementation. That's near impossible to replicate. And block transfers aren't really needed if the accelerator and framebuffer are on the same card, and I don't own a NuBus video card to see whether the NuBusFPGA could usefully do block transfers to/from a second board.

On a side note, those PDS connectors have a _lot_ of signals. The FPGA board I have doesn't have enough pins to talk to it [looking at the LCIII PDS, with all the '030 signals and then some] and still do anything useful in terms of I/O, unless I ignore some signals and give up on some potential functionality. Even the FPGA board I'm considering as an 'upgrade' would be a bit tight. Also, the LCIII is _cramped_ and no FPGA daughterboard will fit in the case (the SBusFPGA and NuBusFPGA are both oversized due to the daughterboard, but many machines are big enough to accommodate them anyway).
 

Trash80toHP_Mini

NIGHT STALKER
De nada! Just thought the general info would be useful, some other stuff on that particular card:

Apple TIL: Macintosh Display Cards 824 & 824 GC: Rev A/B Differences

Acceleration on a card? re: missing daughtercard for LaPorta's SuperMac ColorCard/24:

[Image: ColorCard-Accle-Daughtercard-00.JPG]

[Image: GAL-Based-QD Accelerator-00.JPG]

MacOSMonkey

It was a GAL-based accelerator board. Without it, the card is a standard 24-bit 640x480 frame buffer with all of the typical SuperMac features, as shown above. Fun to see the old ad.
Has anyone got that daughtercard? Hi-res pics would be greatly appreciated. Hoping they will yield some clues about the general mechanics of an early implementation of QuickDraw acceleration?

Might dissecting GAL formulas be helpful? If not, sorry about the tangent, M.
 