QuadraFPGA: HDMI for the 68040 PDS slot

eharmon

Well-known member
Since you're hanging off PDS now, have you considered providing the 040 an L2 cache as well?

HDMI and cache acceleration would be hard to say no to. Of course, there's a ton of other possibilities too...

(Relatedly, I'm in for a group buy of connectors if anyone wants to do it).
 

Melkhior

Well-known member
Since you're hanging off PDS now, have you considered providing the 040 an L2 cache as well?
I haven't investigated external caches for the '040. It's theoretically possible, but I'm missing some of the required signals (I didn't have enough pins for /MI.SLOT or /CIOUT for instance). And the amount of memory available internally in the FPGA isn't that big. DDR3 is way too slow (it barely matches the performance of onboard memory when used as memory expansion on the IIsi).

I've looked into adding external SRAMs for data, but for the '030 - and for it a really fast cache implementation is "interesting": there's no time to actually do all the needed lookups in one cycle, so you either have one wait state and do things 'cleanly', or you just assert the termination signal /STERM and, if it turns out the data wasn't in the cache, you force a retry of the bus cycle... ugh! I didn't check whether a complete timing analysis of a zero-wait-state external cache for the 68040 is available (for the 68030 it's available in the datasheets of cache tag chips, they did all the work for the customers! The '030 was really successful).

Were there external caches for the '040 available on PDS slots back in the day?
 

NJRoadfan

Well-known member
Completely silly question but... would this card theoretically work in a 6100 using the PPC 601 PDS-to-040 PDS riser the DOS card came with?
 

Phipli

Well-known member
I have one of those. I've never been able to prove that it's working properly. The differences in benchmark results are marginal to the point that they could be within the margin of error, but I haven't been super methodical in my testing.

Edit: it's also incompatible with my silicon express cards, which I value more highly, so it lives in an antistatic bag in the desk drawer.
If the benchmarks you're running fit in the stock 040 cache, you won't see a performance difference at all. If it mostly fits, you won't see much difference.

Try MacBench 4.0's big productivity benchmark that runs lots of software snippets.
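
To illustrate the point above, here is a rough sketch of a toy two-level cache model. Everything in it is a simplification assumed for illustration: both caches are modelled as fully associative LRU (the real '040 data cache is 4 KB, 4-way set-associative with 16-byte lines), the 128 KB external cache size is hypothetical, and the "benchmark" is just a loop over a buffer. The names `LRUCache` and `l2_hit_fraction` are made up for this example.

```python
from collections import OrderedDict

LINE = 16  # bytes per cache line (the 68040 uses 16-byte lines)


class LRUCache:
    """Fully associative LRU cache tracked at line granularity (a simplification)."""

    def __init__(self, size_bytes):
        self.capacity = size_bytes // LINE
        self.lines = OrderedDict()

    def access(self, addr):
        """Return True on a hit; on a miss, load the line and evict the LRU one if full."""
        tag = addr // LINE
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)
        self.lines[tag] = True
        return False


def l2_hit_fraction(working_set, l1_size=4 * 1024, l2_size=128 * 1024, passes=10):
    """Fraction of all accesses that miss the on-chip cache but hit the external one."""
    l1, l2 = LRUCache(l1_size), LRUCache(l2_size)
    total = l2_hits = 0
    for _ in range(passes):
        for addr in range(0, working_set, LINE):
            total += 1
            # The external cache is only consulted when the on-chip cache misses.
            if not l1.access(addr) and l2.access(addr):
                l2_hits += 1
    return l2_hits / total


if __name__ == "__main__":
    for size_kb in (2, 64):
        print(size_kb, "KB working set:", l2_hit_fraction(size_kb * 1024))
```

Under these assumptions a 2 KB working set never hits the external cache (the on-chip cache serves everything after the first pass), while a 64 KB working set is served by it on every pass after the first.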
 

lobust

Well-known member
If the benchmarks you're running fit in the stock 040 cache, you won't see a performance difference at all. If it mostly fits, you won't see much difference.

Try MacBench 4.0's big productivity benchmark that runs lots of software snippets.

I think I've tried that already, or I might have only downloaded MB when I started playing with the PPC card...

When I get a chance I'll try again and actually document it!
 

cheesestraws

Well-known member
I have one of those. I've never been able to prove that it's working properly. The differences in benchmark results are marginal to the point that they could be within the margin of error, but I haven't been super methodical in my testing.

FWIW, I have a couple of CPU socket cache cards and those seem to make only marginal benchmark differences also - so chances are it is working, it's just not very effective.

(We should stop derailing this thread now - maybe create another to discuss if you like?)
 

Melkhior

Well-known member
Completely silly question but... would this card theoretically work in a 6100 using the PPC 601 PDS-to-040 PDS riser the DOS card came with?
Good question. If that faithfully recreates the behavior of the '040 bus, I don't see why the basic framebuffer wouldn't work.

The acceleration is currently patching 68K QD, so that wouldn't work (it doesn't in the NuBusFPGA in a 7100 for instance, though the framebuffer itself works).

I have one of those. I've never been able to prove that it's working properly. The differences in benchmark results are marginal to the point that they could be within the margin of error, but I haven't been super methodical in my testing.
From some testing I've done, the memory subsystem of my Q650 seems pretty quick - memory wasn't as slow compared to the CPU back then as it is now. So a cache would have to be really fast (basically being as fast as the CPU bus can be) to provide any meaningful benefits.

For a 'new' one, there's modern SRAM that can do that for data (and then some, 10ns async SRAM are somewhat affordable); the question is whether it's possible to create a tag subsystem/SRAM control that is fast enough for a 33 MHz '040.
 

zigzagjoe

Well-known member
Good question. If that faithfully recreates the behavior of the '040 bus, I don't see why the basic framebuffer wouldn't work.

The acceleration is currently patching 68K QD, so that wouldn't work (it doesn't in the NuBusFPGA in a 7100 for instance, though the framebuffer itself works).


From some testing I've done, the memory subsystem of my Q650 seems pretty quick - memory wasn't as slow compared to the CPU back then as it is now. So a cache would have to be really fast (basically being as fast as the CPU bus can be) to provide any meaningful benefits.

For a 'new' one, there's modern SRAM that can do that for data (and then some, 10ns async SRAM are somewhat affordable); the question is whether it's possible to create a tag subsystem/SRAM control that is fast enough for a 33 MHz '040.

With my socketed Carrera 040 for the SE/30, I played around with disabling/enabling the external 128K cache to characterize the benefits. I want to say it's a zero-wait-state cache, but I never actually put a logic analyzer on it to verify.

There's definitely a performance improvement recorded in benchmarks, but it was a lot more situational due to the larger, better internal 68040 caches. Only 3 subtests of System Info reflected anything, and 2 Speedometer tests, though of course this is not necessarily a reflection of real-world impact. I really should break Doom out and test with that, as it is very bus-bandwidth-hungry and loves cache.

That was a 45 MHz 68040 accessing a slow 16 MHz 68030 bus via slow translation logic with a slow memory controller that doesn't support bursting: more or less a worst-case scenario. A Quadra's bus is head and feet faster, so I'd really expect the impact of adding an L2 cache to be minor / not really worthwhile as compared to the 68030, where a zero-wait-state cache has noticeable impacts across the board.
 

Melkhior

Well-known member
A Quadra's bus is head and feet faster, so I'd really expect the impact of adding an L2 cache to be minor / not really worthwhile as compared to the 68030, where a zero-wait-state cache has noticeable impacts across the board.
I tend to agree. But if a zero-wait-state cache could be made with an FPGA/external SRAM combo (or even a small prototype using the FPGA for data as well), it could be built with some level of associativity rather than the pure direct mapping of the era. That might help a bit. And using modern SRAM (and not worrying about BOM cost too much...), it could be a bit larger as well, probably up to 512 KiB (even larger would be prohibitively expensive).
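
As a rough illustration of why associativity might help, here is a small simulation sketch. It assumes LRU replacement, 16-byte lines and a deliberately hostile access pattern (two buffers exactly one cache size apart, touched alternately); `hit_rate` and the pattern are hypothetical and say nothing about real Mac workloads.

```python
def hit_rate(addresses, cache_bytes, ways, line_bytes=16):
    """Simulate a set-associative cache with LRU replacement; ways=1 is direct-mapped."""
    n_sets = cache_bytes // (line_bytes * ways)
    sets = [[] for _ in range(n_sets)]  # each set lists its tags, most recently used last
    hits = 0
    for addr in addresses:
        line = addr // line_bytes
        index, tag = line % n_sets, line // n_sets
        entries = sets[index]
        if tag in entries:
            hits += 1
            entries.remove(tag)      # will be re-appended as most recently used
        elif len(entries) >= ways:
            entries.pop(0)           # set is full: evict the least recently used tag
        entries.append(tag)
    return hits / len(addresses)


CACHE = 512 * 1024  # 512 KiB, the upper size mentioned above

# Two 1 KiB buffers exactly one cache size apart, accessed alternately.
pattern = []
for _ in range(100):
    for offset in range(0, 1024, 16):
        pattern += [offset, offset + CACHE]

if __name__ == "__main__":
    print("direct-mapped:", hit_rate(pattern, CACHE, ways=1))
    print("2-way:        ", hit_rate(pattern, CACHE, ways=2))
```

In this contrived case the direct-mapped cache thrashes (the two buffers keep evicting each other), while the 2-way cache holds both. Real code is rarely this pathological, so the actual gain would likely be much smaller.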

But it's something more interesting to implement for the 68030 indeed.
 

Jockelill

Well-known member
We all love success stories right ;)?

Well, here is my QuadraFPGA running in my Q650 :D. I cannot take much credit other than soldering the through-hole parts, replacing an SMD resistor and flashing the thing :). The rest is pure French magic from the wizard @Melkhior ! Incredibly happy about this!!!

In the machine, the card is really tiny! Gonna work on a little bracket for it also:
040FPGA1.jpg
040FPGA4.jpg
Some benchmarks:
040FPGA3.jpg
My machine is "stock" except for a hacked ROM to add ROM disk and large SIMM support (has 8MB onboard, 2x128MB and another 32MB)

And of course also something "useful". Playing SimCity 2000 in NATIVE 1920x1080 with 32bit color depth!!!

040FPGA3.jpg

The card is overall very snappy, moving windows around is definitely faster than with the NuBusFPGA. Next step (after a necessary recap of my PSU) will be some overclocking experiments on my machine :)

The Quest for the holy grail of ultimate 68k machines lives on!
 


Jockelill

Well-known member
Very cool! How difficult was it to build and is there any special equipment needed?

TLDR: Difficult, expensive and with some major caveats!

The longer story:
The carrier card itself is very straightforward: solder on the headers, solder on the 040 connector, install the FPGA card, done. BUT, and it's a big one, programming the FPGA is a little bit difficult (you need to use either an Intel Mac with Terminal or a Linux machine, plus general knowledge of terminal commands, Java compilation, firmware flashing etc). The reason it will not work on an ARM Mac is that the USB4JAVA library used by ZTEX is not supported on ARM macOS! Took me some time to figure out just that :). Getting the 040 PDS connector is somewhat of a challenge; I've only found them at one place with an MOQ of 1, https://www.questcomp.com/part/4/p50-140s-rr1-tg/301253758 but they are 16$ each.

The 040 connector has a fairly tight pitch of 1.27mm and three overlapping rows (140 pins in total), so it's quite a lot to solder and you need a thin tip.

The major drawback is also the overall cost, just the FPGA is about 250$, connector about 16€ and carrier card about 60-70$ depending on where you get it, so you're looking at something like 350-400$ just to build it (with freight costs and stuff). The FPGA can be reused for all versions, but must be reflashed for every carrier card since the FPGA bitstream differs.

Another drawback would be that only 1920x1080 is supported as a "true resolution"; for every other resolution the picture will be windowed with an increasingly larger black box around it. I'd say down to 1280x1024 is still very much useful, but the classic 640x480 will be very small even on a 27"-32" screen and not very playable for old games. I know @Melkhior has looked very extensively at this, and there might be a fix for it in the future (FPGA experts, where are you??!). On the bright side, there are surprisingly many games and programs that actually do support the full resolution. Both SimCity 2000 and Civilization II will be full screen!! And also Excel, Word, Photoshop etc.

I'd say for now it's very much in a beta phase, but to the point where it still offers a lot of usefulness!! I've been running my Q650 and IIci with FPGAs for quite some time here; I started with the NuBusFPGA in my IIci (also from @Melkhior) and have now switched to the QuadraFPGA for my Q650. I also have the IIsiFPGA, which I will explore some time after the holidays.

Another issue is that the "low-cost" 2.13a model is no longer available. ZTEX replaced it with a 2.12 version, but @Melkhior had some serious issues with it, so it's a big no-go for now (we are waiting for feedback from ZTEX). The carrier will work with all other models of the 2.13, so you can also use the 2.13b, but the price is higher at 229€.

If you still want to build one, feel free to reach out to me on a PM, and I can share all my experience. JLC has an MOQ of 5 boards (whereof 2 must be mounted), but there might be others interested to share the cost.

Compatible machines should be all machines with an 040 PDS, so Q650, Q700, Q800, Q900 and Q950. The Q840AV does not have a PDS, so there you can only use the slower NuBusFPGA. The Q800/650 are very good candidates, or the Q950, since all of them can be safely overclocked to 40 MHz and beyond (@eharmon can tell you all about overclocking :D )

@Melkhior has done some other really cool stuff like overmapping the onboard DDR3 memory from the FPGA with the machine's RAM, so you can increase your Q650 all the way up to 776MB of RAM (with a hacked ROM to support 128MB SIMM modules).

If you "just" want a nice new graphics card to get HDMI from your vintage Mac, I'd honestly recommend getting something like a scaler instead: safer, easier and works "out of the box" (well, sort of...). But IF you like to hack, tweak and tinker, if self-inflicted pain gets you going and you get high on the sensation of partial successes no "normal people" would ever grasp, well, then you're definitely in for a real journey!!
 

Melkhior

Well-known member
Very cool! How difficult was it to build and is there any special equipment needed?
The carrier board itself was completely assembled by JLCPCB, other than the through-hole connectors (two 2x32 2.54 mm pitch headers, and the 140-pin KEL connector). Then you manually solder the 268 TH pins; that's the only manual step. @Jockelill had an extra step, as in the first batch there's a 10k pull-down that's too weak (there's a stronger pull-up on the motherboard), so it must be bypassed with a 0 ohm. The current Git should have the proper value for that resistor.

However, the issues to make one are:

(a) the KEL connectors (here they are actually Robinson-Nugent compatible parts, now made by 3M) are very difficult to source. @Jockelill got a few from an American obsolete-part dealer, which has prohibitive shipping costs if you order from Europe. New ones from 3M have an MOQ of 200 and a list price of about $15 each... If @Jockelill hadn't sent me a couple of connectors, the design wouldn't exist.

(b) the required FPGA board (ZTEX 2.13a) is no longer manufactured, as of last summer (July 2023). There's a replacement (2.12b) and some higher-end alternatives (2.13b, 2.13c, ...), but so far there's no definitive "this will work" solution. And they are about 30-35% more expensive than when I started using them :-(

All other components are in current production and widely available. The level shifters (CB3T family) are the majority of the cost.

Like the others (NuBusFPGA, IIsiFPGA), it's really experimental stuff.
 

Melkhior

Well-known member
Another drawback would be that only 1920x1080 is supported as a "true resolution"
Technically, the bitstream can be generated for other resolutions to match the display (1920x1080 is the maximum the FPGA can handle and so common on cheap HDMI LCDs it's a good default), but that resolution becomes the 'upper bound' and all lower resolutions are as you described window-boxed.
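
A minimal sketch of the window-boxing geometry, assuming the lower-resolution image is simply centred inside the native raster (centring is an assumption here, and `window_box` and `coverage` are hypothetical names, not anything from the actual gateware):

```python
def window_box(native, mode):
    """Return (left, top, right, bottom) black border sizes in pixels for `mode` centred in `native`."""
    native_w, native_h = native
    mode_w, mode_h = mode
    if mode_w > native_w or mode_h > native_h:
        raise ValueError("mode exceeds the resolution the bitstream was generated for")
    left = (native_w - mode_w) // 2
    top = (native_h - mode_h) // 2
    # Any odd leftover pixel goes to the right/bottom border.
    return left, top, native_w - mode_w - left, native_h - mode_h - top


def coverage(native, mode):
    """Fraction of the native raster actually covered by the image."""
    return (mode[0] * mode[1]) / (native[0] * native[1])


if __name__ == "__main__":
    native = (1920, 1080)
    for mode in [(1920, 1080), (1280, 1024), (640, 480)]:
        print(mode, window_box(native, mode), round(coverage(native, mode), 3))
```

For 640x480 inside 1920x1080 this gives 640-pixel borders left and right and 300-pixel borders top and bottom, so the picture covers only about 15% of the panel.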

Changing the display resolution 'for real' requires live reconfiguration of a PLL/MMCM (the bit of hardware responsible for generating the clocks at the appropriate frequency from a reference clock), and that is much more difficult, though theoretically possible.
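
To show why the clock has to change, here is a small worked calculation. It assumes the common CEA/VESA-style 60 Hz timings for total raster size (active pixels plus blanking); the card's actual timings may differ, and `pixel_clock_mhz` is a hypothetical helper.

```python
def pixel_clock_mhz(h_total, v_total, refresh_hz):
    """Pixel clock = total pixels per frame (including blanking) times frames per second."""
    return h_total * v_total * refresh_hz / 1e6


# (h_total, v_total, refresh) for a few standard modes; assumed, not read from the design.
MODES = {
    "1920x1080@60": (2200, 1125, 60.0),
    "1280x720@60": (1650, 750, 60.0),
    "640x480@60": (800, 525, 59.94),
}

if __name__ == "__main__":
    for name, timing in MODES.items():
        print(f"{name}: {pixel_clock_mhz(*timing):.3f} MHz")
```

Each mode needs a different pixel clock (roughly 148.5, 74.25 and 25.17 MHz with these timings), which is why a real mode switch means reprogramming the clock generator at run time rather than just changing the framebuffer size.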

@Melkhior has done some other really cool stuff like overmapping the onboard DDR3 memory from the FPGA with the machine's RAM, so you can increase your Q650 all the way up to 776MB of RAM (with a hacked ROM to support 128MB SIMM modules).
Not yet! That works in the IIsi, but it's not supported on the Quadra. The hardware works, but the ROM needs work - and memory setup is a lot more complex on the Quadra than on the IIsi :-( But I do have a Q650 with a ROM slot now ;-), so it's on the TODO list.
 

demik

Well-known member
Another drawback would be that only 1920x1080 is supported as a "true resolution"; for every other resolution the picture will be windowed with an increasingly larger black box around it. I'd say down to 1280x1024 is still very much useful, but the classic 640x480 will be very small even on a 27"-32" screen and not very playable for old games. I know @Melkhior has looked very extensively at this, and there might be a fix for it in the future (FPGA experts, where are you??!).
I would argue that if @Melkhior thinks it's difficult, then it's really difficult
Changing the display resolution 'for real' requires live reconfiguration of a PLL/MMCM (the bit of hardware responsible for generating the clocks at the appropriate frequency from a reference clock), and that is much more difficult, though theoretically possible.
From my limited tinkering / brainstorming with this, pixel multiple resolution doesn't require a reconfiguration of PLL and friends, but a different framebuffer memory reading algorithm.

Unless I'm missing something, in this specific case the only thing that would work is qHD (960x540).
Smaller is useless. It's sort of doable by bit-shifting address lines (ignoring the LSB) or similar, which should give a 2x upscaled 540p.

nHD (640x360) is also kinda doable but more complex already.
Anything other than integer scaling will probably require DSPs and will only provide blurry output anyway…

Other stuff would give blacked vertical or horizontal lines, which is what MacOS is doing anyway.
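
The integer-scaling idea can be sketched in software as an address mapping. This is only a model of the concept, assuming a linear framebuffer indexed in pixels; `source_index` is a hypothetical name and real gateware would do this with counters or address wiring rather than Python.

```python
def source_index(x, y, scale, src_width):
    """Map output pixel (x, y) to a linear pixel index in a framebuffer `scale` times smaller."""
    if scale & (scale - 1) == 0:
        # Power-of-two scale: dropping low bits is enough (2x ignores the LSB).
        shift = scale.bit_length() - 1
        sx, sy = x >> shift, y >> shift
    else:
        # Other integer scales (e.g. 3x for 640x360) need a divide or a repeat counter.
        sx, sy = x // scale, y // scale
    return sy * src_width + sx


if __name__ == "__main__":
    # 1920x1080 output fed from a 960x540 (qHD) framebuffer: each source pixel covers 2x2.
    print([source_index(x, 0, 2, 960) for x in range(6)])
    # 1920x1080 output fed from a 640x360 (nHD) framebuffer: each source pixel covers 3x3.
    print([source_index(x, 0, 3, 640) for x in range(6)])
```

The 2x case only discards one bit of each coordinate, which matches the "ignore LSB" suggestion; the 3x case needs real division or a count-to-three repeat, which is why it is already more complex.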
 

Jockelill

Well-known member
I would argue that if @Melkhior thinks it's difficult, then it's really difficult

From my limited tinkering / brainstorming with this, pixel multiple resolution doesn't require a reconfiguration of PLL and friends, but a different framebuffer memory reading algorithm.

Unless I'm missing something, in this specific case the only thing that would work is qHD (960x540).
Smaller is useless. It's sort of doable by bit-shifting address lines (ignoring the LSB) or similar, which should give a 2x upscaled 540p.

nHD (640x360) is also kinda doable but more complex already.
Anything other than integer scaling will probably require DSPs and will only provide blurry output anyway…

Other stuff would give blacked vertical or horizontal lines, which is what MacOS is doing anyway.
qHD would be nice! That in combination with window boxed 640x480 would be extremely useful for playing old adventure games :)

And I would 100% agree with your argumentation! :D :D
 