
Development of Nubus graphics card outputting to HDMI?

Trash80toHP_Mini

NIGHT STALKER
Love that diagram! A good diagram is worth a thousand pictures. :approve:

Went back to the basics of documentation assembly last night. Developing For The Macintosh NuBus-CERN-CM-P00062891 lists Inside Macintosh Volume V as indispensable, so I went ahead and bought a softcover copy when I couldn't search it out in .PDF form. Took a good look at it last night and I can see why it's on that list. From a quick skim, the chapter on Slot Manager appears chock full of nuggets we'll be needing to flesh out the incomplete design example in DCaDftMF. Finding an expanded article on Slot Manager in the Developer CD series would be a gold mine.

 

Gorgonops

Moderator
Staff member
just programming your display memory address generation to do the necessary crazy-stride through RAM on the output side
... it actually just occurred to me that there's an alternate, 4th-dimensional way to handle this that wouldn't change the scan on the *output* side: do address translation on the input side. IE, you'll need to have a read/write buffer on the Mac side anyway (even if you did 16 bit PDS instead of Nubus you're still going to have the SDRAM controller and the video arbitration in the way), so to handle this you could implement address translation circuitry so each read/write to display memory is remapped to the correct location in the framebuffer.

The advantage of this approach is it keeps the output RAM bandwidth requirements from changing; they stay the same for every mode whether it's portrait or landscape. But it has some pretty crazy side effects on the Mac side. The worst case I can come up with is actually 1-bit B&W mode: for every 32 bit word (32 pixels) written/read by the Mac you have to read/update a single bit out of 32 different memory locations. The read part also applies on the outside, but that shouldn't be a problem if you build the hardware to handle true color, because in that mode you'll have to read two 16 bit words for every pixel anyway(*); 1 bit color rotated "only" requires the same number of reads as 16 bit color, IE, a word per pixel. But if you do it on the inside it would add a lot of latency to every read/write operation. Then again, the Mac side is going to be so slow compared to what the hardware must handle for the video output that maybe it wouldn't necessarily be a problem?
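To make the 1-bit worst case concrete, here's a rough sketch of the "inside" remap, assuming a hypothetical 768x1024 portrait framebuffer (as the Mac sees it) stored in the monitor's native landscape scan order, and one of the two possible 90 degree rotation directions. The helper names (`logical_to_physical`, `physical_bit_offset`, `words_touched_by_mac_write`) are made up for illustration. The point it shows: one Mac longword write of 32 one-bit pixels lands in 32 different physical 32-bit words.

```python
LOGICAL_W, LOGICAL_H = 768, 1024   # portrait framebuffer as the Mac sees it (hypothetical size)
PHYS_W = LOGICAL_H                 # stored in the monitor's native landscape scan order
BPP = 1                            # 1-bit B&W, the worst case from the post
WORD_BITS = 32                     # one Mac longword

def logical_to_physical(x, y):
    """Rotate a Mac-side pixel 90 degrees (one of two possible directions, assumed)."""
    return y, LOGICAL_W - 1 - x

def physical_bit_offset(x, y):
    """Bit offset of logical pixel (x, y) in the rotated physical framebuffer."""
    px, py = logical_to_physical(x, y)
    return (py * PHYS_W + px) * BPP        # the multiply the translation hardware needs

def words_touched_by_mac_write(word_index):
    """Physical 32-bit words that must be read-modify-written for one Mac longword write."""
    pixels_per_word = WORD_BITS // BPP
    words_per_row = LOGICAL_W * BPP // WORD_BITS
    y, word_in_row = divmod(word_index, words_per_row)
    first_x = word_in_row * pixels_per_word
    return {physical_bit_offset(first_x + i, y) // WORD_BITS
            for i in range(pixels_per_word)}

print(len(words_touched_by_mac_write(0)))   # 32 distinct physical words
```

At 32 bits per pixel the same function would report a single word per Mac write, which is why the pain is concentrated in the low bit depths.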

(*) Note what I said here means I was probably wrong when I said this:

(if you have 16 bit wide RAM like the FPGA board has then 32 bit color in portrait needs twice as many reads per pixel as true color landscape.)
Not sure where that came from; either way you're still reading two contiguous 16 bit words for each pixel. I think I must have confuzzled that with the fact that the transaction bandwidth requirements *do* go up in portrait, compared to landscape, for every mode below 16 bit color, because for all of them you'll need to do one fetch per pixel and extract the bits you need, versus getting multiple pixels for each read. The one gotchya I wonder about: since we're dealing with SDRAM, is it faster reading from linear addresses versus random ones? (My eyes glaze over trying to remember whatever I ever knew about "page mode" and such.)
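A quick way to tabulate that fetch-count argument, assuming the 16 bit wide RAM on the FPGA board and ignoring everything but data width (no burst or page effects). `fetches_per_pixel` is a hypothetical helper, not anything from the VA2000 sources.

```python
from fractions import Fraction

RAM_WORD_BITS = 16   # data width of the FPGA board's SDRAM, per the thread

def fetches_per_pixel(bpp, rotated):
    """RAM word fetches needed per output pixel.

    Landscape scan: one fetch yields 16/bpp neighbouring pixels.
    Rotated scan: screen neighbours aren't RAM neighbours, so depths below
    16 bpp cost a whole fetch per pixel; 16 and 32 bpp are unaffected."""
    if rotated and bpp < RAM_WORD_BITS:
        return Fraction(1)
    return Fraction(bpp, RAM_WORD_BITS)

for bpp in (1, 2, 4, 8, 16, 32):
    print(bpp, fetches_per_pixel(bpp, rotated=False), fetches_per_pixel(bpp, rotated=True))
```

The 32 bpp row comes out at 2 in both orientations, which matches the correction above: true color never gets worse in portrait, only the packed modes do.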

 

Trash80toHP_Mini

NIGHT STALKER
Why aren't you doing the pixel mombo within the FPGA between input and output sides, or are you? I wouldn't think that could incur any noticeable impact on the process in terms of speed penalty? 

 

Gorgonops

Moderator
Staff member
Why aren't you doing the pixel mombo within the FPGA between input and output sides, or are you? I wouldn't think that could incur any noticeable impact on the process in terms of speed penalty? 
Doing the mombo within the FPGA is exactly what I was talking about; my musing is whether it makes more sense to do it on the "inside" (IE, read/write transactions from the Mac get remapped, so though the framebuffer continues to look linear to the Mac it's actually drastically non-linear in memory) or on the "outside" (the algorithm for generating RAM addresses for pixel reads hops all over the framebuffer instead of just reading linear memory addresses and shoving them to the output).
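For comparison, the "outside" version might look like this sketch: the framebuffer stays linear in the Mac's portrait order and the scan-out address generator does the hopping. Same hypothetical 768x1024 portrait geometry and rotation direction as assumed above, 8 bits per pixel so each pixel is one byte (lower depths would also need bit extraction). `scanline_byte_addresses` is an illustrative name only.

```python
LOGICAL_W, LOGICAL_H = 768, 1024   # portrait framebuffer as the Mac sees it (hypothetical)
BPP = 8

def scanline_byte_addresses(oy):
    """Byte addresses the output side fetches for output scanline oy.

    The monitor scans in its native landscape order (LOGICAL_H pixels per line),
    while the framebuffer stays linear in the Mac's portrait order."""
    row_bytes = LOGICAL_W * BPP // 8
    x = LOGICAL_W - 1 - oy          # every pixel on this output line comes from one logical column
    for ox in range(LOGICAL_H):
        y = ox                      # ...and successive pixels walk down the logical rows
        yield y * row_bytes + (x * BPP) // 8

addrs = list(scanline_byte_addresses(0))
print(addrs[:3])   # [767, 1535, 2303]: a full row_bytes stride between neighbours
```

The address math is the same multiply-and-add either way; what changes is whether it runs once per Mac access or once per output pixel.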

All you really need to do either is some fast hardware multipliers to essentially do a "matrix translation". The advantage of doing the mapping on the inside is you only need to use the multipliers when the Mac read/writes to the framebuffer, so in theory at least if it's "slow" it might be easier to absorb. (Do the math and you'll see how staggeringly high the RAM refresh bandwidth is for a megapixel screen; in True Color it's 240 megabytes a second. No Mac under discussion here, even a Quadra 950 with PDS, can touch that sort of speed across its bus. So, again, *if* the translation is expensive it might make sense to not have to do it on the output side. My guess is, though, that the FPGA can probably deal with it either way.)

 

dlv

Active member
Doing the mombo within the FPGA is exactly what I was talking about; my musing is whether it makes more sense to do it on the "inside" (IE, read/write transactions from the Mac get remapped so though the framebuffer continues to look linear to the Mac it's actually drastically non-linear in memory) or on the "outside" (the algorithm for generating RAM addresses for pixel reads hops all over the framebuffer instead of just reading linear memory addresses and shoving them to the output.)
My guess is since the FPGA is going to spend the majority of its time reading from the framebuffer and pushing pixels, that needs to be really fast or at least take priority. One way to make it fast is to use techniques like burst reads from RAM, which we may not be able to take advantage of if we're constantly skipping across memory. Or at least not without a lot of added complexity. There are likely other timing considerations, e.g., DRAM bank refreshes.
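A crude illustration of why skipping across memory hurts bursts: count how often a new SDRAM row has to be opened while fetching one 1024-pixel scanline at 16 bit color. The 1024-byte row size (512 columns x 16 bits) and the 768-pixel portrait stride are assumptions, and the model ignores SDRAM's multiple banks, so treat the numbers as directional only.

```python
SDRAM_ROW_BYTES = 1024                 # assumed page size: 512 columns x 16 bits
BYTES_PER_PIXEL = 2                    # 16 bit color
FB_ROW_BYTES = 768 * BYTES_PER_PIXEL   # portrait framebuffer stride (hypothetical)

def row_activations(addresses):
    """Count how often a new SDRAM row must be opened (single bank, banks ignored)."""
    opens, current = 0, None
    for addr in addresses:
        row = addr // SDRAM_ROW_BYTES
        if row != current:
            opens, current = opens + 1, row
    return opens

linear = [i * BYTES_PER_PIXEL for i in range(1024)]   # landscape scanline
rotated = [i * FB_ROW_BYTES for i in range(1024)]     # rotated scanline
print(row_activations(linear), row_activations(rotated))   # 2 1024
```

Since the rotated stride is larger than a row, every single pixel pays the row-open penalty, which is roughly the "page mode" question from earlier in the thread.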

All you really need to do either is some fast hardware multipliers to essentially do a "matrix translation". The advantage of doing the mapping on the inside is you only need to use the multipliers when the Mac read/writes to the framebuffer, so in theory at least if it's "slow" it might be easier to absorb.
Right. I think with the introduction of FIFO buffers, the matrix transform might even be effectively transparent.
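A toy model of that idea, with everything hypothetical (abstract tick units, FIFO depth, the `simulate_stalls` helper): the Mac posts writes into a FIFO and only stalls when it's full, while the address translator drains one entry at its own pace. As long as the translator drains faster on average than the Mac can write, the Mac never sees a wait.

```python
from collections import deque

def simulate_stalls(ticks, mac_gap, drain_gap, depth):
    """Count ticks the Mac spends stalled on a full posted-write FIFO.

    mac_gap: ticks between Mac writes; drain_gap: ticks between translated writes.
    All units and numbers are illustrative, not measured NuBus or SDRAM timings."""
    fifo = deque()
    stalls = 0
    next_write = 0
    for t in range(ticks):
        if fifo and t % drain_gap == 0:
            fifo.popleft()              # translator remaps one write into SDRAM
        if t >= next_write:
            if len(fifo) < depth:
                fifo.append(t)          # write is posted; the Mac moves on
                next_write = t + mac_gap
            else:
                stalls += 1             # FIFO full: the Mac sees wait states
    return stalls

print(simulate_stalls(1000, mac_gap=10, drain_gap=4, depth=4))   # 0
```

Flip the ratio (for example `mac_gap=2, drain_gap=5`) and stalls appear, so the FIFO only hides latency, not a translator that's slower than the bus on average.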

(Do the math and you'll see how staggeringly high the RAM refresh bandwidth is for a megapixel screen; in True Color it's 240 megabytes a second. No Mac under discussion here, even a Quadra 950 with PDS, can touch that sort of speed across its bus. So, again, *if* the translation is expensive it might make sense to not have to do it on the output side. My guess is, though, that the FPGA can probably deal with it either way.)
Can you elaborate on this? Where are you getting that number (240MB/sec) from?

 

Gorgonops

Moderator
Staff member
Can you elaborate on this? Where are you getting that number (240MB/sec) from?
Assuming a "megapixel" display (1024x1024... which of course is a weird square shape; for a real world resolution think, I dunno, 1280x800 or 1152x870, those are both about a million pixels) in 32 bits (standard Apple display cards use 32 bits for 24 bit color because it's a lot easier to have a word per pixel than the alternative of 1.33 pixels per word, which means ugly math), that's four times a million, aka 4MB, that has to be read for every frame. Assuming 60 frames per second, that's where we get 240MB/s. (The real streaming speed is actually a little higher because there's some blanking time on the borders and between frames, so that 4MB per frame is actually going to need to be supplied in a bit under 1/60th of a second.)

It is kind of a scary number, isn't it? 1920x1080 is almost exactly two megapixels, @32 bit that's half a gigabyte per second. (!)
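The same back-of-the-envelope math as a few lines of Python, assuming 4 bytes per pixel and 60 Hz, and ignoring blanking (so these are lower bounds). Printed in MiB/s, the 1024x1024 case comes out at exactly 240, which is about 252 million bytes per second.

```python
def scanout_bytes_per_second(width, height, bytes_per_pixel=4, refresh_hz=60):
    """Bytes/s needed just to read the visible framebuffer once per frame.

    Blanking is ignored, so the real peak rate during active video is somewhat higher."""
    return width * height * bytes_per_pixel * refresh_hz

for w, h in [(1024, 1024), (1152, 870), (1280, 800), (1920, 1080)]:
    rate = scanout_bytes_per_second(w, h)
    print(f"{w}x{h} @ 32 bpp, 60 Hz: {rate / 2**20:.0f} MiB/s")
```

1920x1080 lands just under 500 million bytes per second, hence "half a gigabyte".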

This is why I started thinking that if you wanted to do some sort of transform then, yeah, you give yourself a lot more breathing room if you do it on the Mac side. Nubus is *theoretically* capable of bursting up to 40MB/s, but that's never going to happen in a Macintosh. Even if you were on PDS in a 68040 Mac doing a block transfer from RAM, I suspect you'd have trouble sustaining rates even substantially south of 100MB/s. That's a guess, but a semi-educated one.

(* edit: Some HDMI displays might be happy taking 30FPS, but I'm not sure I'd rely on it for a computer display?)

 

Gorgonops

Moderator
Staff member
Even if you were on PDS in a 68040 Mac doing a block transfer from RAM, I suspect you'd have trouble sustaining rates even substantially south of 100MB/s. That's a guess, but a semi-educated one.
Here's a link to a Motorola 68040 manual:

http://www.bitsavers.org/components/motorola/68000/MC68040_Designers_Handbook_1990.pdf

Chapter 9 has memory access time calculations. They give me a headache, but my rough interpretation is that the normal transfer modes would allow a 33MHz 68040 to transfer to and from onboard cache at around... 66MB/s? Burst mode would peak around 106MB/s? (The manual says this would require 20ns-ish SRAM, which is obviously much faster than the 60ns FPM you have for main memory in a Quadra.) So, yeah, totally, bus speed is piddlingly slow compared to what needs to stream out the HDMI port.
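Those two figures are consistent with simple clocks-per-transfer arithmetic. The sketch below assumes a 2-clock minimum for a single longword and a 2-1-1-1 (5-clock) burst for a 16-byte line; those cycle counts are my assumption, not numbers quoted from Chapter 9, so check them against the handbook. The NuBus line is just the 10 MHz x 4 byte ceiling behind the "40MB/s" figure, which real transactions never reach.

```python
CLOCK_HZ = 33_000_000   # 33 MHz 68040 bus clock, as in the post

def bus_mb_per_s(bytes_per_transfer, clocks_per_transfer, clock_hz=CLOCK_HZ):
    """Peak bus throughput in millions of bytes per second."""
    return clock_hz * bytes_per_transfer / clocks_per_transfer / 1e6

single = bus_mb_per_s(4, 2)                         # one longword, assumed 2-clock cycle
burst = bus_mb_per_s(16, 5)                         # one 16-byte line, assumed 2-1-1-1
nubus_ceiling = bus_mb_per_s(4, 1, clock_hz=10_000_000)   # theoretical, never achieved
print(single, round(burst, 1), nubus_ceiling)
```

Even the most optimistic of these is well under half of the roughly 250 million bytes per second a megapixel true color scan-out needs.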

Color me more and more impressed at what the video systems inside $5 Raspberry Pi Zeros are capable of doing with *shared memory*. Jeeze.

 

dlv

Active member
It is kind of a scary number, isn't it?
A little bit...

After a quick Google search, it appears the Spartan 6 doesn't quite support 1080p at 60Hz. And in fact, I see now the VA2000 lists 1080p @ 30Hz support as experimental. This is not a show stopper, of course. There is still plenty to learn. 

It appears Lukas is developing a successor to the VA2000 called the ZZ9000, which so far appears to be a carrier board for the MYiR Xilinx Zynq-7020 FPGA development board, with a lot of interesting features including full 1080p support, but release of schematics and sources is pending. This is one of the dev boards I was originally considering but my desire to stay close to the VA2000 project, plus its significantly higher price ($120) led me away from it.

 

Gorgonops

Moderator
Staff member
Yeah.

Anyway, forget all the crazy stuff for now, definitely. I'd say pick a nice, easy target that gives you plenty of headroom slop like 480p in a few color depths (say 1 bit, 256 color indexed, and high/true color) and concentrate on the problems that *need* to be solved before creating new ones. ;)
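To put numbers on "plenty of headroom", here's a sketch for a 640x480@60 target, assuming standard VGA totals of 800x525 (blanking included). That gives a 25.2 MHz pixel clock; the usual nominal figure is 25.175 MHz at 59.94 Hz. "True color" here means Apple-style 32 bits per pixel.

```python
WIDTH, HEIGHT, REFRESH_HZ = 640, 480, 60   # assumed VGA-style 640x480@60 target
H_TOTAL, V_TOTAL = 800, 525                # standard VGA totals including blanking
PIXEL_CLOCK_HZ = H_TOTAL * V_TOTAL * REFRESH_HZ   # 25.2 MHz

def framebuffer_bytes(bpp):
    """Visible framebuffer size for a packed-pixel mode."""
    return WIDTH * HEIGHT * bpp // 8

for bpp in (1, 8, 32):   # 1 bit, 256 color indexed, true color
    size = framebuffer_bytes(bpp)
    print(f"{bpp:2d} bpp: {size:8d} bytes, {size * REFRESH_HZ / 1e6:5.1f} MB/s")
```

Even true color at 480p needs well under 100 MB/s of scan-out, against the roughly 250 MB/s a megapixel screen would demand.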

 

Trash80toHP_Mini

NIGHT STALKER
Anyway, forget all the crazy stuff for now, definitely.
Very sensible. How about adding 8-bit grayscale to your proposed spec? Would that be a baby step toward the CLUT-ridden mess of 8-bit gaming? Out of curiosity, at what point did games begin to require a color monitor?

 

Gorgonops

Moderator
Staff member
I *think* grayscale on the Mac is just implemented by loading the CLUT table with all gray values, there isn't a direct gray only mode where the pixel value is just directly fed into the DAC.

 

Gorgonops

Moderator
Staff member
... or, maybe more precisely, I think loading the CLUT with grays is how cards like the Toby do it. It's certainly possible there exists "direct-map" gray cards, I haven't remotely digested the information in the Apple docs sufficiently to be confident in saying you *have* to have a CLUT for grayscale. (It does make some sense to have one, though, if you're supporting 16 or fewer grays and want to be able to tweak the distribution for anti-aliasing.)
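"Loading the CLUT with grays" amounts to something like this sketch: build a table where each index maps to equal red, green and blue values. It assumes an 8-bit-per-channel DAC (Color QuickDraw's own RGBColor uses 16-bit components) and the classic Mac ordering of index 0 = white, last index = black. `gray_clut` and its `gamma` knob are hypothetical, just to show where a non-linear distribution of grays would plug in.

```python
def gray_clut(entries, gamma=1.0):
    """Return (r, g, b) triples, 8 bits per channel, for an indexed gray mode.

    Index 0 is white and the last index is black, following the classic Mac
    convention. gamma is a hypothetical knob for redistributing the levels;
    1.0 gives evenly spaced grays."""
    table = []
    for i in range(entries):
        level = 1.0 - i / (entries - 1)          # 1.0 (white) down to 0.0 (black)
        v = round(255 * level ** (1.0 / gamma))
        table.append((v, v, v))                   # same value on every channel = gray
    return table

sixteen_grays = gray_clut(16)
print(sixteen_grays[0], sixteen_grays[-1])   # (255, 255, 255) (0, 0, 0)
```

A direct-map gray card would skip the table and feed the pixel value straight to the DAC; with a table, the driver can reshape the ramp without touching the hardware.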

 

Trash80toHP_Mini

NIGHT STALKER
LOL! Just posted another of my morning musings in a new hacks thread, definitely inspired by you, dlv. I think such may be built right into QuickDraw. If it works on non-color Macs I'd think it might apply sans CLUT? Dunno, not awake enough to research further.

http://lowendmac.com/2013/scuzzygraph-and-scuzzygraph-ii/

CLUT might be built in the box though, but worth mentioning? Eight colors/grays ought not need CLUT?

 

bigmessowires

Well-known member
Cool idea, I'll be following this project! 

I'm a firm believer that you need to crawl before you can walk before you can run. I like the idea of targeting the simplest design possible first, just to get SOMETHING working, even if it's slow or limited. You can always make improvements later. If the initial goals are too lofty, it's easy to get bogged down in complexity, get discouraged, and give up without anything to show for it. At least that's what happens with half my projects. :)  

 

Gorgonops

Moderator
Staff member
I think such may be built right into QuickDraw. If it works on non-color Macs I'd think it might apply sans CLUT? Dunno, not awake enough to research further.
The ScuzzyGraph is not a standard graphics card by *any* measure. Most to the point, it's not compatible with Color Quickdraw. As has been discussed in previous threads it's basically more correct to think of that device as a kind of "live print preview" than a normal display. (The "8 colors" it supports are the named colors that B&W Quickdraw had knowledge of in order to print to an ImageWriter printer with a color ribbon.) In the context of designing a Nubus (or PDS card for anything other than *possibly* an SE or Portable) it's the very definition of non sequitur. That said, sure, it probably does *not* have a CLUT because, well, the only colors it supports are those 8 fixed ones, making a CLUT kind of unnecessary.

Now with that all out of the way: Obviously any Nubus card that supports a color monitor is going to have a CLUT. Unless you can think of a card that *only* supports 16 or 24 bit color modes, with no possibility whatsoever to fall back to 256 or less, I'm going to feel comfortable standing by that. The "Gray Area" I don't know about is the few specifically-targeted-at-grayscale-only-monitors cards like the "Macintosh II Portrait Video Card" or the "Macintosh Two-Page Monochrome Video Card". Both of these cards support 1, 2, and 4 bit grayscale *only* on a single type of fixed-frequency monitor. Obviously they're not going to have a "CLUT" per se because they only have a single output channel (and therefore no "C"); what I don't know is if there's still a knob to populate a "LUT" in the path between the RAM output and the DAC to allow the grayscale palette to be tweaked. (This was a thing that at least some systems with only a "few" grays, say 4 bit, did. The reason is that the human eye has a non-linear response curve to brightness that makes it easier to distinguish between light shades of gray than dark ones. Therefore if you're doing anti-aliasing it can be useful to adjust the palette appropriately.) Maybe Apple didn't include that refinement, in which case, sure, you won't need a routine in your driver to load a CLUT or anything like that, I guess.

That said, I'm not sure there's a lot of point here. A 1-bit mono mode also lacks a CLUT, so shouldn't that be adequate to accomplish the "Hey, I made a CLUT-less card" milestone? Also note that when I called out the missing CLUT code in the manual it wasn't because I was thinking that loading a CLUT table in and of itself will be an impossible thing to accomplish; I was just generally commenting on disappointment that more concrete examples on how to communicate with the hardware registers *period* weren't in the manual.

 

Trash80toHP_Mini

NIGHT STALKER
The ScuzzyGraph is not a standard graphics card by *any* measure. Most to the point, it's not compatible with Color Quickdraw.
Either I didn't get the notion across or my head hadn't been de-muddled enough yet to even try. I think I was leading up to the application of native QuickDraw routines the ScuzzyGraph made use of, not using anything about the ScuzzyGraph hardware. More making a suggestion to ransack its drivers for hidden treasure, or at least follow their lead a bit in research? Color QuickDraw would be an extended version of QuickDraw and should(?) still support those rudimentary grayscale/color routines spun into its makeup in support of the printers of yore, no?

Implementing CLUT free grayscale of QuickDraw alongside B&W in the first run might be worth exploring?

 

Gorgonops

Moderator
Staff member
Color QuickDraw would be an extended version of QuickDraw and should(?) still support those rudimentary grayscale/color routines spun into its makeup in support of the printers of yore, no?
I have no doubt that Color Quickdraw still has backwards compatibility hooks to that 8-color syntax to avoid breaking anything, but... *sigh*

This is a thing that's been talked to death several times before. First off, there *is* no grayscale there; remember you kept thinking for the longest time that the Macintosh SE somehow was secretly grayscale because of some confuddlement with the fact that this 8-color thing existed in QuickDraw and therefore that somehow meant that if you could turn the right screw grayscale would come out of an SE despite all evidence to the contrary? This link:

http://mirror.informatimago.com/next/developer.apple.com/documentation/mac/QuickDraw/QuickDraw-59.html#MARKER-9-65

Explains the "colors" in original Quickdraw. (And, yes, they still exist as a backwards compatible construct in Color Quickdraw.) Here are the colors it supports:

The basic QuickDraw color values consist of 1 bit for normal black-and-white drawing (black on white), 1 bit for inverted black-and-white drawing (white on black), 3 bits for the additive primary colors (red, green, blue) used in video display, and 4 bits for the subtractive primary colors (cyan, magenta, yellow, black) used in printing. QuickDraw includes a set of predefined constants for those standard colors:

CONST
   whiteColor   = 30;
   blackColor   = 33;
   yellowColor  = 69;
   magentaColor = 137;
   redColor     = 205;
   cyanColor    = 273;
   greenColor   = 341;
   blueColor    = 409;



These are the only colors available in basic QuickDraw (or with Color QuickDraw drawing into a basic graphics port). When you specify these colors on a Macintosh computer with Color QuickDraw, Color QuickDraw draws these colors if the user has set the screen to a color mode.
Notice, explicitly, that there is no "grayscale" here. Period. Essentially this syntax was designed to give a programmer the syntax to represent color as a sort of spot-color separation which on a B&W macintosh would be dithered but, yes, when real color Macintoshes came along would magically show up in color. (Yes, it was actually possible to write software that would appear in color on the Macintosh II before the Macintosh II ever existed.) And the fact that the ScuzzyGraph actually worked with at least some software indicates that the use of  "basic" graphics ports for drawing screen entities continued for at least a while, no doubt because:

There are three advantages to using basic QuickDraw's color system:

  • It is available across all platforms, so you don't have to check for the presence of Color QuickDraw.
  • It is much simpler to use than Color QuickDraw.
  • It works well on an ImageWriter printer with a color ribbon.


But note, seriously, the word "Rudimentary" is used multiple times in this chapter to describe its level of functionality. This is not "real" color and it is not grayscale at all.

I simply do not know if it's possible to craft the necessary declaration ROM resource structures to indicate that you've plugged into a Color Quickdraw-equipped Macintosh a video card that's only capable of displaying "basic" Quickdraw objects. I will bet a shiny vintage nickel that no card like this ever existed. Also the Scuzzygraph distinctly was *not* a Nubus card and did *not* have a Declaration ROM so... seriously, I'm having a really hard time understanding why you think it would be relevant here.

 

bigmessowires

Well-known member
Some questions from the uninitiated - what is the basic theory of operation for an unaccelerated NuBus video card? Does the Macintosh ROM already have routines that do pixel-by-pixel line and polygon and text drawing, and it just writes those pixels into a frame buffer whose address is mapped to the NuBus card's address space? NuBus is different from a processor direct slot or hardware mapped directly into memory space, so I guess there must be more to it than that.

Once you have a frame buffer in memory on the card, outputting a VGA signal from that is pretty easy and there are a million examples. I don't know anything about HDMI, but I'm guessing that's also a solved problem and it's not anything Mac specific. So basically the challenge is to build a card that sits on the NuBus and speaks NuBus protocol and that presents a region of memory to the CPU to be used as a frame buffer? What kind of device driver software is needed?

Is it certain that an FPGA is needed here at all, or could a fast enough microcontroller do the job? I see that NuBus is only 10MHz while inexpensive microcontrollers can run at 120MHz+, so it might be possible to handle bus traffic in software. Bbraun did this for his Macintosh SE video card, which is some flavor of STM32 microcontroller connected to the SE PDS, no FPGA needed.

 

Gorgonops

Moderator
Staff member
what is the basic theory of operation for an unaccelerated NuBus video card? Does the Macintosh ROM already have routines that do pixel-by-pixel line and polygon and text drawing, and it just writes those pixels into a frame buffer whose address is mapped to the NuBus card's address space?
Yes. This was chewed over some before, but the basics are that to Quickdraw all framebuffers are linear, with packed-pixel chunky (not planar) color space, and essentially all you do is point Quickdraw to where it lives and what its dimensions are and it goes to town.
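"Linear, packed-pixel chunky" boils down to one formula for where a pixel lives. This sketch assumes the Mac's big-endian packing, where the leftmost pixel sits in the high-order bits of a byte. `baseAddr` and `rowBytes` are real PixMap fields; `pixel_location` itself is a hypothetical helper and the sample addresses are made up.

```python
def pixel_location(base_addr, row_bytes, x, y, bpp):
    """Return (byte address, bit shift) of pixel (x, y) in a packed-pixel framebuffer.

    base_addr/row_bytes play the role of a PixMap's baseAddr/rowBytes.
    Pixels are chunky and big-endian: the leftmost pixel sits in the high bits."""
    bit_offset = x * bpp
    addr = base_addr + y * row_bytes + bit_offset // 8
    if bpp >= 8:
        return addr, 0
    shift = 8 - bpp - (bit_offset % 8)   # e.g. 1 bpp: x=0 -> bit 7, x=7 -> bit 0
    return addr, shift

print(pixel_location(0x1000, 80, 0, 1, 1))   # (4176, 7): row 1, leftmost pixel, MSB
```

Note that `row_bytes` can be larger than width times depth; QuickDraw only cares that the driver reports it correctly.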

NuBus is different from a processor direct slot or hardware mapped directly into memory space, so I guess there must be more to it than that.
Not really. NuBus is still just a bus, memory attached to NuBus is mapped to specific locations in the CPU's memory map, and the CPU has direct access to it. Nubus has arbitration and all that but it's invisible to the CPU, the Nubus chipset handles the overhead and just tells the CPU that it's dealing with non-synchronously-clocked memory and has to suck up wait states as it takes its own sweet time. Logically speaking a properly designed PDS and Nubus video card can't be easily told apart so far as Quickdraw is concerned.

Strictly speaking it does get more complicated than that because where exactly the screen buffer goes depends on whether the Mac is in 24 or 32 bit addressing mode, etc, but *generally* to a Mac a video card is just RAM.
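As I understand the Mac II memory map (double-check against the NuBus chapters of Designing Cards and Drivers), the 24 vs 32 bit difference looks roughly like this: in 32-bit mode slot s ($9 to $E) gets standard slot space at $Fs000000 (16MB) plus super slot space at $s0000000 (256MB), while in 24-bit mode only a 1MB window at $s00000 is visible, mapped onto the start of standard slot space. The function names are hypothetical.

```python
def slot_space_32(slot):
    """Standard slot space in 32-bit mode: $Fs000000-$FsFFFFFF (16 MB)."""
    return 0xF000_0000 | (slot << 24)

def super_slot_space_32(slot):
    """Super slot space in 32-bit mode: $s0000000-$sFFFFFFF (256 MB)."""
    return slot << 28

def translate_24_to_32(addr24):
    """Map a 24-bit-mode slot address ($s00000-$sFFFFF) to its 32-bit equivalent."""
    slot = (addr24 >> 20) & 0xF
    if not 0x9 <= slot <= 0xE:
        raise ValueError("not a NuBus slot address")
    return slot_space_32(slot) | (addr24 & 0x000F_FFFF)

print(hex(translate_24_to_32(0x900010)))   # 0xf9000010
```

The practical consequence is that anything the card wants visible in 24-bit mode, framebuffer included, has to fit in that first megabyte.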

Is it certain that an FPGA is needed here at all, or could a fast enough microcontroller do the job? I see that NuBus is only 10MHz while inexpensive microcontrollers can run at 120MHz+, so it might be possible to handle bus traffic in software. Bbraun did this for his Macintosh SE video card, which is some flavor of STM32 microcontroller connected to the SE PDS, no FPGA needed.
I wouldn't strictly rule it out, but I don't think it's a particularly great approach either. BBraun's card acts as a "snooper" looking for updates to the block in the SE's main memory where the framebuffer is stored and copies updates on its own sweet time; it doesn't have to respond in real time to any bus arbiter. NuBus isn't clocked that fast, but some of the things it would have to respond to are *very* time critical. I've seen several failed attempts to map a microcontroller directly onto CPU busses as slow as a 6502's and write software that can flawlessly stay synchronous with them. It's a tall order.
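A rough cycle budget shows why. Assuming a Cortex-M3/M4-class part at the 120MHz from the question, and ARM's documented best case of 12 cycles just to enter an interrupt handler, there's nothing left of a 100ns NuBus clock period before the handler has done any work at all.

```python
MCU_HZ = 120_000_000     # "120MHz+" microcontroller from the question
NUBUS_HZ = 10_000_000    # NuBus clock
IRQ_ENTRY_CYCLES = 12    # assumed Cortex-M3/M4 best-case interrupt entry

cycles_per_bus_clock = MCU_HZ // NUBUS_HZ          # CPU cycles per 100 ns bus clock
spare_after_irq_entry = cycles_per_bus_clock - IRQ_ENTRY_CYCLES
print(cycles_per_bus_clock, spare_after_irq_entry)  # 12 0
```

Polling in a tight loop avoids the interrupt overhead, but then the MCU has no time left over for anything else, which is where the failed 6502 attempts tend to fall apart.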

 

Gorgonops

Moderator
Staff member
copies updates on its own sweet time,
Actually, I want to add an asterisk to that because even though I read the theory of operations of that device not that long ago I'm having trouble remembering if the copy operation could be "delayed" or if it had to catch writes as they happened. Still, the general point remains that it's "shadowing" the actual framebuffer, not acting as it, which means:

1: it only cares about writes. If the Mac wants to read the state of a pixel it gets it from the real RAM, which also means:

2: if it misses a transaction that's just a transient artifact in its capture output. If it were the only copy that would be a fatal error.
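A toy model of those two points, with all addresses and sizes made up (they are not the SE's real framebuffer location): the real RAM always takes the write and answers every read, and the snooper only keeps a copy, so a write it misses just leaves a stale byte in its output. `ShadowSnooper`, `cpu_write` and `cpu_read` are hypothetical names.

```python
class ShadowSnooper:
    """Toy snooping card: watches writes to the host framebuffer range and keeps
    its own copy for display. It never answers reads."""
    def __init__(self, fb_base, fb_size):
        self.fb_base, self.fb_size = fb_base, fb_size
        self.shadow = bytearray(fb_size)

    def observe_write(self, addr, value):
        offset = addr - self.fb_base
        if 0 <= offset < self.fb_size:
            self.shadow[offset] = value      # a missed write only leaves a stale pixel

host_ram = bytearray(0x10000)                # stand-in for the Mac's main memory
snooper = ShadowSnooper(fb_base=0x8000, fb_size=0x2000)

def cpu_write(addr, value, snooper_sees_it=True):
    host_ram[addr] = value                   # the real framebuffer always gets the write
    if snooper_sees_it:
        snooper.observe_write(addr, value)

def cpu_read(addr):
    return host_ram[addr]                    # reads never involve the snooper
```

A card that *is* the framebuffer has no such safety net: every read and write has to be answered correctly and on time.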

I really think people are worrying too hard about NuBus. There's a sample implementation in the manual and all the scary parts fit in a small handful of 74xx parts and PALs. (And the PAL equations are there.) Famous last words, but how hard can it be? (And conversely, why would a software implementation be easier?)

 