Development of Nubus graphics card outputting to HDMI?

dlv

Active member
Doh. Obviously I meant to include that the part I'm referring to starts on page 332.
Thanks. That read more like marketing for the 8*24 GC but was still interesting. What I was looking for is Imaging With QuickDraw (PDF), which I highly recommend since it has answered a lot of questions already. 

I still imagine it'd be a very tall order to implement a QuickDraw accelerator without some source code to reference. I am curious how other vendors that made accelerated cards pulled it off; the fact that they did does imply there's some kind of reference implementation? 
A quick search revealed Apple donated the source code of the 68000 QuickDraw implementation to the Computer History Museum. It's a fine software-only reference implementation (although it may not implement Color QuickDraw) - and might even be able to be used largely as-is if we had a 68k processor or FPGA core available to us - but that's only part of the challenge. What's missing w.r.t. accelerated graphics cards is the interface between an accelerated QuickDraw implementation and the hardware, and how an accelerated QuickDraw is enabled and used. Also, how QuickDraw operations are queued for execution. That's likely to be highly implementation-specific, so I don't expect to find documentation. Reverse-engineering the 8*24 GC may be necessary (but would require in-depth knowledge of several things, including the Am29000 processor).

 

Gorgonops

Moderator
Staff member
Thanks. That read more like marketing for the 8*24 GC but was still interesting.
Yeah, I know it's mostly fluff, but it's the closest thing to a description of how the 8*24 at least *did* things like have the capability to redirect gworld buffers onto local memory on the card, etc. Certainly isn't anywhere close enough to actually write an implementation.

What I was looking for is Imaging With QuickDraw (PDF), which I highly recommend since it has answered a lot of questions already. 
I hadn't seen that before, I've saved a copy for later.

The one discouraging comment I'd have about it so far is that searching for variations on the word "acceler" produces a tiny number of hits, which mostly refer to a flag you can set that force-prevents your offscreen gworld from having the option of being loaded into local memory on an accelerated card.
 

A quick search revealed Apple donated the source code of the 68000 QuickDraw implementation to the Computer History Museum. It's a fine software-only reference implementation (although it may not implement Color QuickDraw) - and might even be able to be used largely as-is if we had a 68k processor or FPGA core available to us
Alas I really can't imagine it being much use, at least without a lot of help. The documentation really seems to go to great lengths to stress that Color QuickDraw incorporates a whole raft of concepts not in the original "Basic" QuickDraw, which is the 68000 version. Maybe that's pessimistic; if the API documentation is good enough maybe it's doable...

What's missing w.r.t. accelerated graphics cards is the interface between an accelerated QuickDraw implementation and the hardware, and how an accelerated QuickDraw is enabled and used. Also, how QuickDraw operations are queued for execution.
Yes, that's the massive mystery, and I have no idea where the answers to that are. As noted, there were a number of third parties selling cards with what their advertising literature would call "Standard Quickdraw Acceleration", and I don't think that all of them used AMD 29000s, which would mean they're not just running licensed clones of the 8*24 code. Therefore it seems it *must* follow that there has to be some documentation out there for how to grab the necessary hooks.

One thing I wonder vaguely about is if there might be another form of "QuickDraw Acceleration" that's implemented in the form of an extension that leverages dumber fixed-feature graphics card acceleration features like block transfers and basic line drawing/fill primitives but *doesn't* depend on having a full QuickDraw implementation that understands GWorlds? No idea.

 

Gorgonops

Moderator
Staff member
So that's one mystery solved, that at least on Mac video cards that support both color and mono monitors grayscale modes use the CLUT to map shades of gray, not some alternate "direct DAC" path.
On the flip side, that "Imaging with QuickDraw" PDF does mention at least a few cases (it specifically mentioned grayscale PowerBooks) in which Grayscale *is* implemented in the form of a dumb directly-mapped DAC, no CLUT register, so technically if you wanted to do just a Grayscale card it's a valid config.

I was doing some more leafing through the "Designing..." document the other night, just reading the video driver chapter in a little more depth, and it mentioned that technically hardware gamma control registers are another thing you *could* put into a card. If I understood the gamma discussion correctly, however, it looks like most Apple cards do gamma correction for Indexed modes by modifying the color values they write to the CLUT. (I.e., if your palette says X-Y-Z and your gamma correction is Q, what actually gets written to the hardware CLUT is Xq-Yq-Zq, where lowercase "q" is the value of the gamma curve Q for the unadjusted brightness of X, Y, or Z. Or something vaguely not like that at all.)
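To make the "gamma applied at CLUT-write time" idea concrete, here is a minimal sketch. It assumes 8 bits per channel and a simple power-law curve; the function names, the 1.8 value, and the three-entry palette are illustrative assumptions, not anything taken from Apple's drivers.

```python
def build_gamma_table(gamma, size=256):
    """Lookup table: unadjusted channel value -> corrected value.

    ASSUMPTION: a plain power-law curve with exponent 1/gamma.
    A real driver would use whatever gamma table it has been handed.
    """
    top = size - 1
    return [round(top * (i / top) ** (1.0 / gamma)) for i in range(size)]


def corrected_clut(palette, table):
    """Run every channel of every palette entry through the gamma table.

    The result stands in for the values that would be written to the
    hardware CLUT; the logical palette itself is left untouched.
    """
    return [(table[r], table[g], table[b]) for (r, g, b) in palette]


# Hypothetical three-entry palette: black, a brownish tone, white.
palette = [(0, 0, 0), (128, 64, 32), (255, 255, 255)]
table = build_gamma_table(1.8)
hardware_clut = corrected_clut(palette, table)
```

The point of doing it this way is that no extra hardware is needed: the correction costs one table lookup per channel each time the palette changes, rather than anything per pixel.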

 

trag

Well-known member
Perhaps this is already obvious to all those involved, and if so, I apologize for being the nerd jumping in unneeded...

Re Quickdraw Acceleration:

Many (most?) ROM/Toolbox routines are called as unimplemented instructions. The 68K architecture has a whole slew of unused opcodes. If your software executes one of those opcodes, it triggers an exception in the CPU, much like an interrupt, and the program counter jumps to a handler location set in a vector table.

So a bunch (all?) of the (Color) Quickdraw routines are called using these unimplemented instruction codes.  The CPU handles those and goes to grab a program counter vector off of the address/vector table, and the program counter vector/address has been set up to point to the corresponding Quickdraw Routine in the ROM.

In order to implement Quickdraw acceleration, or acceleration of any other routine contained in the ROM and called with this method, one simply loads a driver/extension at boot time that modifies the address/vector table. Specifically, go to that table and substitute the address of your replacement routine for the address in ROM of the stock routine.
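The table-patching mechanism can be sketched as a toy model. Everything here is a simulation in Python with made-up names (`trap_table`, `install_patch`, `CardBuffer`); on a real system the table lives in low memory and would be read and written with the Toolbox's GetTrapAddress and SetTrapAddress calls. The "only handle buffers that live on the card" rule is purely an assumption for illustration.

```python
def rom_copy_bits(src, dst):
    """Stand-in for the stock ROM routine: the host CPU copies every pixel."""
    dst[:] = src
    return "cpu"


# Simulated dispatch table: trap name -> routine address (here, a callable).
trap_table = {"CopyBits": rom_copy_bits}


def dispatch(trap, *args):
    """In spirit, what the exception handler does: look up the entry and jump."""
    return trap_table[trap](*args)


def install_patch(trap, make_replacement):
    """Swap a table entry; the replacement is given the old routine for fall-through."""
    original = trap_table[trap]
    trap_table[trap] = make_replacement(original)
    return original


def make_accelerated_copy_bits(original):
    def accelerated(src, dst):
        # ASSUMPTION: the card can only help when the destination is in its own memory.
        if getattr(dst, "on_card", False):
            dst[:] = src  # pretend the card's logic performed the transfer
            return "card"
        return original(src, dst)  # everything else goes to the stock routine
    return accelerated


class CardBuffer(list):
    """Hypothetical marker type for a pixel buffer held in card memory."""
    on_card = True


install_patch("CopyBits", make_accelerated_copy_bits)
```

Keeping a reference to the original routine matters: a patch that cannot handle a particular call has to be able to pass it through unchanged, otherwise every case the card does not cover breaks.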

In the case of Quickdraw acceleration, this would probably take the form of a routine that sends a code/instruction and any necessary data to the video card, and the video card has logic that, for example, handles rotating the entire image 90 degrees in the memory of the video card, rather than having to read it all out the CPU memory, operate on it and write it all back to the video card.

So you shouldn't really need special guidance, although it would be nice. It should be enough to identify the (Color) Quickdraw routines that are available. Decide which lend themselves to hardware acceleration. Learn enough about how they are normally called (any associated data structures/arguments, etc.) and then write your own substitute routine that sends the requisite data to the video card hardware, and add logic on the video card to do the processing.

I'm pretty sure the Quickdraw routines are documented well enough to provide the necessary information.  An example to copy or reverse engineer, would contain what?    Maybe a clear list of which routines are worth accelerating?  

 

Gorgonops

Moderator
Staff member
I'm pretty sure the Quickdraw routines are documented well enough to provide the necessary information.  An example to copy or reverse engineer, would contain what?    Maybe a clear list of which routines are worth accelerating?
There's a section in that Quickdraw manual starting at page 3-129 titled "Customizing QuickDraw Operations" that, based on a reference earlier in the chapter, might be a clue as to what operations Apple considered off-load-able? (I get the feeling that certain parts of QuickDraw aren't atomic enough to override successfully, but, yeah, I have no idea.) It's actually not that long of a list, so... maybe it is hypothetically doable?

I hate to drag up this terrible canard because I think the idea of slapping a Raspberry Pi-like single board on everything is totally overused, but... just hypothetically speaking, considering that realistically you're probably not going to be able to push more than, I dunno, 10MB/s through NuBus, I wonder if it might actually be realistic to consider an architecture for an accelerated card that consists of a CPLD that implements the NuBus logic, a few 32 bit data and address buffers, and a very fast 8 or 16 bit multiplexed bus that goes to a "Pi-like board" (perhaps something like the BeagleBone, which contains some realtime I/O co-processors called PRUs) that handles video output with its dedicated GPU hardware and makes available a ton of CPU cycles to do... whatever. I imagine there would be substantial latency for bus transactions like individual byte/word reads from the framebuffer, but "substantial" might not be that significant in the grand scheme of things.

Anyway, that's a dumb idea, forget I said it.

 

trag

Well-known member
Anyway, that's a dumb idea, forget I said it.


Actually, it's kind of an interesting idea.   I don't like it, because I too abhor the fad of wanting to glue a Raspberry Pi on everything.  Also, I just think programming things in hardware language is more elegant.  But then, I'd rather program in assembly than C, and I do everything in my power to avoid going higher level than C.    When one programs in assembly, one controls the result;  programming in anything higher level than assembly is just making suggestions.   Building hardware logic on an FPGA is even better....

But, my emotional shortcomings aside, the Pi already has all that logic and stuff for driving a display on board and it's cheap. Finding a way to feed it a Frame Buffer from the Mac and make it come out of its already built video port is a very tempting morsel if the main desire is to build a working video card with the least effort.

Re: latency.  I don't know what speed the Pis typically run at, but with the host machines running at 16 - 40 MHz, and more relevantly, the NuBus running at 10 MHz, one can fit an awful lot of Pi cycles into every host cycle.
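As a rough worked example of that ratio: the bus and host clocks below are the figures from this post, while the 700 MHz single-board-computer clock is purely an assumed number for illustration (the post does not give one).

```python
NUBUS_HZ = 10_000_000                 # NuBus clock, from the post
HOST_HZ = (16_000_000, 40_000_000)    # host CPU range, from the post
SBC_HZ = 700_000_000                  # ASSUMPTION: illustrative SBC core clock


def cycles_per(slow_hz, fast_hz):
    """How many cycles of the fast clock fit into one cycle of the slow clock."""
    return fast_hz / slow_hz


sbc_per_nubus = cycles_per(NUBUS_HZ, SBC_HZ)              # SBC cycles per bus cycle
sbc_per_host = [cycles_per(h, SBC_HZ) for h in HOST_HZ]   # per host cycle, slow and fast host

print(sbc_per_nubus, sbc_per_host)
```

This is only a raw clock ratio. It says nothing about how quickly the board's I/O pins can actually respond to a bus transaction, which is the part that would need measuring.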

On the other hand, didn't dlv write earlier that one of his goals for this project is to learn to use a hardware description language?

 

Gorgonops

Moderator
Staff member
On the other hand, didn't dlv write earlier that one of his goals for this project is to learn to use a hardware description language?
Yeah, and really, I think it's a good approach for making a basic card. It's a great way to learn the nitty gritty of how framebuffers actually work, I think a "hardware" implementation of NuBus handshaking is likely to be more successful than trying to entirely bit-bang it, there's a lot of other potentially cool projects that having a working programmable logic implementation of NuBus could enable, etc...

The "fast SBC grafted to the bus" idea was just an idea if the project really did move to trying to do acceleration, QuickDraw or otherwise since, yeah, the speed disparity is so huge that *particularly* if much of the bus handling were offloaded you might be able to make it almost as fast as a full hardware framebuffer. But, well, the fact remains that if you've built the "dumb" framebuffer in an FPGA first then the bus logic at least becomes an already solved problem. Which would be great.

 

modulusshift

Active member
I kinda figured that any card that'd come close to modern resolutions would need acceleration. I was hoping that this discussion would decide on 030 PDS because, selfishly, I've got an SE/30 myself, and also, the increased bandwidth might make higher resolutions more possible. But honestly, the more time the CPU is sending data to the PDS the less time it has for other calculations, so it'd just slow down the rest of the computer even more than a fully saturated Nubus. Gotta keep your resources in mind, I guess. No wonder Apple seems to heavily imply that even PDS cards should be talking pseudo-Nubus.
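A back-of-the-envelope sketch of why bus bandwidth bites at higher resolutions, using the roughly 10 MB/s practical NuBus figure floated earlier in the thread. The list of modes is an arbitrary assumption, and the calculation ignores all bus overhead, so treat the results as an optimistic ceiling.

```python
MB = 1_000_000


def frame_bytes(width, height, bits_per_pixel):
    """Size of one full frame in bytes."""
    return width * height * bits_per_pixel // 8


def full_redraws_per_second(bus_bytes_per_s, width, height, bits_per_pixel):
    """How many times per second the host could rewrite every pixel over the bus."""
    return bus_bytes_per_s / frame_bytes(width, height, bits_per_pixel)


BUS_RATE = 10 * MB  # figure guessed at in the thread, not a measurement

# ASSUMPTION: example modes chosen only for illustration.
modes = [(640, 480, 8), (1024, 768, 8), (1920, 1080, 32)]

for w, h, bpp in modes:
    rate = full_redraws_per_second(BUS_RATE, w, h, bpp)
    print(f"{w}x{h} @ {bpp}bpp: {frame_bytes(w, h, bpp)} bytes/frame, "
          f"{rate:.1f} full redraws/s")
```

A dumb framebuffer rarely needs to repaint the whole screen, but the numbers show why anything that moves large blocks of pixels (scrolling, window drags) would benefit from being done on the card's side of the bus.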

I really hope I can keep up with you all enough to contribute in some way. I'm not an experienced programmer by any means, but I'm definitely going to be keeping a lookout for opportunities to help out as this progresses. Bare minimum I'm gonna be sanity checking any code for more obvious bugs when that starts coming along, if I can't do anything else.

 

Trash80toHP_Mini

NIGHT STALKER
I don't like it, because I too abhor the fad of wanting to glue a Raspberry Pi on everything.  Also, I just think programming things in hardware language is more elegant.
If there weren't the clear precedent in the Macintosh IIfx design employing a pair of 6502s to offload I/O processing tasks I might be more in your camp as regards the Pi ploy. For that reason alone I don't feel like it's cheating to suggest the inverted Pi a la mode approach to making ice cream. Its onboard I/O hardware can deposit a double scoop on top of the pie on top of the ice cream between it and the 6502 sprinkles on the IIfx logic board.

As early as 1989, developing in C for prototyping code for such hardware and later retrofitting whatever required assembly for speed seemed the thing to do. Dunno if that approach might help spread the software development load to a greater number of participants for this project?

edit: never liked programming, but I can see how a greater number of members reading each other's research finds and approaches to QuickDraw acceleration and sharing well documented routines at a common higher language level might be fruitful? I'd suggest doing it in parallel in a dedicated thread to avoid muddling both projects.

 

trag

Well-known member
Heh.   Your point is still valid, but IIRC, the I/O coprocessors on the IIfx don't actually do anything in the Mac OS.   They only get used in Apple Unix (forgot the name.)    Again, IIRC, Apple never got around to putting support into the Mac OS to make use of the coprocessors and they just sit there and operate in pass through mode.

I would love to find that that memory is wrong.

 

Gorgonops

Moderator
Staff member
I would love to find that that memory is wrong.
So far as I'm aware your memory is basically correct.

Technically the high-end original Quadras (900/950) have the same I/O coprocessors (condensed into a higher integration part?) as the IIfx but, likewise, I don't think they actually do anything with it.

 

Trash80toHP_Mini

NIGHT STALKER
Functional limitation to operation under A/UX makes sense to me. Apple never did multiprocessing until they bootstrapped it off a clonemaker's development work, no? No hooks in the OS to make any use of coprocessors. IIfx and Q900/950 were targeted A/UX platforms, IIfx for general use and the latter for ANS application. So embedding 6502s in their VLSI goodies makes sense.

Doubled tangent there aside, I think it's rather delicious that a slice of Pi could be used to offload some processes running under MacOS into LINUX. Maybe not directly, but the fickle finger of fate does provide a bit of amusement now and then.

 

Gorgonops

Moderator
Staff member
In theory at least if you cooked up NuBus transceiver logic that could be interfaced to a "Pi-esque" SBC that could push packets at some reasonable fraction of the practical, real-world speed of the bus (which until someone can show me evidence to the contrary I'm going to peg at maxing out at around 10-15MB/s despite the *theoretical* capacity of a Nubus burst transfer being around 40MB/s) then so far as I know there's no reason you couldn't have one Nubus card pretend to be, I don't know, say a video card, a network card, and a storage device all in one slot. The only limitation I can think of is if there's some restriction on how much I/O space you can have on a card that's also a framebuffer, or if the Slot Manager software framework has some limitation relating to drivers for multifunctional devices or... whatever.

As to "offloading processes" how possible that might be depends on whether you're talking about writing your own software or making something that magically accelerates existing software. In principle I could see, I dunno, writing replacements for things like the SANE numerical libraries that can offload some FPU functions to the much more powerful FPU you have hanging off the card, but how practical that is and how much gain you could possibly get out of it would probably depend a lot on how much massaging it would take for the native Mac data formats to take advantage of the alien hardware. You could certainly present new device functionality like, say, SSL/web accelerator APIs or a web media format processor you can throw JPEGS and MPEGS at to convert into bitstreams you can handle more easily on a feeble CPU, but obviously this requires all new software.

In any case, this is totally outside the "make a video card" scope defined in this thread.

 

Trash80toHP_Mini

NIGHT STALKER
Yep, it is that. But keep the Futura II SX VidCard/NIC Daughtercard design example in mind as the main thread of development progresses. I wish bbraun would come back in from out in the world, he explained some of how that worked under a single SlotID to me long ago. You've also got the example of the DuoDock II logic board which is a multifunction Slot E PDS card with an independently implemented NuBus side car along for the ride.

 

Gorgonops

Moderator
Staff member
he explained some of how that worked under a single SlotID to me long ago
From what I remember from scanning the "Designing..." document I didn't really think there was any explicit barrier to piling as many functions as you want onto one card (within reason), so I'm not really surprised that a video+Ethernet card was a thing that already existed. Since I didn't know for sure, however, I threw the "RTFM" warning in as a precaution.

As an aside, I've noticed that the phrase "standard QuickDraw acceleration" is something that appears nowhere outside of LEM video card profiles. A jaded part of me is starting to question how many of the cards so described are actually "accelerated" in the same sense the 8*24GC is.

 

Gorgonops

Moderator
Staff member
Hope this makes for useful information!
It is excellent how that manual has a lot of the declaration ROM code in it, along with some descriptions about how the ROM, VRAM, and I/O spaces are broken up over the Nubus slot space. (Yes, it's a PDS card, but it's following the slot manager conventions.) In theory I suppose it's mostly what's in the Toby documentation in the Apple manual, but it still might help fill in some of the gray areas.

 

balrog

Member
There’s also a .sit file containing source code that you can only see if you select “show all files”. Be sure to check that out as well!

 