
Xceed Memory Width and Other Finicky Details?

trag

Well-known member
Anyone know what the part numbers on the memory chips are?

So I was toying with the idea of recreating the Xceed, and reading through the 221706.pdf document, which gives an overview of the video card's architecture, I can't get it to correlate with the images of the actual card that I see online.

Oh, it correlates up to a point. The text description of the interface to the SE/30 PDS bus matches what I see on (images of) the board and makes sense when I look up the logic part numbers.

But the interface with the VRAM doesn't match.

In the text it states:

The VRAM is organized as two 32 bit wide interleaved banks. ... The data then enters a 64 to 32 bit mux comprised of 8 74F257s. ....

In 8 bit mode, the pixel data is routed through a 32 to 8 bit mux. The mux is comprised of 4 74F253s....

After the mux, the data is registered in a 74F374 to provide enough setup time for the pixel mixer PAL.
However, when I look at an image of the board, such as the one at the top of this thread:

http://68kmla.org/forums/viewtopic.php?f=7&t=12795&start=0&bookmark=1&hash=c1f43397

what I see is four 74F574s. No 257s. No 253s.

Now, I suspect that what they did is switch from a theoretical implementation of 64 bit wide memory, to a real-life on-the-board implementation of 32 bit wide memory and so the eight muxes went away, and the 574s replaced the 253s and the 374s. It's not an exact substitution, but with clever control of the 574s one could do it that way.

But that would mean that the memory is only 32 bits wide, and with the memory technology of the day, it would be difficult to get enough bandwidth for the 16M color at 1024 X 768 mentioned on Gamba's page.

Also, the memory chips have an awful lot of pins to be only 8 bits wide.

Anyone know what the part numbers on the memory chips are?

Anyone certain that these cards supported color depth greater than 8 bits? The chips I see on the board would make perfect sense if these cards only supported 8 bit color.

 

wally

Well-known member
I have only access to a Color 30, not the Color 30HR which has a different video spec. The Color 30 model has a limited video spec, 640x480, 8 bits max pixel depth or 640x870, 4 bits max pixel depth. On the Color 30 board, the memory is four Micron Technology MT42C8128DJ-8.

 

trag

Well-known member
I have only access to a Color 30, not the Color 30HR which has a different video spec. The Color 30 model has a limited video spec, 640x480, 8 bits max pixel depth or 640x870, 4 bits max pixel depth. On the Color 30 board, the memory is four Micron Technology MT42C8128DJ-8.
Thanks, Wally. Those are 128K X 8. And, looking at the Flickr page you link to in the thread I referenced above, your card looks like all the other ones I've managed to find pictures of so far.

So, unless there are extra components on the back of the board (seems unlikely, since no one ever provides photos of the back), I'm going to assume that every image I've seen is of a similarly limited board.

On the other hand, the specifications also say those are multi-ported RAM, which isn't mentioned at all in the Maverick/Xceed description. That would explain all the pins, though.

Which makes me go, "Hmmmmmm."

These cards are pretty simple Frame Buffers, as far as I can tell.

A Frame Buffer is a chunk of memory, some digital to analog converters (DACs) and controlling logic which stores a digital representation of what you want on the screen and outputs it in a way that a display can understand. If you have a 640 X 480 image in 8 bit (one byte) color, then you get 640 X 480 = 307,200 bytes of information in the frame buffer memory.
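That arithmetic can be jotted down as a quick sketch (Python just for illustration; `framebuffer_bytes` is a made-up helper, not anything from the card):

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Video memory needed to hold one full frame."""
    return width * height * bits_per_pixel // 8

print(framebuffer_bytes(640, 480, 8))   # 307200 bytes, matching the figure above
```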

I'm going to call "frame buffer memory", "video memory" from now on.

Then the thing must sling it to the display one pixel at a time. Let's stick with an example of 8 bits to the pixel, but keep in mind that it could be anything from 1 to 24 bits per pixel.

How does the image get into the video memory? The host computer writes it there, of course. So the video memory has an address in the computer and the computer can write to and read from the video memory, creating the image one sees on the screen.

A display expects a new pixel (analog data from the frame buffer) on (almost, more on this later) every beat of the pixel clock. So the frame buffer must be able to read the data from the frame buffer memory and send it through a digital to analog converter (DAC) and then on to the display at a rate fast enough to keep up with the display drawing the pixels.

So, for example, a 1920 X 1280 display at 60 Hz is drawing 1920 X 1280 = 2,457,600 pixels sixty times per second = 147,456,000 pixels per second. It's actually worse than that. That's the total number of pixels per second, but other aspects of the timing mean that the time between pixels is actually much shorter than you would expect.

Displays draw one line of pixels at a time. And then pause for a little while during the "horizontal blanking period". Then the next line is drawn and so on until the end of the display is reached. At that point, there is a vertical blanking period.

So going back to the 1920 X 1280 pixels X 60Hz example. On the face of it, you'd expect the pixel clock to be 147,456,000 Hz, because that's how many pixels need to be delivered per second. But, because of the clock cycles spent in the blanking periods, the pixel clock for 1920 X 1280 X 60Hz is actually something closer to 200 MHz.
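A rough sketch of that estimate; the 35% blanking overhead here is an assumed round number (the real figure comes from the monitor's timing standard, e.g. VESA GTF/CVT):

```python
def pixel_clock_mhz(width, height, refresh_hz, blanking_overhead=0.35):
    """Approximate pixel clock: visible pixel rate, scaled up for blanking.
    The 35% overhead is an assumption; exact timings vary by standard."""
    visible = width * height * refresh_hz
    return visible * (1 + blanking_overhead) / 1e6

print(round(pixel_clock_mhz(1920, 1280, 60)))   # ~199, close to the ~200 MHz above
```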

The nice thing is that the frame buffer circuitry can do other things during these blank intervals, such as handle requests from the host machine to read or write the video memory.

According to the notes available, that's exactly what the Xceed cards based on the Maverick (that chip off by itself on the top right of the board) chip do.

When the SE/30 asks to read or write to the video memory, the Maverick will delay the SE/30 if the Xceed card is currently outputting a line of pixels. This is going to affect performance at least a little bit.

For a small display, there might not be that much writing to video memory, so such collisions might not happen very often. With multi-ported memory they might hardly happen at all, but I wonder if the Maverick chip is aware of that.

Anyway, the point of all that is that if you're going to build a video card, one of the first things one must do is consider the memory bandwidth, which is driven by the display resolutions and color depth and refresh rate to a large extent.

If one must spend all of one's memory bandwidth driving the display, then the host machine is going to get slowed down waiting to access the video memory. And even with a surfeit of bandwidth, unless one uses a scheme such as dual ported memory (expensive) or FIFO buffers (somewhat less expensive) the host machine will still get stalled while the frame buffer is reading the data to write each line of pixels.

But how much does that matter in practice?

 

wally

Well-known member
Years ago I had to make a cable to get mine to work. After some research, I found that US patent 5,307,083 had most of the necessary information for the cable, and also contained a very informative theory of operation.

 

trag

Well-known member
Years ago I had to make a cable to get mine to work. After some research, I found that US patent 5,307,083 had most of the necessary information for the cable, and also contained a very informative theory of operation.
Ah, yes, I had that on hand. I hadn't read it recently. The cable description is nice. The theory of operation is useful, but mostly redundant with the other document I've been looking at. However, the patent did describe the 4 X 128K X 8 multiport memory, while the document I've been reading has that bit about 64 bit wide memory bus muxed down to 32.

I've been playing around with the design concept for a long time in an iterative fashion.

The design factors that affect the design concept are the memory bandwidth, the cost of components, the cost of assembly (strongly related to number and type of components), whether to run the data bus through the FPGA (or a second FPGA) or have it only on the PCB and massaged as needed by discrete logic components, and the number and type of I/O pins available on the FPGA and how that affects the FPGA package and cost.

The memory bandwidth is driven by whatever requirement you choose for the maximum supported resolution and refresh rate.

Fantasy Requirements

For example, I was thinking of something that would:

1) Drive both an external monitor and the internal gray scale simultaneously.

2) Drive the external monitor at 24 bit color up to 1920 X 1280 at 60 Hz.

3) Always respond to reads and writes from the host (SE/30) without delay

First concept

Number 2 means that one needs better than 150 megawords/sec of memory bandwidth just to drive the external monitor. Plus, additional memory bandwidth is needed to allow for host reads and writes of the video memory. The SE/30 PDS slot only runs at 16 MHz, so call it 168 megawords/sec.
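That tally, assuming one 32-bit word per pixel and, pessimistically, one host word every 16 MHz bus cycle:

```python
# Worst-case memory bandwidth for concept #1, in 32-bit words/second.
display_words = 1920 * 1280 * 60   # one word per 24-bit pixel, 60 Hz refresh
host_words = 16_000_000            # SE/30 PDS worst case: one word per 16 MHz cycle
total = display_words + host_words
print(total / 1e6)                 # ~163 Mwords/s before refresh overhead
```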

But any affordable memory is going to have refresh cycles so that's more bandwidth.

DDR2 (or DDR for that matter) can provide the desired bandwidth. Individual chips are fairly expensive, but the DIMM modules are cheap. FPGA designs from Xilinx on the Spartan 3A can run DDR2 up to 200/400 MHz.

But, if we use DDR2 memory the FPGA must be connected to the DIMM data pins to capture the double data rate output, which means more I/O pins for the FPGA. This very quickly turns into a design with a $40 FPGA in a BGA package (soldering issues).

However, it confers an advantage at the same time. Number 3) isn't really possible without read/write buffers for the data, which necessitate extra components if the data bus exists only on the printed circuit board. That means more cost in chips and more complicated and expensive assembly. If the data bus is brought inside the FPGA, then all the data buffers can be implemented in the FPGA.

So: $40 FPGA, a DIMM socket ($3) and a $20 DIMM for memory, a handful of 5V to 3.3V level translators (because the SE/30 PDS slot is 5V and FPGA I/O pins don't do 5V signaling any more), and a high speed 3 X 10-bit DAC ($8). Call it $20 for the circuit board and $10 for the PDS slot connector, and we're at about $100 in materials before even considering the grayscale board or cabling or various cable connectors.

On the bright side, this would yield a very high performance design with at least 512 MB of video memory (Yikes!) running at 333 MHz. It could be taken to 400 MHz, but 333MHz is a nice factor of 20 times the SE/30's clock speed. Only about 10 MB of that RAM would get used. And if the 484 pin Xilinx package is used, there are about 90 pins still available on the FPGA for other things....

But the cost would be something like the original list price on the Xceed card.

Second concept

There really isn't any memory on the market cheaper than DDR2 right now, unless one wants to try using 72 pin SIMMs or something. Any alternatives to the above based on different memory are going to cost as much or more because of the cost of the memory itself.

However, I have a couple of thousand synchronous GRAM (SGRAM) chips on hand that are 32 bits wide and run at about 100 MHz. They need refresh cycles.

The top resolution (requirement #2) would be reduced to something like 1280 X 1024 X 60Hz. It wouldn't look good on a CRT but it'd be okay on an LCD where the slow (display) refresh rate isn't so bad.

And this reduces the FPGA I/O to something that will fit on a 208 pin QFP or even a 144 pin QFP. The FPGA is only controlling the address and control lines. It's not touching the data lines in this concept. The data bus only exists on the PCB, not in the FPGA.

The FPGA cost drops to $10 - $20 and is simpler to solder down. The ~$25 cost for the DDR2 goes away. That saves about $45 - $55 in materials. I have some cost for the SGRAM, but realistically, it's just sitting in my attic.

But there's no way to buffer the data without adding extra chips and you just can't get much depth of buffering without using a purpose-built FIFO buffer, and ones that are wide enough and fast enough are ~$20 - $50 all by themselves. Without buffering, the card must read the video memory the entire time a horizontal line of pixels is being written. Otherwise some pixels will be skipped.

So, one either gives up number 3) and limits the host (SE/30) to accessing video memory during blanking periods, because the video memory is busy being read all other times, or one adds enough extra components to the data bus to bring the cost back up to the first concept.

So, this concept is about $50 cheaper and performs the basic function. But the video memory is limited to the actual memory needed for the video, and the host computer will be delayed most of the times that it wishes to access the video memory. Requirement #2 is reduced. Requirement #3 is ignored.

Reference designs for DDR2 controllers are common and even included in the Xilinx tools. A controller for the SGRAM might have to be written from scratch. This adds difficulty.

Third concept

Like the second concept, but add a second QFP package FPGA which handles the data bus. Some control signals will pass from the address/control FPGA to the data FPGA but there are still plenty of pins. All of the buffers can be implemented in this second FPGA allowing reads and writes of the video memory while pixels are being written.

This adds about $20 back into the cost, but it's still about $30 cheaper than concept #1. It also adds a substantial amount of complexity in getting the two FPGAs to act in concert.

Maybe $30 cheaper than concept #1. It is superior to concept #2 by meeting Requirement #3, but costs about $20 more. No BGA soldering. Added design complexity over both concepts #1 and #2.

Fourth concept

Like the third concept but go back to the DDR2 memory idea. So one QFP (pins around the edges) FPGA controls address and control lines. A second FPGA handles the data bus and contains first-in-first-out (FIFO) buffers.

The cost is about the same as concept 1. Two $20 FPGAs cost about what one $40 FPGA costs. The two FPGAs have pins around the edges instead of being BGAs, which is a plus, but there are two of them. And the design complexity is still higher. Reference designs for DDR2 controllers don't split the design across multiple FPGAs.

It's an interesting compromise getting rid of the BGA chip problem, at about the same cost, but creates new problems, and doesn't yield the 100 left over pins.

Fifth concept

I have some old slow SRAM on hand. It probably runs at an equivalent of maybe 15 - 20 MHz. Drop Requirement #3. Reduce Requirement #2 to 640 X 480 @ 75 Hz and some higher resolutions at 8 bit color depth, and it will still be a close thing. Maybe use 64 bits width of SRAM to double the bandwidth.

Requirement #2 is reduced to the minimum possible level. Requirement #3 is blown. Cost is about the same as concept #2, although I might be able to find a cheaper DAC, but at $8 DAC cost the maximum possible savings is $8. Complexity is greatly reduced as SRAM does not need any refreshing and the addresses to memory are not multiplexed. The number of address pins needed increases, but only by ten or so.

===============================================

Okay, I'm not sure why I blabbed all that out on the page, but it's been running around my brain the last few days. I guess I blabbed it out here to get it out of my head.

I really like the idea of #1, although I wonder what performance would be like on an SE/30 if it was trying to drive that much screen real estate. But the idea of having 512 MB of RAM just sitting there at 300+ MHz with a spare 90 pins on the FPGA and probably about half of the internal resources of the FPGA still available.... How hard could it be to recreate the Daystar RAM Charger?

But #4 would probably be a much better place to start. Figure out how to interface with the Macintosh OS and bus without having to deal with the complexity of fancy memory with refresh requirements and such.

Still, you can see that no matter what direction one goes, the costs just don't get cheap.

 

Gorgonops

Moderator
Staff member
This is a really ignorant suggestion, as I haven't done any research at all, but...

My vague understanding from reading random documentation is that *essentially* all you need to provide in a driver for QuickDraw to use a piece of video hardware is a descriptor block which indicates where in memory the framebuffer is and some attributes describing its pixel/color depth layout. *If* that's the case... would there be any value in, instead of creating a video card from "whole cloth", deploying an as-simple-as-you-can-find VGA chip that supports linear framebuffers behind a (hopefully simple) bus adapter? The idea just occurs to me because I know that some of the last add-on video cards for LC PDS machines used PC-centric VESA local-bus VGA chipsets in an as-dumb-as-possible layout. Unfortunately, I suppose just about any currently available "mainstream" VGA chip would use a PCI-or-better interface, and that would probably be a lot harder to adapt to the 68030 bus than VLB was.

Eh, I suppose it's so easy to do VGA with an FPGA there's no point. Never mind. I think I just had this momentary infatuation with the idea of completely bodging together a prototype which was little more than an ISA slot adapter with an early "Windows Accelerator" linear framebuffer VGA card wedged in it. ;^)

Fantasy Requirements
2) Drive the external monitor at 24 bit color up to 1920 X 1280 at 60 Hz.
It might be amusing to cook up some tests to see just how long it takes a 16 MHz 68030 to redraw a framebuffer of that size. My guess is "quite a while". (The 68030 isn't even scalar, let alone super-scalar. Going by the one reference I checked, it might be capable of around 6 MIPS, which I suspect would translate to even a simple block copy from system RAM to that framebuffer taking the better part of two or three seconds. Maybe substantially less if there's a block move command, but that wouldn't have much of a bearing on *rendering* a display.) I absolutely guarantee the blurb you saw about the original cards supporting "16M color at 1024 X 768" is wrong. I haven't seen the manual for the one that supposedly supported 16M colors, but the manual for another card shows that 8 bit color/grayscale was supported at the lowest resolutions and it tapered off rapidly after that. If the original cards ever shipped with more than 1MB of RAM I'd be *very* surprised.

Personally I'd shoot for a prototype that was #5 on your options list but eh, I'm an idiot. I do think with any of these you may be overthinking the avoid-memory-contention-delays-at-all-costs part of the equation. So far as I know just about any "inexpensive" video card induces wait states, so it's not as if it'd be a unique mis-feature of your design. (And let's not even think about the not-insignificant number of Macs that used contended *system RAM* for video memory, meaning the system could potentially spend a lot of its time waiting even if it *wasn't* accessing the framebuffer.) If you do lower your expectations to, say, supporting at max "Megapixel" monitors at 8 bit color (or any other resolution that fits in 1MB of RAM), which would cut your display bandwidth requirements down to the 50-70MB/sec ballpark, perhaps you might find some relatively "cheap/trivial" solution to avoiding contention... maybe some sort of cycle-sharing arrangement?

(With a low-enough bandwidth cap I'm sort of wondering if you could dedicate most of the FPGA's pins to the SE/30 interface and use a narrower 8-16 bit interface to the framebuffer. Since you can clock the FPGA and suitable RAM so much higher than the SE/30 you might be able to get away with a solution as simple as running the framebuffer at effectively twice the speed of the bus interface and using an even/odd cycle for display reads and CPU access. The FPGA would "stretch" the cycles on the SE/30 side so as far as it was concerned it was the sole owner of the memory mapped into the slot, and by going "narrower and faster" on the RAM side you save enough pins to keep it all down to one relatively cheap FPGA. I don't know much about FPGAs, but I sort of wonder if you might even find one that could do a 1MB framebuffer internally and not even need an external chip.)

Anyway, blawblawblaw.

 

trag

Well-known member
But #4 would probably be a much better place to start. Figure out how to interface with the Macintosh OS and bus without having to deal with the complexity of fancy memory with refresh requirements and such.
Darn, the short editing period. I meant to write, "But #5 would probably be a much better place to start."

#4 would be a nightmare as a place to start.

Here are some photos of the Micron Xceed Grayscale setup I acquired not long ago
Very nice. If you ever have them out of the SE/30 again, any chance of putting them on a scanner to produce images in which the chip part numbers can be read? Or do you have higher res. versions of photos #1 and #7 from that spread?

I think it would be interesting to do a comparison of the chips on #1 with a Daystar PowerCache '030 card to see if there is a one-to-one correspondence in the chips used. With a IIcx adapter, one should, in theory, be able to install a PowerCache '030 in an SE/30 CPU socket (physical layout problems ignored). And the IIcx adapter has no chips on board. It's just wiring. So, the cpu socket Daystar upgrade may contain exactly the same stuff as a PowerCache 030.

I'm not sure what use that info would be, other than being kind of interesting...

On photo #7, you seem to have a version of the video card which is considerably different from the one I've been finding images of. Do you know what the part number is on your video memory chips? There are sixteen of them there, and a bunch of chips downstream. It looks like it could be much closer to the description I read, and less like the description in the patent. In other words, yours might have the 64 bit wide memory arrangement. Do you know what resolutions are supported by your card?

Fantasy Requirements
2) Drive the external monitor at 24 bit color up to 1920 X 1280 at 60 Hz.
It might be amusing to cook up some tests to see just how long it takes a 16Mhz 68030 to redraw a framebuffer of that size. My guess is "quite a while".
Back of the envelope... 1920 X 1280 = 2,457,600 pixels. If millions of colors, then it's one word per pixel, so 2,457,600 words. The SE/30 bus is 16 MHz, but I don't think it can manage better than a write every two cycles. At least the timing diagram I saw seemed to indicate that the address goes up one cycle and the data doesn't go up until the next cycle.

In theory it might manage 8 Mwords per second. However, that's leaving no time for calculations. If it's just moving data from memory to the frame buffer, then, even if it never waits for the frame buffer, it will still be slowed down by the motherboard RAM. What is the memory (RAM) performance on an SE/30?

Still, unless the RAM is a lot slower than the CPU, it might be able to write a frame buffer that size a few times per second. But it wouldn't be doing any rendering, as you mention later.
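Putting numbers on that back-of-the-envelope estimate (assuming, as above, one 32-bit write every two bus cycles):

```python
# How long a 16 MHz 68030 would take just to write a 1920 x 1280 x 32-bit frame,
# assuming one 32-bit write every two bus cycles (per the timing diagram mentioned).
pixels = 1920 * 1280               # one word per pixel at millions of colors
writes_per_sec = 16_000_000 / 2
copy_time = pixels / writes_per_sec
print(round(copy_time, 3))         # ~0.307 s per frame, ignoring source reads and rendering
```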

Personally I'd shoot for a prototype that was #5 on your options list but eh, I'm an idiot.
No, not at all. I mean, you're not an idiot. I think that's a good idea.

I thought about it last night. There may be an even simpler option.

The Xilinx Spartan 3A(N) Starter Kit is $189 ($199 for 3AN) and it includes a VGA connector and some demos which drive video. There is also a DDR2 chip soldered on the board and some DDR2 controller demos.

The easiest start might be to focus on interfacing the Spartan 3A Starter Kit to an SE/30 in a way that will cause the SE/30 to write to the Spartan 3A as if it is a video device.

My vague understanding from reading random documentation is that *essentially* all you need to provide in a driver for QuickDraw to use a piece of video hardware is essentially just a descriptor block which indicates where in memory the framebuffer is and some attributes describing its pixel/color depth layout.
I think I remember the same thing from reading "Designing Cards and Drivers for...". So the first step would be getting the Spartan 3A to interface with the SE/30 PDS slot and provide a firmware thingy (can't remember the name) that tells the Mac that there's a video card here. Then have the Mac write data into the Spartan 3A and use the code from the demos to output that through the VGA port on the Starter Kit board.

Once that is working, one can expand and modify as desired. This has several advantages. Using the resources on the Starter Kit means that there's no need for extra circuit boards beyond something to connect the SE/30 PDS slot to the Spartan 3A Starter Kit FX100 connector. No need to try to implement the DAC or the DDR controller or any of that stuff early on.

Start with something low resolution and shallow bit depth and use the block RAM in the FPGA for storage. Maybe 640 X 480 X 1 bit to start.

I don't know much about FPGAs, but I sort of wonder if you might even find one that could do a 1MB framebuffer internally and not even need an external chip.)
There is 360K of block RAM available on the Spartan 3A included in the Starter Kit. 460K if you don't mind eating up your other resources, which, of course, you can't, not entirely.

But for early experiments, this would work fine. Each bit of color depth for 640 X 480 needs 307,200 bits, or 38,400 bytes. So one bit plane fits in the FPGA's block RAM with room to spare, though a full 8 bit framebuffer (2,457,600 bits) would overflow it, which is another reason to start at 1 bit depth.
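A quick check of the block RAM budget, assuming the 360K figure is in Kbits (that's how Xilinx quotes Spartan 3A block RAM):

```python
# Block RAM budget check, assuming 360 Kbits of block RAM on the Starter Kit FPGA.
BLOCK_RAM_BITS = 360 * 1024
bits_per_plane = 640 * 480              # one bit of color depth at 640 x 480
planes = BLOCK_RAM_BITS // bits_per_plane
print(bits_per_plane, planes)           # 307200 bits per plane; 1 plane fits on-chip
```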

So the initial challenges would be to

1) Build an interface board from the SE/30 PDS to the Spartan 3A FX100 connector.

2) Program the FPGA to recognize when it is being addressed by the Mac and to simulate a Declaration ROM (hey, I remembered what it is called!) to the Mac.

3) To respond to read and write requests from the Mac to the frame buffer memory described in the declaration ROM.

4) To modify the Starter Kit Demos so that the image output to the VGA port comes from the memory that the Mac is reading and writing.

It doesn't look so bad when laid out like that. Once that's working, more intricate stuff can be added. The trick is getting the first thing working.

All programming boils down to debugging an empty file until it does what you want...

(With a low-enough bandwidth cap I'm sort of wondering if you could dedicate most of the FPGA's pins to the SE/30 interface and use a narrower 8-16 bit interface to the framebuffer.
Hmmm.

The SE/30 data interface is 32 bits wide. The frame buffer might as well be at least that wide. If one cuts the frame buffer from 32 bits to 16 bits, half of the bandwidth is lost and the supported resolutions either go down, or the memory must run faster.

Consider these two cases:

Case 1) The data bus runs through the FPGA

Case 2) The data bus does not run through the FPGA

In Case 1, narrowing the video memory saves FPGA pins. There's a sort of cliff beyond which that doesn't matter. More in a bit.

In Case 2, narrowing the video memory doesn't really save anything.

It's a lot simpler to only support 8 bit color than it is to support 24 bit and 8 bit. But it's not simpler because the memory is narrower. It's simpler because you don't have to provide for a mode where 24 bits go to the DAC per pixel and another mode where only 8 bits go to the DAC per pixel.

So, Case 2 isn't very interesting. If the data bus isn't on the FPGA, the width of the data bus just doesn't affect complexity much.

Let's look at Case 1.

The cliff I mentioned above is when you pass about 150 I/O pins on the FPGA. That's the most you can get out of a non-BGA package. If you need more than 150 I/O pins, you're either using a BGA package, or you're using more than one FPGA.

Let's count the pins.

The Case 1 FPGA must interface with these Address & Control lines:

1) SE/30 PDS slot address and control lines; about 32 + 10 lines, but could be 10 + 10 with some extra chips

2) The video memory address and control lines; 15 - 20 depending on the type of memory

3) The DAC control lines; 6 - 10

4) Other miscellaneous (e.g. Flash for Declaration ROM); 20

So that's about 94 lines for address and control. That leaves 56 for data.

(Note: We could reduce the address and control lines a bit by adding extra logic on the circuit board, for example, use latches to store the address when the Mac is addressing the Flash ROM, then we would save about 16 address lines. But this would cost more and defeat the purpose of putting the data bus on the FPGA. The idea is to keep the component count to a minimum.)

The Case 1 FPGA must interface with these Data lines:

1) SE/30 PDS slot data: 32

2) DAC: 8 for 8 bit color; 24 for 24 bit color

3) Flash ROM: 8

4) Video memory: ?

The number of I/O pins left available for video memory is 56 - (32 + 8 (or 24) + 8) = 56 - 48 = 8.
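The pin budget, tallied with the rounded counts from above:

```python
# Pin budget for Case 1 (data bus through the FPGA) on a 208-pin QFP,
# using the rounded line counts from the post.
QFP_IO = 150              # usable I/O pins on a 208-pin QFP, per the post
addr_ctrl = 94            # PDS addr/ctrl + video memory addr/ctrl + DAC ctrl + misc
other_data = 32 + 8 + 8   # PDS data, 8-bit DAC, Flash ROM data
vram_width = QFP_IO - addr_ctrl - other_data
print(vram_width)         # 8 -- only eight pins left for the video memory data bus
```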

So, maybe we could build a card with 8 bit wide video memory and get it all on an FPGA in a 208 pin QFP package. Let's look at some bandwidths. These are all the minimum theoretical bandwidths just to support the display and ignore refresh and the need of the host computer to access the video memory. All assume 75Hz refresh and 8 bit color depth:

640 X 480: 23 MHz
832 X 624: 39 MHz
1024 X 768: 59 MHz
1280 X 1024: 99 MHz
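Those entries are just width X height X 75 Hz; a quick check (the 1280 X 1024 entry actually comes to 98.3 MHz, rounded up above):

```python
# Minimum pixel rates behind the table above: width * height * 75 Hz,
# ignoring blanking, refresh cycles, and host access.
modes = [(640, 480), (832, 624), (1024, 768), (1280, 1024)]
rates_mhz = {(w, h): w * h * 75 / 1e6 for w, h in modes}
for (w, h), mhz in rates_mhz.items():
    print(f"{w} x {h}: {mhz:.1f} MHz")   # 23.0, 38.9, 59.0, 98.3
```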

I don't think you can get any of those with cheap FPM or EDO memory. A -60 rating isn't even going to support 16 Mwords/second real transfers. Some of the more expensive SRAM will hit those levels, but it's expensive. One would have to go to Synchronous DRAM (SDRAM), the old PC100, PC133 memory to get up to those speeds, but at this point in time, it's more expensive than DDR2 memory.

And buying individual DDR2 memory chips is more expensive than buying a whole 64 bit wide DIMM.

I could use old slow SRAM in my fifth concept in the earlier post because I was using 32 bits or 64 bits of width. Here, we're only using 8 bits of width, because that's all the pins left on the FPGA.

I guess one could install a DDR2 DIMM and only use 8 bits of the width... That would work. :) Except that DDR2 requires some supporting signals to time the signal properly, so those eight bits would need more than 8 I/O pins. Sigh.

You see how tough it is to fit all address, control and data lines into a non-BGA FPGA even if you narrow the video memory to 8 bits?

So, the only case in which narrowing the video memory width really saves you FPGA resources is when the data bus is on the FPGA and when you stick to 8 bit wide video memory, and that will be either expensive or not feasible -- or you could support 4 bit color and below.

As soon as you go to 16 bit video memory, you've blown your FPGA pin budget and are into BGA packages with higher pin counts. Once you're at that point, you may as well zoom right past 320 and 400 pins and go straight to 484 pins. This gets you 372 I/O pins. More than twice as many as with the 208 pin QFP package.

Since you can clock the FPGA and suitable RAM so much higher than the SE/30 you might be able to get away with a solution as simple as running the framebuffer at effectively twice the speed of the bus interface and using an even/odd cycle for display reads and CPU access.
I'm not sure what you're getting at here.

 

trag

Well-known member
Here are some photos of the Micron Xceed Grayscale setup I acquired not long ago
Very nice. If you ever have them out of the SE/30 again, any chance of putting them on a scanner to produce images in which the chip part numbers can be read? Or do you have higher res. versions of photos #1 and #7 from that spread?
Never mind. I found the elusive ;-) magnifying glass function. Very nice photos.

 

Gorgonops

Moderator
Staff member
With a low-enough bandwidth cap I'm sort of wondering if you could dedicate most of the FPGA's pins to the SE/30 interface and use a narrower 8-16 bit interface to the framebuffer.
Hmmm.

The SE/30 data interface is 32 bits wide. The frame buffer might as well be at least that wide. If one cuts the frame buffer from 32 bits to 16 bits, half of the bandwidth is lost and the supported resolutions either go down, or the memory must run faster.

*snip*
Okay, it does appear the pin count argument deep-sixes putting a "cheap" FPGA inline, which is the thing I was thinking sounded "interesting". Although... it does seem to me you might be able to cut out, say, the declaration ROM lines? I can't really think why they'd need to pass through it. How much decoding does the ROM really need... although further along those lines, and this is me being ignorant again... what about the possibility of multiplexing some of the lines? I'll lump the idea in with my explanation for the below:

Since you can clock the FPGA and suitable RAM so much higher than the SE/30 you might be able to get away with a solution as simple as running the framebuffer at effectively twice the speed of the bus interface and using an even/odd cycle for display reads and CPU access.
I'm not sure what you're getting at here.
So, this is a lame line of speculation which spawns from my limited experience with the 6502, on which it's possible to share memory devices without contention if said devices can exploit the 6502's predictable 50% bus duty cycle. There are some really strange examples of devices leveraging this. For instance, some old Commodore disk drives contain two 6502 variants: one runs DOS and communicates with the host computer while the other drives the disk hardware. Both CPUs are on the same bus and share access to a memory buffer, and there's no contention because they're a half clock out of phase with each other. And of course most 6502 home computers that used main memory for video refresh took advantage of the same property to make the video hardware's RAM accesses non-contending.

Along those lines, my thought was that if the FPGA were "bridging" between the framebuffer and the PDS slot, you could use whatever arbitrary memory architecture was most efficient on the framebuffer side. If that architecture had enough bandwidth to both feed the display and handle the worst the 68030 could throw at you, you could render contention a non-issue by leaving a notch (it wouldn't have to be a full 50%, by the sounds of it) in your display timing where a CPU read or write could be buffered without the CPU ever knowing about it.

The only problem is that, since you want to support various resolutions, the timing isn't going to be completely predictable, so a simple "even/odd" or regular fraction won't do. But the FPGA has block RAM; why not use it to create a line buffer for feeding the DAC? (I guess that *would* mean dedicating 24+ I/O pins to feeding the DAC instead of putting it on the RAM data lines, but... wait, it sounds like you already accounted for those.) That lets the RAM run completely asynchronously and gives you a lot more wiggle room.
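A toy model of that even/odd slot sharing, with the memory running at twice the bus rate (slot counts and request timing are made up purely to illustrate the idea, not taken from real PDS behavior):

```python
# Toy model of the even/odd slot-sharing idea: the memory runs at twice the
# bus rate; even slots feed display refresh, odd slots are reserved for the
# CPU. All numbers here are illustrative, not from any real design.

from collections import deque

def run(memory_slots, cpu_requests):
    """cpu_requests: set of slot indices at which the CPU asks for memory.
    Returns (display_reads, total_slots_any_request_spent_waiting)."""
    display_reads = 0
    cpu_waits = 0
    pending = deque()
    for slot in range(memory_slots):
        if slot in cpu_requests:
            pending.append(slot)
        if slot % 2 == 0:
            display_reads += 1          # even slot: refresh read for the DAC
        elif pending:
            pending.popleft()           # odd slot: service the CPU request
        cpu_waits += len(pending)       # anything still queued is waiting
    return display_reads, cpu_waits

# The CPU can request at most once per bus cycle (one bus cycle = 2 slots),
# so every request is served by the next odd slot and nobody ever stalls.
reads, waits = run(16, cpu_requests={1, 5, 9})
print(reads, waits)
```

The point of the model is just that, as long as the memory side is at least twice as fast as the worst-case CPU request rate, the refresh stream and the CPU never collide.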

It sounds like pin-count issues rule out a simple bridge with one cheap FPGA, but perhaps you could do this with some hardware that would be cheaper than a second FPGA? (I just looked at Digi-Key and saw some 64 I/O pin CPLDs for $3 a pop, for instance. What we're after here is a buffer/bidirectional latch for the data and address information supplied by the Mac; maybe there's even some 74xxx logic that would do.)

The idea goes like this: the FPGA is wired to the RAM on a semi-private bus, in whatever configuration you see fit, clocked as high as you need to achieve "plenty of headroom left over for a 16 MHz read or write" bandwidth numbers. The PDS control lines terminate directly on the FPGA, but the data and address lines terminate on these latches, and on the inside the latches are tri-stated onto the same I/O lines that drive the framebuffer RAM. Let's pretend in this lashup that the PDS's base clock is 16 MHz, while the FPGA runs, I dunno, four times faster. When the Mac performs a bus cycle, the process would go something like this. (And this part I'm making up, because I don't know exactly what a bus cycle looks like on the PDS. Substitute reality for my crud.)

In the first 1/4 of the PDS's "public bus" cycle, the Mac puts an address onto the PDS and fires off the strobe. The latch stores the address bus contents and the FPGA acknowledges that the Mac wants something. (Presumably there are other bus lines which change simultaneously.) It makes a note to prioritize a memory cycle within this window to service the Mac's request, and meanwhile it lets a screen-refresh read of the framebuffer run to completion, stuffing a word into a line buffer.

What the FPGA does in the next 1/4 of the PDS cycle may depend on whether the Mac wanted to read a memory location or write it. For a write, it might let the address sit in the latch until later in the bus cycle and just ack receipt of the address if necessary (I don't know whether data and address are valid simultaneously on the 68030 bus, or whether there are separate data and address strobes during a write). Or it might use this cycle to quickly tri-state the RAM, read the address and data off the latches into a buffer on the FPGA (assuming we can't shunt them straight into the RAM chips, which would be even better), tri-state the latches again, run a memory cycle to write the data into RAM, and remind itself to set the bus acknowledgement signals appropriately per the PDS timings.

If the request was a read, the FPGA ASAP tri-states the RAM, reads the address latch, switches back to memory to run the read, switches back to the latch to write the read value into it for the Mac to receive when it's ready, and again reminds itself to set the status lines appropriately. And after all this there's still enough time in the "external" PDS cycle for the FPGA to read another word to stuff into the screen-refresh output buffer.
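The sequence above boils down to a little four-phase state machine. A minimal sketch, where the phase names, their ordering, and the 4x clock ratio are all my inventions rather than real PDS timing:

```python
# Toy model of the latched bus cycle described above: one 16 MHz PDS cycle
# is divided into four internal FPGA phases. Phase names, ordering, and the
# 4x clock ratio are illustrative assumptions, not real PDS timing.

PHASES = (
    "latch_address",     # Mac drives address/strobe; external latch grabs it
    "service_request",   # tri-state RAM, read the latch, run the read/write
    "complete_request",  # put read data back in the latch / set acknowledge
    "refresh_read",      # spare slot: stuff one more word into the line buffer
)

def bus_cycle(vram, addr, write_data=None):
    """Run one PDS bus cycle; returns read data (or None for a write)."""
    result = None
    for phase in PHASES:
        if phase == "service_request":
            if write_data is not None:
                vram[addr] = write_data          # CPU write lands in VRAM
            else:
                result = vram.get(addr, 0)       # CPU read comes from VRAM
        # the other three phases are timing/refresh slots in this toy model
    return result

vram = {}
bus_cycle(vram, 0x100, write_data=0xCAFE)        # a CPU write cycle
print(hex(bus_cycle(vram, 0x100)))               # a CPU read cycle
```

Even in this crude form you can see the CPU only ever occupies one of the four internal phases, leaving the rest for refresh reads.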

I imagine the above block of blawblaw was confusing, but essentially it boils down to wondering whether, with a few buffer/latches, you could save 72 FPGA pins while still effectively isolating the PDS from your VRAM (minus some pins for triggering tri-state conditions on the various peripherals attached to the "private" bus). Using the completely made-up cycle timing in the paragraph above, you should be able to get at least one and possibly two refresh reads on any cycle where the CPU is touching VRAM, and four accesses when it isn't, which hopefully would be enough to prevent any "DAC starvation".

Something I'm sort of depending on here is that you could use some block RAM on the FPGA to implement a read-ahead line buffer for the video output. In addition to solving timing/synchronization issues, this would let you keep your output pipes stuffed full whenever the Mac isn't interested in reading or writing VRAM, and if you had that... heck, you might be able to use quite slow external memory and still never have a contention problem. The Mac is going to spend quite a few clock cycles "thinking" while it renders a screen, so even if a CPU access were completely blocking during a bus cycle, if you can read ahead during all the other cycles you might never have a problem.

There are a lot of "ifs" up there. You sounded pretty averse to using any additional buffers or latches if you could get away with it, but I sort of don't think you can if you want the framebuffer non-contending. Essentially what I outlined above is similar to your option #3/4, but using dedicated latch hardware instead of another FPGA. (Although you could use an FPGA for that hardware, if that were ultimately easier. The main difference is that instead of splitting address and data lines between two FPGAs, we leave one in essentially complete control of the memory bus and reduce the other to a pretty dumb slave.)

I don't know anything about interfacing RAM more sophisticated than plain old SRAM, so I don't know if it's feasible to share the address and data lines like I described; maybe it isn't. I'm sort of assuming you can tri-state the RAM so it gives up the bus to another peripheral easily. I'm sure you'd probably need a buffer like a 74244, and perhaps timing constraints make it undoable with something like DDR. However, there is after all *very* fast SRAM available, like the stuff used for motherboard caches, and if it were as simple to interface as the slower stuff and eliminated the complexity of DDR, it looks like you could do 4MB of 1Mx8 10ns SRAM for around $2-$4 a chip. I know that comes to $8-$16 for a pretty small framebuffer, but it's more than enough for an SE/30.

And of course I don't know if doing a line buffer in the FPGA's onboard block RAM is feasible either, although, again, if it were, that might solve your problems right there, provided the Mac's effective PDS duty cycle is low enough. In that case you might be able to eliminate those latches again and simply pump your 10ns RAM at light speed when the Mac doesn't want the bus, slowing it down to normal PDS rates when it does. (You'll still probably need tri-state buffers to isolate the PDS when you're pounding on VRAM, but of course you'll know that better than I do.) If by some bad chance you did end up draining your line buffer, well, throw a wait state on the bus. It's not the end of the world.
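The line-buffer idea is easy to sanity-check with a toy drain/fill model (buffer depth, fill rate, and the CPU's duty cycle below are all invented numbers, just to show the mechanism):

```python
# Toy model of the block-RAM line buffer: fill it whenever the Mac isn't
# touching VRAM, drain one entry per pixel clock. If it ever runs dry the
# answer is "insert a wait state", as suggested above. All numbers made up.

def simulate(pixels, cpu_busy_slots, depth=64, fills_per_slot=2):
    """Return how many pixel slots would underrun (i.e. need a wait state)."""
    buf = depth                  # start the line with the buffer pre-filled
    underruns = 0
    for slot in range(pixels):
        if slot not in cpu_busy_slots:
            buf = min(depth, buf + fills_per_slot)  # read-ahead from VRAM
        if buf == 0:
            underruns += 1                          # DAC starvation moment
        else:
            buf -= 1                                # one pixel out to the DAC
    return underruns

# Even with the CPU hogging every other slot, a 2x fill rate never runs dry:
print(simulate(1024, cpu_busy_slots=set(range(0, 1024, 2))))
```

And if the CPU somehow held the bus continuously, the buffer would drain in `depth` slots and underrun from then on, which is exactly the case where you'd eat a wait state.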

Anyway, again... I *am* an idiot when it comes to this stuff, but there's some lame ideas to ponder anyway. :^)

 