Fast! CompactFlash for Macintosh PDS/NuBus

David Cook

Well-known member
It seems to be one of the first genuinely outstanding developments I've seen in quite some time. It's on par with @dougg3 's custom ROM SIMM and @bigmessowires ' FloppyEMU (first developed almost 15 and 12 years ago, respectively). In other words, very exciting stuff!

I concur. This is really a huge leap forward. It's not just the speed -- SCSI can be flaky on repaired machines.

I remain optimistic that this tariff war is short term. The world is truly a global economy and there's no going back.
 

LaPorta

Well-known member
Regardless of whatever political stuff there is, just build it and price it with the margin you want above cost, and we will buy it. It costs whatever it costs.
 

CC_333

Well-known member
The world is truly a global economy and there's no going back.
Agreed. I'm not sure I share your optimism, however, and on behalf of all sensible people in the US, I feel like I need to apologize to everyone else for all the nonsensical whipsawing 😞

Regardless of whatever political stuff there is, just build it and price it with the margin you want above cost, and we will buy it. It costs whatever it costs.
Pretty much, unfortunately. Life must go on, no matter how costly it gets. If you have the means, it probably won't be too bad.

Those who don't, however, will probably get mostly locked out. No matter how long it lasts (the tariff war could end tomorrow and still hurt because the damage already done to the economy and the US' reputation is irreversible, at least in the medium term), it's definitely not a fair situation....

c
 

zigzagjoe

Well-known member
We will see what happens. It annoys me, as I wanted to hit as practical a price point as I could manage with this, and having to increase the price for these reasons... ugh.

Anyways, I got the production qualification boards. All seems well; I redesigned the logic to improve compatibility. Next is further testing in my "done" SE/30s, as well as some quick System 6 testing. I'm confident in the hardware, but despite no issues being reported by testers, I still want to be very sure about the software (ROM driver). I want to make these as resistant to data loss as I can, or at least as resistant as anything can be on Mac OS, given its habit of corrupting filesystems.

Hopefully I'll have some of the sample batch posted for sale in the next week or two.

[Photos attached]

I also made my own little socketed CF card. Unfortunately, it's no good for the IIcx... I forgot that the NuBus decode covers the entire NuBus space, not just the slots that physically exist on the machine. So there's no way to make this work without a redesign, which won't happen, as it was just something to amuse me. It does look awfully cute with a Booster on top, but it can't physically fit into an SE/30 due to the heatsink. No plans to make these; 'tis too silly (and a PITA).

[Photo attached]
 

nickpunt

Well-known member
I love this and want to be on the list! Especially love the idea of baking this in with other improvements; my long term hope is various mods are consolidated into fewer components like Bolle's riser with ethernet or your grayscale+CF example.
 

zigzagjoe

Well-known member
@zigzagjoe Could a PDS splitter work in an IIci? Or do you think it's better to just stick with NuBus?

Conceptually, it should be possible as on the IIci the CACHE signal can suppress NuBus cycles. So there shouldn't be an issue using an unused slot space for a PDS-type card so long as no interrupt is required and the logic is set to kibosh those accesses.
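To make the "kibosh" idea a bit more concrete, here is a minimal Python sketch of the address-decode side only. It assumes the usual 32-bit Mac memory map, where NuBus slot s ($9 to $E) owns the standard slot space $Fs000000 to $FsFFFFFF; the helper names and the choice of slot $E are hypothetical, and a real CPLD would also need to assert the IIci's CACHE signal to suppress the NuBus cycle, which is not modelled here.

```python
from typing import Optional

NUBUS_SLOT_IDS = range(0x9, 0xF)  # slot IDs $9..$E

def standard_slot_of(addr: int) -> Optional[int]:
    """Slot ID whose standard slot space ($Fs000000-$FsFFFFFF) holds addr, or None."""
    addr &= 0xFFFFFFFF            # treat the input as a 32-bit bus address
    if addr >> 28 != 0xF:         # standard slot space always starts with nibble $F
        return None
    slot = (addr >> 24) & 0xF     # second nibble is the slot ID
    return slot if slot in NUBUS_SLOT_IDS else None

def pds_card_claims(addr: int, unused_slot: int = 0xE) -> bool:
    """True if a PDS card squatting on unused_slot (hypothetical choice) should answer addr."""
    return standard_slot_of(addr) == unused_slot

for a in (0xFE000010, 0xFD000010, 0x40800000):
    print(hex(a), pds_card_claims(a))
```

In logic this reduces to comparing A31 to A24 against $FE for the assumed slot; the important part is picking a slot that is genuinely unpopulated.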

But, truthfully, the IIci isn't a super interesting machine to me, and rather than continue to be pigeonholed into one or two machines by how easy the 68k bus is to work with directly, I'd rather develop something new and different that's more broadly applicable. So, that would mean NuBus.

I do need to find out how quickly the IIci can issue NuBus cycles, as that's a bit of a decision point for me. If anyone with a logic analyzer (LA) is bored: get a NuBus video card, set it to 8-bit color, and probe the C10, /START, and /ACK signals while running a CopyBits graphics test in System Info or MacBench. The average time between /ACK and the next /START would indicate roughly how quickly the IIci is prepared to issue NuBus cycles. A scope can also be used to measure this, but it would be more difficult, as you'd need to capture many transfers and average the time to get an idea.
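Once the edges are exported from the LA, the averaging step is simple. A minimal sketch, assuming the capture has already been reduced to two sorted lists of timestamps (in ns) for the asserting edges of /ACK and /START; the function name and data layout are hypothetical and not tied to any particular analyzer's export format.

```python
from bisect import bisect_right
from statistics import mean

def mean_ack_to_next_start(ack_ns, start_ns):
    """Mean delay from each /ACK edge to the next /START edge.

    ack_ns, start_ns: sorted timestamps of the asserting (falling) edges,
    since NuBus control signals are active low. /ACK edges with no later
    /START in the capture are skipped.
    """
    gaps = []
    for t_ack in ack_ns:
        i = bisect_right(start_ns, t_ack)  # index of first /START strictly after t_ack
        if i < len(start_ns):
            gaps.append(start_ns[i] - t_ack)
    if not gaps:
        raise ValueError("no /ACK edge is followed by a /START edge")
    return mean(gaps)

# Made-up timestamps purely to show the call:
print(mean_ack_to_next_start([200, 800, 1300], [0, 500, 1100]))
```

Idle stretches between CopyBits bursts will inflate a plain mean, so discarding unusually long gaps (or looking at the median) may give a fairer figure.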
 

olePigeon

Well-known member
@zigzagjoe What about doubling as a RAM card for a RAM disk? Daystar (and a few other companies) used to make the Daystar RAM PowerCard that offered (I think) 128 MB of RAM disk space.

I don't mean to pigeonhole, but I am a pigeon.
 

adespoton

Well-known member
Good news for Americans: those tariffs are now exempted for computers and phones. So I'm presuming, at least this week, they're also exempt for components?
 

zigzagjoe

Well-known member
@zigzagjoe What about doubling as a RAM card for a RAM disk? Daystar (and a few other companies) used to make the Daystar RAM PowerCard that offered (I think) 128 MB of RAM disk space.

I don't mean to pigeonhole, but I am a pigeon.

The CF bits would be enough of a challenge without adding more :) I have not yet tangled with making a DRAM controller of my own, but it is on the to-do list for someday. I like the idea of a Booster card with some integrated fast RAM, but while I have a loose understanding of how DRAM works, I've avoided actually getting into the nuts and bolts.

Good news for Americans: those tariffs are now exempted for computers and phones. So I'm presuming, at least this week, they're also exempt for components?

Who knows - it's already being un-walked back (forward?). Whatever. I am assuming nothing, as I've already been hit by tariffs a few times and avoided them in other scenarios. As long as the de minimis exemption exists, that gives me a little time, but once that goes away it's anyone's guess.
 

uyjulian

Well-known member
If DMA is not being used, I wonder if implementing some sort of compression (e.g. LZO or LZ4) for transferring sectors would make transfers faster and allow squeezing more bandwidth out of the NuBus interface.
 

zigzagjoe

Well-known member
If DMA is not being used, I wonder if implementing some sort of compression (e.g. LZO or LZ4) for transferring sectors would make transfers faster and allow squeezing more bandwidth out of the NuBus interface.
Nah, there's not nearly enough CPU time to make something like that work out.

Realistically, the limited maximum throughput of NuBus isn't a major limiter, and I am also testing the worst case (Mac II-class NuBus). The main benefit of NuCF, from what I've determined, is that small accesses are much "cheaper" to execute. Mac OS loves to make single-sector sequential accesses, especially during early boot; these are cheap on CF even without any kind of prefetch. However, on SCSI, with the limited CPU performance of the 030 machines, those hundreds of small accesses add up to waste a lot of CPU time in the driver, SCSI Manager, SCSI controller, and the drive itself. Even with the limitations of the early NuBus implementation (all later machines should be faster!), CF keeps its advantage over SCSI for these small accesses, as they are not limited by bus speed.
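The small-access argument can be put as a toy cost model: every request pays a fixed overhead (driver, SCSI Manager, controller, drive) plus its transfer time. The sketch below is illustrative only; the function name is hypothetical and the overhead and rate figures are placeholders, not measurements of any machine or of NuCF.

```python
def batch_time_us(n_requests, bytes_per_request, overhead_us, bytes_per_us):
    """Total time when each request pays a fixed overhead plus its transfer time.

    overhead_us:  per-request cost in microseconds (all software + hardware layers).
    bytes_per_us: sustained rate once data is moving (1 byte/us == 1 MB/s).
    """
    return n_requests * (overhead_us + bytes_per_request / bytes_per_us)

# Placeholder figures, NOT measurements: 200 single-sector (512-byte) reads
# with 1000 us of per-request overhead on a 4 MB/s path.
base        = batch_time_us(200, 512, 1000, 4.0)
faster_bus  = batch_time_us(200, 512, 1000, 8.0)   # double the bus speed
leaner_path = batch_time_us(200, 512,  500, 4.0)   # halve the per-request overhead

print(base, faster_bus, leaner_path)
```

With numbers shaped like these, halving the per-request overhead saves far more time than doubling the bus speed, which is the point made above: for single-sector requests the fixed cost, not the bus, dominates.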

That performance gain is mostly due to fewer software and hardware layers to go through. However, as CPUs got faster and SCSI implementations got better, the CPU limitation on small transfers and the SCSI hardware limitation on large transfers became less of an issue. So while the Mac II machines see a 3x or greater performance improvement with CF over SCSI disks, in Quadras PIO-mode CF is merely somewhat faster rather than several times faster under all conditions. Less compelling.

DMA: There isn't much point to DMA for this type of activity on a 68030-based machine: due to how 030 bus arbitration works, a CF transfer would hold the bus, halting the CPU for extended periods of time. The situation is a little different on the 040 machines, which can have NuBus cards execute bus master transfers while the CPU runs from its caches, so it can do some productive work while locked out of main memory.

So you can kind of see where the design "divide" falls.

Design A) PIO based NuBus implementation for 68030s

Similar to my current design, but expanded to 32-bit width in hardware. I expect this would reach most, if not all, of the practical performance available on the 030 machines, and probably also on the Quadra 700/900. Conceptually it would still be fast on later machines, just not a quantum improvement over SCSI. The hardware is relatively simple by comparison, and it would use the existing driver.

Design B) Bus mastering UDMA NuBus: Bus mastering, block transfers, and ATA UDMA on 040 machines

After the Q700/900, NuBus block transfers to RAM were implemented, so bus-mastering block transfers would be required for the "next" level of performance gain on an 040-class system. That next level of performance would come with markedly increased hardware complexity to support the new NuBus features, and the ATA/CF interface would probably need to support UDMA-class transfer modes to get data fast enough. Also a huge increase in complexity!

I'll probably experiment with design A, possibly with the option of bus mastering for my edification. I don't expect to get around to design B. Given that prototype cards can get away without being spec-compliant, I may just put a pair of CPLDs acting as the inverting bus drivers/registers/etc needed for NuBus on a prototype board to play with. Maybe a final product could result, maybe not... it's an experiment. But this is all well down the road :)

---------------------------

Current updates: I've finished hardware testing in all of my SE/30 machines with various combinations of IO boards and accelerators. No concerns. I need to double check IIsi but don't expect issues. I'm also working on final software testing. I still need to knock out a quick hardware tester/initial firmware tool so I can actually test cards in a streamlined fashion, but most of that's already written as that code was required to bring up the design in the first place.

[Photo attached]

One of the tested scenarios :) One of my Booster 2.0 accelerators, a NuCF PDS, and a 30Video HC (GS) on top. As always, the chassis card supports need to be bent out of the way and insulated, but that's all. I can't guarantee three-card stacks will always work, but I expect this particular one to be kosher more often than not.

It is more difficult to remove the CF card than I'd prefer as it can only be gripped by the sides, but I think this layout remains the most sensible for the moment (on a standalone PDS card, anyways).

[Photo attached]

I also tried to see if I could salvage my socketed NuCF board, so I stacked it in my socketed Carrera design. This did actually work, but it wouldn't tolerate any additional IO cards with the Carrera running. The socketed NuCF forgoes the buffers for the CF (as it was intended for the IIcx...), and the anemic drive strength of the Carrera's FPGAs isn't enough to reach the CMOS logic levels (required by CF) on a heavily loaded bus.
 

Fizzbinn

Well-known member
This might be an odd question, but this incredible project, and my past experience replacing the IDE hard drive in a PowerBook 150 with a CF card, got me thinking about the similarities. My understanding is that Apple essentially implemented an IDE (ATA) controller on the 68030 bus, but I think it operates in PIO 2 (8 MB/s) mode, while you are doing PIO 4 (16 MB/s)? In theory, could your card support an IDE PATA HD? Have you essentially re-implemented (in a more performant way!) what they did back in the day to add "IDE" support to the Mac platform?
 

zigzagjoe

Well-known member
That one is complicated.

At maximum speed, putting all else aside, the cycle time works out to 180ns on a 15.6672 MHz bus, or 6 bus cycles per 4 bytes of data. This leaves no time for instruction execution or for storing data anywhere but a register. So that electrical signalling rate would correspond to PIO-3 timings. In reality, the effective transfer rate on the SE/30 works out to 11 system clock cycles per 4 bytes, including the transfer to RAM (4 cycles), for a transfer rate of ~5.7 MB/s, assuming minimal delay before the card is ready to supply data (a safe assumption).
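The ~5.7 MB/s figure follows directly from the clock counts. A minimal sketch of that arithmetic (MB meaning 10^6 bytes), using the 11-clock read-plus-store figure and the 6-clock register-only ceiling quoted above; the function name is just for illustration.

```python
def rate_mb_s(bus_hz: float, clocks_per_4_bytes: float) -> float:
    """Sustained rate in MB/s (10**6 bytes/s) when every 4 bytes cost a fixed clock count."""
    return bus_hz / clocks_per_4_bytes * 4 / 1e6

SE30_BUS_HZ = 15.6672e6

print(round(rate_mb_s(SE30_BUS_HZ, 11), 1))  # 5.7: read from CF and store to RAM
print(round(rate_mb_s(SE30_BUS_HZ, 6), 1))   # 10.4: bus signalling only, data left in a register
```

The same function with a 20 MHz clock and 11 clocks gives the ~7.3 MB/s figure for the IIsi/IIfx.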

On a 20 MHz system clock (i.e. IIsi/IIfx), the timings are a bit faster than PIO-3, at a 150ns cycle time and a 75ns IOWR pulse. By default this requires PIO 4, though realistically, as the various PIO levels just guarantee certain minimum timings, there's nothing wrong with running a bit slower (PIO 3-ish timings), since the card supports 4 (or beyond). So with the same 11 clocks to move 4 bytes of data from card to RAM, that works out to a ~7.3 MB/s maximum transfer rate.
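Whether a given timing counts as PIO 3 or PIO 4 can be checked against the ATA minimum PIO cycle times (t0 of 600, 383, 240, 180 and 120 ns for modes 0 to 4). A rough sketch, assuming each 16-bit CF access takes 3 bus clocks (which is how "6 bus cycles per 4 bytes" is interpreted here) and ignoring the other ATA timing parameters such as the IORD/IOWR pulse width; the function names are hypothetical.

```python
# ATA minimum PIO cycle time t0 in ns, by PIO mode.
PIO_MIN_CYCLE_NS = {0: 600, 1: 383, 2: 240, 3: 180, 4: 120}

def access_cycle_ns(bus_hz: float, clocks_per_access: int = 3) -> float:
    """Cycle time of one 16-bit CF access lasting clocks_per_access bus clocks (assumed 3)."""
    return clocks_per_access / bus_hz * 1e9

def lowest_pio_mode_for(cycle_ns: float) -> int:
    """Slowest PIO mode whose minimum cycle time is still met by cycle_ns."""
    for mode in sorted(PIO_MIN_CYCLE_NS):
        if cycle_ns >= PIO_MIN_CYCLE_NS[mode]:
            return mode
    raise ValueError(f"{cycle_ns:.0f} ns is faster than PIO 4 allows")

for name, hz in (("SE/30", 15.6672e6), ("IIsi/IIfx", 20e6)):
    t = access_cycle_ns(hz)
    print(f"{name}: {t:.0f} ns per access -> needs PIO {lowest_pio_mode_for(t)}")
```

Under that assumption, 3 clocks at 15.6672 MHz land around 191 ns, which PIO 3's 180 ns minimum covers, while the same 3 clocks at 20 MHz give the 150 ns cycle that only PIO 4 guarantees.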

With a more complex hardware design it might be possible to take one or two cycles off the CF access time, but that would very much be a diminishing-returns situation, given that bus cycles are cheap on PDS and the improved small-IO performance is what really helps us. However, on NuBus, where bus cycles cost more, that 32-bit expansion makes more sense (i.e. supporting longword reads in hardware).

This is why bus mastering (DMA) doesn't make a lot of sense on the 030: making idealistic assumptions (a 32-bit CF access in 4 cycles), this would lead to 8 cycles per 4 bytes, or 7.8 MB/s, assuming the DMA hardware didn't require any time to sequence the additional accesses. However, the CPU would be equally unable to do anything productive during a sustained transfer, and care would need to be taken that the card doesn't hold the bus long enough for IRQs to start piling up. Meanwhile, on the Mac OS side, just about the only productive thing that would really be possible with that extra CPU time is mouse movement. The card's just tripled in complexity (or more), and the driver would have to be more complex, with not much gain to show for it... so the PIO method of access is appropriate enough here, IMO.
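The DMA numbers can be sketched the same way, together with how long a single uninterrupted burst would keep the CPU off the bus, which is what matters for the IRQ concern. This assumes the idealized 8 clocks per 4 bytes from above; the burst sizes are arbitrary examples and the code is not a model of any real DMA design.

```python
SE30_BUS_HZ = 15.6672e6
PIO_CLOCKS_PER_4B = 11   # CF read + store to RAM, CPU-driven
DMA_CLOCKS_PER_4B = 8    # idealized bus-master transfer

def rate_mb_s(clocks_per_4b: int, bus_hz: float = SE30_BUS_HZ) -> float:
    """Sustained rate in MB/s (10**6 bytes/s) for a fixed clock cost per 4 bytes."""
    return bus_hz / clocks_per_4b * 4 / 1e6

def bus_hold_us(burst_bytes: int, clocks_per_4b: int = DMA_CLOCKS_PER_4B,
                bus_hz: float = SE30_BUS_HZ) -> float:
    """Microseconds the bus stays held (CPU stalled) for one uninterrupted burst."""
    return burst_bytes / 4 * clocks_per_4b / bus_hz * 1e6

print(f"PIO {rate_mb_s(PIO_CLOCKS_PER_4B):.1f} MB/s vs DMA {rate_mb_s(DMA_CLOCKS_PER_4B):.1f} MB/s")
print(f"one 512-byte sector holds the bus ~{bus_hold_us(512):.0f} us")
print(f"a 64 KB burst holds it ~{bus_hold_us(65536) / 1000:.1f} ms")
```

The raw gain is only 11/8 (37.5%), and the CPU gets no useful work done in either case; the main design knob would be how finely the card chops its bursts so pending interrupts can be serviced.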

IDE support: Conceptually, since CF is a superset of ATA, as long as the drive implements the basic commands my driver uses and does not require IORDY, it'd likely just work. Some timeouts would need to be increased: you can get away with ~1 second timeouts on CF, since if the card doesn't complete an action within 100ms it's typically never going to. A HDD is a good deal slower, so the driver would need to change. An interrupt-based design would also make more sense there. CF cards are fast enough that there's minimal delay before it's time to read data: when you wait ~100us to begin a transfer and each sector takes around 100us to transfer... why bother relinquishing the CPU?
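For reference, a polled wait of the kind a PIO driver performs before reading a sector looks roughly like this. It is a Python sketch, not the NuCF ROM driver: read_status is a hypothetical hook standing in for a read of the ATA status register, the bit masks are the standard ATA status bits, and the timeout is a parameter precisely because a CF card and a spinning HDD need very different values.

```python
import time

ATA_SR_BSY = 0x80  # device busy; the other status bits are not valid while set
ATA_SR_DRQ = 0x08  # device ready to transfer a data block
ATA_SR_ERR = 0x01  # previous command ended in error

def wait_for_data(read_status, timeout_s, now=time.monotonic):
    """Poll until BSY clears with DRQ set; raise on ERR or once timeout_s elapses."""
    deadline = now() + timeout_s
    while True:
        status = read_status()
        if not status & ATA_SR_BSY:
            if status & ATA_SR_ERR:
                raise IOError(f"device reported error, status=0x{status:02X}")
            if status & ATA_SR_DRQ:
                return status
        if now() >= deadline:
            raise TimeoutError(f"no data after {timeout_s} s, status=0x{status:02X}")

CF_TIMEOUT_S = 1.0    # generous for CF, per the post
HDD_TIMEOUT_S = 30.0  # hypothetical value: a disk may first need to spin up
```

An interrupt-driven version would replace the polling loop with a wait on the drive's interrupt line, which, as noted above, buys little when the card answers in about the time it takes to move a sector.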

Practically I plan to stick an IDE connector on whatever prototype board I get made next so I can do some limited testing. No plans to go for a full implementation of IDE either electrically or software wise though, that'd be *a lot* of work.
 
