Mac SE/30 to SCSI chip pinouts

phreakout · Oct 3, 2012

Hey, everybody. While I'm in the process of repairing an SE/30 logic board (SCSI failure), I figure now is as good a time as any to post this kind of data here for reference: basically, which pins on the NCR/Zilog 53C80 SCSI controller chip connect to where on that logic board. I've included my notes for all of this in 3 JPEG files.

If you have any questions, feel free to send me a PM. Mods, feel free to sticky this. I'm hoping this will be helpful to those of us doing repair to that model board.

73s de Phreakout. :rambo:

tt · Oct 4, 2012

Speaking of the chip, would it be possible to swap it out with another, faster component to boost SCSI speeds, or is it dependent on/constrained by other parts of the system?

techknight · Oct 4, 2012

yes, BUT unless you modify ROM to contain a new SCSI driver to handle the new IC, you would have to use an IC that had the same command and control set as the original.

Best way would be to emulate a 53C80 in an FPGA.

tt · Oct 4, 2012

I was thinking maybe there's a faster version with the same package/protocol. Emulation sounds interesting though.

phreakout · Oct 4, 2012

I agree with Techknight. However, it shouldn't matter what SCSI chip you use as long as you adapt its pinouts to what is assigned to the original. The only possible limitation would be the fact that you're dealing with 8-bit data transfers between the chip and logic board, plus the speed may be regulated by the internal clock crystal. You could weird-wire a socket of choice and stick a new chip in there, but I doubt you'd notice that much of a speed boost.

What would be nice is if some of you out there could do the following: Open up your retro-mac, locate the SCSI controller chip and list the chip info on top. We all know that the NCR/Zilog 53C80 is common amongst most 68k Macs, but I'm not sure what type of chip was used on most SCSI only based PPC Macs (pre-G3). I'll check my PM7500 and see what it is. Although, I don't care much for mine, since I just use a PCI Adaptec AHA-2940-U2B card and an 80 GB U160 LVD/SE HDD.

73s de Phreakout. :rambo:

olePigeon · Oct 4, 2012

yes, BUT unless you modify ROM to contain a new SCSI driver to handle the new IC, you would have to use an IC that had the same command and control set as the original.

Well, then, sounds like someone needs a special little ROM SIMM. :lol: Arrr!

trag · Oct 4, 2012

What would be nice is if some of you out there could do the following: Open up your retro-mac, locate the SCSI controller chip and list the chip info on top. We all know that the NCR/Zilog 53C80 is common amongst most 68k Macs, but I'm not sure what type of chip was used on most SCSI only based PPC Macs (pre-G3). I'll check my PM7500 and see what it is. Although, I don't care much for mine, since I just use a PCI Adaptec AHA-2940-U2B card and an 80 GB U160 LVD/SE HDD.
73s de Phreakout. :rambo:

You won't find any useful information in the PM7500...

I'm not sure when Apple made the switch, but at some point they went from the 53C80 to the 53C96. Most of the Quadras seem to have a 53C96. I asked and someone wrote that the firmware for the 53C96 is fairly different from that needed for the 53C80. Different register addresses and functions, I guess. I've never looked into it myself.

The Q660AV and Q840AV do not have a distinct SCSI chip. They use the AMD AM79C950K. That is code named CURIO by Apple. It is a combination ethernet, serial port, SCSI chip. It basically combines the ethernet MAC, 85C30 serial chip and the 53C96 or 53C94 SCSI chip into one component. I have never found a datasheet for it.

The NuBus Power Macs contain the CURIO chip. The PM8100 uses an NCR/SYMBIOS/LSI LOGIC 53CF96 for its second, Fast (10 MB/s) SCSI bus. Note the 'CF' in the part number as opposed to 'C'.

The PCI Power Macs still use the CURIO chip for the slow internal/external SCSI bus, but they now use a custom Apple chip for the Fast internal SCSI bus. The code name for that custom chip was MESH, but I suspect that it is just the IP for the 53CF96 put onto silicon under Apple's name. I'm not certain of that, but it seems a fair bet. They already had the code to support the 53CF96 in the 8100. Why develop a different custom chip for the PMx500 family?

After that, while SCSI was still around, it was usually integrated into a larger chip.

The 53CF96 is hard to find and usually expensive when available. Still, it would be interesting to replace the drivers in ROM for the 53C96 with the drivers for the 53CF96, do the chip swap on the logic board (Quadras, I guess) and see if you can double the SCSI performance.

If doing that, I would start by looking in the ROM of the 8100, because it needs to have drivers for both the CURIO's 53C94/6 and the 53CF96. A comparison of the two might be informative. Unfortunately, they may also be in PPC code instead of 68K code. Still, one could compare the 8100 CURIO 53C94/6 driver with the Q840AV CURIO 53C94/6 driver for reference.

That sounds like a lot of work for a not very big improvement in storage performance.

Gorgonops · Oct 4, 2012

Just to note:

When paired with a DRAM controller typical of the period, the 68030 usually takes about five or six clock ticks to read a 32-bit quantity from RAM. If we assume 5 clocks, then the maximum memory bus performance of the SE/30 is 16MHz x 4 bytes = 64MB/s, divided by 5 clocks = 12.8MB/s. (It might be interesting to write an assembly benchmark to test this; I got these figures from the datasheet of a DRAM controller used in some contemporary computers.) So even if you could interface a fast/wide SCSI controller with zero wait states on a full 32-bit-wide bus (the 53C80 is an 8-bit chip), the SE/30's RAM would be the bottleneck during sustained reads from cache. (And this is assuming the controller's using DMA, rather than depending on the CPU to do a load/store/write operation for each byte/word. Does the SCSI driver implementation in the SE/30 even support DMA? Pretty sure most of the early Macs used PIO.)
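As a rough check of that arithmetic, here is a minimal Python sketch. The function name is made up for illustration, the 16MHz clock is the nominal figure used above, and 1 MB is taken as 10**6 bytes.

```python
def bus_bandwidth_mb_s(clock_mhz, bytes_per_access, clocks_per_access):
    """Peak rate in MB/s (1 MB = 10**6 bytes) for back-to-back accesses.

    Hypothetical helper: it assumes every access moves bytes_per_access
    bytes and costs exactly clocks_per_access clock ticks.
    """
    return clock_mhz * bytes_per_access / clocks_per_access


# The two cases mentioned above: five or six clocks per 32-bit read.
for clocks in (5, 6):
    rate = bus_bandwidth_mb_s(16, 4, clocks)
    print(f"{clocks} clocks per read: {rate:.2f} MB/s")
```

Dropping the clocks-per-access figure to 4 in the same formula gives the nominal 16MB/s ceiling for this style of bus cycle.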

The theoretical speed of the SCSI chip in the SE/30 is about 3MB/sec. I don't know exactly how fast the bus that Apple hung it off of is, but honestly, I'd be surprised if the SE/30 could even saturate that 3MB/sec running flat out. Don't really see the point of trying to sub something faster there.

techknight · Oct 5, 2012

Well, I'd rather have SCSI working at all than any performance.

Seeing that RAM is a bottleneck, plus some other design limitations as mentioned, chasing performance would be squeezing the lemon a little bit.

But I think the target is replacing dead SCSI chips, assuming they are dead. I guess they can die; I haven't found one yet. But anyway, I guess you either use a newer chip and fix the ROM, or use an FPGA to emulate the 53C80. Those are the only 2 options I see.

Gorgonops · Oct 5, 2012

The NCR 53C80 was a really common chip, somehow I can't imagine it being *that* hard to source a replacement. It showed up in PC SCSI controllers fairly often in addition to Macs. Granted I haven't tried looking for one. (All the oldskool ICs I've purchased recently have been *really* widespread parts like small SRAMS and 6(5/8)00 series CPU and PIO chips.)

phreakout · Oct 5, 2012

Personally, I'm all for reliability over speed. But if you can squeeze out more MB/s of speed, more power to you.

No, what I've posted is for reference. The whole idea is that we can confirm that the chip was bad or put the blame on badly corroded pads/traces. I say, if the latter, you can save yourself the need to swap chips by simply recreating the paths affected.

Now, some of you are probably wondering how I was able to map all that out. Well, I had help. A while back, someone posted the complete schematics of the SE/30 somewhere on this forum, so that became my reference after I saved a copy for myself. It seemed difficult, but in fact it's not that bad; the schematics list which pages, and how many, are linked together for each connection. But I still took the time to verify using a continuity tester and probing around on a board.

Incidentally, if interested in a copy of the schematics, send me a PM.

73s de Phreakout. :rambo:

trag · Oct 5, 2012

The NCR 53C80 was a really common chip, somehow I can't imagine it being *that* hard to source a replacement.

It has become old enough that it is getting more difficult. A quick check on eBay shows an effective price of about $30/chip.

http://www.questcomp.com has them for about $10 each (a little more for 1, a little less for more) and in stock. That's for the 44 pin PLCC. They show the DIP as in stock (5 and 8 in two categories), but it has a "request a quote" note next to it.

I have a bunch of 53C96 on hand, which are NOT pin compatible with the 53C80. And by a "bunch" I mean 660. But they're in a sealed brick, and I'm not willing to break it open unless someone has a seriously cool use for them going. I doubt there is much use for them. They won't make a substantial improvement over the 53C80, as they're the C, not the CF 53C96. I wish I had lucked into a brick of 53CF96...

bbraun · Oct 7, 2012

When paired with a DRAM controller typical of the period, the 68030 usually takes about five or six clock ticks to read a 32-bit quantity from RAM. If we assume 5 clocks, then the maximum memory bus performance of the SE/30 is 16MHz x 4 bytes = 64MB/s, divided by 5 clocks = 12.8MB/s.

I can't claim to be authoritative on this, but I've been playing around with the 030 timing diagrams, reading a bit, and wrote a little test code, and here's what I've found:

Guide to the Macintosh Family Hardware says:

In the Macintosh SE/30, a card in the 68030 PDS can access system RAM and ROM at the same rate as the main processor: 15.67MB per second.

AFAICT, that corresponds to 4 clock cycles per 32bit read, which seems to match the memory access diagrams I've been looking at.

And that assumes all aligned accesses.

I wrote some test code earlier this week which is doing RAM to RAM copies at slightly over 5.2MB/s. Since each copy is 2 RAM accesses (a read and a write), that's about 10.4MB/s of throughput. The test flushes the data cache and does 8 32-bit copies, of the form:

Code:

move.l (a0)+,(a1)+

looping every 256KB. At the end of 256KB, it flushes data cache (probably inadvertently blowing the instruction cache in the process by calling the _FlushDataCache trap rather than manually manipulating the CACR), and repeats the loop 1000 times.

I might go back and refine the test a bit and see if I can account for the missing 1/3rd of the bandwidth, but if the instruction cache is blown and the instructions need to be refetched, combined with looping and cache flushing overhead, it seems approximately in the right ballpark.
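To put a number on the "missing 1/3rd", here is a small Python sketch of the comparison. The helper names are made up for illustration, and the 15.67MB/s ceiling is simply the figure quoted from Guide to the Macintosh Family Hardware.

```python
PEAK_MB_S = 15.67  # quoted ceiling for RAM access on the SE/30


def bus_traffic_mb_s(copy_rate_mb_s):
    """A RAM-to-RAM copy reads each byte once and writes it once."""
    return 2 * copy_rate_mb_s


def shortfall_fraction(copy_rate_mb_s, peak_mb_s=PEAK_MB_S):
    """Fraction of the quoted peak that the copy loop does not reach."""
    return 1 - bus_traffic_mb_s(copy_rate_mb_s) / peak_mb_s


measured_copy_rate = 5.2  # MB/s, from the test described above
print(f"bus traffic: {bus_traffic_mb_s(measured_copy_rate):.1f} MB/s")
print(f"shortfall:   {shortfall_fraction(measured_copy_rate):.0%} of peak")
```

This only compares the two figures; it says nothing about where the lost cycles go (instruction refetch, loop overhead, cache flushing).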

But, this might be pretty close to real world performance, since this is pretty much what the BlockMove() OS trap does (BlockMove has some extra code to handle unaligned access, 24 vs 32bit addressing, etc.), which is used most everywhere as the "fast" memory copy. Unfortunately, BlockMove() flushes the instruction cache for compatibility with 68000 code, since it's used to move CODE (and other executable) resources into memory and executed. They added BlockMoveData() as a way to indicate it's just data, not executable code, but adoption was not ubiquitous, and thanks to abstraction layers, you might not know it's data, so have to err on the side of caution. Plus, unaligned accesses.

Gorgonops · Oct 8, 2012

Guide to the Macintosh Family Hardware says:

In the Macintosh SE/30, a card in the 68030 PDS can access system RAM and ROM at the same rate as the main processor: 15.67MB per second.


AFAICT, that corresponds to 4 clock cycles per 32bit read, which seems to match the memory access diagrams I've been looking at.

And that assumes all aligned accesses.

I had a gut feeling the SE/30 might be a *little* faster than the times printed in that DRAM controller datasheet. The PDF had a table in it detailing the cycle times and the number of necessary wait states for various speed DRAMs at different CPU clock speeds, and the DRAM speeds in said table topped out at 150ns. The SE/30 specifies 120ns or faster, so... there you go, one less wait state. In any case, the difference between 12MB/sec and 15MB/sec doesn't much change the result that you couldn't realistically expect a faster SCSI chip to make a material difference in the performance of an SE/30. "Low End Mac" had some benchmarks claiming to have gotten about 2MB/sec with a modern drive; I'd wager that said drive was probably keeping the entire data set of the benchmark in cache, so the benchmark's 2MB/sec was basically a measure of either how fast the SE/30 can work the 8-bit bus the SCSI controller is sitting on or how quickly it can execute the "read sector" loop in the SCSI driver (or some combination thereof).

It is slightly amusing to note that the theoretical performance of the SE/30's RAM is identical to that of a PIO Mode4/UDMA Mode 0 ATA controller. Hang a 1998-vintage bus-mastering IDE controller off the Mac's PDS slot and write a *really* tight DMA-utilizing driver for it and you'll have virtual memory with the same sustained read performance as RAM. (Obviously random access latency would be higher.)

bbraun · Oct 8, 2012

For kicks, here's some code to copy memory from one location to another:

Code:

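; Note: d1 (the outer loop counter) is assumed to be set up by the
; wrapper code, which is not shown here.
BigLoop:
       move.w count, d2          ; inner loop counter (dbra runs count+1 passes)
       move.l src, a0            ; a0 = source pointer
       move.l dst, a1            ; a1 = destination pointer
SmallLoop:
       move.l (a0)+, (a1)+       ; copy one longword, post-increment both pointers
       move.l (a0)+, (a1)+
       move.l (a0)+, (a1)+
       move.l (a0)+, (a1)+
       move.l (a0)+, (a1)+
       move.l (a0)+, (a1)+
       move.l (a0)+, (a1)+
       move.l (a0)+, (a1)+       ; 8 moves = 32 bytes per pass
       dbra d2, SmallLoop        ; decrement d2, loop until it reaches -1

       dbra d1,BigLoop           ; repeat the whole copy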

Each of the memory copy moves is 8 cycles (the (aX)+ is 4 cycles each) * 8 moves = 64. The dbra is 10 cycles. That's 74 cycles for each run through the inner loop. I run through that 8192 times for 256KB, which should be 606208 cycles for the inner loop. Plus 10 cycles for the outer dbra, 16 for the initializations, and I run through the outer loop 1000 times for 250MB, which should come out to 606234000 cycles. The SE/30 has a 15.6672MHz clock, so the 250MB memory copy should take a theoretical 38.69447 seconds, running with data cache disabled, and all instructions running out of the instruction cache. Note that about 5 seconds of the almost 39s of theoretical running time is spent just in the dbra looping construct.
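The cycle count can be reproduced with a short Python sketch. The per-instruction costs are the ones stated in the post (not independently checked against the 68030 timing tables), and the constant and function names are made up for illustration.

```python
CLOCK_HZ = 15_667_200      # SE/30 CPU clock, 15.6672MHz
MOVE_CYCLES = 8            # one move.l (a0)+,(a1)+, as stated in the post
MOVES_PER_PASS = 8         # unrolled moves in the inner loop
DBRA_CYCLES = 10           # one dbra, as stated in the post
INIT_CYCLES = 16           # the three setup moves per outer pass
INNER_PASSES = 8192        # 8192 passes * 32 bytes = 256KB
OUTER_PASSES = 1000


def inner_pass_cycles():
    """Cycles for one trip through the inner loop."""
    return MOVES_PER_PASS * MOVE_CYCLES + DBRA_CYCLES


def total_cycles():
    """Cycles for the whole benchmark, by the post's accounting."""
    per_outer = INNER_PASSES * inner_pass_cycles() + DBRA_CYCLES + INIT_CYCLES
    return OUTER_PASSES * per_outer


def seconds(cycles):
    return cycles / CLOCK_HZ


inner_dbra_cycles = OUTER_PASSES * INNER_PASSES * DBRA_CYCLES
print("total cycles:       ", total_cycles())
print("theoretical seconds:", round(seconds(total_cycles()), 5))
print("seconds in dbra:    ", round(seconds(inner_dbra_cycles), 2))
```

Changing MOVES_PER_PASS and INNER_PASSES together (keeping their product constant) shows how further unrolling would shrink the dbra share of the run time.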

I tried running with interrupts disabled, and pulling time from the RTC before & after in order to measure time, but I couldn't get that working. So, I'm running with interrupts enabled, which includes the 60Hz timer for the Vertical Blanking tasks which include updating the global Time variable, updating mouse cursor location global variables (and if the mouse moves, updating the displayed cursor), etc.

The code above, with wrapper timing code, ran in approximately 42s for me, which would be almost 6MB/s RAM to RAM copy, or a total memory bandwidth utilization of about 12MB/s. Which doesn't seem too bad considering setup overhead, instruction fetch, looping overhead, and interrupt handling. The theoretical for the code being run is about 1MB/s more total utilization.
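For comparison, the measured and theoretical figures can be lined up in a few lines of Python. The function name is made up for illustration, "MB" follows the post's usage (250MB = 1000 x 256KB), and 38.69447s is the theoretical run time worked out earlier in this post.

```python
COPIED_MB = 250            # 1000 passes of 256KB, in the post's units
MEASURED_S = 42.0          # approximate wall-clock time reported
THEORETICAL_S = 38.69447   # cycle-count estimate from the post


def bus_utilization_mb_s(copied_mb, elapsed_s):
    """Each copied byte crosses the bus twice: one read plus one write."""
    return 2 * copied_mb / elapsed_s


measured = bus_utilization_mb_s(COPIED_MB, MEASURED_S)
predicted = bus_utilization_mb_s(COPIED_MB, THEORETICAL_S)
print(f"measured:  {measured:.1f} MB/s")
print(f"predicted: {predicted:.1f} MB/s")
print(f"gap:       {predicted - measured:.1f} MB/s")
```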

Anyway, that's just for kicks. SCSI's performance is pretty well capped due to SCSIMgr if nothing else. Then there's the plethora of drivers depending on how you formatted the drive, and aside from some cursory benchmarking, I don't think anyone has disassembled the various drivers to see what they're actually doing and what the various performance limitations are.

For performance related projects going forward, avoiding SCSI if for no other reason than to get away from SCSIMgr and the driver problems, seems to be the way to go. Unfortunately, all of the logic for partitions, which are required for making larger media useful due to filesystem limitations for the earlier machines, is wrapped up with SCSIMgr. It would be possible to implement a partitioning scheme with whatever driver ends up being written to handle newer interfaces, it's just more to do.

But that's just performance, which is nothing more than an interesting intellectual exercise given the technology found in dumpsters these days is faster than anything you'll get out of an SE/30.

trag · Oct 10, 2012

I guess these old processors aren't pipelined at all?

So a simple memory access takes a cycle to fetch the instruction (if it's in cache and not in RAM) and a cycle to decode, and a cycle to put the address on the address bus and a cycle to receive or write the data, or something like that? And maybe more than one cycle to fetch the instruction if it's a multi-word instruction, perhaps with a 32 bit address embedded in it?

Is that close to how it's working?

I'm so used to thinking of a memory access taking one cycle these days, that I've always considered the SE/30 as having a theoretical bandwidth of 32 bits X 16 MHz = 64 MB/s. So wrong. Sigh.

Gorgonops · Oct 11, 2012

I guess these old processors aren't pipelined at all?

Technically the 68030 *is* pipelined. (Going by the definition of "pipelining" meaning that the CPU is divided into functional units which can operate somewhat asynchronously with each other and execute certain tasks in parallel.) Even the 68000 is "slightly pipelined", in that it has the ability to prefetch the next instruction and decode it while the execution unit is processing the current one. (The 68030's reference manual covers the details regarding its pipelining; skimming, the short version seems to be that it prefetches and decodes up to three instruction words ahead, and those fetches can be coming from the internal instruction cache, which also works independently.) Pipelining doesn't necessarily imply "superscalar" (actually *retiring* more than one instruction at once), nor does it say anything about how many cycles a given step in the process takes. The 68040/80486 generation were the first consumer 32-bit CPUs that could execute a "typical" instruction in a single clock... not counting Acorn's ARM, of course.

(Even the lowly 8086 had some ability to run the bus interface and the execution unit in parallel in order to streamline processing of multi-word instructions, although this was somewhat hamstrung by the fact the CPU used the same ALU hardware used by the execution unit to do effective address calculations, meaning the "pipeline" would stall on every non-immediate memory access. The 80286 solved this problem, which was why it was four to six times faster than the 8086 clock-for-clock for some operations, making it arguably the biggest single performance leap in x86 history.)

I'm so used to thinking of a memory access taking one cycle these days, that I've always considered the SE/30 as having a theoretical bandwidth of 32 bits X 16 MHz = 64 MB/s. So wrong. Sigh.

Reading the application manual, it looks like in theory the very fastest you can do a single 32-bit word memory transfer with the 68030 is in two cycles; the manual says at 16MHz you'll need ~45ns SRAM. The CPU also supports a "Burst Mode", *only* used for filling the onboard caches, that with sufficiently fast RAM takes 2 cycles for the first word and one cycle for each of the next three words. (The manual says you'll need 35ns or better SRAM, and you'll have to arrange it essentially as a 128-bit "wide" array that takes a single address at the start of the cycle and auto-increments for each subsequent read cycle.) I'm guessing Apple's bus simply ignores the burst modes and ties everything down to the maximum timings the DRAM controller can handle. So... if you were to re-engineer the SE/30 from *scratch* you could potentially push the numbers up to the 32MB/sec ballpark if you really, really wanted to. But if you look at the instruction timing tables you'll see that is pretty literally squeezing blood from a stone.
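A quick Python sketch of where the 32MB/sec ballpark comes from, plus the equivalent figure for a burst fill. The helper name is made up, the clock is the nominal 16MHz, and the burst case assumes a "word" here means a 32-bit longword, so one burst moves 16 bytes in 2+1+1+1 clocks.

```python
def peak_mb_s(clock_mhz, bytes_moved, clocks):
    """Peak rate in MB/s (1 MB = 10**6 bytes) if the pattern repeats back to back."""
    return clock_mhz * bytes_moved / clocks


# Fastest ordinary bus cycle: one 32-bit transfer in two clocks.
two_clock = peak_mb_s(16, 4, 2)

# Burst fill (cache line fills only): four 32-bit words in 2+1+1+1 clocks.
burst = peak_mb_s(16, 4 * 4, 2 + 1 + 1 + 1)

print(f"two-clock cycles: {two_clock:.1f} MB/s")
print(f"burst fills:      {burst:.1f} MB/s")
```

The burst number is an upper bound for cache fills only; ordinary data reads and all writes would still use the non-burst cycle.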

Look on the bright side: For 1987 a ~15MB/sec memory bus was respectable, and was about the limit of DRAM technology at the time. The first 68040/80486-class machines didn't have much faster DRAM to work with, which was why caching was so important in the early 90's timeframe. (Running from cache those machines could run rings around a 68030, but for a bulk memory copy it's all about the DRAM controller, and with 70-80ns RAM still typical you still might only be getting somewhere in the high 20's MB/sec without interleaving.) Context is everything when judging those machines.

Gorgonops · Oct 11, 2012

For performance related projects going forward, avoiding SCSI if for no other reason than to get away from SCSIMgr and the driver problems, seems to be the way to go. Unfortunately, all of the logic for partitions, which are required for making larger media useful due to filesystem limitations for the earlier machines, is wrapped up with SCSIMgr. It would be possible to implement a partitioning scheme with whatever driver ends up being written to handle newer interfaces, it's just more to do.

If someone *were* to someday graft an IDE (or flash) memory device directly onto an old Mac via an expansion port/CPU piggyback type arrangement my ignorant suggestion for how to support it would be via a replacement .Sony floppy driver. That was actually the approach taken by the HD-20, and it's also used in Mac emulators like BasiliskII and vMac. (In those cases the replacement .Sony is patched into the ROM image, thus allowing direct booting of large volumes as if they were "floppies".) You skip a lot of SCSI-related whohaw doing that, and the code seems to be fairly well understood.

trag · Oct 12, 2012

Thank you for the explanation, Gorgonops.

Is that burst mode which fills the cache a programming choice? In other words, is that an instruction which the programmer chooses to use or not use? Or is it something the CPU does automatically, when the cache needs filling -- presumably with some provision for memory systems slower than 35ns to signal that the CPU will just have to wait for the next data element.

I ask, because the interesting question is whether the burst mode is ever used by the 68030 in the SE/30. If it tries to use the burst mode to fill the cache, and the SE/30's bus responds with some kind of signal that it's going to get things slower than that, then that is something we could work with.

But if the burst mode just isn't in the code at all, then using it would require rewriting the whole Mac OS and possibly the applications as well. And that's really not very practical.

In the former case, it might be possible to install a fast memory system (FPGA to DDR2 DRAM at 333 MHz) in the PDS slot and then somehow (PMMU?) map the RAM accesses to address space decoded in the PDS slot by the FPGA.

Re: SCSI Manager. I don't know how useful this is, but in case it has some use, I think the thread starts here:

http://68kmla.org/forums/viewtopic.php?p=61401#p61401
