Maximum SiliconExpress IV Throughput

eharmon

Continuing my SiliconExpress experimentation, I've been doing some benchmarks on my Quadra 650 board with ZuluSCSI.

Read
Bus/Card                             Device              Speed
Native SCSI                          ZuluSCSI (RP2040)   4,700KB/s
SiliconExpress IV (8-bit) - 1.6.5    ZuluSCSI (RP2040)   8,123KB/s
SiliconExpress IV (16-bit) - 1.6.5   ZuluSCSI Wide       8,959KB/s
Write results are generally ~30% slower.

I need to try the SCSI 4.3 firmware to see if it's any different.

Wombat boards have a NuBus implementation that leaves a bit to be desired (no double data rate transfers @ 20MHz except between cards). TIL 9305 implies that should give you 8-10MB/s to the logic board and a theoretical 20MB/s out of the logic board (if the destination device could accept block transfers at zero wait). So that's in the ballpark of what we're getting.
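
For a rough sanity check of those figures, here's a quick back-of-envelope sketch. The clocks-per-word numbers are assumptions I picked to land in the band TIL 9305 quotes, not measurements:

```python
# Back-of-envelope NuBus throughput ceilings. The clocks-per-word values are
# assumptions for illustration only; the 8-10MB/s and 20MB/s figures above
# come from TIL 9305, not from this math.

def bus_throughput_mb_s(clock_mhz, bytes_per_word, clocks_per_word):
    """Sustained rate if moving each word costs `clocks_per_word` bus clocks."""
    return (clock_mhz * 1e6 / clocks_per_word) * bytes_per_word / 1e6

# Assumption: a plain single-word NuBus transaction averages ~4 clocks
# (address cycle, data cycle, acknowledge/turnaround) at 10MHz, 32 bits wide.
print(bus_throughput_mb_s(10, 4, 4))  # 10.0 -> the 8-10MB/s band to the logic board

# Assumption: block transfers amortize the address cycle, averaging ~2 clocks
# per word, which is roughly the 20MB/s ceiling out of the logic board.
print(bus_throughput_mb_s(10, 4, 2))  # 20.0
```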

Still, surprisingly rough! So it seems on the earlier Quadras there's only a small boost from moving to 16-bit (~10%).

The official documentation always seemed ambiguous to me about whether later machines really supported 20MHz transfers to the logic board. Has anyone benchmarked a Quadra AV or 6100/7100/8100 with a SiliconExpress?
 
I benched spinning disks ages ago on Jackhammers and the SEIV (I have SEIIs and other NuBus SCSI cards as well). There should be results from back then on the forum if you search.
 
An interesting data point - unfortunately I didn't document much of it - I did exactly what you've done here a couple of years ago with a SCSI2SD v6. What stands out to me is that I got the same results as you through the SEIV, but my native reads were significantly slower than yours at ~2,800KB/s.

I do have an 840av now and still have the SEIV, but I have little spare time for this kind of thing right now...

Here are my old threads on the topic; someone else posted some Q950 and PM8100 benchmarks in the SEIV one:


 
Interesting, so from those benchmarks, you can't really break the 10MB/s barrier on an 8100 either. I also noticed the same dip with a PowerPC card in the Quadra.

The RP2040 Zulu is quite a bit faster than the SCSI2SD v6. I wonder if the card's read-ahead cache makes up for lower transaction performance, which would explain why you saw a bigger jump on the SCSI2SD.

I want to run three more benchmarks:
  • Run the Q650 @ 40MHz. It won't speed up NuBus, but it will tighten memory performance, which might squeeze out a little more speed. I doubt it makes much of a difference.
  • Switch to the SCSI 4.3 firmware. Theoretically this improves burst performance as we're less bottlenecked on the CPU. Again, I doubt it makes much of a difference, but maybe it brings back PPC perf.
  • Assign the card to a Radius Rocket, which should allow direct NuBus '90 transfers. The chip on the SE IV is definitely capable of 20MB/s, so if anything can do it, that should.
 
The earlier Rockets don't support 20MHz transfers; the extra pins to support 2x transfers aren't wired. Rockets all support block transfers and can perform them between cards (even if the host doesn't support block transfers), but it's at the standard 10MHz rate. Maybe the later Stage II Rockets support 2x. You might check the SCSI card too - if those pins aren't wired, then no 2x transfers.
 
Not in front of me right now, but I was able to get ~18MB/sec on a Quadra 840AV with a 10kRPM 16-bit drive attached to an SEIV. I think the 8100/100 and 8100/110 will perform within ballpark. The 6100, 7100, and 8100/80 have an older version of the NuBus controller IIRC which may inhibit performance.
 
The earlier Rockets don't support 20MHz transfers; the extra pins to support 2x transfers aren't wired. Rockets all support block transfers and can perform them between cards (even if the host doesn't support block transfers), but it's at the standard 10MHz rate. Maybe the later Stage II Rockets support 2x. You might check the SCSI card too - if those pins aren't wired, then no 2x transfers.
Yeah, I'm gonna try a Stage II. The docs claim they're wired for 20MHz operation, but I'll take a look at the board!
Not in front of me right now, but I was able to get ~18MB/sec on a Quadra 840AV with a 10kRPM 16-bit drive attached to an SEIV. I think the 8100/100 and 8100/110 will perform within ballpark. The 6100, 7100, and 8100/80 have an older version of the NuBus controller IIRC which may inhibit performance.
Interesting. So maybe they really are faster, or maybe there's an interaction between the SE IV and a ZuluSCSI Blaster. There are a number of improvements in later machines that could explain a performance gain, but it's rather ambiguous:
  • The Quadra AVs have a MUNI controller and the x100s a BART, which should be an updated version, implying the x100s should be more capable, but...you never know.
  • The Quadra AV Developer Note notes "faster data transfer rates to and from the CPU bus" and "NuBus '90 transfers between cards at a clock rate of 20MHz". The x100 Developer Note notes that BART supports "transferring one-cycle or four-cycle transactions".
  • I haven't found a copy of the 8100/110's Developer Note, but the 8100's spec sheet claims "Three internal NuBus expansion slots; the 8100/110 also supports higher-performance burst mode between NuBus cards". Calling it out implies something new, but it still ambiguously claims the performance boost is only "between cards".
  • For all these newer machines, there are quite a few DMA improvements as well, which could reduce overhead.
It's pretty bizarre how poorly documented this is. Even with PCI on the horizon, you'd think faster NuBus would have been worth marketing! I'm inclined to believe the AVs and x100s are faster (and maybe the 8100/110 even more so).
 
Very interesting! How much does the maximum throughput affect user perception? That is, how much does it shave off of boot time or launching an application?

I don't know enough about SCSI, but can you make a call asynchronously to one drive, and then to another drive, and when the results are ready from either drive it will ask for attention? Or is the bus reserved until the first drive returns its result?
 
I haven't timed it (someone else might have stats), but it's definitely noticeable. Not as much as the low latency from solid state, but it helps large programs load.

SCSI is pretty complex. There are interactions between async bus communication, async drivers, and bus disconnect. I'll go first so someone can correct me 😃:
  • Async bus is slower overall but frees the processor. This isn't really valuable when you have a SCSI ASIC as it's already offloaded the processing.
  • However, you really want an async driver, since the OS will wait for data to return regardless of bus communication on a sync driver. I believe this was introduced in SCSI Manager 4.3. Even on a sync bus, the ASIC handles the interaction and DMAs the data back, so you get the sync bus's performance without the drawback.
  • This gets a little funky with the SE IV, as the older (non-4.3) firmware directly drives the card over NuBus and can handle this itself (it's not a SCSI Manager driver at all). I'm not sure how it behaves currently.
  • Finally, devices normally hold the bus while they're processing transactions (as they remain selected). This means you'll get slower speeds if you have a number of devices busy on the same bus, as they conflict - for instance, if you're emulating a few drives at once. Disconnect support means a drive can detach from the bus while it's operating, allowing a transaction to be sent to another drive (there's a toy model of this below). I believe this is a prerequisite for async operation on 4.3.
  • 4.3 is only in ROM on Quadra AV and newer (or PPC upgrades). However, it's in the OS on 7.5+ and supports older devices, and was provided as an extension for developers which you can copy back to 7.1 (or maybe even 7.0). Devices need to be formatted with both a "regular" and 4.3 driver, and it'll swap over while it boots.
  • Since 4.3 added other features (like 16-bit SCSI IDs), older machines can't boot from a device at an ID above 7, though I believe the other drives will mount up once the OS starts.
I think I went on a tangent there...but it's useful stuff to know for boot perf!
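
To make the disconnect point above concrete, here's a toy timing model (not real SCSI Manager code, and the millisecond figures are made up) of a few busy emulated drives sharing one bus:

```python
# Toy model of bus disconnect; the service times are invented for illustration.

bus_transfer = 2.0   # ms actually moving data on the bus per request (assumed)
device_think = 8.0   # ms the device spends preparing data, bus otherwise idle (assumed)

def no_disconnect(n_requests):
    # Each device stays selected for its whole request, so everything serializes.
    return n_requests * (device_think + bus_transfer)

def with_disconnect(n_requests):
    # Devices release the bus while "thinking", so think time overlaps across
    # drives and only the bus transfers serialize (ignoring arbitration cost).
    return device_think + n_requests * bus_transfer

print(no_disconnect(3))    # 30.0 ms when each request holds the bus
print(with_disconnect(3))  # 14.0 ms when think time overlaps
```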
 
Throughput doesn't help much for user perception. You can compare it to SSDs today: faster random read access with flash storage is far and away the most important thing for user experience, so even eMMC is perceived to be much faster than an HDD. In a vintage context, throughput was really only necessary if you needed to ingest (or output) a lot of data quickly for realtime video/audio production or needed increased storage capacity.

SCSI Manager 4.3 supports asynchronous access and AFAIK adds command queuing too, given drive support. So, to your question: possibly, if everything was fully implemented on both the application and driver side. There's a dev note on SCSI Manager 4.3 that you might find interesting. In the real world, you'd want to avoid contention by keeping your boot/application disk on the system bus and putting the data drives only on the fast bus.

While I was working on NuCF, I had a test case with slow timings that brought maximum sequential throughput down to about the same as the internal SCSI. Even with that in place it was still perceptually much faster, due to the improved random access performance of the CF and bypassing the slow SCSI stack. Most user code is going to be requesting random small bits of data rather than the big chunks where sequential performance would really pay off / be noticeable.
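
To put rough numbers on the random-vs-sequential point (all of these are assumptions, not benchmarks - just illustrating the shape of it):

```python
# Illustrative launch-time model: n_reads scattered reads of read_kb each.
# latency_ms is per-request seek/overhead; throughput_kb_s is sequential speed.
# Every number below is assumed for illustration, not measured.

def load_time_s(n_reads, read_kb, latency_ms, throughput_kb_s):
    seeks = n_reads * latency_ms / 1000.0
    streaming = n_reads * read_kb / throughput_kb_s
    return seeks + streaming

# Assume an application launch touches 400 scattered 8KB chunks.
print(load_time_s(400, 8, 15.0, 4700))  # ~6.7s: spinning disk, mostly seek time
print(load_time_s(400, 8, 0.5, 4700))   # ~0.9s: solid state on the same slow bus
print(load_time_s(400, 8, 15.0, 9000))  # ~6.4s: nearly double the bus speed, same seeks
```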
 
Interesting. So maybe they really are faster, or maybe there's an interaction between the SE IV and a ZuluSCSI Blaster.
I think it's owed to an architectural improvement... the same drive/SEIV on a Q950 reached only about 8MB/s (closer to 9MB/s when overclocked to 40-45MHz or running a 100MHz DayStar 601).
 
Devices need to be formatted with both a "regular" and 4.3 driver, and it'll swap over while it boots.

Wow. I assumed there would be some flags or a driver call that would ask "Hey, can you run in 4.3 mode".

since the OS will wait for data to return regardless of bus communication on a sync driver.

Ok. That's the heart of my line of questioning. If the OS is really waiting, then a faster bus will make more of a difference to the user's perception, as the bus response time is a bottleneck. Whereas if the OS is able to do other things in parallel, then a faster bus will only matter somewhat, as theoretically the CPU is busy doing other stuff.

Imagine a file being unstuffed. If the decompression algorithm can happen in parallel with the SCSI writes, then faster SCSI writes may not greatly improve the overall time. However, if the OS effectively stalls the CPU until SCSI responds, then faster SCSI would make a big difference.
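
To put toy numbers on that (both figures invented):

```python
# Invented figures for the unstuffing example.
decompress_s = 12.0  # CPU time spent decompressing (assumed)
write_s = 9.0        # time spent in SCSI writes (assumed)

print(decompress_s + write_s)      # 21.0s if the OS stalls during each write
print(max(decompress_s, write_s))  # 12.0s if writes can overlap the decompression
```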

Thank you for the detailed response.
 
Wow. I assumed there would be some flags or a driver call that would ask "Hey, can you run in 4.3 mode".
FWIW, a driver can support both in a single binary, but it's basically two drivers in one: one old API, one new API. IIRC 4.3 drivers are always required to be backwards compatible (or at least, the documentation instructs you to make them so).

If you have an Apple_Driver43 partition, a 4.3-supporting driver is installed.
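
If you want to check an image for that, here's a quick sketch that walks the Apple Partition Map of a raw 512-byte-block disk image and prints the partition types. The offsets follow the standard partition map layout as I remember it, so double-check against Inside Macintosh before trusting it:

```python
# Sketch: list partition types in a raw Apple Partition Map disk image and
# report whether an Apple_Driver43 partition is present. Assumes 512-byte
# blocks and a plain raw image; offsets are from the standard APM layout.
import struct
import sys

BLOCK = 512

def partition_types(path):
    with open(path, "rb") as f:
        f.seek(BLOCK)                # first partition map entry lives in block 1
        entry = f.read(BLOCK)
        sig, _pad, map_entries = struct.unpack(">2sHI", entry[:8])
        if sig != b"PM":
            raise ValueError("no Apple Partition Map signature found")
        types = []
        for i in range(map_entries):
            f.seek(BLOCK * (1 + i))
            entry = f.read(BLOCK)
            ptype = entry[48:80].split(b"\0", 1)[0].decode("ascii", "replace")
            types.append(ptype)
        return types

types = partition_types(sys.argv[1])
print(types)
print("4.3 driver installed:", "Apple_Driver43" in types)
```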
 