
Mac IIci & cache card questions

Unknown_K

Well-known member
Makes me wonder what I did with the OEM cache cards in my IIci's that have Daystars in them now.

How hard would it be to make a few 1-2MB cache cards? I assume the connector would be the most expensive part.
 

David Cook

Well-known member
Isn't that a 23% increase over a non-cache machine?

Indeed! My TL;DR is a little too short. I have a more specific statement towards the end: "Expect almost no boost from a 128KB IIci cache card compared to 32KB."

So, yes, it's definitely worth installing a cache card in your bare IIci. In Norton System Info and my compiler tests, it doesn't matter whether it is the 32K or 128K version.
 

trag

Well-known member
Makes me wonder what I did with the OEM cache cards in my IIci's that have Daystars in them now.

How hard would it be to make a few 1-2MB cache cards? I assume the connector would be the most expensive part.

The cost of 5V static RAM adds up a bit, I think. But really, the most "expensive" part is the skull sweat to figure out how to build the cache support circuitry. Although I think Zane Kaminsky may have already done the basic analysis. I seem to remember him discussing the potential performance of the IIci and some issues regarding playing with the 68030 bus and the necessary timings.

If one uses real TAG RAM instead of building it out of static RAM + a fast comparator, that gets pricey too. TAG RAM is getting close to unobtainium.
 

rieSha.

Well-known member
Not if the whole program fits in cache.
It’s not that easy. If it does, and the program runs from the cache, all is fine & speedy.

But what if the cache is dirty and has to be emptied? Mac OS isn’t that great at context switching / multitasking between applications and at I/O (network & disk), so it wouldn’t surprise me if in certain daily workloads the 4 MB cache doesn’t make a big difference. The 128k IIci cache wasn’t that much faster than the 32k one …
 

Jockelill

Well-known member
Since we're talking IIcis, here's a fun tidbit. An easter egg exists in the IIci ROM (I seem to recall that many 68k Macs had easter eggs). If the system date is set to September 20, 1989 (the machine's release date) and the ⌘ Command + ⌥ Option + C + I keys are held during boot, an image of the development team will be displayed.
 

Attachments

  • E06C4FED-1F95-4FD4-B532-4DC5DCC63A07.jpeg (3.6 MB)

Melkhior

Well-known member
In summary, expect almost no boost from a 128KB IIci cache card compared to 32KB.
Size isn't the only important parameter for a cache; latency plays a big role as well. If the 128 KiB cache has higher latency than the 32 KiB one, then its performance profile will be different. Cutting down latency was one of the reasons so-called 'tag RAM' chips (containing not just the tag SRAM but also comparators and other support functions), such as the SN74ACT2155, were developed. TI had a whitepaper on the subject, Enhancing MC68030 performance using the SN74ACT2155 cache (paywalled, unfortunately), which goes into detail on timing considerations and gives some (partial) examples of cache implementation.
 

Melkhior

Well-known member
How hard would it be to make a few 1-2MB cache cards? I assume the connector would be the most expensive part.
Euro-DIN (a.k.a. DIN 41612) connectors, as used by Apple during the 68000-68030 era, are still widely available, even if they are not cheap in small quantities. The 120-pin version for the PDS is probably that one on Mouser; NuBus (and the LC PDS, I think) uses the 96-pin version (NuBusFPGA uses that one).

But really, the most "expensive" part is the skull sweat to figure out how to build the cache supporting circuitry.
Probably. I've added a TI reference in a previous message; the description of the 2155 is also available in the 1990 TI Cache Memory Management Data Book. There's also a working implementation (with some features missing, such as burst mode) as the cache part of the German magazine c't's accelerator, the PAK68/3. The relevant articles and bits of PAL code were posted on TD.

Edit: BTW it's not just TI; e.g. IDT has an application note on the subject in the 1991 Static RAM Data Book, also for their custom cache chips.

Edit 2: in fact, the IDT stuff seems more complete than the TI stuff - there are even some PAL equations for their version of a cache.
 
Last edited:

Jockelill

Well-known member
So, I finally got to do some benchmarking on one of my IIcis. Since I have both a cache card (Apple 32k) and a Carrera 040, I thought it would be nice to run some quick benchmarks. The first picture is a stock IIci at 25 MHz, the second is with just the cache card added, and the last is with the Carrera (OK, different processor etc., but still interesting). Everything else was the same.
 

Attachments

  • DEC95053-CF55-4318-982F-B6AA5811B3DC.jpeg (5.2 MB)
  • 06E650E0-8964-47BC-93B7-B15791855EB0.jpeg (4.9 MB)
  • F66CBD63-0C05-4E2A-BC65-23236F29499B.jpeg (5.4 MB)

MrFahrenheit

Well-known member
So, I finally got to do some benchmarking on one of my IIcis. Since I have both a cache card (Apple 32k) and a Carrera 040, I thought it would be nice to run some quick benchmarks. The first picture is a stock IIci at 25 MHz, the second is with just the cache card added, and the last is with the Carrera (OK, different processor etc., but still interesting). Everything else was the same.

My experience with 68040 upgrades is that a 40 MHz accelerator card usually performs about like a native 20-25 MHz 68040 machine. Likely one of the reasons they used 40 MHz upgrade cards.

That accelerator really made that IIci almost like a Q650.

The cache card adds noticeable improvements to benchmarks, but I wonder if you can actually perceive the speed difference. I know from experience that I can tell a 10-20% speed improvement just from “feel”.

I have the 32 KB cache card for the IIci but haven’t tested it for speed yet (I’ve only run my IIci with the card installed).
 

David Cook

Well-known member
I've upgraded my cache detection algorithm to eliminate the loop bias that resulted in an upward sloping artifact.

False-slope.png

Now, there is an inner loop that tests 256 bytes at a time, and an outer loop that repeats that for a total of 32K reads and then checks whether 1 second has passed. The buffer pointer just rolls over to the start of the buffer when it needs to. Thus, all test buffer sizes now have nearly identical loop overhead. Previously, larger cache checks were artificially more efficient, as the inner loop went the full buffer size (rather than consistently 256 bytes) before dropping to the outer loop.
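In rough, portable C, the structure looks something like this (a simplified reconstruction for illustration only, not the actual test code; the clock()-based timer, constants, and names are placeholders):

/* Minimal sketch of the cache-size probe described above.  For each
 * candidate buffer size it streams byte reads through the buffer,
 * 256 bytes per inner pass and 32K reads per outer pass, rolling the
 * pointer back to the start of the buffer as needed, and counts how
 * many reads complete in roughly one second. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define INNER_BYTES 256            /* bytes touched per inner pass    */
#define OUTER_READS (32L * 1024)   /* reads before checking the clock */

static long probe(const volatile unsigned char *buf, long bufSize)
{
    long reads = 0;
    long offset = 0;
    clock_t end = clock() + CLOCKS_PER_SEC;       /* run for ~1 second */

    while (clock() < end) {
        long done, i;
        for (done = 0; done < OUTER_READS; done += INNER_BYTES) {
            for (i = 0; i < INNER_BYTES; i++)
                (void)buf[offset + i];            /* one byte read */
            offset += INNER_BYTES;
            if (offset >= bufSize)
                offset = 0;                       /* roll over to the start */
        }
        reads += OUTER_READS;
    }
    return reads;                                 /* reads per ~1 second */
}

int main(void)
{
    long size;
    for (size = 1024; size <= 1024L * 1024; size *= 2) {  /* power-of-two sizes */
        unsigned char *buf = (unsigned char *)malloc(size);
        if (buf == NULL)
            return 1;
        printf("%7ld bytes: %ld reads/sec\n", size, probe(buf, size));
        free(buf);
    }
    return 0;
}

The point is that the per-iteration work is now identical regardless of buffer size; only the rollover point changes, so the overhead no longer favors larger buffers.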

Test conditions: System 7.5.5. 20 MB RAM (4 MB bank A, 16 MB bank B). Disk cache 128KB. Tests run on external MacSD. 640x480 8-bit color. AppleTalk off. Pictured top to bottom: Apple Macintosh IIci Cache Card 820-0351-A (32KB), ATTO Technology Cache C1 (64KB), DayStar FastCache IIci (32KB), Micron Technology Xceed IIci-128 250-0324-010 Rev C (128KB).

Cache-cards.jpg
The new test results are below. There are 4 cards + stock being tested, each twice. But it looks like only 5 lines, because the runs for each setup are so consistent that the lines for the two runs nearly completely overlap on the graph. The lack of significant variation between runs suggests the differences between each card are real and not noise.

IIci-Cache-Detection-s.png

Ⓐ The 256-byte cache built into the 68030 processor is the most efficient, thus every run starts out high. The Y-axis indicates that the 25 MHz IIci is achieving 1.36 million memory reads per second.
Ⓑ All the external caches gradually become less efficient as the test buffer size fills more and more of the cache, thus competing more with code, the system, and other usage. Still, size and brand impact the rate of decline.
Ⓒ When the test buffer size exceeds the external cache size, overall efficiency drops below stock (no cache) performance. The sudden drop is due to the test being a linear read rather than random. In other words, in these conditions, the test loop is constantly asking for a non-cached value, which is then cached, but dropped (due to all the subsequent reads) from the cache before we try reading it again. These are constant cache misses.

Miss efficiency is worse than hit efficiency, such that memory-intensive applications could theoretically run slower with a cache card. In practice, the fact that loops and common subroutines will be cached and repeatedly hit means real-world performance will likely always be better with a cache card. This test is designed to detect cache sizes, not application performance.

I compiled a commercial C program using Metrowerks CodeWarrior 5. It consisted of 45 files with a total of 46,000 lines of code. This is smaller than my previous tests because I didn't want to wait 15 minutes for each run. This program took about 4 minutes to compile. It demonstrates an approximate 25% gain in a real-world scenario.

Sample-cache-performance.jpg

- David
 

Corgi

Well-known member
Just wanted to shout out how great your benchmarking is, including the detail and repeatability. The research methods are much appreciated.
 

David Cook

Well-known member
When I posted the cache results, it bugged me that the difference in cache vs non-cache performance was so little compared to real-world results. That is, roughly 1,340,000 bytes/sec vs 1,320,000 bytes/sec = only 1.5% improvement when dealing with 1KB of data. Yet, real world shows 25% gains.

Two reasons:
1. My test loop is small. It easily fits in the 256 byte code cache of the 68030. Real world programs bounce around and thus benefit from an L2 cache to quickly read that additional code.
2. My test reads a byte at a time and then loops. This is legitimate code (example: the strlen routine). However, behind the scenes, the 68030 on the IIci is actually reading 32-bits (four bytes) at a time. The first uncached byte does indeed incur a bus read from the memory chip. But, the next 3 bytes are returned from the processor cache because they had been read as well. Thus, only 1 out of every 4 reads exercises the external cache card or lack thereof.

What happens if I modify my test loop to read a full 32-bit value at a time? Furthermore, what if it does this 8 times in a row per loop, to reduce the percentage of time taken by the loop check?
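In simplified C, the two read patterns look roughly like this (an illustrative sketch only, not the actual benchmark code; the function names and types are placeholders, and the timing harness is omitted):

#include <stddef.h>

/* Byte-at-a-time, like a strlen() loop.  The 68030 fetches 32 bits
 * (four bytes) at a time, so only 1 of every 4 byte reads actually
 * exercises the external cache card or main memory. */
static void read_bytes(const volatile unsigned char *buf, size_t count)
{
    size_t i;
    unsigned char b = 0;
    for (i = 0; i < count; i++)
        b = buf[i];                 /* one byte per iteration */
    (void)b;
}

/* Full 32-bit reads, unrolled 8x so the loop check is a smaller share
 * of the work: now every read exercises the memory system. */
static void read_longs(const volatile unsigned long *buf, size_t count)
{
    size_t i;
    unsigned long v = 0;
    for (i = 0; i + 8 <= count; i += 8) {
        v = buf[i];     v = buf[i + 1];
        v = buf[i + 2]; v = buf[i + 3];
        v = buf[i + 4]; v = buf[i + 5];
        v = buf[i + 6]; v = buf[i + 7];
    }
    (void)v;
}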

1701926353882.png

The IIci now reads over 6 times faster. More importantly, we see an 8,500,000 / 5,800,000 = 46% improvement due to caching. The real world is going to be a mix of memory access sizes, so averaging around 25% makes sense.

Other interesting notes:
* The L2 cache is no longer worse than no cache on constant cache misses (for example at 1 MB).
* I can't explain the dip between 4KB and 8KB
* I can't explain the gradual improvement by the Micron board between 8KB and 64KB. I saw something similar with byte reads. I ran the tests multiple times and the results are consistent.

- David
 

zigzagjoe

Well-known member
When I posted the cache results, it bugged me that the difference in cache vs non-cache performance was so little compared to real-world results. That is, roughly 1,340,000 bytes/sec vs 1,320,000 bytes/sec = only 1.5% improvement when dealing with 1KB of data. Yet, real world shows 25% gains.

Two reasons:
1. My test loop is small. It easily fits in the 256 byte code cache of the 68030. Real world programs bounce around and thus benefit from an L2 cache to quickly read that additional code.
2. My test reads a byte at a time and then loops. This is legitimate code (example: the strlen routine). However, behind the scenes, the 68030 on the IIci is actually reading 32-bits (four bytes) at a time. The first uncached byte does indeed incur a bus read from the memory chip. But, the next 3 bytes are returned from the processor cache because they had been read as well. Thus, only 1 out of every 4 reads exercises the external cache card or lack thereof.

What happens if I modify my test loop to read a full 32-bit value at a time? Furthermore, what if it does this 8 times in a row per loop, to reduce the percentage of time taken by the loop check?

View attachment 66358

The IIci now reads over 6 times faster. More importantly, we see an 8,500,000 / 5,800,000 = 46% improvement due to caching. The real world is going to be a mix of memory access sizes, so averaging around 25% makes sense.

Other interesting notes:
* The L2 cache is no longer worse than no cache on constant cache misses (for example at 1 MB).
* I can't explain the dip between 4KB and 8KB
* I can't explain the gradual improvement by the Micron board between 8KB and 64KB. I saw something similar with byte reads. I ran the tests multiple times and the results are consistent.

- David

This is great data. I love seeing your deep dive posts on these.

I would be very curious if you'd be willing to release a quick version of this for some additional data gathering, as I've got a few scenarios I can test in an SE/30... While not directly relevant to the IIci platform, it'd be interesting info, at least to me personally.

Specific scenarios I'd test would be DiimoCache (50 MHz 68030 + 64 KB), DiimoCache (58 MHz 68030 + 128 KB), Carrera (45 MHz 68040 + 128 KB), and stock, perhaps with a few variations with caches on/off.

Also, are you doing something to invalidate the contents of the L2 between tests?
 

bigmessowires

Well-known member
Since we're talking IIcis, here's a fun tidbit. An easter egg exists in the IIci ROM (I seem to recall that many 68k Macs had easter eggs). If the system date is set to September 20, 1989 (the machine's release date) and the ⌘ Command + ⌥ Option + C + I keys are held during boot, an image of the development team will be displayed.

The person in the back row, second from the right, appears to be about 16 years old.

I wonder how Apple digitized that photo? We take it for granted today that photos are digital files, but what options existed for color photo scanning in 1989? I remember the ThunderScan, which turned an ImageWriter into a scanner, but I think it was limited to black-and-white scans.
 

Phipli

Well-known member
The person in the back row, second from the right, appears to be about 16 years old.

I wonder how Apple digitized that photo? We take it for granted today that photos are digital files, but what options existed for color photo scanning in 1989? I remember the ThunderScan, which turned an ImageWriter into a scanner, but I think it was limited to black-and-white scans.
If... I was told I had to make a colour scan and only had a B&W scanner, I'd sandwich a red, blue, and green film on top of the photo, one at a time, and scan it three times, then combine the scans as colour channels. I'd then have to tweak them, because the intensity would probably be wrong, since I'd have grabbed some coloured gels from a mate at the theatre...

But other people might have had other ways.

It would be an interesting thing to try just as a curiosity project.

Edit :

Another thing you could do would be to print a colour negative three times through filters (the colours wouldn't be exactly red, green and blue because it is a negative) using a black-and-white process. You'd get three photographic prints, one for each colour. Then scan them and merge them, as before, as the three colour channels. This requires more specialist equipment, but anyone doing black-and-white photography at home would have the needed kit (you'd still need to get the colour film developed normally).

The first method is easier.
 
Last edited:

Melkhior

Well-known member
I wonder how Apple digitized that photo?
Maybe simply with a scanner as today, except likely SCSI. Those were available in B&W and color, though color was expensive.

MacUser from Dec '88 has an advertisement for the JX-450 from Sharp at the low, low price of ... "call" :) Other ads offer various scanners from several hundred to several thousand dollars.
 