[Development] 4MB 32-bit SRAM for the MicroMac Performer

Hi everyone,

I want to adapt and expand the logic of the TS Mercury RAM expansion board into an SRAM controller. My goal is to ensure a 4MB 32-bit SRAM bank works correctly with these accelerator boards. At the same time, I am exploring whether the accelerator/SRAM logic could handle a 31.33 MHz clock.

If I succeed, then the MicroMac Performer logic could be upgraded, and we would have an open-source accelerator for the Plus/SE with 4MB of fast RAM hopefully working at 31.33 MHz with a 33MHz 68030 CPU!

To keep things simple and robust, the controller operates in asynchronous mode (just like the original RAM expansion board), using three full clock cycles for data access and, hopefully, 0-wait states. This is my first attempt at recreating an SRAM controller for these accelerators, so I’m applying everything I’ve learned since I started fiddling with them!

The main logic is split across three PLDs:

RU6: Primarily an address decoder, nearly identical to the original logic.
RU7: Handles SRAM access using standard 68030 dynamic bus sizing logic.
RU8: Functions as the state machine for the 68030 bus handshake.

I’ve drafted the logic equations for these three chips and would love a "sanity check" from the 68030 architecture gurus here.

RU6: Address Decoder
This chip maps the accelerator's local SRAM and isolates it when the CPU attempts to access the video buffers on the system RAM.

DBFR_OE: This signal controls the 74245 data buffers, asserting LOW to enable the SRAM bus path.
  • Video Buffer “Hole”: The logic monitors A23-A15 and FC0/FC1 to detect the address range of the Mac Plus video buffer (the top 32KB of the first 4MB space).
  • RAM Routing: When access is detected within the $3F8000 - $3FFFFF range, the SRAM buffers are disabled and access is routed back to the main accelerator logic to handle the motherboard RAM. It also ignores Function Codes associated with interrupts or non-standard states to keep the bus clean.
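To sanity-check the decode window, here is a minimal behavioral sketch in Python. It is not the actual RU6 equations. It assumes that the decoder compares only A23-A15 (so the hole is exactly $3F8000-$3FFFFF), that A23/A22 must both be low for the 4MB RAM window, and that FC1 = FC0 = 1 (which covers CPU space, FC2-FC0 = 111, used for interrupt acknowledge) never selects SRAM. The function name `sram_buffer_enabled` is hypothetical.

```python
# Hypothetical behavioral model of the RU6 decode (not the real PLD equations).
# Assumptions: 24-bit Mac address space, hole = A23..A15 == 0b001111111,
# and FC1 = FC0 = 1 (CPU space / interrupt acknowledge) never selects SRAM.

HOLE_TAG = 0x3F8000 >> 15  # A23..A15 pattern of the $3F8000-$3FFFFF video hole


def sram_buffer_enabled(addr: int, fc: int) -> bool:
    """True when /DBFR_OE would be driven LOW to enable the SRAM path."""
    fc1, fc0 = (fc >> 1) & 1, fc & 1
    if fc1 and fc0:
        return False  # interrupt acknowledge / non-standard spaces stay off the SRAM
    tag = (addr & 0xFFFFFF) >> 15  # A23..A15 as a 9-bit value
    in_ram_window = (tag >> 7) == 0  # A23 = A22 = 0 -> $000000-$3FFFFF
    in_video_hole = tag == HOLE_TAG
    return in_ram_window and not in_video_hole


if __name__ == "__main__":
    for a in (0x000400, 0x3F7FFF, 0x3F8000, 0x3FA700, 0x400000):
        state = "on" if sram_buffer_enabled(a, 0b101) else "off"
        print(f"${a:06X} -> SRAM {state}")
```

$3FA700, the start of the main screen buffer on a 4MB Plus, falls inside the hole as expected, while $3F7FFF, one byte below it, still goes to the SRAM.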
RU7: Byte Lane Selection
RU7 maps the 32-bit bus to four 8-bit SRAM banks (/BYTE1 through /BYTE4). It follows the 68030 transfer bus sizing standard, using SIZ0/SIZ1 and A0/A1 to enable the correct byte lanes for aligned or unaligned transfers.
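For cross-checking RU7, the sketch below derives the 32-bit-port byte enables from first principles: a cycle starts at the byte selected by A1/A0 and stops at the operand length or the end of the long word, whichever comes first. It then compares that rule with sum-of-products equations of the kind a PLD would use. The equations are my transcription of the form usually quoted for the MC68030 byte-enable table, so please verify them against the User's Manual. The mapping of /BYTE1 to D31-D24 through /BYTE4 to D7-D0 is also an assumption.

```python
from itertools import product
from typing import Tuple

SIZE_BYTES = {(0, 1): 1, (1, 0): 2, (1, 1): 3, (0, 0): 4}  # (SIZ1, SIZ0) -> bytes


def lanes_by_rule(siz1: int, siz0: int, a1: int, a0: int) -> Tuple[bool, ...]:
    """Lanes 0..3 (D31-D24 .. D7-D0) used by one cycle on a 32-bit port."""
    offset = (a1 << 1) | a0
    last = min(offset + SIZE_BYTES[(siz1, siz0)] - 1, 3)  # stop at end of the long word
    return tuple(offset <= lane <= last for lane in range(4))


def lanes_by_equations(siz1: int, siz0: int, a1: int, a0: int) -> Tuple[bool, ...]:
    """Sum-of-products form suitable for a PLD (verify against the '030 manual)."""
    n_a1, n_a0, n_s1, n_s0 = 1 - a1, 1 - a0, 1 - siz1, 1 - siz0
    uud = n_a1 & n_a0
    umd = (n_a1 & n_s0) | (n_a1 & a0) | (n_a1 & siz1)
    lmd = (a1 & n_a0) | (n_a1 & n_s0 & n_s1) | (n_a1 & siz1 & siz0) | (n_a1 & a0 & n_s0)
    lld = (a1 & a0) | (a0 & siz1 & siz0) | (n_s1 & n_s0) | (a1 & siz1)
    return tuple(bool(x) for x in (uud, umd, lmd, lld))


def byte_strobes(siz1: int, siz0: int, a1: int, a0: int, selected: bool) -> Tuple[int, ...]:
    """Active-low /BYTE1../BYTE4 (assumed to map to D31-D24 .. D7-D0)."""
    lanes = lanes_by_equations(siz1, siz0, a1, a0)
    return tuple(0 if (selected and on) else 1 for on in lanes)


if __name__ == "__main__":
    mismatches = [bits for bits in product((0, 1), repeat=4)
                  if lanes_by_rule(*bits) != lanes_by_equations(*bits)]
    print("mismatches:", mismatches)
```

An empty mismatch list means the PLD-style equations cover all 16 SIZ/A1/A0 combinations, including the unaligned cases where a word at offset 3 only enables the lowest lane.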

RU8: The State Machine (Timing Core)
This is the heartbeat of the memory controller and, as expected, is a much simpler state machine than the original DRAM controller. It is configured as a Registered PLD to handle timing and asynchronous handshaking via DSACK signals.
  • Wait States: It reads two jumper inputs (WS1, WS2) to allow configuration of 0, 1, or 2 Wait States.
  • Delay Logic: It uses internal registered feedback nodes (rf17 -> rf16 -> rf15) to create a shift-register style delay triggered when /SELECT goes low.
  • 0WAIT Performance: I have 45ns SRAM ICs (CY62158H) on hand, so 0-WAIT should be achievable with a 16MHz clock. For a 32-bit port, the 68030 requires both DSACK0 and DSACK1 to be asserted; RU8 drives both pins low simultaneously.
  • Write Enable: It generates /SRAM_WE only when RW is low, AS is active, and the block is selected.
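To reason about the wait-state jumpers, here is a cycle-level sketch of the delay chain. It is not the RU8 equations. It assumes that each rising clock edge shifts a 1 from rf17 to rf16 to rf15 while /SELECT is low, that the chain clears when /SELECT goes high, and that the jumpers choose which tap drives both /DSACK lines (rf17 for 0 WS, rf16 for 1 WS, rf15 for 2 WS). The class and helper names are hypothetical, and the sketch does not model where these edges fall relative to the 68030 bus states.

```python
from dataclasses import dataclass
from typing import Tuple

# (/DSACK1, /DSACK0) -> port width reported to the 68030 (active low);
# both high means "not terminated yet", i.e. the CPU keeps inserting wait states.
DSACK_PORT_WIDTH = {(0, 0): 32, (0, 1): 16, (1, 0): 8}


@dataclass
class Ru8DelayChain:
    """Hypothetical cycle-level sketch of the rf17 -> rf16 -> rf15 chain."""
    wait_states: int = 0  # 0, 1 or 2, as picked by the WS1/WS2 jumpers
    rf17: int = 0
    rf16: int = 0
    rf15: int = 0

    def clock(self, select_n: int) -> None:
        """Rising clock edge: shift a 1 down the chain while /SELECT is low."""
        active = 1 if select_n == 0 else 0
        # Update the last stage first so each stage samples its predecessor's old value.
        self.rf15 = self.rf16 & active
        self.rf16 = self.rf17 & active
        self.rf17 = active

    def dsack_n(self) -> Tuple[int, int]:
        """(/DSACK1, /DSACK0): both driven low together to signal a 32-bit port."""
        tap = (self.rf17, self.rf16, self.rf15)[self.wait_states]
        level = 0 if tap else 1
        return level, level


def sram_we_n(rw: int, as_n: int, select_n: int) -> int:
    """/SRAM_WE is low only for a write (R/W low) with /AS and /SELECT asserted."""
    return 0 if (rw == 0 and as_n == 0 and select_n == 0) else 1


def clocks_to_dsack(wait_states: int) -> int:
    """Number of rising edges after /SELECT goes low until both DSACKs assert."""
    chain = Ru8DelayChain(wait_states=wait_states)
    for edge in range(1, 8):
        chain.clock(select_n=0)
        if chain.dsack_n() == (0, 0):
            return edge
    raise RuntimeError("DSACK never asserted")


if __name__ == "__main__":
    for ws in (0, 1, 2):
        print(f"{ws} WS: DSACK after {clocks_to_dsack(ws)} edge(s), "
              f"{DSACK_PORT_WIDTH[(0, 0)]}-bit port")
```

Each jumper step adds exactly one clock before termination, and deasserting /SELECT releases both DSACK lines on the next edge.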
Note on Clock Sync: In theory, using a 7ns GAL, the logic could generate a 31.33 MHz clock from the system oscillator (15.6672 MHz). The logic for these "budget" accelerator boards requires the host system clock and the accelerator clock to be perfectly aligned and synchronized; any drift between the two will simply not work! RU8 is intended to act as a "two-stage frequency multiplier."
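One common way to get a doubled clock out of a PLD is to XOR the input clock with a delayed copy of itself, so that every input edge produces a short output pulse. I am assuming this is roughly what a "GAL-frequency multiplier" would do; the sketch below just works out the consequences. The output frequency doubles, but the high time equals the gate delay (7ns assumed here, one GAL pass), and any asymmetry in the input duty cycle shows up as alternating output periods. A PLL multiplier regenerates the clock from its own oscillator, so its duty cycle does not depend on a gate delay.

```python
def xor_doubler(f_in_hz: float, delay_s: float, duty: float = 0.5) -> dict:
    """Characterize clk XOR delayed(clk) for an ideal input with the given duty cycle."""
    period = 1.0 / f_in_hz
    high, low = duty * period, (1.0 - duty) * period
    if not 0.0 < delay_s < min(high, low):
        raise ValueError("delay must be positive and shorter than the input high/low times")
    return {
        "f_out_hz": 2.0 * f_in_hz,
        # Output rising edges sit on every input edge, so their spacing alternates high, low.
        "out_periods_s": (high, low),
        "pulse_width_s": delay_s,
        "out_duty": (delay_s / high, delay_s / low),
    }


if __name__ == "__main__":
    r = xor_doubler(15.6672e6, 7e-9)
    print(f"f_out = {r['f_out_hz'] / 1e6:.4f} MHz")
    print("periods (ns):", [round(p * 1e9, 2) for p in r["out_periods_s"]])
    print("duty:", [round(d, 3) for d in r["out_duty"]])
```

With a perfect 50/50 input, the output runs at 31.3344 MHz but is high for only about 22% of each period. With a 45/55 input, the output periods alternate between roughly 28.7ns and 35.1ns, which the 68030 would see as the shorter (faster) period.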

Is there a better way to obtain this synchronized 31.33MHz signal, or does this "GAL-frequency multiplier" approach seem feasible? How about a PLL-based Clock Generator?

I have attached the full schematics and EQN files below. Any feedback, tips, or "gotchas" are more than welcome before I order the first PCB prototype!
 

I am *super* interested in this for my Mac Plus. Getting an 030 *and* the extended memory for swap into this machine, alongside the ROMinator and BlueSCSI v2 WiFi, would be a dream come true! Let me know how I can contribute (pre-order, Patreon, testing, etc.) so this can be realized by the community at large.
 
I really appreciate your enthusiasm! However, it’s still a bit too early to think about testing or pre-assembled kits. I haven't seriously considered what I’ll do once the project is finished, though I will likely just publish everything on my GitHub page.

To be honest, I’m doing this mostly for the fun of it—there’s something strangely entertaining about smashing my head against a wall to make my 512Ke a bit more usable!

I am currently diving into schematics and defining an achievable scope for my skill level. My goal is to create an accelerator primarily for early Macs (128K/512K/Plus) based on the simple and clean MM Performer logic, but with several added features and improvements. I’m still at the beginning of the learning curve regarding 68030 architecture, but I feel like I’m making steady progress.

Planned Specifications:

4MB Fast SRAM (32-bit)

SCSI I/O for the Mac 128K/512K

ROM-inator support (using its own socketed PLCC ROM ICs)

Selectable 15.67 / 31.33 MHz clock speed

That’s the core of it. I believe this would be a solid open-source solution. I’m not pursuing absolute maximum performance; adding features like a 64KB cache or extending RAM beyond 4MB introduces a level of complexity that isn't achievable while maintaining the original MM Performer logic.

While I will keep the PDS connector for the SE, there are already sophisticated accelerator clones available for that model. My primary goal is to create a 'vintage solution' for early Macs that is considerably faster than what is available today.

SRAM is the smart choice here; it's reasonably affordable and eliminates the hassle of refresh logic. The SCSI I/O is already rock-solid (based on the Mac Plus logic), and the ROM-inator implementation is working well.

I am still pondering the requirements for doubling the clock speed and whether the Performer logic can handle it. Beyond faster PLDs, it will surely require buffering the address bus. I've noticed the Gemini Ultra/Integra and Extreme Vandal accelerators use 74ACT541s for address buffering, which seems like the obvious choice: as a plain (non-latching) buffer, it needs no control logic beyond its output-enable pins. I expect the R/W signal will also need buffering for the motherboard side. To handle the clocking, I plan to use a modern PLL clock multiplier like the ICS 501 or 511.

Regarding the TS Mercury DRAM expansion: I believe that design pushed the absolute limit of what was possible without buffers, even at 16 MHz. My first prototype using DRAM has been unstable due to thermal drift. While I haven’t acquired a high-quality logic analyzer yet, I suspect the issue is a combination of limited driving capabilities on the RAM ICs and poorly matched bus traces. Between that and the ROMs/SCSI, the bus is simply being overloaded. Switching to a buffered design and SRAM should overcome these hurdles!
 
In case you didn't know, the MC68030UM has a section, "12.5 Static RAM memory Banks", that may be useful for this use case. Also, manufacturers of cache tag SRAMs published application notes with detailed timing requirements for higher-speed MC68030 systems (some links here). An SRAM memory bank is just an SRAM cache that never misses, so some of the analysis might be relevant.
 
Hi there!

Thank you for the heads-up! I’ve actually been re-reading Section 12.5 of the MC68030 User’s Manual quite a bit lately.

The link you provided doesn't work. I guess it's because it points to a private direct message?

The trade-off between clock frequency and bus cycles is the defining factor here. Even if I double the clock to 31.33 MHz while keeping RAM access in asynchronous mode (minimum 3 cycles, DSACKx-terminated) and add one or two wait states, the result still provides the most significant performance boost for the Performer accelerator.

Essentially, the raw throughput of the higher clock frequency outweighs the penalty of the extra cycles required for memory access. This is particularly true for the 68030 architecture (with its internal Harvard design) compared to the 68020 and 68000, as the '030's split caches allow it to better mask external bus latency. I guess this is the reason why we see so many 68000/68020 accelerator boards with cache.
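The trade-off is easy to put in numbers. The sketch below only multiplies clock periods (3 clocks for a minimum asynchronous cycle, plus wait states); it ignores cache hits, which is where the doubled clock pays off most.

```python
MAC_OSC_HZ = 15.6672e6  # Mac Plus system oscillator


def bus_cycle_ns(clock_hz: float, wait_states: int, base_clocks: int = 3) -> float:
    """Duration of one asynchronous '030 bus cycle: 3 clocks minimum plus wait states."""
    return (base_clocks + wait_states) * 1e9 / clock_hz


if __name__ == "__main__":
    print(f"15.67 MHz, 0 WS: {bus_cycle_ns(MAC_OSC_HZ, 0):6.1f} ns")
    for ws in (1, 2, 3):
        print(f"31.33 MHz, {ws} WS: {bus_cycle_ns(2 * MAC_OSC_HZ, ws):6.1f} ns")
```

A 0 WS cycle at 15.67 MHz takes about 191.5ns, while at 31.33 MHz one wait state gives about 127.7ns and two give about 159.6ns. Even three wait states (6 clocks) take exactly as long as the 0 WS cycle at 15.67 MHz, so external SRAM accesses are never slower, and everything that hits the on-chip caches runs twice as fast.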

The bottleneck for a synchronous 31.33 MHz design is the SRAM itself; I would need 4MB of 5V parallel SRAM with an access time of roughly 20ns to satisfy the two-cycle window. The highest density available for such fast chips is 512K x 8 (4Mbit), which would require populating eight 36-pin SOJ ICs on the board to reach 4MB. Instead, I've opted to keep the asynchronous access mode (similar to the DRAM expansion board for the TS Mercury) so I can use the 45ns CY62158H. These offer the highest density currently in production (1M x 8), allowing me to populate the full 32-bit bus with just four ICs.
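The chip-count arithmetic behind that choice, as a quick sketch (it assumes x8 parts on a 32-bit bus, i.e. four chips per bank):

```python
from typing import Tuple

MB = 1024 * 1024


def sram_layout(total_bytes: int, chip_depth: int, chip_width_bits: int = 8,
                bus_width_bits: int = 32) -> Tuple[int, int]:
    """Return (banks, chips) needed to reach total_bytes on the given data bus."""
    chips_per_bank = bus_width_bits // chip_width_bits
    bank_bytes = chips_per_bank * chip_depth * chip_width_bits // 8
    banks = -(-total_bytes // bank_bytes)  # ceiling division
    return banks, banks * chips_per_bank


if __name__ == "__main__":
    print("512K x 8:", sram_layout(4 * MB, 512 * 1024))   # fast 4Mbit parts
    print("1M x 8:  ", sram_layout(4 * MB, 1024 * 1024))  # 8Mbit parts like the CY62158H
```

The fast 4Mbit option works out to two banks of four chips, while the 1M x 8 parts fill the whole 4MB in a single bank.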

In theory, these 5V 45ns parts are a perfect match for a 31.33 MHz asynchronous design. As a precaution, I am adding logic to select between 1 and 3 wait states; at 31.33 MHz, at least one wait state would always be needed.
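A rough way to check the wait-state setting: the SRAM access time plus all the other delays in the path must fit inside the bus cycle. Those other delays (address-valid delay after the clock, decode and buffer delays, the CPU's data setup time, and any part of the cycle after data is sampled) are lumped into one `overhead_ns` parameter. That figure is the unknown here, so fill it in from the MC68030, GAL and buffer datasheets. The function names are hypothetical.

```python
def overhead_budget_ns(clock_hz: float, t_access_ns: float, wait_states: int,
                       base_clocks: int = 3) -> float:
    """Time left in the bus cycle for everything except the SRAM access itself."""
    return (base_clocks + wait_states) * 1e9 / clock_hz - t_access_ns


def min_wait_states(clock_hz: float, t_access_ns: float, overhead_ns: float,
                    max_ws: int = 3) -> int:
    """Smallest wait-state setting whose budget covers the lumped overhead."""
    for ws in range(max_ws + 1):
        if overhead_budget_ns(clock_hz, t_access_ns, ws) >= overhead_ns:
            return ws
    raise ValueError(f"needs more than {max_ws} wait states")


if __name__ == "__main__":
    f = 2 * 15.6672e6
    for ws in range(4):
        print(f"{ws} WS: {overhead_budget_ns(f, 45.0, ws):5.1f} ns left for overhead")
```

With 45ns parts at 31.33 MHz, 0 WS leaves about 50.7ns for everything else and 1 WS about 82.7ns, so the real overhead total is what decides the jumper setting.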

As you can see, the SRAM is not what is holding up the Gerber files.

I am still a bit hesitant about doubling the clock speed. I am trying to determine whether, at least in theory, the Performer's core logic will remain stable if I double the frequency to 31.33 MHz using a PLL clock multiplier while adding 74ACT541 address buffers.
 
The link you provided doesn't work. I guess it's because it points to a private direct message?

Oops, sorry

http://www.bitsavers.org/components/idt/_dataBooks/1991_IDT_SRAM_Databook.pdf
http://www.bitsavers.org/components/ti/_dataBooks/1990_TI_Cache_Memory_Management_Data_Book.pdf
Enhancing MC68030 performance using the SN74ACT2155 cache.

while keeping the RAM access in asynchronous mode

It might actually be easier to use synchronous mode on the '030, as you don't need bus sizing.

The bottleneck for a synchronous 31.33 MHz design is the SRAM itself; I would need 4MB of 5V-parallel -SRAM type with an access time of roughly 20ns to satisfy the two-cycle window

3-cycle synchronous might be easier to implement than 3-cycle async; and if you need more than one bank of chips for the chosen size, you can have two banks and get 3-1-2-1 bursts reasonably easily (maybe 3-1-1-1 with fast enough SRAM).
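To put the burst suggestion in numbers: an '030 cache-line fill moves four long words (16 bytes). Bursts are only available with synchronous, STERM-terminated cycles, so the sketch compares them against four separate minimum-length asynchronous cycles (written as "3-3-3-3" here). Wait states are ignored and the clock is assumed to be 31.33 MHz; MB/s means 10^6 bytes per second.

```python
def pattern_clocks(pattern: str) -> int:
    """Total clocks for a per-long-word pattern such as '3-1-2-1'."""
    return sum(int(n) for n in pattern.split("-"))


def line_fill_mb_per_s(pattern: str, clock_hz: float, line_bytes: int = 16) -> float:
    """Throughput of one 16-byte line transfer with the given clock pattern."""
    return line_bytes / (pattern_clocks(pattern) / clock_hz) / 1e6


if __name__ == "__main__":
    f = 2 * 15.6672e6
    # "3-3-3-3" = four separate 3-clock asynchronous cycles, no burst
    for p in ("3-3-3-3", "3-1-2-1", "3-1-1-1"):
        print(f"{p}: {pattern_clocks(p):2d} clocks, {line_fill_mb_per_s(p, f):5.1f} MB/s")
```

A 3-1-2-1 burst needs 7 clocks instead of 12 for the same 16 bytes, which is the gain being traded against the extra synchronous-mode logic.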
 
You’re correct that 32-bit RAM avoids bus sizing, but the 68030 still has to interface with 68000-era legacy hardware. That is precisely why the bus sizing feature exists in the first place. Modifying the core logic to handle the handover between Fast and slow RAM for video buffer access—specifically switching from a 2-cycle synchronous to a 3-cycle asynchronous mode—would be a massive undertaking.

Right now, the accelerator toggles RAM access by sniffing the memory address space (handled by PLD RU6), while the main state machine manages the arbitration and asynchronous signal translation. Re-engineering that logic is definitely beyond my current reach! Furthermore, adding the logic required to handle an external cache, which would effectively require scrapping and replacing the current design, is a challenge well above my skill level. So, asynchronous mode for SRAM in this context is not just an option; it is simply mandatory.
 
An update:

I’m evolving my design to use latched buffers for both the address (74574s) and data (74646s) buses. If you look at more advanced 68030 accelerator boards you will notice immediately that they all use latched buffers.

These provide at least two things that are absolutely necessary for increasing the clock speed of the Performer:

Isolation: They "hide" the daughterboard's local traffic (such as the CPU talking to its own Fast RAM) from the host system. The original Macintosh Plus/512 was designed for an 8 MHz clock; when you hammer those old traces with faster bus signals, their capacitive load becomes a much bigger problem, slowing edges and degrading signal integrity. Because those boards were never designed for such speeds, their trace impedances are uncontrolled, so reflections and ringing can become significant.

Stability: The latches "freeze" the bus state for the Macintosh Plus/SE motherboard for the exact duration its own asynchronous cycle requires.
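Strictly speaking, the 74574 is an edge-triggered octal D flip-flop rather than a transparent latch (that would be the 74573), and that is what makes the "freeze" work: its outputs only change on a rising clock edge, no matter what the CPU does to its side of the bus afterward. Below is a minimal behavioral sketch. Which signal clocks the register and which drives /OE are hypothetical here, since that control logic is still being worked out.

```python
from typing import Optional


class Octal574:
    """Behavioral model of one 74x574: 8 D flip-flops with 3-state outputs."""

    def __init__(self) -> None:
        self.q = 0
        self._clk = 0

    def drive(self, d: int, clk: int) -> None:
        """Apply inputs; Q captures D only on a 0 -> 1 transition of CLK."""
        if clk and not self._clk:
            self.q = d & 0xFF
        self._clk = clk

    def output(self, oe_n: int) -> Optional[int]:
        """Q when /OE is low, None (high impedance) when /OE is high."""
        return self.q if oe_n == 0 else None


if __name__ == "__main__":
    hi_addr = Octal574()                 # e.g. one register holding A23..A16
    hi_addr.drive(0x3F, clk=0)
    hi_addr.drive(0x3F, clk=1)           # hypothetical capture edge at the start of a host cycle
    hi_addr.drive(0x12, clk=1)           # CPU moves on to a local SRAM access
    hi_addr.drive(0x12, clk=0)
    print(hex(hi_addr.output(oe_n=0)))   # host side still sees the captured address
```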

The crashes I’ve been seeing in my DRAM prototype are likely direct results of these integrity issues, stemming from long traces with poor impedance matching.



This approach is modeled after the design seen on the Gemini Ultra and Vandal accelerators (an educated guess made after carefully studying photos of them online). While I could use 74646s for both buses, using the 74574 for the address bus saves much-needed space, allowing for a more compact layout and a reduced PCB footprint.

I am still polishing the logic that will control these latched buffers. So far, the project seems attainable because I am not attempting to dive into synchronous cycles. That would require far more complex logic that is currently beyond my experience level (keeping it asynchronous is a much more manageable path for this accelerator design).

As an interesting note, the DRAM expansion board of the Mercury 030 uses 74245 data buffers to drive the RAM ICs, rather than the host system data bus. I see this as a limitation for any clock speed increase. I am attempting to flip the logic to use the buffers (latched type) on the host side. The CPU load will be limited to the local logic and the SRAM ICs. That seems to be a much more common arrangement in high-profile 68030 accelerators.

While this level of "over-engineering" might seem like overkill for a board without a cache and limited to 4MB of RAM, it ensures signal integrity is strong enough to safely double the clock speed to 31.33 MHz.

From what I've learned so far, the host system and accelerator clocks in the MicroMac Performer's core logic are not independent; they remain synchronized in phase. This makes the logic far simpler than that of more sophisticated 68030 accelerators. However, I'm hopeful that with these bus-driving improvements, the core logic will perform reliably at double its default speed using a PLL clock multiplier. Any feedback or back-and-forth comments are more than welcome!
 