
NeXTStation Clock Doubler!

zigzagjoe

Well, nextcomputers.org forum doesn't seem to be accepting registrations, so figure I'll post this here instead.

I recently got a NeXTStation Color Turbo thanks to a kind forum member. As my first hardware project for it, I wanted to accelerate it, both because it really could use the help and because nobody that I can see has tackled this in modern times. There are a few unobtainium vintage designs, of course, and one fellow came up with a pretty ugly hack. Neither is a good option: that hack had some nonsensical stuff in there that isn't required, but even so, replacing the PLL and removing some resistors to make it work... no thanks, I don't want to modify my logic board. Folks had already tried the QuadDoublers that work in some Amigas and found they didn't work in the NeXT, and that seems to have been the end of any development attempts.

Putting a little theory out there first... generally, with CPU accelerators you need to present strobes, read/write data, etc. as if you were a native CPU operating on the bus. What that means is that if you just feed a 2x clock into the CPU (simplifying a bit here), the suddenly faster CPU is going to freak out: signals from the logic board will (badly) violate the timings it expects, causing bad data to be read or written.

There are two ways to address this. The QuadDoublers supposedly do some janky tricks to track whether the 040 is in the middle of a bus cycle: if it isn't, they feed the CPU a doubled clock, but when a bus cycle starts the clock is reduced back to standard. This way the bus access timings look like a 25mhz CPU (because it is a 25mhz CPU). Yuck, though evidently this approach wasn't unheard of, as the IIfx does something similar according to @Bolle.
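To put a rough number on that clock-switching trick, here is a toy model in Python. It is only a sketch of the idea as described above, not how the QuadDoubler actually implements it; the function names and the per-period bus activity trace are made up for illustration.

```python
def cpu_cycles_per_bus_period(bus_cycle_active, multiplier=2):
    """CPU clocks delivered during one logic-board clock period.

    Assumption: while a bus cycle is in progress the CPU gets the
    standard clock (1 cycle per period), otherwise the multiplied clock.
    """
    return 1 if bus_cycle_active else multiplier


def effective_cpu_mhz(bus_activity, base_mhz=25.0, multiplier=2):
    """Average CPU clock over a trace of per-period bus activity flags.

    bus_activity is a hypothetical trace: one boolean per logic-board
    clock period, True when the CPU is in the middle of a bus cycle.
    """
    if not bus_activity:
        raise ValueError("bus_activity must contain at least one period")
    total_cycles = sum(
        cpu_cycles_per_bus_period(active, multiplier) for active in bus_activity
    )
    return base_mhz * total_cycles / len(bus_activity)
```

With the CPU on the bus one period in four, `effective_cpu_mhz([True, False, False, False])` comes out to 43.75, so the more time spent on the bus, the less the doubled clock buys you.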

The second and more common method is to accept the signals from the fast cpu and massage them in logic until they look like the appropriate 25mhz signals, or close enough that it isn't massively violating timings. This is the more compatible approach - if you have logic that is up to the task. For example, this is the approach taken on my Booster accelerators, and an example of what that looks like can be seen below.

1745890720633.png
(/TS = transfer start, indicating the beginning of a transfer; /TA = transfer acknowledge, indicating either the end of a transfer or that data is ready during a multi-word transfer)

Well, some time ago @Bolle had reverse engineered the Formac PL150 accelerator, originally designed for the LC475. This ran the CPU at 45mhz while the logic board remained at 25mhz. Bolle had found you could put a typical Motorola PLL on the board and run it at 50mhz, too. He was so kind as to share the logic and schematic with me, and I used this as a starting point to see what is needed. Since with a PLL clock you're phase-locked to the input clock (literally in the name, after all), I rewrote all of the logic to take advantage of that, which increased performance since we can make certain assumptions about how signals line up. Also, there were some really screwy choices made by Formac to bypass bugs they introduced or couldn't see how to fix (?). Credit to Bolle for the image.

1745890092113.jpeg
1745890444415.jpeg

This was all well and good, and the revised logic was solid enough that it could boot an Amiga 4000. At least, as much as it could with a bad OS - the unit was briefly on loan to me, and I have no Amiga background, so I failed there. Still, this is a good omen because it's entirely different from the Quadras I'd tested with.

Next bit of difficulty: The Turbo Color schematic was found some time ago (attached) and it makes clear at least part of the problem with using other accelerators: NeXT uses the 68040 Multiplexed bus mode (didn't even realize that was a thing!). Briefly - a single rising edge, to be exact - the address is presented on the combined address-data bus, and then it switches to data-bus operation. Unless you specifically built your accelerator with that in mind, yeah, that wouldn't work.
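As a rough illustration of what multiplexed operation means for anything sitting on the bus, here is a toy Python model that splits samples of the shared bus into an address and its data words. It is a simplification under stated assumptions, not the accelerator's actual logic: one sample per rising clock edge, the address valid on the edge where /TS is asserted, data valid on edges where /TA is asserted. The function name and sample layout are hypothetical.

```python
def decode_multiplexed_bus(samples):
    """Split samples of a shared address/data bus into transfers.

    samples: iterable of (ts_asserted, ta_asserted, bus_value) tuples,
    one per rising bus clock edge. The active-low /TS and /TA strobes
    are modelled as plain booleans (True = asserted).
    """
    transfers = []
    current = None
    for ts_asserted, ta_asserted, bus_value in samples:
        if ts_asserted:
            # Start of a transfer: the bus carries the address for this
            # one edge only, so it has to be latched right now.
            current = {"address": bus_value, "data": []}
            transfers.append(current)
        elif ta_asserted and current is not None:
            # After the address phase, the same lines carry data.
            current["data"].append(bus_value)
    return transfers
```

The point of the model is the latch in the first branch: miss that single edge and the address is gone, which is why an accelerator has to capture and hold it itself.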

Pretty clearly this unobtainable vintage accelerator (seen below) had a set of 2x 16 bit transceivers and buffers, though it's not quite clear to me how they'd use these in order to hold the address as is required for multiplexed operation. Due to this I would need a new board design that both ties the busses together and has the logic required to snipe that address and hold it for long enough to make the system happy.

1745889320761.png
1745889546633.jpeg

So a new board was designed and sent off to JLC. Last JLC order to make it over the line in time. The NeXTstation will need a jumper populated to reduce the bus clock to 25mhz; 33mhz would have the poor 68040 trying to run at 66mhz and that's just not going to work.

1745890914793.jpeg
1745890933545.jpeg

With a bit of tweaking... it lives! Interestingly, I found NeXT apparently tweaks AVEC on the fly. It's nominally pulled high, but it's also wired to one of the core chipset chips. I'd strapped it high, not noticing that. Oops. Nothing a bodged pin won't sort out.

Preliminary results are looking good! Dhrystone is a best-case scenario that doesn't stress memory, so I will have to see what is out there for a more rounded benchmark. Video results were essentially at parity with the original Turbo@33 - reduced bandwidth was presumably offset by increased performance on the algorithmic side. I haven't been able to do much more testing as of yet, but it seems noticeably more responsive already.

1745891364169.png

As far as I am aware this is the first modern accelerator for the NeXT ecosystem. It remains to be seen if it can work in the Cubes though (I need to know if they use a multiplexed bus). Despite not being strictly Mac-related, I figured it was worth posting due to the overlap, and it does have a little Mac heritage :)
 

Attachments

Fantastic work! The Cubes and stations are identical in operation, so this should 'just work' with a turbo cube. I think it's also very likely this will work without modification in a nonTurbo model (25mhz bus). I would love to give one of these boards a spin and can test across models. The NeXT community has been waiting a long time for this!
 

I figured as much; it looked like the slabs were a direct descendant of the 040 cubes, and I did a quick probe to verify that the non-turbo slab I have also has a multiplexed bus. So, I'm pretty sure it ought to work (mechanical constraints permitting) in any NeXT that's been set to a 25mhz bus. At least conceptually; I am running into a bit of a DMA issue currently, and the non-turbo hardware uses a different DMA controller. I'm testing with a Turbo Color slab in which I installed the jumper to reduce the bus to 25mhz.

Performance is fantastic, though; the machine is noticeably perkier under OpenStep 4.2.

Working on an annoying bug: DMA seems to work correctly at least for disk IO and ethernet TX, but ethernet RX is very lossy and sound acts similarly. Video is also fine (though I'm not convinced it actually uses DMA for much); I haven't tested the serial, sound in, floppy, DSP or printer DMA channels yet. I think it's an edge case in bus arbitration, but I haven't managed to pin down what exactly is wrong yet.
 
Quite impressive work!

Could your research also lead to better iterations of current Mac 040 accelerators?
Not really, beyond generally educating me on how the 040 bus works. This accelerator design is only really applicable to the Q700/Q900 which have a limited maximum bus speed and so get more benefit with a clock doubler accelerator than they do bus overclocking. Same kind of situation for other platforms that use 68040s: If you can increase the bus speed to 40mhz, then that's the best choice.

Here are some benchmarks to show how that works out.
Q605 25: Stock system
Q605 40 oc: Bus overclocked to 40mhz
Q605 50: Original formac PL150 logic
Q605 50 Opt1: Optimized performance with compatible timings
Q605 50 Diode TS: Best possible performance, but violates /TS timing specs by 5ns @ 25mhz bus. Q605 at least was fine with it.

I used a Q605 as my testbed as I knew this design worked with it and it was handy. From these results you can see it's largely a wash in performance... the doubler works out a bit better for CPU-centric tasks that don't access RAM as much, and video performance favors the bus overclock. So given that overclock is free and doubler is $$$, it doesn't make much sense here.

On NeXT hardware, where the bus overclock is impossible or invasive, the doubler design makes a lot of sense. Unfortunately, I believe not all CPUs were socketed, so that remains a possible issue, though you can install a socket afterwards if you're steady and careful. The NeXT design is distinct owing to the need for the multiplexed bus, so it can only work in NeXT hardware.
 
I posted over there: https://www.nextcomputers.org/forums/index.php?topic=5958.0

Amazing work zigzagjoe as usual! I wonder if the 25MHz 040 station or Cube could use a version of this.
Heh, someone apparently beat you to the punch :)

I expect it will work in at least the non-turbo stations. Assuming the cube uses the same multiplexed bus (likely), then it should work there too.

Made some headway on the DMA issue. As I thought, it's a bus arbitration issue: the arbiter needs to recognize the deasserted state of BB for at least a single LB BCLK cycle before it's able to register it as being asserted again. Otherwise, arbitration hangs as it thinks the bus is still busy. Still trying to figure the exact timings it wants, but with a temporary fix in place it corrected the sound and ethernet RX issue I'd been having. Of course, having a theory is well and good but as I try to *deliberately* fix it I find myself heading in the opposite direction (less stable/more issues), so I haven't precisely keyed on the requirements yet.
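To show why a too-short release of BB can go unnoticed, here is a toy Python model of the theory above. It is an illustration only and not the fix in use (the exact timings are still unknown): it assumes BB is tracked once per fast (CPU) clock tick, that the arbiter samples it once per logic-board BCLK (every `ratio` ticks), and that BB is modelled as a boolean where True means asserted. All names are hypothetical.

```python
def arbiter_sees_release(bb_fast, ratio=2, phase=0):
    """True if the arbiter ever samples BB in its deasserted state.

    bb_fast: BB state per fast clock tick (True = asserted).
    The arbiter only looks at every `ratio`-th tick, starting at `phase`.
    """
    return any(not asserted for asserted in bb_fast[phase::ratio])


def stretch_release(bb_fast, min_ticks=2):
    """Hold each BB release for at least `min_ticks` fast ticks.

    A release shorter than one BCLK period is extended by delaying the
    next assertion; releases that are already long enough pass unchanged.
    """
    out = []
    low_run = 0  # consecutive deasserted ticks emitted so far
    for asserted in bb_fast:
        if asserted and 0 < low_run < min_ticks:
            out.append(False)  # keep BB released a little longer
            low_run += 1
        elif asserted:
            out.append(True)
            low_run = 0
        else:
            out.append(False)
            low_run += 1
    return out
```

In this model a one-tick release such as `[True, False, True, True, True, True]` falls between the arbiter's samples and is never seen, while the stretched version is caught regardless of sampling phase.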
 
Well, nextcomputers.org forum doesn't seem to be accepting registrations...

It doesn't seem to do anything but try to run scripts and install cookies. I just see a blank page.

Anyway, it looks like you're doing nice work. Thank you for sharing your project with us!
 
Oooh very nice! My friend has a non-turbo slab I've been wanting to fix up for them...

The multiplexor detail is pretty interesting, I wonder how unique that was in 040 designs overall.
 
I looked at a few 040 designs as I digested the Formac design; none of them save NeXT used the multiplexed bus. Conceptually the multiplexed bus might cost you a cycle on each access (haven't compared the exact data valid timings) but it does save you 32 pins. Actual benefit is going to depend on how much of the address bus you need to break out elsewhere, though. Might be a little simpler to achieve higher speeds and operate the 040 in large-buffer mode.

I think it only really makes sense if you pull a NeXT and build a bespoke core chipset; commodity controllers, IO devices, etc. are really going to expect to have some form of independent address bus. My personal theory is that Motorola liked to try one major off-the-wall bus feature per major revision to see if it was worthwhile to adopt; the multiplexed bus is one of the features that didn't make the cut.

000 introduced asynchronous bus
020 introduced dynamic bus sizing
030 introduced synchronous bus cycles and bursts
040 introduced fully-synchronous and burst-first bus, optionally multiplexed
060 introduced adjustable BCLK ratios
 