FPUs in Macs

Paralel · Feb 22, 2016

For all the Macs that have a built-in socket for an FPU, do they all require an FPU that runs at the same speed as the main processor? Does an FPU rated higher just clock down, or does it switch into async mode if the processor is slower than the FPU?

My best somewhat educated guess is that FPUs in this situation must have synchronous operation, so they are either the same speed or clock down.

Elfen · Feb 22, 2016

The FPU takes whatever clock signal is available, usually the one given to the CPU. So if the CPU is running at 16MHz, the FPU will run at 16MHz. If an FPU marked for 25MHz is paired with a CPU that has a 16MHz clock, it will run at 16MHz. If anything, the 25MHz mark is a speed limit sign for the maximum speed it should go. (Not that anyone listens to it: 25MHz '030s and '882 FPUs have been pushed to 33MHz.)

IMHO, if you are going to overclock an FPU, put a heatsink on it. I have read reports that they heat up more than CPUs when overclocked.

When operating at the same speed, CPU-to-FPU communication is synchronous. But in the rare cases where the FPU runs at a different speed from the CPU, they communicate in asynchronous mode.

Paralel · Feb 22, 2016

That's what I figured: Macs only generate one 'master' clock signal. So Macs, in general, were meant to operate with synchronous FPUs. A separate crystal would have been used if they were intended to operate in async mode.

I guess that makes the Macs that get their FPUs from add-ons special in a certain way: they can use a local crystal for the clock, so they can operate async and, as such, much faster than the ones that have a built-in spot for an FPU.

That kinda makes me want to change the card for the Classic II so it can optionally run in async mode with a 50 MHz 68882. It should really spank the SE/30 at that point.

Elfen · Feb 23, 2016

I would agree to a point. And it's a minor point.

About 90% to 96% of the Mac's operations are done on the CPU; that's a given. On systems without an FPU, about 99% is done on the CPU. Things like floppy drive and hard disk access are handled by their respective I/O chips, accounting for the 1% of "automatic operations" (programs like Norton take these chips over directly, turning basic I/O into more complex operations).

Most programs do not need an FPU. Programs that involve vector graphics, animation and other graphics, PostScript printing (including fonts), encryption, graphics-intensive games, and diagnostic tools use the FPU. It is the last of these, diagnostic tools, that will test an FPU to its limits with their Whetstone, Dhrystone, and trigonometric vector plotting (on-screen) tests. Everything else, including MS Works/Office/Excel, does not use the FPU.

The point is diminishing returns. Let's say, for the sake of argument, that a 16MHz 68882 does 100 math functions per second when called for. At 25MHz, the 68882 will do about 156 math functions per second; at 33MHz, about 206; at 50MHz, about 312. These numbers look fast, and they are. But here's the rub: the faster 68882 still has to wait for the next math problem the 68030 throws at it. So the gain gets smaller and smaller the faster you push the FPU, because it has to wait for the CPU to talk to it. Diagnostic programs will record a whopping performance increase, but in everyday use it will barely be noticed. It will kick the SE/30's ass, but the Classic II is already crippled in more than one area: a small RAM footprint, slow disk I/O, the limited 9-inch Classic video that shares system RAM, and the CPU running all that on top of everything else.
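
Elfen's diminishing-returns argument can be sketched in Python. The 100-per-second baseline at 16MHz is the post's own hypothetical; the linear scaling with clock and the fixed per-operation CPU wait (`CPU_WAIT_S`) are assumptions made up for illustration, not measurements. Only the shape matters: the FPU-only rate grows in step with the clock, while the rate that includes a fixed wait grows more slowly.

```python
# Sketch of the thought experiment, under stated assumptions:
# - 100 math functions/s at 16 MHz is the post's hypothetical baseline.
# - FPU-only throughput is assumed to scale linearly with the FPU clock.
# - CPU_WAIT_S is a made-up, fixed per-operation wait for the CPU.
BASE_MHZ = 16
BASE_OPS_PER_S = 100
CPU_WAIT_S = 0.005  # hypothetical: half the time of one 16 MHz operation


def fpu_only_ops(fpu_mhz: float) -> float:
    """Operations per second if the FPU never waits for the CPU."""
    return BASE_OPS_PER_S * fpu_mhz / BASE_MHZ


def effective_ops(fpu_mhz: float, wait_s: float = CPU_WAIT_S) -> float:
    """Operations per second when each operation also pays a fixed CPU-side wait."""
    return 1.0 / (1.0 / fpu_only_ops(fpu_mhz) + wait_s)


for mhz in (16, 25, 33, 50):
    print(f"{mhz} MHz: FPU-only {fpu_only_ops(mhz):6.1f}/s, "
          f"with CPU wait {effective_ops(mhz):6.1f}/s")
```

With the made-up 5 ms wait, going from 16 to 50MHz improves the effective rate by roughly 1.8 times instead of the 3.125 times the clock ratio alone would suggest.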

For the past couple of weeks I have been thinking about an FPU on a classic Mac like the SE or Plus. It would reuse a Kelly Clip to latch onto the 68K, but what improvements would it bring? An FPU running at 8MHz? It would end that rare "FPU Not Installed" error, and some programs might speed up (like the vector graphics program FreeHand). But is it worth doing?

johnklos · Feb 24, 2016

The diminishing returns thing isn't really an issue. The amount of time spent waiting for the delivery of data to an FPU is much, much smaller than the amount of time it takes for the FPU to run an instruction.

For instance, take a Classic II. It has a 16 bit, 16 MHz bus which can transfer 16 MB/sec (two clocks per transfer at 16 MHz is 8 million transfers per second, and at 16 bits each that is 16 million bytes per second). Although this is oversimplifying, let's say that an FPU operation loads two 64 bit floating point numbers and an instruction, and transfers a 64 bit number back when done. That amounts to 13 transfers of 16 bits at a time (4 + 4 + 1 + 4). If an FPU were infinitely fast, our Classic II would be able to do 8 million divided by 13, or about 600,000 floating point operations a second when moving this much data. More typically there'd be fewer transfers, but we're just guesstimating.
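
The bus arithmetic above can be checked with a few lines of Python. This is a sketch of johnklos's simplified model only: two bus clocks per transfer and a single 16-bit transfer for the instruction are the post's assumptions, not a cycle-accurate description of the 68030/68882 coprocessor interface.

```python
# Simplified Classic II bus model, following the assumptions in the post.
BUS_HZ = 16_000_000       # 16 MHz bus clock
CLOCKS_PER_TRANSFER = 2   # two bus clocks per transfer (post's assumption)
BUS_WIDTH_BYTES = 2       # 16-bit data bus

transfers_per_s = BUS_HZ // CLOCKS_PER_TRANSFER            # 8,000,000
bandwidth_bytes_per_s = transfers_per_s * BUS_WIDTH_BYTES  # 16,000,000

# One simplified FPU operation: two 64-bit operands in, one 16-bit
# instruction, one 64-bit result out, all moved 16 bits at a time.
words_per_64bit = 64 // 16
transfers_per_op = 2 * words_per_64bit + 1 + words_per_64bit  # 13

# Upper bound if the FPU itself took no time at all.
max_ops_per_s = transfers_per_s / transfers_per_op
print(f"{bandwidth_bytes_per_s:,} bytes/s, {transfers_per_op} transfers/op, "
      f"{max_ops_per_s:,.0f} ops/s ceiling")
```

The exact ceiling is about 615,000 operations per second, which the post rounds to "about 600,000".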

An m68882 takes anywhere from 130 clocks to 400 clocks for common operations, with extremes near 1000 clocks. But, again, for the sake of argument, let's say that the operations do require that much transfer (when they usually require less) and that the FPU takes 130 clocks (when it usually requires more). At two bus clocks per transfer, the 13 transfers cost 26 clocks, so at 16 MHz you could do 16 million / (130+26) operations, or about 102,000 floating point operations per second.

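Let's compare this with a 50 MHz m68882 on a 16 bit, 16 MHz bus. The FPU clock is 50/16 = 3.125 times the bus clock, so its 130 clocks take only 130/3.125 bus clocks: we get 16 million / (130/3.125+26), or about 236,000 floating point operations per second.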

Finally, let's compare this with a 50 MHz m68882 on a 32 bit, 50 MHz bus. The 13 16-bit transfers become 7 32-bit transfers, or 14 clocks: we get 50 million / (130+14), or about 347,000 floating point operations per second.
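
The three worst-case figures can be reproduced with one small function. This sketch encodes the post's model: each operation costs its transfer clocks plus the FPU's work, with FPU clocks converted to bus clocks by the clock ratio. The helper names `bus_clocks` and `flops` are hypothetical, chosen for illustration.

```python
import math


def bus_clocks(transfers_16bit: int, bus_width_bits: int, clocks_per_transfer: int = 2) -> int:
    """Bus clocks needed to move this many 16-bit words over a bus of the given width."""
    transfers = math.ceil(transfers_16bit * 16 / bus_width_bits)
    return transfers * clocks_per_transfer


def flops(bus_hz: float, fpu_clocks: int, transfer_clocks: int,
          fpu_to_bus_ratio: float = 1.0) -> float:
    """Operations per second: transfer clocks plus FPU work expressed in bus clocks."""
    return bus_hz / (fpu_clocks / fpu_to_bus_ratio + transfer_clocks)


# Worst case from the post: 130 FPU clocks, 13 transfers of 16 bits.
FPU_CLOCKS = 130
TRANSFERS = 13

classic_ii = flops(16e6, FPU_CLOCKS, bus_clocks(TRANSFERS, 16))          # 16/16/16
fast_fpu = flops(16e6, FPU_CLOCKS, bus_clocks(TRANSFERS, 16), 50 / 16)   # 50 MHz FPU
wide_bus = flops(50e6, FPU_CLOCKS, bus_clocks(TRANSFERS, 32))            # 32-bit, 50 MHz

for name, value in (("16 MHz everything", classic_ii),
                    ("50 MHz FPU, 16-bit 16 MHz bus", fast_fpu),
                    ("32-bit 50 MHz everything", wide_bus)):
    print(f"{name}: {value:,.0f} ops/s")
```

Using `math.ceil` for the transfer count is what turns 13 16-bit transfers into 7 32-bit ones (14 clocks), matching the post.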

So does the bus make the FPU's speed increase irrelevant? No, not even in this worst-case scenario.

Let's look at the numbers if we use more realistic figures - nine 16 bit transfers and 400 clocks per FPU operation:

16 bit, 16 MHz everything: 16 million / (400+18), or 38,000 FLOPS

16 bit, 16 MHz bus, 50 MHz m68882: 16 million / (400/3.125+18), or 109,000 FLOPS

32 bit, 50 MHz everything: 50 million / (400+18), or 119,000 FLOPS
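
The realistic figures follow from the same model. The post keeps 18 transfer clocks even for the 32-bit bus; since nine 16-bit transfers would pack into five 32-bit ones (10 clocks), the sketch also shows that variant for comparison. As before, `flops` is a hypothetical helper, not anything from the original post.

```python
def flops(bus_hz: float, fpu_clocks: int, transfer_clocks: int,
          fpu_to_bus_ratio: float = 1.0) -> float:
    """Operations per second: transfer clocks plus FPU work expressed in bus clocks."""
    return bus_hz / (fpu_clocks / fpu_to_bus_ratio + transfer_clocks)


FPU_CLOCKS = 400            # "realistic" FPU cost from the post
POST_TRANSFER_CLOCKS = 18   # nine 16-bit transfers at two clocks each

everything_16 = flops(16e6, FPU_CLOCKS, POST_TRANSFER_CLOCKS)
fast_fpu = flops(16e6, FPU_CLOCKS, POST_TRANSFER_CLOCKS, 50 / 16)
everything_50 = flops(50e6, FPU_CLOCKS, POST_TRANSFER_CLOCKS)

# Variant: on a 32-bit bus the nine 16-bit transfers become five 32-bit ones.
everything_50_packed = flops(50e6, FPU_CLOCKS, 10)

print(f"{everything_16:,.0f} / {fast_fpu:,.0f} / {everything_50:,.0f} "
      f"(packed 32-bit: {everything_50_packed:,.0f})")
```

Packing the transfers only moves the 32-bit result from about 119,600 to about 122,000 FLOPS, so the FPU work still dominates.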

To summarize, in the worst case a 50 MHz m68882 is 2.3 times faster on a Classic II's bus, versus 3.4 times faster with a 50 MHz, 32 bit bus.

In the typical case, a 50 MHz m68882 is about 2.9 times faster on the Classic II's bus, versus 3.1 times faster with a 50 MHz, 32 bit bus.
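
The summary ratios can be recomputed from the same simple model. The clock counts in the table are the post's own, including its choice of 18 transfer clocks for the 32-bit typical case; the `flops` helper and the `SCENARIOS` table are hypothetical names for this sketch.

```python
def flops(bus_hz: float, fpu_clocks: int, transfer_clocks: int,
          fpu_to_bus_ratio: float = 1.0) -> float:
    """Operations per second: transfer clocks plus FPU work expressed in bus clocks."""
    return bus_hz / (fpu_clocks / fpu_to_bus_ratio + transfer_clocks)


CLOCK_RATIO = 50 / 16  # 50 MHz FPU against a 16 MHz bus

# name: (FPU clocks, transfer clocks on 16-bit bus, transfer clocks on 32-bit bus)
SCENARIOS = {
    "worst case": (130, 26, 14),
    "typical": (400, 18, 18),
}

speedups = {}
for name, (fpu_clocks, clocks16, clocks32) in SCENARIOS.items():
    base = flops(16e6, fpu_clocks, clocks16)  # Classic II, 16 MHz FPU
    on_classic_bus = flops(16e6, fpu_clocks, clocks16, CLOCK_RATIO) / base
    on_fast_bus = flops(50e6, fpu_clocks, clocks32) / base
    speedups[name] = (on_classic_bus, on_fast_bus)
    print(f"{name}: {on_classic_bus:.2f}x on the Classic II bus, "
          f"{on_fast_bus:.2f}x on a 32-bit 50 MHz bus")
```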

Elfen · Feb 24, 2016

Q: Is the Data Bus connection to/from the CPU <--> FPU on the Classic II 16 Bit or 32 Bit?

Paralel · Feb 24, 2016

16-bit
