
How to not make software suck on the G5

So this week I had a serious breakthrough on why the JavaScript accelerator in TenFourFox chugs so badly on the G5. Apple, true to form, only documents some of this. I wrote a whole big blurb for the TenFourFox blog, but here are the highlights:

- The D-cache is very different from the G4's. Not a problem for me right now, but it might be if I start manipulating the data cache to get more performance. AltiVec data stream touch instructions like dst can actually cause pipeline bubbles on the G5, worsening performance.

- Loads and stores to nearby addresses should have nops between them to force them into separate dispatch groups; otherwise you risk a pipeline stall when the G5 discovers the aliasing. Tweaking this bought me some extra points in the V8 benchmark.

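- Watch out for cracked and microcoded instructions. The G5 splits these into multiple internal operations, so they eat extra slots in a dispatch group.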

- mtctr should be first in a dispatch group if possible. I bet there are other instructions like that.
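To make the dispatch-group advice concrete, here is a toy model of how a JIT back end might pad with nops. It is a sketch under simplifying assumptions, not TenFourFox's code and not the full 970 grouping rules: four non-branch slots per group, cracked instructions take two slots, microcoded instructions get a group to themselves, mtctr wants to open a group, and any load that follows a store in the same group is treated as a possible alias (a conservative stand-in for "nearby addresses"). `Emitter`, `Kind` and `pad_to_group_start` are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

# Toy model of PowerPC 970 (G5) dispatch groups, as seen by a JIT emitter.
# Simplifying assumptions, not the full hardware rules:
#   * a group has four non-branch slots;
#   * cracked instructions use two slots;
#   * microcoded instructions get a group to themselves (hardware enforces it);
#   * mtctr should open a group, but only nop padding gets it there;
#   * any load after a store in the same group might alias it.
GROUP_SLOTS = 4


class Kind(Enum):
    SIMPLE = auto()
    CRACKED = auto()
    MICROCODED = auto()
    MTCTR = auto()
    LOAD = auto()
    STORE = auto()


@dataclass
class Emitter:
    slot: int = 0                  # slots used in the current group
    group_has_store: bool = False  # a store already sits in this group
    nops: int = 0                  # padding nops inserted so far
    out: list = field(default_factory=list)

    def _close_group(self):
        self.slot = 0
        self.group_has_store = False

    def pad_to_group_start(self):
        """Fill the rest of the current group with nops so the next instruction opens a new one."""
        if self.slot == 0:
            return  # already at a group boundary
        while self.slot < GROUP_SLOTS:
            self.out.append("nop")
            self.nops += 1
            self.slot += 1
        self._close_group()

    def emit(self, mnemonic, kind=Kind.SIMPLE):
        width = 2 if kind is Kind.CRACKED else 1
        if kind is Kind.MICROCODED or self.slot + width > GROUP_SLOTS:
            self._close_group()        # the hardware starts a new group on its own
        elif kind is Kind.MTCTR or (kind is Kind.LOAD and self.group_has_store):
            self.pad_to_group_start()  # we have to force the boundary with nops
        self.out.append(mnemonic)
        self.slot += width
        if kind is Kind.STORE:
            self.group_has_store = True
        if kind is Kind.MICROCODED or self.slot >= GROUP_SLOTS:
            self._close_group()


e = Emitter()
e.emit("stw r3,0(r1)", Kind.STORE)
e.emit("lwz r4,0(r1)", Kind.LOAD)  # reloads what was just stored: split the groups
e.emit("add r5,r4,r4")
e.emit("mtctr r5", Kind.MTCTR)     # wants to lead its group
print(" ; ".join(e.out))
# stw r3,0(r1) ; nop ; nop ; nop ; lwz r4,0(r1) ; add r5,r4,r4 ; nop ; nop ; mtctr r5
```

A real emitter can only guess where the hardware actually starts each group, so padding like this is a heuristic that has to be validated with benchmarks, which is how the V8 gains above were found.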

Here's the one that really frosted my cake, and Apple doesn't mention it anywhere:

- mcrxr is emulated in software on the G5. Because the nanojit emits lots of overflow checks, we use XER heavily, and I used mcrxr to get the XER overflow bit into a CR field for branching (for a variety of reasons, summary overflow is unsuitable). This works great on the G3 and G4, which implement it in hardware. On the G5, it traps as an illegal instruction, flushes the pipeline and gets emulated in software by the OS. Yikes! This is by far the biggest reason why the G4 ran rings around the G5. Now the G5 is back on top.
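The mcrxr problem is easier to see at the bit level. Below is a Python model of what mcrxr does (copy XER bits 0-3, i.e. SO, OV and CA, into a CR field, then clear them), next to one possible replacement built from instructions the G5 does implement: mfxer, rotlwi, mtcrf, rlwinm and mtxer. This is an illustration, not necessarily the sequence TenFourFox uses; it only checks that the bits come out the same, and how fast the replacement runs on a real G5 would have to be measured. Bit numbering follows the PowerPC convention, so bit 0 is the most significant bit and cr0 is the top nibble.

```python
MASK32 = 0xFFFFFFFF


def field_mask(n):
    """Mask for CR field n. PowerPC numbers bits from the MSB, so cr0 is the top nibble."""
    return 0xF0000000 >> (4 * n)


def rotlw(x, sh):
    """32-bit rotate left, like rotlwi (rlwinm with a full mask)."""
    sh %= 32
    return ((x << sh) | (x >> (32 - sh))) & MASK32


def mcrxr(cr, xer, n):
    """Reference semantics: CR field n <- XER[0:3] (SO, OV, CA, 0), then XER[0:3] <- 0."""
    top = (xer >> 28) & 0xF
    cr = (cr & ~field_mask(n) & MASK32) | (top << (28 - 4 * n))
    return cr, xer & 0x0FFFFFFF


def mtcrf(cr, fxm, rs):
    """Copy the CR fields selected by the 8-bit mask fxm from register rs."""
    for i in range(8):
        if fxm & (0x80 >> i):
            m = field_mask(i)
            cr = (cr & ~m & MASK32) | (rs & m)
    return cr


def mcrxr_replacement(cr, xer, n):
    """One possible non-mcrxr sequence (hypothetical, for illustration):
         mfxer   r0
         rotlwi  r12,r0,32-4n      (skip when n == 0)
         mtcrf   0x80>>n,r12
         rlwinm  r0,r0,0,4,31
         mtxer   r0
    """
    r0 = xer                        # mfxer r0
    r12 = rotlw(r0, 32 - 4 * n)     # slide XER[0:3] down to bits 4n..4n+3
    cr = mtcrf(cr, 0x80 >> n, r12)  # update only CR field n
    r0 &= 0x0FFFFFFF                # clear SO, OV, CA (and reserved bit 3)
    return cr, r0                   # mtxer r0


def ov_in_field(cr, n):
    """True if 'bgt crN' would branch, i.e. the copied OV bit is set."""
    return bool(cr & (0x40000000 >> (4 * n)))


# After an overflowing addo, both SO and OV are set in XER.
cr, xer = mcrxr_replacement(0, 0xC0000000, 7)
print(hex(cr), hex(xer), ov_in_field(cr, 7))  # 0xc 0x0 True
```

Because OV lands in the GT position of the target field with either version, `bgt crN` branches on overflow. SO, by contrast, is sticky: once set it stays set until software clears it, which is one reason it is awkward for per-operation overflow checks.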

With these fixes, the raw nanojit (without the workarounds I originally used) drops from a dismal 5200ms in SunSpider to 1760ms on my quad 2.5GHz. I bet the people with dual 2.7s do even better. This will be in TenFourFox 4.0.1.
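For scale, those SunSpider times work out to roughly a threefold speedup, or about two-thirds less run time:

```latex
\frac{5200\,\text{ms}}{1760\,\text{ms}} \approx 2.95\times, \qquad 1 - \frac{1760}{5200} \approx 0.66
```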

I can see why people got frustrated with optimizing for the G5; Apple never documented this stuff very well.

 
ClassicHasClass,

Not to go off topic, but from your experience working with software code, of the three CPU families the Mac has lived on, which one do you find the easiest to dive into: 68k, PPC or Intel? Just curious, though. I'll bet you have your moments of ripping your hair out. I've always thought 68k was the toughest, since you are pretty much writing code in assembly and porting it to the GUI or vice versa.

73s de Phreakout.

 
There's certainly the most documentation on x86, but I've already discussed my philosophical objections to the ISA in other threads. That said, you can get plenty of help and there are lots of people who know it. So, from a support standpoint at least, I guess Intel would be easiest.

In terms of the actual instruction set, 68K is actually very elegant. The instruction set is highly orthogonal and works "like you'd expect." I think as CISC implementations go, it is probably one of my favourites. Apple provided lots of support in the Toolbox for people writing raw assembly code, so it's not as bad as you might think. You just have to be aware of what your registers are doing and which ones you need to preserve. However, segmentation can really ruin your day -- CFM-68K helped a lot here.

Register usage is much more important for the PowerPC because of its load/store nature, its much larger register file and the PowerPC ABI - the list of registers and conditions you need to preserve is much longer, and the instruction set is dizzyingly large despite being "RISC." It's really RISC only insofar as it is load/store. That said, it didn't take me long to learn it and many of the instructions are just variants of more basic operations.
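As an illustration of how long that preservation list gets, here is a sketch of the check a JIT prologue might make. It assumes the 32-bit Mac OS X PowerPC calling convention from Apple's Mac OS X ABI Function Call Guide (GPR13-GPR31, FPR14-FPR31 and CR fields cr2-cr4 are nonvolatile; VR20-VR31 are too but are left out here). `prologue_saves` is a hypothetical helper, not part of any real JIT.

```python
# Nonvolatile (callee-saved) registers in the 32-bit Mac OS X PowerPC ABI.
# r1 is the stack pointer and is handled by frame setup, not by this set.
# VR20-VR31 are nonvolatile too but are left out of this sketch.
NONVOLATILE_GPRS = frozenset(range(13, 32))   # r13-r31
NONVOLATILE_FPRS = frozenset(range(14, 32))   # f14-f31
NONVOLATILE_CR_FIELDS = frozenset({2, 3, 4})  # cr2-cr4


def prologue_saves(used_gprs, used_fprs, used_cr_fields):
    """Hypothetical helper: which registers a JIT-compiled function must save
    in its prologue and restore before returning, given what it clobbers.
    Volatile registers (such as r0, r3-r12, f0-f13, cr0, cr1, cr5-cr7)
    are the caller's problem and are filtered out."""
    return (
        sorted(NONVOLATILE_GPRS.intersection(used_gprs)),
        sorted(NONVOLATILE_FPRS.intersection(used_fprs)),
        sorted(NONVOLATILE_CR_FIELDS.intersection(used_cr_fields)),
    )


gprs, fprs, crs = prologue_saves(used_gprs=[0, 3, 4, 14, 31],
                                 used_fprs=[1, 14],
                                 used_cr_fields=[0, 2, 7])
print(gprs, fprs, crs)  # [14, 31] [14] [2]
```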

I'd really like to be better at ARM, personally. "They say" it's inspired by the 6502 but I don't really see much resemblance.

 
Thank you for this, CHC. I read the blog entry. I wouldn't be surprised if it ends up being a top Google page for "G5 optimization" before long.

How much time do you spend on your programming hobbies?

 