The performance-critical portion of the code is an unrolled loop that reads and sums 32 bytes per iteration (eight 4-byte loads). Basically, nothing else affects the result as much as this portion of the code.
The additions are there to verify that the memory contents are valid: the buffer has been preloaded with incrementing values whose final sum is known in advance. (This is a cache checker program.)
sum += *((unsigned long*)currentBufferPtr)++;
sum += *((unsigned long*)currentBufferPtr)++;
sum += *((unsigned long*)currentBufferPtr)++;
sum += *((unsigned long*)currentBufferPtr)++;
sum += *((unsigned long*)currentBufferPtr)++;
sum += *((unsigned long*)currentBufferPtr)++;
sum += *((unsigned long*)currentBufferPtr)++;
sum += *((unsigned long*)currentBufferPtr)++;
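For anyone who wants to try this with a different compiler: the cast-then-increment form above relies on a compiler extension (treating a cast as an lvalue) that standard C does not allow. Below is a minimal portable sketch of the same idea. The names fill_incrementing, expected_sum and checksum_unrolled are made up for illustration, and I'm assuming 32-bit words filled with 0, 1, 2, ... which may not match the exact pattern the real program uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: preload the buffer with 0, 1, 2, ... (an assumed pattern). */
static void fill_incrementing(uint32_t *buf, size_t count)
{
    for (size_t i = 0; i < count; i++)
        buf[i] = (uint32_t)i;
}

/* Known end sum for that pattern: 0 + 1 + ... + (count - 1), modulo 2^32. */
static uint32_t expected_sum(size_t count)
{
    uint64_t n = count;
    return (uint32_t)(n * (n - 1) / 2);
}

/* Sums eight 32-bit words (32 bytes) per iteration.
   count is in words and must be a multiple of 8. */
static uint32_t checksum_unrolled(const uint32_t *p, size_t count)
{
    assert(count % 8 == 0);
    const uint32_t *end = p + count;
    uint32_t sum = 0;

    while (p < end) {
        sum += *p++;
        sum += *p++;
        sum += *p++;
        sum += *p++;
        sum += *p++;
        sum += *p++;
        sum += *p++;
        sum += *p++;
    }
    return sum;
}
```

The memory check is then a comparison of checksum_unrolled(buf, count) against expected_sum(count). Using a typed pointer instead of casting on every line should not change the generated loads and adds, but I have not verified that on these compilers.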
I've checked the PowerPC disassembly and it looks fine. The PPC needs two instructions per C line (a load and an add), as opposed to a single instruction on the 040, which is to be expected with RISC vs. CISC. However, someone with more expertise in PPC assembly might know of an optimization.
The disassembly is interesting in that the PPC code rotates its loads among several registers (r3, r0 and r4) instead of reusing one. I assume using multiple registers allows a performance gain because the processor can execute two or more operations in parallel (a read into one register using the load/store unit and an add that consumes a different register using the integer unit). Cool.
00000098: 807C0000 lwz r3,0(r28)
0000009C: 841C0004 lwzu r0,4(r28)
000000A0: 7CC61A14 add r6,r6,r3
000000A4: 849C0004 lwzu r4,4(r28)
000000A8: 7CC60214 add r6,r6,r0
000000AC: 847C0004 lwzu r3,4(r28)
000000B0: 7CC62214 add r6,r6,r4
000000B4: 841C0004 lwzu r0,4(r28)
000000B8: 7CC61A14 add r6,r6,r3
000000BC: 849C0004 lwzu r4,4(r28)
000000C0: 7CC60214 add r6,r6,r0
000000C4: 847C0004 lwzu r3,4(r28)
000000C8: 841C0004 lwzu r0,4(r28)
000000CC: 7CC62214 add r6,r6,r4
000000D0: 7CC61A14 add r6,r6,r3
[two operations to prepare to loop and finally]
000000DC: 7CC60214 add r6,r6,r0
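To make the load/add interleave in the PPC listing easier to see, here is a rough C sketch of the same schedule: each load is issued one step ahead of the add that consumes it, and one add is left over after the last load. The function name checksum_interleaved is made up for illustration. A compiler is free to reorder this however it likes, so it only illustrates the ordering and does not force it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration of the schedule in the PPC listing:
   load the next word before adding the previously loaded one.
   count is in 32-bit words and must be at least 1. */
static uint32_t checksum_interleaved(const uint32_t *p, size_t count)
{
    assert(count >= 1);
    uint32_t sum = 0;
    uint32_t pending = p[0];        /* first load, nothing to add yet */

    for (size_t i = 1; i < count; i++) {
        uint32_t next = p[i];       /* load for the following add */
        sum += pending;             /* add the word loaded one step earlier */
        pending = next;
    }
    sum += pending;                 /* trailing add after the last load */
    return sum;
}
```

Note that every add still feeds the same accumulator (r6 in the listing), so the adds themselves remain a serial chain; only the loads overlap with them.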
Here's the 040:
0000007E: D69A ADD.L (A2)+,D3
00000080: D69A ADD.L (A2)+,D3
00000082: D69A ADD.L (A2)+,D3
00000084: D69A ADD.L (A2)+,D3
00000086: D69A ADD.L (A2)+,D3
00000088: D69A ADD.L (A2)+,D3
0000008A: D69A ADD.L (A2)+,D3
0000008C: D69A ADD.L (A2)+,D3
- David