Here's my mixer. The sample end point has to be tested for every output sample, so that the end point and the data pointer can be swapped as soon as it is reached.
a0 = Points to signed 16-bit mix buffer
a1 = Points to current sample data
a2 = Points to volume LUT, pre-aligned for current voice volume
a3 = Sample end point for next cycle (not an address)
a4 = Points to sample data for next cycle
d0 = <scratch>
d1.w = Integer sample position (upper word is cleared)
d2.w = Fractional sample position
d3.w = Fractional sample delta (pitch)
d4.w = Integer sample delta (pitch)
d5 = <scratch>
d6.w = Current sample end point
Inner mixer loop macro for first voice:
Code:
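moveq #0,d0 ; clear d0 so the sample byte becomes a clean LUT index
move.b (a1,d1.l),d0 ; fetch the sample byte at the integer position
add.w d0,d0 ; scale to a word offset into the LUT
move.w (a2,d0.w),(a0)+ ; first voice: overwrite the mix buffer entry
add.w d3,d2 ; advance the fractional position (carry goes to X)
addx.w d4,d1 ; advance the integer position, plus the carry in X
cmp.w d6,d1 ; reached the sample end point?
bhs.w \1 ; yes (unsigned >=): go to the end-of-sample handler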
Inner mixer loop macro for other voices:
Code:
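moveq #0,d0 ; clear d0 so the sample byte becomes a clean LUT index
move.b (a1,d1.l),d0 ; fetch the sample byte at the integer position
add.w d0,d0 ; scale to a word offset into the LUT
move.w (a2,d0.w),d5 ; look up the volume-scaled sample
add.w d5,(a0)+ ; other voices: accumulate into the mix buffer
add.w d3,d2 ; advance the fractional position (carry goes to X)
addx.w d4,d1 ; advance the integer position, plus the carry in X
cmp.w d6,d1 ; reached the sample end point?
bhs.w \1 ; yes (unsigned >=): go to the end-of-sample handler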
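For anyone who reads C more easily than 68000 assembly, here is a rough model of what one pass of the two macros does. It is only a sketch: the `Voice` struct, the `mix_voice` function and the `add` flag are names made up for illustration, and it assumes the LUT holds 256 signed 16-bit entries indexed by the raw sample byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical voice state mirroring the register list above. */
typedef struct {
    const uint8_t *data;   /* a1: current sample data, bytes used as LUT indices */
    const int16_t *lut;    /* a2: volume LUT (assumed 256 word entries) */
    uint16_t pos_int;      /* d1.w: integer sample position */
    uint16_t pos_frac;     /* d2.w: fractional sample position */
    uint16_t delta_frac;   /* d3.w: fractional delta */
    uint16_t delta_int;    /* d4.w: integer delta */
    uint16_t end;          /* d6.w: current sample end point */
} Voice;

/* Mixes up to n output samples and stops right after the step that
 * reaches the end point. Returns the number of samples written.
 * add == 0 models the first-voice macro (overwrite),
 * add != 0 models the other-voices macro (accumulate). */
static size_t mix_voice(Voice *v, int16_t *out, size_t n, int add)
{
    size_t i = 0;
    while (i < n) {
        int16_t s = v->lut[v->data[v->pos_int]];
        /* add.w wraps at 16 bits; the cast models that on usual compilers */
        out[i] = add ? (int16_t)(out[i] + s) : s;
        i++;

        uint32_t f = (uint32_t)v->pos_frac + v->delta_frac;   /* add.w d3,d2 */
        v->pos_frac = (uint16_t)f;
        /* addx.w d4,d1: integer delta plus the carry out of the fraction */
        v->pos_int = (uint16_t)(v->pos_int + v->delta_int + (f >> 16));

        if (v->pos_int >= v->end)                             /* cmp.w / bhs.w */
            break;
    }
    return i;
}
```

With a delta of 1.5 (`delta_int = 1`, `delta_frac = 0x8000`) the position visits 0, 1, 3, 4 and then hits an end point of 6, which is where the real code would branch to the end-of-sample handler.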
Whenever the end-of-sample branch is taken, the handler below runs and then jumps back into the inner mixing loop:
Code:
sub.w d6,d1 ; subtract end point from sample position (keeps overflow samples)
move.w a3,d6 ; set new sample end point
move.l a4,a1 ; set new sample data pointer
bra.w \1 ; jump back into the inner mixing loop
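The same handler can be modelled in C as follows. This is a sketch only; `VoiceSwap` and `swap_at_end` are hypothetical names, and the fields simply stand in for the registers listed at the top.

```c
#include <stdint.h>

/* Hypothetical subset of the voice state touched by the handler. */
typedef struct {
    const uint8_t *data;       /* a1: current sample data */
    uint16_t pos_int;          /* d1.w: integer sample position */
    uint16_t end;              /* d6.w: current sample end point */
    const uint8_t *next_data;  /* a4: sample data for the next cycle */
    uint16_t next_end;         /* a3: end point for the next cycle */
} VoiceSwap;

/* Call when pos_int >= end. Keeps the overshoot past the old end point
 * so playback continues at the right offset in the next cycle's data. */
static void swap_at_end(VoiceSwap *v)
{
    v->pos_int = (uint16_t)(v->pos_int - v->end); /* sub.w d6,d1  */
    v->end = v->next_end;                         /* move.w a3,d6 */
    v->data = v->next_data;                       /* move.l a4,a1 */
}
```

For example, a position of 7 against an end point of 6 leaves a position of 1 in the new data.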
The inner loop macro is unrolled 16 times to reduce branching overhead.
I don't think it gets much faster than this, given that I have to test the sample end point on every sample. Also, I'm mixing at 22254.54 Hz.
EDIT: Since sample lengths are limited to 65534 ($FFFE) bytes in ProTracker modules, you simply can't overflow the word index register, even with the highest possible delta (pitch), which is 1.41 at 22254.54 Hz. Before a step the position is below the end point, so it is at most $FFFD, and one step adds at most 2 (integer delta 1 plus the carry from the fraction), giving $FFFF at worst. So the code is safe. Some nasty modules from other trackers extend this limit to 128 kB, but I force it to 64 kB in that case. I know I could use 32-bit logic and support 128 kB, but that's slower on the 68000 ALU.
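The overflow argument can be checked with a few lines of C. This is just a sketch of the arithmetic, with made-up helper names (`worst_case_position`, `fits_in_word`), and it assumes the end point is at least 1.

```c
#include <stdint.h>

/* Largest integer position reachable right after one step.
 * Before the step the position is at most end - 1; the step adds the
 * integer delta plus at most 1 carried over from the fraction.
 * Assumes end >= 1. */
static uint32_t worst_case_position(uint16_t end, uint16_t delta_int)
{
    return ((uint32_t)end - 1u) + delta_int + 1u;
}

/* Nonzero if that worst case still fits in a 16-bit index register. */
static int fits_in_word(uint16_t end, uint16_t delta_int)
{
    return worst_case_position(end, delta_int) <= 0xFFFFu;
}
```

With an end point of $FFFE and an integer delta of 1, the worst case is exactly $FFFF, so the word compare against the end point still sees the correct value. An integer delta of 2 with the same end point would no longer fit.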