MODTracker audio replay on early 68k macs

Snial · May 27, 2026

8bitbubsy said:
Bumping this. I am going to port the ProTracker replayer to 68000 Macs one day, I just need to set up a development environment that compiles C code (for loader/GUI) and assembles assembly code (replayer/mixer), linking together the two into a single executable. Any suggestions for a dev setup for this purpose on System 6?

I played with Retro68 for the work I did on MODTracker, but frankly I'd choose THINK C 5, as it's a much cleaner, more direct development environment. It works under System 6 and handles inline assembly pretty well. You can write your VBL task (replayer/mixer) in inline assembly, and the assembly can see the C symbols and vice versa. THINK C also supports object orientation through a subset of C++ features, including virtual methods, and has a well-respected class library (TCL) for writing OOP-based apps.

8bitbubsy · May 27, 2026

OK, so THINK C 5 it is, but I need an assembler as well. I don't want to inline the whole mixer and replayer, and I want both of those to be in 100% assembler for minimal overhead.

Snial · May 27, 2026

8bitbubsy said:
OK, so THINK C 5 it is, but I need an assembler as well. I don't want to inline the whole mixer and replayer, and I want both of those to be in 100% assembler for minimal overhead.

I understand. Maybe MDS could help there. But the inline assembler is a very nice old-style one, where you just write _asm { and then type lines of assembly until you want to stop with } . It's not like GNU inline assembler. The enclosing function does set up a stack frame, which costs a couple of instructions.

Which compiles to when you choose Source: Disassemble:

-cheers from Julz

8bitbubsy · May 27, 2026

Thanks! That’s indeed nice and clean. No need for a separate assembler after all. Does it support the REPT directive for repeating a block of asm code?

olePigeon · May 27, 2026

If it matters at all, as a non-programmer, I discovered that the AI coding agents will actually produce useable code for THINK C. I tried for many, many hours to get it to do anything for CodeWarrior with zero success. I think it's because the code it steals for THINK C is, by necessity, clean and concise as it must adhere to specific standards. CodeWarrior is more loosey goosey and forgiving of mixing styles and standards, so the code the AI steals and regurgitates is really sloppy.

I'm only saying this in case you use it as a tool for a tiny bit of troubleshooting or for commenting code and/or libraries to better understand what's going on. It's also the only thing of (relative) value I feel I could contribute.

joevt · May 27, 2026

MPW has a powerful 68K assembler that supports macros and such.

Can you link object files generated from that into Think C? I did that with Think Pascal.

Attached are some projects I implemented in Think Pascal and 68K Assembly from "Zen of Graphics Programming" by Michael Abrash (First Edition © 1995). The code in the book is C and i386 assembly.

The 68K assembly files have extension .a. I think they are formatted in MPW with Geneva 9 pt font, tab space 4.
An MPW worksheet named "Compile" has commands for compiling the assembly files (to generate object files) and doing other stuff in MPW such as searches. The compile commands should probably be converted to an MPW MakeFile (with dependency lists, etc.). In this worksheet form, you select a command to execute, and press enter - similar to BBEdit.app worksheets in modern macOS.
The files use classic Mac OS carriage return line endings. If you add them to a GitHub repository, you'll want a script to convert to and from Unix linefeed line endings and to add the MPW or Think file type and creator codes.

Snial · May 27, 2026

8bitbubsy said:
Thanks! That’s indeed nice and clean. No need for a separate assembler after all. Does it support the REPT directive for repeating a block of asm code?

I guess you could use the C pre-processor? Define the block of code in a macro, then define a rept macro to repeat another macro a number of times with some kind of recursive #ifdef trickery? Finally, have a AsmRept(n, macroName) in the asm block?

@8bitbubsy , [ @joevt ], if you're familiar with MPW, it's more of a heavy-weight, more sophisticated development environment, which perhaps is worth looking at. It's not really like a VS / THINK / Xcode / CodeWarrior / Eclipse type environment though. Steeper learning curve, but more rewarding if you're committed enough.

Snial · May 27, 2026

8bitbubsy said:
Thanks! That’s indeed nice and clean. No need for a separate assembler after all. Does it support the REPT directive for repeating a block of asm code?

e.g:

Works, although it's a little clumsy.

8bitbubsy · May 28, 2026

Snial said:
e.g:

View attachment 99324
Works, although it's a little clumsy.

At least #define can be used for inline asm, that is good enough for me (#define a block of code, repeat the definition N times inside the inline asm).
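A minimal sketch of that #define approach, written in portable C so it can be checked on any compiler. The macro names (REPT2 to REPT16, MIX_STEP) and the function sum16 are hypothetical, not from the thread, and the repeated fragment is an ordinary C statement standing in for a block of inline assembly. Whether a given inline assembler accepts several instructions pasted onto one line by the preprocessor is something to verify in THINK C itself.

```c
#include <assert.h>

/* Hypothetical helper macros: paste a fragment 2, 4, 8 or 16 times. */
#define REPT2(x)  x x
#define REPT4(x)  REPT2(x) REPT2(x)
#define REPT8(x)  REPT4(x) REPT4(x)
#define REPT16(x) REPT8(x) REPT8(x)

/* The fragment to repeat. In THINK C this would be inline assembly;
   here it is plain C so the expansion can be tested anywhere. */
#define MIX_STEP  acc += *src++;

/* Sums exactly 16 bytes with a fully unrolled body and no loop counter. */
static int sum16(const unsigned char *src)
{
    int acc = 0;
    assert(src != 0);
    REPT16(MIX_STEP)
    return acc;
}
```

Calling sum16 on a buffer holding the bytes 1 to 16 returns 136, which confirms the fragment was pasted sixteen times.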

MIST · May 28, 2026

So you'll port the Amiga code to the slower Mac _and_ add a software mixer? Sounds cool. When do you expect to have some working demo?

8bitbubsy · May 28, 2026

MIST said:
So you'll port the Amiga code to the slower Mac _and_ add a software mixer? Sounds cool. When do you expect to have some working demo?

Yes. I already made a 4ch 16-bit software-mixer for 7MHz 68000 Amigas using the original PT replayer, so I know it works. It mixed at around the same rate as the Mac version will use (22kHz).

Not sure when I'll have something to show off, I'll need to set up some stuff on my Macintosh Classic first, then find the motivation.
And I also need to take into account that I may get stuck getting it to be fast enough. I know there are some limitations on 68k Macs as the available CPU time is not really 100%.

Snial · May 28, 2026

8bitbubsy said:
Yes. I already made a 4ch 16-bit software-mixer for 7MHz 68000 Amigas using the original PT replayer, so I know it works. It mixed at around the same rate as the Mac version will use (22kHz).

Not sure when I'll have something to show off, I'll need to set up some stuff on my Macintosh Classic first, then find the motivation.
And I also need to take into account that I may get stuck getting it to be fast enough. I know there are some limitations on 68k Macs as the available CPU time is not really 100%.

Original 68K Macs up to the Mac Plus have about 6.5MHz of effective performance, because the video hardware fetches a word every 8 cycles while displaying the visible part of the frame buffer. Thus 4*(512/16)*342*60.15=2633126.4 cycles are taken per second. In theory, out of a nominal 7.5M cycles/second, this leaves only 4866873.6 cycles/second, but obviously a lot of 68000 instructions have internal cycles, so it's a bit better than that.

The Mac SE and Mac Classic are a bit better because they fetch 32 bits every 16 cycles (1/4 of the potential memory fetch cycles). Thus 2633126.4/2=1316563.2 cycles are taken per second, leaving about 6.2M cycles/second available for the CPU. Having written that the Mac Plus has 6.5MHz of effective CPU, that can't quite be right, because it implies the SE has 6.2M/4.9M*6.5MHz=8.2MHz, which is impossible. Hmmm.
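The arithmetic in the two paragraphs above can be reproduced with a small C sketch. It assumes the nominal 7.5M cycles/second that the post's totals imply (4866873.6 + 2633126.4), and the function names are made up for illustration. The SE case is modelled the way the post does it, as half the Plus cost per 16-bit word.

```c
#include <assert.h>

#define CLOCK_HZ        7500000.0   /* nominal clock implied by the post's totals (assumption) */
#define WORDS_PER_LINE  (512 / 16)  /* 512 pixels, 16 pixels per word */
#define VISIBLE_LINES   342
#define FRAME_HZ        60.15

/* Cycles per second taken by video fetches, given the cycles each
   16-bit word costs the CPU (4 on a Plus, 2 on an SE per the post). */
static double video_cycles_per_sec(double cycles_per_word)
{
    return cycles_per_word * WORDS_PER_LINE * VISIBLE_LINES * FRAME_HZ;
}

/* Cycles per second left for the CPU under the same assumption. */
static double cpu_cycles_per_sec(double cycles_per_word)
{
    return CLOCK_HZ - video_cycles_per_sec(cycles_per_word);
}
```

With 4 cycles per word this gives the 2633126.4 and 4866873.6 figures, and with 2 cycles per word it gives 1316563.2 taken and roughly 6.18M left.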

8bitbubsy · May 28, 2026

Yeah, I'm worried it may not be fast enough on a 68k compact Mac, which would be a huge letdown, but I'll give it a try first before I conclude with anything at all. Though I'm only going to support Macintosh SE or newer, as I need all the CPU time I can get.

I have some things in mind to hopefully make it fast enough, like making sure all sample loops are at least 512 bytes long by unrolling the loop N time(s) where needed during module load. Also to calculate the max amount of output samples one can mix during a frame before reaching the end of the sample data (or its loop end point), so that there is no need to test for loop/end boundaries inside the actual inner mixing loop, which would be SLOW. And unrolling the inner mixing loop too, obviously!
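A rough C sketch of the two load-time and per-frame calculations described above, under stated assumptions. The 16.16 fixed-point position format, the function names, and the parameters are all hypothetical. The real replayer would do this in 68000 assembly with whatever precision it already uses.

```c
#include <assert.h>

#define MIN_LOOP_BYTES 512UL   /* minimum loop length named in the post */
#define FP_SHIFT       16      /* assumed 16.16 fixed-point sample position */

/* How many times a loop of loop_len bytes must be repeated at load time
   so the unrolled loop is at least MIN_LOOP_BYTES long. */
static unsigned long loop_repeats(unsigned long loop_len)
{
    assert(loop_len > 0);
    return (MIN_LOOP_BYTES + loop_len - 1) / loop_len;
}

/* Number of output samples that can be mixed before the read position
   reaches end_index (sample end or loop end), capped at wanted.
   pos_fp and step_fp are 16.16 fixed point. With 32-bit longs this
   sketch only handles end_index below 65536. */
static unsigned long safe_run(unsigned long pos_fp, unsigned long step_fp,
                              unsigned long end_index, unsigned long wanted)
{
    unsigned long end_fp = end_index << FP_SHIFT;
    unsigned long room;
    assert(step_fp > 0 && pos_fp < end_fp);
    room = (end_fp - pos_fp + step_fp - 1) / step_fp;  /* ceiling division */
    return room < wanted ? room : wanted;
}
```

The inner mixing loop can then run for safe_run(...) samples with no boundary test, handle the wrap or stop once, and repeat until the frame's 370 samples are done. Longer loops mean fewer of these outer iterations per frame, which is the point of the 512-byte minimum.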

8bitbubsy · May 29, 2026

OK scratch that, I will try to support Mac Plus too, though I have a feeling that if it's fast enough on an SE, it may occasionally hiccup on a Plus.

Snial · May 29, 2026

8bitbubsy said:
OK scratch that, I will try to support Mac Plus too, though I have a feeling that if it's fast enough on an SE, it may occasionally hiccup on a Plus.

I see you started contributing near the beginning, and there are quite a few posts that discuss optimising basic playback performance. I'm not sure exactly how familiar you are with Mac audio and all the posts in this thread, so at the risk of condescension:

I guess you know that a compact Mac plays a single 370-sample, 8-bit audio buffer from a fixed location in RAM relative to the video frame buffer on every frame, and that you can hook this with a VBL (vertical blanking) interrupt task.

So, the technique here was to generate one frame's worth of audio in a separate buffer and then copy it to the hardware buffer at the beginning of the next VBL. While the hardware plays it, we have time to generate the next frame.
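A minimal C sketch of that copy step, for orientation only. It assumes the every-other-byte layout of the hardware buffer that comes up later in the thread. The names staging and vbl_copy_frame are hypothetical, and a real VBL task would use the unrolled assembly discussed in the posts rather than a C loop.

```c
#include <assert.h>

#define SAMPLES_PER_FRAME 370   /* one frame of audio, per the post */
#define HW_STRIDE         2     /* assumed: sound bytes sit at every other byte */

/* Hypothetical staging buffer that the mixer fills between interrupts. */
static unsigned char staging[SAMPLES_PER_FRAME];

/* Called at the start of each VBL: copy the finished frame into the
   hardware buffer, touching only the sound bytes. hw must point at
   SAMPLES_PER_FRAME * HW_STRIDE bytes. */
static void vbl_copy_frame(volatile unsigned char *hw)
{
    int i;
    assert(hw != 0);
    for (i = 0; i < SAMPLES_PER_FRAME; i++)
        hw[i * HW_STRIDE] = staging[i];
}
```

After the copy returns, the mixer is free to overwrite staging with the next frame while the hardware plays the one just copied.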

For example:

Post in thread 'MODTracker audio replay on early 68k macs'

Jul 24, 2025

MIST said:
The replay routine itself is the WizzCat routine as discussed at https://www.atari-forum.com/viewtopic.php?t=43127 I did not touch the MOD decoder/parser itself. It's my understanding that it's a full MOD decoder supporting the typical four channels and all the basic effects required for MOD playback. Some effects may actually be part of the included samples. I am no MOD expert.

OK, so I've just been looking at your VBL routine. Excellent use of movep, I wouldn't have thought of that, but it's great for writing into every other byte. I still think a few improvements can be...

That post discusses how quickly we can do this: the routine takes 0.629ms (at 7.5MHz) or 0.726ms (at an effective 6.5MHz). At 60.15Hz we have 16.63ms per frame, so that's about 3.78% of CPU gone for this.

This post discusses a central loop for audio generation, 2 tracks at a time:

Post in thread 'MODTracker audio replay on early 68k macs'

Jul 29, 2025

Arbee said:
The ultimate reference is the ProTracker 68000 source, because there's a fair amount of corner-case behavior that certain MODs rely on. But that's only correct for newer MODs that were made on ProTracker rather than SoundTracker.

There's a famous (and very good) MOD called "Klisje Paa Klisje" that sets a tempo of 0x20 at one point. In SoundTracker, that just meant "update every 32 timer ticks", but in ProTracker that means "set timer to 32 BPM" which is quite different.

[ and @MIST ]. I've been looking at the assembler code a bit more, to turn this MOD player into a proper...

It would take up 4461806.7 cycles/second, about 69% of CPU at an effective 6.5MHz. This, I think, is enough to be fairly sure you can do it on a Mac Plus.

Many of your comments revolve around getting rid of the simulated 50Hz scheduling period and just working with the true sample rate and the Mac VBL frequency. I agree, since making them conform complicated audio generation, which made it less efficient.

One of my discussion points early on (i.e. my first comment above) is that sample generation doesn't really have to start at the beginning of the hardware buffer. From a user's viewpoint, we just hear a continual stream of samples: if the audio generation wrote from sample 185 (the middle) of the HW buffer to the end, then wrapped to the beginning and continued to sample 184, it would sound the same, provided the audio generation never overtook the hardware playback. This gives the audio software a bit more latency to work with, and it also frees up the 0.629ms otherwise spent copying from your generated audio buffer to the hardware buffer.

E.g., let's say audio generation takes 65% of CPU and we start writing audio at sample 185 while the HW is at sample 0. When we've done sample 369 (the last one, counting from 0), the HW is at (370-185)*0.65=sample 120. We start back at sample 0 at this point, but the HW is ahead, so we're not overwriting anything it has yet to play. When we've done the next 120 samples, the HW has done another 120*0.65=78, so it's at sample 198, 13 samples into the audio we've just been generating. We have another 185-120=65 samples to do, and when we have finished, we've written sample 184 and the HW is at sample 240 or 241.

We now have some time to calculate playback and effect and when we leave the VBL, handle some UI (slowly). On the next VBL interrupt, again we start generating samples and writing them from hw buffer sample 185.

This is just an example assuming 65% of CPU. If we were much faster than that, we'd overtake the hw playback, so we'd have to start at an earlier sample to prevent that.
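The worked example above can be checked with a small simulation in C. This is a simplified model, not the thread's code: it assumes the hardware advances a fixed fraction of a sample for every sample written (0.65 when generation takes 65% of the frame), that the hardware starts at sample 0, and that the function names are invented here.

```c
#include <assert.h>

#define BUF_SAMPLES 370   /* samples in the hardware buffer, per the thread */

/* Hardware playback position after the writer has produced n samples. */
static double hw_position(int n, double hw_per_written)
{
    return n * hw_per_written;
}

/* Returns 1 if a writer starting at sample start can produce a whole
   frame without colliding with playback, else 0. */
static int writer_stays_clear(int start, double hw_per_written)
{
    int n;
    assert(start > 0 && start < BUF_SAMPLES);
    for (n = 0; n < BUF_SAMPLES; n++) {
        double hw = hw_position(n, hw_per_written);
        int linear = start + n;
        if (linear < BUF_SAMPLES) {
            /* Before wrapping: the writer must stay ahead of playback. */
            if (hw >= linear) return 0;
        } else {
            /* After wrapping: only overwrite samples already played. */
            if (hw <= linear - BUF_SAMPLES) return 0;
        }
    }
    return 1;
}
```

In this model, starting at sample 185 with a ratio of 0.65 stays clear for the whole frame. Dropping the ratio to 0.30 (a much faster mixer) makes the writer catch up with playback after the wrap, and moving the start back to sample 100 fixes that, which matches the remark about starting at an earlier sample.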

8bitbubsy · May 29, 2026

Ah nice. So if I understand it right I can use MOVEP.L (A0)+,D0 to store four high-bytes from my 16-bit mixing buffer to D0, then do MOVE.L D0,(A1)+ (where A1 points to the 8-bit audio buffer)? That's for sure a great optimization.

Also yeah, I'll have to figure out how and at which point during a frame to write to the output audio buffer. It sounds complicated, and I may need some help when that time comes. First I need to get it all ported over.

joevt · May 29, 2026

8bitbubsy said:
MOVEP.L (A0)+,D0

I don't think MOVEP.L supports post increment addressing mode.
Also, 68K assembly instructions have source as the first parameter, and destination as the second.
You would probably have an unrolled loop like in the examples given in earlier posts. Do several moves in a row then an add instruction to increment A0 by 8 times the number of moves. Then a branch.

https://68kmla.org/bb/threads/modtracker-audio-replay-on-early-68k-macs.50519/post-568559
https://www.plutiedev.com/movep

Snial · May 29, 2026

8bitbubsy said:
Ah nice. So if I understand it right I can use MOVEP.L (A0)+,D0 to store four high-bytes from my 16-bit mixing buffer to D0, then do MOVE.L D0,(A1)+ (where A1 points to the 8-bit audio buffer)? That's for sure a great optimization.

Close. The actual code snippet did it the other way around:

Code:

    move.l (a0)+,d0
    movep.l d0,0(a1)

It moved 4 bytes from an 8-bit mixing buffer to 4 consecutive alternate bytes in the hardware audio buffer. You can't use the (an)+ addressing mode, because only disp(an) is supported for movep. Of course, this can be made faster:

Code:

    move.l (a0)+,d0 ;12c,
    movep.l d0,0(a1) ;we now have 44.9us free.
    movem.l d0/d1/d2/d3/d4/d5/d6/d7,-(a7) ;about 9.6us (72c)
    movem.l (a0)+,d0/d1/d2/d3/d4/d5/d6/d7 ;another 9.6us.(72c)
    movep.l d0,8(a1)
    movep.l d1,16(a1)
    ;etc
    movep.l d7,64(a1) ;Total: (24*8+72)=264c vs 288c. for 32samples.
    ;Need to do this another 10 times (another 320 samples). 14 samples left.
    ;that's 12
    movem.l (a0)+,d0/d1/d2 ;12 samples (32c)
    ;another 3x movep.l (24c*3=72c)
    move.w (a0)+,d0
    movep.w d0,736(a1)

This version takes 36+264*11+32+72+8+16=3068 cycles or 0.41ms instead of 0.629ms.
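For readers who haven't used movep, here is a small C model of what "movep.l Dn,disp(An)" does to memory. The function name is made up for illustration. The byte order follows the 68000 definition of the instruction: the most significant byte goes to the lowest address and every second byte is skipped.

```c
#include <assert.h>

/* Models the memory side of "movep.l Dn,disp(An)": the four bytes of
   the register are stored at base+disp, +2, +4 and +6, most significant
   byte first. The bytes in between are left untouched. */
static void movep_l_to_mem(unsigned long reg, unsigned char *base, int disp)
{
    assert(base != 0);
    base[disp + 0] = (unsigned char)((reg >> 24) & 0xFF);
    base[disp + 2] = (unsigned char)((reg >> 16) & 0xFF);
    base[disp + 4] = (unsigned char)((reg >> 8) & 0xFF);
    base[disp + 6] = (unsigned char)(reg & 0xFF);
}
```

One movep.l therefore covers 8 bytes of the hardware buffer while writing only 4 of them, which is why the displacements in the listing above step by 8.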

8bitbubsy said:
Also yeah, I'll have to figure out how and at which point during a frame to write to the output audio buffer. It sounds complicated, and I may need some help when that time comes. First I need to get it all ported over.

OK, sure.

8bitbubsy · May 30, 2026

joevt said:
Also, 68K assembly instructions have source as the first parameter, and destination as the second.

I know, I was talking about moving the upper bytes from my own 16-bit mixing buffer to a data register.

Snial said:
It moved 4 bytes from an 8-bit mixing buffer to 4 consecutive alternate bytes in the hardware audio buffer.

Huh? Isn't the audio output buffer on compact Macs 8-bit sequential bytes?

So I think I'll have to do something like...

Code:

MOVE.L (A0)+,D0     ; read two 16-bit samples from mix buffer (upper byte -> 8-bit sample)
AND.L #$FF00FF00,D0 ; (TODO: put the mask in a register)
MOVE.L D0,(A1)+     ; write two 8-bit samples to audio output buffer

But that's awfully slow. Not to mention that 32-bit stuff takes much more time in the 16-bit ALU.
Also not sure if I'm allowed to even write (zeroes) to the odd bytes in the audio buffer.

EDIT: Or maybe this would work...

Code:

MOVEP.L (A0),D0 ; A0 = 16-bit mix buffer
MOVEP.L D0,(A1) ; A1 = audio output buffer

Snial · May 30, 2026

8bitbubsy said:
Huh? Isn't the audio output buffer on compact Macs 8-bit sequential bytes?

Ho, ho, you might think so, but actually the other byte controls the motor speed for the floppy drive (of course!). In fact both bytes are PWM outputs: the sampled audio stream is just a poor-soul's PWM.

8bitbubsy said:
<snip>

EDIT: Or maybe this would work...

Code:

MOVEP.L disp(A0),D0 ; A0 = 16-bit mix buffer
MOVEP.L D0,disp(A1) ; A1 = audio output buffer

Yes, that's what you'd do! It'll be a bit slower than the previous routines, at 12c/sample, 4440 cycles overall, 0.592ms.
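In C terms, the movep pair amounts to the following sketch. The function name and parameters are hypothetical. It only shows which bytes move where: the high byte of each 16-bit mix sample lands on every other byte of the output, and the bytes in between are not written.

```c
#include <assert.h>

/* Copies the high byte of each 16-bit mix sample to every other byte
   of hw, leaving the bytes in between untouched. hw must hold at
   least count * 2 bytes. */
static void write_high_bytes(const unsigned short *mix, unsigned char *hw,
                             int count)
{
    int i;
    assert(mix != 0 && hw != 0);
    for (i = 0; i < count; i++)
        hw[i * 2] = (unsigned char)((mix[i] >> 8) & 0xFF);
}
```

This differs from the MOVE.L plus AND.L version earlier in the thread, which also stores zeroes into the in-between bytes.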

However, thinking about my earlier remark: if you know the floppy drive won't be in use, then perhaps you could simply do:

movem.l (a0)+,d0/d1/d2/d3/d4/d5/d6/d7
movem.l d0/d1/d2/d3/d4/d5/d6/d7,disp(a1) ; movem to memory has no (an)+ mode, so step disp by 32 per unrolled copy

This is about 144c for 16 samples, 9c/sample or 3330 cycles overall, 0.444ms.
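The per-frame totals quoted for these copy routines follow from simple arithmetic, sketched here in C. The 7.5MHz clock is the figure the thread's timings use, and the function names are invented for illustration.

```c
#include <assert.h>

#define SAMPLES_PER_FRAME 370
#define CLOCK_HZ          7500000.0   /* the 7.5MHz figure used by the thread's timings */

/* Total cycles to copy one frame at a given cost per sample. */
static long copy_cycles(long cycles_per_sample)
{
    return cycles_per_sample * SAMPLES_PER_FRAME;
}

/* Converts a cycle count to milliseconds at CLOCK_HZ. */
static double cycles_to_ms(long cycles)
{
    return cycles * 1000.0 / CLOCK_HZ;
}
```

At 12c/sample this gives 4440 cycles (0.592ms), at 9c/sample 3330 cycles (0.444ms), and the 3068-cycle unrolled version comes out at about 0.41ms.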
