OK, scratch that. I will try to support the Mac Plus too, though I have a feeling that if it's fast enough on an SE, it may occasionally hiccup on a Plus.
I see you started contributing from near the beginning; there are quite a few posts which discuss optimising basic playback performance. I'm not sure exactly how familiar you are with Mac audio and all the posts in this thread, so at the risk of condescension:
I guess you know that a compact Mac plays a single 370-byte, 8-bit sampled audio buffer from a fixed location in RAM relative to the video frame buffer, on every frame, which you can hook with a VBL (vertical blanking) interrupt.
So, the technique here was to generate one frame's worth of audio in a separate buffer and then copy it to the hardware buffer at the beginning of the next VBL. While the audio hardware is playing it, we have time to generate the next frame.
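To make the generate-then-copy idea concrete, here is a minimal Python sketch of it (a model only, not the player's actual 68000 code). The names `hw_buffer`, `work_buffer`, `generate_frame` and `vbl_handler` are made up for illustration, the mixer is a dummy ramp, and the sketch ignores the fact that the real hardware buffer only uses every other byte.

```python
HW_BUFFER_SAMPLES = 370  # one frame's worth of 8-bit samples

# Stand-ins (assumptions): a flat hardware buffer and a private work buffer.
hw_buffer = bytearray(HW_BUFFER_SAMPLES)
work_buffer = bytearray(HW_BUFFER_SAMPLES)


def generate_frame(frame_no, out):
    """Hypothetical mixer: fills `out` with a dummy ramp for frame `frame_no`."""
    for i in range(len(out)):
        out[i] = (frame_no * HW_BUFFER_SAMPLES + i) & 0xFF


def vbl_handler(frame_no):
    """Called once per VBL: publish the prepared frame, then prepare the next."""
    # 1. Copy the frame generated during the previous period to the hardware.
    hw_buffer[:] = work_buffer
    # 2. While the hardware plays it, generate the following frame.
    generate_frame(frame_no + 1, work_buffer)


# Prime the work buffer, then run two VBLs.
generate_frame(0, work_buffer)
vbl_handler(0)
vbl_handler(1)
```

The copy in step 1 is the cost that the rest of this post tries to get rid of.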
For example:
The replay routine itself is the WizzCat routine as discussed at https://www.atari-forum.com/viewtopic.php?t=43127
I did not touch the MOD decoder/parser itself. It's my understanding that it's a full MOD decoder supporting the typical four channels and all the basic effects required for MOD playback. Some effects may actually be part of the included samples. I am no MOD expert.
OK, so I've just been looking at your VBL routine. Excellent use of movep, I wouldn't have thought of that, but it's great for writing into every other byte. I still think a few improvements can be...
That post discusses how quickly we can do this: a routine taking 0.629ms (7.5MHz) or 0.726ms (effective 6.5MHz). At 60.15Hz, we have 16.63ms per frame, so that's about 3.78% of CPU gone for this.
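For anyone who wants to check the arithmetic, here is a small Python sketch using only the figures quoted above. The function name `copy_share` is just for illustration.

```python
VBL_HZ = 60.15               # compact Mac frame rate quoted above
FRAME_MS = 1000.0 / VBL_HZ   # time available per frame, in milliseconds


def copy_share(copy_ms):
    """Fraction of each frame spent copying to the hardware buffer."""
    return copy_ms / FRAME_MS


for label, ms in (("7.5MHz", 0.629), ("effective 6.5MHz", 0.726)):
    print(f"{label}: {ms}ms is {copy_share(ms) * 100:.2f}% of a {FRAME_MS:.2f}ms frame")
```

The 3.78% figure corresponds to the 0.629ms timing; the slower 0.726ms timing comes out a little higher.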
This post discusses a central loop for audio generation, 2 tracks at a time:
The ultimate reference is the ProTracker 68000 source, because there's a fair amount of corner-case behavior that certain MODs rely on. But that's only correct for newer MODs that were made on ProTracker rather than SoundTracker.
There's a famous (and very good) MOD called "Klisje Paa Klisje" that sets a tempo of 0x20 at one point. In SoundTracker, that just meant "update every 32 timer ticks", but in ProTracker that means "set timer to 32 BPM" which is quite different.
[ and @MIST ]. I've been looking at the assembler code a bit more, to turn this MOD player into a proper...
It would take up 4461806.7 cycles/second, about 69% of CPU at an effective 6.5MHz. This, I think, is enough to be fairly sure you can do it on a Mac Plus.
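As a sanity check on that figure, this Python sketch breaks the cycles/second number down per frame and per sample. It assumes the 60.15Hz frame rate and 370 samples per frame mentioned earlier; the variable names are illustrative.

```python
CYCLES_PER_SECOND = 4461806.7  # figure quoted above
EFFECTIVE_HZ = 6.5e6           # effective CPU clock assumed in this thread
VBL_HZ = 60.15
SAMPLES_PER_FRAME = 370

cpu_share = CYCLES_PER_SECOND / EFFECTIVE_HZ
cycles_per_frame = CYCLES_PER_SECOND / VBL_HZ
cycles_per_sample = cycles_per_frame / SAMPLES_PER_FRAME

print(f"CPU share:         {cpu_share * 100:.1f}%")
print(f"cycles per frame:  {cycles_per_frame:.0f}")
print(f"cycles per sample: {cycles_per_sample:.1f}")
```

Seen this way, the budget is roughly 200 cycles per output sample, which is an easier number to hold in your head when counting instructions in the inner loop.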
Many of your comments revolve around getting rid of the simulated 50Hz scheduling period and just working with the true sample rate and Mac VBL frequencies. I agree, since making them conform complicates audio generation, which makes it less efficient.
One of my discussion points early on (i.e. my first comment above) is that sample generation doesn't really have to start at the beginning of the hardware buffer. From a user's viewpoint, we just hear a continual stream of samples: if the audio generation wrote from sample 185 (the middle) of the HW buffer to the end, then wrapped back to the beginning and carried on to sample 184, it would sound the same, provided you could be sure that the audio generation never overtook the hardware playback. This has the advantage of giving the audio software a bit more latency, and it also frees up that 0.629ms from having to copy from your generated audio buffer to the hardware buffer.
E.g. let's say audio generation takes 65% of CPU and we start writing audio at sample 185, while the HW is at sample 0. When we've done sample 369 (the last one, counting from 0), the HW is at (370-185)*0.65 = sample 120. We start back at sample 0 at this point, but the HW is ahead, so we're not overwriting anything it hasn't played yet. When we've done the next 120 samples, the HW has done another 120*0.65 = 78, so it's at sample 198, 13 samples into the audio we've just been generating. We have another 185-120 = 65 samples to do and when we have finished, we've written sample 184 and the HW is at sample 240 or 241.
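The worked example can be reproduced with a few lines of Python. This is a simplified model, assuming generation runs at a uniform rate and that a full 370-sample frame costs `cpu_share` of the frame time; `hw_position` and `trace` are made-up names.

```python
SAMPLES = 370


def hw_position(samples_written, cpu_share):
    """Hardware playback position once `samples_written` samples are generated."""
    return samples_written * cpu_share


def trace(start=185, cpu_share=0.65, checkpoints=(185, 305, 370)):
    """Return (samples written, last index written, HW position) per checkpoint."""
    rows = []
    for written in checkpoints:
        last_index = (start + written - 1) % SAMPLES  # wraps past sample 369
        rows.append((written, last_index, hw_position(written, cpu_share)))
    return rows


for written, last_index, hw in trace():
    print(f"after {written:3d} samples: last written {last_index:3d}, HW at {hw:6.2f}")
```

The three checkpoints are the three moments in the example: the end of the buffer, 120 samples after wrapping, and the end of generation.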
We now have some time to calculate playback and effects and, when we leave the VBL, handle some UI (slowly). On the next VBL interrupt, we again start generating samples and writing them from HW buffer sample 185.
This is just an example assuming 65% of CPU. If we were much faster than that, we'd overtake the HW playback, so we'd have to start at an earlier sample to prevent that.
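Here is a rough Python sketch of how one might check whether a given start sample is safe for a given generation speed. It uses the same simplified model (uniform generation rate, HW at sample 0 when generation starts), so a real implementation would want some margin on top; `overtakes` and `latest_safe_start` are illustrative names.

```python
SAMPLES = 370


def overtakes(start, cpu_share):
    """True if the generator collides with hardware playback at any point."""
    for written in range(SAMPLES):
        hw = written * cpu_share   # HW position when this sample gets written
        index = start + written
        if index < SAMPLES:
            # This frame's data: must be written before the HW reaches it.
            if hw >= index:
                return True
        else:
            # Wrapped, next frame's data: the HW must already be past it.
            if hw < (index - SAMPLES) + 1:
                return True
    return False


def latest_safe_start(cpu_share):
    """Largest start sample that never collides, or None if there is none."""
    for start in range(SAMPLES - 1, -1, -1):
        if not overtakes(start, cpu_share):
            return start
    return None


for share in (0.65, 0.40):
    print(f"cpu_share={share}: start 185 overtakes? {overtakes(185, share)}, "
          f"latest safe start = {latest_safe_start(share)}")
```

In this model, the faster the generator, the earlier the latest safe start sample becomes.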