MOD tracker audio replay on early 68k Macs

MIST

While searching for fun stuff for my NanoMac, I was wondering why there doesn't seem to be a MOD tracker. The 22 kHz audio and the 7+ MHz 68k CPU are pretty close to the Amiga and Atari ST and thus should allow for playback.

That's why I joined this discussion. As a result, I have something that sort of works. It's a pretty non-mac-style piece of code which hooks directly into the VIA IRQ and drives the hardware directly ... quite Amiga/Atari ST style ... :)

To make a long story short: This code actually runs quite well on MiniVMac and on my NanoMac. I haven't tested it on real hardware, as I don't own any genuine Apple hardware.

It would be cool to know whether the attached program runs on a real 128k/512k or Plus. Since I am hooking into the IRQ handlers and basically shutting down any OS-related tasks, I'd recommend not using it on a hard-disk-based setup, as e.g. you cannot shut down properly once the program is running.

I'll clean the code up a little more and then publish it. It's using the Retro68 setup and is rather easy to compile and work on.
 

Attachments

  • NanoMacTracker.zip
    62.1 KB

Byrd

This is great, can't wait to try it out on a lowly Mac. Is it for 4-channel .MOD files? Yes, the Mac hardware is close to the Amiga/Atari, but the sound hardware is much simpler and slower to move data in and out of RAM, which is why system requirements on a Mac were slightly higher. See the IIGS @ 2.5 MHz with its more advanced sound chipset, which could happily play back .MODs.

Sound Trecker and MODplayer were early players, both say 68020 but I'm sure I ran the former on a 16 MHz PowerBook 100:

 

MIST

> This is great, can't wait to try it out on a lowly Mac
And I am really keen to hear if it actually works on the real classic hardware. It relies on a quite exact relationship between the vertical blank interrupt and the timing of the memory locations the samples are being played from by the Mac's hardware.

Yes, I knew about these other players. But they IMHO won't work on the very early classic machines which is a pity as these are IMHO the devices which would impress most doing this.

This demo plays regular (four channel) MODs like the included axel-f, which is actually one of the most famous MOD files. For some reason, it crashes with the popcorn.mod I tried. That's most likely not a fundamental issue but stupid memory handling on my side or the like. I don't have plans to improve this much more. But I'll release the source code asap, and someone with more interest in classic Macs (and perhaps even owning one) may optimize this a little more and perhaps even add a small user interface with some eye candy. That should actually not be too difficult to do.

The MOD format is closely tied to the Amiga's hardware. But as seen in the Atari forum I linked to, the Atari ST community managed to squeeze the necessary audio processing into that machine as well. And while the Atari STE had a DMA-capable PCM subsystem, the original Atari ST only had a YM sound generator which could only barely play samples through misuse of its volume control. So in the end the classic Mac is actually better suited for this than the Atari ST, although the Atari ST was slightly faster due to a slightly higher CPU clock and a slightly faster memory interface.

So this kind of MOD playback is basically exactly what the classic mac should be capable of. And having seen/heard many of the classic mac games, it's actually pretty cool to hear it doing the Axel-F in acceptable quality.
 

Boctor

I had an SE for ages before I was able to get my hands on '030 stuff, and so I always wanted some kind of minimal module player. Without System 7 or Sound Manager, just a single-tasking music player like the original Super Studio Session. I can't wait to see how you did this, because it's a really cool project.

> Sound Trecker and MODplayer were early players, both say 68020 but I'm sure I ran the former on a 16 MHz PowerBook 100:
Don't forget PlayerPRO! Though maybe it's not as early as these two. In my experience, it steals less CPU time while hidden than even Sound-Trecker does, despite also being a fullblown tracker for music creation. If you hide all the windows except the playback controls and music list, it's lean enough to be worth using even on a stock SE/30 or IIcx.
 

Andy

I just gave this a try on my SE and 512ke and it works on both! There's a very little bit of distortion occasionally, during the intro notes you can hear it briefly. Ran it off the HD on my SE and off a floppy for the 512ke.
 

MIST

> I just gave this a try on my SE and 512ke and it works on both! There's a very little bit of distortion occasionally, during the intro notes you can hear it briefly. Ran it off the HD on my SE and off a floppy for the 512ke.
Excellent! Thanks a lot. Yeah ... I hear some noise here and there as well. But I think this is going in the right direction.

I'll publish the code tomorrow.
 

Snial

> Excellent! Thanks a lot. Yeah ... I hear some noise here and there as well. But I think this is going in the right direction.
>
> I'll publish the code tomorrow.
I looked at the CODE resources, very handily named! I guess the 4.4K "RunTime" resource actually plays the music and the 400kB "main" code resource contains all the MOD data? I played it on miniVMac, at 1x performance. I thought the Atari ST font was very humorous!

Just wondering what kind of polyphony you expect to be able to get out of it? It also looks like you have a bit of reverb, but for all I know it's just that the sample has reverb built into it.
 

MIST

Here's the source code:


The current version is sufficient for me to showcase my NanoMac. But anyone interested in a real player perhaps e.g. with a simple user interface to load a MOD from disk may use this as a basis.
 

MIST

> Just wondering what kind of polyphony you expect to be able to get out of it? It also looks like you have a bit of reverb, but for all I know it's just that the sample has reverb built into it.
The replay routine itself is the WizzCat routine (discussed at https://www.atari-forum.com/viewtopic.php?t=43127). I did not touch the MOD decoder/parser itself. It's my understanding that it's a full MOD decoder supporting the typical four channels and all the basic effects required for MOD playback. Some effects may actually be part of the included samples. I am no MOD expert.
 

MIST

Someone may actually have fun making this into a 16 Bit dance party using a Donkey Kong Country mod for some famous classic Mac enthusiast repairing them in his basement.
 

Snial

> The replay routine itself is the WizzCat routine (discussed at https://www.atari-forum.com/viewtopic.php?t=43127). I did not touch the MOD decoder/parser itself. It's my understanding that it's a full MOD decoder supporting the typical four channels and all the basic effects required for MOD playback. Some effects may actually be part of the included samples. I am no MOD expert.
OK, so I've just been looking at your VBL routine. Excellent use of movep, I wouldn't have thought of that, but it's great for writing into every other byte. I still think a few improvements can be made, which would improve performance and avoid glitches.

  1. I guess you know that it's best to pre-fill the hardware audio buffer with 0x7f or 0x80 so that there are no glitches on the first or last sample? (That value is really 0x00 if the samples are interpreted as signed, -0x80 to 0x7f, instead of 0x00 to 0xff.)
  2. Obviously, the loop could be unrolled and movem.l used to reduce the total cycles.
  3. You can reduce the initial constraint where you say: "this initial copy needs to run as fast as possible to make sure the first byte is being written before the hardware reads it" by writing into an offset from the audio buffer. I'll explain that below.

Intuitively we'd write into the beginning of the buffer, and this involves a sort-of "racing the beam" exercise. In fact we don't need to. The buffer is circular, so in theory we can start writing anywhere as long as the audio playback doesn't overtake us and we don't overtake the audio playback.

Conceptually, the most general method would be to write the second half of the buffer on the VBL (while the hardware is playing the first half of the buffer from a previous write) and have an interrupt about 9ms later where you write the first half of the buffer (while the hardware is playing the second half of the buffer you just wrote). This is like double-buffering, except you've turned the single hardware audio buffer into 2x half-length buffers. That would tie up a VIA interrupt and I'm not sure you'd want to do that, especially as you do the MOD processing in the VBL interrupt.

But we can still improve latency by having a starting offset far enough into the audio buffer that by the time we're wrapping round, we can be sure that the hardware will have played it.

Code:
cp:
    /* copy 8 samples per iteration */
        move.l   (%a1)+,%d1 ;12c
        movep.l  %d1,0(%a0) ;24c
        move.l  (%a1)+,%d1    ;12c
        movep.l %d1,8(%a0)    ;24c
        add.l   #16,%a0        ;20c
        dbra    %d0,cp         ;10c Total loop=102c*46=4692c
        move.b  (%a1)+,(%a0)   /* byte 369 */ ;12c
        move.b  (%a1)+,2(%a0)  /* byte 370 */    ;16c =4720c.
    /* audio has been copied to hardware buffer */

That's 0.629ms at 7.5MHz or 0.726ms at an effective 6.5MHz (which is probably closer to the average performance) or 9.8 scans (which is 9.8 samples) .. assuming the faster speed of 7.5MHz.

This means that if you start 8 samples into the hardware audio buffer; play (371-8)=363 samples, then the hardware audio playback will have played the first 9 samples by the time you wrap around (and nearly the 10th), which means it'll have played the first one you've just written at least. Then you fill in the first 8 samples of the audio buffer with the last 8 samples you want to write. If the Mac is effectively running at 6.5MHz then you're still OK, the hardware will have played 11 samples including 3 from the new buffer you've written.

Thus you've increased your allowed latency from approx 64µs to about 512µs. Ironically, if you improved the performance by following (2) then you would have to recalculate the minimum number of samples and perhaps start a bit earlier, maybe 6 samples into the hardware audio buffer or whatever is needed, otherwise your routine could catch up with the hardware audio playback itself.
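
For anyone who wants to redo this arithmetic, here is a minimal Python sketch. It only adds up the cycle counts annotated in the loop above and converts them into hardware samples. The function names are made up for illustration, and the sample period is a parameter because the result depends on it: with the 64µs used above the copy spans about 9.8 samples, while with the 22254.5454Hz rate mentioned elsewhere in this thread (about 44.9µs per sample) the same copy would span about 14.

```python
# Cycle counts are taken from the comments in the copy loop above.
LOOP_CYCLES = 12 + 24 + 12 + 24 + 20 + 10   # one iteration = 8 samples
TAIL_CYCLES = 12 + 16                        # the two trailing move.b
ITERATIONS = 46


def copy_cycles():
    """Total CPU cycles for copying one 370-sample buffer."""
    return LOOP_CYCLES * ITERATIONS + TAIL_CYCLES


def samples_elapsed(cpu_hz, sample_period_us):
    """Hardware samples played while the copy loop is running."""
    copy_us = copy_cycles() / cpu_hz * 1e6
    return copy_us / sample_period_us


# Effective CPU speeds are the two assumed in the post above.
for cpu_hz in (7.5e6, 6.5e6):
    for period_us in (64.0, 1e6 / 22254.5454):
        print(cpu_hz, round(period_us, 1),
              round(samples_elapsed(cpu_hz, period_us), 1))
```

Whichever period is right, the copy runs much faster than playback, so the start offset only has to cover the worst-case delay before the copy begins.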
 

8bitbubsy

Some things about NanoMacTracker:
1) The clicks and pops most likely come from the fact that there is no volume ramping. Sudden volume changes (and sample triggers) are not interpolated, so they cause an audible discontinuity in the voice output waveform. This is the same on a real Amiga, where MOD comes from, and is totally normal. Volume ramping is expensive to implement on such a slow machine, it's simply not worth it.
2) Why tick the replayer in a 60Hz interrupt when you can instead count output samples in the mixbuffer output stage, and then tick the replayer after N samples have been transferred to the H/W audio buffer? This is how a lot of tracker replayers did it in the non-Amiga world, and this way you also get perfect BPM (50Hz base). N (aka. "samplesPerTick") would be something like round[audioOutputRate / (BPM / 2.5)]. A fractional part can be added to the delta and "sampleTickCounter" accumulator to achieve even higher BPM precision.
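
A rough Python sketch of the samplesPerTick idea, assuming 16.16 fixed point for the fractional part (the function name and the fixed-point width are my own choices for illustration, not anything from NanoMacTracker):

```python
def tick_lengths(rate_hz, bpm, n_ticks):
    """Number of output samples to mix before each replayer tick.

    samplesPerTick = rate / (bpm / 2.5), kept as 16.16 fixed point so the
    fractional part carries over from tick to tick.
    """
    step = round(rate_hz * 65536 * 2.5 / bpm)   # 16.16 samples per tick
    acc = 0                                     # fractional remainder
    lengths = []
    for _ in range(n_ticks):
        acc += step
        lengths.append(acc >> 16)               # whole samples for this tick
        acc &= 0xFFFF                           # keep only the fraction
    return lengths


# At 125 BPM (the 50Hz base) and 22250Hz every tick is exactly 445 samples.
print(tick_lengths(22250, 125, 4))
# With a slightly higher output rate an occasional tick is one sample longer.
print(set(tick_lengths(15667200 / 704, 125, 1100)))
```

The mixer would then produce that many samples, call the replayer tick, and continue, regardless of where the 370-sample hardware buffer boundaries fall.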
 

MIST

Lots of nice feedback, thanks. But I don't intend to optimize this any further. It worked for me and that's sufficient. I don't own any Apple hardware and don't really have a further use case.

Just a few remarks:

The buffer handling IMHO cannot just be relaxed as simply as stated here. If I delay the buffer update, then I still need to update the entire buffer, as there's no way around updating all 370 samples per VBL. So there will always be the moment where the buffer fill routine meets the point from which the hardware currently replays. And doing it at the very beginning as I do now makes sure I know exactly where the hardware pointer currently is. On the other hand, adding any kind of delay wastes precious CPU cycles.

Ticking the replayer from the output stage requires you to count samples during synthesis and make sure you do the update at the right moment. That's surely a nice solution and, if done correctly, should also result in the replayer being run during only 5 of 6 VBLs, just not at the end but at varying points.

But that's why I made this open source. If you think things can be improved: Just do it! The code is free, all the tools are free, and over on the Atari forum are people who also worked on this code and have also done optimizations and will surely enjoy discussing your ideas.
 

Snial

Hi MIST,

> The buffer handling IMHO cannot just be relaxed as simply as stated here. If I delay the buffer update, then I still need to update the entire buffer, as there's no way around updating all 370 samples per VBL. So there will always be the moment where the buffer fill routine meets the point from which the hardware currently replays. And doing it at the very beginning as I do now makes sure I know exactly where the hardware pointer currently is. On the other hand, adding any kind of delay wastes precious CPU cycles.
I think you've misunderstood. You won't have delayed the buffer update, you're just allowing more latency (e.g. if your VBL routine is slightly delayed by something in the system).

What you do currently is:
Code:
    /* copy from audio buffer to hardware */
    move.l  #0x3ffd00, %a0  
    move.l  samp1, %a1
        move.w  #45, %d0  /* 46*8 = 368 bytes + 2 */
cp:
    /* copy 8 samples per iteration */
        move.l   (%a1)+,%d1
        movep.l  %d1,0(%a0)
        move.l  (%a1)+,%d1
        movep.l %d1,8(%a0)
        add.l   #16,%a0
        dbra    %d0,cp    
        move.b  (%a1)+,(%a0)   /* byte 369 */
        move.b  (%a1)+,2(%a0)  /* byte 370 */

This code will sound exactly the same on a real Mac, but allows for your VBL to be up to 512µs later than you want it to be:

Code:
    /* copy from audio buffer to hardware */
    move.l  #0x3ffd10, %a0    /* 8 samples in */
    move.l  samp1, %a1 /* but we write sample 0 there. */
    move.w  #44, %d0  /* 45*8 = 360 bytes + 2 */
cp:
    /* copy 8 samples per iteration */
        move.l   (%a1)+,%d1
        movep.l  %d1,0(%a0)
        move.l  (%a1)+,%d1
        movep.l %d1,8(%a0)
        add.l   #16,%a0
        dbra    %d0,cp  
        move.b  (%a1)+,(%a0)   /* byte 369 */
        move.b  (%a1)+,2(%a0)  /* byte 370 */
        /* a0 still points at the second-to-last Hw slot (base+736),
           and we still have 8 more samples to copy */
        move.l   (%a1)+,%d1
        movep.l  %d1,-736(%a0) /* wrap to start of HwBuffer */
        move.l  (%a1)+,%d1
        movep.l %d1,-728(%a0) /* next 4 samples */

Ironically it might not work on an emulator, which might just copy the Hw buffer after each frame, though I'd have to check this.
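
As a sanity check on the addressing, here is a small Python model of the offset copy. It is only a sketch under my own assumptions: the hardware buffer is modelled as 370 slots on every other byte starting at 0x3ffd00, movep.l is modelled as four writes two bytes apart, and the helper names are invented. In this model a0 ends up at base+736 after the main loop (the two move.b instructions do not advance it), so the displacement back to the first slot comes out as -736.

```python
BASE = 0x3FFD00      # hardware sound buffer address used in the listing
SLOTS = 370          # samples per VBL, one per every other byte
START_SLOT = 8       # start writing 8 samples into the buffer


def movep_l(mem, addr, four_samples):
    """Model of movep.l: four bytes written to addr, addr+2, addr+4, addr+6."""
    for i, sample in enumerate(four_samples):
        mem[addr + 2 * i] = sample


def offset_copy(src):
    """Copy 370 samples so that src[0] lands in slot 8 and the tail wraps."""
    mem = {}
    a0 = BASE + 2 * START_SLOT
    pos = 0
    for _ in range(45):                       # 45 * 8 = 360 samples
        movep_l(mem, a0, src[pos:pos + 4])
        movep_l(mem, a0 + 8, src[pos + 4:pos + 8])
        pos += 8
        a0 += 16
    mem[a0] = src[pos]                        # last two slots of the buffer
    mem[a0 + 2] = src[pos + 1]
    pos += 2
    wrap = BASE - a0                          # displacement back to slot 0
    movep_l(mem, a0 + wrap, src[pos:pos + 4])
    movep_l(mem, a0 + wrap + 8, src[pos + 4:pos + 8])
    return mem, wrap


mem, wrap = offset_copy(list(range(SLOTS)))
print(wrap, len(mem))
```

Every one of the 370 slots is written exactly once, and sample i ends up in slot (i + 8) mod 370.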

> 2) Why tick the replayer in a 60Hz interrupt when you can instead count output samples in the mixbuffer output stage, and then tick the replayer after N samples have been transferred to the H/W audio buffer? This is how a lot of tracker replayers did it in the non-Amiga world, and this way you also get perfect BPM (50Hz base). N (aka. "samplesPerTick") would be something like round[audioOutputRate / (BPM / 2.5)]. A fractional part can be added to the delta and "sampleTickCounter" accumulator to achieve even higher BPM precision.
Are you talking about this: Line 307-308: "Mac runs in vbl at 60Hz. So skip every 6th run to update at 50Hz."?

That doesn't make sense to me. The previous comment about the ST makes sense. If each sample run is 250Hz (4ms), then 5 of them = 50Hz (20ms).

However, because LEN=370 and INC is set to: 3579546/22250*65536, this sets up the tone tables for 22250Hz. Thus 370 samples will exactly generate a playback suitable for a VBL at 60Hz.
 

8bitbubsy

> Are you talking about this: Line 307-308: "Mac runs in vbl at 60Hz. So skip every 6th run to update at 50Hz."?
>
> That doesn't make sense to me. The previous comment about the ST makes sense. If each sample run is 250Hz (4ms), then 5 of them = 50Hz (20ms).
>
> However, because LEN=370 and INC is set to: 3579546/22250*65536, this sets up the tone tables for 22250Hz. Thus 370 samples will exactly generate a playback suitable for a VBL at 60Hz.
I see.

By the way, shouldn't the actual audio rate be 22254.5454 (recurring) and not 22250.0Hz? Derived from 15667200.0 (pixel clock) / 704 (pixel clocks per video line). Not a big error, but 22255Hz would be closer than 22250Hz.

Also, vblank is 60.1474201474Hz (15667200.0 / (704 * 370)) and not 60.0Hz. When calculating the delta values, it makes sense to use precise nominals even though the clock is slightly off (tolerance) in every system. :)
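
These nominals are easy to reproduce. A short Python sketch, assuming 704 is the number of pixel clocks per scan line and 370 the number of scan lines per frame; the constant names are made up, and the INC scaling follows the 3579546/rate*65536 expression quoted earlier in the thread:

```python
from fractions import Fraction

PIXEL_CLOCK_HZ = 15_667_200   # figure quoted above
CLOCKS_PER_LINE = 704         # assumption: pixel clocks per scan line
LINES_PER_FRAME = 370         # one audio sample is fetched per scan line
INC_NUMERATOR = 3_579_546     # constant from the INC expression in this thread

sample_rate = Fraction(PIXEL_CLOCK_HZ, CLOCKS_PER_LINE)
vbl_rate = sample_rate / LINES_PER_FRAME


def inc_scale(rate_hz):
    """Scale factor for the tone tables at a given output rate."""
    return INC_NUMERATOR / float(rate_hz) * 65536


# Relative difference between tables built for 22250Hz and for the nominal rate.
rel_error = inc_scale(22250) / inc_scale(sample_rate) - 1

print(float(sample_rate), float(vbl_rate), rel_error)
```

The relative difference is about 0.02%, so this is a precision nicety rather than something you would hear.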
 

Snial

> Lots of nice feedback, thanks. But I don't intend to optimize this any further. It worked for me and that's sufficient. I don't own any Apple hardware and don't really have a further use case.
>
> But that's why I made this open source. If you think things can be improved: Just do it! The code is free, all the tools are free, and over on the Atari forum are people who also worked on this code and have also done optimizations and will surely enjoy discussing your ideas.
I thought I'd add another comment. It might feel like you've done something great and now people are criticising you, but in reality it's just that you've done something great, so people are interested and want to contribute.
 

MIST

Filling the audio buffers and setting the
> Are you talking about this: Line 307-308: "Mac runs in vbl at 60Hz. So skip every 6th run to update at 50Hz."?
>
> That doesn't make sense to me. The previous comment about the ST makes sense. If each sample run is 250Hz (4ms), then 5 of them = 50Hz (20ms).
As explained before, there are two major but rather independent things happening: a) the samples are assembled/mixed into a buffer. This happens at 22250 Hz, or 370 times every 60 Hz VBL. There's not much to do about this on a Mac, as that's how the hardware works. And b) there are times when you need to change the samples currently being used. That is re-evaluated every 20 ms (50 Hz) for a MOD. (a) is like someone pressing keys on a keyboard. That person needs to press the right keys for the right tone. And (b) is someone telling person (a) to change keys. That happens at its own speed/frequency. If person (b) runs too fast, then the song will run faster, but the pitch will not change.

Really, just try it. If you think my code/comment in those lines doesn't make sense, then remove it, run the code again and see what happens. In this case the MOD will replay 20% too fast but _not_ at a 20% higher pitch. That's the way it was before I changed that. Just take the code, play around with it and see what happens.
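
To illustrate the point without a Mac at hand, here is a tiny Python model of one second of playback. It is a sketch, not the actual replay code: it assumes a nominal 60 VBLs per second, 370 mixed samples per VBL, and that the skipped run is the last of every six (which run is skipped is my assumption).

```python
VBL_HZ = 60            # nominal VBL rate used in the discussion
SAMPLES_PER_VBL = 370


def one_second(skip_every_sixth):
    """Return (samples mixed, replayer ticks) for one second of VBLs."""
    samples = 0
    ticks = 0
    for vbl in range(VBL_HZ):
        samples += SAMPLES_PER_VBL            # mixing always fills the buffer
        if not (skip_every_sixth and vbl % 6 == 5):
            ticks += 1                        # the "music" update
    return samples, ticks


print(one_second(True))    # with the skip
print(one_second(False))   # without the skip
```

The number of samples is the same in both cases, so the pitch is unaffected. Only the tick count changes, 60 instead of 50 per second, which is the 20% tempo increase.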
 

MIST

> I thought I'd add another comment. It might feel like you've done something great and now people are criticising you, but in reality it's just that you've done something great, so people are interested and want to contribute.
I fully understand. They may even be right and I may be wrong. But as long as they just guess and I really do run the code and see what happens then I tend to believe my own ears.
 

Snial

> <snip> 22254.5454 (recurring) and not 22250.0Hz?<snip>
This would cause the pitch to be 0.02% too low I think, but that's OK IMHO.

> As explained before, there are two major but rather independent things happening: a) the samples are assembled/mixed into a buffer. This happens at 22250 Hz, or 370 times every 60 Hz VBL. There's not much to do about this on a Mac, as that's how the hardware works. And b) there are times when you need to change the samples currently being used. That is re-evaluated every 20 ms (50 Hz) for a MOD. (a) is like someone pressing keys on a keyboard. That person needs to press the right keys for the right tone. And (b) is someone telling person (a) to change keys. That happens at its own speed/frequency. If person (b) runs too fast, then the song will run faster, but the pitch will not change.
Yes, there's essentially 4 levels to all sample playback sequencers:
  1. Sample buffer to hardware copying (which might be DMA, but it's the beginning of the VBL routine in this case). This is the hardest real-time code, running at 22kHz.
  2. Static waveform generation per voice and mixing (if the hardware audio doesn't support multiple, sampled channels like it does on an Amiga or Archimedes, but not an early Mac). This is stereo. Here, we control a static amp and frequency per voice. These are usually decoupled from (1), but run at the same frequency. As long as the buffer audio generation is ahead of (1) it'll playback OK.
  3. Synthesiser, which selects waveforms and ramps volumes according to the envelopes and effects. This is music and runs at a lower frequency, usually fast enough not to hear envelope changes. This performs channel to voice allocation (in MOD files it's 1:1).
  4. Sequencer which is your (b) which selects the notes per voice; their timings; programme to channel allocation. This is also music in this code.
> Really, just try it. If you think my code/comment in those lines doesn't make sense, then remove it, run the code again and see what happens. In this case the MOD will replay 20% too fast but _not_ at a 20% higher pitch. That's the way it was before I changed that. Just take the code, play around with it and see what happens.
I've run it and I believe it; I'm just trying to make sense of it. It's not you, it's me.

Going back to the 5/6 playback rate. OK, I finally get it.

On the Atari ST, it only generates 4ms worth of audio on each pass (but for all 4 parts). It doesn't need to generate a frame's worth of audio on each pass, because sample playback isn't hard locked to video frames. So, on every pass it needs to generate the 4ms of buffer audio (layer (2) above), and on every 5th pass (20ms) it needs to handle the sequencer and synth side of things (layers (3) and (4)).

On the Mac, it generates 16.7ms worth of audio on each pass. That's already close to the 20ms sequencer and synth update rate, so 5/6 times it also runs that code (layers (3) and (4)) as well as layer (2), and on the 6th pass it just uses the same waveform generation (layer (2)) as before. Hence the overall rate is 50Hz, or 20ms (or close enough given that the frame rate is actually 60.15Hz not 60Hz).

This means the MOD playback sequencer and synth layers aren't quite right, but most people wouldn't notice. A short drum sample or staccato note triggered every 80ms (approx 12Hz) would sound evenly spaced on an ST, but with this Mac code it would be triggered after 67ms the first time, then after 83ms the second, third, fourth and fifth times. That might just be audibly detectable, though my example is rather contrived. It won't make any difference with respect to the voices, because the unevenness would be synced across all of them.
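
The 67ms/83ms pattern can be reproduced with a few lines of Python. This is just a sketch under assumptions: a nominal 60Hz VBL, the skipped VBL being the sixth of each group, and a note retriggered every 4 ticks. The function names are invented.

```python
def tick_vbls(n_ticks):
    """VBL index of each replayer tick when every 6th VBL is skipped."""
    ticks = []
    vbl = 0
    while len(ticks) < n_ticks:
        if vbl % 6 != 5:          # assumption: the 6th VBL of each group is skipped
            ticks.append(vbl)
        vbl += 1
    return ticks


def trigger_gaps_ms(ticks_per_note=4, notes=6, vbl_ms=1000 / 60):
    """Time between successive triggers of a note played every N ticks."""
    ticks = tick_vbls(ticks_per_note * (notes - 1) + 1)
    hits = ticks[::ticks_per_note]
    return [round((b - a) * vbl_ms, 1) for a, b in zip(hits, hits[1:])]


print(trigger_gaps_ms())
```

The gaps average out to 80ms, but individual triggers land one VBL early or late.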

Proper synchronisation means making sure music is called every 22250/50=445 (89x5) generated samples. Yet we need to generate 370 samples per frame as an unrolled loop. The "easiest" way, IMHO is to split the unrolled loop into a separate subroutine (Samp1Gen) and then be able to JSR into the unrolled loop up to two times.

Code:
SampSplit:    ;count needs to start at 445*(Samp1GenEnd-Samp1Gen)
    ;currently 16w=32b, so 7120
    sub.w #LEN*(Samp1GenEnd-Samp1Gen),count ;<0 means two parts, >=0 means 1 part.
    bpl.s SampGenAll
    ;Want to JSR to Samp1Gen-count
    move.w Samp1GenEnd-Samp1Gen,%d7
    sub.w count,%d7 ;correct offset.
    jsr Samp1Gen-.(%pc,%d7) ;correct offset into Samp1Gen
    bsr music
    ;Now we want to do -count samples, so it's Samp1GenEnd+count
    move.w Samp1GenEnd,%d7
    add.w count,%d7
    jsr Samp1Gen-.(%pc,%d7) ;correct offset into Samp1Gen
    
    add.w #445*(Samp1GenEnd-Samp1Gen),count ;
    bra.s SampSplitDone
SampGenAll:
    bsr Samp1Gen
SampSplitDone:

This means deleting lines 307 to 312 ("/* Mac runs in vbl at 60Hz. So skip" .. to "bra.s nomus") and the "bsr music" in line 316; copying the .rept LEN .endr to a subroutine:

Code:
Samp1Gen:
    add.w    %a4,%d1
    ;.. a single iteration
    move.b    %d7,(%a6)+
Samp1GenEnd:
    .rept LEN-1
    add.w    %a4,%d1
    ;.. a single iteration as before
    move.b    %d7,(%a6)+
    .endr
    rts

And finally replacing the old .rept LEN code with the SampSplit code above and defining count as: count: DC.W 445*(Samp1GenEnd-Samp1Gen) .
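
Since the assembly above is untested, here is the same scheduling idea as a small Python model, which is easier to check. It is only a sketch: it counts samples instead of code bytes, and the names are invented. Each frame generates 370 samples, the music routine is called every 445 generated samples, and a frame is split into at most two generation runs around that call.

```python
LEN = 370                 # samples generated per VBL
SAMPLES_PER_TICK = 445    # 22250 / 50


def run_frames(n_frames):
    """Return (events, tick_positions) for n_frames VBLs.

    events is a list of ("gen", n) and ("music",) entries in call order;
    tick_positions holds the total sample count at each music call.
    """
    remaining = SAMPLES_PER_TICK      # samples left until the next tick
    produced = 0
    events, tick_positions = [], []
    for _ in range(n_frames):
        todo = LEN
        while todo > 0:
            n = min(todo, remaining)
            events.append(("gen", n))         # jump into the unrolled loop
            produced += n
            todo -= n
            remaining -= n
            if remaining == 0:
                events.append(("music",))     # sequencer/synth update
                tick_positions.append(produced)
                remaining = SAMPLES_PER_TICK
    return events, tick_positions


events, ticks = run_frames(89)
print(events[:6])
print(ticks[:3])
```

Because 445 is larger than 370, there is never more than one music call per frame, so two entry points into the unrolled loop are enough.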

Again, as you say you don't plan to modify your MOD code, which is fine - it's just interesting to do some analysis on the code and in this case (fool that I am, I haven't tested it yet), look at correcting a minor timing inconsistency.

Thanks for publishing it all!

-cheers from Julz
 