
MODTracker audio replay on early 68k macs

Any news? Can I remove the code from the NanoMac repository?
I guess that's a question for @Mu0n . I'm using their repo.

In the meantime (to go off-topic) I was musing about how well a PC at the time could handle MOD playback, assuming an equivalent sample playback to a Mac Plus (which is reasonable, since it's pretty simple hardware). My inner loop code would look like this:

Code:
;8086 instruction set; the cycle counts below are for an 80286.
;ES:BX^volume table, BH=volume table page.
;DS:SI^waveform.
;ES:DI^samples buffer (byte)
;CX=fractional frequency
;DX=whole frequency
;AX=Fractional position
;SI=whole position.
;BP^ Stack frame?

    .rept SAMPLES
    add ax,cx ;FracPos+=FracFreq 2c:2b, so 3c
    adc si,dx ;WholePos+=WholeFreq 2c:2b, so 3c
    mov bl,[si] ;Waveform sample  5c:2b, so 6c (2x memory fetches)
    mov bl,es:[bx] ;volume adjusted  5c (0c for seg override) 7.5c (2.5 mem fetch)
    add es:[di],bl ;add to sample buffer 7c (3.5 mem fetches) 10.5c
    inc di ;6 instructions 2c:1b =2c.
    .endr
   
    ;2*3+10+7=23c.
    ;However, because it's an unrolled loop the BIU will empty and so real cycle counts should be
    ;based on bus cycles.
    ;So, real calculation is 32c per loop per track.
    ;So, an 8MHz 80286 with a 22050Hz hardware sample buffer would use
    ;32x22050=705,600 cycles per track, 2.8MCycles for 4 tracks or
    ;35% of CPU at 8MHz.
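
For readers who don't read 8086 assembly, here is a rough Python model of what one pass of that unrolled loop does. It is only a sketch: the function names, the 16-bit fractional width, the signed-sample assumption and the rounding in the table are my own illustrative choices, not taken from the listing.

```python
def build_volume_table(levels=64):
    """One 256-entry page per volume level (hypothetical layout)."""
    table = []
    for vol in range(levels):
        page = []
        for raw in range(256):
            # Assumption: waveform bytes are signed 8-bit samples.
            signed = raw - 256 if raw >= 128 else raw
            page.append((signed * vol) // levels)  # floor rounding
        table.append(page)
    return table


def mix_track(buffer, waveform, vol_page, pos_whole, pos_frac,
              freq_whole, freq_frac):
    """Add one track into buffer, mirroring the six instructions."""
    for i in range(len(buffer)):
        pos_frac += freq_frac                    # add ax,cx
        carry = pos_frac >> 16
        pos_frac &= 0xFFFF
        pos_whole += freq_whole + carry          # adc si,dx
        sample = waveform[pos_whole]             # mov bl,[si]
        scaled = vol_page[sample]                # mov bl,es:[bx]
        buffer[i] = (buffer[i] + scaled) & 0xFF  # add es:[di],bl / inc di
    return pos_whole, pos_frac


# Usage: step through a ramp waveform at 1.5 source samples per output sample.
table = build_volume_table()
out = [0] * 4
end_position = mix_track(out, list(range(256)), table[32], 0, 0, 1, 0x8000)
```

As in the assembly, the position is advanced before the sample is fetched, and the add into the buffer wraps at 8 bits.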

Initial cycle calculations are for an 8MHz PC/AT, the fastest PC at the time of the Mac Plus. Although the 8086 architecture has far less register state than a 68000, and its address registers can only span 64kB at a time in Real mode, a PC/AT would easily outperform a Mac for the same task, using only 35% of CPU at 8MHz. There are a few reasons for this:
  1. The 80286 on a PC/AT is a much bigger CPU than the 68000, with 134K transistors, roughly twice as many.
  2. The 80286 only requires 3 cycles for a bus transfer. In an unrolled loop this effectively determines the speed of the algorithm. The BIU's buffer gets filled up when instructions involve internal clock cycles. However, most of the instructions in this loop don't. This means that the BIU is usually empty, waiting for a new instruction to be loaded.
  3. The 80286 has a dedicated Effective Address ALU which can calculate any effective address combination in a fixed time (I assume 1 cycle based on the Intel document I used). However, even if EA calculations were somewhat slower it wouldn't affect the timing much, because the EA calculation cycles could be filled with BIU fetches.
  4. The algorithm requires no better than 16-bit calculations, so the 32-bit performance improvement offered by the 68000 doesn't offer an advantage.
  5. Clever use of segment registers gives us enough registers to play with (and in fact BP is still free for use as the frame pointer).
It's worth thinking about point 5 a bit more. In this code, both the volume adjustment tables and the sample buffer use the same segment (ES). How does this work? There are only 64 volume levels and 256 source sample values, so the volume tables need 16kB. Allocate the sample buffer at segment P, with its 370 bytes rounded up to 512 bytes, and place the volume tables straight after it at byte offset 512 (512/16 = 32 paragraphs). Then, on entry to the routine, ES=P and BX=512+VolumeLevel*256, i.e. BH=2+VolumeLevel, with BL supplying the sample value.

Similarly, if a given waveform starts at 32-bit address W and we are M samples into the waveform, then at the start DS=(W+M)>>4 and SI=(W+M)&15. This ensures that 0<=SI<16 on entry, so the routine can generate a full buffer's worth of samples by the end of the tick even though a full waveform can be well over 64kB (and over 128kB including the repeat sections).
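
The two addressing tricks above can be checked with a few lines of Python. This is just a sketch under my own assumptions: the helper names are hypothetical, addresses are treated as real-mode linear addresses (segment*16 + offset), and the volume tables are assumed to start at byte offset 512 in the ES segment.

```python
def normalise(linear_address):
    """Split a linear address into (segment, offset) with 0 <= offset < 16."""
    return linear_address >> 4, linear_address & 15


def to_linear(segment, offset):
    """Real-mode address formation: segment * 16 + offset."""
    return (segment << 4) + offset


def volume_lookup_bx(volume_level, sample_byte, table_base=512):
    """BX for 'mov bl,es:[bx]': table page in BH, sample value in BL."""
    return table_base + volume_level * 256 + sample_byte


# Usage: a waveform at W, currently M samples in.
W, M = 0x23456, 0x1F00
ds, si = normalise(W + M)
bx = volume_lookup_bx(volume_level=3, sample_byte=0x7F)
bh, bl = bx >> 8, bx & 0xFF
```

Because SI starts below 16, almost the whole 64kB window is available for the position to advance during one buffer fill.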

We can compare this with a PC of a similar era, using 8086/8088 timings. Here, EA calculations take 5 cycles for [SI] and [BX]. Reg,Reg and INC reg operations take 3 cycles; ADD [DI],BL is 24+EA; MOV BL,[SI] is 12+EA; and an ES override is 2 cycles. So we have: 3+3+(12+5)+(12+5+2)+(24+5+2)+3 = 76 cycles per loop. At 22050Hz that's about 1.68M cycles per second per track, about 2.4x slower than an 80286. An 8MHz 8086 PC (such as the Olivetti M24) could perhaps just manage 4 tracks at 84% of CPU, assuming that the rest of the playback engine could run in 15% of CPU.
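
The budget arithmetic for both CPUs can be reproduced with a short Python sketch. The per-instruction cycle figures are the ones quoted in this post; the function and constant names are only illustrative.

```python
SAMPLE_RATE_HZ = 22050
CPU_HZ = 8_000_000
TRACKS = 4


def cpu_share(cycles_per_sample, tracks=TRACKS,
              rate_hz=SAMPLE_RATE_HZ, cpu_hz=CPU_HZ):
    """Fraction of the CPU spent in the mixing loop."""
    return cycles_per_sample * rate_hz * tracks / cpu_hz


# Per-loop cycle counts as estimated in the post.
CYCLES_80286 = 32
CYCLES_8086 = 3 + 3 + (12 + 5) + (12 + 5 + 2) + (24 + 5 + 2) + 3

share_286 = cpu_share(CYCLES_80286)
share_8086 = cpu_share(CYCLES_8086)
slowdown = CYCLES_8086 / CYCLES_80286

print(f"80286: {share_286:.1%}  8086: {share_8086:.1%}  ratio: {slowdown:.2f}x")
```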

Conclusion
Motorola 68000 fans (like me) deride (and derided) the 16-bit PC and PC/AT architectures of the day, but Intel was able to make up for their deficiencies in many practical applications. In this case, that's because it's surprisingly easy to work around segment limitations.

Anyway, back to getting the MOD Tracker to not crash loading arbitrary .MOD files!
 
@Snial since you're spending countless hours on this and I'm spending 0 hours on this, it makes more sense if you start a GitHub repo for it. Feel free to borrow the little code I added last time I posted something. It makes no sense for me to host the code and for you to work on it, let's keep it at the more active account.
 
Honestly, I'm not spending countless hours on it, I'm mostly just posting comments musing about it as that's easier. I've got my Init code written and I'm testing it in Xcode at the moment, so I'll be able to find out if it ought to stay within memory bounds fairly soon. If you can cope with me continuing to use a branch on your repo, I'll do that for a while. But I will switch.
 
I spent a bit of time on it today. I think I've basically debugged my 'C' version of the Init and Perform: assembly code, in a standard 'C' test harness. It still crashes on the actual app. I think the next step is to get better debugging on miniVMac to work. Not sure how.
 
In the meantime (to go off-topic) I was musing about how well a PC at the time could handle MOD playback, assuming an equivalent sample playback to a Mac Plus (which is reasonable, since it's pretty simple hardware). My inner loop code would look like this:
For what it's worth, there were PC MOD players at the time that worked on the internal speaker (generating PWM on the fly, similar to speech on the 8-bit Apple II). I don't recall how fast of a CPU that needed - I saw it running on a 386/16 I believe.
 
An 8MHz 80286 is fast enough to support at least 6 MOD channels with Mac Plus (and earlier) style audio, which is also PWM; see my earlier listing. PC/AT class machines had DMA, so I suppose one could hook up a timer at 22kHz to trigger an 8-bit DMA transfer to another timer register controlling a PWM duty cycle running at no less than 5.6MHz (256 x 22050). Otherwise you need to generate two interrupts per 22kHz sample period at a 5.6MHz resolution. INT takes 23 clocks, so INTR probably will too.

2.8M cycles are used for 4 tracks. Assuming 0.5M cycles are used for music processing, that's 3.3M, leaving 4.7M cycles. We need interrupts at 44.1kHz (two per 22.05kHz sample). This leaves about 100c per interrupt. So, yes, it should be possible on an 8MHz 286.
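
The same estimate as a small Python sketch, in case anyone wants to try different clock speeds or channel counts. The 0.5M cycles for music processing is the guess from this post, and the function name is only illustrative.

```python
def cycles_per_interrupt(cpu_hz, mix_cycles, engine_cycles,
                         interrupts_per_second):
    """Cycles left for each speaker interrupt after mixing and sequencing."""
    spare = cpu_hz - mix_cycles - engine_cycles
    return spare / interrupts_per_second


MIX_CYCLES = 4 * 32 * 22050      # 4 tracks at 32 cycles per sample
ENGINE_CYCLES = 500_000          # assumed cost of the rest of the player
INTERRUPT_RATE = 2 * 22050       # two interrupts per output sample

budget = cycles_per_interrupt(8_000_000, MIX_CYCLES, ENGINE_CYCLES,
                              INTERRUPT_RATE)
print(f"{budget:.0f} cycles available per interrupt")
```

That budget still has to cover interrupt entry, the handler itself and the return, so it is tight but not impossible.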
 
Is there a 286-based Macintosh, or do you plan to do this under DOS? I really seem to remember that those machines already have their MOD players. So this would probably be significantly less impressive than a classic Mac version.

Maybe someone else finds the time and interest to actually finish a nice mod player for the classic Macs. It has at least been proven that this is possible.
 
Nicely done MIST. There is also libmikmod; I used it decades ago for a demo for PPC Mac, but there should be a 68k version as well, e.g.:

This one might work? Oh, never mind, the player is only for Power Macs. Anyhow, the library (and source) exists for 68k, and it should have fairly good support for various MODs.
 
I wonder why there is no player for the classic compact Macs if this library works on them and has existed for such a long time.

The Mac world is somehow odd. Lots of things that "should work" and "could easily be done", yet it seems no one ever actually tried. Somehow the opposite of the Amiga and Atari ST world, where people usually address what's not possible ... only to find a clever programming hack that finally does the impossible.
 
I don't know much about the 68k Macs unfortunately, as my first Mac was a PowerPC, but I agree, there is definitely a cultural difference. I think one reason is that Apple made so many different machines with many different CPUs and chips. They were heavily pushing that you should use their Toolbox and not talk directly to the hardware, so that things stay portable and continue to work. I think something must have changed after the Apple II, as that one had lots of games, many of which would hit the metal directly to do impossible things, and there was also a bit of a demoscene around it.

Of course, if you just bypass their Toolbox and do things specifically for a particular set of machines, you can do "impossible things", but it is a long road as few have gone there. I wanted to find a way to get a reliable VBL sync on the 5200, and ended up disassembling the driver and doing a bunch of experiments. I found a way that kind of works, but not as well as I had hoped. These are things that are just a web search away on other platforms.

On the flip side, I think many new things remain undiscovered on these machines. I have tried to push things on the Performa 5200 (my first Mac) and got to do some stuff that 12-year-old me would have flipped out seeing, but not many people have the 5200/5300/6200/6300 machines, so the audience that can appreciate it gets very small fast.

In contrast, the Amiga, C64, etc. had fewer variations, so lots of people had them (and still do), along with a culture of hitting the hardware directly. They were also way more gaming oriented, so you had more game developers who wanted to push things and gamers who would appreciate it. The PC was a big mess of different hardware, but I think it was just a large enough market that developers took the time to go very low level to squeeze the maximum out of each CPU, graphics card and sound card, and DOS kind of encouraged it as it didn't offer much.

If you are into these kinds of things, it would be awesome to have more folks doing things in the demoscene for Mac :)
 
It seems there's not enough motivation on the Mac side to get this into a nice, end-user usable state. This is surprising, as the old Macs seem indeed to be quite capable. But that's fine for me. I will remove this half-finished code from the repository, as my device is mainly meant to be an Atari ST and an Amiga; those support MOD playback quite well, so this player is not really needed anymore.
 
The Mac world is somehow odd. Lots of things that "should work" and "could easily be done", yet it seems no one ever actually tried. Somehow the opposite of the Amiga and Atari ST world, where people usually address what's not possible ... only to find a clever programming hack that finally does the impossible.

Not sure if you're serious, but give it some time? There are some pretty amazing new developments in all aspects of vintage computing but nobody is in a hurry.
 
Sure. Take your time. But I personally have no need for the player anymore, so I have deleted my unfinished version.
 
It seems there's not enough motivation on the Mac side to get this into a nice, end-user usable state. This is surprising, as the old Macs seem indeed to be quite capable. But that's fine for me. I will remove this half-finished code from the repository, as my device is mainly meant to be an Atari ST and an Amiga; those support MOD playback quite well, so this player is not really needed anymore.
I still plan to make it more usable, but I haven't been working on it for a couple of weeks. It almost plays properly with my current version (which loads .mod files; allocates space + workspace and runs the preparation code from 'C' versions).
 
I am sure you are already quite busy with the m0ebius. Maybe one day you need to test your emulator's performance and accuracy. Then you might have a use for this.

I'd like to see your emulator running. So far I think you'll still face a lot of things that'll need additional CPU time, like forwarding IO data between both cores and lots of special cases like odd CPU flags handling and the like. But I'd enjoy being proven wrong. A tiny yet powerful 68k CPU emulator could even have fun use cases inside a real machine.
 
I am sure you are already quite busy with the m0ebius.
It's a bit stalled, most of my projects get stalled for a while and then resume.
Maybe one day you need to test your emulator's performance and accuracy. Then you might have a use for this.
Yeah, the MOD player! Well, it's a good question, because although things like MOVE reg,reg aren't much faster on MØBius (Shift+Option+O); MOVE reg,disp8(An, Am|Dm) are very much faster. MOVEP is also much faster. 32-bit calculations are about as fast as 16-bit. Jumps are pretty fast; shifts use the barrel shifter; MUL is super-fast (DIV uses a software routine, but it's still faster than a real 68K).
I'd like to see your emulator running.
Thanks!
So far I think you'll still face a lot of things that'll need additional CPU time like forwarding IO data between both cores
Some IO will still be synchronous and done on the CPU core for these kinds of reasons. IO that requires background processing is for the second core. For example, the video/DMA/PIO and audio PWM are done via the second core. The keyboard and mouse emulation is on the second core. But SCC and VIA implementations are likely to be synchronous.
and lots of special cases like odd CPU flags handling and the like
Probably. I use ARM flags directly to implement M68000 flags, rather than doing what most emulators do, which is to have routines that test a result for each flag and copy the bits into the emulated CPU's context. Instead I move the ARM flag result of a calculation into kRegCcr, which is an ARM register (whereas SR is in the CPU context).

There are some consequences to this. Firstly, I don't translate ARM flags to the M68000 format flag bits unless it's a MOVE CCR,Dn type instruction, because those instructions aren't common. Conditional branches therefore just vector to the equivalent ARM jump based on kRegCcr. However, that doesn't work for all cases, because a few M68000 Bcc conditions don't match a single ARM Bcc. Also, subtract is very different on ARM vs M68000, because on ARM it's dest = dest+~source+1 not dest-=source, so the carry is inverted. That needs to be adjusted for. It's still much faster than calculating all flags explicitly.
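
The carry inversion on subtract is easy to demonstrate with a small Python model. This is not code from the emulator, just a sketch with hypothetical function names: the ARM side follows the dest+~source+1 form described above, and the M68000 side sets carry when a borrow occurs.

```python
MASK32 = 0xFFFFFFFF


def arm_subs(dest, source):
    """ARM-style subtract: dest + ~source + 1. Carry set means no borrow."""
    total = (dest & MASK32) + (~source & MASK32) + 1
    return total & MASK32, total >> 32


def m68k_sub_carry(dest, source):
    """M68000-style carry for dest - source: set when a borrow occurs."""
    return 1 if (source & MASK32) > (dest & MASK32) else 0


# Usage: the two carries are always complements of each other.
for dest, source in [(5, 3), (3, 5), (7, 7), (0, 1), (MASK32, 0)]:
    result, arm_carry = arm_subs(dest, source)
    assert m68k_sub_carry(dest, source) == 1 - arm_carry
```

So a host carry taken from a subtract has to be flipped (or the branch condition swapped) before it is treated as the emulated carry.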

However, as you say, "odd CPU flags" are likely to come and bite me at some point. It's a recipe for subtle errors!
But I'd enjoy being proven wrong. A tiny yet powerful 68k CPU emulator could even have fun use cases inside a real machine.
Yes, it's about fun!

-cheers from Julz
 