
Studio Session File Format reverse-engineer

Mu0n

Well-known member
If you're not familiar with Bogas Productions' 1986 Studio Session app, then check out this other thread I made:





Reverse-Engineering Goal: Load a Studio Session song file (along with its instruments) and play it back, close to the original, in a custom game program (eg in an intro splash screen)

Target Platform: System 6 back to probably System 4.2 (ish)

Development Environment: THINK C Manager (using Symantec C++ 6.0 which can deal with both C and ASM code blocks) under System 7.5.5 (emulated)

Group Effort?: if this is something that interests you and you have some keen programming knowledge in ASM and C, then that would be grand! I'm always looking for some tips too.
 

In 2004-2006, I wanted to:

1) increase my C/C++ programming skills

2) figure out a way to do game programming on my old Mac Plus machine

Back in the day (1986-1993), I only had access to the limiting Microsoft Basic 2.0 development platform. While great for learning programming fundamentals, it showed its limits in terms of fluid animation and sound playback fairly quickly. There were loads of ZX80, Spectrum and C64 programming books in my city's library, but very few for the Macintosh. I did have an introductory book with examples in Basic, but I was light years away from having the tools and examples necessary to do real development that approached the commercial stuff.

The goal was to use Studio Session to compose intro/ending music (with not much animation happening on screen) instead of writing a music engine from scratch. While I sadly found no code anywhere on the web, on the Hotline retromac68k server, or on countless FTPs, I did find a spec document written in 1993 by an MIT student who was interested in the very same challenge!

I was able to use his document somewhat to start gathering the global data of a song file. If I recall correctly, I may even have succeeded in recording a sound on a modern PC of that 2005 era and converting it a bunch of times into the sound file needed to create a new Studio Session instrument. Those use the same format as SoundCaps 4.2 (which can run under System 6 but not System 7).

Project Goals:

Phase 1: Simply interpret a song file and play one track back using the square-wave synth or one voice of the four-tone synth, just to get pitch and tempo right (ie ditch instrument data and any track that's not #1)

Phase 2: Play up to 4 tracks under the four-tone synth

Phase 3: Mix the 6 potential tracks in the free-form synth - this will possibly involve on-the-fly mixing of wave data as the song is interpreted. This is the real challenge, since you don't know in advance when tracks become active or fade out: the notes can be of various lengths, and the sound files associated with each instrument have different lengths.

Here's the spec document in full:

View attachment Studio Session Song (.sss).txt

 
Last edited by a moderator:

Toni_

Well-known member
This sounds like a great project! And great timing, too: just this week I was contemplating doing something similar for fun in the future (Studio Session file playback for games)  :)  

I noticed from your other thread that Tetris uses Studio Session files, and it dawned on me that I actually partially reverse-engineered their mixer fairly recently (last winter) when I was adding support for playing music in Tetris to our emulator. Here's at least some info I can quickly recall that might be of use (which is why it might not be 100% accurate, more like 97%):

• They actually bypass Apple's sound driver completely, and use their own carefully optimized (for timing the sound buffer writes) 68k assembler mixer to control the classic Mac sound hardware.

• Their mixer is able to combine 6 channels in real time, as opposed to Apple's 4 channels.

• Instead of using the 22255 Hz sample rate, they use roughly ≈11127 Hz, writing each sample *twice* to the buffer at SoundBase. This saves CPU time in the mixer, allowing the two extra channels without compromising sound quality too much (185 samples mixed per VBL task instead of 370).
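In rough C, the sample-doubling trick could look something like this - a minimal sketch with names of my own (SAMPLES_PER_VBL, write_doubled), not anything taken from the actual disassembly:

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of the "write each sample twice" trick: the classic Mac
   sound buffer is consumed at 22255 Hz, but the mixer only computes
   185 fresh samples per VBL tick and stores each one in two
   consecutive slots, giving an effective ~11127 Hz sample rate at
   half the mixing cost. Names here are hypothetical. */
#define SAMPLES_PER_VBL 185

void write_doubled(uint8_t *sound_buf, const uint8_t *mixed)
{
    size_t i;
    for (i = 0; i < SAMPLES_PER_VBL; i++) {
        sound_buf[2 * i]     = mixed[i];  /* slot n */
        sound_buf[2 * i + 1] = mixed[i];  /* slot n+1: same value again */
    }
}
```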

• Otherwise their mixer appears to be pretty close to the standard Apple mixer, with fixed-point values used to track frequency and phase for each channel. I also recall that they have a very cool hack: instead of fetching those values from memory like Apple's FTSoundRec, they write the fixed-point values directly *into the mixer's 68k code* as immediate load operands, saving the CPU time of an extra memory fetch.
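The fixed-point phase stepping itself could be sketched like this in C (the 16.16 layout and names are my assumption; the trick described above just means the 32-bit increment lives inside the instruction stream instead of a struct like this):

```c
#include <stdint.h>

/* Hypothetical sketch of four-tone-style phase tracking: 16.16 fixed
   point, high word = integer byte offset into the instrument sample,
   low word = fractional position. The increment is the channel's
   "frequency" - bigger increment, higher pitch. */
typedef struct {
    uint32_t phase;     /* 16.16 fixed-point position in the sample */
    uint32_t increment; /* 16.16 fixed-point step per output sample */
} Channel;

uint16_t step_channel(Channel *ch)
{
    ch->phase += ch->increment;
    return (uint16_t)(ch->phase >> 16);  /* integer sample offset */
}
```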

• I don't know yet how they handle interpreting the song tracks to get the instrument frequencies and samples, but I'm wildly guessing they're probably doing this either in a callback in the VBL task, or from another VBL task... or maybe even from the main loop, like Zero Gravity does with the regular sound driver. 

I can drop in more technical info later - I'm about to leave on summer vacation for a few weeks, so I'll have to get back to you.

 

Toni_

Well-known member
Sorry, I forgot to mention: another difference from Apple's four-tone synthesizer is the absence of the 256-byte sample length limitation on wave table instruments (they don't clamp the channel offsets to byte length), which enables them to use much longer audio samples for instruments & get better audio quality.

Also, they avoid needing to divide the summed sample by 6 (the channel count) by using a lookup table of 6x256 = 1536 elements containing precalculated results of division by 6 (Apple's four-tone synthesizer does its division by 4 by shifting the summed sample right two bits).
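Building and using such a table is tiny in C - a rough sketch with names of my own, just to show the idea:

```c
#include <stdint.h>

/* Sketch of the precomputed divide-by-6 table: six unsigned 8-bit
   samples sum to at most 6*255 = 1530, so a 1536-entry table maps
   every possible sum straight to its average, replacing a slow
   divide with one indexed byte load. */
static uint8_t div6_table[6 * 256];

void init_div6_table(void)
{
    int sum;
    for (sum = 0; sum < 6 * 256; sum++)
        div6_table[sum] = (uint8_t)(sum / 6);
}

uint8_t mix6(const uint8_t s[6])
{
    int sum = s[0] + s[1] + s[2] + s[3] + s[4] + s[5];
    return div6_table[sum];  /* one load instead of a division */
}
```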

 

Mu0n

Well-known member
Awesome replies, I got so many questions:

-Did you read all of these tidbits somewhere? A technical note? Or just raw attempts made by yourself in the past?

-The lookup table of 6x256 elements, how is that compatible with instruments lasting variable lengths of time?

-What do you put in those 256 elements? The attack of the sound? its sustain? both?

-Doesn't dividing by 6 mute a single instrument too much, if it's being played solo?

-Could you just cross your fingers and hope the sample data for each instrument is not often at its peak simultaneously and perform a simple addition without risking too much saturation? Or at least, do something other than a crude division by 6 (division by sqrt of 6, say)

 

Toni_

Well-known member
Awesome replies, I got so many questions:

-Did you read all of these tidbits somewhere? A technical note? Or just raw attempts made by yourself in the past?
Hi,

I found all this stuff out when reverse-engineering the tetris code as 68k disassembly, and investigating the program flow using debugger (to make the sound work on our emulator) - and Apple's four-tone synthesizer I figured out by disassembling the 68k source for DRVR 3 driver I extracted from the Mac Plus ROM :)

-The lookup table of 6x256 elements, how is that compatible with instruments lasting variable lengths of time?

-What do you put in those 256 elements? The attack of the sound? its sustain? both?
- The lookup table they use is just an optimization to avoid division; basically it's a very small part of the overall mixer (where they sum up individual bytes from each of the six channels and divide by six to get the average). For example, if the samples were hypothetically something random like 117, 39, 5, 220, 0 and 0, the sum would be 381, which divided by six gives 63 - they avoid this division by stuffing the value 63 into the lookup table at index 381 for quick access (byte values in the table would be [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3....<continued total 1536 elements>... 255, 255, 255])

- This mixing is just what's done for each individual 8-bit sample passed to the sound buffer; it's repeated for each byte in the instrument over and over again. From what I've seen in the docs you linked, the instrument format is quite simple and just contains loop information, so any attack/decay would probably be pre-sampled into the part before the loop, sustain would be the looping part, and release would be the part following the loop. That way the mixer only ever has to fetch one byte at a time from the instrument, using the loop information just to know when to wrap back to the beginning of the loop, or when the end of the instrument sample is reached. This part about the ADSR envelope is hypothetical, however, as it wasn't in the code I disassembled during debugging (but it is how it would be sensible to do it performance-wise).
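That hypothetical loop handling could be sketched roughly like this in C (the Instrument fields are my guesses at the layout, not the documented format):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical per-channel sample fetch with loop handling:
   attack/decay live before loopStart, sustain is the region
   [loopStart, loopEnd) repeated while the note is held, and release
   runs from loopEnd to length. Field names are my own guesses. */
typedef struct {
    const uint8_t *data;
    size_t length;
    size_t loopStart, loopEnd;  /* sustain region */
} Instrument;

uint8_t fetch_sample(const Instrument *ins, size_t *offset, int note_held)
{
    if (note_held && *offset >= ins->loopEnd)
        *offset = ins->loopStart;  /* wrap back into the sustain loop */
    if (*offset >= ins->length)
        return 128;                /* past the release: 8-bit silence */
    return ins->data[(*offset)++];
}
```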

-Doesn't dividing by 6 mute a single instrument too much, if it's being played solo?
- It's true that the amplitude drops a bit when playing solo instruments, but with a fixed and small number of channels I think it's usually considered negligible (I'm only 95% sure about the 6-channel mixer without checking the disassembly, but at least Apple's four-tone mixer doesn't have any compensation for this - and I'm pretty sure the Studio Session 6-channel mixer won't either, since on a classic Mac CPU time is very tight, so they most likely were just ok with it). It should be noted that there are techniques to avoid this amplitude loss ("muting"), for example dividing two summed samples by the square root of 2 instead of by two. My knowledge of this part of higher-end audio mixing is a bit limited, however, but my friend Pukka knows a bit more about it (he's even written his own MOD music player for the Mac, and he told me about this square root of two thing).

-Could you just cross your fingers and hope the sample data for each instrument is not often at its peak simultaneously and perform a simple addition without risking too much saturation? Or at least, do something other than a crude division by 6 (division by sqrt of 6, say)
- I think the technique of dividing by the square root is indeed based on this "hope" that the summed samples won't exceed the maximum amplitude, and it is (from what I've heard!) used in more advanced mixers, especially when the number of sound channels is not constant.

Sadly I don't have time (before the end of my summer vacation) to revisit the live disassembly of the mixer, BUT I found an old screenshot (which I originally used on our blog) of the Tetris mixer, which shows the particular part that does the aforementioned mixing:

2019-01-03 disassembler.png

It's a really old screenshot, so some of the disassembly is not completely accurate (Pukka has since developed the disassembler further). However, here are a few interesting parts:

• Code at 30B9A and 30BA0 loads the active 6 channel instrument pointers into A0-A5, and current offsets into D0-D5

• Code between 30BAA and 30BC8 increments the offsets, using the immediate fixed-point values I mentioned earlier (frequency of channel)

• Code at 30BD0 clears D7, which is where the instrument samples from the channels are summed

• The four instructions swap, move.b, swap and add.w are repeated for each channel, to sum the current sample byte from the instrument into D7. The swap instructions just switch the fixed-point number's integer portion temporarily into the low-order word for add.w, and move.b fetches the sample byte from the instrument at memory address A(0-5)+D(0-5), where Ax is the instrument address and Dx is the offset inside the instrument's sample data

• The lookup table (6 x 256 bytes) is assigned to A0 at 30C10, and the "pre-divided" sample byte is copied to the hardware audio buffer at 30C14 and 30C18 (the instruction at 30C18 should actually be move.b (0,A0,D7.l*1), $0002(A6); there was a disassembler bug there still at that point). Note that this copy is done twice for the same output value, which is how the output sample rate is effectively halved from 22255.

• This sample mixing process, code from 30BAA to 30C24, is repeated 185 times

 

Mu0n

Well-known member
I made some progress last night on the Studio Session file interpretation front. I have the file header interpreted correctly, including some empty bytes acting as a boundary between the header and the track data. I have track one playing back using the square wave. The note lengths seem ok, but I'm doing something wrong with the key signature and/or accented notes in the switch-case method that fetches the necessary note frequencies (code recycled from 2005). Also, some notes will occasionally sound very high-pitched, which of course is not what you hear when you play the song in Studio Session. I'll have to step through a song with key presses, see what my interpreter thinks it needs to play, and visually compare that to the score.

If anyone wants to hear the buggy state of the program so far, I'll make a quick video of it; otherwise, I'll fix the bad note pitches first. I'm worried about having to create lookup tables per key signature to generate the proper frequencies for thirds, fourths, etc., and avoid the slight inaccuracy of equal temperament frequencies (ie one lookup table to fit all keys, which makes many intervals sound slightly off). What if a song changes keys often during playback? Will I have enough memory for all this? I haven't even started on mixing instrument soundwave data.

 

Crutch

Well-known member
That sounds really impressive. On the equal temperament point, are you thinking Studio Session originally used some sort of just temperament that depended on the key signature? I would have definitely assumed it did not do that, and simply used a single set of frequencies for each note (equal temperament). 

 

Mu0n

Well-known member
Bugs fixed:

-off-by-one errors in octave detection

-off-by-one errors in note detection

-proper accidentals management

Here's a quick demo of a chromatic scale in C major, followed by a major and then a minor arpeggio. It's played in Studio Session itself first, then launched by my own program, SSPhase.



 