just programming your display memory address generation to do the necessary crazy-stride through RAM on the output side
... it actually just occurred to me that an alternate "4th-dimensional" way to handle this, one that wouldn't change the scan on the *output* side, would be to do address translation on the input side. I.e., you'll need a read/write buffer on the Mac side anyway (even if you did 16-bit PDS instead of NuBus, you'd still have the SDRAM controller and the video arbitration in the way), so to handle this you could implement address translation circuitry so that each read/write to display memory is remapped to the correct location in the framebuffer.
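To make the input-side remapping idea concrete, here is a rough software sketch of what the translation circuitry would compute. Everything in it is an assumption for illustration: the function name `translate`, the rotation direction, the 640x870 example mode, and the packed row layout are all made up, and real hardware would do this with adders and shifters rather than Python.

```python
def translate(x, y, portrait_width, portrait_height, bits_per_pixel, word_bits=16):
    """Map a Mac-side portrait pixel (x, y) to (word_address, bit_offset)
    in a framebuffer stored in the panel's landscape scan order.

    Hypothetical sketch: the rotation direction and packed layout are assumptions.
    """
    if not (0 <= x < portrait_width and 0 <= y < portrait_height):
        raise ValueError("pixel outside the portrait frame")
    # Assumed rotation: a portrait row becomes a landscape column,
    # and the right-most portrait column becomes the first landscape row.
    phys_x = y
    phys_y = portrait_width - 1 - x
    phys_width = portrait_height  # the landscape scan line is as long as the portrait is tall
    bit_index = (phys_y * phys_width + phys_x) * bits_per_pixel
    # For depths above word_bits this returns the first of the pixel's words.
    return bit_index // word_bits, bit_index % word_bits


if __name__ == "__main__":
    # Two horizontally adjacent portrait pixels at 16 bits per pixel.
    print(translate(639, 0, 640, 870, 16))
    print(translate(638, 0, 640, 870, 16))
```

The point to notice is that two pixels the Mac thinks are neighbours on a scan line end up a full physical scan line apart in RAM, which is where the trouble at low bit depths comes from.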
The advantage of this approach is that it keeps the output RAM bandwidth requirements from changing; they stay the same for every mode, whether it's portrait or landscape. But it has some pretty crazy side effects on the Mac side. The worst case I can come up with is actually 1-bit B&W mode: for every 32-bit word (32 pixels) written or read by the Mac, you have to read/update a single bit in each of 32 different memory locations. The read part also applies if you do the rotation on the output side, but that shouldn't be a problem if you build the hardware to handle true color, because in that mode you'll have to read two 16-bit words for every pixel anyway (*); 1-bit color rotated "only" requires the same number of reads as 16-bit color, i.e., a word per pixel. But if you do it on the input side, it would add a lot of latency to every read/write operation. Of course, the Mac side is going to be so slow compared to what the hardware needs to handle for the video output that I wonder whether it would necessarily be a problem.
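A quick way to sanity-check the 1-bit worst case is to count how many distinct 16-bit words a single 32-pixel Mac access lands on, with and without the rotation. This is only a sketch under assumed conditions: the helper names, the 640x870 example mode, the rotation direction, and the tightly packed bit layout are all hypothetical.

```python
def words_touched_linear(x_start, y, width, word_bits=16, pixels=32):
    """Distinct word addresses hit by a 32-pixel access in an unrotated 1-bit frame."""
    words = set()
    for x in range(x_start, x_start + pixels):
        bit_index = y * width + x  # 1 bit per pixel, packed in scan order
        words.add(bit_index // word_bits)
    return words


def words_touched_rotated(x_start, y, portrait_width, portrait_height,
                          word_bits=16, pixels=32):
    """Distinct word addresses hit by the same access when the frame is
    stored in landscape scan order (assumed rotation direction)."""
    words = set()
    for x in range(x_start, x_start + pixels):
        phys_x = y
        phys_y = portrait_width - 1 - x
        bit_index = phys_y * portrait_height + phys_x  # 1 bit per pixel
        words.add(bit_index // word_bits)
    return words


if __name__ == "__main__":
    print(len(words_touched_linear(0, 0, 640)))        # unrotated case
    print(len(words_touched_rotated(0, 0, 640, 870)))  # rotated case
```

Under these assumptions the unrotated access touches 2 words and the rotated one touches 32, each needing a read-modify-write of a single bit, which is the latency cost described above.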
(*) Note that what I said here means I was probably wrong when I said this:
(if you have 16 bit wide RAM like the FPGA board has then 32 bit color in portrait needs twice as many reads per pixel as true color landscape.)
Not sure where that came from; either way you're still reading two contiguous 16-bit words for each pixel. I think I must have confuzzled that with the fact that the transaction bandwidth requirements *do* go up in portrait, compared to landscape, for every mode below 16-bit color, because for all of them you'll need to do one fetch per pixel and extract the bits you need, versus getting multiple pixels from each read. The one gotcha I wonder about: since we're dealing with SDRAM, is it faster to read from linear addresses than from random ones? (My eyes glaze over whenever I try to remember whatever I ever knew about "page mode" and such.)
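To put numbers on the corrected claim, here is a small sketch that tabulates 16-bit fetches per displayed pixel when the rotation is done on the output side. The function name and the simple model (one fetch per pixel for any rotated mode below the word width, no caching of fetched words) are my own assumptions, not a measured result.

```python
from fractions import Fraction


def fetches_per_pixel(bits_per_pixel, rotated, word_bits=16):
    """Word fetches needed per displayed pixel on the output side.

    Assumed model: landscape scan-out uses every bit of each fetched word;
    rotated scan-out uses only one pixel's worth of bits from each fetched word.
    """
    if bits_per_pixel >= word_bits:
        # 16-bit and 32-bit modes read whole contiguous words either way.
        return Fraction(bits_per_pixel, word_bits)
    if rotated:
        return Fraction(1)  # one fetch per pixel, then extract the bits
    return Fraction(bits_per_pixel, word_bits)  # several pixels per fetch


if __name__ == "__main__":
    for bpp in (1, 2, 4, 8, 16, 32):
        print(bpp, fetches_per_pixel(bpp, rotated=False), fetches_per_pixel(bpp, rotated=True))
```

In this model the two orientations only differ below 16 bits per pixel, and rotated 1-bit mode costs the same one fetch per pixel as 16-bit mode, half of what 32-bit color needs in either orientation.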