They stress the importance of 24-bit compatibility. Given the 16-bit data transaction limitation of LC PDS, the NuBus adaptation would only need to handle 16 data bits, using a couple fewer address bits than the 24 available in that transaction mode?
There are still two different things floating around in your head that are sort of irreconcilable. Yes, Nubus supports byte and halfword transfers. (And unaligned transfers, which are a fact of life on the Motorola 68k: even though the 68000 architecture was designed as a "32 bit" ISA, the original implementations were effectively 16 bit CPUs (one variant of which actually had an 8-bit bus), so instructions can be aligned "off" from 32 bit words. x86 has the same issue, so any bus for these systems needs to be able to handle the situation.) Here's the thing about Nubus, though: *by definition* it always uses 32 bit multiplexed data/address words. You can't run a "24 bit subset" of it, even if your plan is to only land a 16 or 8 bit peripheral that only needs some subset of the address lines. "Unaligned data support" simply means that if a read/write is split across an awkward non-word border, it's smart enough to do two bus cycles and use the correct byte lanes to complete the transaction, instead of just losing half the transaction.
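To make the "two bus cycles and the correct byte lanes" idea concrete, here's a rough Python sketch. It is only an illustration of the general principle: the function name and the lane-mask convention (bit i = byte i of the word) are made up for this post and are not the actual Nubus transfer mode encoding.

```python
WORD_BYTES = 4  # one 32 bit bus word

def split_access(addr, size):
    """Break an access of `size` bytes starting at byte address `addr`
    into word-aligned bus cycles.

    Returns a list of (word_addr, lane_mask) tuples. Bit i of lane_mask
    is set when byte i of that word takes part in the cycle. The mask
    layout is a hypothetical convention, not the real Nubus encoding.
    """
    cycles = []
    end = addr + size
    while addr < end:
        word_addr = addr - (addr % WORD_BYTES)   # aligned word holding addr
        first = addr % WORD_BYTES                # first lane used
        last = min(end - word_addr, WORD_BYTES)  # one past the last lane used
        mask = 0
        for lane in range(first, last):
            mask |= 1 << lane
        cycles.append((word_addr, mask))
        addr = word_addr + WORD_BYTES            # continue in the next word
    return cycles
```

A 4 byte access at an aligned address comes back as one cycle using all four lanes, while the same access starting two bytes into a word comes back as two cycles, each using half the lanes.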
Let's go crazy here and pretend that you're intending to stick something like a Motorola 6821 PIA on a Nubus card. This is an 8-bit chip that has two 8 bit parallel ports and two sets of control registers. This effectively means that on an 8-bit computer you need 2 "address lines" to select between the I/O ports and the control registers. (Obviously this does not include the address lines you need to decode the memory address at which the chip actually resides in the address space, but let's pretend we're doing this in something like an Apple II that has pre-decoded slots that give you a "this line is active if this card is supposed to be" line; I believe Nubus in the Mac also has this, which makes "simple" Nubus cards a little more straightforward to build than, say, ISA cards, which do have to have full address decode on them.) What that document is saying about "narrow cards" (which technically is a different issue from "non-aligned transfers") is that, sure, with our incredibly brain-dead 8 bit card we can take some shortcuts and effectively not care about the upper 24 bits we get from *either* the address or data cycles of the multiplexed Nubus lines, provided we place the PIA's data lines on the correct side of a full 32 bit word and specify that when we're communicating with this chip we address the parallel ports and registers on word boundaries. (IE, instead of the PIA occupying "32 bits worth" of memory address space, it's going to occupy 128 bits.) This mechanism is explained on page 18 of that document you linked to, under the header "data byte placement".
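The arithmetic behind that 32 bits versus 128 bits comparison can be sketched like this. The lane choice and function names are hypothetical, picked only for illustration; the real placement is whatever the "data byte placement" section of the document specifies.

```python
PIA_REGISTERS = 4  # two select lines -> four register addresses
WORD_BYTES = 4     # one 32 bit Nubus word
PIA_LANE = 0       # hypothetical: the byte lane the PIA's data lines are wired to

def pia_register_offset(reg):
    """Byte offset of PIA register `reg` when each register sits on its
    own 32 bit word boundary, always on the same byte lane."""
    if not 0 <= reg < PIA_REGISTERS:
        raise ValueError("the PIA only has four register addresses")
    return reg * WORD_BYTES + PIA_LANE

def footprint_bits(word_spaced):
    """Address space the four registers occupy, in bits."""
    stride = WORD_BYTES if word_spaced else 1
    return PIA_REGISTERS * stride * 8
```

Packed back to back the four registers would cover 32 bits of address space; spaced one per word they cover 128 bits, and the card can ignore the other three lanes entirely.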
THESE SHORTCUTS SPECIFICALLY DON'T APPEAR TO APPLY TO RAM-LIKE CARDS, WHICH A VIDEO CARD IS.
Let's say you're sticking a video card with a 16 bit memory bus in a Nubus slot. You do not have the option of 32 bit word aligning a 16 bit RAM by declaring you're only using two of the four byte lanes, because that would effectively mean that every other halfword of the video buffer would be missing. Macintoshes expect packed linear framebuffers, and when they're doing a block copy, move, memory fill, whatever, they expect to treat the RAM on the video card just like it's main memory, IE, I can just copy this from here and splat it over there.
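Here's a toy simulation of why the two-lane trick breaks a linear framebuffer. Which two lanes are populated is an assumption made purely for the example.

```python
WORD_BYTES = 4
POPULATED_LANES = (0, 1)  # hypothetical: 16 bit RAM wired to two of the four lanes

def linear_copy_to_card(data):
    """Simulate the CPU copying `data` to consecutive card addresses.

    Returns what the card ends up holding at each byte offset; None marks
    an offset that falls on a lane with no RAM behind it, so the byte
    written there is simply lost.
    """
    stored = []
    for offset, byte in enumerate(data):
        lane = offset % WORD_BYTES
        stored.append(byte if lane in POPULATED_LANES else None)
    return stored
```

Copying eight consecutive bytes leaves holes at offsets 2, 3, 6 and 7: every other halfword is gone, which is exactly what a block copy or fill into a packed framebuffer can't tolerate.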
I'm pretty sure this is where the comments about PDS having the advantage of working directly with the 68030's dynamic bus sizing mechanism come from in the Apple card development manuals. A Nubus card can declare in its declaration ROM that it's only 16 bits wide, which the Slot Manager will honor at initialization (IE, it'll know to do word-level hopscotch when reading the rest of the ROM and piece it together in RAM rather than trying to execute any code on it in place), but that doesn't actually flip any hardware bits that will automatically map full word writes onto only the two byte lanes you stuck your VRAM on. It'd be on you to write a unique video driver (and probably rewrite Quickdraw) to deal with your weird non-linear video buffer. So unless there's something that says otherwise in the video card section, it looks to me like that just flat out won't work, and if you want to have an 8 or 16 bit wide VRAM on a Nubus Mac you're going to have to do it by sticking a 32 bit buffer on the Nubus end and blocking the subwords into your narrow memory with additional cycles. This might not be a big deal if your VRAM can cycle massively faster than the Mac can (see Trag's comment about how fast the 16 bit DDR interface on some of these FPGAs can go), but this *does* pose a barrier to literally taking an LC-slot video card and adapting it to Nubus. Best case, the card will run about half as fast as it would if it were 32 bit, all else being equal.
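The "32 bit buffer on the Nubus end" idea amounts to something like the sketch below: latch the full word, then push it into the narrow memory as two halfword cycles so the framebuffer stays packed and linear. This models the concept only; the function name is invented, and the big-endian halfword order is an assumption chosen to match the 68k.

```python
def bridge_write_word(vram, word_index, word):
    """Write one 32 bit word into a 16 bit wide memory.

    `vram` is a list of 16 bit halfwords. The word is split into two
    halfword cycles, most significant half first (assumed big-endian
    order). Returns the number of narrow memory cycles used.
    """
    high = (word >> 16) & 0xFFFF
    low = word & 0xFFFF
    vram[2 * word_index] = high      # first narrow cycle
    vram[2 * word_index + 1] = low   # second narrow cycle
    return 2
```

Every 32 bit bus write costs two cycles on the narrow side, which is where the "about half as fast, all else being equal" estimate comes from unless the narrow memory runs a lot faster than the bus.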
I've gone through the Apple development documentation looking for more information about byte lanes and, again, I could be wrong, but I don't see any indication that declaring the card as "narrow" in the byte lanes specifier does *anything* other than affect how the ROM is read. (Apparently even 32 bit cards might have an 8-bit declaration ROM, like the Toby video card that's used as the video card example; and make no mistake, its VRAM is 32 bits wide, being composed of 8 or 16 64Kx4 RAM chips.) Every indication to me is that a RAM-like card (versus some sort of register/data port device) needs to be able to deal with 32 bit reads/writes.
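For contrast, the ROM "hopscotch" that the byte lanes declaration does seem to control would look roughly like this. It's a guess at the general shape, not the Slot Manager's actual code; the function name and the single-lane layout are assumptions.

```python
WORD_BYTES = 4

def gather_narrow_rom(slot_bytes, lane):
    """Rebuild a contiguous ROM image from a slot address range in which
    only one byte lane of each 32 bit word has ROM behind it.

    `slot_bytes` is the raw byte view of the address range and `lane`
    is the populated lane (0..3); both are hypothetical inputs.
    """
    if not 0 <= lane < WORD_BYTES:
        raise ValueError("lane must be 0..3")
    return bytes(slot_bytes[lane::WORD_BYTES])
```

Software can do this once at startup for a ROM, but nothing comparable happens in hardware for every framebuffer access, which is the whole problem for RAM-like cards.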