For the AltiVec acceleration, I remember TenFourFox once made a point of advertising that it used it in a _lot_ of places, including some you just described as not amenable to vectorization, which makes me wonder how that was done. I don't know how relevant or useful this is, but I believe its source code is still public if you want to see the details. For that matter, I think the author (Cameron Kaiser of Floodgap Systems, if memory serves) still posts here sometimes, though I don't remember his username (the one I thought he used doesn't come up in autocomplete).