Think I've realised why you might see this differently.
On OS 4 you cannot easily queue the commands. On OS 6, command queueing is used on both R128 and VPod (data aborts are used to stall and unstall the pipeline when direct memory access occurs), which is why OS 6 is faster for all cards.
Taking each operation in isolation, I agree you'd see little difference. But when you queue them together, polyhline operations are significantly faster than their direct memory access fill equivalents, even though the podule bus is still the bottleneck here.
As Rob's pointed out, podule bus logic may also come into play.
Either way, the numbers will speak for themselves - would anyone care to write a quick BASIC app or two to test graphics drawing ops on the different drivers?