The podule bus is only the bottleneck for operations that hardly benefit from acceleration in the first place (e.g. line drawing). These operations have little or no impact on overall desktop performance.
If you did want to put in the effort to make them as fast as possible anyway, it's simply a matter of limiting the amount of data going across the podule bus. ATI chipsets and SM50X have the same potential for that, because their programming models are virtually identical.
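
To put rough numbers on "limiting the amount of data", here is a back-of-envelope sketch in Python. It compares re-sending all drawing state for every line against setting the shared state (colour, clip rectangle and so on) once and then sending only what changes per line. The function name and the register-write counts are made up for illustration; they are not taken from ATI or SM50X documentation, and nothing here is a measurement.

```python
# Rough estimate of podule bus traffic for a batch of accelerated line draws.
# All counts below are illustrative assumptions, not figures for any real chip.

def bus_bytes(n_lines: int,
              per_line_writes: int,
              shared_writes: int,
              resend_shared: bool,
              bytes_per_write: int = 4) -> int:
    """Return the bytes written across the bus for n_lines line draws.

    per_line_writes: register writes that differ for every line
                     (e.g. endpoints and the draw command).
    shared_writes:   register writes that are the same for the whole batch
                     (e.g. colour, clip rectangle).
    resend_shared:   True if the driver rewrites the shared state per line,
                     False if it writes it once for the batch.
    """
    if n_lines <= 0:
        return 0
    shared_total = shared_writes * (n_lines if resend_shared else 1)
    return (shared_total + per_line_writes * n_lines) * bytes_per_write


if __name__ == "__main__":
    # Hypothetical batch: 1000 lines, 3 changing writes and 4 shared writes each.
    naive = bus_bytes(1000, per_line_writes=3, shared_writes=4, resend_shared=True)
    lean = bus_bytes(1000, per_line_writes=3, shared_writes=4, resend_shared=False)
    print(f"resend everything: {naive} bytes")
    print(f"shared state once: {lean} bytes")
```

With those assumed counts the lean version sends well under half the data, and the saving depends only on how the driver is written, not on which of the two chip families sits at the other end of the bus.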