Unless I've misunderstood, they're talking about the "general speed of rectangular drawing operations" rather than the transfer rate from the board to the screen, which is what your calculation gives (I think). So the boards can shift several screens' worth of pixels around the screen per frame.
"How do you plan to do this, without introducing a new Poll call?"
I plan to add a new Poll call.
But only for new applications; old RO apps get to see the same behaviour as before.
Actually on umpteenth thought, even that may not be necessary.
How about a Poll response code (for modern applications, and only when requested) that indicates that the application's request has been queued, but the Wimp is busy? The application may then do some non-Wimp work (decoding video or audio, for example) before trying again. As long as it doesn't wait too long between tries and doesn't hog the processor in the meantime, it should provide a good, RO-like user experience.
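Here is a rough sketch of how that might look from the application's side. It is a toy model in Python, not real Wimp code: `Reply.BUSY`, `ToyWimp` and `run_task` are all names made up for illustration, and nothing here corresponds to an existing Wimp_Poll reason code.

```python
from collections import deque
from enum import Enum, auto


class Reply(Enum):
    BUSY = auto()   # hypothetical: request queued, Wimp busy with another task
    EVENT = auto()  # an ordinary poll result for this task


class ToyWimp:
    """Stand-in for the Wimp: busy for a fixed number of polls, then delivers events."""

    def __init__(self, busy_polls, events):
        self.busy_polls = busy_polls
        self.events = deque(events)

    def poll(self):
        if self.busy_polls > 0:
            self.busy_polls -= 1
            return Reply.BUSY, None
        if self.events:
            return Reply.EVENT, self.events.popleft()
        return Reply.EVENT, None  # nothing pending; plays the role of a null event


def run_task(wimp, work_slices, max_polls=100):
    """Poll loop for a 'modern' task: one short slice of non-Wimp work per BUSY reply."""
    log = []
    work = deque(work_slices)
    for _ in range(max_polls):
        reply, event = wimp.poll()
        if reply is Reply.BUSY:
            if work:
                # Keep each slice short so the next poll is not delayed too long.
                log.append(("work", work.popleft()))
            continue
        if event is None:
            break  # toy model only: stop once the event queue is drained
        log.append(("event", event))
    return log


log = run_task(ToyWimp(2, ["redraw", "close"]), ["frame0", "frame1", "frame2"])
```

An old application would never ask for the busy reply, so it would simply block in Poll as it does today.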
"There really aren't any shortcuts here, either you do things properly (which is now beyond our resources), or you'll just make the user experience worse."
It doesn't have to be that bad.
Keep some things cooperative (opening/moving windows, RO-style inter-process messaging, user input), but allow multiple processes to draw their windows simultaneously and use other IPC mechanisms for things like sound. That way, you can have your video playing uninterrupted in one window while another task is rendering a draw image.
When you move a window, window updates pause temporarily, but you're concentrating on moving the window (and the sound from the movie should continue uninterrupted).
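To make the split concrete, here is a toy tick-by-tick model in Python. It only illustrates the scheduling rule described above; `simulate` and the task names are invented for the example, and a real implementation would involve actual processes rather than a loop.

```python
def simulate(ticks, drag_ticks):
    """Return, for each tick, the list of activities that get to run.

    Sound always runs, window redraws run together unless a drag is in
    progress, and a drag is handled cooperatively on its own.
    """
    timeline = []
    for t in range(ticks):
        ran = ["sound"]  # sound goes through its own IPC path and is never paused
        if t in drag_ticks:
            ran.append("drag")  # cooperative part: window updates pause during the move
        else:
            ran += ["video", "draw"]  # both tasks redraw their own windows in the same tick
        timeline.append(ran)
    return timeline


timeline = simulate(3, {1})
```

In this run the window is dragged during the middle tick, so the video and draw redraws are skipped for that tick only, while sound appears in every tick.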