In reply to James: It's possible to have multiple messages in flight between two tasks, but most applications will assume that the first datasave has been abandoned if they receive another one with a different session id.
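To make that concrete, here is a rough sketch of the receiving side's bookkeeping. It is only an illustration of the "newer session id supersedes the older one" behaviour, not real Wimp message handling: the class, method names and the task/session values are all made up for the example.

```python
class DataSaveReceiver:
    """Hypothetical receiver that tracks one pending datasave per sender."""

    def __init__(self):
        # sender task -> session id of the transfer currently in progress
        self.pending = {}

    def on_datasave(self, sender, session_id):
        """Record a new datasave; return the abandoned session id, if any."""
        previous = self.pending.get(sender)
        self.pending[sender] = session_id
        if previous is not None and previous != session_id:
            # A different session id arrived: assume the earlier one was abandoned.
            return previous
        return None

    def on_complete(self, sender, session_id):
        """Finish a transfer; completions for abandoned sessions are ignored."""
        if self.pending.get(sender) == session_id:
            del self.pending[sender]
            return True
        return False
```

So if a task sends a datasave with session 1 and then another with session 2, the receiver drops session 1 and any late completion for it is ignored.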
In reply to Stefano: It was possible to use an external maths co-processor with older processors such as the ARM2, ARM3 and ARM7, and WSS even managed to get some useful gains from farming out FP from an ARM7 to the x86 second processor. However, with faster processors the overhead of synchronisation and data transfer via shared memory (especially as the StrongARM and later lack cache snooping) far exceeds any benefit from an external accelerator. The FP unit needs to be on chip, taking instructions from the main pipeline and gaining direct access to the data cache.
As more sophisticated audio and video codecs come into use, mobile and embedded processors are increasingly gaining FP support. Many recent ARM cores now have integrated Vector Floating Point co-processors. These only work in single precision and are not compatible with the old ARM FP instruction set, but given a suitably adapted FPem module, they would give significant benefits for some code.