I wondered why you were all hankering after a pre-emptive OS. Killermike is right; his suggestion is actually an old Unix idea called “forking”. Obviously, since the A9 is symmetrical (all cores can access the same memory), you don’t need to replicate the whole 4MB of the OS, just a unique data area for each instance. You also need to split off the API from the OS. Clearly, access to hardware and task management must be centrally arbitrated, let’s say by the Central Control Program (remember Tron?), which always resides on the same core. The key to porting RISC OS to a multiprocessor system then lies in the SWI decode routine. When a task calls its instance of the SWI decode routine, that routine must determine whether it can handle the call itself or whether it must pass the request on to the CCP. Provided the application behaves well, it does not need to be aware that it is operating on a multiprocessor system.
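To make the SWI decode idea concrete, here is a rough sketch in Python rather than ARM code. Everything in it is an assumption for illustration: the class names, the `swi` method, and above all the choice of which calls count as "local". A real decoder would work on SWI numbers, and the real split would depend on which SWIs touch shared hardware or task state.

```python
from dataclasses import dataclass, field

# Hypothetical split: calls a per-task API instance could service by itself.
# Pure conversions are used here only because they need no hardware access.
LOCAL_SWIS = {"OS_ConvertHex8", "OS_ConvertInteger4"}


@dataclass
class CentralControlProgram:
    """Stand-in for the CCP: the single arbiter for hardware and task management."""
    log: list = field(default_factory=list)

    def handle(self, task_id, swi_name, args):
        # Record who asked for what; a real CCP would arbitrate the request here.
        self.log.append((task_id, swi_name))
        return f"ccp:{swi_name}"


@dataclass
class TaskApi:
    """Per-task API instance with its own private data area."""
    task_id: int
    ccp: CentralControlProgram

    def swi(self, swi_name, *args):
        # The decode step: handle the call locally, or pass it on to the CCP.
        if swi_name in LOCAL_SWIS:
            return f"local:{swi_name}"
        return self.ccp.handle(self.task_id, swi_name, args)
```

The application only ever calls `swi`, so it cannot tell whether the call was serviced on its own core or forwarded, which is the point of doing the split in the decoder.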
So, to put flesh on the bones of killermike’s suggestion: a CCP, responsible for allocating hardware and co-ordinating tasks, is permanently allocated to the first core. When a new task is requested, its API is set up in memory along with the task, and the pair are allocated to a core. Once the task is initialised, it returns control via Wimp_Poll. Its API saves the processor state and relinquishes control of the processor core. When Wimp_Poll returns, the CCP allocates a core to the API and task, and starts the API from where it left off.
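The life cycle above can be mocked up in a few lines. This is only a toy model under my own assumptions: the names `Ccp`, `task` and `trace` are invented, each `yield` stands in for a call to Wimp_Poll, the generator's suspended frame stands in for the saved processor state, and the "cores" are simulated one after another rather than truly in parallel.

```python
from collections import deque


def task(steps):
    """A toy task: each yield stands in for a call to Wimp_Poll."""
    for step in steps:
        yield step


class Ccp:
    """Toy CCP: lives on core 0 and hands the remaining cores to tasks."""

    def __init__(self, n_cores):
        self.free_cores = deque(range(1, n_cores))  # core 0 is reserved for the CCP
        self.ready = deque()   # tasks waiting for a core
        self.trace = []        # (core, task name, step) records

    def start(self, name, gen):
        self.ready.append((name, gen))

    def run(self):
        while self.ready:
            # Give one free core to each ready task, as far as the cores go.
            batch = []
            while self.ready and self.free_cores:
                core = self.free_cores.popleft()
                name, gen = self.ready.popleft()
                batch.append((core, name, gen))
            for core, name, gen in batch:
                try:
                    step = next(gen)                 # resume where it left off
                    self.trace.append((core, name, step))
                    self.ready.append((name, gen))   # it polled again
                except StopIteration:
                    pass                             # the task has quit
                self.free_cores.append(core)         # core is relinquished
```

For example, starting two tasks on a three-core system with `ccp = Ccp(3)`, `ccp.start("A", task(["init", "redraw"]))`, `ccp.start("B", task(["init"]))` and then `ccp.run()` leaves a trace in which neither task ever lands on core 0.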
You don’t need PMT (pre-emptive multitasking) to take advantage of a multiprocessor system: CMT (co-operative multitasking) lends itself well to multiprocessing, and PMT is inefficient on a single-user system. For those who don’t know, the original purpose of PMT was to allow multiple users simultaneous access to a single processor. Windows XP and Vista are evolutions of Windows NT, a multiuser competitor to Unix. Multithreading was originally part of the PMT solution. Back in the days of yore (early 80s) I used to manage a Perkin Elmer computer with 8 register sets (of 32 registers). You can imagine the overhead of providing simultaneous access to 5 graphical terminals on a 25MHz system using one register set. So the Perkin Elmer used block multithreading, which is not the same as the simultaneous multithreading you get on the Intel Atom. The Cortex A9 is not multithreaded, and I don’t believe its successor will be either. Instead, it uses something called speculative out-of-order execution. I am not sure how well it will deal with old calculator stacks.
To take full advantage of the multiprocessor system (one task split over two processors), the task needs to be multithreaded. If Simtec wrote a module to do that, best drag it out and dust it off. Alternatively, you could implement a very coarse multithreading strategy by splitting background activities (printing, saving files, etc.) off into a separate task. Even without multithreading, the system won’t slow down so much when you add more tasks, because they are spread across the cores. With multithreading, a single program can speed up as cores become available.
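The coarse strategy might look something like this. It is a sketch only: a Python thread and queue stand in for a separate RISC OS task and whatever message passing it would use, and the names `background_task` and `run_foreground` are made up for the example.

```python
import queue
import threading


def background_task(jobs, done):
    """Runs as its own task: takes slow jobs (saving, printing) off the foreground."""
    while True:
        job = jobs.get()
        if job is None:        # sentinel: no more work, shut down
            break
        name, data = job
        done.append((name, len(data)))   # stand-in for the slow save or print


def run_foreground(documents):
    """Foreground task: hands each job over and carries on straight away."""
    jobs = queue.Queue()
    done = []
    worker = threading.Thread(target=background_task, args=(jobs, done))
    worker.start()
    handled = []
    for name, data in documents:
        jobs.put((name, data))   # hand off the slow part
        handled.append(name)     # foreground stays free for user events
    jobs.put(None)
    worker.join()
    return handled, done
```

The foreground never waits for a save to finish; it only waits at the very end, when it shuts the background task down.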
BTW, I'm new to this thread. I stopped using RISC OS five years ago (it was a tough call), but I would pay a reasonable fee to be able to use !Organiser on a netbook.