
Of C Libraries and RISC OS
Published on 3rd Jan 2005, 14:35:35, source is drobe.co.uk
By Peter Naulls
Unixlib versus the World
In my latest article, I'll again talk about some of the obscure, but very important, issues related to porting work I've been doing which are also crucial to many RISC OS programs. I've chosen to discuss the issues involved with C and C libraries on RISC OS. This is an even longer article than my usual offerings, but if you can hang in there, I hope you'll come out with a much improved understanding.
RISC OS and C
As you may be aware, the majority of new RISC OS programs, and a considerable proportion of older programs, are written using the C programming language. Indeed, much of RISC OS itself is written in C. I'm not going to discuss C in great detail, since you'll probably be aware of it already at some level, and the details of programming it are mostly irrelevant to this article. What's most important to understand about C is that to ultimately do anything useful, programs need to either call functions which do some processing, or call the operating system services in some way (possibly by a SWI or assembler wrapper on RISC OS).
Apart from code provided by the program itself, these functions are stored in libraries, and are combined with the program when it is built: headers describing the library functionality are used at compile time, and the final executable is created from program code and library code at link time.
C Libraries
There's an important distinction to be had here. In the first instance, there are libraries written in C (and sometimes with small amounts of assembler). DeskLib and OSLib are both examples of these. These types of libraries will mostly not concern us for the purposes of this article. The other type of library, whilst still written in C, contains functionality that is required to be there for any C implementation. The two libraries on RISC OS that fulfil this role are the SharedCLibrary and Unixlib. I'll refer to them, and others in this article as "a C library".
The functionality required to be present in a C library is covered by various specifications including the ANSI C specification and C99. The behaviour of the C compiler is also crucial in ensuring that specifications are met. There are also further degrees of functionality that may be met by a given implementation. For example, Unix systems will try to meet requirements laid out by POSIX and several related specifications. There are also GNU extensions, and under RISC OS, specific behaviour Acorn decided was required, and therefore implemented by RISC OS C libraries.
RISC OS C libraries
One of the questions you might ask is why RISC OS has two libraries. The question is rather more complex than it might first seem because of various versions of the libraries and ways of generating code that uses them.
The short(er) answer is that they meet two different sets of requirements. The SharedCLibrary was originally developed by Acorn to support C (and other high level language) programs under RISC OS. Its primary aim is to fully and correctly implement the ANSI C specification (and later C99), and little more - although it has had a few accretions over time. It's called the SharedCLibrary because it's a module and hence shared - programs talk to it via a small amount of code called stubs. This sharing means that it doesn't have to be combined with each and every C program on RISC OS, which would make them bigger, and it also has the benefit that if bugs are found, a new version can be soft loaded. The SharedCLibrary has appeared in ROM in RISC OS machines for a long time, although many RiscPCs will now be running a soft-loaded version provided by Castle. We'll talk a little about this later on.
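The idea of stubs can be pictured with a purely conceptual sketch in portable C. This is not how Acorn's or Castle's stubs are actually written: the names entry_table, stubs_init, stub_length and stub_compare are invented here, and the standard strlen and strcmp merely stand in for routines that would really live in the module.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One slot per library routine the program may call. */
struct entry_table {
    size_t (*length)(const char *);
    int    (*compare)(const char *, const char *);
};

static struct entry_table table;   /* empty until stubs_init() runs */

/* Stand-in for the shared library handing over its entry points.
   In this sketch we simply point at the standard functions. */
void stubs_init(void)
{
    table.length  = strlen;
    table.compare = strcmp;
}

/* The "stubs": tiny functions linked into the program that forward
   each call through the table to the shared code. */
size_t stub_length(const char *s)
{
    assert(table.length != NULL);   /* stubs_init() must run first */
    return table.length(s);
}

int stub_compare(const char *a, const char *b)
{
    assert(table.compare != NULL);  /* stubs_init() must run first */
    return table.compare(a, b);
}
```

The program itself only ever contains the small forwarding functions and the table, which is why replacing the code behind the table fixes every program at once.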
The SharedCLibrary has filled these roles admirably over the years, providing services to countless RISC OS C programs. The problem is that when it comes to converting programs from other platforms, (i.e. porting) the SharedCLibrary may let you down. In fairness, Acorn did provide TCPIPLibs, which is an implementation of BSD sockets, used by Unix networking programs. They also did provide their own "unixlib", which had a handful of functions often used by Unix programs, and plenty of programs have been ported by using the SharedCLibrary, but it's a long way from a satisfactory and complete solution.
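As a rough illustration of the gap: strdup and strcasecmp turn up in a great many Unix programs, yet neither is part of ANSI C or C99 (both come from POSIX). When porting against a strictly ANSI library you end up writing small replacements along these lines. The names port_strdup and port_strcasecmp are made up for this sketch and are not taken from the SharedCLibrary, Acorn's "unixlib" or Unixlib.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for POSIX strdup(): returns a malloc'd copy
   of s, or NULL if memory runs out. The caller must free() it. */
char *port_strdup(const char *s)
{
    size_t len;
    char *copy;

    assert(s != NULL);
    len = strlen(s) + 1;             /* include the terminating NUL */
    copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Hypothetical stand-in for POSIX strcasecmp(): compares two strings
   ignoring case. Returns <0, 0 or >0 in the manner of strcmp(). */
int port_strcasecmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);

        if (ca != cb)
            return ca - cb;
        if (ca == '\0')
            return 0;                /* both strings ended together */
    }
}
```

Two functions are easy enough. The trouble starts when a program expects dozens of them, along with the headers that declare them.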
To facilitate porting on a larger scale, and address issues it would be hard to resolve by simply using the SharedCLibrary or building onto it, it was decided that the best solution was to start from scratch around 1995. In truth, this wasn't entirely why it came about - Unixlib was originally created to support the GCC 2.4.5 port to RISC OS - but the result was the same. Unixlib became an alternative to the SharedCLibrary. Both are compliant with the ANSI C and C99 specifications, and except for a small number of cases due to requirements of greater functionality, Unixlib will work exactly the same when used to compile a program that also works under the SharedCLibrary.
As an aside, the Cygwin DLL under Windows performs much the same role as Unixlib does in RISC OS. But on most systems, though certainly not all, the norm is to have only one C library.
Why Unixlib?
The point behind Unixlib becomes obvious if you've ever tried to compile a program written for a Unix system on RISC OS. Whilst it's true that there are many programs which are well behaved and don't use any non-ANSI features and will work as expected using the SharedCLibrary, there are many more that won't compile - they use non-standard features, GNU extensions, or functions simply not specified in standard C. Indeed, Unixlib offers a whole host of advantages over the SharedCLibrary:
- Filename translation - Many Unix programs assume filename formats quite different to those in use in RISC OS. We covered this in detail in a previous article. The upshot is that the default behaviour for Unixlib is to treat all filenames as Unix format unless they are unambiguously a RISC OS filename, and only translate at the lowest interface to RISC OS. That is, the SWI call to OS_File and friends.
- File and Socket handling - Unlike the separation of SCL and its sockets library, Unixlib's implementation is integrated with the open/close/read/write functions like a real Unix system and includes handling of various special files without any special RISC OS code in the program.
- Extra headers and functions - Probably the most obvious. Unixlib contains much of the functionality specified by POSIX and other specifications. Some of these functions are dummies, some aren't complete, and some don't make sense to implement on RISC OS. Not comprehensive by any means, but an extensive and heroic effort, and sufficient for a very large number of Unix programs to work with little or no modification. Very useful.
- Open Source - Any Tom, Dick and Harry can have a go at modifying Unixlib to fix bugs, add features, or generally make a hash of it. In fairness, there isn't much call to modify the SharedCLibrary since it is complete and essentially bug free, but potential modifications to Unixlib are to features the SCL doesn't provide.
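To make the filename translation point above a little more concrete, here is a deliberately simplified sketch. RISC OS uses '.' to separate directories and, by convention, '/' where Unix would put a '.' in a leaf name, so the most basic step is to swap the two characters. The function name unix_to_riscos_sketch is hypothetical, and Unixlib's real translation code handles far more cases than this (for a start, deciding whether a name is already in RISC OS format).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert a relative Unix-style path to RISC OS
   form by swapping '/' and '.'. Writes the result, including the NUL,
   to out. Returns 0 on success, or -1 if out (outsize bytes) is too
   small to hold it. */
int unix_to_riscos_sketch(const char *in, char *out, size_t outsize)
{
    size_t len, i;

    assert(in != NULL && out != NULL);

    len = strlen(in);
    if (len + 1 > outsize)
        return -1;

    /* i runs up to and including len so the NUL is copied too. */
    for (i = 0; i <= len; i++) {
        if (in[i] == '/')
            out[i] = '.';
        else if (in[i] == '.')
            out[i] = '/';
        else
            out[i] = in[i];
    }
    return 0;
}
```

Given "src/main.c", the sketch produces "src.main/c". Doing this only at the lowest level means the ported program can keep building and parsing names in Unix form throughout.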
And the down sides? Not so much of an issue nowadays, but Unixlib adds a minimum of 100k to your program to implement the basic level of C functionality and Unix compatibility. This is because Unixlib is linked, like most RISC OS libraries, statically to your program. In contrast to the SharedCLibrary, it's not a module so the executable must contain all the code it might potentially run. This is partly because GCC has never been able to generate module code, and partly because it would be extremely involved. In any case, this is often dominated in size by much larger, and also statically linked, ported libraries, so there is limited motivation to do this. A much more useful solution would be a RISC OS shared library system which would allow creation of a shared Unixlib with little extra effort, and with the considerable benefit of automatically propagating any fixes rather than leaving old binaries containing potentially disastrous bugs.
What is SharedUnixLibrary?
I want to say very little about this, since it's been a huge cause of confusion for no good reason. Its sole purpose is to catch a callback that RISC OS makes when an application quits. RISC OS helpfully insists on calling this (for a variety of reasons) when the listening application might be paged out, thereby jumping to random code and quickly causing your computer to freeze or crash some unrelated program. This code is therefore in a module and only passes the message onto Unixlib when it is safe to do so. The issue has been fixed in Select/Adjust, but Unixlib has not been modified to avoid this on machines running those versions of RISC OS.
Compilers and C Libraries
To complete the picture, I'd like to briefly discuss the interaction between the two main RISC OS compilers - Norcroft (Acorn C/C++) and GCC - and the two C libraries.
By default, GCC will compile your program against Unixlib and Unixlib headers, but this is not set in stone, and with a command line switch, you can compile with the alternate stubs and headers for the SharedCLibrary provided with GCC.
Norcroft, by contrast, unsurprisingly compiles by default with the SharedCLibrary stubs and headers it is bundled with. This too can be modified with some switches. These options are detailed on my C programming site.
Because Norcroft is a strict C compiler, and doesn't understand various GNU extensions implemented by GCC and has a different range of warnings, maintaining the ability to use Unixlib, let alone compile it, with Norcroft has become problematic. Not because it's impossible to modify Unixlib to perform sensibly, but because of the maintenance factor: Unixlib has had considerable changes in the last two years, and it may be that the current version is the last it is possible to build with Norcroft, especially in light of changes I'll discuss in the next section. Fortunately, the number of people doing this is quite limited.
To round this off, there's one final complexity. The advent of the 32-bit SharedCLibrary means that a new version of stubs, the glue library that talks to the SCL module, has been required. That's well and good, and of course a new version came with the Castle C/C++ suite and GCC's stubs have also been updated. The problem is that once you've built a program against these stubs, you have to have the 32-bit SharedCLibrary loaded all the time. Generally, this isn't a big deal - on RISC OS 5, it's in ROM, and on most other systems it's loaded by the first program that requires it, or during the boot sequence.
But there are some instances where this isn't practical, or you'd prefer not to load the 32-bit SharedCLibrary, or it's a system where it simply hasn't been installed. To avoid this, you can use RISCOS Ltd's "StubsG" instead of the regular stubs provided with Castle's C/C++ or older versions of Acorn C/C++, and your program will work equally well on systems that do and do not have the 32-bit SCL loaded.
GCC doesn't provide an equivalent of StubsG, because the task involved in making one is quite involved, but there's nothing stopping you using StubsG with GCC and instructions are provided for doing so.
The problem with Unixlib
We've made great progress with Unixlib - and in 2004 we have no less than 70 ChangeLog entries. Whilst many of these changes were very important to improving Unix compatibility and generally fixing bugs, they can mostly be divided into two kinds. The first is in RISC OS functionality and the crucial translation between Unix behaviour and RISC OS behaviour: the magic that makes Unixlib work as a sensible system. The other is code which, whilst required to be in a C library, contains little or no RISC OS specific code. In many cases, this code has been taken from other systems - sometimes modified to fit into Unixlib - and sometimes used verbatim. In other instances, the code has been written from scratch. Because Unixlib exists on a minority platform, and has only been widely used as a result of programs converted to RISC OS in the last few years, it's not always the case that code in Unixlib has been widely tested with the huge range of inputs that result from running a large number of programs.
Many instances of this problem rear their heads much earlier on. Whilst spotting programming problems earlier on almost always saves time and effort, they are no less annoying. The issue is the layout and content of C header files - the files that advertise the behaviour of the C library. These must be laid out with certain declarations appearing (or not appearing), and they must also advertise all the functionality that is present, and none that isn't. Faults in header files can stop a program compiling at all: if some unusual sequence of includes doesn't advertise the right behaviour, the program might not be buildable. Or it might be configured incorrectly, resulting in different problems during compile or when running.
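A small, self-contained sketch may help show what "advertising" means in practice. Everything here is hypothetical: the header "demo.h" (shown inline), the macros DEMO_EXTENSIONS and DEMO_HAVE_COUNT_ANY, and the demo_ functions are invented for illustration and do not come from Unixlib or any other real library.

```c
#include <assert.h>
#include <string.h>

/* --- Contents of a hypothetical library header, "demo.h" --- */
#ifndef DEMO_H
#define DEMO_H

/* Always advertised. */
size_t demo_count(const char *s, char c);

/* Advertised only when the program asks for extensions. The macro
   DEMO_HAVE_COUNT_ANY tells the program the function really exists. */
#ifdef DEMO_EXTENSIONS
#define DEMO_HAVE_COUNT_ANY 1
size_t demo_count_any(const char *s, const char *set);
#endif

#endif /* DEMO_H */

/* --- The library's side --- */
size_t demo_count(const char *s, char c)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}

#ifdef DEMO_EXTENSIONS
size_t demo_count_any(const char *s, const char *set)
{
    size_t n = 0;

    assert(s != NULL && set != NULL);
    for (; *s != '\0'; s++)
        if (strchr(set, *s) != NULL)
            n++;
    return n;
}
#endif

/* --- The program's side: trusts whatever the header advertises --- */
size_t count_vowels(const char *s)
{
#ifdef DEMO_HAVE_COUNT_ANY
    return demo_count_any(s, "aeiou");
#else
    /* Fallback built from the function that is always available. */
    return demo_count(s, 'a') + demo_count(s, 'e') + demo_count(s, 'i')
         + demo_count(s, 'o') + demo_count(s, 'u');
#endif
}
```

If the header defined DEMO_HAVE_COUNT_ANY without the library supplying demo_count_any, the program would fail at link time. If it declared the function but forgot the macro, the program would silently take the fallback path. Both are the kind of fault described above.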
It's true that there are tools to check the behaviour of header files, and that these issues can be fixed given sufficient effort, but it's a huge amount of time not available to a limited number of developers. There are still further bugs in Unixlib that don't even have entries in the bugs database because it's not even clear what's causing them. Such bugs have prevented programs like GNUChess and bash being released for RISC OS.
Beyond Unixlib
I mentioned that Unixlib contains code taken from elsewhere. In particular, Unixlib contains code taken from the GNU C Library. The GNU C Library, or glibc, is certainly not the only free Unix C library, but it is probably the most accessible, widely used (it's used on almost all Linux systems) and it's written to support use on multiple operating systems. Indeed, with reference to the previous section, Unixlib uses a small number of header files taken from glibc. With the assumption that glibc contains far fewer bugs than Unixlib and that the programs we're trying to make work on RISC OS have long ago been built against glibc and are known to work, this is clearly a step in the right direction for compatibility.
You might therefore be asking why Unixlib doesn't use more of glibc. Or perhaps, why RISC OS isn't using a port of glibc instead of Unixlib. If we were starting from scratch, a port of glibc would probably be the correct way to go, but 10 years ago, when Unixlib was begun, things were quite different. And right now, we have a great deal of code in Unixlib which is the crucial translation between Unix and RISC OS and simply wouldn't exist in any other C library.
As for using more of glibc in Unixlib, this is a solution which I am now actively investigating, but to import much more code than we have presently means a radical alteration of the way Unixlib is put together. glibc's files are arranged quite differently in order to support multiple systems, and it has a proliferation of public and private header files which must be used in the precise manner dictated by glibc's build system. To compound the issue, glibc uses some structures differently to Unixlib - notably, the one containing information about open files, so we must be very careful in mixing files. It also contains much more extensive wide character and locale handling and this causes its own set of problems, especially since locale handling in Unixlib makes use of the Territory module. I've already had to modify GCC to support the GNU alias attributes - something that glibc makes extensive use of.
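For readers who haven't met it, the alias attribute is a GCC extension that gives one function body a second symbol name. The sketch below shows the bare syntax only. The function names are invented for illustration, glibc wraps its own uses in macros that aren't shown here, and the attribute needs a compiler and object format that support it (GCC with ELF, for example).

```c
#include <assert.h>
#include <string.h>

/* The real implementation, under an internal name (hypothetical). */
size_t impl_strlen_sketch(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* The public name: a second symbol referring to the very same code.
   The target must be defined in this same source file. */
size_t public_strlen_sketch(const char *s)
    __attribute__((alias("impl_strlen_sketch")));
```

Calling public_strlen_sketch runs exactly the code of impl_strlen_sketch, with no wrapper function in between. A compiler that doesn't understand the attribute cannot build code written this way.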
There are other reasons to use parts of glibc too - it contains a considerable portion of the code required to support use of shared libraries. Ultimately, the point is a bug and maintenance avoidance exercise. It may indeed prove that the best solution is to integrate RISC OS parts from Unixlib into glibc. Whilst this might mean we have a less free hand in modifying code, it has the same benefit as other RISC OS ports - improvements and bug fixes (at least, to generic code) are made by an army of people with little or no knowledge of RISC OS.
Licensing
To round off this feature, I will briefly mention some parallel issues that have been considered during Unixlib's development. As with much open source software, Unixlib is governed by well known licences. In particular, most of Unixlib is covered by the BSD licence and by the GNU Lesser General Public Licence (LGPL), although there are some other files that are covered by some more obscure but compatible licences. Unixlib in the last few years has also managed to acquire some code that is not LGPL, but GPL. I won't go into the implications of this here, but the number of files is small, and we intend in the next maintenance release of GCC for RISC OS to replace these with versions with alternate licences, in the hope that attempting to clarify the legal standing of Unixlib might encourage greater usage. For reference, glibc mostly uses LGPL code and a variety of BSD-style licences.
Onwards
Whilst my present work with glibc is a long way from being anything but a research exercise, and I won't be giving anyone any programs using it for a long time, it's clear that continuing to patch and improve Unixlib piecemeal can only go so far. What the precise solution is likely to be for RISC OS is going to come down to a good deal of debate, experimentation, and plenty of persistence. It must also complement the related work on bringing ELF to RISC OS as well as the effort to port GCC 4.0 to RISC OS. In particular, this last effort relies heavily on the way the C library works, so this must be done correctly.
In conclusion, it looks like we have our work cut out. If you've got any input of your own, we'd be glad to hear from you.
Links
GCCSDK and Unixlib
GNU C Library
RISCOS Ltd's StubsG
Castle's C/C++ Development Suite
RISC OS C Programming
Design and concept (c) Fudgecake Design, 1999 - 2001. Content (c) The Drobe Team, 1999 - 2006. See www.drobe.co.uk for more information. For non-commercial personal use only.