Drobe :: The archives

RISC OS filename translation

By Peter Naulls. Published: 2nd Jan 2004, 21:03:48

Being understood in the outside world

Compatibility with other operating systems has long been important for all OSes, especially minority ones like RISC OS. A crucial part of this for us is how RISC OS handles filenames compared to the rest of the world. There's been recent discussion about the failings of LanManFS on both Usenet and the Iyonix mailing list, so we thought it might be appropriate to cover these issues now.

The Issue

A fully qualified RISC OS filename might be something like:

    ADFS::MyRiscPC.$.Documents.Letter (filetyped to text)

But under Unix, it might be:

    /home/peter/documents/letter.txt

Or Windows:

    C:\MyDocuments\letter.txt

We're not going to discuss the Windows/DOS format any further, since the issues are much the same, and the Unix format is, by and large, the canonical format for data swapping. However, it's clear that RISC OS and Unix formats differ considerably, and what's appropriate for translation depends upon the circumstance, and there are several we should consider:
  • URL file handling
  • Filenames over network filing systems (NFS, Samba)
  • File handling in RISC OS ports and source files
  • Files on ADFS filing systems under Linux

Simply Put

The simple case, which applies to all of these, is to swap the '.' and '/' characters. The result is reasonably obvious:

    RISC OS                          Unix
    file/txt                         file.txt
    program.data                     program/data
    SpriteFile (filetyped sprite)    SpriteFile,ff9


The last one is slightly unusual - in some cases, it's appropriate to add the ,xxx extension containing the RISC OS filetype to preserve it over systems which don't support this concept.
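As a rough illustration of the swap and the ,xxx suffix, here is a minimal Python sketch. The function names are hypothetical and are not taken from any RISC OS filing system or library; the sketch only handles relative names.

```python
# Translation table: '.' and '/' trade places, everything else is untouched.
_SWAP = str.maketrans("./", "/.")

def swap_separators(name: str) -> str:
    """Swap '.' and '/' in a relative filename (the same call works both ways)."""
    return name.translate(_SWAP)

def add_filetype_suffix(unix_name: str, filetype: int) -> str:
    """Append a ,xxx suffix holding the 12-bit RISC OS filetype as three hex digits."""
    return f"{unix_name},{filetype:03x}"

print(swap_separators("file/txt"))             # file.txt
print(swap_separators("program.data"))         # program/data
print(add_filetype_suffix("ReadMe", 0xFFF))    # ReadMe,fff
```

Because the swap is its own inverse, applying swap_separators twice returns the original name.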

There is one final issue - Unix filenames usually have everything represented on one filesystem under the root (the "/" directory), whilst RISC OS uses different file systems. Usually this is handled with something like:

    ATAFS::Erble.$.Internet.!Nettle

Becoming:

    /ATAFS::Erble/$/Internet/!Nettle/
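The mapping of a fully qualified name can be sketched in the same way. This is only an illustration in Python with hypothetical function names; the is_directory flag is an assumption added here to reproduce the trailing '/' shown in the example above.

```python
_SWAP = str.maketrans("./", "/.")

def riscos_to_unix_path(path: str, is_directory: bool = False) -> str:
    """Place the filing system and disc name under the Unix root."""
    unix = "/" + path.translate(_SWAP)
    return unix + "/" if is_directory else unix

def unix_to_riscos_path(path: str) -> str:
    """Reverse mapping: drop the root and any trailing '/', then swap back."""
    return path.strip("/").translate(_SWAP)

print(riscos_to_unix_path("ATAFS::Erble.$.Internet.!Nettle", is_directory=True))
# /ATAFS::Erble/$/Internet/!Nettle/
print(unix_to_riscos_path("/ATAFS::Erble/$/Internet/!Nettle/"))
# ATAFS::Erble.$.Internet.!Nettle
```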


URLs in RISC OS
Given the above, and because you've probably seen a file: style URL in RISC OS, the result won't be very surprising. RISC OS browsers and web servers need to perform the '.' and '/' swap. We therefore end up with URLs like:

    file:/ATAFS::Erble/$/Programming/SDL/SDL12/docs.html

When fetching remote files, the browser will look at the Content-type header or filename extension and do a MimeMap lookup to set its type.
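A minimal sketch of that lookup order, in Python. The two tables are tiny hypothetical stand-ins for the system MimeMap data, and the function name is invented for illustration; real browsers may differ in detail.

```python
from typing import Optional

# Hypothetical stand-ins for the MimeMap tables (not the real system data).
TYPE_BY_MIME = {"text/html": 0xFAF, "text/plain": 0xFFF, "image/jpeg": 0xC85}
TYPE_BY_EXT = {"html": 0xFAF, "txt": 0xFFF, "jpg": 0xC85, "jpeg": 0xC85}

def filetype_for_download(content_type: Optional[str], url_path: str) -> Optional[int]:
    """Prefer the Content-type header; fall back to the extension in the URL path."""
    if content_type:
        # Drop parameters such as "; charset=utf-8" before the lookup.
        mime = content_type.split(";", 1)[0].strip().lower()
        if mime in TYPE_BY_MIME:
            return TYPE_BY_MIME[mime]
    leaf = url_path.rsplit("/", 1)[-1]
    stem, dot, ext = leaf.rpartition(".")
    if dot and stem:
        return TYPE_BY_EXT.get(ext.lower())
    return None  # unknown: the caller decides what to do
```

For example, filetype_for_download("text/html; charset=utf-8", "/index.php") gives the HTML filetype from the header, regardless of the .php extension.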


Network filesystem filenames
This is probably the most contentious topic of them all - failure to do it properly will result in lots of frustrated users. Recently, Castle released an update to LanManFS, the Samba client bundled with RISC OS 5. Unfortunately, it had a number of issues, including an unhelpful default type for untyped files on the host.

The main issue with network filing systems - and there are several for RISC OS, including LanManFS, LanMan98, ImageNFS and SunFish - is the need to have sensible filenames at both ends, and handling of RISC OS filetypes. Some, but not all, of the principles also apply to filename handling in DOSFS/Win95FS filesystems.

In short, it can be boiled down to this - if there's an ,xxx extension on the host filename, then this becomes the filetype presented to RISC OS for the file. If not, the client looks at the filename extension on the host (the bit after the last full stop) - and tries to match that to a filetype using the RISC OS MimeMap functions. If there's no match, or no extension, then it is given the default filetype.
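The decision order just described can be sketched as follows. This is a Python illustration only: MIMEMAP is a tiny hypothetical stand-in for the RISC OS MimeMap functions, host_to_riscos is an invented name, and no real client is claimed to be written this way.

```python
import re

# Hypothetical stand-in for a MimeMap lookup (extension -> filetype).
MIMEMAP = {"txt": 0xFFF, "jpg": 0xC85, "html": 0xFAF}
DEFAULT_FILETYPE = 0xFFF  # text, as an assumed default

_SWAP = str.maketrans("./", "/.")
_TYPE_SUFFIX = re.compile(r",([0-9a-fA-F]{3})$")

def host_to_riscos(host_leaf: str, default: int = DEFAULT_FILETYPE):
    """Return (RISC OS leafname, filetype) for a leafname seen on the host."""
    match = _TYPE_SUFFIX.search(host_leaf)
    if match:
        # 1. An explicit ,xxx suffix wins and is hidden from RISC OS.
        leaf = host_leaf[:match.start()]
        return leaf.translate(_SWAP), int(match.group(1), 16)
    # 2. Otherwise try the text after the last full stop.
    stem, dot, ext = host_leaf.rpartition(".")
    if dot and stem:
        filetype = MIMEMAP.get(ext.lower(), default)
    else:
        # 3. No extension at all: use the default.
        filetype = default
    return host_leaf.translate(_SWAP), filetype

print(host_to_riscos("picture,c85"))   # ('picture', 3205)
print(host_to_riscos("picture.jpg"))   # ('picture/jpg', 3205)
```

Here 3205 is simply 0xC85 printed in decimal.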

There's also another incidental issue here - some of the miscellaneous ASCII characters that are valid in RISC OS aren't valid on network filesystems, and need to be translated back and forth.

When going the other way, we don't want to needlessly add ,xxx extensions. If the RISC OS file has an extension which matches its filetype in a MimeMap lookup, then no ,xxx extension is added; otherwise it is. The exception is a file whose filetype matches the default, in which case the name is also left unchanged.

One slightly unexpected behaviour, if you are not familiar with the operation, is the naming of a typed file on RISC OS which lacks an extension. A JPEG typed file 'picture' in RISC OS becomes 'picture,c85', which may not be entirely helpful, in contrast to 'picture/jpg', which becomes 'picture.jpg'.
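The RISC OS to host direction, including the 'picture' case, might look like this in outline. Again this is a hedged Python sketch: MIMEMAP is a hypothetical stand-in for the MimeMap lookup and riscos_to_host is an invented name.

```python
# Hypothetical stand-in for a MimeMap lookup (extension -> filetype).
MIMEMAP = {"txt": 0xFFF, "jpg": 0xC85, "html": 0xFAF}
_SWAP = str.maketrans("./", "/.")

def riscos_to_host(leaf: str, filetype: int, default: int = 0xFFF) -> str:
    """Return the host leafname for a RISC OS leafname and filetype."""
    host_leaf = leaf.translate(_SWAP)
    if filetype == default:
        return host_leaf                      # default type: never decorated
    stem, slash, ext = leaf.rpartition("/")   # RISC OS uses '/' before the extension
    if slash and stem and MIMEMAP.get(ext.lower()) == filetype:
        return host_leaf                      # the extension already implies the type
    return f"{host_leaf},{filetype:03x}"      # otherwise preserve the type explicitly

print(riscos_to_host("picture", 0xC85))       # picture,c85
print(riscos_to_host("picture/jpg", 0xC85))   # picture.jpg
print(riscos_to_host("ReadMe", 0xFFF))        # ReadMe
```

Passing default=0xFE4 shows the effect of a DOS default: a plain text file 'ReadMe' then becomes 'ReadMe,fff' on the host.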

It's easy to suggest that such files should get their extension added automatically. However, this creates problems with reversibility, because the filename that gets created is no longer identical to what an application expected. To counter this, you might also suggest that file extensions are hidden, and RISC OS is just presented with typed files - but again, this causes problems, because we might have several files with the same name but different extensions. In practice, the best workaround is to make sure you preserve the extension if you plan on transferring files for use on Windows or Unix.

Finally, what should the default filetype be? The only correct answer which happens to work in all circumstances is text, which is what LanMan98 and SunFish have. Unfortunately, LanManFS has an unhelpful default of the DOS filetype (fe4). On the Iyonix mailing list, I outlined why this would cause problems, which I'll repeat here:
  • From a purely syntactical standpoint, it adds nothing. The DOS filetype isn't anything that's usefully used by any RISC OS application.

  • On a Unix system, many files lack extensions (are "untyped"). Many of these are in fact textual, and the text filetype is certainly what you want. Most of the remainder are native binary executables, and loading into an editor is probably the only sensible thing to do with them on RISC OS.

  • Because of the above, creating a new plain file on a Unix share means that the unhelpful ,fff extension is added when all you want to do is create a plain file, so a tedious rename from a shell is required.

  • There are numerous (mostly source code) files that certainly are also textual in nature. True, these have extensions, and could be added to mimemap, but there are at least 30 common ones, and these would have to be added to every system, but having the default textual requires no extra effort.

  • AOF/ALF files (Acorn C/C++ output notwithstanding) from GCC on RISC OS are filetyped text. It is common to transfer these back and forth to a system where you're cross compiling - again, there are obvious problems here.

  • On Windows the tangible benefits of a text default tend to be quite few, although there are certainly limited cases of all that I've listed above. Changing from DOS filetype default emphatically causes no problems at all.

Hopefully the default will be changed in the next version of LanManFS.


File handling in RISC OS ports and source files
For ports of programs from Unix, there are further issues. Many programs assume Unix style filenames, or their config files have Unix style filenames in them. Sometimes ports can be modified to use RISC OS filenames instead, and often this is preferable. In other cases, the handling is deeply embedded, or just not worth the effort to change, especially when we're talking about lots of different ports.

Fortunately, Unixlib comes to the rescue, and at all places where a filename is passed to RISC OS, it goes through a translation function, which performs the types of translations previously discussed, and many others. The behaviour is configurable, and is also very complex to deal with many cases that we don't have room to discuss here. One for example which is on by default is to set the RISC OS filetype based upon the extension with a mimemap lookup.

For most ports, the default behaviour works fine, but there might need to be further code to translate RISC OS names to Unix ones (and Unixlib provides another function for this) at the point that filenames are passed to a program. It is sometimes necessary to switch the translation on and off, which is entirely possible.

One special case, which is also dealt with by Unixlib, is source filename handling, in particular, C source files. For historical reasons relating to the old 10/77 limitations on RISC OS, and now something that is still used for convenience, C header and source files are stored in their own 'c' and 'h' directories under RISC OS.

This means that 'program/file.c' under Unix is 'program.c.file' under RISC OS. In some cases, Unixlib even has to create a new directory before a file can be created. The extensions for which this happens are configurable, but for GCC, it applies to about 20.
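The directory swap for source files can be sketched like this. It is a Python illustration only and is not Unixlib's actual code or API; SUFFIX_DIRS is a hypothetical two-entry subset of the configurable list, and both function names are invented.

```python
# Hypothetical subset of the extensions that get their own directory.
SUFFIX_DIRS = {"c", "h"}

def unix_source_to_riscos(path: str) -> str:
    """'program/file.c' -> 'program.c.file'; other names just get the plain swap."""
    directory, _, leaf = path.rpartition("/")
    parts = [p.replace(".", "/") for p in directory.split("/")] if directory else []
    stem, dot, ext = leaf.rpartition(".")
    if dot and stem and ext in SUFFIX_DIRS:
        parts += [ext, stem]                  # the extension becomes a directory
    else:
        parts.append(leaf.replace(".", "/"))
    return ".".join(parts)

def riscos_source_to_unix(path: str) -> str:
    """'program.c.file' -> 'program/file.c' when the parent is a suffix directory."""
    parts = [p.replace("/", ".") for p in path.split(".")]
    if len(parts) >= 2 and parts[-2] in SUFFIX_DIRS:
        ext, stem = parts[-2], parts[-1]
        parts = parts[:-2] + [f"{stem}.{ext}"]
    return "/".join(parts)

print(unix_source_to_riscos("program/file.c"))   # program.c.file
print(riscos_source_to_unix("program.h.file"))   # program/file.h
```

A real implementation would also need to create the 'c' or 'h' directory if it does not yet exist, as noted above; the sketch only computes names.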

Sometimes Unix format filenames for source files need to be translated to RISC OS ones when files are unpacked from a Unix archive. One program that can do this automatically, without lots of tedious renaming, is my program, Reverser.


Files on ADFS filing systems under Linux
There is one final interesting case. As you may know, it is possible to read ADFS filing systems under Linux. Sometimes it is useful to know the filetypes, especially if you're also exporting this filesystem over Samba or NFS to be subsequently read with a RISC OS client.

In this case, the solution is to always append the ,xxx extension to filenames extracted from the ADFS filesystem. There is nothing to gain in this instance from being choosy about when to add the extension. When saving files, the ,xxx extension is stripped and the filetype set from it. When there's no extension, fff is used. However, this is slightly academic, as ADFS saving is not fully implemented under Linux.
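In outline, the two directions described above behave like the following Python sketch. It is an illustration of the naming rule only, with invented function names, and is not the Linux ADFS driver's code.

```python
import re

_TYPE_SUFFIX = re.compile(r",([0-9a-fA-F]{3})$")
TEXT = 0xFFF  # filetype used when a saved name carries no ,xxx suffix

def adfs_name_for_linux(leaf: str, filetype: int) -> str:
    """Reading: always expose the filetype, with no MimeMap-style shortcuts."""
    return f"{leaf},{filetype:03x}"

def adfs_name_from_linux(leaf: str):
    """Saving: strip any ,xxx suffix and return (leafname, filetype)."""
    match = _TYPE_SUFFIX.search(leaf)
    if match:
        return leaf[:match.start()], int(match.group(1), 16)
    return leaf, TEXT

print(adfs_name_for_linux("Letter", 0xFFF))     # Letter,fff
print(adfs_name_from_linux("picture,c85"))      # ('picture', 3205)
```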


Finally
Hopefully, this has cleared up some confusion (and quite possibly generated some more). We'd be interested to hear about any related issues that weren't answered.


Discussion


I heard a rumor that this was going to be in Select 4...

AndrewDuffell on 2/1/04 9:31PM

...what was? I've named a large number of things in the article, saying "this" doesn't really clear things up.

However, I assume you're referring to Select using Unix format names instead, which is really a different issue, and was previously discussed in detail in comments on a recent article.

mrchocky on 2/1/04 9:37PM

Do RISC OS zip files store extensions in the same way internally?

jess on 2/1/04 10:06PM

Just for reference, the comments about the default filetype in LanManFS being DOS only apply to the Castle/RISC OS 5 version. The version of LanManFS supplied with Select does have a default filetype of text.

illudium on 2/1/04 10:08PM

Jess: sigh, as usual, not willing to try yourself, and actually load a zip file into an editor and have a look.

There's no such thing as a "RISC OS zip file". But there is an extension for the format for storing the RISC OS filetype.

mrchocky on 2/1/04 10:14PM

Not bothered to read the rest of the article, but your file: URL is incorrectly specified. There should be *3* initial slashes because there is no authority present. That is, <URL:file:///ADFS::4/$/Something/SomethingElse>. Consult RFC1738 (3.10) for more details.

You might also want to spell check the article - there's a 'convinience' in there at least.

You've also managed to ignore VMS in there, I think. And to ignore the escaping requirements of URLs, and reserved characters on exchanging files - remember that filenames which are valid under one system may not be under another, even with whatever munging you're talking about.

Gerph on 2/1/04 10:24PM

I've managed to "ignore" lots of things - the article was already excessively long, and I covered the topics I thought would be most relevant. VMS might be relevant, but it's of interest to almost no RISC OS readers.

Anyone who's interested in this topic is welcome to look them up for themselves, including the valid characters, which indeed were briefly mentioned.

The file: URL isn't an attempt to be 100% correct. I wasn't trying to use the exact format of the RFC, but rather explain the rationale of what's displayed in RISC OS browsers - blame them, not me.

mrchocky on 2/1/04 10:31PM

An interesting article, well done.

One small point - (modern) browsers on RISC OS look at the Content-type header and don't pay any attention to the extension, since you can have php scripts and perl scripts (for example) producing any type of data you like. Broken images usually unexpectedly come as html. I've never seen an HTTP response in the wild without a content-type header, which is nice.

Of course IE just ignores everyone and guesses but we don't have that on RISC OS :)

Apart from that, a very interesting article, clearly a lot of work has gone into it.

john on 2/1/04 11:48PM

Although hinted very briefly, I consider the encoding differences between filing systems (when defined) also a very important issue. More modern FS like HFS+, NTFS, AppleTalk, SMB, etc explicitly specify an encoding (usually Unicode based) and a FS implementation needs to take care of this.

joty on 3/1/04 12:11AM

Of course the "dot" character in Unix filenames does not always signify the start of an extension, it is in fact just an ordinary character in a filename, although it is often used to delimit an extension. So a filename or even a leafname can have multiple dots. Also, a dot as the first character of a leafname signifies a "hidden" file, a concept we don't have on RISC OS. I don't suppose that any harm would come of translating the dots to slashes in either case, but clearly the MIME mapping code should ignore all except the last dot. Also, unlike DOS/Windows, Unix file extensions are not exclusively 3 characters, eg things like "file.html" or "file.jpeg" are not that uncommon.

Martin

mrtd on 3/1/04 12:34AM

chocky> The file: URL is a good example that you should at least have got right yourself - if you're writing an article about getting filenames right between systems, the least you can do is get the examples right yourself :-)

The reason I cited it is that nobody actually seems to have bothered to read the RFC that describes its format. It's quite explicit. It even fits within the generic URL scheme which was built upon the existing URL schemes.

Gerph on 3/1/04 2:37AM

To Peter Naulls:

Thanks Peter for this concise clarification.

To mtrd:

Not all DOS/Windows extensions are three chars anymore - even they can't shut their eyes to reality as the extension .html shows :-) But the majority is three chars indeed.

To all:

I must indeed say that I still wish Acorn had opted for the / or even the \ as a directory delimiter, because that would have removed quite a bit of confusion...

hzn on 3/1/04 9:03AM

gerph: sigh. I'll say it again. My example is in reference to those URLs displayed by RISC OS browsers, and is an explanation of why they display what they do. If I'd used something else, it would not have been what they display, regardless of what the RFC says on the matter.

john: the Content-type header is indeed briefly mentioned, thanks.

mrchocky on 3/1/04 9:18AM

Cool idea, but why keep it between RISC OS and the outside world? It would be nice to have an environment inside RISC OS where Unix programs work and can be compiled as if they were running under Unix. Sort of what cygwin is for Windows.

Jaco on 3/1/04 1:52PM

Peter> Good article.

Another possible area of "difficulty" springs to mind, namely that in DOS/Windows you can also add a "." and a three character extension to a directory name... (the mind boggles). Anyway, obviously it would be incorrect to translate this back to a specific RISC OS filetype, as it's a directory object rather than a file.

Thanks again

Regards

Annraoi

AMS on 3/1/04 2:15PM

Jaco: yes, it would. But there are issues with the RISC OS bash port, and many utilities still need to be ported.

We'd appreciate any help ;-)

AMS: Yes, of course; although checking if your object is a directory is pretty easy, and it goes without saying. Which is presumably why you've mentioned it ;-)

mrchocky on 3/1/04 2:29PM

Peter> I do tend to state the B L E E D I N G O B V I O U S sometimes !!!!!!

Anyway great article - best wishes for the New Year etc., etc,

Annraoi

AMS on 3/1/04 3:10PM

Interesting article.

Tried Reverser too - not 32-bit compatible :-(

TonyStill on 3/1/04 4:53PM

TS: oops. I'll update my software page shortly. I do have a 32-bit version here, and I need to put up Decor too.

AMS: It was actually a reference to a recent usenet thread - "It was an idiom, but you knew that, so why did you have to ask".

mrchocky on 3/1/04 5:24PM

w.r.t. automatically adding an extension to files without them on the RISC OS side, an option to allow reversibility would be to add ,.XXX to any of these files with a recognised type.

This would make "picture" become "picture,.jpg" which would still work under windows and be of more use than "picture,c85"

jess on 4/1/04 5:19PM

You seem to have completely missed the point. It is precisely your suggestion which causes the problem.

mrchocky on 4/1/04 6:56PM

So I really had to explicitly state that the extra comma would cause the additional extension to be stripped in the same way as for comma-hextypes?

I didn't realise it wouldn't be obvious.

Sorry.

jess on 4/1/04 7:28PM

It /does/ look like a typo, I read it several times before I realised what it meant.

FWIW, it seems a more reasonable method than the alternative you mention although the comma would still be annoying from windows.

john on 4/1/04 11:10PM

I see what you mean, (more quotes perhaps?) but if it were a typo, it would just be repeating what was in the article.

If this scheme were implemented it would be better used as a fallback situation, with the preferred method to be to include the "/XXX" in the RISC OS name as the article suggests.

jess on 5/1/04 6:37AM

Concerning the default filetype used by LanManFS, I seem to recall that the DOS filetype is/was used by DOSMap. That is, DOSMap would only work on filetypes of DOS or 000. Could this be the reasoning behind setting the default type to DOS?

I don't use LanManFS and I'm not on the Iyonix mailing list, so presumably this has already been covered at length there, but it just struck me that this might be relevant.

flypig on 6/1/04 5:24PM

I use LanMan98 because it is basically the only one that does the job properly, and it does its job very well (it plays MPEGs fine and copes with the long filenames used by Samba and Windows XP). Here my files and e-mail (!Newsdir) are kept in /home/user on Linux (Red Hat 9). I choose to keep the RISC OS filetypes; it would be nice to edit this on a per file basis or by changing an option on the filer. I would also like to be able to view hidden files (.) more easily, and (dreaming...) to have basic domain login support and better control of access permissions. If I had the know-how I would be getting to work doing it, not sitting here wishing.

acornandy on 9/1/04 1:34AM

LanMan98 has no problem with hidden files. Try having a look through the manual.

mrchocky on 9/1/04 9:34AM

I know I can change a system variable to see the hidden files Mr sarky!

But I want to do it from the filer for ease of use! LanManFS lacks long filename support and doesn't come with a filer front end, so why don't RISC OS Ltd. and Castle just licence LanMan98 from WSS?

acornandy on 9/1/04 6:52PM

Well, call me names all you want, but I was perfectly polite. Your post, which was a little confused, suggested you didn't know how. Get over it.

As for "Why doesn't X do Y?". I'm sure you know the answer to that. Firstly, it would cost money, and secondly both Select and RO5 come with LanManFS.

mrchocky on 9/1/04 7:15PM

No you weren't polite! You were rude and sarcastic, get over yourself! tit

acornandy on 9/1/04 9:41PM

I used no offensive words, I merely made a suggestion intended to be helpful without the slightest bit of sarcasm which you strangely took offence at. I'm not the one using exclamation marks and name calling.

It's quite clear I'm not the one being rude. If you want to tell people who are being genuinely helpful that they are being rude, then please don't stop and wonder when people no longer give any help.

mrchocky on 09/01/04 11:06PM
