RISC OS News on Drobe
RISC OS News Article
Processing text the easy way
Published: 7th Sep 2003, 19:35:03GMT  Source: Archive Magazine
By Paul Beverley
Paul Beverley shows search'n'replace who's boss
Every publication online and offline has its own house rules on the style and presentation of content. Archive Magazine editor Paul Beverley explains how he uses a new freeware application to prepare text for publication.

Last month, "My Favourite Application" was SpamStamp. Indeed, SpamStamp still continues very high on the list, but this month a new PD program has appeared which is my latest "Favourite Application". It's still being developed, but it's most certainly already a very usable application.

The history
Some months ago, I asked on Archive-on-Line if someone could write me a program that would carry out a sequence of search and replace functions that I could specify by using a script. The particular job I had in mind was speeding up the production of the words-only version of the magazine each month. I take the Text file from inside the directory-type Impression document which contains all the text of all the articles in the magazine. This then has to be massaged by a whole string of search and replace actions. I wanted to automate this.

At the time, there were various suggestions as to how I could improve things, but no-one offered to write an application. Then, out of the blue, Paul Sprangers sent a copy of ConvText - and this is exactly what I was asking for.

[Screenshot: ConvText in action - ConvText finishes processing a textfile]

The job
The script below is what is used to convert the Text file. Having selected that particular script within ConvText, I just drag the file to the ConvText iconbar icon and the output file appears a few seconds later - brilliant.

SCRIPT:Archive
{nextstory }:
ﬁ:fi
ﬂ:fl
{REMOVE SMART QUOTES}
{REMOVE MANY SPACES}
{REMOVE MANY NEW LINES}
[10][10]:[10]
[10]:[10]16.11[10]
[160]:[32]
[173]:-
[151]:-
[32][10]:[10]


I think it should be fairly self-explanatory, but I'll explain it anyway. On each line, the search element is the bit in front of the colon (:), and the thing which replaces it is after the colon. So, the first line after the SCRIPT title replaces all occurrences of "{nextstory }" with nothing - there's nothing after the colon. The two ligatures are then expanded into "fi" and "fl".
The next three lines are three of the predefined actions that Paul has created. The first changes all smart (or 'sexed') quotation marks into the straight equivalent. The second cuts down all multiple spaces into single spaces, and the final one removes multiple linefeeds, but only down to a minimum of two. This maintains a single blank line between paragraphs. In fact, in this case, I don't want blank lines between paragraphs, so I add the next line ([10][10]:[10]) which turns all pairs of ASCII 10 characters (i.e. linefeeds) into one.

In the next line, I add the volume and issue number between every paragraph and the next, and I guess the rest of it doesn't really need detailed explanation - [160] is a hard space and [32] is an ordinary space.
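
As a rough illustration of the rule format described above, here is a minimal Python sketch of how such a script could be interpreted. It is not ConvText's actual implementation: the function names are hypothetical, the predefined actions such as {REMOVE SMART QUOTES} are left out, and it assumes rules are applied one after another in the order written.

```python
import re


def decode(token: str) -> str:
    """Turn bracketed decimal codes such as [10] or [160] into characters."""
    return re.sub(r"\[(\d+)\]", lambda m: chr(int(m.group(1))), token)


def parse_rule(line: str) -> tuple[str, str]:
    """Split a rule at its first colon into (search, replace)."""
    search, _, replace = line.partition(":")
    return decode(search), decode(replace)


def apply_rules(text: str, rules: list[str]) -> str:
    """Apply each rule in turn, replacing every occurrence (assumed behaviour)."""
    for line in rules:
        search, replace = parse_rule(line)
        text = text.replace(search, replace)
    return text


# A few of the plain search-and-replace lines from the Archive script
rules = ["{nextstory }:", "[10][10]:[10]", "[160]:[32]", "[32][10]:[10]"]
sample = "{nextstory }RISC\xa0OS \n\nSecond paragraph"
result = apply_rules(sample, rules)
```

With this sample, the marker is removed, the blank line is closed up, the hard space becomes an ordinary space and the space before the linefeed is dropped.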

"I could get it to..."
As I used ConvText I began to think of other jobs I could use it for. Here are the first two that occurred to me:

SCRIPT:Single line addr
{REMOVE MANY NEW LINES}
[10][10]:zczc
[10]:,[32]
zczc:[10]

SCRIPT:Multi line addr
[10]:zczc
,[32]:[10]
zczc:[10][10]


These two convert an address list back and forth between a multi-line format for printing out labels and a single line format with commas suitable for printing the addresses on sheets of paper.
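
The trick in both scripts is the temporary marker "zczc", which protects one kind of linefeed while the other kind is being changed. A small Python sketch of the same idea follows; the function names are hypothetical, and it assumes the input already has exactly one blank line between addresses (the job {REMOVE MANY NEW LINES} does in the first script) and that "zczc" never occurs in the data.

```python
def to_single_line(addresses: str) -> str:
    """Multi-line addresses separated by blank lines -> one address per line."""
    marked = addresses.replace("\n\n", "zczc")  # protect the gaps between addresses
    joined = marked.replace("\n", ", ")         # join the lines within each address
    return joined.replace("zczc", "\n")         # restore one linefeed per address


def to_multi_line(addresses: str) -> str:
    """One address per line with commas -> multi-line addresses with blank lines."""
    marked = addresses.replace("\n", "zczc")    # protect the address boundaries
    split = marked.replace(", ", "\n")          # break each address into lines
    return split.replace("zczc", "\n\n")        # blank line between addresses


multi = "A Smith\n1 High St\nNorwich\n\nB Jones\n2 Low Rd\nLeeds"
single = to_single_line(multi)
```

The round trip only works if no address line itself contains a comma followed by a space.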

"Could you just get it to...?"
As I worked with ConvText, it pretty soon became obvious that a wildcard function would be useful, so Paul added # and * wildcards, along with escaped forms of # and * just in case I wanted to do a search and replace on either of those two characters, though I could of course have used [35] and [42].
What's more, the contents of the wildcards can be used in the replace string. For example:

[32]#pounds:[32]£#
[32]##pounds:[32]£##
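
To show what reusing the wildcard contents might look like, here is a hedged Python sketch of the two rules above. The article does not spell out exactly what # matches, so this assumes each # stands for exactly one character; the * wildcard is not handled, and the function name is hypothetical.

```python
import re


def apply_wildcard_rule(text: str, search: str, replace: str) -> str:
    """Apply one rule where each # captures a single character (an assumption)."""
    # Build a regular expression: # becomes a one-character group,
    # everything else is matched literally.
    pattern = re.compile(
        "".join("(.)" if ch == "#" else re.escape(ch) for ch in search)
    )

    def fill(match: re.Match) -> str:
        # Feed the captured characters back into the # slots, left to right.
        captured = iter(match.groups())
        return "".join(next(captured, "") if ch == "#" else ch for ch in replace)

    return pattern.sub(fill, text)


text = "It cost 5pounds and then 25pounds more"
text = apply_wildcard_rule(text, " #pounds", " £#")
text = apply_wildcard_rule(text, " ##pounds", " £##")
```

The leading space in each search string is what stops the one-character rule from firing inside "25pounds".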


Sledges and nuts
One word of warning - beware using a sledge to crack a nut. I had a drawfile and I wanted to get the text out of it for putting into an article. I knew that as long as the text in the drawfile hadn't been converted to paths, the actual text would be there inside the drawfile. So I changed the filetype to 'text' and set to work. (Are you ahead of me?) I studied the format of the files, seeing what special characters occurred before each bit of text, and I tried to work out how I could use this wonderful automatic system to extract the text. "Once I've worked out a script, I and other users of ConvText will be able to just drag-and-drop drawfiles to our hearts' content", I thought.

I laboured for hours trying to work out clever ways of doing this (managing to show up some bugs in the application as I proceeded). I was beginning to get to the point of admitting that this was a bit beyond me when I asked on Archive-on-Line if there was an easier way. "Load the drawfile into Vector and there's a menu item to 'Save text'" - and that was only one of a number of ways of doing it that were suggested.

Munging Impression files
One search and replace facility that has eluded us for years is in Impression. There's the facility to search for a particular style, but you can't replace it with another style name. I have one contributor who sends in articles for Archive using style names that are not the ones I use in a standard Archive document, so I have to change these names in the incoming Impression file before cutting and pasting it into one of my magazine files. Now I can do it automatically.

SCRIPT:Someone's articles
SubHead:Heading
ByLine:Main Heading
Heading:Box Heading
Emphasise:Italic
Together:Normal
![32][32]:![32]
.[32][32]:.[32]
?[32][32]:?[32]


This gentleman also uses double spaces at the end of each sentence, so those are changed to single spaces - easy-peasy!
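
One thing to watch with a list of renames like this is rule order. If the rules run strictly one after another, as the earlier scripts suggest, then text produced by one rule can be picked up by a later one: "SubHead" becomes "Heading", which the third rule then turns into "Box Heading". The Python sketch below contrasts that with a single-pass rename. It is an illustration under that assumption, not a description of how ConvText behaves, and the function names and the sample style marker are hypothetical.

```python
import re

# Three of the style renames from the script above, in the same order
renames = {"SubHead": "Heading", "ByLine": "Main Heading", "Heading": "Box Heading"}


def rename_sequential(text: str, renames: dict[str, str]) -> str:
    """Apply the renames one after another; later rules see earlier output."""
    for old, new in renames.items():
        text = text.replace(old, new)
    return text


def rename_single_pass(text: str, renames: dict[str, str]) -> str:
    """Apply all renames in one scan, so replaced text is never revisited."""
    names = sorted(renames, key=len, reverse=True)  # try longer names first
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return pattern.sub(lambda m: renames[m.group(0)], text)


sample = '{"SubHead" on}'
sequential = rename_sequential(sample, renames)
single_pass = rename_single_pass(sample, renames)
```

In a strictly sequential tool, the same effect can usually be had by reordering the rules so that "Heading" is renamed before anything is renamed to it.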

A real time-saver
The article I get that needs most editing (and this is no reflection on its author, Steve Knattress) to get it into Archive-standard format is Archive-on-Line OffLine. The reason is that it consists of emails dashed off quite quickly in some cases, never intended for publication, with abbreviations, etc. So here's how I save time on that.

SCRIPT:AoL
[32]<: {"email" on}
> :{"email" off}[32]
{REMOVE MANY NEW LINES}
[10][10]:[10]
[32]/: {"italic" on}
/ :{"italic" off}[32]
[32]*: {"italic" on}
* :{"italic" off}[32]
RISC OS:RISC[160]OS
RISCOS:RISC[160]OS
RISC[160]OS Ltd:RISCOS Ltd
Risc OS:RISC[160]OS
RiscOS:RISC[160]OS
RO:RISC[160]OS
Risc PC:RiscPC
RISC PC:RiscPC
Disk:disc
BASIC:Basic
web site:website
CD­ROM:CD­ROM
Draw file:drawfile
icon bar:iconbar
Mb:MB
Gb:GB
x :×
StrongArm:StrongARM
Jpeg:JPEG
jpeg:JPEG
e-mail:email
hz:Hz
file type:filetype


I won't explain it in detail, but the first few commands are for adding styles to email addresses and sometimes URLs which are enclosed in angle brackets by Steve. Then I try to pick up where people have put asterisks (*) or slashes (/) around words or phrases to add emphasis. OK, there are places where angle brackets, slashes and asterisks might occur for other reasons, but all I do is look at the resultant Impression file and, if there are spurious styles, go back to the original text file, change it slightly and do the drag-and-drop again - quick and easy! I may have to do this a few times to account for each and every spurious style, but certainly that takes a lot less time than manually editing every single email address and every emphasis in the whole document.

And the other important point to make is that each time I use this script, I'll find other corrections that I haven't covered, so I'll add them to the script ready for next month. So, month-on-month, at a single drag-and-drop, a higher and higher percentage of the corrections will be done automatically.

Other uses
In addition to those I've already mentioned, I've come up with another four uses of ConvText in the first couple of weeks of using it. I won't bore you with them all, but simply say that for someone like me, for whom words are his stock-in-trade, this is a really productive application - many thanks to Paul Sprangers for producing it.

Links
ConvText can be found in the downloads section on the website of Archive's sister publication Living with Technology.
Paul's article on ConvText will appear in the next issue of Archive magazine, due out around 19th September.

nunfetishist(bad user / troll) (-1.0)
8/9/03 1:35PM
I do this sort of thing quite often, too. But I use one of Awk, Perl, or sed, depending on how I'm feeling.
diomus(valued user) (+1.0)
8/9/03 3:33PM
I suspect this is more friendly than those tools. It took me a week to pick up Perl.

Chris, drobe.co.uk
tribbles2 
8/9/03 5:12PM
I think I prefer Perl for this kind of thing - the example where pairs of [10]s are changed to [10], what happens if you have 4 [10]s? Do you get 1 [10], 2[10]s (it depends on whether it starts the search from the start, or from the last change)?

With Perl, you'd do ~s:\n{2,}:\n:; which, to me, is a lot clearer as to what happens. But then I have read "Mastering Regular Expressions" quite thoroughly :)

Obviously I could try it out, but I'm in the office, and my RiscPC is sitting at home...
tribbles2 
8/9/03 5:15PM
Actually, thinking about it, I misread the original article, and it's probably the same as ~s:\n\n:\n:g; (I thought that it was meant to remove multiple lines, whereas the 'macro' beforehand did it).
nunfetishist (-1.0)
8/9/03 5:34PM
In reply to diomus:
More friendly? I have to read the body of this article very carefully to work out wtf the scripts do!
VinceH(valued user) (-1.0)
8/9/03 7:48PM
I don't read Archive-on-Line. Maybe if I
did, I might have been able to point out
to Paul that an app to do the sort of
thing he was asking about exists
already. Ho hum. :-/

The scripts are also human readable :-p
dansguardian(valued user) (-1.0)
8/9/03 11:23PM
In reply to VinceH:
Which is?
VinceH(valued user) (-1.0)
9/9/03 6:23PM
I can't tell you that. It's top secret,
and I really don't have time at the mo
to be running around exotic locations,
being chased by and getting into battles
with any of MI6's double-O agents.
Especially that 007, who just damned
well refuses to die. I'm far too busy
working on WebChange for that sort of
malarky ATM.
 
