Processing text the easy way

By Paul Beverley. Published: 7th Sep 2003, 19:35:03.

Paul Beverley shows search'n'replace who's boss

Every publication, online and offline, has its own house rules on the style and presentation of content. Archive Magazine editor Paul Beverley explains how he uses a new freeware application to prepare text for publication.

Last month, "My Favourite Application" was SpamStamp. Indeed, SpamStamp still continues very high on the list, but this month a new PD program has appeared which is my latest "Favourite Application". It's still being developed, but it's most certainly already a very usable application.

The history

Some months ago, I asked on Archive-on-Line if someone could write me a program that would carry out a sequence of search and replace functions that I could specify by using a script. The particular job I had in mind was speeding up the production of the words-only version of the magazine each month. I take the Text file from inside the directory-type Impression document which contains all the text of all the articles in the magazine. This then has to be massaged by a whole string of search and replace actions. I wanted to automate this.

At the time, there were various suggestions as to how I could improve things, but no-one offered to write an application. Then, out of the blue, Paul Sprangers sent a copy of ConvText - and this is exactly what I was asking for.

The job

The script below is what is used to convert the Text file. Having selected that particular script within ConvText, I just drag the file to the ConvText iconbar icon and the output file appears a few seconds later - brilliant.

SCRIPT:Archive

I think it should be fairly self-explanatory, but I'll explain it anyway. On each line, the search element is the bit in front of the colon (:), and the thing which replaces it is after the colon. So, the first line after the SCRIPT title replaces all occurrences of "{nextstory }" with nothing - there's nothing after the colon. The two ligatures are then expanded into "fi" and "fl".

The next three lines are three of the predefined actions that Paul has created. The first changes all smart (or 'sexed') quotation marks into the straight equivalent. The second cuts down all multiple spaces into single spaces, and the final one removes multiple linefeeds, but only down to a minimum of two. This maintains a single blank line between paragraphs. In fact, in this case, I don't want blank lines between paragraphs, so I add the next line ([10][10]:[10]) which turns all pairs of ASCII 10 characters (i.e. linefeeds) into one.

In the next line, I add the volume and issue number between every paragraph and the next, and I guess the rest of it doesn't really need detailed explanation - [160] is a hard space and [32] is an ordinary space.

"I could get it to..."

As I used ConvText I began to think of other jobs I could use it for. Here are the first two that occurred to me:

SCRIPT:Single line addr

These two convert an address list back and forth between a multi-line format for printing out labels and a single-line format with commas suitable for printing the addresses on sheets of paper.

"Could you just get it to...?"

As I worked with ConvText, it pretty soon became obvious that a wildcard function would be useful, so Paul added # and * wildcards, along with a way of specifying a literal # or * just in case I wanted to do a search and replace on either of those two characters, though I could of course have used [35] and [42]. What's more, the contents of the wildcards can be used in the replace string. For example:

[32]#pounds:[32]£#
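
For readers who want to see the idea behind such rules in another setting, here is a minimal Python sketch of a "search:replace" rule applier. It is not ConvText's code and makes several assumptions: the function names (decode, apply_rule, apply_script) are invented for illustration, the # wildcard is assumed to match a run of digits (the article does not say exactly what it matches), only one # per rule is handled, and lines without a colon (such as the predefined actions) are simply skipped.

```python
import re

def decode(text):
    """Turn [nn] decimal character codes into characters, e.g. [10] -> linefeed."""
    return re.sub(r"\[(\d+)\]", lambda m: chr(int(m.group(1))), text)

def apply_rule(text, line):
    """Apply one 'search:replace' rule to text and return the result."""
    # The search element is everything in front of the first colon.
    raw_search, _, raw_replace = line.partition(":")
    if not raw_search:
        return text  # nothing to search for
    # Split on the wildcard BEFORE decoding, so that [35] stays a literal '#'.
    search_parts = [decode(p) for p in raw_search.split("#")]
    replace_parts = [decode(p) for p in raw_replace.split("#")]
    if len(search_parts) == 1:
        # No wildcard: plain replacement of every occurrence.
        return text.replace(search_parts[0], "#".join(replace_parts))
    if len(search_parts) > 2:
        raise ValueError("this sketch handles only one # per rule")
    # ASSUMPTION: '#' matches a run of digits; the real tool may differ.
    pattern = re.escape(search_parts[0]) + r"(\d+)" + re.escape(search_parts[1])
    # Each '#' in the replace string is filled with the matched wildcard contents.
    return re.sub(pattern, lambda m: m.group(1).join(replace_parts), text)

def apply_script(text, lines):
    """Run every rule in order, skipping the SCRIPT title and colon-less lines."""
    for line in lines:
        if line.startswith("SCRIPT:") or ":" not in line:
            continue
        text = apply_rule(text, line)
    return text

script = ["SCRIPT:Demo", "{nextstory }:", "[10][10]:[10]", "[32]#pounds:[32]£#"]
result = apply_script("Intro{nextstory }\n\nIt cost 25pounds.", script)
```

With the three rules above, the "{nextstory }" marker is deleted, the pair of linefeeds becomes one, and " 25pounds" becomes " £25". Splitting on # before decoding the [nn] codes is a deliberate choice: it lets [35] stand for a literal # character, as the article describes.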

Sledges and nuts

One word of warning - beware using a sledge to crack a nut. I had a drawfile and I wanted to get the text out of it for putting into an article. I knew that as long as the text in the drawfile hadn't been converted to paths, the actual text would be there inside the drawfile. So I changed the filetype to 'text' and set to work. (Are you ahead of me?) I studied the format of the files, seeing what special characters occurred before each bit of text, and I tried to work out how I could use this wonderful automatic system to extract the text. "Once I've worked out a script, I and other users of ConvText will be able to just drag-and-drop drawfiles to our hearts' content", I thought.

I laboured for hours trying to work out clever ways of doing this (managing to show up some bugs in the application as I proceeded). I was beginning to get to the point of admitting that this was a bit beyond me when I asked on Archive-on-Line if there was an easier way. "Load the drawfile into Vector and there's a menu item to 'Save text'" - and that was only one of a number of ways of doing it that were suggested.

Munging Impression files

One search and replace facility that has eluded us for years is in Impression. There's the facility to search for a particular style, but you can't replace it with another style name. I have one contributor who sends in articles for Archive using style names that are not the ones I use in a standard Archive document, so I have to change these names in the incoming Impression file before cutting and pasting it into one of my magazine files. Now I can do it automatically.

SCRIPT:Someone's articles

This gentleman also uses double spaces at the end of each sentence, so those are changed to single spaces - easy-peasy!

A real time-saver

The article I get that needs most editing (and this is no reflection on its author, Steve Knattress) to get it into Archive-standard format is Archive-on-Line OffLine. The reason is that it consists of emails dashed off quite quickly in some cases, never intended for publication, with abbreviations, etc. So here's how I save time on that.

SCRIPT:AoL

I won't explain it in detail, but the first few commands are for adding styles to email addresses and sometimes URLs which are enclosed in angle brackets by Steve. Then I try to pick up where people have put asterisks (*) or slashes (/) around words or phrases to add emphasis. OK, there are places where angle brackets, slashes and asterisks might occur for other reasons, but all I do is look at the resultant Impression file and, if there are spurious styles, go back to the original text file, change it slightly and do the drag-and-drop again - quick and easy! I may have to do this a few times to account for each and every spurious style, but certainly that takes a lot less time than manually editing every single email address and every emphasis in the whole document.

And the other important point to make is that each time I use this script, I'll find other corrections that I haven't covered, so I'll add them to the script ready for next month. So, month-on-month, at a single drag-and-drop, a higher and higher percentage of the corrections will be done automatically.

Other uses

In addition to those I've already mentioned, I've come up with another four uses of ConvText in the first couple of weeks of using it. I won't bore you with them all, but simply say that for someone like me, for whom words are his stock-in-trade, this is a really productive application - many thanks to Paul Sprangers for producing it.

Links

ConvText can be found in the downloads section on the website of Archive's sister publication Living with Technology. Paul's article on ConvText will appear in the next issue of Archive magazine, due out around 19th September.

© 1999-2009 The Drobe Team. Some rights reserved.