Drobe :: The archives

Using StrongED with Lua

By Gavin Wraith. Published: 29th Apr 2006, 23:13:56

Work that text editor hard



URL grabbing
We can extract all the URLs from an HTML file by SHIFT-double-clicking it to load it into StrongED, and then applying the following script:

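#! lua
-- URL grab: print the target of every <a href="..."> tag in the file
url_pat = "<[aA]%s+[hH][rR][eE][fF]%s*=%s*\"([^%c\"]*)\""
io.input(arg[1])       -- make the file named by the first argument the default input
s = io.read("*all")    -- read the whole file into one string
io.input()
for url in s:gmatch(url_pat) do
  print(url)           -- url is the text captured between the double quotes
end -- for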


The variable url_pat holds a pattern string, which is used to detect an HTML anchor tag containing a URL link. It is not as gruesome as it may first appear. Essentially, it has to pick out the <a href="http://www.."> tags in the given web page. The pattern must start with a < followed by either a or A. Then the expression %s+ signifies at least one whitespace character. Then comes the word href, or any variation on it obtained by using upper case instead of lower case letters. The expression %s* means any sequence, including the empty sequence, of whitespace characters. It appears on both sides of the equals sign, so spaces are allowed before and after the =. The expression \" denotes a double quote, escaped because the pattern string is itself written inside double quotes.
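The building blocks described above can be tried in isolation with string.match, which returns the matched text or nil. This is only a small sketch; the test strings and the name href_pat are made up for illustration and do not come from the script.

```lua
-- %s+ needs at least one whitespace character; %s* accepts none at all.
print(("a href"):match("a%s+href"))   -- a href
print(("ahref"):match("a%s+href"))    -- nil
print(("href="):match("href%s*="))    -- href=
print(("href  ="):match("href%s*="))  -- href  =

-- A set such as [hH] matches either case of a single letter.
-- href_pat is a hypothetical name used only in this sketch.
local href_pat = "[hH][rR][eE][fF]"
print(("HREF"):match(href_pat))  -- HREF
print(("Href"):match(href_pat))  -- Href
print(("hxef"):match(href_pat))  -- nil
```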

Then comes ([^%c\"]*). The parentheses mean that the text matched by the pattern they enclose is to be captured. That pattern, [^%c"]*, matches any sequence of characters that are neither control characters nor double quotes. So what the pattern captures is the text between the double quotes of the href attribute, that is, the URL itself.
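To see the capture at work without loading a file, the same pattern can be applied to a short string. The sample HTML below and the table name found are invented for this illustration.

```lua
-- Same pattern as in the script above.
local url_pat = "<[aA]%s+[hH][rR][eE][fF]%s*=%s*\"([^%c\"]*)\""

-- Hypothetical sample input, made up for this illustration.
local sample = [[
<p>See <a href="http://www.example.com/">one</a>,
<A  HREF = "page2.html">two</A> and
<a name="top">no link here</a>.</p>
]]

-- Collect each captured URL in a table instead of printing it at once.
local found = {}
for url in sample:gmatch(url_pat) do
  found[#found + 1] = url
end

for i, url in ipairs(found) do
  print(i, url)
end
```

The first two anchors are found, whatever the case of the tag and the spacing around the = sign. The third is skipped, because the pattern expects href to be the first attribute after the tag name.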

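Given a file name, the function io.input opens that file and makes it the default file for input. In standard Lua, calling it with no argument returns the current default input file and leaves it unchanged; restoring keyboard input explicitly would be io.input(io.stdin). The effect of the three lines: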

io.input(arg[1])
s = io.read("*all")
io.input()


is to read in all the text in the StrongED window as a single large string of characters, and assign it to the variable s. The for loop iterates over the matches of url_pat in s, and on each pass the iteration variable url holds the string captured by that match. The call s:gmatch(url_pat) is shorthand for string.gmatch(s, url_pat); this function from Lua's string library returns an iterator that yields the captures of each successive match.
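As a variation, and not the author's script, the file can be read through an explicit handle from io.open, which leaves the default input untouched and allows the file to be closed once it has been read. The helper names read_all and grab_urls are hypothetical, and the sketch assumes a standard Lua 5.1 or later interpreter.

```lua
local url_pat = "<[aA]%s+[hH][rR][eE][fF]%s*=%s*\"([^%c\"]*)\""

-- read_all (hypothetical helper): return the file's contents as one string,
-- or nil plus an error message if the file cannot be opened.
local function read_all(path)
  local f, err = io.open(path, "r")
  if not f then return nil, err end
  local text = f:read("*all")
  f:close()
  return text
end

-- grab_urls (hypothetical helper): return a table of every URL captured
-- by url_pat in the given text.
local function grab_urls(text)
  local urls = {}
  for url in text:gmatch(url_pat) do
    urls[#urls + 1] = url
  end
  return urls
end

-- arg[1] is the file name passed to the script, as in the original.
if arg and arg[1] then
  local text, err = read_all(arg[1])
  if text then
    for _, url in ipairs(grab_urls(text)) do print(url) end
  else
    io.stderr:write("cannot read ", arg[1], ": ", tostring(err), "\n")
  end
end
```

Returning the URLs in a table rather than printing them inside the loop makes it easy to sort them or remove duplicates before output.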


Discussion


Nice, informative article.

Incidentally, Cretin also supports lua scripts (implemented by the brothers Sidwell back before I got hold of it), so this article may also be useful to those who wish to tinker there, too.

jymbob on 30/4/06 9:40AM

Indeed an interesting article!

What kind of irritates me is the list of "Related articles:" since the only real relation between the ones listed and this one is that they appeared on Drobe.

hzn on 30/4/06 10:07AM

Maybe the programmers do link "Related articles" to this one, considering that they use a Mouse and maybe Central Heating during the winter months of the long hours of their work? ;-)

(Just kidding here).

Steve.

Sawadee on 30/4/06 11:25AM

© 1999-2009 The Drobe Team. Some rights reserved.