Using StrongED with Lua
By Gavin Wraith. Published: 29th Apr 2006, 23:13:56
Work that text editor hard
URL grabbing
We can extract all the URLs from an HTML file by SHIFT-double-clicking it to load it into StrongED, and then applying the following script:
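#! lua
-- URL grab: print every URL found in an <a href="..."> tag
url_pat = "<[aA]%s+[hH][rR][eE][fF]%s*=%s*\"([^%c\"]*)\""
io.input(arg[1])      -- make the file named by the first argument the default input
s = io.read("*all")   -- read the whole file into one string
io.input()
for url in s:gmatch(url_pat) do
  print(url)          -- url is the text captured by the parentheses in url_pat
end -- for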
The variable url_pat holds a pattern string, which is used to detect an HTML anchor tag in which a URL link is stored. It is not as gruesome as it may first appear. Essentially, it has to pick out the <a href="http://www.."> tags in the given HTML web page. The pattern must start with a < followed by either a or A. Then the expression %s+ signifies at least one whitespace character. Then comes the word href, or any variation on it obtained by using upper case instead of lower case. The expression %s* means any sequence of whitespace characters, including the empty sequence. Then comes an equals sign, then possibly more whitespace. The expression \" denotes a double quote.
Then comes ([^%c\"]*). The parentheses mean that the text matched by the pattern they enclose is to be captured. That pattern, [^%c"]*, matches any sequence of characters that are neither control characters nor double quotes. So what the pattern captures is the text that comes between the double quotes, which is the URL itself.
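To see which tags the pattern accepts and which it rejects, it can be tried on a few short strings. The following is only a sketch: the helper name first_url and the sample tags are made up for illustration and are not part of the original script.

```lua
-- The pattern from the script above, unchanged.
url_pat = "<[aA]%s+[hH][rR][eE][fF]%s*=%s*\"([^%c\"]*)\""

-- first_url is a hypothetical helper: it returns the captured URL
-- of the first matching anchor tag, or nil if nothing matches.
function first_url(html)
  return html:match(url_pat)
end

-- Made-up sample tags for illustration.
print(first_url('<a href="http://www.example.com/">plain</a>'))
print(first_url('<A HREF = "index.html">upper case, spaces</A>'))
print(first_url("<a href='single.html'>single quotes</a>"))
print(first_url('<a class="x" href="other.html">href not first</a>'))
```

The first two calls print the captured URL. The last two print nil, because the pattern insists on double quotes and on href being the first attribute after the a. A script built on this pattern will therefore silently skip such links.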
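The function io.input, given a file name, opens that file and makes it the default file for input. Called with no argument, it is used here to switch back to the keyboard; note that in standard Lua this form simply returns the current default input file and leaves it unchanged. The effect of the three lines: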
io.input(arg[1])
s = io.read("*all")
io.input()
is to read all the text in the StrongED window as a single large string of characters, and assign it to the variable s. The for loop iterates over the matches of url_pat in s; on each pass the iteration variable url holds the string captured by that match. The function string.gmatch in Lua's string library, called here in method form as s:gmatch(url_pat), returns an iterator over the captures.
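The loop need not print its results straight away. As a minimal sketch, the same iteration can collect the captures into a table for later use. The function name grab_urls and the sample page text are assumptions made for illustration, and the text is held in a string rather than read from the StrongED window.

```lua
-- The pattern from the script above, unchanged.
url_pat = "<[aA]%s+[hH][rR][eE][fF]%s*=%s*\"([^%c\"]*)\""

-- grab_urls is a hypothetical helper: it collects every capture
-- of url_pat in the string s into a table, in order of appearance.
function grab_urls(s)
  local urls = {}
  for url in s:gmatch(url_pat) do
    urls[#urls + 1] = url
  end
  return urls
end

-- Made-up page text standing in for the contents of the StrongED window.
page = [[
<p>See <a href="http://www.example.com/">this site</a>
and <A HREF="docs/index.html">these notes</A>.</p>
]]

-- Print each URL with its position in the list.
for i, url in ipairs(grab_urls(page)) do
  print(i, url)
end
```

Here the loop prints two numbered lines, one for each anchor tag in the sample text. Returning a table rather than printing makes it easy to sort the URLs or remove duplicates before output.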