develooper Front page | perl.perl6.language | Postings from April 2002

Re: Regex extensions?

From: Larry Wall
Date: April 1, 2002 16:28
Subject: Re: Regex extensions?
Message ID: 200204020024.QAA03191@wall.org

Robin Houston writes:
: Are there any plans to change the regex syntax for Perl 6?

That's what the next apocalypse is about.

: I ask because I've spent the last few days playing with PCRE,
: and I added a rather powerful extension to it as an experiment.
: Details at http://www.puffinry.freeserve.co.uk/regex-extension.html

Interesting.  I do worry that recursion back into the same regex will
(at least with the current regex syntax) result in tremendous
obfuscation for anything other than (?1).  We need recursive syntax at
least as readable as yacc's.

The basic underlying problem is that regex syntax doesn't really
let you declare and define rules separately from executing them.
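
[Editorial sketch, not part of the original message.] To illustrate the
contrast drawn above, here is a minimal example that assumes a later Perl
(5.10 or newer, which postdates this post). The first pattern recurses by
group number with (?1); the second declares named rules in a (?(DEFINE)...)
block and invokes them separately with (?&name). The rule names "group"
and "plain" are made up for illustration.

```perl
use strict;
use warnings;

# Numbered recursion: (?1) re-enters capture group 1, which is the
# whole parenthesized unit. Compact, but hard to read as it grows.
my $balanced_numbered = qr/^(\((?:[^()]++|(?1))*\))$/;

# Named rules: the DEFINE block only declares the rules; nothing in it
# matches until a rule is invoked with (?&name).
my $balanced_named = qr{
    ^ (?&group) $
    (?(DEFINE)
        (?<group> \( (?: (?&plain) | (?&group) )* \) )
        (?<plain> [^()]++ )
    )
}x;

# Try both patterns on a few sample strings.
for my $s ('(a(b)c)', '(a(b c)', '()') {
    my $n = $s =~ $balanced_numbered ? 'yes' : 'no';
    my $m = $s =~ $balanced_named    ? 'yes' : 'no';
    print "$s numbered=$n named=$m\n";
}
```

Both patterns accept the same strings; the named form reads more like a
grammar, which is the kind of readability the paragraph above asks for.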

: Named capture-blocks are *long* overdue as well.

Fer shure.  Though there are right ways to do them and wrong ways to
do them.  Or at least, righter and wronger.

: Is the (?{code}) / (??{blah}) experiment generally regarded as a
: success? Is it too early to say?

I'd say they were a good start, but the syntax is self-obfuscating.
And the qr//-generated regexes know nothing about the data structure in
which they are stored, so there's no decent way to analyze or optimize
a system of interrelated regexes as a compiler-compiler would.

: There's a reasonable case to be made
: that having regexes which are less than Turing complete is actually
: an advantage: you can guarantee termination, for example. There's
: also the inevitable security issue - if regexes can contain unlink()
: calls, then you have to be pretty careful with them...
: 
: If my proposal has a hidden agenda, it's that I want to show that
: you can get a lot of the power we want without actually having to
: embed arbitrary code.

I think we need to think through how we name a set of regexes such that
they could be stored in something resembling a hash.  It's not clear
whether such a hash should be declared within the regex or outside of
it (or maybe either).  Clearly, however, the regex engine needs to
know about that data structure, whatever it turns out to be, and however
it happens to be scoped.
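
[Editorial sketch, not part of the original message.] One way to picture
"a set of regexes stored in something resembling a hash" with the tools
discussed in this thread is a plain hash of qr// objects that refer to
each other through (??{ ... }). The hash %rule, its keys, and the helper
matches_expr are hypothetical names chosen for this example. Note that
the engine sees each cross-reference only as an opaque code block, so it
still knows nothing about the hash as a whole, which is the limitation
described above.

```perl
use strict;
use warnings;

# Hypothetical rule table: each value is a compiled regex, and rules
# refer to one another by looking up the hash at match time.
our %rule;
%rule = (
    number => qr/\d+/,
    atom   => qr/(??{ $rule{number} })|\((??{ $rule{expr} })\)/,
    expr   => qr/(??{ $rule{atom} })(?:[-+*\/](??{ $rule{atom} }))*/,
);

# Returns 1 if the whole string is an arithmetic expression, else 0.
sub matches_expr {
    my ($s) = @_;
    return $s =~ /\A(??{ $rule{expr} })\z/ ? 1 : 0;
}

for my $s ('1+(2*3)', '1+(2*3', '42') {
    print "$s => ", matches_expr($s), "\n";
}
```

Because the references are resolved by running code during the match, no
tool can inspect or optimize the rule set ahead of time the way a
compiler-compiler could.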

This is all intimately bound up with the question, "What would it take
for Perl's regexes to be good at parsing Perl and variants of Perl?"

Larry
