perl.perl6.documentation

Re: String Literals, take 1

From:
James Mastros
Date:
November 30, 2002 05:15
Subject:
Re: String Literals, take 1
Message ID:
3DE8403C.4030701@mastros.biz
On 11/29/2002 7:40 PM, Joseph Ryan wrote:
> - References and Object stringification hasn't been defined.
I believe it goes something like this:
All objects define a .AS_STRING method.  This method is called to 
stringify the object.  The builtin types have builtin .AS_STRINGs, the 
primitive types autopromote.  All stringification thus follows the same 
logical model, even if the implementation doesn't.

The default .AS_STRING for Strings is obvious.  Int and Num stringify to 
a decimal number (using the e exponential form if it is shorter?).

> - If References interpolate in some sort of readable way, how do
>  multi-leveled references interpolate, and how do self-referring
>  data structures interpolate?
Multi-leveled: The outer .AS_STRING calls its members' .AS_STRINGs. 
Circular: I have no idea.
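
A rough sketch of that model in Python, for illustration only.  The 
function as_string stands in for the proposed .AS_STRING method.  The 
"<circular>" marker is purely an assumption made for this sketch, since 
the circular case is left open above; it is one possible behaviour, not 
a proposal from this thread.

```python
# Sketch only: as_string stands in for the proposed .AS_STRING method.
# The "<circular>" marker is an assumption; the circular case is undecided.

def as_string(value, _seen=None):
    """Stringify value, recursing into list and dict members."""
    seen = set() if _seen is None else _seen
    if isinstance(value, (list, dict)):
        if id(value) in seen:
            return "<circular>"       # hypothetical placeholder
        seen = seen | {id(value)}     # track only the current path
        if isinstance(value, list):
            return "[" + ", ".join(as_string(v, seen) for v in value) + "]"
        return "{" + ", ".join(
            f"{as_string(k, seen)}: {as_string(v, seen)}"
            for k, v in value.items()
        ) + "}"
    return str(value)                 # primitive types "autopromote"


nested = [1, [2, 3]]
loop = [1]
loop.append(loop)                     # a self-referring structure
print(as_string(nested))
print(as_string(loop))
```

Tracking only the current path (rather than every object seen so far) 
means a member that is merely shared, not circular, still prints in full 
each time it appears.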

> A string is a literal value that represents a sequence of characters.
Possibly misleading: Leads people to think that a string is an array of 
chars, like in C?  (I don't think so, but new-to-perl people might.  I'm 
being nitpicky.)

> The base form for a non-interpolating string is the single-quoted
> string: 'string'.  However, non-interpolating strings can also be formed
> with the q() operator.  The q() operator allows strings to be made with
> any non-space, non-letter, non-digit character as the delimeter instead
> of '.  In addition, if the starting delimeter is a part of a paired
> set, such as (, [, <, or {, then the closing delimeter may be the
> matching member of the set.  In addition, the reverse holds true;
> delimeters which are the tail end of a pair may use the starting item
> as the closing delimeter.
This should be moved to general documentation for pick-a-delimiter 
functions.  Also, a rigorous definition of a pair of delimiters might be 
nice.  I'll look at unicode.org and see if I can find something out.
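
To make the quoted rule concrete, here is a small Python sketch.  The 
PAIRS table and the name closing_delimiter are made up for illustration, 
and the table covers only the four pairs the draft lists; a rigorous 
definition would presumably be driven by Unicode data instead.

```python
# Illustrative table only: the four pairs named in the quoted draft.
PAIRS = {"(": ")", "[": "]", "<": ">", "{": "}"}
# The draft also lets the tail of a pair open a string, closed by its head.
PAIRS.update({close: open_ for open_, close in list(PAIRS.items())})


def closing_delimiter(opening: str) -> str:
    """Return the delimiter that closes a string opened with `opening`."""
    if len(opening) != 1 or opening.isspace() or opening.isalnum():
        raise ValueError("delimiter must be one non-space, non-letter, "
                         "non-digit character")
    # Paired characters close with their partner; anything else closes
    # with itself.
    return PAIRS.get(opening, opening)


print(closing_delimiter("("), closing_delimiter(")"), closing_delimiter("/"))
```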

> It is also possible to embed an interpolating string within a non-
> interpolating string by the use of the \qq{} construct.  A string
> inside a \qq{} constructs acts exactly as if it were an interpolated
> string.  Note that any end-brackets, "}", must be escaped within the
> the \qq{} construct so that the parser can read it correctly.
Is the \qq{} construct a pick-a-delimiter thing?  I think it should be, 
for parallelism with the qq() operator.

> =head3 <>; expanding a string as a list.
> 
> A set of braces is a special op that evaluates into the list of words
> contained, using whitespace as the delimeter.  It is similar to qw()
> from perl5, and can be thought of as roughly equivalent to:
> C<< "STRING".split(' ') >>
I thought it was named <<foo bar baz>> or «foo bar baz» or qw().  (That 
middle one should be U+00AB and U+00BB, \N{LEFT-POINTING DOUBLE ANGLE 
QUOTATION MARK} and \N{RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK}.) 
Additionally, I'm fairly certain the Unicode ops could be in either 
direction.  I think there was a reason for that, but I don't remember what.

>    \t            tab
\U{9}
>    \n            newline
\U{10}
>    \r            return
\U{13}
>    \f            form feed
\U{12}
>    \b            backspace
\U{8}
>    \a            alarm (bell)
\U{7}
>    \e            escape
\U{27}
>    \b10        binary char
>    \o33        octal char
Is this true?  We changed the numeric octal shorthand base to 0c777, so 
what sense does \o for octal characters make?  (Unfortunately, we can't use 
\c, since that's taken for control characters.)  IIRC, somebody had 
mentioned just getting rid of \o altogether.  People don't think in octal.
>    \x1b        hex char
Specifically, must \x be followed by exactly two hex digits, or do we DWIM 
with one (i.e., if there is only one character in 0-9A-Fa-f after the \x, 
do we
>    \x{263a}    wide hex char
>    \c[            control char
In particular, take the character after the \c, and call it $c.  If it's 
in [a-z], convert it to upper case.  Then subtract 0x40, and take the 
character with that ordinal.  (This gives the traditional semantic for 
characters [@-_]; characters after that will map back onto the printable 
range -- should characters after _ be illegal, or just map back to the 
printable range?  (The next char, `, maps on to space.))
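
The rule is short enough to write out.  A Python sketch (the name 
control_char is made up for illustration; it takes 0x40 off the ordinal 
exactly as described and makes no attempt to settle what happens past _ 
or before @):

```python
def control_char(c: str) -> str:
    """Map the character following \\c to its control character."""
    if "a" <= c <= "z":
        c = c.upper()             # \ca and \cA mean the same thing
    return chr(ord(c) - 0x40)     # '@' -> 0x00, 'A' -> 0x01, '[' -> 0x1B


# '[' gives ESC, and '`' (0x60) lands on space (0x20).
print(repr(control_char("[")), repr(control_char("`")))
```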

>    \N{name}    named Unicode character
Suggested extension: \U{13#ac05} is Unicode character number ac05 in 
base 13.  Any perl expression will do inside the {}s.
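
As a rough illustration of how the number inside the braces would be 
read, here is a Python sketch.  The helper name unicode_escape_value is 
hypothetical, and it only handles a plain decimal number or the 
base#digits form, not arbitrary expressions.

```python
def unicode_escape_value(spec: str) -> str:
    """Return the character for the contents of a \\U{...} escape."""
    base, sep, digits = spec.partition("#")
    if sep:                                  # "13#ac05": digits in base 13
        return chr(int(digits, int(base)))
    return chr(int(spec))                    # "9": plain decimal


print(repr(unicode_escape_value("9")))              # the tab from the table
print(hex(ord(unicode_escape_value("13#ac05"))))    # ac05 read in base 13
```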

>    \Q{}        Escape all characters that need escaping
>                within the current string (except "}")
Escape all characters in [^A-Za-z0-9] within the {}'d part with 
backslashes.  ("that need escaping" is inexact.)
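
That definition is a one-line substitution.  A Python sketch, with the 
function name quote_non_alnum invented for the example:

```python
import re


def quote_non_alnum(text: str) -> str:
    """Put a backslash before every character outside [A-Za-z0-9]."""
    return re.sub(r"([^A-Za-z0-9])", r"\\\1", text)


print(quote_non_alnum("a.b*c $x"))
```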

> Within an interpolated string, interpolation of expressions can be
> stopped by \Q.
(Which acts somewhat like a non-breaking space.)

> The collected standard output of the
> command is returned; standard error is unaffected. 
Standard error is passed on to the standard error of the perl process? 
(Or should we leave it at "unaffected", and let the user guess what that 
means on their OS -- I'm betting I'm being unix-centric here -- OS<=9 
has no concept of "standard error" -- or "standard output", for that 
matter... IIRC, again.)

> In scalar context,
> it comes back as a single (potentially multi-line) string, or undef if
> the command failed. In list context, returns a list of lines (however
> you've defined lines with $/ or $INPUT_RECORD_SEPARATOR), or an empty
> list if the command failed.
I don't think $/ still exists, at least as such.  In fact, I think we 
should probably just say "returns an iterator on the standard output of 
the command", and leave it at that.
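
For comparison, this is roughly what "an iterator on the standard 
output" looks like in Python.  It is only an analogy, not a proposal for 
the perl6 interface, and the name command_lines is made up.  Standard 
error is not redirected here, so the child inherits it from the parent 
process.

```python
import subprocess
import sys


def command_lines(argv):
    """Yield the command's standard output one line at a time."""
    # Only stdout is piped; stderr goes wherever the parent's stderr goes.
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line


# Use the running Python interpreter as a portable child command.
child = [sys.executable, "-c", "print('one'); print('two')"]
lines = list(command_lines(child))
print(lines)
```

The caller decides whether to read lazily, line by line, or to collect 
everything with list(), which covers both the scalar and the list cases 
in the quoted text.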

> # modified from perl5's perlop
> A line-oriented form of quoting is based on the shell "here-document"
> syntax.  Following a << you specify a string to terminate the quoted
> material, and all lines following the current line down to the
> terminating string are the value of the item. The terminating string
> may be either an identifier (a word), or some quoted text. If quoted,
> the type of quotes you use determines the treatment of the text, just
> as in regular quoting. An unquoted identifier works like double quotes.
I think we need a non-optional space to follow the << in the case of 
double-quotes to disambiguate from <<>> qw lists.

> The terminating string must appear by itself, and any preceding or
> following whitespace on the terminating line is discarded.
This should probably be a link to something defining exactly what 
whitespace is in perl.  I suspect we should follow Unicode's definition 
of whitespace -- possibly disallowing the zero-width whitespace, for 
sanity reasons.  Come to think of it, I think Unicode has both the 
concept of "whitespace" and of "word-separating characters", which 
aren't necessarily the same -- zero-width non-breaking space is 
nonprinting, and doesn't wordbreak, but is whitespace!

> =head2 Gory Details of parsing quoted constructs

I think this section is going to be very much different -- since the 
perl6 parser is going to be defined in perl6 regexes, it may just say 
"see anydelimiter.pl and quoted.pl".

	-=- James Mastros

