Re: String Literals, take 2

From: Larry Wall
Date: December 4, 2002 11:47
Subject: Re: String Literals, take 2
Message ID: 20021204194752.GA7001@wall.org

On Mon, Dec 02, 2002 at 04:42:52PM -0500, Joseph F. Ryan wrote:
: >Has this been vetted?  $(...)/etc seem to cover this case, and & being 
: >a qq() metachar makes using qq() strings to print HTML/XML difficult. 
: 
: 
: Well, it was in Apoc 2:
: http://www.perl.com/pub/a/2001/05/03/wall.html#rfc 252: interpolation of 
: subroutines
: http://www.perl.com/pub/a/2001/05/03/wall.html#rfc 222: interpolation of 
: object method calls

This is why the parens are required on sub interpolations.
HTML/XML entities don't have parens.  The parens are required on
method interpolations because it's too easy to get an accidental
"." after a variable.

: >>=item Escaped Characters
: >># Basically the same as Perl5; also, how are locale semantics handled?
: >>
: >>   \t            tab
: >>   \n            newline
: >>   \r            return
: >>   \f            form feed
: >>   \b            backspace
: >>   \a            alarm (bell)
: >>   \e            escape
: >
: >Can we get some rigor here?  Also, is \n the same everywhere, or do we 
: >play the same tricks we did with it in p5?  (I think it should be the 
: >same everywhere, a CR char, "\cM".  Disciplines, or encodings, or 
: >whatever we're calling them, can take care of it on IO.)  Oh, and it 
: >might be nice for \0 to be NUL.  (This used to be implicit with \0 as 
: >octal, but since \0 isn't octal anymore...) 
: 
: 
: As someone who has had to use NT, Mac OS 9, and Solaris with much
: frequency, I can say I very much appreciated the special tricks
: that \n did (does).

In regexen, \n matches any known newline sequence.  In a string, it interpolates
whatever is the native newline.
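
A minimal sketch of that distinction, written in Python rather than
Perl 6 since the syntax here is still in flux. The set of sequences in
ANY_NEWLINE is an assumption (this message does not enumerate which
newlines count as "known"), and both function names are hypothetical.

```python
import os
import re

# Assumed set of "known" newline sequences: CRLF, CR, LF.
# CRLF comes first so it is matched as one unit, not as CR followed by LF.
ANY_NEWLINE = re.compile(r"\r\n|\r|\n")

def split_lines(text):
    # Matching side: any recognized newline sequence separates lines.
    return ANY_NEWLINE.split(text)

def join_lines(lines):
    # String side: emit whatever newline is native to this platform.
    return os.linesep.join(lines)

print(split_lines("a\r\nb\rc\nd"))  # ['a', 'b', 'c', 'd']
```

The asymmetry is the point: input is read liberally, while output uses
only the platform's own convention.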

: >>   \b10        binary char

Can't easily have this and backspace \b.  But \b is already a mess from
meaning word boundary in regexen.  I'm inclined to throw out \b meaning
backspace.  It doesn't really work well in a Unicode world anyway.  If
you really mean it you can always specify a control-H.

: >>   \o33        octal char
: >
: >Numeric Literals, take 3 
: >(http://archive.develooper.com/perl6-documentation@perl.org/msg00462.html), 
: >in the "*** Bin/Hex/Oct shorthands" section, gives 0c123 as the shorthand 
: >form of octal numbers, so it doesn't make much sense for octal character 
: >constants to be \o123.  Do we want to change shorthand octal literal 
: >numbers to 0o123 (I don't like this, it's hard to read), change octal 
: >chars to \c123 (can't do this without getting rid of, or changing,  \c for 
: >control-character), get rid of octal chars entirely, or something else?  
: >(Barring a good "something else", I vote for killing octal chars.)
: 
: 
: This seems to be going back and forth:
: 
: $octal_format = ($octal_format_still_exists)
:     ? sprintf("\\%s%d", $octals_current_letter_of_the_week, $number)
:     : undef;
:
: That should clear things up.
: 
: >>   \x1b        hex char
: >
: >Exactly two digits after the \x?  Perl5 attempts to do the right thing 
: >either way, but this can be confusing too -- "\xA" eq chr(0xA), 
: >"\xABar" eq chr(0xAB)."ar", "\xAQux" eq chr(0xA)."Qux". 
: 
: 
: That was in perl5's perldoc, so I assume it is encouraged.
: 
: You brought this up before:
: http://archive.develooper.com/perl6-documentation@perl.org/msg00485.html
: 
: I still say to stick with perl5's behavior.
: 
: >>   \x{263a}    wide hex char

May switch all of these to use square brackets instead of curlies:

      \x[263a]    wide hex char

: >>   \c[            control char

\c is no longer control char.  \c means what \N used to mean.
(\N now means "not a newline".)

To specify a control-H, say \c[^H].

: >Rigor?  What is \c~?  perl5 thinks it's >, should perl6 agree?
: 
: 
: I don't see why it shouldn't.
: 
: >How about \c\x{1000} (that's invalid, but you get the point), is that 
: >equiv to \x{ff9c}?
: 
: 
: No, it's "\c\" ~ "x{1000}"
: 
: >What about \cé, (e+acute accent), does that capitalize, then subtract 
: >64, or just subtract?

\c[^é] would be é with its 64 bit flipped.
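
The bit flip can be sketched in Python as an XOR with 64. The function
name is hypothetical, and the é is assumed to be the single precomposed
code point U+00E9 rather than e plus a combining accent.

```python
def control_flip(ch):
    # Toggle the bit with value 64 in the character's code point.
    return chr(ord(ch) ^ 64)

print(repr(control_flip("H")))    # '\x08'  (control-H, i.e. backspace)
print(control_flip("~"))          # >
print(control_flip("\u00e9"))     # © (U+00A9)
```

The second line reproduces the Perl 5 result for \c~ quoted above:
126 XOR 64 is 62, which is ">".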

: >>   \N{name}    named Unicode character

No, that's now \c[name].  \N means "not a newline".  Note that
\C[name] means "not a \c[name]".

: Just recycle perl5's, I suppose.  Not *everything* needs to be redone 
: from scratch.

True, but everything is being reevaluated from scratch.  Nothing gets a
free ride just because it's in Perl 5.

: >Is there any way to give the ordinal in decimal, like "\d192"?  (I'm 
: >not sure how useful this would be, but it would be nice parallelism. 
: >OTOH, you can use chr() easily enough.) 
: 
: 
: That is a good point; if there is a 0dxxxxx, then there should be a 
: "\dxxxxx".

Can't, if \d still means digit.  But maybe \x[1234] is shorthand for
\c[0x1234].  In which case, you can always say \c[0d4321].
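
A rough Python sketch of how such a radix-prefixed spec could map to a
character. This is only an illustration under assumptions: the names
are hypothetical, and just the two prefixes that appear in this message
(0x and 0d) are handled.

```python
# Radix prefixes taken from the examples in this message only (assumption).
RADIX = {"0x": 16, "0d": 10}

def char_from_spec(spec):
    # Interpret the body of a numeric \c[...] such as "0x1234" or "0d4321".
    prefix, digits = spec[:2], spec[2:]
    if prefix not in RADIX:
        raise ValueError("unsupported radix prefix: " + repr(prefix))
    return chr(int(digits, RADIX[prefix]))

def hex_shorthand(digits):
    # Treat \x[digits] as shorthand for \c[0xdigits].
    return char_from_spec("0x" + digits)

print(char_from_spec("0d4321") == chr(4321))   # True
print(hex_shorthand("263a"))                   # the U+263A smiley
```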

: >>=item Modifiers: C<\Q{}>, C<\L{}>, C<\U{}>
: >>
: >>Modifiers apply a modification to text which they enclose; they can be
: >>embedded within interpolated strings.
: >>
: >>   \L{}        Lowercase all characters within brackets
: >>   \U{}        Uppercase all characters within brackets
: >>   \Q{}        Escape all characters that need escaping
: >>               within brackets (except "}")

Square brackets preferred these days--looks less like a closure.

: >Rigor: escape all non-alphanumerics.
: >Do we still have the other modifiers that p5 supports, \l and \u?

Yes, unless we want to roll over and allow \uXXXX for unicode, just to
be compatible with the rest of the world.

: >Do we want a new titlecase modifier, \T{james mastros} eq "James 
: >Mastros", doing the Right Thing for other languages, where it isn't so 
: >simple (there are complicated cases for this, but IIRC Unicode defines 
: >a robust algo to do this).  I'll check on the Unicode stuff if anybody 
: >thinks it's a good idea... I'm uncertain, myself, I never liked the 
: >qq() case-modifiers, so don't use them. 
: 
: 
: There is ucfirst(), which I'm sure could be updated to handle Unicode;
: however, I don't know if it is important enough to deserve \T{}.  You
: might want to ask Larry :)

\u does title-case already in Perl 5.  \U[] will do uppercase.
So \u\U[$foo] would titlecase the first letter and uppercase the rest.
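
To see why title case and upper case are not the same thing, here is a
small Python sketch of "titlecase the first letter, uppercase the
rest". The function name is hypothetical; Python's str.title() and
str.upper() stand in for the Unicode case mappings.

```python
def title_then_upper(s):
    # Titlecase the first character, uppercase the remainder.
    return s[:1].title() + s[1:].upper()

print(title_then_upper("foo bar"))        # FOO BAR
# U+01C6 (the dz-with-caron digraph) has distinct mappings:
# title case is U+01C5, upper case is U+01C4.
print(title_then_upper("\u01c6abc") == "\u01c5ABC")   # True
```

For plain ASCII the two mappings coincide, so the difference only shows
up with characters such as digraphs.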

: >>A line-oriented form of quoting is based on the shell "here-document"
: >
: >s/shell/Unix Bourne shell/
: >
: >>syntax.  Following a << you specify a string to terminate the quoted
: >>material, and all lines following the current line down to the
: >>terminating string are the value of the item. The terminating string
: >>may be either an identifier (a word), or some quoted text. If quoted,
: >>the type of quotes you use determines the treatment of the text, just
: >>as in regular quoting. An unquoted identifier works like double quotes.
: >>The terminating string must appear by itself, and any preceding or
: >>following whitespace on the terminating line is discarded.
: >
: >I could have sworn that Larry recently put something out about the edge 
: >cases between << heredoc and << beginning-of-qw.  I /think/ he said 
: >that  qw("Foo" bar) must be written as << "Foo" bar>>, because 
: >otherwise it would be interpreted as a here-doc ending with Foo with 
: >double-quote interpolation.  Can anybody find this, or is Larry watching?

Here docs require quotes, so <<EOF is the beginning of a qw//.  (This week.)

: >>Also note that with single quoted here-docs, backslashes are not
: >>special, and are taken for a literal backslash, a behavior that is
: >>different from normal single-quoted strings.
: >
: >Are \qq()s still special, even in <<'noninterpolating's?  Either way, 
: >it should be explicitly noted. 
: 
: 
: As far as I know, *nothing* is special in a single quoted heredoc.

Here docs are where you *most* want the \qq[] ability.  It is assumed that
the sequence "\qq[" will not occur by accident very often in the typical
single-quoted string.

Larry
