[perl #21372] utf8 in regex leads to corruption when used with uc($1)
From: perlbug-followup
Date: February 26, 2003 08:39
Subject: [perl #21372] utf8 in regex leads to corruption when used with uc($1)
Message ID: rt-21372-52785.19.3710596019351@bugs6.perl.org
# New Ticket Created by (Dominic Mitchell)
# Please include the string: [perl #21372]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt2/Ticket/Display.html?id=21372 >
This is a bug report for perl from dom@semantico.com,
generated with the help of perlbug 1.34 running under perl v5.8.0.
-----------------------------------------------------------------
[Please enter your report here]
I have seen a problem a few times in local development where I have
gotten back extremely odd results after doing uc($1) or lc($1).
Finally, I've managed to work out a test script to demonstrate this.
This code snippet is based on some HTML::Mason code, which was showing
the problem.
#!/usr/bin/perl -w

use strict;
use warnings;
use Encode;

# Various bits taken from HTML::Mason::Lexer::match_block().
my $blocks_re = qr/once|flags|filter|args|attr|init|shared|perl|text|doc|cleanup/i;

my $comp_source = <<'WIBBLE';
<%args>
$foo
</%args>
This is a pretend mason component.
WIBBLE

Encode::_utf8_on( $comp_source );

if ( $comp_source =~ /\G<%($blocks_re)>/igcs ) {
    print "\$1 is $1\n";
    my $type = lc $1;
    print "[1] \$type is '$type'\n";
    print "[2] \$type is '$type'\n";
}
I have run this script under perl 5.8.0 on RedHat 8.0, RedHat 7.2 and
FreeBSD 5.0. I believe the problem also occurs under Perl 5.6.1, but I
don't have a 5.6.1 installation handy to test with right now.
Unfortunately, I'm not familiar enough with the perl source to attempt
to fix this problem. :-(
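
For anyone trying to narrow this down, a minimal diagnostic sketch along the following lines may help. It is not part of the Mason code: the describe() helper is hypothetical, and the one-line input and two-alternative pattern are assumptions chosen to keep the case small. It copies $1 into a lexical before calling lc, then prints the character codes and the UTF8 flag of both strings, so a difference between the capture and the lowercased value would show up byte by byte.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not from the report): show a string, whether
# Perl's internal UTF8 flag is set on it, and its character codes in hex.
sub describe {
    my ( $label, $str ) = @_;
    my $hex = join ' ', map { sprintf '%02x', ord($_) } split //, $str;
    return sprintf "%s: '%s' utf8=%d chars=[%s]",
        $label, $str, ( Encode::is_utf8($str) ? 1 : 0 ), $hex;
}

# Assumed minimal input: a single upper-case block tag, ASCII only.
my $source = '<%ARGS>';
Encode::_utf8_on($source);    # same flag-setting call as the test script

if ( $source =~ /\G<%(args|init)>/igcs ) {
    my $capture = $1;            # copy the capture before transforming it
    my $type    = lc $capture;   # lc on the copy rather than on $1 itself
    print describe( 'capture', $capture ), "\n";
    print describe( 'type',    $type ),    "\n";
}
```

If the two lines disagree on anything other than letter case, that would suggest the damage happens in lc rather than in the match itself. Swapping `lc $capture` back to `lc $1` gives a direct comparison with the behaviour in the test script.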
[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=high
---
Site configuration information for perl v5.8.0:
Configured by bhcompile at Sun Sep 1 23:55:07 EDT 2002.
Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
Platform:
osname=linux, osvers=2.4.18-11smp, archname=i386-linux-thread-multi
uname='linux daffy.perf.redhat.com 2.4.18-11smp #1 smp thu aug 15 06:41:59 edt 2002 i686 i686 i386 gnulinux '
config_args='-des -Doptimize=-O2 -march=i386 -mcpu=i686 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr'
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
optimize='-O2 -march=i386 -mcpu=i686',
cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -I/usr/include/gdbm'
ccversion='', gccversion='3.2 20020822 (Red Hat Linux Rawhide 3.2-5)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='gcc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lnsl -lgdbm -ldb -ldl -lm -lpthread -lc -lcrypt -lutil
perllibs=-lnsl -ldl -lm -lpthread -lc -lcrypt -lutil
libc=/lib/libc-2.2.92.so, so=so, useshrplib=true, libperl=libperl.so
gnulibc_version='2.2.92'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic -Wl,-rpath,/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE'
cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'
Locally applied patches:
---
@INC for perl v5.8.0:
/home/dom/libexec/perl
/home/dom/libs
/usr/lib/perl5/5.8.0/i386-linux-thread-multi
/usr/lib/perl5/5.8.0
/usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.0
/usr/lib/perl5/site_perl
/usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.0
/usr/lib/perl5/vendor_perl
.
---
Environment for perl v5.8.0:
HOME=/home/dom
LANG=C
LANGUAGE (unset)
LC_ALL=C
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/dom/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/sbin:/usr/sbin:/usr/local/sbin:/home/dom/bin
PERL5LIB=/home/dom/libexec/perl:/home/dom/libs
PERL_BADLANG (unset)
SHELL=/bin/zsh