perl.perl6.language: postings from June 2002
From: mosullivan
Date: June 21, 2002 00:36
Subject: Regexes and untainting
Message ID: 1024606383.3d1240af9495d@webinbox.com
SUMMARY
By default, regexes shouldn't untaint. Also, Perl should provide a toolkit for
safer untainting.
DETAILS
We're all aware of how you go about untainting data: run it through a regex and
grab the stuff that was in the parens:
$var =~ m/^(\w+)$/
    or die 'unsafe data';
$var = $1;
Untainting as in the example above makes sense, but I've always been
uncomfortable with the fact that $1 is always untainted, even when untainting
wasn't my intent in using parens. Consider this example of parsing a string of
fixed-length records:
@fields = $raw =~ m/^(.{3})(.{25})(.{30})/;
The data that comes back from that expression is always untainted. Frankly,
few non-advanced programmers are going to notice that. Yes, there are
techniques for retainting, but when you get to that point it starts feeling
like you're working against the language, not with it.
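To make the fixed-length case concrete, here is a small runnable sketch (assuming Perl 5.10 or later for `//` and the core `Scalar::Util` module). Run it as `perl -T script.pl '<58-character record>'` to see real taint, since `@ARGV` is tainted under `-T`; without `-T`, or with the built-in literal fallback, `tainted` reports false for everything. The `substr($raw, 0, 0)` concatenation is one common manual retainting idiom, shown only to illustrate the extra work described above; the post does not prescribe it.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Under perl -T, @ARGV is tainted; the sprintf fallback (a literal) is not.
# Field widths follow the post: 3 + 25 + 30 characters.
my $raw = $ARGV[0]
    // sprintf('%-3s%-25s%-30s', 'A01', 'Jane Example', '1 Main Street');

# The list-context match returns the three captures, untainted by default.
my @fields = $raw =~ m/^(.{3})(.{25})(.{30})/;

# One manual retainting idiom: appending a zero-length substring of the
# source string copies its taint (if any) onto each field.
$_ .= substr($raw, 0, 0) for @fields;

for my $i (0 .. $#fields) {
    printf "field %d: [%s] tainted? %s\n",
        $i, $fields[$i], tainted($fields[$i]) ? 'yes' : 'no';
}
```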
Ergo, I propose that regexes untaint captured text only if you specifically
tell them to do so. A capital-T switch would work nicely:
$var =~ m/^(\w+)$/T
    or die 'unsafe data';
$var = $1;
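The /T switch is a proposal, so it cannot be run as written. As a rough Perl 5 approximation of the same default, one could make the `re` pragma's `taint` mode file-wide, so that captures keep the taint of the string they came from, and switch it off with `no re 'taint'` only inside one deliberately named helper. `untaint_match` is a hypothetical name for that helper, not an existing API. Run with `perl -T` and set `USER_INPUT` to see the difference (`%ENV` is tainted under `-T`); without `-T`, both lines report `no`.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Make "captures keep their taint" the default for every match in this file.
use re 'taint';

# Hypothetical stand-in for m/.../T: the one lexical scope in which a
# successful capture is allowed to untaint. Returns $1, or undef on no match.
sub untaint_match {
    my ($string, $pattern) = @_;
    no re 'taint';    # lexically scoped: affects only matches in this sub
    return $string =~ $pattern ? $1 : undef;
}

# %ENV values are tainted under perl -T; the fallback literal is not.
my $input = $ENV{USER_INPUT} // 'hello_world';

# An ordinary capture: under use re 'taint', the copy of $1 stays tainted.
if ($input =~ /^(\w+)$/) {
    my $captured = $1;
    printf "plain capture tainted? %s\n", tainted($captured) ? 'yes' : 'no';
}

# The deliberate, opt-in untaint.
my $clean = untaint_match($input, qr/^(\w+)$/)
    // die "unsafe data\n";
printf "helper result tainted? %s\n", tainted($clean) ? 'yes' : 'no';
```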
The T regex switch will help lazy programmers like me, but more could be done
to encourage responsible untainting. As has been pointed out ad nauseam,
/(.*)/ unravels the whole taint sweater. Granted, anybody who uses tainting but
also uses that regex is just plain goofy, but subtler mistakes still happen,
like filtering out the bad instead of filtering in the good. So I'd like to
propose that a standard module be included with Perl that performs a series of
common, useful taint checks, and furthermore that this module be promoted as
the beginner's tool for untainting. Here are some examples of how it would
work:
use Untaint ':all';
# Untaint $var only if $var =~ m/^(\w+)$/
# Why is that the default? See below.
untaint($var);
# Same as the default
untaint($var, -format => Untaint::WordOnly);
# Untaint $var only if $var is the path to a
# file that already exists.
untaint($var, -format => Untaint::FileExists);
# Untaint $var only if $var is the path to a file, and
# that file is within /tmp/myfiles
untaint($var, -format => Untaint::FileExists, -tree => '/tmp/myfiles');
# Untaint $var unconditionally, which is dangerous
untaint($var, -format => Untaint::Dangerous);
# I'm sure we could think of many others
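For discussion, here is a minimal single-file sketch of how the proposed module might behave. Everything in it is hypothetical: `Untaint`, `untaint`, and the `WordOnly`, `FileExists`, and `Dangerous` constants come from the examples above, not from an existing module. The behavior is one assumption among several reasonable ones: `untaint` rewrites the variable in place, returns true on success, leaves a rejected value untouched (and therefore still tainted), and canonicalizes paths with `Cwd::abs_path` before the `-tree` check so that `..` and symlinks cannot escape the tree. Because it lives in one file, it calls `Untaint::untaint` by its full name instead of importing it with `use Untaint ':all'`.

```perl
use strict;
use warnings;

# Hypothetical sketch of the proposed Untaint module; not a real module.
package Untaint;

use Carp qw(croak);
use Cwd  qw(abs_path);

use constant {
    WordOnly   => 'WordOnly',
    FileExists => 'FileExists',
    Dangerous  => 'Dangerous',
};

# untaint($var, %opts) replaces $var (through the @_ alias) with a
# regex-captured copy, which Perl treats as untainted, but only if $var
# passes the requested check. Returns 1 on success, 0 otherwise.
sub untaint {
    my (undef, %opt) = @_;
    my $format = $opt{-format} // WordOnly;
    my $value  = $_[0];
    return 0 unless defined $value;

    if ($format eq WordOnly) {
        # Stricter than m/^(\w+)$/: \z also rejects a trailing newline.
        return 0 unless $value =~ /\A(\w+)\z/;
        $_[0] = $1;
        return 1;
    }

    if ($format eq FileExists) {
        return 0 unless -e $value;
        my $real = abs_path($value);    # resolves '..' and symlinks
        return 0 unless defined $real;
        if (defined $opt{-tree}) {
            my $root = abs_path($opt{-tree});
            return 0 unless defined $root;
            $root =~ s{/\z}{};          # so that a tree of '/' still works
            return 0 unless $real eq $root || index($real, "$root/") == 0;
        }
        $real =~ /\A(.*)\z/s;           # safe only because of the checks above
        $_[0] = $1;                     # stores the canonical absolute path
        return 1;
    }

    if ($format eq Dangerous) {
        $value =~ /\A(.*)\z/s;          # unconditional: caller accepted the risk
        $_[0] = $1;
        return 1;
    }

    croak "Untaint: unknown -format '$format'";
}

package main;

my $name = 'report_2002';
print Untaint::untaint($name) ? "accepted: $name\n" : "rejected: $name\n";

my $path = '/etc/../etc/passwd';
print Untaint::untaint($path, -format => Untaint::FileExists,
                              -tree   => '/tmp/myfiles')
    ? "accepted: $path\n" : "rejected: $path\n";
```

Note that a `FileExists` check like this one is still subject to the usual race between checking a path and later opening it.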
Why do I propose that the default for untaint is to untaint only variables that
consist entirely of word characters? Because it puts into beginners' heads the
idea that untainting is something you do only to carefully screened data, and
because word-char-only strings are one of the most common and safest forms of
untainting (IMHO). untaint would still be able to untaint anything, but you
would have to acknowledge that doing so is Untaint::Dangerous.
-Miko