
Regexes and untainting

From: mosullivan
Date: June 21, 2002 00:36
Subject: Regexes and untainting
Message ID: 1024606383.3d1240af9495d@webinbox.com

SUMMARY

By default, regexes shouldn't untaint.  Also, provide a toolkit for Safer 
Untainting.

DETAILS

We're all aware of how you go about untainting data: run it through a regex and 
grab the stuff that was in the parens:

	$var =~ m/^(\w+)$/
		or die 'unsafe data';
	$var = $1;

Untainting using the example above makes sense, but I've always been 
uncomfortable with the fact that $1 is always untainted, even if that wasn't my 
intent in using parens.  Consider this example of parsing a string of fixed-
length records: 

	@fields = $raw =~ m/^(.{3})(.{25})(.{30})/;

The data that comes back from that expression is always untainted.  Frankly, 
few non-advanced programmers are going to notice that.  Yes, there are 
techniques for retainting, but when you get to that point it starts feeling 
like you're working against the language, not with it.
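
For reference, one way to keep captures tainted in Perl 5 is the lexically scoped `use re 'taint'` pragma.  The sketch below applies it to the fixed-length record example; the sub name parse_record and the sample record are made up for illustration, and tainted() from Scalar::Util only reports true when the script runs under perl -T with data that came from outside the program.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Split a fixed-length record into three fields.
# With "use re 'taint'" in scope, the captures inherit the taint of $raw
# instead of being silently laundered by the match.
sub parse_record {
    my ($raw) = @_;
    use re 'taint';
    my @fields = $raw =~ m/^(.{3})(.{25})(.{30})/
        or return;
    return @fields;
}

# Hypothetical sample record: 3 + 25 + 30 characters.
my $raw    = sprintf '%-3s%-25s%-30s', 'A01', 'Miko', 'somewhere';
my @fields = parse_record($raw);

# Under perl -T, with $raw read from a file or the environment,
# tainted() would report true for each field.
printf "%d fields, first is %s\n", scalar @fields,
    tainted($fields[0]) ? 'tainted' : 'not tainted';
```

This works, but you have to know to ask for it, which is the opposite of the default proposed here.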

Ergo, I propose that regexes only untaint stuff in parens if you specifically 
tell them to do so.  A capital-T switch would work nicely:

	$var =~ m/^(\w+)$/T
		or die 'unsafe data';
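
The /T switch does not exist, but the opt-in behaviour can be roughly approximated in Perl 5 by making `use re 'taint'` the file-wide default and lifting it with `no re 'taint'` only where laundering is intended.  This is a sketch under that assumption, not the proposed syntax; launder_word is a hypothetical helper name, and it anchors with \A and \z so that a trailing newline is rejected.

```perl
use strict;
use warnings;
use re 'taint';    # file-wide default: captures never launder

# Opt-in laundering, playing the role of the proposed /T switch.
# Returns the untainted word, or undef if $value is not word-only.
sub launder_word {
    my ($value) = @_;
    no re 'taint';    # the one place where untainting is allowed
    return $value =~ m/\A(\w+)\z/ ? $1 : undef;
}

my $var = 'report_2002';    # hypothetical input
defined(my $clean = launder_word($var))
    or die 'unsafe data';
```

Every other regex in the file leaves its captures tainted, so an accidental pair of parens no longer launders anything.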

The T regex switch will help lazy programmers like me, but there's more that 
could be done to encourage responsible untainting.  As has been pointed out ad 
nauseam, /(.*)/ unravels the whole taint sweater.  Granted, anybody who uses 
tainting but also uses that regex is just plain goofy, but more subtle mistakes 
still happen, like filtering out the bad instead of filtering in the good.  So 
I'd like to propose that a standard module be included with Perl that does a 
series of common, useful taint checks, and furthermore that use of that 
module be encouraged as the beginner's tool for untainting.  Here are some 
examples of how it would work:

 use Untaint ':all';
 
 # Untaint $var only if $var =~ m/^(\w+)$/
 # Why is that the default?  See below.
 untaint($var);
 
 # Same as default
 untaint($var, -format=>Untaint::WordOnly);
 
 # Untaint $var only if $var is the path to a
 # file that already exists.  
 untaint($var, -format => Untaint::FileExists );
 
 # Untaint $var only if $var is the path to a file, and
 # that file is within /tmp/myfiles
 untaint($var, -format => Untaint::FileExists, -tree=>'/tmp/myfiles' );
 
 # Untaint $var unconditionally, which is dangerous
 untaint($var, -format => Untaint::Dangerous );
 
 # I'm sure we could think of many others

Why do I propose that untaint, by default, only untaints vars that consist 
entirely of word characters?  I do so because it puts into beginners' heads the 
idea that untainting is something you only do to carefully screened data, and 
furthermore because word-char-only strings are one of the most common and 
safest forms of untainting (IMHO).  untaint would include the ability to 
untaint everything, but you would have to acknowledge that it's 
Untaint::Dangerous to do so.
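
To make the idea concrete, here is a minimal Perl 5 sketch of such an untaint function.  Nothing in it is a finished design: it assumes format names are passed as plain strings ('WordOnly') rather than the Untaint::WordOnly constants shown above, that untaint() modifies its first argument in place and returns true or false, that -tree is checked by comparing resolved paths from Cwd::realpath, and that patterns anchor with \A and \z to reject a trailing newline.

```perl
use strict;
use warnings;
use Cwd qw(realpath);

# Each format returns an untainted copy on success, or undef on failure.
my %FORMATS = (
    WordOnly => sub {
        my ($value) = @_;
        return $value =~ m/\A(\w+)\z/ ? $1 : undef;
    },
    FileExists => sub {
        my ($value, %opt) = @_;
        return undef unless -f $value;
        my $real = realpath($value);
        return undef unless defined $real;
        if (defined $opt{'-tree'}) {
            # The resolved path must sit below the resolved tree.
            my $tree = realpath($opt{'-tree'});
            return undef unless defined $tree;
            return undef unless index($real, "$tree/") == 0;
        }
        return $value =~ m/\A(.+)\z/s ? $1 : undef;
    },
    Dangerous => sub {
        my ($value) = @_;
        return $value =~ m/\A(.*)\z/s ? $1 : undef;    # no checking at all
    },
);

# untaint($var, -format => NAME, ...): on success, replace $var with an
# untainted copy and return 1; otherwise leave $var alone and return 0.
sub untaint {
    my (undef, %opt) = @_;
    my $format = delete $opt{'-format'} // 'WordOnly';
    my $check  = $FORMATS{$format}
        or die "unknown untaint format '$format'";
    return 0 unless defined $_[0];
    my $clean = $check->($_[0], %opt);
    return 0 unless defined $clean;
    $_[0] = $clean;    # $_[0] aliases the caller's variable
    return 1;
}

my $var = 'report_2002';    # hypothetical input
untaint($var) or die 'unsafe data';
```

The strict WordOnly check is what you get by saying nothing, while the unchecked path has to be requested by name.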

-Miko
