Re: Credentials in LWP
From:
Sean M. Burke
Date:
July 2, 2002 15:01
Subject:
Re: Credentials in LWP
Message ID:
5.1.0.14.1.20020702154150.0225a4d0@mail.spinn.net
At 09:53 2002-07-02 -0500, Kenny G. Dubuisson, Jr. wrote:
>[...]I don't understand what it means by NetLoc and Realm[...]
Try hitting http://www.unicode.org/mail-arch/unicode-ml and it'll
say "enter your username and password for
Unicode-MailList-Archives". That "Unicode-MailList-Archives" string is the
realm name. NetLoc is the hostname plus colon plus the port number, by
default ":80" -- in this case, "www.unicode.org:80".
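As a rough illustration of the netloc rule just described, here is a minimal sketch that derives the "host:port" string from a URL using only core Perl. The helper name netloc_for is made up for this example, and the sketch handles only http and https URLs. If the URI module is installed, its host_port method should give the same answer.

```perl
use strict;
use warnings;

# Hypothetical helper: build the "host:port" netloc string that
# credentials() expects, filling in the scheme's default port
# when the URL does not name one.
sub netloc_for {
    my ($url) = @_;
    my %default_port = (http => 80, https => 443);
    my ($scheme, $host, $port) =
        $url =~ m{^(https?)://([^/:]+)(?::(\d+))?}i
        or return;
    $port = $default_port{lc $scheme} unless defined $port;
    return lc($host) . ":$port";
}

print netloc_for('http://www.unicode.org/mail-arch/unicode-ml'), "\n";
```

For the URL above this prints www.unicode.org:80, which is the string to pass as the first argument to credentials.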
Here's an extract from chapter 11 of my new book, /Perl and LWP/
(<http://www.amazon.com/exec/obidos/ASIN/0596001789>)
which you might find useful and worth buying:
Authenticating via LWP
To add a username and password to a browser object's key ring, call the
credentials method on a user agent object:
$browser->credentials(
'servername:portnumber',
'realm-name',
'username' => 'password'
);
In most cases, the port number is 80, the default TCP/IP port for HTTP. For
example:
use LWP::UserAgent;
my $browser = LWP::UserAgent->new;
$browser->agent('ReportsBot/1.01');  # sets the User-Agent header
$browser->credentials(
'reports.mybazouki.com:80',
'web_server_usage_reports',
'plinky' => 'banjo123'
);
my $response = $browser->get(
'http://reports.mybazouki.com/this_week/'
);
One can call the credentials method any number of times, to add all the
server-port-realm-username-password keys to the browser's key ring,
regardless of whether they'll actually be needed. For example, you could
read them all in from a datafile at startup:
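use LWP::UserAgent;
my $browser = LWP::UserAgent->new( );
# Each line of keyring.dat holds four tab-separated fields:
# netloc, realm, username, password.
if(open(my $keys, '<', 'keyring.dat')) {
  while(<$keys>) {
    chomp;
    my @info = split "\t", $_, -1;
    # Skip any line that doesn't have exactly four fields.
    $browser->credentials(@info) if @info == 4;
  }
  close($keys);
}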
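To make the implied file format concrete, here is a small sketch that parses the same tab-separated layout from an in-memory string instead of keyring.dat, so it runs without LWP or any file on disk. The third sample line is a deliberately malformed, made-up entry; the other two reuse the example keys from this extract.

```perl
use strict;
use warnings;

# Hypothetical keyring contents: netloc, realm, username, password,
# separated by tabs, one key per line.
my $sample = join '',
    "reports.mybazouki.com:80\tweb_server_usage_reports\tplinky\tbanjo123\n",
    "www.unicode.org:80\tUnicode-MailList-Archives\tunicode-ml\tunicode\n",
    "malformed line with no tabs\n";

my @keys;
open(my $fh, '<', \$sample) or die "Can't read sample: $!";
while (my $line = <$fh>) {
    chomp $line;
    my @info = split /\t/, $line, -1;
    push @keys, \@info if @info == 4;   # skip anything malformed
}
close($fh);

# Show netloc, realm, and username (but not the password) per key.
printf "%s / %s / %s\n", @$_[0 .. 2] for @keys;
```

Each element of @keys is a four-item list that could be handed straight to the credentials method.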
Security
Clearly, storing lots of passwords in a plain text file is not terribly
good security practice, but the obvious alternative is not much better:
storing the same data in plain text in a Perl file. One could make a point
of prompting the user for the information every time,* instead of storing
it anywhere at all, but clearly this is useful only for interactive
programs (as opposed to programs run by crontab, for example). In any
case, HTTP Basic Authentication is not the height of security: the username
and password are normally sent unencrypted. This and other security
shortcomings with HTTP Basic Authentication are explained in greater detail
in RFC 2617. See the Preface for information on where to get a copy of RFC
2617.
* In fact, Ave Wrigley wrote a module to do exactly that. It's not part of
the LWP distribution, but it's available in CPAN as LWP::AuthenAgent. The
author describes it as "a simple subclass of LWP::UserAgent to allow the
user to type in username/password information if required for authentication."
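To see why "sent unencrypted" is a fair description, here is a minimal sketch of what the Basic scheme puts on the wire, using the core MIME::Base64 module and the example username and password from earlier in this extract. It builds the header by hand for illustration only; it is not a description of how LWP constructs it internally.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# The Basic scheme joins username and password with a colon and
# base64-encodes the result. The empty second argument suppresses
# the trailing newline that encode_base64 adds by default.
my $token  = encode_base64('plinky:banjo123', '');
my $header = "Authorization: Basic $token";
print "$header\n";

# Anyone who sees the header can reverse it without any key.
my ($user, $pass) = split /:/, decode_base64($token), 2;
print "Recovered: $user / $pass\n";
```

Base64 is an encoding, not encryption, so the protection has to come from somewhere else, such as the transport.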
An HTTP Authentication Example: The Unicode Mailing Archive
Most password-protected sites (whether protected via HTTP Basic
Authentication or otherwise) are that way because the sites' owners don't
want just anyone to look at the content. And it would be a bit odd if I
gave away such a username and password by mentioning it in this book!
However, there is one well-known site whose content is password protected
without being secret: the mailing list archive of the Unicode mailing lists.
In an effort to keep email-harvesting bots from finding the Unicode mailing
list archive while spidering the Web for fresh email addresses, the
Unicode.org sysadmins have put a password on that part of their site. But
to allow people (actual not-bot humans) to access the site, the site
administrators publicly state the password, on an unprotected page, at
http://www.unicode.org/mail-arch/, which links to the protected part, but
also states the username and password you should use.
Once in a while, the main Unicode mailing list (called unicode) has a
thread that is genuinely interesting and well worth reading, but it's buried
in a thousand other messages that are not even worth downloading, even in
digest form. Luckily, this problem meets a tidy solution with LWP: I've
written a short program that, on the first of every month, downloads the
index of all the previous month's messages and reports the number of
messages that have each topic as their subject.
The trick is that the web pages that list this information are password
protected. Moreover, the URL for the index of last month's posts is
different every month, but in a fairly obvious way. The URL for March 2002,
for example, is:
http://www.unicode.org/mail-arch/unicode-ml/y2002-m03/
Deducing the URL for the month that has just ended is simple enough:
# To be run on the first of every month...
use POSIX ('strftime');
my $last_month = strftime("y%Y-m%m", localtime(time - 24 * 60 * 60));
# Since today is the first, one day ago (24*60*60 seconds) is in
# last month.
my $url = "http://www.unicode.org/mail-arch/unicode-ml/$last_month/";
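The one-day-ago trick can be checked without waiting for the first of the month. The sketch below wraps the same strftime call in a helper (the name last_month_dir is made up for this example) and feeds it two fixed dates built with the core Time::Local module, including the January case where the year rolls back as well.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timelocal);

# Hypothetical helper: given an epoch time that falls on the first of
# a month, return the archive directory name for the previous month.
sub last_month_dir {
    my ($now) = @_;
    return strftime('y%Y-m%m', localtime($now - 24 * 60 * 60));
}

# Noon on 1 April 2002 and on 1 January 2003, local time
# (timelocal counts months from 0).
print last_month_dir(timelocal(0, 0, 12, 1, 3, 2002)), "\n";
print last_month_dir(timelocal(0, 0, 12, 1, 0, 2003)), "\n";
```

This prints y2002-m03 and then y2002-m12. Note that the trick only holds when the program really does run on the first of the month; run on any other day, it would name the current month instead.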
But getting the contents of that URL involves first providing the username
and password and realm name. The Unicode web site doesn't publicly declare
the realm name, because it's an irrelevant detail for users with
interactive browsers, but we need to know it for our call to the credentials
method. To find out the realm name, try accessing the URL in an interactive
browser. The realm will be shown in the authentication dialog box, as shown
in Figure 11-1.
In this case, it's "Unicode-MailList-Archives", which is all we needed to
make our request:
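If no interactive browser is handy, the realm can also be read from the challenge the server sends with its 401 response, in the WWW-Authenticate header. The sketch below parses a sample header value with a regular expression; the helper name realm_from_challenge is made up, and the header string is an assumed example of what such a server would send rather than a captured response. With LWP, the real value should be available from $response->header('WWW-Authenticate') on an unauthenticated request.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the quoted realm out of a
# WWW-Authenticate header value, or return undef if there is none.
sub realm_from_challenge {
    my ($challenge) = @_;
    return $challenge =~ /\brealm\s*=\s*"([^"]*)"/i ? $1 : undef;
}

# Sample header value, assumed for illustration.
my $challenge = 'Basic realm="Unicode-MailList-Archives"';
print realm_from_challenge($challenge), "\n";
```

This simple pattern does not handle backslash-escaped quotes inside the realm, which is good enough for a realm name like this one.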
my $browser = LWP::UserAgent->new;
$browser->credentials(
'www.unicode.org:80', # Don't forget the ":80"!
# This is no secret...
'Unicode-MailList-Archives',
'unicode-ml' => 'unicode'
);
print "Getting topics for last month, $last_month\n",
" from $url\n";
my $response = $browser->get($url);
die "Error getting $url: ", $response->status_line
if $response->is_error;
If this fails (say, because the Unicode site's admins have changed the
username, the password, or even the realm name), the program dies with an
error message like this:
Error getting http://www.unicode.org/mail-arch/unicode-ml/y2002-m03/:
401 Authorization Required at unicode_list001.pl line 21.
But assuming the authorization data is correct, the page is retrieved as if
it were a normal, unprotected page. From there, counting the topics and
noting the absolute URL of the first message of each thread is a matter of
extracting data from the HTML source and reporting it concisely.
my(%posts, %first_url);
while( ${ $response->content_ref }
=~ m{<li><a href="(\d+\.html)"><strong>(.*?)</strong>}g
# Like: <li><a href="0127.html"><strong>Klingon</strong>
) {
my($url, $topic) = ($1,$2);
# Strip any number of "Re:" prefixes.
while( $topic =~ s/^Re:\s+//i ) {}
++$posts{$topic};
use URI; # For absolutizing URLs...
$first_url{$topic} ||= URI->new_abs($url, $response->base);
}
print "Topics:\n", reverse sort map # Most common first:
sprintf("% 5s %s\n %s\n",
$posts{$_}, $_, $first_url{$_}
), keys %posts;
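The extraction loop can be tried offline against a few lines of index markup. In the sketch below the HTML fragment is made up to match the format the pattern expects, the dot in the filename pattern is escaped, and the relative URL is stored as-is rather than absolutized with URI, so only core Perl is needed. The numeric sort is a swapped-in alternative to the string sort on formatted lines used above.

```perl
use strict;
use warnings;

# Hypothetical index fragment in the format the pattern expects.
my $html = join "\n",
    '<li><a href="0127.html"><strong>Klingon</strong></a>',
    '<li><a href="0128.html"><strong>Re: Klingon</strong></a>',
    '<li><a href="0129.html"><strong>Re: RE: Klingon</strong></a>',
    '<li><a href="0130.html"><strong>Euro sign</strong></a>';

my (%posts, %first_url);
while ($html =~ m{<li><a href="(\d+\.html)"><strong>(.*?)</strong>}g) {
    my ($url, $topic) = ($1, $2);
    1 while $topic =~ s/^Re:\s+//i;   # strip stacked "Re:" prefixes
    ++$posts{$topic};
    $first_url{$topic} ||= $url;      # keep only the first message's URL
}

# Most common topic first; ties broken alphabetically.
printf "%5d  %s  (first: %s)\n", $posts{$_}, $_, $first_url{$_}
    for sort { $posts{$b} <=> $posts{$a} || $a cmp $b } keys %posts;
```

With this sample, "Klingon" is counted three times with 0127.html as its first message, and "Euro sign" once.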
Typical output starts out like this:
Getting topics for last month, y2002-m02
from http://www.unicode.org/mail-arch/unicode-ml/y2002-m02/
Topics:
86 Unicode and Security
http://www.unicode.org/mail-arch/unicode-ml/y2002-m02/0021.html
47 ISO 3166 (country codes) Maintenance Agency Web pages move
http://www.unicode.org/mail-arch/unicode-ml/y2002-m02/0390.html
41 Unicode and end users
http://www.unicode.org/mail-arch/unicode-ml/y2002-m02/0260.html
27 Unicode Search Engines
http://www.unicode.org/mail-arch/unicode-ml/y2002-m02/0360.html
22 Smiles, faces, etc
http://www.unicode.org/mail-arch/unicode-ml/y2002-m02/0275.html
18 This spoofing and security thread
http://www.unicode.org/mail-arch/unicode-ml/y2002-m02/0216.html
16 Standard Conventions and euro
http://www.unicode.org/mail-arch/unicode-ml/y2002-m02/0418.html
This continues for a few pages.
[end extract]
--
Sean M. Burke http://www.spinn.net/~sburke/