
Re: Auto-install (was autoloaded...)

From: Branden
Date: February 9, 2001 09:49
Subject: Re: Auto-install (was autoloaded...)
Message ID: OE63UzmRf6Ci6nx8dyD0000319f@hotmail.com

This is the alpha version of the PDD about archives. I haven't had the time
to format it as POD, and probably won't until Monday; I don't even think
I'll have time to check the lists over the weekend. Nevertheless, I'm
sending it in mail-message format for your consideration. I think it has
most of the information that was discussed here, plus some thoughts of my
own.

I expect to collect some comments on it before writing the beta version,
which will be in POD format. I'm starting to think this discussion should
move to another list, like -build or -stdlib. Maybe -source-control is
related too. I realise they're not active, but I don't think this is really
a -language issue. Anyway, I'm posting it here; if anyone thinks we should
move it to another list, just name the list and we'll move there.

Hope you like the text. Help correcting spelling and grammar would be much
appreciated.

- Branden

------------------------ (cut here) -----------------------



PDD: `par' -- The Perl Archive


1. Introduction

`par' stands for `Perl Archive'. It's a way to deploy and install Perl
programs, scripts and modules. It helps take care of module dependencies,
which are mostly painful to handle in Perl 5.



2. Motivation

When a programmer writes a Perl script (or module) and wants to share it
with other users, he probably uploads the code to CPAN, as a .tar.gz file or
something like that. Users are supposed to get the script via http or ftp,
untar it, and run it.

So far it's fine, but the problem begins when the script depends on modules
that aren't included in Perl's standard library, and are supposed to be
installed by users. The first headache starts when the user tries to run the
script and gets a message saying Perl can't find module X. He takes the time
to go to CPAN, download X, install it and try to run the script again, only
to find that the script also needed module Y or that X actually needed Z.
This process continues until there are no more needed modules or the user
gets tired of it and gives up running the script, whichever comes first.

This is a downside for developers too. To get around it, they try to give
users installation instructions, telling them to install modules X, Y and Z
before running the script. Since most lay users don't want to bother
installing modules, they'll probably give up on the script as well. The
outcome is that only scripts that depend solely on standard modules achieve
wide acceptance among lay users.




3. Goals


The purpose of this PDD is to define a standard way for developers to
package their Perl code so that all needed dependencies are included in a
single file, and to define a standard way for users to install or run the
scripts distributed in this archive form.

The specification of `par' should satisfy the following properties:

a) It should allow distribution of Perl code in platform independent form.

b) It should allow distribution of Perl code and compiled C extensions in
platform dependent form.

c) It should allow distribution of Perl code in both bytecode and source
form, even both mixed in the same archive.

d) It should allow the user to install the contents of the archive. It
should also allow the user to uninstall an archive or upgrade to a new
version of it. It should allow the user to choose how the modules contained
in the archive will be installed in relation to the system modules. It
should allow the user to override the directories used for installation.

e) It should allow the user to run the scripts and use the modules contained
in the archive without requiring installation.

f) It should allow the inclusion of documentation and resource files
(images, sounds, dbms, text databases, ...) for all scripts and modules
included in the archive, and documentation for the archive itself.

g) The archive file should be stored in a format that can be easily created
and inspected with widely available external tools.

h) It should allow the developer to merge existing archives together with
files into new archives, so that the developer can package his scripts with
existing binary archives of the modules they depend on.

i) There should be tools that help the developer find the dependencies of
his scripts, and package his scripts along with the modules they depend on
into an archive.

j) The `par' mechanism should be bundled by default in Perl 6.0.0. Adding it
later could cause problems for users of Perl 6.0.0 who don't have it
installed and want to use archives. Leaving anything else out of 6.0.0
isn't really that critical, because with `par' it'll be easy to install
any missing module.





4. Proposed Implementation


The zip file format is proposed for packaging the files. It's platform
independent and has no licensing constraints (must check that...). The
reason to choose zip instead of tar/gzip is that with zip one can access a
single file without having to decompress the whole archive. That's needed to
run scripts from an archive without unpacking it. This should be implemented
using Perl's tied filehandles, which would read the archive file directly,
uncompress it on the fly and pass the code to Perl as if it were a regular
file.
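
To illustrate the random-access property that motivates zip, here is a
minimal sketch. It is written in Python only for illustration (the real
mechanism would be a Perl tied filehandle), and the member names `bin/hello.pl'
and `lib/Foo/Bar.pm' are hypothetical, since the archive layout is not yet
defined.

```python
import io
import zipfile


def read_member(archive, member):
    """Return the bytes of one file stored in a zip archive.

    Only the requested member is decompressed; the other members are
    never read.
    """
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as fh:  # file-like object, inflates on the fly
            return fh.read()


def build_demo_archive():
    """Build a small in-memory archive with a made-up layout."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("bin/hello.pl", 'print "hello\\n";\n')
        zf.writestr("lib/Foo/Bar.pm", "package Foo::Bar;\n1;\n")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # Fetch just the script, leaving the module compressed.
    print(read_member(build_demo_archive(), "bin/hello.pl").decode())
```

With tar/gzip the same lookup would require decompressing the stream from
the beginning up to the wanted file, which is the cost the proposal wants to
avoid.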

The archives should have a standard name, composed of the name of the
script/module/program, the version, the platform it targets, and the
extension `.par'. This is inspired by the naming scheme of rpm packages.
For platform-specific archives (ones that include compiled C code), the
platform should be extracted from the name and compared with the platform
recorded in Perl's config, to prevent an archive for the wrong platform
from being installed or run. Archives that contain only Perl code should
have `.noarch' as the platform. Archives that contain C source code should
have `.src' as the platform. Note that this last type requires compilation,
so the user would need a C compiler and a `make' utility. The version
number in the name is used to determine whether a later version is
available for upgrading.
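
The naming rule could be checked along these lines. This is only a sketch in
Python for illustration: the exact pattern `name-version.platform.par', the
purely numeric dotted version, and the platform string `i386-linux' are
assumptions, since standard platform names are still an open item.

```python
import re
from typing import NamedTuple, Tuple

# Assumed pattern: <name>-<dotted numeric version>.<platform>.par
ARCHIVE_RE = re.compile(
    r"^(?P<name>.+)-(?P<version>\d+(?:\.\d+)*)\.(?P<platform>[^.]+)\.par$"
)


class ArchiveName(NamedTuple):
    name: str
    version: Tuple[int, ...]
    platform: str


def parse_archive_name(filename):
    """Split an archive file name into name, version and platform."""
    m = ARCHIVE_RE.match(filename)
    if m is None:
        raise ValueError("not a valid archive name: " + filename)
    # Compare versions numerically, so that 1.10 is newer than 1.9.
    version = tuple(int(part) for part in m.group("version").split("."))
    return ArchiveName(m.group("name"), version, m.group("platform"))


def platform_ok(archive, host_platform):
    """`noarch' runs anywhere; `src' is accepted but additionally needs a
    C compiler and `make'; anything else must match the host platform."""
    return archive.platform in ("noarch", "src", host_platform)


def is_upgrade(candidate, installed):
    """True if `candidate' is a later version of the same archive."""
    return candidate.name == installed.name and candidate.version > installed.version
```

For example, `parse_archive_name("archivename-1.1.noarch.par")' would be an
upgrade over an installed `archivename-1.0.noarch.par', while an
`i386-linux' archive would be refused on any other host platform.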

Perl modules are traditionally stored in $PERLLIB and $PERLSITELIB. This
should be standardized so that only modules that actually are standard Perl
modules are in $PERLLIB. This would help developers determine which modules
aren't standard and should be included in the archive. Modules installed
globally would go to $PERLSITELIB. If the user wishes to install them
locally, they would go to a subdirectory of his home directory, like
~/.perl/lib. Executable scripts would also be installed to a standard
directory, like /usr/local/bin for scripts installed globally, and ~/bin for
local ones.
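
The choice of destination directories could look roughly like this. It is a
sketch in Python for illustration only; the function name, the `scope'
values and the `overrides' mapping are hypothetical, and the site library
path is passed in rather than read from Perl's config.

```python
import os


def install_dirs(scope, sitelib, home, overrides=None):
    """Pick the directories for modules (`lib') and scripts (`bin').

    scope     -- "global" or "local"
    sitelib   -- the value of $PERLSITELIB on this system
    home      -- the user's home directory
    overrides -- optional mapping with user-chosen directories
    """
    if scope == "global":
        dirs = {"lib": sitelib, "bin": "/usr/local/bin"}
    elif scope == "local":
        dirs = {
            "lib": os.path.join(home, ".perl", "lib"),
            "bin": os.path.join(home, "bin"),
        }
    else:
        raise ValueError("unknown scope: " + scope)
    # The user may override any of the defaults.
    dirs.update(overrides or {})
    return dirs
```

Keeping the override step last means a user-supplied directory always wins
over both the global and the local defaults.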

Resources should be installed in a standard place, and scripts/modules that
need resources should use the module PAR::Resource to find their location.
PAR::Resource would also give transparent access to resources when the
scripts/modules are running from inside the compressed archive, in which
case they would read the resource file from a magic filehandle that does
on-the-fly uncompression. The PAR:: namespace and all the namespaces below
it should be reserved for `par' related modules.

The directories in an archive would be organized in a way that it would be
easy to determine which are the scripts and modules it provides, which
versions of them are there, and which files are code, which are resources,
which are docs, etc. The directory structure is what ultimately determines
which files are installed where.



There should be two utilities named `par' and `pun'. `par' would be used
to build archives. It would take its settings from parameters or from a
configuration file, read data from files containing scripts, resources,
docs, ..., and from other archives containing pre-packaged modules, and
generate a new archive. The parameters or configuration file would determine
what architecture the archive belongs to. A `par' target should be included
in the Makefile generated by MakeMaker, so that `make par' would create an
archive of the current project. `par' should have a command line option to
make it find the dependencies of the scripts/modules. It would parse the
Perl code looking for `use' or `require', and add the used modules to a
list. If it sees something like `require $x' or anything else that is only
defined at runtime, it raises a warning telling the developer to check that
line of the code for a dependency that `par' could not find.
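
The dependency scan described above might be sketched as follows. This is
Python for illustration only and a deliberately naive line-based scan, not a
Perl parser: it assumes one statement per line, strips everything after a
`#' (which is wrong for a `#' inside a string), and does not skip POD.

```python
import re

# A bareword module name after `use' or `require', e.g. Foo::Bar.
STATIC_RE = re.compile(r"^\s*(?:use|require)\s+([A-Za-z_]\w*(?:::\w+)*)")
# `require' followed by a variable: only known at runtime.
DYNAMIC_RE = re.compile(r"^\s*require\s+\$")


def scan_dependencies(source):
    """Return (modules, warnings) for a string of Perl source.

    modules  -- module names in order of first appearance
    warnings -- (line number, code) pairs the developer should check by hand
    """
    modules, warnings = [], []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.strip() in ("__END__", "__DATA__"):
            break  # nothing after these markers is code
        code = line.split("#", 1)[0]  # naive comment stripping
        if DYNAMIC_RE.match(code):
            warnings.append((lineno, code.strip()))
            continue
        m = STATIC_RE.match(code)
        if m and m.group(1) not in modules:
            modules.append(m.group(1))
    return modules, warnings
```

Version requirements such as `use 5.006;' are ignored because they do not
start with a letter. The resulting list would still have to be filtered
against the standard modules in $PERLLIB to find what must go in the archive.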

A `pun' utility would be used by end users to unpack and install or run
the archives. Typically, it would only be necessary to call `pun -i
archivename-1.0.noarch.par' to install an archive, `pun -r
archivename-1.0.noarch.par' to run the main script included in an archive
without installing it, `pun -u archivename-1.0.noarch.par' to uninstall the
files provided by an archive, and `pun -U archivename-1.1.noarch.par' to
upgrade `archivename' to version 1.1. `pun' would have a configuration file
containing the default locations and directories chosen by the user. `pun'
could also accept a longer list of parameters to override the config file
defaults or even to modify the config file. `pun' would probably maintain a
database of installed archives with information about their files, to make
it easy to uninstall the files of an archive. `pun' should be modelled after
rpm, since that's probably the installer utility that achieves these goals
most successfully.
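
The database of installed archives could be as simple as a record of which
files each archive put on disk. The following is a sketch in Python for
illustration; the class name, the JSON storage format and the record fields
are all assumptions, not part of the proposal.

```python
import json
import os


class InstallDB:
    """Hypothetical record of installed archives and the files they own."""

    def __init__(self, path):
        self.path = path
        self.records = {}
        if os.path.exists(path):
            with open(path) as fh:
                self.records = json.load(fh)

    def _save(self):
        with open(self.path, "w") as fh:
            json.dump(self.records, fh, indent=2)

    def record_install(self, name, version, files):
        """Remember which files an archive installed (what `pun -i' would do)."""
        self.records[name] = {"version": version, "files": list(files)}
        self._save()

    def uninstall(self, name):
        """Remove the files owned by an archive (what `pun -u' would do).

        Raises KeyError if the archive is not installed.
        """
        record = self.records.pop(name)
        for path in record["files"]:
            if os.path.exists(path):
                os.remove(path)
        self._save()
        return record["files"]
```

An upgrade (`pun -U') could then be expressed as an uninstall of the old
record followed by an install of the new archive, once the version in the
file name has been checked.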

Both `par' and `pun' would be distributed with Perl, and would be included
in the `bin' directory of Perl's tree, so that they would probably be in the
system $PATH.





5. Developer Support


A standard directory layout for writing and building both scripts and
modules should be defined and well documented. This will be especially
important when building archives, since `par' will use knowledge about this
layout to find the modules a project depends on and to know what is code,
what is binary, what is a resource, what is documentation, and so on.

Currently, this is partly supported for modules by h2xs, which builds a
directory layout and template files for creating an extension. This process
should be generalized to cover scripts and pure Perl modules as well as
external language extensions.

A developer should create all his projects as subdirectories of a directory
he chooses, so that the different projects are all subdirectories of the
same parent directory. The name of the subdirectory should be the name of
the project, which for modules is defined as the namespace of the module
with :: replaced by -.

Projects that provide scripts for the user could have an arbitrary name,
even if there's actually no script with that name in them. This is meant to
support sets of tools distributed as scripts. One example of this would be
RDB (http://www.cse.ucsc.edu/research/compbio/rdb/), which is made up of
several perl scripts, such as `column', `row', `sorttbl' and `ptbl', but
none actually named `rdb'.

Special files containing meta-information would be stored in the project's
root. These files include README, CHANGES, LICENSE, and the like. Other
examples of meta-information files would be the ones that describe
dependencies, VERSION, PLATFORMS, and other files that are intended to be
processed by machines rather than by humans. A
Makefile/Makefile.PL/equivalent would also be stored in the project root.

Inside a project directory, there would be subdirectories for special
purpose files. For example, t/ for test/qa files, eg/ for examples of usage,
src/ for C and other external language source code, obj/ for compiled
object files, rsrc/ for resource files, lib/ for binary external
libraries, bin/ for native or perl executables that should be installed in a
directory in the machine's $PATH, etc. All projects could potentially have
all these subdirectories, although src and obj would rarely be found in
projects for scripts, and modules would probably not have bin, for example.

This directory structure should be well documented in one of the perldoc
documents, including which extensions/types of files are allowed in each
subdirectory (for example, lib/ should have .so or .dll, eg/ should have
.pl, .sh or .bat, ...) and if subdirectories are allowed inside of each
subdirectory. The directory structure should allow using CVS/RCS for team
project development.

There should be a tool to create the directory structure and file templates
(like h2xs does now). The developer should tell this tool the name of the
project and its type (script, module written in Perl, module with external
language extension), and the tool would create the directories that most
probably suit the type of the project. There should be another tool that
checks the structure of an existing project and flags possible errors that
would make automated tools lose their way in the structure.
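
The two tools could be sketched as below. This is Python for illustration
only. The subdirectory names and meta files come from this section, but
which subdirectories each project type gets is an assumption, as are the
type names and the wording of the error messages.

```python
import os

# Assumed mapping from project type to the subdirectories it starts with.
LAYOUTS = {
    "script": ["bin", "t", "eg", "rsrc"],
    "perl-module": ["t", "eg", "rsrc"],
    "ext-module": ["src", "obj", "lib", "t", "eg", "rsrc"],
}
KNOWN_DIRS = {"t", "eg", "src", "obj", "rsrc", "lib", "bin"}
META_FILES = ["README", "CHANGES", "LICENSE"]


def project_dir_name(name):
    """Module namespace with :: replaced by -, e.g. Foo::Bar -> Foo-Bar."""
    return name.replace("::", "-")


def create_project(parent, name, kind):
    """Create the skeleton of a new project and return its root directory."""
    root = os.path.join(parent, project_dir_name(name))
    for sub in LAYOUTS[kind]:
        os.makedirs(os.path.join(root, sub), exist_ok=True)
    for meta in META_FILES:
        path = os.path.join(root, meta)
        if not os.path.exists(path):
            open(path, "w").close()  # empty template file
    return root


def check_project(root):
    """Return a list of problems that could confuse automated tools."""
    problems = []
    for entry in sorted(os.listdir(root)):
        if os.path.isdir(os.path.join(root, entry)) and entry not in KNOWN_DIRS:
            problems.append("unexpected directory: " + entry)
    for meta in META_FILES:
        if not os.path.isfile(os.path.join(root, meta)):
            problems.append("missing meta file: " + meta)
    return problems
```

A freshly created project should pass the check with no problems reported,
which gives a simple sanity test for keeping the two tools in agreement.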

Both `make' and `par' should use the directory structure to determine which
files should be built and which ones should be included in the archive.

`make' should support a `par' target, so that `make par' would call the
`par' utility with all necessary parameters for the current project. When
dependencies are needed, they are searched in the directory above the root
project directory, since all projects would be subdirectories of the same
parent directory. Other locations to search for dependencies should include
the $PERLSITELIB directory, and directories included in files of meta
information inside the project directory.
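
The search order for dependencies could be sketched like this. It is Python
for illustration only; the function name and arguments are hypothetical, and
it assumes that a sibling project is found by its directory name while an
installed module is found by its usual Foo/Bar.pm path.

```python
import os


def find_dependency(module, project_root, sitelib, extra_dirs=()):
    """Locate a module such as Foo::Bar, or return None if it is not found.

    Search order:
      1. a sibling project directory (Foo-Bar) next to the current project
      2. the $PERLSITELIB directory
      3. extra directories listed in the project's meta-information files
    """
    parent = os.path.dirname(os.path.abspath(project_root))
    sibling = os.path.join(parent, module.replace("::", "-"))
    if os.path.isdir(sibling):
        return sibling

    relative = os.path.join(*module.split("::")) + ".pm"
    for base in [sitelib] + list(extra_dirs):
        candidate = os.path.join(base, relative)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Looking at sibling projects first means a developer's working copy of a
module takes precedence over whatever version happens to be installed.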

Conversely, `par' should be able to build the project when asked to. If the
system supports a cross-compiler, `par' should be able to use it to create
binary archives for more than one architecture at once.








6. References

rpm -- RedHat Package Manager (http://www.redhat.com)

jar -- The Java Archive (http://java.sun.com)

OSD -- http://www.w3.org/TR/NOTE-OSD.html





7. TODO

a) Discuss OSD as a possible approach to represent dependencies, analyse
pros and cons.

b) Discuss tying filehandles with zip/tar archives as a separate module (and
probably a separate PDD), since the feature seems to be wanted for many
other uses apart from `par'.

c) Define standard arch/os/cpu/compiler names for binary archives, if that's
even possible at all...

d) Define (at least sketch) standard tree structures and names/types/formats
of files that contain meta-information about scripts/modules, both inside
and outside of the archive.

e) Define ways to integrate this process with Perl's module building tools,
like h2xs and MakeMaker, or whatever replaces them in Perl 6.



