CPAN meeting minutes

From: Adam Turoff
Date: August 16, 2000 13:35
Subject: CPAN meeting minutes
Message ID: 20000816163220.A27993@panix.com

Here's my first draft of the minutes of last month's CPAN meeting.
I have a (docbook-based) HTML version available that can be posted
somewhere appropriate.

Z.

* CPAN Categorization

 + Vaults of Parnassus

   The general opinion in the community at large about CPAN is that
   it is better than the Python community's "Vaults of Parnassus".

   Python doesn't have the same distinction between "script/program"
   and "module".  This is due to a standard Python idiom that allows a
   module to be run as a program.  Therefore, it is easier to find both
   "modules" and "scripts" in the same archive, with one interface.

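   The Python idiom mentioned above can be sketched as follows.  The
   file name (wordcount.py) and the function names are hypothetical and
   chosen only for illustration; the guard on the last two lines is the
   idiom itself.

```python
"""wordcount.py (hypothetical): usable as a module and as a script."""
import sys


def count_words(text):
    """Return the number of whitespace-separated words in text."""
    return len(text.split())


def main(argv):
    """Script entry point: count the words given on the command line."""
    print(count_words(" ".join(argv)))


# True only when the file is executed directly, e.g.
# "python wordcount.py a b c"; false when it is imported as a module.
if __name__ == "__main__":
    main(sys.argv[1:])
```

   Other code can "import wordcount" and call count_words() directly,
   so a single file can be archived as both a module and a script.
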
 + CPAN Scripts

   There are 22 top level categories on CPAN for modules.  Using this
   categorization for scripts would be appreciated.  Switching the
   breakdown from Module/Script:Category to Category:Module/Script may
   help.

 + ppt

   Some scripts (and modules) like ppt defy simple categorization.  ppt is
   a complete distribution, but the individual modules belong in different
   categories.

   ppt demonstrates that the categorization problem is hard (which is why
   there's a 'Master of Library Science' degree), and that the hierarchy
   will never be perfect.  Ideally, there should be 'sideways' links across
   categories.

 + Metadata

   Much more metadata is needed with CPAN modules.  This can be
   accomplished with OSD, PPM or some derived/enhanced format.

 + Use of OSD/PPM

   In order to spur adoption of a more metadata-rich format, CPAN should
   start refusing uploads if the OSD files are not included.  Such a
   solution is acceptable if and only if it is announced well in advance
   and phased in over a reasonable amount of time.

 + Navigation

   CPAN's primary categorization is the by-author directory, which is not
   particularly navigable.  Some persistent URL interface is required, and
   the by-author structure serves this purpose, but perhaps a better
   directory structure is available.

   Maintaining a sane on-disk structure is important, as Tom and others use
   'ls' or ftp to navigate around CPAN.  This is quite similar to the *BSD
   Ports tree.
 
 + Searching

   Graham's search.cpan.org is an excellent new interface for browsing
   CPAN.  The possibility of replicating or mirroring this search interface
   was discussed.

   The categorization problem has an impact on searching CPAN.  Adding
   symlinks to individual modules from multiple categories may help the
   searching problem, but adding more keywording to the search database
   would help more.

 + Keywords

   CPAN should be extended to categorize modules by keyword.  Exactly
   what those keywords should be and how that list of keywords should grow
   is a separate discussion.
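
   As a rough illustration (in Python) of what keyword categorization
   could look like, the sketch below builds an inverted index from
   keywords to modules.  The keyword assignments are invented for the
   example and were not discussed at the meeting.

```python
from collections import defaultdict

# Hypothetical keyword assignments; real ones would come from module metadata.
MODULE_KEYWORDS = {
    "Text::CSV": ["csv", "parsing", "text"],
    "DBI": ["database", "sql"],
    "DBD::CSV": ["csv", "database", "sql"],
}


def build_keyword_index(module_keywords):
    """Invert module -> keywords into keyword -> sorted list of modules."""
    index = defaultdict(set)
    for module, keywords in module_keywords.items():
        for keyword in keywords:
            index[keyword.lower()].add(module)
    return {keyword: sorted(modules) for keyword, modules in index.items()}


def search(index, *keywords):
    """Return the modules tagged with every one of the given keywords."""
    if not keywords:
        return []
    matches = [set(index.get(k.lower(), ())) for k in keywords]
    return sorted(set.intersection(*matches))
```

   Because a module can carry any number of keywords, it can appear
   under several headings at once without symlinks.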

* Site Maintenance

 + CPAN Scalability

   CPAN is currently hovering around 700MB of storage, and is mirrored
   worldwide through rsync and FTP mirroring.  This model is adequate and a
   radical switch to something like napster or Coda/Intermezzo isn't
   absolutely necessary.  And Andreas finds Coda/Intermezzo to be broken
   presently.

   Currently, about 90% of all public sites mirror from funet.fi, and while
   funet may not remain the canonical master, one master site seems
   adequate.

   CPAN is averaging roughly 20 new uploads or fewer per day, 150 new
   uploads over the last 30 days according to Andreas.

 + Private mirroring

   Private mirrors of CPAN are not tallied, and Jarkko doesn't want to know
   about them.

 + Adding mirrors

   Jarkko wants to write simple step-by-step instructions for setting up a
   CPAN mirror, or possibly write a web form for setting up new mirrors.

   rsync service is available for mirroring, and that needs to be
   publicized better.

 + CPAN multiplexer
 
   The CPAN multiplexer on perl.com hasn't been operational for some time
   now.  

   Since the multiplexer was written as a simple CGI script, other
   multiplexing devices and services have become available.  These include
   Akamai's service and a Cisco MUX.

   Tom and Jon said they'd talk to the appropriate people at O'Reillynet to
   fix the problem.

 + PAUSE Backup?

   Perhaps it would be worthwhile to have a redundant PAUSE server in case
   Andreas' machine goes down.

   Then again, perhaps not.  No one seems to mind when the server goes down
   for four hours these days.

   The main PAUSE server dumps out its MySQL database hourly and produces
   1.6 MB files.  This is certainly rsync'able.

* Distribution

 + BSD Ports interface

   The (Free|Open|Net)BSD projects maintain "ports" collections of software
   ported to each operating system.  This interface is both powerful and
   simple, and encourages using 'ls' and 'cd' to navigate through the
   categories and packages.
 
 + New package formats

   MakeMaker supports a 'make ppm' target along with 'make dist'.  This
   could easily be extended to support 'make dpkg' and 'make rpm' to build
   Debian and RPM packages.

   Using native package formats (such as ports, dpkg and rpm) will allow
   all Perl modules to be better integrated into the host operating
   system's software registry system, instead of relying on 'perllocal.pod'
   to be the canonical list of modules installed on a system.

 + CPAN::Site

   One possible solution to the various problems behind CPAN as we know it
   would be to allow and encourage private CPAN mirrors with their own
   additional features.  

   For example, this could be a private/personal CPAN that is
   layered on top of CPAN for distribution of local (proprietary?)
   modules inside a corporate network.  This may also take the form
   of carrying only those modules that have been blessed by an
   ad-hoc editor (e.g. mjd's private CPAN, ny.pm's QC'd CPAN, etc.)

   This allows the advantage of reusing the existing CPAN tools such as
   the CPAN.pm shell.

 + PAUSE

   The PAUSE scripts can be replicated to help build local CPAN
   repositories (e.g. the "Morgan Stanley CPAN").  Andreas says this won't
   be done until some minor security issues are resolved (e.g. removing
   "eval" from his code.)

   Perhaps some of these local PAUSE sites can feed back into the master
   PAUSE database.

 + PPM

   ActiveState offers ready-to-install binary packages of Perl
   modules.  Thus far, CPAN has shunned binary files.  Should CPAN be
   extended to offer PPM or PPM-style binaries?  Note that OSD can be used
   to point to multiple binary versions of a single module.

   Note that there may be licensing issues involved with replicating
   ActiveState's PPM repository.  One solution may be for ActiveState
   to offer multiple PPM servers worldwide that act as customization
   layers on top of a generic CPAN repository.

* Developer Issues

 + Integrate OSD/PPM

   Perl can be gently upgraded over the next few releases of Perl5, so that
   in about a year's time all new uploads will contain OSD/PPM files.

   Perl programmers will probably resist a push to include XML markup with
   their modules.  MakeMaker currently mines the data found in
   Makefile.PL and generates a ppm file; this interface needs to remain and
   be extended.

 + Bundles

   Bundles exist to solve a packaging problem, but they are not widely
   used.  The community needs to spend more time explaining and supporting
   the use of bundle files.

 + PAUSE, New modules

   There's an important piece of information that's not reaching the
   community: how to create and distribute new modules on CPAN.  The
   problem isn't uploading to PAUSE, but rather writing the first
   Makefile.PL.
   
   Simon Cozens recently wrote 'perlnewmod' to explain this.  This needs to
   be advertised better.
 
 + Auto-notification

   It would be quite handy to allow users to sign up for an -announce style
   mailing list for new upload announcements of modules they care about,
   or possibly a new-uploads-daily digest message.

   use.perl.org already presents a new uploads listing on a frequent
   basis, but many users care about only two or three modules on CPAN.

   Similarly, users could register interest in a specific module, category
   or author and receive mail whenever "something interesting" happens.
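
   A minimal sketch (in Python) of the matching step such a service
   would need.  The record layout and the user names are assumptions
   made for the example; nothing like this exists in PAUSE as described
   in these minutes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Upload:
    """One new upload, reduced to the fields a user might subscribe to."""
    module: str
    category: str
    author: str


def interested_users(upload, subscriptions):
    """Return the users who registered interest in this upload.

    subscriptions maps a user to a dict with optional "module",
    "category" and "author" sets (a hypothetical layout).
    """
    users = []
    for user, wants in subscriptions.items():
        if (upload.module in wants.get("module", ())
                or upload.category in wants.get("category", ())
                or upload.author in wants.get("author", ())):
            users.append(user)
    return sorted(users)
```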

 + Module/Distribution mapping problem

   A better mapping of module -> distribution and distribution -> module
   is required.  The problem is mostly solved, but the remaining issues are
   really nasty and begin with MakeMaker.

 + Perl Census?

   Should PAUSE be extended to act as a registry of all (or most) Perl
   developers?  Even those without modules on CPAN?

 + Abandoned Modules

   Currently, a module author can pro-actively hand over development and
   maintenance of an abandoned module.  What is the process for taking
   over an abandoned module once the original author disappears?

   This needs to take into account the fact that some authors may
   be on prolonged travel/vacations and cannot or will not check
   their mail to see their modules have bugs which need to be fixed.

 + Automatic Probing

   Modern Win32 operating systems can now self-probe and identify which
   components need to be upgraded.  To some degree, CPAN.pm can do this
   and identify which modules have been updated on CPAN since they were
   installed locally.

   Unfortunately, randomly upgrading modules may break existing code, and
   there is no consistent mechanism for identifying when a newer module is
   experimental and should not overwrite a stable installation.

   CPAN.pm (or something similar) should be able to maintain a registry of
   some sort to identify which modules and scripts can/should be upgraded.

   Note that this also deals with the outstanding issue of maintaining
   multiple versions of a single module concurrently.
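
   One way to picture such a registry is sketched below (in Python).
   It assumes that versions are dotted integers and that every release
   carries a stable/experimental flag.  Neither assumption holds for
   CPAN today, and the module names are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Release:
    version: tuple       # e.g. (1, 4) for "1.4"
    stable: bool = True  # False marks an experimental release


def parse_version(text):
    """Turn "1.10.2" into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def upgrade_candidates(installed, available, pinned=()):
    """Return {module: new_version} for modules that are safe to upgrade.

    installed: {module: version tuple}
    available: {module: [Release, ...]}
    pinned:    modules the administrator never wants upgraded
    Experimental releases are ignored, so they cannot overwrite a
    stable installation.
    """
    candidates = {}
    for module, current in installed.items():
        if module in pinned:
            continue
        stable = [r.version for r in available.get(module, []) if r.stable]
        if stable and max(stable) > current:
            candidates[module] = max(stable)
    return candidates
```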

* Developer Documentation

 + General Documentation problems

   CPAN should open up the job of documenting Perl and CPAN modules to
   every user.  Currently, the CPAN module list does a fairly good job
   advertising and promoting modules, while there is no similar interface
   to gather, advertise and promote documentation-only submissions.

 + OSD

   If there is some OSD-like mechanism for modules, then documentation can
   be tagged and tracked similarly.  Similar issues are involved in
   handling scripts.

 + Smaller docs

   There is a need for a larger amount of smaller pieces of documentation.
   These could take the form of annotations of existing documents, hints
   files, mini-HOWTOs or general notes.

   A tidbit on installing a specific module on Solaris is a good example of
   the scope here.

   use.perl.org and Perl FAQ Prime (perlfaq.com) may serve as good areas to
   handle or develop these pieces of documentation.

 + Document Annotation

   The PHP documentation has an area to comment on the documentation and
   comment on the commentary.  This is much like reinventing the Talmud.
   This system appears to work very well, similar to slashdot or book
   errata.

   This could be useful for bug reports, revisions, amendments.

* CPAN Organization

 + Module Homepages

   Each Perl module could have its own "homepage" (Graham has long since
   implemented this with search.cpan.org).  It could contain user feedback, 
   rating systems (e.g. Amazon).

   Such a feature may be limited to those modules that produce OSD, as a
   way of encouraging adoption of OSD.

 + Problematic Submissions

   How should documentation-only uploads be categorized?  Where do modules
   go when they don't fit into any of the existing categories?

   Is this a problem that would be solved with better or multiple search
   engine interfaces?

 + Module lists

   Many of the problems with CPAN are caused by the fact that there's 
   only one, incomplete module list.  Perhaps multiple module lists will
   help solve these problems.

   "Namespace Pumpkings" may help here; the Apache, TK and XML namespaces
   have their own independently maintained module lists.  Accepting that
   this is a good idea and extending it may be a way to incorporate
   multiple module lists.

 + CPAN API

   A better API into the "CPAN service" would be nice.  This should include
   version control, stability information (experimental, release, bugfix,
   etc.) and so forth.

   This API could be as simple as a Perl-ready version of the 03- file;
   Tom uses a mechanism like this currently.

 + CPAN Quality

   Is there too much crap on CPAN?  Does the module list steer people away
   from the "bad modules"?  Should we produce a user interface that steers
   people away from the crap and/or towards the better modules?

   Would user ranking of modules be useful?
 
 + "Blessed" Modules?

   Perhaps one solution to the Quality issue is to split CPAN into
   multiple areas.  The Hitchhiker's Guide to the Galaxy (hhgttg.org)
   is split into an anything-goes scratch area and an official edited
   area.  This isolates the quality controlled, tested modules from random
   uploads that may not be ready for general usage.  

 + Reviews

   More reviews of more modules are needed.  Perhaps they could be found
   on search.cpan.org or some other CPAN site.

* Namespaces, Versioning 

 + Namespace issues

   modules@perl.org serves as an area to discuss proposed module names.
   There have been some complaints that the maintainers of the Perl module
   namespace don't always come up with reasonable and intuitive names for
   modules, and that anyone trying to refute the decision of this group
   faces a losing five-against-one battle.

   It was also mentioned that modules@perl.org doesn't always respond to
   requests in a timely fashion.  Using autoreply would help solve this
   issue.

   Concerns like this may be encouraging people to leave Perl and switch to
   Python.

 + Module versions

   Versioning of modules is important.  Being able to identify a specific
   module by version number solves part of that problem but creates
   others, since only one version of a module may be installed at once
   (or at least installing and using multiple versions of a single module
   is *very* tricky).

 + Module naming

   Flexibility is needed in implementing and versioning modules.  Once a
   module is released, changing its name is as politically correct as
   rescinding a domain name by committee, especially since changing a
   module's name will break existing code.

   Module names should not mention how the code is implemented (e.g.
   Text::CSV_XS).

   General module naming guidelines are needed.

 + Namespacing

   Currently, CPAN namespaces use the same first-come-first-serve model as
   domain names.  Occasionally, this allows a developer to own a namespace
   even though they have written a poorer implementation of an interface.

   This problem also ignores the fact that some modules (Text::CSV,
   Scalar::Utils) are available in all-Perl and XS implementations.  In
   the case of Text::CSV, choosing the implementation is done by the module
   user, while in the case of Scalar::Utils, the decision is made by the
   installer.

   Allowing multiple modules to use the same namespace may solve these
   issues.

 + Author-based Namespaces

   Perhaps appending the author's CPAN ID to a module name will better
   identify the specific version of a module to be used:  
   
   "use File::Parse::TIMB;"

   Perhaps using the existing version numbering mechanism will help if it
   can be extended to using a string, such as "#NI_S".

 + Impact on Perl Syntax

   The best solution, which we may not have seen yet, might require
   significant changes to the Perl language.  Thus, it may need to wait
   until Perl6.

 + Using Interfaces

   Kevin Lenzo pointed out that Modula 2 solves this problem nicely by
   using the 'interface' keyword.  That is, implementations aren't named,
   but interfaces are.  So, any package implementing a known interface is
   interchangeable with any other package implementing the same interface.

   Interfaces also open up the issue of public vs. private interfaces.
   Currently, all Perl modules expose public interfaces, since data
   hiding in Perl modules is either unavailable or cumbersome.  A named
   interface could identify only the public portion of an interface that
   should be used or reimplemented.

   Some mechanism for versioning and standardizing interfaces would also
   be necessary.
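
   The idea of naming the interface rather than the implementation can
   be illustrated as follows.  This is a Python sketch using abstract
   base classes, not Modula 2 and not a proposal for Perl syntax; the
   class names are invented for the example.

```python
import csv
from abc import ABC, abstractmethod


class CSVParser(ABC):
    """The named interface: callers depend on this, not an implementation."""

    @abstractmethod
    def parse_line(self, line):
        """Split one CSV line into a list of fields."""


class SimpleParser(CSVParser):
    """Plain string-splitting implementation (no quoting support)."""

    def parse_line(self, line):
        return line.rstrip("\n").split(",")


class StdlibParser(CSVParser):
    """Implementation backed by the standard library's csv module."""

    def parse_line(self, line):
        return next(csv.reader([line]))


def first_field(parser, line):
    """Works with any CSVParser, however it happens to be implemented."""
    return parser.parse_line(line)[0]
```

   Either implementation can be handed to first_field(); only the
   public method declared in the interface is relied upon.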

 + Unique Identifiers

   A few of the issues that stem from uniquely identifying a module have
   already been solved.  Mozilla's XPCOM uses "IAD" to give the
   implementation of a component a unique identifier.  CORBA and COM (?)
   use a GSID for the same purpose.  
   
   These identifiers are intended to be globally unique.

 + Corporate namespaces

   Tim Bunce mentioned that Solaris' "kstat" command is now implemented in
   Perl.  Sun also wants to use and control the "Solaris::" namespace
   since it is an extension of their existing trademark on Solaris.  

   One possible solution to the corporate namespace problem would be to
   ensure that any module sitting in an "owned" namespace not maintained
   by the namespace owner (e.g. "Solaris::*" modules not written by Sun)
   explicitly acknowledge the trademark owner.  
   
   Jon Orwant has promised to figure out the legal wording necessary.

 + Removing Modules

   The issue of corporate namespaces brings up the issue of removing
   modules from CPAN.  That is, if there is a corporate namespace such as
   OReilly (or O'Reilly), how is a module removed when it doesn't belong
   in that namespace?

   Misuse of a corporate or otherwise "owned" namespace is one reason for
   removing a module from CPAN.  Other reasons may exist, such as
   removing unmaintained modules that can no longer work with modern Perls.

 + Cute Names
 
   Modules with names like "IMA::DBI", "Math::ematica" and "D'Oh" need to
   be addressed.  Should they be renamed to be more conformant with the
   module naming guidelines?  Should they be left behind if/when CPAN splits
   into scratch/edited areas?

* Licensing

 + Additional Disclaimers

   Morgan Stanley adds a paragraph to the standard Artistic License that
   effectively states "if you use our module, you can never sue us."

 + License Identification

   The license used with a module needs to be identified and tracked.
   This could be done upon upload to PAUSE: the module author can
   specify the license flavor used for the module.

   Once this is done, identifying which modules can be distributed on
   CDROM will help publishers respect a module author's individual
   licensing concerns.

* Schwern's Quality Assurance presentation

 + Malicious modules

   To prove that there are no security mechanisms for checking or
   installing modules, one module author wrote a module that printed
   out the error message "I am deleting all of your files".
   
   This needs to be fixed.

 + cpan-testers

   The cpan-testers effort is great, but it has problems.  First,
   it is not automated.  Second, it is incomplete.

 + CPANTS: The CPAN Testing Service

   Schwern proposes instituting an automated series of quality tests.
   These tests are intended to identify a possible lack of quality, not
   the presence of quality in a module/script.

   Perfect identification of all lack-of-quality indicators is not the
   goal.  Achieving 80% accuracy during automated testing is OK;
   cpan-testers could be recast to handle the more difficult 20% of the
   problem.

 + Levels of quality

   CPANTS is designed to identify quality control problems in simple tiers
   of boolean tests.  After passing one tier of tests, a module can
   proceed to the next tier of tests.

   Each tier of testing is designed to identify a specific set of common
   problems.  

   Two tiers of simple boolean tests are currently proposed: "Veto tests"
   and "Boolean Kwalitee Tests".  
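
   The tiered scheme can be sketched as below (in Python).  The
   representation of a distribution and of a test, a function returning
   true or false, is an assumption made for the example; CPANTS itself
   is only a proposal at this point.

```python
def run_tiers(distribution, tiers):
    """Run tiers of boolean tests in order, stopping at the first failing tier.

    tiers is a list of (tier_name, [(test_name, test_function), ...]).
    Returns (tiers_passed, failures), where failures names the tests that
    failed in the tier at which testing stopped (empty if all tiers passed).
    """
    passed = []
    for tier_name, tests in tiers:
        failures = [name for name, test in tests if not test(distribution)]
        if failures:
            return passed, failures
        passed.append(tier_name)
    return passed, []
```

   A module that fails in the first tier never reaches the second, so
   the cheap checks shield the more expensive ones.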

 + Veto tests
   
   The veto tests are intended to find compile errors, incomplete
   distributions and incompatibility with common Perls.

   This list is incomplete, but is representative of the kind of veto
   tests envisioned for a first-pass quality check.

   = Are README, INSTALL, Manifest and Makefile.PL files present?

   = Do the modules have tests?

   = Do the modules pass their own tests?

   = Does every .pm file compile (perl -c)?

   = Does it blow up because it's supposed to, and if so, does it blow up
   in the Makefile?

   = Does the distribution pass its tests on all stable and popular Perls?

   = Does the distribution pass its tests on all stable and popular
   configurations?  (64bit, malloc, sfio, etc.)

   = Does the distribution pass its tests on all popular and sane
   architectures?   (Linux, Solaris, Win32, etc.)

   = Does the distribution play well with others?  (e.g. are there
   security violations in Makefile.PL)
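
   The first two veto tests are simple enough to sketch (in Python).
   The sketch works on a list of the relative paths in a distribution
   and assumes that "has tests" means a top-level test.pl or at least
   one t/*.t file.  File names are compared case-insensitively.

```python
REQUIRED_FILES = ("README", "INSTALL", "MANIFEST", "Makefile.PL")


def missing_files(paths, required=REQUIRED_FILES):
    """Return the required top-level files that are absent from paths."""
    top_level = {p.lower() for p in paths if "/" not in p}
    return [name for name in required if name.lower() not in top_level]


def has_tests(paths):
    """True if the distribution ships test.pl or at least one t/*.t file."""
    return any(
        p == "test.pl" or (p.startswith("t/") and p.endswith(".t"))
        for p in paths
    )


def veto(paths):
    """Return the reasons to veto; an empty list means these checks passed."""
    reasons = [f"missing {name}" for name in missing_files(paths)]
    if not has_tests(paths):
        reasons.append("no tests")
    return reasons
```

   The remaining veto tests require actually building and running the
   distribution and are not covered by this sketch.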


 + Boolean Kwalitee tests

   The kwalitee tests are designed to examine the code in an
   automated fashion to sound alarms upon the presence of questionable
   results.

   Failing a kwalitee test is non-fatal; it just provides a signal
   (a "red-flag") that there may be problems with a module.  Upon
   failing a test, the author will be notified.  The author can
   then fix the problem (e.g.  unintentional overuse of '$&') or
   provide an explanation (e.g. "The code needs to be this way.")
   This explanation may be an annotation of the module's OSD file.

   Upon failing a test, once an author's explanation is provided, that
   test will cease to be run on future versions of a module, and the
   author's explanation will serve to document why a specific kwalitee
   test isn't run.

   = Use Devel::Coverage to determine if at least N% of the code 
   is tested with the module's test suite.

   = Use B::Fathom (or something better) to see if the code has a 
   complexity rating of less than N.

   = Is documentation present?

   = Does the code look "yucky"?  That is, does it exhibit any signatures
   of Ineffective Perl Programming?  (Overuse of $_, %_ and @_).

   = Does the code use problematic features such as fork, sig, alarm?

   = Does the code use experimental features?

   = Does the code load in less than N seconds?

   = Is a ChangeLog present?
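
   The handling of author explanations described above might look
   roughly like this (a Python sketch).  The two checks are deliberately
   naive string searches standing in for real kwalitee tests, and their
   names are made up for the example.

```python
# Hypothetical, deliberately simplistic stand-ins for real kwalitee tests.
KWALITEE_TESTS = {
    "avoids_match_vars": lambda code: "$&" not in code,
    "has_pod": lambda code: "=head1" in code,
}


def run_kwalitee(code, explanations, tests=KWALITEE_TESTS):
    """Return the names of the kwalitee tests that raise a red flag.

    explanations maps a test name to the author's stated reason for an
    earlier failure.  Tests with an explanation on file are skipped, so
    the explanation documents why that test is no longer run.
    """
    red_flags = []
    for name, test in tests.items():
        if name in explanations:
            continue  # waived by the author's explanation
        if not test(code):
            red_flags.append(name)  # non-fatal: the author is notified
    return red_flags
```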

 + Human tests

   Some tests can only be done by people looking at the code and
   distribution.  Again, these are red-flag warnings, not veto tests.

   = How complete is the documentation for this module?

   = How readable is the documentation for this module?

   = How up-to-date is the documentation for this module?

   = Is the interface "overdone"?

   = Is there a book available for this module?

   = Is this module backwards compatible with previous releases?

 + CPAN integration

   These tests should not be triggered on a commit into CPAN.  Modules
   should be useable prior to being committed to CPAN.  That is, some of
   these tests should be run before a module is accepted into CPAN.

 + Test Results

   There are issues to be resolved with keeping a permanent record of all
   test results for a module.  Keeping a record of every submission that
   failed will probably be counterproductive and discourage module
   authors from submitting modules to CPAN.

 + Automated Testing

   Kurt Starsinic's Perl Labs would probably be the best vehicle for
   getting CPANTS off the ground.

   Note that this process is a distributed computing effort like
   SETI@Home, but much more dangerous since random, unchecked code is
   being run against many machines.  Being able to run each set of tests
   in a secure sandbox or running it on a throw-away, restorable
   configuration will be important.

   Strange customer configurations should be allowed into the testing
   framework.

 + Testing Requirement

   Testing like this will be an acceptance test for Perl6.  Getting this
   ready for CPAN before Perl6 requires it will help improve CPAN earlier.

 + Automate, Automate, Automate

   Everything possible that can be automated in this process should be
   automated and distributable.  CPANTS should spin off as many pieces as
   possible for distribution and automation.

 + Perl Metrics

   CPANTS can be extended to pick up as many metrics as possible about
   the Perl code on CPAN.  These metrics need to be publicized and
   extended.

   = Are deprecated features still being used?

   = How much code uses experimental features?

   = How "complex" is the average piece of Perl code?

   = Of the features slated for removal in Perl6 (e.g. formats, $#), how
   commonly are they used?

   = How common are object-oriented modules?  Non-object-oriented
   modules?

 + Trusted groups

   Many groups (e.g. ny.pm, Yahoo, etc.) can join in and target a
   handful of modules for manual testing.  These "trusted groups"
   can then publish their blessings for their favorite modules.

 + Karma

   CPAN/CPANTS could adopt a Slashdot/Advogato style karma rating to
   reward module authors and reviewers.

 + Automated mailing lists

   CPANTS can maintain two -announce style lists: one for nagging module
   authors/users to fix code, and one "kudos" list for recognizing module
   authors and reviewers who make contributions to CPAN/CPANTS.

 + Requirements

   CPANTS needs an automated testing framework, possibly something like
   bonsai/tinderbox from Mozilla.  It also might need an area on
   sourceforge for development of the testing framework.

   CPANTS also needs a wide variety of configurations, such as
   those offered by Perl Labs.  It also needs contacts throughout
   the Perl community for more specific and obscure yet important
   system configurations.

* Merijn Broeren's Report on the CPAN BOF

  What follows is a report on the issues raised at the CPAN BOF at TPC5.
  Many of these issues have been raised elsewhere in this document.  

  Some of the points mentioned here come from the CPAN BOF, others come
  from our discussion of points raised by the BOF.

 + One page per module on use.perl.org (or possibly use.cpan.org)

   This already exists on search.cpan.org: 
   http://search.cpan.org/dist?ModuleName

 + Module comparisons

   Comparisons, reviews and surveys of similar modules would be useful for
   many users.  This could be done on the web, and might be made available
   through some enhancement to CPAN.pm.

 + User education

   There are some outstanding questions about CPAN that have been
   answered in many places, but the information isn't reaching CPAN users.

 + Namespace Pumpkings / Propagandist

   Namespaces like Apache and TK have independently maintained module
   lists and namespace management.  This should be extended into other
   commonly used and large namespaces.

 + More Metrics

   In order to support laziness and impatience of Perl users, CPAN should
   identify some common metrics about modules to show some standard of
   quality.

 + Upload / Download counts

   Since CPAN is distributed, getting accurate download statistics is
   difficult, but something is better than nothing.  

   Since search.cpan.org is one centralized repository, examining the
   statistics from CPAN searches may be worthwhile.

   The mirror setup can be simplified to provide standard logging that is
   more easily harmonized with other CPAN mirrors.

 + More docs

   Module authors need to include more usage examples of their modules.
   More documentation in general would also be well received.
 
 + Supersets / Personal SDKs

   Some users want to mirror CPAN and add their own modules on top of it.

   Some users want to create their own personal groups of modules for
   distribution as a complete SDK.  

 + Module Lint

   A mechanism for identifying common problems with modules would be
   appreciated.

 + Revisit CTAN, Debian archives

   There are some interesting features in other software archives that
   have been developed since CPAN was first released.

 + User Census for which modules are installed

   A simple way for a user to submit the details of which (non-core) CPAN
   modules they have installed will help identify how popular specific
   modules are.  This should be an opt-in process.

   NB: OpenBSD asks users to send their dmesg output to dmesg@OpenBSD.org
   to track what kind of hardware OpenBSD has been installed on.

 + Monger Involvement?

   There's an open question on how involved the Perl Mongers should be in
   improving CPAN.  Individual Perl Monger groups are an invaluable source
   of volunteers for some of the more labor intensive aspects of improving
   CPAN.

 + Module patch repository?

   There is no consistent way of patching a module on CPAN and making that
   patch available for general use.  The development model on CPAN today
   forces users to wait for the next release of a module (which may or may
   not fix the bug in question).  This is especially problematic for
   unsupported modules.

 + Module/Perl version synchronization

   Currently, there's no easy way to identify what minimum version
   of Perl is required for a specific module.  This information may be 
   mentioned in a module's documentation, in the sources, or absent
   entirely.  Making this information available on CPAN (especially prior
   to download) would be extremely helpful.

 + Upgrade OSD separately?

   Upgrading the metadata associated with a module is an open issue.  It
   is undetermined at this time whether the metadata information for a
   module should be included with that module, or if it should be external
   to that module.

   After a module is released, the metadata information for that module
   may change (e.g. versions of Perl known to work with this module,
   binary distributions of this module, etc.).  For these reasons,
   updating the metadata separately (and externally) may need to be
   addressed.

 + Save changes for the next CPAN?

   Many of the changes proposed here require a significant amount of
   effort and could conceivably be held off until CPAN was redesigned and
   relaunched.

   The consensus at the BOF (and the CPAN meeting) was that these changes
   should be implemented on today's CPAN, and not wait for a full CPAN
   redesign.

 + Decommissioned modules?

   How are old modules identified and/or removed from CPAN?  Some modules
   are old and unmaintained, but still useful.  Should they be removed,
   annotated as "old and unmaintained" or moved to a separate area of
   CPAN?

 + Forward/Reverse dependencies

   Knowing which modules a specific module requires is important.
   Similarly, knowing which modules require this module is equally
   important.

   This involves both knowing modules by name and specific versions
   of those modules.  Having this information will help administrators
   identify when a new module requires an upgrade/downgrade to an
   installed module, and when that update will break existing code.
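
   Given forward dependencies, the reverse direction can be derived
   mechanically.  The Python sketch below tracks module names only and
   ignores version constraints; the module names in any example data
   are placeholders.

```python
from collections import defaultdict


def reverse_dependencies(requires):
    """Invert {module: [required modules]} into {module: [dependents]}."""
    required_by = defaultdict(list)
    for module, deps in requires.items():
        for dep in deps:
            required_by[dep].append(module)
    return {dep: sorted(mods) for dep, mods in required_by.items()}


def affected_by_upgrade(module, required_by):
    """Return every module that directly or indirectly requires module."""
    seen = set()
    stack = [module]
    while stack:
        for dependent in required_by.get(stack.pop(), []):
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    return sorted(seen)
```

   An administrator could use the second function to list what might
   break before upgrading or downgrading an installed module.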

 + Better descriptions

   Both short and long module descriptions are necessary.  A better
   hierarchy of module classifications would be welcome, as would allowing
   a single module to fit into multiple categories.

   This issue revolves around creating better metadata for modules, and
   specifically identifying how the metadata can be improved.

 + CPAN is a library

   The Library of Congress is a library, and has solved many of
   these issues already.  Much of the discussion of CPAN metadata
   revolves around problems that librarians have solved already.

 + Better scripts repository

   Increasing the number of scripts and classification of scripts would be
   nice.

 + Better README

   Currently, the README is an unstructured text file.  Adding structure
   to it, possibly in POD, would help.

 + Versioning

   A clearer mechanism of identifying what module versions are found in a
   larger package/distribution would help.  Right now, the distinction is
   available, but it is often unclear.

 + Inspection

   Currently, the only real way to examine a module is to download it.
   While search.cpan.org makes the docs available, it would be much better
   if all of the relevant information about a module were easily found and
   machine readable, so that a complete summary were available on the web
   or through CPAN.pm.

* Fixing Today's CPAN

  What follows is a list of outstanding issues identified above that need
  to be addressed in CPAN today, or can be implemented today to improve
  CPAN.

 + Module Versioning

   Allowing multiple versions of a module to be installed at once is a
   big issue that involves changes to the Perl language.  All discussion
   of versioning (including namespaces, interfaces, multiple
   implementations of a single module) is for Larry to think about.

 + Digital Signatures

   Tracking MD5 signatures (or something better) should be implemented
   soon.  Other software libraries do this today to help ensure that the
   software downloaded hasn't been tampered with.
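
   A minimal sketch of the checking side, assuming the expected digest
   has already been obtained from somewhere trustworthy.  The function
   names are made up for the example.  MD5 is the default only because
   it is what was discussed; any algorithm name that hashlib accepts
   can be passed instead.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex, algorithm="md5"):
    """True when the file's digest matches the published value."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

   Note that a bare digest only detects tampering if the published
   value itself arrives over a channel the attacker cannot alter, which
   is one reason to consider real cryptographic signatures.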

 + Globalization / Internationalization / Localization

   There are CPAN and PAUSE issues involved in offering improved
   i18n support for CPAN.

 + Saving CPAN.pm options

   CPAN.pm can already save some options, but the number of things that
   can be made optional should expand.

 + SourceForge

   SourceForge has solved some of the problems CPAN is facing today.
   Adopting some of their solutions can help improve CPAN.

 + Structure/Politics of PAUSE

   PAUSE should expand to address some of the social issues that exist
   today, such as supporting multiple implementations of one module, etc.

 + Quality Control

   Differentiating CPAN into different quality levels may help with some
   of the issues involved in finding quality modules.

   In this manner, CPAN as we know it would be a first-level staging area
   (APAN).  Once some rudimentary automated quality checking is done, a
   distribution is made available on some intermediate staging area
   (BPAN).  After a module has been reviewed by an editor, it can be made
   available on a quality-controlled area (the new CPAN).

 + Easy inclusion of modules

   CPAN, the "anything-goes distributed-repository", should remain.

   If CPAN morphs into a multi-tier quality controlled framework,
   then the scratch area (APAN) serves the purpose of an anything-goes
   mirrored repository.  Rejecting a crufty or incomplete module
   does not prevent it from being mirrored.  Such modules are still
   propagated (in APAN) as they are today.

 + Derived views of CPAN

   Rather than re-centralizing CPAN, the next CPAN should allow and
   encourage users and organizations to create their own layers on top of
   CPAN.  This could be used for adding local, private modules into a
   private CPAN mirror, or offering a privately edited list of modules
   that one person or group has "blessed".

   For example, this might take the form of "my.cpan.org/mjd" or
   "cpan.plover.com" for Mark-Jason Dominus' view of the Best of CPAN.

 + APAN/BPAN/CPAN

   The names APAN, BPAN and CPAN are functional labels for the
   "multiple levels of CPAN" and are not intended to be their final
   names.

   APAN serves as the initial "anything goes" staging area.

   BPAN serves as the area for blessed, well named modules that have gone
   through some rudimentary quality checking.

   CPAN serves as the area for fully QC'd modules, and acts as a good 
   baseline for derived (personal/private) views of CPAN.
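
   The promotion rule implied by these three definitions can be
   sketched as follows.  The function and its two flags are
   hypothetical; how the automated checks and the editorial review
   would actually be recorded was not decided.

```python
def tier_for(passed_automated_checks, reviewed_by_editor):
    """Pick the highest tier a distribution qualifies for.

    Every upload starts in APAN.  Passing the automated checks promotes
    it to BPAN, and an editor's review on top of that promotes it to
    CPAN.  A review without the automated checks does not promote.
    """
    if passed_automated_checks and reviewed_by_editor:
        return "CPAN"
    if passed_automated_checks:
        return "BPAN"
    return "APAN"
```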

   From here forward, CPAN shall mean either the unified repository as we
   know it today, or a hypothetical multi-tiered APAN/BPAN/CPAN.

 + Save CPAN's perception

   CPAN is very well respected both within the free software
   community and within the Perl community.

   Any changes to CPAN must maintain that high level of respect outside
   the Perl community and level of contribution within the Perl community.

 + Searchability

   CPAN should remain searchable.  If at all possible the searchability
   should improve (possibly through keyword searching).

   Should CPAN split into multiple tiers, each tier should be
   individually searchable, as well as searchable together with
   data from the other tiers.

 + Redistribute CPAN code

   One goal for improving CPAN may be to package all of the PAUSE and CPAN
   programs used for maintaining CPAN for use with other software
   libraries like CTAN or the Vaults of Parnassus.

 + Improve the Module List

   Currently, the module list contains about 1/4 of all modules on CPAN.
   Adding a new module onto the module list can sometimes be a political
   issue.

   The reason why the module list is limited in size is largely a
   historical accident.  There is no technical reason why the module list
   cannot be expanded and/or split across multiple files.

   It would also help if the by-modules directory were comprehensive.
   Currently, it is not.

 + Alternative module lists

   The master module list should merge in the XML, Apache and Tk module
   lists.

 + Strict Formatting

   The strict formatting of the module list does not necessarily need to
   be maintained, if a better format (or Perl-readable format) is
   available.

 + Perl5/Perl6

   As Perl6 is developed, some mechanism for identifying both Perl5 and
   Perl6 modules in CPAN will be necessary.  

   Modules specific to Perl5 or Perl6 should both be on the same CPAN.

 + FTP Interface

   Should the FTP interface into CPAN be deprecated?

   Currently, there is a less problematic replacement:
   http://~~~~/get?module=Foo::Bar

   The get?module= request may not be as complete as the FTP interface,
   but it can be extended and doesn't have the problems some FTP
   servers/firewalls have.
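
   For illustration, a client could build such a request like this.
   The host is left as a parameter because these minutes do not record
   it; cpan.example.org in the usage note is a placeholder, not a real
   service, and module_url is a made-up helper name.

```python
from urllib.parse import urlencode


def module_url(base_url, module_name):
    """Build a get?module= request URL for one module name."""
    # Keep ":" unescaped so Foo::Bar appears as written in the request.
    query = urlencode({"module": module_name}, safe=":")
    return f"{base_url.rstrip('/')}/get?{query}"
```

   Calling module_url("http://cpan.example.org", "Foo::Bar") yields a
   URL ending in /get?module=Foo::Bar.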

 + Persistent URLs

   One consistent format needs to be chosen to identify a Perl module in
   any CPAN mirror.  Currently, any module can be found through the
   by-author directory structure.  This may or may not need to be
   revisited, but there should be one canonical URL for any module.

 + Winnipeg Auto-Installer

   There is a little-known utility offered by the University of Winnipeg
   that will auto-install Perl modules.  This should be publicized and
   extended, and perhaps integrated into CPAN.

 + Binary Distributions

   If binary distributions are available through CPAN, a CPAN overlay or
   some similar mechanism, then there needs to be a way to identify the
   underlying source distribution for a given binary distribution.

   This might be done already through some OSD-like metadata.

 + modules@perl.org

   The modules list needs to expand to add more people and more moderators
   into the discussion.  

   Perhaps a broader, more distributed list is required.

 + backpan

   Ask maintains a backup of CPAN that contains all versions of a module
   posted since backpan was created.  This should be publicized and
   possibly expanded.

 + CPAN API

   The mechanism for finding and downloading CPAN modules is fairly
   ad hoc at the moment.  Clarifying this API, possibly using SOAP or some
   other XML format, would make it easier for more CPAN interfaces to be
   created.

   CPAN.pm already offers some of these features.  This is mostly a
   request to improve and/or better advertise CPAN.pm, or otherwise 
   offer a richer server-side API into searching and downloading CPAN for
   modules and other distributions.

 + Metadata Distribution

   The entire CPAN repository is hovering around 1GB today.  Once richer
   metadata is available for CPAN modules, this metadata can be replicated
   widely, perhaps to a user's local machine.  This would offer a
   space-efficient mechanism for browsing CPAN locally without constantly
   resynchronizing the entire repository.  

   This mechanism is very similar to the *BSD Ports collection, which
   encodes a few GB of data into a few MB of metadata.  The metadata is
   used to identify dependencies and install all required prerequisites
   when installing a single package.
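
   A minimal sketch of what a client could do with such metadata once
   it is available locally: work out an install order in which every
   prerequisite comes before the distribution that needs it.  The INDEX
   table and the distribution names are hypothetical.

```python
# Hypothetical local metadata: distribution -> list of prerequisites.
INDEX = {
    "App-Report": ["Foo-Bar", "Foo-Util"],
    "Foo-Bar": ["Foo-Util"],
    "Foo-Util": [],
}


def install_order(target, index):
    """Return `target` and its prerequisites, prerequisites first."""
    order, done, in_progress = [], set(), set()

    def visit(name):
        if name in done:
            return
        if name in in_progress:
            raise ValueError(f"circular dependency involving {name}")
        if name not in index:
            raise KeyError(f"no metadata for {name}")
        in_progress.add(name)
        for prereq in index[name]:
            visit(prereq)
        in_progress.discard(name)
        done.add(name)
        order.append(name)

    visit(target)
    return order
```

   Only the small metadata table is consulted to plan the install; the
   distributions themselves are fetched afterwards, one by one.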

 + More HTML interfaces

   Simplifying or promoting the underlying data used to build a CPAN
   interface would allow more people to try and create better interfaces
   into CPAN.  This would be a good thing.

 + Trademarks

   CPAN (and CPAN interfaces) need to do a better job of acknowledging
   trademarks.  This is especially necessary if CPAN moves to acknowledge
   corporate namespaces.

 + Licensing

   In order to simplify redistribution, more rigid tiers of CPAN
   may require that a module use a standard open source license
   (GPL, Artistic, etc.) instead of a custom license that requires
   users to examine the license before use.

 + Liability

   If CPAN mutates into a multi-tiered, quality controlled repository, the
   liability issues with that quality control will probably need to be
   expressly disclaimed.

 + Mirror Setup

   The setup and policy of creating a new CPAN mirror can be simplified.  

   This may also include generating standardized logs that can be forwarded
   to a central repository for producing more complete CPAN download
   statistics.

 + Module Install

   Module installation can be improved to allow a user to register their
   use of a module, and possibly even sign up for a -announce (or even
   -discuss) style mailing list about that module.

 + Module bug report repository

   Today, every module needs to maintain its own bug tracking system.

   Extending or duplicating Perl's bug tracking system for individual
   modules would help track down and hopefully fix bugs in CPAN modules.

 + Integration with SourceForge?

   Many modules are already on SourceForge or would benefit from migrating
   to SourceForge.  A better integration between SourceForge and CPAN
   would help module users.

 + Perl Census

   There are many reasons why a Perl Census would be helpful.  This would
   necessarily be an opt-in effort.

   Techniques for conducting this census include 

   = census.pl: a Perl program for examining a Perl installation and
   sending results to a central repository.  This could be completely
   anonymous, identify the organization or completely identify the
   machine/individual.

   = mail perllocal.pod: This is a low impact, simple way of seeing what
   modules are installed on any particular machine.

   = suppress non-CPAN modules: As a security measure, tally only those
   modules that are available on CPAN; ignore any private modules.

   = (uname -a ; perl -V) | mail: collect statistics on which platforms 
   are popular, and which build options are popular.
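
   The perllocal.pod and "suppress non-CPAN modules" ideas could be
   combined on the collecting side roughly as follows.  This sketch
   assumes each installed module appears in perllocal.pod on a heading
   line of the form "=head2 <date>: C<Module> L<Name|Name>", and that a
   set of known CPAN module names is available.  Both are assumptions,
   as are the function names.

```python
import re
from collections import Counter

# Matches the module name on a perllocal.pod heading line (assumed format).
HEADING = re.compile(r"^=head2 .*C<Module> L<([^|>]+)", re.MULTILINE)


def installed_modules(perllocal_text):
    """Extract module names from the text of one perllocal.pod file."""
    return HEADING.findall(perllocal_text)


def census_tally(reports, cpan_modules):
    """Count how many reports mention each module, ignoring private ones."""
    tally = Counter()
    for text in reports:
        # A module reinstalled on one machine still counts once.
        for name in set(installed_modules(text)):
            if name in cpan_modules:
                tally[name] += 1
    return tally
```

   Filtering could equally be done on the sending side, before anything
   leaves the machine, which fits better with the security concern.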

* Action Items

 + use.cpan.org - Elaine Ashton

 + RFC on namespaces - Kevin Lenzo

   Kevin will focus on the issues he's seen with the Festival namespace.

 + Naming Guidelines - Jon Orwant

 + Trademark Issues - Jon Orwant

   This will focus on issues revolving around Sun owning the Solaris::*
   and O'Reilly owning the OReilly::* / O'Reilly::* namespaces.

   A policy on removing a module from an owned namespace may come out of
   this.

 + Versioning - Larry Wall

   Many of the issues around implementing namespaces and versions 

 + Discussion of COM's GSIDs - Gurusamy Sarathy

 + OSD / Cryptographic Signatures - Andreas Koenig and Graham Barr

 + Module Security Issues - Merijn Broeren

 + Structuring master/subsidiary indexes, Global Distribution - Merijn Broeren

 + Module reviews and comparisons - Adam Turoff

 + cpan-workers mailing list - Ask Bjorn Hansen

 + Fixing the CPAN Multiplexer - Tom Christiansen

 + Clean up the CPAN Backbone - Jarkko Hietaniemi
 
   This includes redistributing the programs which maintain CPAN as well
   as simplifying the mirroring policy.

 + CPANTS - Michael Schwern

   CPANTS is the CPAN testing service.

 + xx.cpan.org - Ask Bjorn Hansen

   This involves setting up country code aliases for local cpan.org
   mirrors (us, ca, uk, etc.).

   This also may involve some measure of load balancing and round-robin
   DNS.
