This is PDD #1--a high-level overview of the perl system
From: Dan Sugalski
Date: January 3, 2001 12:50
Subject: This is PDD #1--a high-level overview of the perl system
Message ID: 5.0.2.1.0.20010103154643.01d5eb40@24.8.96.48
Here's PDD #1, the first of the perl internals documents. (Bcc'd to the
RFC librarian, so he doesn't get a zillion replies.)
----Cut here----
=head1 TITLE
A high-level overview of the perl system
=head1 VERSION
=head2 CURRENT
Maintainer: Dan Sugalski
Class: Meta
PDD Number: 1
Version: 1
Status: Developing
Last Modified: 02 January 2001
PDD Format: 1
Language: English
=head2 HISTORY
None--this is the first version
=head1 CHANGES
None. (Yet...)
=head1 ABSTRACT
This PDD provides a high-level overview of the perl system.
=head1 DESCRIPTION
=head2 Major components
The perl system generally looks like this:
    +----------------------------------------------------+
    |                   Embedding App                    |
    +----------+------------+-------------+--------------+
    |          |            |             |              |
    | parser  <-> compiler <-> optimizer <-> interpreter |
    |          |            |             |              |
    +----------+------------+-------------+--------------+
    |                 Extensions to perl                 |
    +----------------------------------------------------+
=item Parser
The parser takes source code of some sort (presumably perl source, but
we're not picky--if you want to write a parser module that takes C,
Python, or Klingon, that's OK with us) and creates a syntax tree of
that source.
The parser module is designed to be extended with both perl and
compiled languages, and much of the parser is written in perl. (This
is the plan, at least.) Generally there will be one parser, though
there's no reason that there can't be multiple independent
parsers.
=item Bytecode compiler
The bytecode compiler module takes a syntax tree from the parser and
emits an unoptimized stream of bytecode. This code is suitable for
passing straight to the interpreter, though it is probably not going
to be very fast.
=item Optimizer
The optimizer module takes the bytecode stream from the compiler and
optionally the syntax tree the bytecode was generated from, and
optimizes the bytecode.
=item Interpreter
The interpreter module takes the bytecode stream from either the
optimizer or the bytecode compiler and executes it. Every program must
always have at least one interpreter module available that can handle
all of perl, since it's required for use statements and BEGIN
blocks.
While there must be at least one interpreter, there may be multiple
interpreter modules linked into an executable. This would be the case,
for example, for programs that produce Java bytecode, where one of
the interpreter modules would take the bytecode stream and spit out
Java bytecode instead of interpreting it.
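As a rough illustration of how the four modules hand work to one another, here is a minimal sketch in Python. Nothing in it comes from the perl design itself: the opcode names (PUSH, ADD, SUB, MUL), the tuple-based bytecode, and every function name are invented for the example, and Python's own ast module stands in for a real parser.

```python
# Hypothetical sketch: all names and opcodes are invented for illustration.
import ast  # stand-in parser for simple arithmetic expressions

BINOPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}

def parse(source):
    """Parser stage: source text -> syntax tree."""
    return ast.parse(source, mode="eval").body

def compile_tree(node):
    """Compiler stage: syntax tree -> unoptimized bytecode."""
    if isinstance(node, ast.Constant):
        return [("PUSH", node.value)]
    if isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
        return (compile_tree(node.left) + compile_tree(node.right)
                + [(BINOPS[type(node.op)], None)])
    raise ValueError("unsupported syntax")

def apply_op(op, a, b):
    """Shared arithmetic used by both the optimizer and the interpreter."""
    return {"ADD": a + b, "SUB": a - b, "MUL": a * b}[op]

def optimize(code):
    """Optimizer stage: fold PUSH, PUSH, <op> into a single PUSH."""
    out = []
    for op, arg in code:
        foldable = (op != "PUSH" and len(out) >= 2
                    and out[-1][0] == "PUSH" and out[-2][0] == "PUSH")
        if foldable:
            b = out.pop()[1]
            a = out.pop()[1]
            out.append(("PUSH", apply_op(op, a, b)))
        else:
            out.append((op, arg))
    return out

def interpret(code):
    """Interpreter stage: run bytecode on a small stack machine."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append(apply_op(op, a, b))
    return stack.pop()

tree = parse("2 + 3 * 4")
unoptimized = compile_tree(tree)
optimized = optimize(unoptimized)
```

Both `interpret(unoptimized)` and `interpret(optimized)` give the same answer, which is the point: the optimizer is an optional stage, and the interpreter accepts either stream.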
=head2 Independent subsystems
Perl also has a number of subsystems that are independent of any
single module.
=item PerlIO subsystem
The PerlIO subsystem provides source- and platform-independent
asynchronous I/O to perl. With this, perl 6 is independent of C's
stdio system. (And good riddance--it sucks.) How this maps to an OS's
underlying I/O code is not generally perl's concern, and a platform
isn't obligated to provide asynchronous I/O.
Additionally, the PerlIO subsystem allows a program to push filters
onto an input stream if necessary, to manipulate the data before it is
presented to a perl program.
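The filter idea can be sketched in a few lines of Python. This is not the PerlIO interface, which this document does not define: the FilteredInput class and its push_filter method are hypothetical, and the choice to run filters in the order they were pushed is an assumption made for the example.

```python
# Hypothetical sketch: FilteredInput and push_filter are invented names.
import io

class FilteredInput:
    """An input stream that runs each line through a stack of filters."""

    def __init__(self, raw):
        self.raw = raw          # the underlying stream
        self.filters = []       # filters, in the order they were pushed

    def push_filter(self, func):
        """Push a filter; it sees the output of all earlier filters."""
        self.filters.append(func)

    def readline(self):
        line = self.raw.readline()
        for func in self.filters:   # assumption: earliest pushed runs first
            line = func(line)
        return line

stream = FilteredInput(io.StringIO("  Hello, World  \n"))
stream.push_filter(str.strip)   # first trim whitespace
stream.push_filter(str.upper)   # then upper-case what is left
result = stream.readline()
```

The program reading from `stream` only ever sees the filtered data, and never needs to know which filters are in place.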
=item Regex engine
The regular expression engine is somewhat decoupled from the guts of
perl. Its job is to turn regexes into objects, and apply those regex
objects to strings.
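Python's standard re module happens to follow the same two-step shape, so it serves as a quick illustration of "turn a regex into an object, then apply it to strings". It is only an analogy; it says nothing about what the perl 6 regex engine's interface will look like.

```python
import re

# Step 1: turn the regex into an object, once.
address = re.compile(r"\b(\w+)@(\w+)\b")

# Step 2: apply that one object to any number of strings.
subjects = ["dan@perl", "no match here"]
results = [bool(address.search(s)) for s in subjects]
```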
=head2 API levels
=item Embedding
The embedding API is the set of calls exported to the embedding
application. This is a small, simple set of calls, requiring minimum
effort to use.
The goal is to provide an interface that a competent programmer who is
uninterested in perl can use to provide access to a perl interpreter
within another application with very little programming or
intellectual effort. Generally a simple interface should take less
than thirty minutes to write, though more complete integration will
take longer.
Backwards binary compatibility at this level is guaranteed across the
life of perl 6.
=item Extensions
The extension API is the set of calls exported to perl
extensions. They provide access to most of the things an extension
needs to do, while hiding the implementation details. (So that, for
example, we can change the way scalars are stored without having to
rewrite, or even recompile, an extension)
Binary compatibility is a serious goal, though it may be broken if
absolutely necessary.
=item Guts
The guts-level APIs are the routines used within a component. These
aren't guaranteed to be stable, and shouldn't be used outside a
component. (For example, an extension to the interpreter shouldn't
call any of the parser's internal routines)
No binary compatibility is guaranteed, and routines here may be
changed without notice.
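The split between the extension level and the guts level can be shown with a small Python sketch. All of the names here (`_Scalar`, `scalar_new`, `scalar_get_int`, `scalar_set_int`) are hypothetical and chosen only for the example; the real API calls are not defined in this document.

```python
# Hypothetical sketch: none of these names belong to a real perl API.

class _Scalar:
    """Guts level: the storage layout, free to change without notice."""
    def __init__(self, value):
        self._repr = str(value)     # today the value is kept as a string

# Extension level: the only calls an extension is meant to use.
def scalar_new(value):
    return _Scalar(value)

def scalar_get_int(sv):
    return int(sv._repr)

def scalar_set_int(sv, number):
    sv._repr = str(number)

# An "extension" written purely against the extension-level calls.
def increment(sv):
    scalar_set_int(sv, scalar_get_int(sv) + 1)

sv = scalar_new(41)
increment(sv)
```

Because `increment` never touches `_repr` directly, the storage inside `_Scalar` could switch from a string to a native integer without the extension changing at all.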
=head1 VARIATIONS ON A THEME
The explicit goals of perl 6 include generating Java bytecode and
.NET code, as well as running on small devices such as the Palm. The
modular nature of perl 6 makes this reasonably straightforward.
=item Perl for small platforms
For small platforms, the parser, compiler, and optimizer modules are
replaced with a small bytecode loader module which reads in perl
bytecode and passes it to the interpreter for execution. No string
eval, do, use, or require is available, though loading of precompiled
modules via do, use, or require may be supported.
=item Bytecode compilation
One straightforward use of modular perl is to precompile perl source
into bytecode and save it for later use. This is easily done by having
a second interpreter module. The standard perl interpreter is used
during compilation to evaluate BEGIN blocks and suchlike things, but a
simple freeze-to-disk module is used when mainline execution
begins. Then, rather than executing the bytecode, it gets frozen to
disk for later loading.
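Here is a minimal Python sketch of swapping the interpreter module for a freeze-to-disk module, plus the matching loader. The bytecode format (a list of tuples), the JSON file format, the `.pbc` file name, and all function names are assumptions made for the example, not anything this document specifies.

```python
# Hypothetical sketch: formats and names are invented for illustration.
import json
import os
import tempfile

def run_module(bytecode):
    """Stand-in for the standard interpreter module: executes the bytecode."""
    stack = []
    for op, arg in bytecode:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
    return stack.pop()

def make_freeze_module(path):
    """Stand-in for the freeze-to-disk module: same input, writes a file."""
    def freeze(bytecode):
        with open(path, "w") as handle:
            json.dump(bytecode, handle)
        return path
    return freeze

def load_bytecode(path):
    """Stand-in for a bytecode loader that reads a frozen program back in."""
    with open(path) as handle:
        return [tuple(instruction) for instruction in json.load(handle)]

bytecode = [("PUSH", 40), ("PUSH", 2), ("ADD", None)]
path = os.path.join(tempfile.mkdtemp(), "program.pbc")

make_freeze_module(path)(bytecode)          # "compile time": freeze, don't run
result = run_module(load_bytecode(path))    # later: load and execute
```

The freeze module and the interpreter take the same bytecode stream, so choosing between them is just a matter of which module gets handed the stream when mainline execution would begin.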
=item Perl in, Java (or whatever) out
This is a variant of the bytecode compilation. Rather than being
frozen to disk, the bytecode is translated to something else. That
something could be Java bytecode or .NET code, or an executable of
some sort. Perl could also be a front end to other modular compilers
such as gcc or Compaq's GEM compiler system.
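To make "translated to something else" concrete, the sketch below takes a toy stack bytecode and emits an infix expression as text instead of executing it. The opcodes and the output form are invented for the example; a real back end would emit Java bytecode, .NET code, or whatever the target requires.

```python
# Hypothetical sketch: opcodes and output format are invented.

def translate(bytecode):
    """Translate stack bytecode into an infix expression string."""
    symbols = {"ADD": "+", "MUL": "*"}
    stack = []
    for op, arg in bytecode:
        if op == "PUSH":
            stack.append(str(arg))
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left} {symbols[op]} {right})")
    return stack.pop()

output = translate([
    ("PUSH", 2), ("PUSH", 3), ("PUSH", 4), ("MUL", None), ("ADD", None),
])
```

The translator walks the bytecode exactly as an interpreter would, but builds output text where the interpreter would compute values.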
=item Standalone pieces
Each piece of perl can, with enough support hidden away (in the form
of an interpreter for the parsing module, for example), stand on its
own. This means it's feasible to have separate executables that parse
perl to a syntax tree, turn a syntax tree into bytecode, optimize the
bytecode, and execute the bytecode.
This allows us to develop pieces independently--the first version of
the parser, for example, can be written mainly in perl 5 using an
embedded interpreter. It also means we can have a standalone optimizer
which can spend a lot of time grovelling over bytecode, far more than
you might want to devote to optimizing one-liners or code that'll run
only once or twice.
=item The perl assembler
The parser and bytecode compiler can be replaced with a unit that will
eat a textual representation of the bytecode--essentially a perl
assembler. This can be useful in a number of ways, allowing programs
to emit perl bytecode without having to know the gory details of the
binary interface, or in fact having perl immediately available at
all. (It also means we can cobble up real perl programs without having
a full parser built yet, though that's more an issue of initial
implementation than anything else)
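A perl assembler along these lines could be as simple as the following Python sketch. The textual syntax (one instruction per line, `#` comments, an optional integer operand) and the opcode names are assumptions for illustration; this document does not define the assembler's input format.

```python
# Hypothetical sketch: the listing syntax and opcodes are invented.

def assemble(text):
    """Turn a textual listing into a list of (opcode, operand) tuples."""
    program = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()    # drop comments and blanks
        if not line:
            continue
        parts = line.split()
        opcode = parts[0].upper()
        operand = int(parts[1]) if len(parts) > 1 else None
        program.append((opcode, operand))
    return program

listing = """
    push 40   # first operand
    push 2
    add
"""
program = assemble(listing)
```

A program that wants to emit bytecode only has to print text in this form; it never needs to know how the instructions are laid out in binary.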