[DRAFT PPD] External Data Interfaces

From:
Brent Dax
Date:
August 17, 2002 15:44
Subject:
[DRAFT PPD] External Data Interfaces
Message ID:
00b201c24641$9c7a3b30$6501a8c0@deepblue
The POD below my sig is a proposed PDD on external data interfaces, that
is, the way embedders and extenders will access Parrot's data types.  It
covers Strings, Buffers, and PMCs, as well as a few related functions.

Let me know what you think.

--Brent Dax <brentdax@cpan.org>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

He who fights and runs away wasted valuable running time with the
fighting.


=head1 TITLE

External Data Interfaces

=head1 VERSION

1.0

=head2 CURRENT

    Maintainer: Brent Dax <brentdax@cpan.org>
    Class: Internals
    PDD Number: TBD
    Version: 1.0
    Status: Proposed
    Last Modified: 13 August 2002
    PDD Format: 1
    Language: English

=head2 HISTORY

=over 4

=item version 1

None. First version

=back

=head1 CHANGES

=over 4

=item Version 1.0

None. First version

=back

=head1 ABSTRACT

This PDD describes the external interfaces to Parrot data structures,
such as PMCs and Strings.  These interfaces are shared by the embedding
and extending systems.

=head1 DESCRIPTION

One of the major flaws of Perl 5 was that the extension interfaces were,
for lack of a better term, "raw".  The same interfaces were used by
extenders and core developers; this necessitated much gnashing of teeth
when a function used by extenders was no longer needed or proved
insufficient for a task--and sweeping changes were next to impossible.

One of Parrot's goals is to provide much cleaner extension
interfaces.  Most other languages in Perl's class have clean extension
interfaces, where the internal functions aren't used by extenders and
the external functions aren't used by internals developers.  This PDD
describes the parts of the overall embedding/extending interface related
to user-level data; these are defined separately from embedding and
extending interfaces because they are shared by both.

"User-level data" is defined to include PMCs, Strings, and Buffers.

The design of the external data interfaces has two major objectives:

=over 4

=item 1.
To be small and simple.

=item 2.
To be complete.

=back

These two goals pull against each other.  The compromise is to keep
redundancy in the interfaces to a minimum: each operation is offered in
as few variants as possible.  For example, the keyed PMC functions
accept only PMCs as sources and destinations, and only PMCs or integers
as indices.

=head1 IMPLEMENTATION

=head2 Strings

Parrot-level C<String>s are to be represented by the type
C<Parrot_String>.  This type is defined to be a pointer to a C<struct
parrot_string_t>.

The functions for creating and manipulating C<Parrot_String>s are listed
below.

=over 4

=item C<Parrot_String Parrot_string_new(Parrot_Interp, char* bytes,
Parrot_Int len, Parrot_String enc)>

Allocates a Parrot_String and sets it to the first C<len> bytes of
C<bytes>.  C<enc> is the name of the encoding to use (e.g. "ASCII",
"UTF-8", "Shift-JIS"); if a case-insensitive match of this name doesn't
result in an encoding name that Parrot knows about, or if NULL is passed
as the encoding, the platform's default encoding is assumed.[1]  Values
of NULL and 0 can be passed in for C<bytes> and C<len> if the user
desires an empty string.

Note that it is rarely a good idea to omit the encoding when you pass
C<bytes> and C<len>.

=item C<Parrot_String Parrot_string_copy(Parrot_Interp, Parrot_String
dest, Parrot_String src)>

Sets C<dest> to C<src> and returns C<dest>.  If C<dest> is NULL, a new
Parrot_String is allocated, operated on and returned.  If C<dest> and
C<src> are the same, this is a no-op.  This may or may not be a
copy-on-write set; the embedder should not care.

B<XXX> Is this a good policy?

=item C<Parrot_String Parrot_string_copy_bytes(Parrot_Interp,
Parrot_String dest, char* bytes, Parrot_Int len, char* enc)>

Sets C<dest> to the first C<len> bytes of C<bytes> and returns C<dest>.
C<enc> is taken to be the encoding of C<bytes>; the data is converted as
needed, and the Parrot_String retains its original encoding.  (Call
C<Parrot_string_transcode> on the Parrot_String first if you want the
result to be stored in C<enc>.)

=item C<Parrot_String Parrot_string_encoding(Parrot_Interp,
Parrot_String str)>

Returns the encoding of C<str> as a Parrot_String.

=item C<void Parrot_string_transcode(Parrot_Interp, Parrot_String str,
Parrot_String enc)>

Transcodes C<str> to C<enc>.  If C<enc> isn't recognized as a valid
encoding name by a case-insensitive match, or if it is NULL, the default
encoding is used.

=item C<Parrot_String Parrot_string_concat(Parrot_Interp, Parrot_String
dest, Parrot_String lhs, Parrot_String rhs)>

Sets C<dest> to the concatenation of C<lhs> and C<rhs> and returns
C<dest>.  If C<dest> is NULL, a new Parrot_String is allocated,
operated on and returned.  C<dest> may be the same string as either or
both of C<lhs> and C<rhs>.

=item C<Parrot_String Parrot_string_chop(Parrot_Interp, Parrot_String
dest, Parrot_String lhs, Parrot_Int len)>

Copies C<lhs> to C<dest>, removes the last C<len> characters from
C<dest>, and returns C<dest>.  If C<dest> is NULL, a new Parrot_String
is allocated, operated on and returned.

=item C<Parrot_UInt Parrot_string_length(Parrot_Interp, Parrot_String
str)>

Returns the length of C<str> in characters.  Note that this is
"characters", not "bytes"; the string's encoding defines what
"character" means.

=item C<Parrot_UInt Parrot_string_ord(Parrot_Interp, Parrot_String str,
Parrot_UInt index)>

Returns the value of the character at C<index> in C<str>.  Note that
this is "character", not "byte"; the string's encoding defines what
"character" means.

=item C<Parrot_String Parrot_string_substr(Parrot_Interp, Parrot_String
dest, Parrot_String str, Parrot_UInt index, Parrot_UInt len)>

Sets C<dest> to the substring of C<str> starting at character C<index>
and continuing for C<len> characters and returns C<dest>.  Note that
this is "characters", not "bytes"; the string's encoding defines what
"character" means.  If C<dest> is NULL, a new Parrot_String is
allocated, operated on and returned.

=item C<void Parrot_string_replace(Parrot_Interp, Parrot_String str,
Parrot_UInt index, Parrot_UInt len, Parrot_String rep)>

Replaces the substring of C<str> starting at character C<index> and
continuing for C<len> characters with the value of C<rep>.  Note that
this is "characters", not "bytes"; the string's encoding defines what
"character" means.  C<rep> need not be the same length as the substring
being replaced.

=item C<Parrot_String Parrot_string_from_cstr(Parrot_Interp, char*
cstr)>

Creates a Parrot_String from the given C string.  Assumes the native
encoding.

=item C<char* Parrot_string_to_cstr(Parrot_Interp, Parrot_String str)>

Creates a null-terminated C string from the given Parrot_String.  If
necessary, transcodes to the native encoding.

Use of this function is discouraged for several reasons--information can
be lost in the transcoding and null characters in the string can cause
problems.  However, this function is sometimes necessary, so it's
included.

The storage for the C string is created with C<Parrot_alloc()> and must
be freed with C<Parrot_free()>.

=back
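
Several of the string functions above share one convention: they write into C<dest>, return C<dest>, and allocate a fresh string when C<dest> is NULL. The following is a minimal sketch of that convention for concatenation only. It does not use Parrot; C<Demo_String>, C<demo_string_new>, C<demo_string_concat> and C<xmalloc> are hypothetical stand-ins that handle raw bytes with no encoding and no garbage collector.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct parrot_string_t: raw bytes only,
 * with no encoding information and no garbage collection. */
typedef struct demo_string {
    char  *bytes;
    size_t len;
} *Demo_String;

/* malloc() wrapper that never returns NULL and never allocates 0 bytes. */
static void *xmalloc(size_t n) {
    void *p = malloc(n ? n : 1);
    if (p == NULL) abort();
    return p;
}

/* Allocate a string holding a copy of the first len bytes of bytes.
 * Passing NULL and 0 yields an empty string. */
static Demo_String demo_string_new(const char *bytes, size_t len) {
    Demo_String s = xmalloc(sizeof *s);
    s->bytes = xmalloc(len);
    if (len) memcpy(s->bytes, bytes, len);
    s->len = len;
    return s;
}

/* Set dest to lhs followed by rhs and return dest.
 * A NULL dest means "allocate the result for me".
 * The result is assembled in fresh memory before dest is modified,
 * so dest may be the same string as lhs, rhs, or both. */
static Demo_String demo_string_concat(Demo_String dest,
                                      Demo_String lhs, Demo_String rhs) {
    assert(lhs != NULL && rhs != NULL);
    size_t len = lhs->len + rhs->len;
    char *joined = xmalloc(len);
    memcpy(joined, lhs->bytes, lhs->len);
    memcpy(joined + lhs->len, rhs->bytes, rhs->len);
    if (dest == NULL) {
        dest = xmalloc(sizeof *dest);
        dest->bytes = NULL;
    }
    free(dest->bytes);          /* free(NULL) is a no-op */
    dest->bytes = joined;
    dest->len = len;
    return dest;
}
```

Building the result before touching C<dest> is one simple way to make the aliasing case safe; a real implementation could equally grow C<dest> in place.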

=head2 Buffers

Parrot-level C<Buffer>s are to be represented by the type
C<Parrot_Buffer>.  This is defined to be a pointer to a C<struct
parrot_buffer_t>.

The functions for creating and manipulating C<Parrot_Buffer>s are listed
below.

=over 4

=item C<Parrot_Buffer Parrot_buffer_new(Parrot_Interp, Parrot_UInt
size)>

Allocates a new C<Parrot_Buffer> with C<size> bytes of memory in it.

=item C<void Parrot_buffer_resize(Parrot_Interp, Parrot_Buffer buf,
Parrot_UInt newsize)>

Allocates C<newsize> bytes of memory, copies the contents of C<buf> to
it, and places the new memory into C<buf>.

=item C<Parrot_Buffer Parrot_buffer_copy(Parrot_Interp, Parrot_Buffer
dest, Parrot_Buffer src)>

Copies the contents of C<src> into C<dest>, resizing C<dest> if
necessary, and returns C<dest>.  If C<dest> is NULL, a new Parrot_Buffer
is allocated, operated on and returned.

=item C<Parrot_UInt Parrot_buffer_size(Parrot_Interp, Parrot_Buffer
buf)>

Returns the size of the contents of C<buf>.

=item C<void* Parrot_buffer_contents(Parrot_Interp, Parrot_Buffer buf)>

Returns a pointer to the contents of C<buf>.  This pointer can be used
to directly manipulate C<buf>'s contents.

B<Warning>: Make sure to block the garbage collector before calling this
function!  Otherwise, the pointer may become invalid, resulting in
badness ranging from losing data to core dumps.

B<Warning>: Make sure that this pointer doesn't last beyond when garbage
collection is unblocked!

=back
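
The allocate, copy and replace behaviour described for C<Parrot_buffer_resize> can be sketched as below. This is an illustration only, not Parrot code: C<Demo_Buffer> and the C<demo_buffer_*> functions are hypothetical, and truncating on shrink and zero-filling on growth are assumptions the draft does not make. The sketch also shows one way a previously fetched contents pointer can go stale, since the old memory is released when the new memory is installed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct parrot_buffer_t. */
typedef struct demo_buffer {
    void  *mem;
    size_t size;
} *Demo_Buffer;

/* Allocate a buffer with size bytes of zeroed memory. */
static Demo_Buffer demo_buffer_new(size_t size) {
    Demo_Buffer buf = malloc(sizeof *buf);
    if (buf == NULL) abort();
    buf->mem = calloc(size ? size : 1, 1);
    if (buf->mem == NULL) abort();
    buf->size = size;
    return buf;
}

/* Return a pointer to the current contents. The pointer is only good
 * until the buffer's memory is replaced. */
static void *demo_buffer_contents(Demo_Buffer buf) {
    assert(buf != NULL);
    return buf->mem;
}

/* Allocate newsize bytes, copy the old contents across, and install
 * the new memory in buf. Assumption: shrinking truncates and growing
 * zero-fills the new tail. */
static void demo_buffer_resize(Demo_Buffer buf, size_t newsize) {
    assert(buf != NULL);
    void *fresh = calloc(newsize ? newsize : 1, 1);
    if (fresh == NULL) abort();
    memcpy(fresh, buf->mem, buf->size < newsize ? buf->size : newsize);
    free(buf->mem);             /* earlier contents pointers now dangle */
    buf->mem = fresh;
    buf->size = newsize;
}
```

In this sketch, callers fetch the contents pointer again after every resize rather than caching it, which mirrors the warnings above about pointers that outlive a blocked-GC region.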

=head2 PMCs

Parrot-level C<PMC>s are to be represented by the type C<Parrot_PMC>.
This is defined to be a pointer to a C<struct parrot_pmc_t>.

The functions for creating and manipulating C<Parrot_PMC>s are listed
below.

=over 4

=item C<Parrot_PMC Parrot_pmc_new(Parrot_Interp, Parrot_String type)>

Creates a new Parrot_PMC of the type C<type>.  If C<type> is not a
case-insensitive match of any type already registered with Parrot, this
function will throw an exception.

=item C<Parrot_PMC Parrot_pmc_new_vtable(Parrot_Interp, Parrot_VTable
vtable)>

Creates a new Parrot_PMC using C<vtable>.  This can be used for
"private" PMC types.

B<XXX> Is this a good idea or not?

=item C<Parrot_Int Parrot_pmc_get_integer(Parrot_Interp, Parrot_PMC
src)>

Returns the result of C<< src->vtable->get_integer() >>.

=item C<Parrot_Float Parrot_pmc_get_number(Parrot_Interp, Parrot_PMC
src)>

Returns the result of C<< src->vtable->get_number() >>.

=item C<Parrot_String Parrot_pmc_get_string(Parrot_Interp, Parrot_PMC
src)>

Returns the result of C<< src->vtable->get_string() >>.

=item C<Parrot_PMC Parrot_pmc_get_pmc(Parrot_Interp, Parrot_PMC src)>

Returns the result of C<< src->vtable->get_pmc() >>.

=item C<Parrot_PMC Parrot_pmc_set_integer(Parrot_Interp, Parrot_PMC
dest, Parrot_Int src)>

Calls C<< dest->vtable->set_integer(src) >> and returns C<dest>.[2]

=item C<Parrot_PMC Parrot_pmc_set_number(Parrot_Interp, Parrot_PMC dest,
Parrot_Float src)>

Calls C<< dest->vtable->set_number(src) >> and returns C<dest>.

=item C<Parrot_PMC Parrot_pmc_set_string(Parrot_Interp, Parrot_PMC dest,
Parrot_String src)>

Calls C<< dest->vtable->set_string(src) >> and returns C<dest>.

=item C<Parrot_PMC Parrot_pmc_set_pmc(Parrot_Interp, Parrot_PMC dest,
Parrot_PMC src)>

Calls C<< dest->vtable->set_pmc(src) >> and returns C<dest>.

=item C<Parrot_PMC Parrot_pmc_get_indexed(Parrot_Interp, Parrot_PMC src,
Parrot_PMC index)>

Constructs a key from C<index> and calls C<<
src->vtable->get_pmc_keyed(key) >>.[3]

=item C<Parrot_PMC Parrot_pmc_get_indexed_i(Parrot_Interp, Parrot_PMC
src, Parrot_Int index)>

Calls C<< src->vtable->get_pmc_keyed_integer(index) >>.

=item C<Parrot_PMC Parrot_pmc_set_indexed(Parrot_Interp, Parrot_PMC
dest, Parrot_PMC index, Parrot_PMC src)>

Constructs a key from C<index> and calls C<<
dest->vtable->set_pmc_keyed(key, src, NULL) >>.

=item C<Parrot_PMC Parrot_pmc_set_indexed_i(Parrot_Interp, Parrot_PMC
dest, Parrot_Int index, Parrot_PMC src)>

Calls C<< dest->vtable->set_pmc_keyed_integer(index, src, 0) >>.

=item C<Parrot_PMC Parrot_pmc_call(Parrot_Interp, Parrot_PMC sub,
Parrot_PMC args)>

Pushes C<args> onto the stack, calls C<sub>, pops the return value(s)
off the stack, and returns them.

=item C<Parrot_PMC Parrot_pmc_methcall(Parrot_Interp, Parrot_PMC object,
Parrot_String method, Parrot_PMC args)>

Finds C<method> in C<object>, pushes C<object> and C<args> onto the
stack, calls the method, pops the return value(s) off the stack, and
returns them.

=back
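
The get and set functions above are thin wrappers over vtable entries, and the setters return C<dest> so that calls can be nested. A minimal sketch of that shape follows. Everything in it is hypothetical: C<Demo_PMC>, the two-entry C<Demo_VTable> and the C<demo_pmc_*> functions are stand-ins, the interpreter argument is omitted, and a real vtable has many more entries.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct demo_pmc *Demo_PMC;

/* Hypothetical two-entry vtable. */
typedef struct demo_vtable {
    long (*get_integer)(Demo_PMC self);
    void (*set_integer)(Demo_PMC self, long value);
} Demo_VTable;

/* Hypothetical stand-in for struct parrot_pmc_t. */
struct demo_pmc {
    const Demo_VTable *vtable;
    long               int_val;
};

/* Vtable entries for a simple integer type. */
static long int_get_integer(Demo_PMC self) { return self->int_val; }
static void int_set_integer(Demo_PMC self, long value) { self->int_val = value; }

static const Demo_VTable demo_int_vtable = { int_get_integer, int_set_integer };

/* Rough analogue of Parrot_pmc_new_vtable: build a PMC around a vtable. */
static Demo_PMC demo_pmc_new_vtable(const Demo_VTable *vtable) {
    assert(vtable != NULL);
    Demo_PMC pmc = malloc(sizeof *pmc);
    if (pmc == NULL) abort();
    pmc->vtable = vtable;
    pmc->int_val = 0;
    return pmc;
}

/* Wrapper that forwards to the vtable entry and returns its result. */
static long demo_pmc_get_integer(Demo_PMC src) {
    return src->vtable->get_integer(src);
}

/* Wrapper that forwards to the vtable entry and returns dest,
 * which is what makes nested calls possible. */
static Demo_PMC demo_pmc_set_integer(Demo_PMC dest, long src) {
    dest->vtable->set_integer(dest, src);
    return dest;
}
```

With these stand-ins, C<demo_pmc_set_integer(demo_pmc_new_vtable(&demo_int_vtable), 42)> creates and initializes a PMC in one expression.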

=head2 Miscellanea

=over 4

=item C<void *Parrot_alloc(Parrot_UInt size)>

Calls the system C<malloc()> with C<size>.

=item C<void Parrot_free(void * ptr)>

Calls the system C<free()> with C<ptr>.

=item C<void Parrot_block_gc(Parrot_Interp)>

Blocks the garbage collector on the selected interpreter.  Note that
this is done by incrementing a counter, so three calls to
C<Parrot_block_gc()> require three calls to C<Parrot_unblock_gc()>
before GC is reactivated.

=item C<void Parrot_unblock_gc(Parrot_Interp)>

Unblocks the garbage collector on the selected interpreter.

=back
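
The counter behaviour described for C<Parrot_block_gc()> can be sketched as follows. C<Demo_Interp>, its C<gc_block_level> field and the C<demo_*> functions are hypothetical names used only for illustration, and treating an unmatched unblock as a programming error is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical interpreter state: only the GC block counter. */
typedef struct demo_interp {
    unsigned gc_block_level;
} Demo_Interp;

/* Each block call increments the counter. */
static void demo_block_gc(Demo_Interp *interp) {
    interp->gc_block_level++;
}

/* Each unblock call decrements it. Assumption: calling unblock more
 * often than block is a bug, so it is caught with an assertion. */
static void demo_unblock_gc(Demo_Interp *interp) {
    assert(interp->gc_block_level > 0);
    interp->gc_block_level--;
}

/* GC may run only when every block has been matched by an unblock. */
static bool demo_gc_is_blocked(const Demo_Interp *interp) {
    return interp->gc_block_level > 0;
}
```

A counter rather than a flag lets independent pieces of code block and unblock the collector without knowing about each other.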

=head1 ATTACHMENTS

None.

=head1 FOOTNOTES

[1] A string is used so that Parrot can support pluggable string
encodings but still degrade gracefully if the given encoding hasn't been
plugged in.

[2] This allows for code like C<Parrot_PMC
mypmc=Parrot_pmc_set_integer(interp, Parrot_pmc_new(interp,
Parrot_string_from_cstr(interp, "PerlInt")), 1)>.

[3] Note how limited keyed support is.  This is to keep things simple.
I thought about doing combinations of return types and key types, but
that caused a combinatorial explosion, and I didn't think it wise to
expose keys to the outside.

=head1 REFERENCES

PDD 10 (Embedding)

PDD 11 (Extending)

L<perlembed>, L<perlxs>