perl.perl6.internals | Postings from August 2002

Re: Unifying PMCs and Buffers for GC

From: Mike Lambert
Date: August 4, 2002 02:44
Subject: Re: Unifying PMCs and Buffers for GC
Message ID: Pine.LNX.4.44.0208040428160.1222-100000@jall.org
> Okay, I finally give. For purposes of liveness tracing and GC, we're
> going to unify PMCs and strings/buffers. This means we trace through
> strings and buffers if the flags are right, and we need to add a GC
> link pointer to strings/buffers. It'll make things a bit larger,
> which I don't like, but it lifts some restrictions I see looming,
> which I do like.
>
> Anyone care to take a shot at this?

I've started on this task, although it seems to be rather involved. :)

What follows is basically a brain dump on my current ideas that I'm
tossing around in an attempt to resolve the unification issues while
retaining the current speed.

a) The current hash implementation works against the GC, not with it.
We currently need a PerlHash PMC wrapped around every hash buffer, so
hashes are directly affected by such a unification. It would be good to
accommodate them, along with other data structures like them.

I'm currently favoring header pools on a per-type basis, not just a
per-size basis. This would give us a 'hash' pool. The pool structure would
contain function pointers for collection and/or DOD (dead object
detection) purposes, i.e. the things that would otherwise live in a PMC
vtable.
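Here is a minimal C sketch of what a per-type header pool might look like. It assumes a simplified header, and the names (Header, HeaderPool, collect_fn, mark_fn, hash_pool) are illustrative only. They are not Parrot's actual structures. The collection and DOD hooks hang off the pool rather than off each object's vtable:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified header; not Parrot's real layout. */
typedef struct Header {
    unsigned flags;
    void *data;
} Header;

typedef struct HeaderPool HeaderPool;

/* Per-type hooks: the things that would otherwise sit in a PMC vtable. */
typedef void (*collect_fn)(HeaderPool *pool);
typedef void (*mark_fn)(HeaderPool *pool, Header *h);

struct HeaderPool {
    const char *type_name;  /* per-type pools: e.g. "hash" */
    size_t header_size;     /* what a per-size pool is keyed on */
    collect_fn collect;     /* walks this pool's headers during collection */
    mark_fn mark;           /* type-specific DOD tracing */
};

static int hash_collect_runs = 0;

/* Placeholder hooks for a hypothetical 'hash' pool. */
static void hash_collect(HeaderPool *pool) {
    assert(pool != NULL);
    hash_collect_runs++;    /* a real version would compact hash storage */
}

static void hash_mark(HeaderPool *pool, Header *h) {
    (void)pool;
    h->flags |= 1u;         /* a real version would also trace the entries */
}

static HeaderPool hash_pool = {
    .type_name = "hash",
    .header_size = sizeof(Header),
    .collect = hash_collect,
    .mark = hash_mark,
};
```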

Since collection phases are done on a per-header-pool basis already, it
wouldn't be difficult to make per-pool collection functions that are
responsible for iterating over their elements and handling them.
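Each pool carries its own collection function, so the collector's top level could reduce to a loop that hands every pool to its own hook. This is a hypothetical sketch: Pool, default_collect and run_collection are invented names. It only counts and resets dead headers and leaves actual reclamation out:

```c
#include <assert.h>
#include <stddef.h>

typedef struct Header { int live; } Header;

typedef struct Pool Pool;
typedef size_t (*collect_fn)(Pool *pool);

struct Pool {
    Header *headers;        /* this pool's headers */
    size_t count;
    collect_fn collect;     /* pool-specific collection routine */
};

/* Generic collector: the pool iterates over its own elements. */
static size_t default_collect(Pool *pool) {
    size_t freed = 0;
    for (size_t i = 0; i < pool->count; i++) {
        if (!pool->headers[i].live)
            freed++;                   /* a real collector would reclaim it */
        pool->headers[i].live = 0;     /* reset for the next DOD run */
    }
    return freed;
}

/* Top level: each pool decides how its headers are handled. */
static size_t run_collection(Pool **pools, size_t npools) {
    size_t total = 0;
    for (size_t p = 0; p < npools; p++)
        total += pools[p]->collect(pools[p]);
    return total;
}
```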

This would help speed up hashes and make them easier to implement, since
a hash could update its internal pointers during relocation, while the
data is all still in the cache.
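To illustrate the relocation point, here is a hypothetical hash layout whose chain links and bucket heads point into its own entries buffer. When a pool-specific collector copies that buffer, it can rebase every interior pointer in the same pass, while both copies are still in cache. The names (Entry, Hash, hash_relocate) are assumptions for this sketch. The old buffer must stay allocated until the rebase is done:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 4

typedef struct Entry {
    int key, value;
    struct Entry *next;        /* points into the same entries buffer */
} Entry;

typedef struct Hash {
    Entry *entries;            /* the relocatable buffer */
    size_t used, cap;
    Entry *buckets[NBUCKETS];  /* bucket heads, also into entries */
} Hash;

static void hash_put(Hash *h, int key, int value) {
    assert(h->used < h->cap);
    Entry *e = &h->entries[h->used++];
    size_t b = (size_t)key % NBUCKETS;
    e->key = key;
    e->value = value;
    e->next = h->buckets[b];
    h->buckets[b] = e;
}

static int hash_get(const Hash *h, int key, int *out) {
    for (const Entry *e = h->buckets[(size_t)key % NBUCKETS]; e; e = e->next)
        if (e->key == key) { *out = e->value; return 1; }
    return 0;
}

/* Translate a pointer into the old buffer to the same slot in the new one. */
static Entry *rebase(Entry *p, Entry *old_base, Entry *new_base) {
    return p ? new_base + (p - old_base) : NULL;
}

/* Copy the entries and fix every interior pointer in one pass.
   The caller frees the old buffer only after this returns. */
static void hash_relocate(Hash *h, Entry *new_base) {
    Entry *old_base = h->entries;
    memcpy(new_base, old_base, h->used * sizeof(Entry));
    for (size_t i = 0; i < h->used; i++)
        new_base[i].next = rebase(new_base[i].next, old_base, new_base);
    for (size_t b = 0; b < NBUCKETS; b++)
        h->buckets[b] = rebase(h->buckets[b], old_base, new_base);
    h->entries = new_base;
}
```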


However, DOD functions are a bit harder to handle. mark_used currently
calls pmc->vtable->mark to handle its behavior, and buffers don't do
anything special. This is what prevents hashes from being implemented as
plain buffers, GC-wise: they need special collection logic. Currently, any
buffer that contains pointers *must* be wrapped in a PMC that indicates
its behavior, or it's treated as a dumb data pointer, like a string.
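The sketch below shows roughly how the current asymmetry works. mark_used_sketch and buffer_lives_sketch are simplified stand-ins for mark_used and buffer_lives, not their real code, and the struct layouts are assumed. A PMC's mark method can trace the pointers inside its buffer, but a bare buffer only gets flagged live:

```c
#include <assert.h>
#include <stddef.h>

#define FLAG_LIVE 0x1u

typedef struct Buffer {
    unsigned flags;
    void *bufstart;         /* opaque bytes, as far as the GC knows */
    size_t buflen;
} Buffer;

typedef struct PMC PMC;
typedef struct VTable { void (*mark)(PMC *self); } VTable;

struct PMC {
    unsigned flags;
    const VTable *vtable;
    Buffer *data;           /* e.g. a hash's storage */
};

/* Stand-in for buffer_lives: set the flag, never look inside. */
static void buffer_lives_sketch(Buffer *b) { b->flags |= FLAG_LIVE; }

/* Stand-in for mark_used on a PMC: the vtable decides what to trace. */
static void mark_used_sketch(PMC *p) {
    p->flags |= FLAG_LIVE;
    if (p->vtable->mark)
        p->vtable->mark(p);
}

/* A hash-like PMC knows its buffer holds pointers to other buffers. */
static void hashlike_mark(PMC *self) {
    Buffer *storage = self->data;
    buffer_lives_sketch(storage);
    Buffer **slots = storage->bufstart;
    size_t n = storage->buflen / sizeof(Buffer *);
    for (size_t i = 0; i < n; i++)
        if (slots[i])
            buffer_lives_sketch(slots[i]);
}

static const VTable hashlike_vtable = { hashlike_mark };
```

With the PMC wrapper, the referenced buffers get marked. Calling buffer_lives_sketch directly on the same kind of buffer leaves whatever it points to unmarked.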

One idea, which is most closely in line with the current semantics, is to
add a pool pointer to every header. I've found a few times in the past
where such a pointer would have come in handy. This would allow us to call
the pool's mark() function to handle things like buffers that point to
other buffers.
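With a pool pointer in every header, marking can dispatch through the header's pool whether the header is a PMC or a buffer. This is a hypothetical sketch: the field name pool and the functions mark_header and pointer_buffer_mark are assumptions. It recurses for brevity, which a real DOD would likely avoid:

```c
#include <assert.h>
#include <stddef.h>

#define FLAG_LIVE 0x1u

struct Pool;

typedef struct Header {
    unsigned flags;
    struct Pool *pool;          /* the proposed extra pointer */
    void *bufstart;
    size_t buflen;
} Header;

typedef struct Pool {
    void (*mark)(Header *h);    /* NULL: plain data, nothing to trace */
} Pool;

static void mark_header(Header *h) {
    if (!h || (h->flags & FLAG_LIVE))
        return;                 /* already marked: also stops cycles */
    h->flags |= FLAG_LIVE;
    if (h->pool->mark)
        h->pool->mark(h);       /* pool-specific tracing */
}

/* Mark hook for a pool of buffers that hold pointers to other headers. */
static void pointer_buffer_mark(Header *h) {
    Header **kids = h->bufstart;
    size_t n = h->buflen / sizeof(Header *);
    for (size_t i = 0; i < n; i++)
        mark_header(kids[i]);
}

static Pool string_pool = { NULL };
static Pool pointer_buffer_pool = { pointer_buffer_mark };
```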

Its main drawback is the additional size of the pointer in the header.
I believe this might be okay for a few reasons:

a) Our main types, PMC and string, are already quite large, so one more
pointer isn't much on their scale.

b) It allows us to make new types of buffer-like headers on par with
existing structures. This should make the core GC code change less often,
pushing type-specific logic out into the implementation of the headers.

c) Currently, PMCs have a vtable pointer. If we're really concerned about
the additional data element, we could do something like
pmc->pool.vtable->add_used instead of the traditional vtable->. I'm not
convinced of the merit of this idea, and if going through the pool for
calls like 'add' is deemed too slow, we can just keep a vtable *and* a
pool pointer in the PMC header.
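The trade-off in c) is one extra pointer load per vtable call versus one extra word per header. The sketch below shows both layouts. The names (Pool, PMCWithVTable, call_get_integer) and the single get_integer slot are assumptions for illustration:

```c
#include <assert.h>
#include <stddef.h>

typedef struct PMC PMC;

typedef struct VTable {
    long (*get_integer)(PMC *self);
} VTable;

/* One vtable per pool, since each PMC type would have its own pool. */
typedef struct Pool {
    const VTable *vtable;
} Pool;

/* Option A: pool pointer only; the vtable is reached through the pool. */
struct PMC {
    unsigned flags;
    Pool *pool;
    long int_val;
};

/* Option B: keep both pointers, at one more word per header. */
typedef struct PMCWithVTable {
    unsigned flags;
    const VTable *vtable;
    Pool *pool;
    long int_val;
} PMCWithVTable;

/* Option A costs one extra pointer load per vtable call. */
static long call_get_integer(PMC *p) {
    return p->pool->vtable->get_integer(p);
}

static long int_get_integer(PMC *self) { return self->int_val; }
static const VTable int_vtable = { int_get_integer };
static Pool int_pool = { &int_vtable };
```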


One implication of c) is that every PMC type has its own pool. This means:

a) No PMC type morphing: once a PMC is in a pool, it stays in that pool.
I don't see this as a big loss, since type morphing is error-prone to
begin with, IMO.

b) Data members! Since PMCs no longer all have to be the same size, a PMC
can store data elements directly in its structure. This allows us to make
an SV-like PMC which stores str-value, int-value, float-value, etc., all
without imposing on the base PMC size. (No, data and cache aren't enough
to hold those three values, short of having data point to a header that
points to a buffer containing the values.)
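Each PMC type would have its own pool, so a type's headers can be larger than the base. Below is a hypothetical layout for an SV-like PMC (after Perl 5's SV). It embeds the common prefix as its first member, so a pointer to it can be treated as a pointer to the base. PMCBase and SVLikePMC are invented names, and the fields are illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Common prefix every PMC shares (simplified, hypothetical). */
typedef struct PMCBase {
    unsigned flags;
    void *pool;             /* the owning pool */
    void *data;
    void *cache;
} PMCBase;

/* SV-like type: extra fields live inline, sized by its own pool. */
typedef struct SVLikePMC {
    PMCBase base;           /* must come first so the casts below are valid */
    const char *str_val;
    long int_val;
    double num_val;
} SVLikePMC;

static PMCBase *as_base(SVLikePMC *sv) { return &sv->base; }

/* Only valid for headers that came from the SV-like pool. */
static SVLikePMC *as_svlike(PMCBase *p) { return (SVLikePMC *)p; }
```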


Thoughts on all of the above? The main drawback I see is that we could end
up with many more pools. Currently, we don't take advantage of sized header
pools, so making them per-type won't hurt us. However, with different pools
for different PMC types, an explosion in base PMC types could cause an
explosion in pools, wasting memory as each pool keeps 'extra' headers on
hand for allocation. This can probably be tuned in some form to reduce the
effect of over-allocation, but I thought it wise to bring it up.
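The over-allocation concern is simple arithmetic. If each pool keeps a reserve of free headers, the idle memory is the sum over all pools of header size times reserve. This small sketch assumes a fixed reserve per pool; PoolStats and idle_bytes are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-pool figures. */
typedef struct PoolStats {
    size_t header_size;     /* bytes per header in this pool */
    size_t spare;           /* free headers kept for fast allocation */
} PoolStats;

/* Memory tied up in headers that are allocated but unused. */
static size_t idle_bytes(const PoolStats *pools, size_t npools) {
    size_t total = 0;
    for (size_t i = 0; i < npools; i++)
        total += pools[i].header_size * pools[i].spare;
    return total;
}
```

For example, with purely hypothetical numbers, 20 PMC types each keeping 256 spare 32-byte headers would hold 20 * 256 * 32 = 163,840 bytes idle, versus 8,192 bytes for a single shared pool. Shrinking the per-pool reserve is the obvious tuning knob.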


Finally, the unification of buffers and PMCs means that buffers can now
point to things on their own, without requiring an accompanying PMC
wrapped around them. (This is a separate question from the above
discussion, as this problem occurs regardless of what we do above.) This
imposes additional work on the DOD: instead of just calling buffer_lives
on a buffer, it must now put the buffer on the DOD list so that it can be
properly traced later. That in turn requires each buffer to contain a
next_for_GC pointer, so it can be added to the to-do list. Alternatively,
we can use pool-specific memory to hold the various pointers required for
DOD, but either way this further increases the memory footprint of
buffers, and I wanted to verify that it was okay.
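The extra DOD work described above amounts to a worklist. A live header that may contain pointers is threaded onto a to-do list through its own next_for_GC link and traced later. This sketch uses assumed names: FLAG_HAS_POINTERS stands in for "the flags are right", and kids/nkids stand in for however a buffer's pointers are actually located:

```c
#include <assert.h>
#include <stddef.h>

#define FLAG_LIVE          0x1u
#define FLAG_HAS_POINTERS  0x2u  /* hypothetical: buffer must be traced */

typedef struct Header {
    unsigned flags;
    struct Header *next_for_GC;  /* the extra link each buffer would need */
    struct Header **kids;        /* pointers held in the buffer, if any */
    size_t nkids;
} Header;

typedef struct DODState { Header *todo; } DODState;

/* Instead of just flagging the buffer live, queue it if it needs tracing. */
static void mark_live(DODState *st, Header *h) {
    if (!h || (h->flags & FLAG_LIVE))
        return;
    h->flags |= FLAG_LIVE;
    if (h->flags & FLAG_HAS_POINTERS) {
        h->next_for_GC = st->todo;   /* push onto the to-do list */
        st->todo = h;
    }
}

/* Mark from the roots, then drain the to-do list iteratively. */
static void trace_from_roots(Header **roots, size_t nroots) {
    DODState st = { NULL };
    for (size_t i = 0; i < nroots; i++)
        mark_live(&st, roots[i]);
    while (st.todo) {
        Header *h = st.todo;
        st.todo = h->next_for_GC;
        h->next_for_GC = NULL;
        for (size_t k = 0; k < h->nkids; k++)
            mark_live(&st, h->kids[k]);
    }
}
```

The link lives in the header itself, so queuing needs no allocation during DOD. The cost is that pointer in every buffer header, which is the footprint question raised above.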


Comments and/or suggestions, please?

Thanks,
Mike Lambert

