A17 early discussion: Perl6 Threading Proposal

From: Austin Hastings
Date: April 15, 2003 11:29
Subject: A17 early discussion: Perl6 Threading Proposal
Message ID: 20030415182938.6630.qmail@web12306.mail.yahoo.com

=head1 NAME

Perl6::Threads - a proposal for threads in Perl6

=head1 REFERENCES

The primary reference documentation for threading in Perl6 is the
documentation and tutorial associated with ithreads in Perl 5.8. See
L<http://www.perldoc.com/perl5.8.0/pod/perlthrtut.html> for
enlightenment.

=head1 OVERVIEW

This section is broken down into an overview of thread concepts for
P6, followed by the threads language structures.

=head2 Threads in Perl6

Threads in Perl6 will be implemented at the interpreter level. The
number of threads that can be created is effectively unlimited. Each
thread will act as a modified Continuation (it's running, instead of
sitting there). It will duplicate the innermost lexical scope in which
it was created, and share all the outer scopes visible at creation
time. Non-lexical scopes will be shared as well.

=head3 Closures, Continuations, and Threads

Threads are created using a continuation. Continuations in Perl are
(obviously) closures over their lexical scope. In the interests of
performance and data-sharing, continuations do B<not> duplicate their
entire stack. Rather, they duplicate the lowest lexical scope on the
stack: all data outside the lowest lexical scope is shared.

NOTE: This may be how all continuations are implemented, depending on
how PDT feels about preserving lexical vars versus preserving the scope
itself. (That is, a continuation may not care about the preservation of
lexvars, so long as the stackframe stays. Ask @Larry.) I think this
isn't what DanS has been blogging, but a lot can change in a week...

Threads created B<in line> will duplicate the immediately enclosing
scope by taking a continuation in line. Threads created using the
object interface will duplicate the lexical scope in the object
creation method, which immediately leaves, destroying the duplicated
scope.

    sub ex1 {
      my $x = 0;
      loop {
        my $y = 1;
        $y += thread ? 100 : 0;  # thread is like fork() here.
      }
    }

In this example, $x remains shared while $y is duplicated - each
thread has its own copy of $y.

    sub ex2 {
      my $x = 0;
      loop {
        my $y = 1;
        my $thr = new Thread: &other_sub;
      }
    }

In this example, $x and $y are shared, while the thread has a
duplicate copy of the internal scope of Thread::new (briefly).
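
As a rough analogy only, the shared-versus-duplicated split can be
sketched in Python. Python has no continuation-style threads, so this
sketch assumes that passing C<y> as an argument stands in for the
duplicated innermost scope, while the closed-over C<x> stands in for
the shared outer scope. The names C<ex1_analogy> and C<body> are
hypothetical, and the explicit lock does by hand what the proposal
would do automatically.

```python
import threading

def ex1_analogy(iterations=3):
    x = {"value": 0}          # outer scope: one object shared by every thread (like $x)
    lock = threading.Lock()   # explicit here; the proposal would synchronize implicitly
    results = []

    def body(y):
        # y is a per-thread copy, like the duplicated $y
        y += 100
        with lock:
            x["value"] += 1   # every thread updates the same shared x
            results.append(y)

    threads = []
    for _ in range(iterations):
        y = 1
        t = threading.Thread(target=body, args=(y,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return x["value"], sorted(results)
```

Each thread sees its own C<y> become 101, while all of them increment
the single shared C<x>.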

=head3 Continuations versus Closures

Threads may be created using continuations or by passing a closure
(aka a Code object), as shown above. When threads are created with a
continuation, they have continuation behavior -- the thread doesn't end
unless it does a normal Perlish ending-action. When threads are created
using a Code element, leaving the Code element terminates the thread.

    sub sub1 {
      if (thread) {
        print "In thread\n";
      } else {
        print "In main\n";
      }
    }

    sub1;
    print "Hello\n";

This will print both "In thread" and "In main", as expected. However,
because the thread is created continuation-style, C<sub1> will return
twice (once in each thread) and "Hello" will print twice as
well. By contrast, in:

    thread { print "In thread\n"; }
    print "In main\n";
    print "Hello\n";

because the thread is created Code-style, the new thread will print
"In thread" and then terminate, while the main thread prints "In main"
and "Hello" before ending.
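
The Code-style case maps fairly directly onto conventional thread
libraries. The following Python sketch is an analogy, not the proposed
Perl6 interface: C<run_code_style> and C<child> are hypothetical names,
and output is collected in a list instead of printed so the result can
be inspected. The continuation-style case has no Python equivalent and
is not shown.

```python
import threading

def run_code_style():
    output = []

    def child():
        # The thread ends as soon as this function returns.
        output.append("In thread")

    t = threading.Thread(target=child)
    t.start()
    t.join()   # joined only to make the ordering deterministic for this demo
    output.append("In main")
    output.append("Hello")   # runs once, in the main thread only
    return output
```

"Hello" appears once because the child never executes anything outside
the code it was given.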

=head2 Data Synchronization

There are three basic concerns involving data synchronization and
threaded programming. (There are some advanced concerns, too, like
Priority and plug-in Schedulers and such. We're not there yet.)

=head3 Race Conditions

When two threads are interleaving updates to the same data items, a
race condition exists. Perl6 has direct support for avoiding race
conditions -- assignments to non-thread-local variables are
automatically synchronized at the expression level.
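
To make the race concrete, here is a Python sketch of what
expression-level synchronization would have to guarantee. Python does
not synchronize assignments automatically, so the sketch assumes an
explicit C<threading.Lock> around each update; C<count_with_lock> is a
hypothetical name.

```python
import threading

def count_with_lock(n_threads=4, increments=1000):
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # Read-modify-write is one critical section, so no update is lost.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

With the lock in place the final count is always the number of threads
times the number of increments.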

=head3 Deadlocks

To avoid deadlocks -- two or more threads waiting on each other --
Perl6 provides grouped locking facilities. None of the resources is
locked until all of them are locked.

=head3 Multiprocessing

Running in a multiprocessor environment, two threads doing something
at the same time can mean they really I<are> running at the same
time. Perl6 extends locking and thread-safe behavior to include
multiprocessor support when required.

=head2 Threads and other Perl6 Features

Within the constraints that threads may duplicate the innermost
lexical scope of their creator but share everything else, other Perl6
features are relatively straightforward:

=head3 Grammars, Rules, and Rexen

Since rules are akin to subs, they can be used as a basis for threads.
The responsibility is on the programmer to produce a parse tree that
can survive shared updates, if any. Likewise, it is the programmer's
problem to construct a parsing environment amenable to multiple
threads.

=head3 Implicit Items

The current topic, the bindings for C<$_>, and any other "implicit"
elements are propagated forward through thread creation. These are
per-thread entities, and may change independently.

  for @ary {
    if (thread) {
      # $_ is bound to an element of @ary in child thread
      given $something_else {
        # Topic changes in the child, not in the parent.
      }
    } else {
      # Different $_ is bound to same element of same @ary in parent.
    }
  }

If a Continuation-style thread alters its implicit items and returns
past the original thread-creation point on the stack, the implicit
items are treated as usual. (E.g., if a thread uses the C<given>
statement, then the topic is changed for good I<in this thread>.)

=head3 Exceptions: C<die>, C<throw> and C<fail>

A continuation-style thread can ascend the entire (shared) stack. As
such, any exception handling in place will be used.

If a thread (of any kind) fails to catch an exception before reaching
the limit of its execution, it will terminate and the exception will
be suspended. Another thread subsequently waiting on a thread with a
pending exception will receive the exception as a result of the
C<wait> call.
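
Python's C<concurrent.futures> module behaves similarly and can serve
as an illustration: an exception that escapes a worker is held until
another thread asks for the result. In this sketch C<failing_task> and
C<wait_and_report> are hypothetical names, and C<future.result()> plays
the role of C<wait>.

```python
from concurrent.futures import ThreadPoolExecutor

def failing_task():
    # Nothing in the child catches this, so the child terminates.
    raise ValueError("uncaught in child")

def wait_and_report():
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(failing_task)
        try:
            future.result()   # blocks, then re-raises the child's exception here
        except ValueError as exc:
            return f"caught in waiter: {exc}"
    return "no exception"
```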

=head3 File IO

IO is, at its heart, a shared resource. However, there's no reason why
operations such as chdir and the like shouldn't be thread-local. (That
is, thread 1 could have cwd = "/" while thread 2 has cwd = "/tmp".)

Filehandles are variables, and may be globally or lexically
scoped. Just as with any continuation-based operation, changes made
through a filehandle must be coded so as to keep the (singular)
resource behind the filehandle in sync. This is the developer's
responsibility -- don't write bad code.

However, access to shared filehandles is serialized by Perl6 on an
operation basis (NOT a line, or separator, basis). If two threads try
to write to STDOUT, Perl6 guarantees that one write call will complete
before the other begins. However, if the threads are writing out
individual characters, that doesn't help very much.

=head3 Properties

Properties are either compile time or run time. Obviously,
compile-time properties have no impact on run-time threads. (I haven't
even begun to think about compile-time threading, but since Perl6 is
probably going to be written in Perl6, I hope someone thinks it
through...).

Run-time properties, however, are value-properties. Any value that is
assigned to a variable carries its properties with it (modulo C<int>
and C<bit> variables, and the like). If that variable is visible in
another thread, the other thread sees the property as well as the
value. If the property is a "lexical property" (created with C<new
Property>) the property is thereby visible to the other thread. Caveat
hacker.

=head3 Memoization

Functions are memoized (C<is cached>) because calling a function with
the same arguments will produce the same results. This ignores
context, such as variables not explicitly passed to the function. The
default behavior for memoization will be to share the cached values
across all threads. If the function is context-dependent, consider
declaring the function C<is cached thread_local>.
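
The difference between the two caching policies can be sketched in
Python. This is an assumption-laden analogy: C<cached_shared> stands in
for plain C<is cached>, C<cached_thread_local> stands in for C<is
cached thread_local>, and C<expensive>, C<calls>, and
C<run_in_threads> are hypothetical helpers used only to count real
computations.

```python
import threading

calls = []                      # one entry per real (uncached) computation
calls_lock = threading.Lock()

def expensive(n):
    with calls_lock:
        calls.append(n)
    return n * n

shared_cache = {}
shared_lock = threading.Lock()

def cached_shared(n):
    # One cache for all threads: the first caller computes, the rest reuse.
    with shared_lock:
        if n not in shared_cache:
            shared_cache[n] = expensive(n)
        return shared_cache[n]

_local = threading.local()

def cached_thread_local(n):
    # Each thread lazily gets its own private cache.
    cache = getattr(_local, "cache", None)
    if cache is None:
        cache = _local.cache = {}
    if n not in cache:
        cache[n] = expensive(n)
    return cache[n]

def run_in_threads(fn, n_threads=3):
    threads = [threading.Thread(target=fn, args=(7,)) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Running three threads through the shared version computes the value
once; the thread-local version computes it once per thread.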

=head3 Temporization

Temporization substitutes a temporary value in place of a value
otherwise visible in the current scope. If a thread is created with a
temporized value visible in scope, the thread honors the
temporization. If the thread is a Continuation thread and subsequently
exits the scope of the temporization, the temporization is
removed. Once the temporization is removed, the original value (also a
shared value) is visible.

The thread does not see the original value as "the original value as
it was when the temp was taken" but rather "the original value as it
may have been updated by other threads." The point is that the
B<access> to the original value was suppressed, not the value itself.

=head3 Let: Hypothetical values

See also exceptions, above, since a C<wait> may cause a heretofore
successful thread to suffer a C<fail>ure.

Hypothetical values entirely contained in thread scope are no
different from other hypothetical values. (That is, if a thread makes
a hypothesis about internal data, no other thread tracks it.)

When a thread supposes a value for a shared data item, that data item
is updated in accordance with Perl6's shared-data-update rules, so
that the value is stored correctly. If the thread fails past the scope
of the hypothesis, the value is restored, per usual. If multiple
threads are supposing values for a variable, a conflict is likely to
ensue.

In this, hypotheses are no different from ordinary assignment. To
ensure that only one thread makes changes to a shared object over a
period of time, use C<lock>.

=head3 Let: Alternate Strategy

An alternate idea would be for a thread C<let> of a shared object to
temporize the object to the thread. This would allow other threads to
suppose their own values for the shared object. The usual rule for
confirmation of values would apply (whenever the C<let> goes out of
scope). When a hypothesis was confirmed, part of the cleanup for the
scope would be the (thread-safe) assignment of the new value to the
original shared object.

=head3 Scheduling

There will be multiple thread schedulers available, depending on
requirements. Much like changing the Perl6 grammar, changing the
scheduler will be relatively painless.

=head1 DETAILS

=head2 Thread Termination

Threads which are constructed using active continuations (i.e., via the
C<thread> keyword, or by passing a continuation to the Thread.new
method) will terminate when they C<leave> or C<return> from global
scope.

Threads which are constructed using a Code object (block or sub) will
terminate when they C<leave> or C<return> from the Code object used to
construct them.

Regardless of how it was constructed, a thread will terminate if it:
calls C<exit>; does not handle an exception from C<die>, C<fail>, or
C<throw>; calls C<leave Thread>.

Also, the C<snap> function will cause a thread to cease running, but
remain runnable. There will have to be some way of restarting a thread,
but it can probably be left for a fleshout pass.

=head2 Keywords or Global Functions

=head3 thread

The C<thread> function creates a new, running thread. If called without
args, it appears to return twice in the current codestream (compare the
Unix fork function). If called with a continuation, the continuation is
resumed in the child thread. If called with a Code object, a thread is
created that will terminate when the Code object ends.

If called without args, C<thread> behaves much like Unix fork,
returning once in the parent and once in the child thread. C<thread>
returns C<parent.id but true> in the child thread, and C<child.id but
false> in the parent.

=head3 wait

The C<wait> function informs the thread scheduler that this thread has
nothing to do. The function returns when a variable changes or when
some unspecified amount of time has passed, depending on arguments:

=head4 Unspecified waiting

If C<wait> is called with no arguments, the thread is presumed to be
looping in a complex check to determine when to move ahead:

  while (!$is_ready || $interrupt_level < 100 && ! Tuesday()) {
    wait;
  }

=head4 Waiting on a variable

When C<wait> is called with arguments, the variables passed are
monitored for updates. When the specified variables have been updated,
the wait is over.

  wait $ready;
  wait @queue;
  wait any(@job_postings);
  wait all($weekend, $have_car, $got_date);

The meaning of wait can vary by type. Scalar wait requires a value
change, while array wait wants an insertion or deletion (push, pop,
etc.).
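
One plausible way to implement scalar C<wait> is a condition variable
plus a version counter. The Python sketch below assumes that design;
nothing in it comes from the proposal itself. C<Watched>,
C<wait_for_change>, and C<demo_wait> are hypothetical names, and the
caller must say which version it last saw so that an update that
happens before the wait begins is not missed.

```python
import threading

class Watched:
    """A value whose updates wake any threads waiting on it."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._cond = threading.Condition()

    def set(self, value):
        with self._cond:
            self.value = value
            self.version += 1
            self._cond.notify_all()

    def wait_for_change(self, seen_version, timeout=None):
        # Returns once the version differs from seen_version (or on timeout).
        with self._cond:
            self._cond.wait_for(lambda: self.version != seen_version,
                                timeout=timeout)
            return self.value

def demo_wait():
    ready = Watched(0)
    seen = []
    waiter = threading.Thread(
        target=lambda: seen.append(ready.wait_for_change(0, timeout=5)))
    waiter.start()
    ready.set(1)      # wakes the waiter, or is noticed at once if it has not waited yet
    waiter.join()
    return seen
```

An array flavor would bump the version on insertion or deletion instead
of on assignment.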

=head4 Waiting on Code or Continuations

Calling C<wait> on a Code reference or a continuation will suspend
execution until some (other) thread calls the function or invokes the
continuation in question.

  loop {
    wait &sbrk;
    print "Grew the heap!\n";
  }

=head4 Waiting on a Thread

Calling C<wait> on one or more Threads will suspend the current thread
until the target thread(s) have terminated.

=head4 Return value

Bare C<wait> returns nothing. Other flavors of C<wait> return a
reference to the object in question, save that C<wait> on a thread
returns the thread result, and C<wait> on a function returns the C<.id>
of the thread that called the function.

If a C<wait>ed thread has an uncaught exception, C<wait> will throw it
in the current thread. Likewise, if an external event causes the
C<wait> to return prematurely, it will throw an exception.

=head3 Thread Results

When a thread terminates, its result is determined just like a function
or program: by the last expression, leave, return, or exit arguments.
This result is stored in the thread descriptor for this thread, and
will be made available to any thread that calls C<wait> on the
terminated thread. (The result field is a member of the Thread class,
available for modification like any other field, if a thread wishes to
use this to broadcast status.)

=head2 Locking

During scalar assignment, C<++>, C<-->, and Array or Hash editing,
threads automatically lock any object that is not accessed through a
symbol created in the scope of the thread.

Calling C<lock> on a data item will block until no other locks exist on
the data item. At this point, the calling thread will be able to modify
the data item without interference from other threads.

Calling C<lock> on multiple data items simultaneously will block until
locks are available on all items, at which point (ONLY) the locks will
be taken and control returned.
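
The all-or-nothing rule can be approximated in Python with
non-blocking acquires and a back-off. This is a sketch of one possible
technique (try everything, release on any failure, retry), not the
mechanism the proposal would use; C<lock_all>, C<unlock_all>, and
C<transfer_demo> are hypothetical names.

```python
import threading
import time

def lock_all(locks, backoff=0.001):
    """Acquire every lock in `locks`; never block while holding only some."""
    while True:
        held = []
        for lk in locks:
            if lk.acquire(blocking=False):
                held.append(lk)
            else:
                break
        if len(held) == len(locks):
            return
        # Could not get the full set: give back what we took and retry.
        for lk in reversed(held):
            lk.release()
        time.sleep(backoff)

def unlock_all(locks):
    for lk in reversed(locks):
        lk.release()

def transfer_demo(rounds=100):
    a, b = threading.Lock(), threading.Lock()
    log = []

    def worker(name, order):
        for _ in range(rounds):
            lock_all(order)
            log.append(name)      # both locks are held here
            unlock_all(order)

    # Opposite acquisition orders would deadlock with plain blocking acquires.
    t1 = threading.Thread(target=worker, args=("t1", [a, b]))
    t2 = threading.Thread(target=worker, args=("t2", [b, a]))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return len(log)
```

Because no thread ever sleeps while holding a partial set, the two
workers cannot end up waiting on each other.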

Calling C<lock> on a disjunction of references to data items will block
until a lock is available on any of the items.

Calling C<lock> on a subroutine will do what you think, lock-wise, but
will not prevent other (unlocked) calls. To ensure that only one thread
calls a sub at any time, use the C<is locked> compile-time trait.

Setting the C<but shared> property on an object will cause all method
calls to lock the object prior to execution.

Setting C<is shared> on a variable will cause the thread to lock all
accesses to the variable.

=head1 STANDARD STUFF

=head2 Queues

Arrays can be used as queues, since the update operations are
guaranteed atomic.
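
For comparison, the same producer/consumer pattern in Python uses the
standard C<queue.Queue>, whose C<put> and C<get> are thread-safe. The
sketch assumes a sentinel object to signal the end of the stream;
C<run_pipeline> is a hypothetical name.

```python
import queue
import threading

def run_pipeline(items):
    q = queue.Queue()
    done = object()          # sentinel marking the end of the stream
    consumed = []

    def producer():
        for item in items:
            q.put(item)      # atomic with respect to other threads
        q.put(done)

    def consumer():
        while True:
            item = q.get()   # blocks until something is available
            if item is done:
                break
            consumed.append(item)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed
```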

=head2 Mutexes

The C<lock> function provides mutex behavior.

=head2 Semaphores

Semaphore behavior is available using C<wait>. Threads automatically
synchronize assignment for basic expressions, so locking is not
required.

  my $sem = 0;
  my $thr1 = thread { wait $sem; print "done\n"; };
  my $thr2 = thread { print "Doink!\n"; ++$sem; };

Note that the behavior of wait depends on the type of the object being
waited on -- thus Scalars wait for a change, while arrays wait until a
member is added or removed. Implementing "conventional" semaphore
behavior is left as an exercise for the interested reader.
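
For readers who want a reference point, the same two-thread handshake
with a conventional counting semaphore looks like this in Python. It is
an analogy using C<threading.Semaphore>, not an implementation of the
C<wait>-based scheme above; C<doink> is a hypothetical name and output
is collected in a list instead of printed.

```python
import threading

def doink():
    sem = threading.Semaphore(0)   # starts at zero, so acquire() blocks
    out = []

    def thr1():
        sem.acquire()              # waits until thr2 releases
        out.append("done")

    def thr2():
        out.append("Doink!")
        sem.release()              # lets thr1 proceed

    t1 = threading.Thread(target=thr1)
    t2 = threading.Thread(target=thr2)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return out
```

The ordering is fixed regardless of which thread starts first, because
C<thr1> cannot pass C<acquire> until C<thr2> has recorded its message
and released.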

=head1 OBJECT INTERFACE

Should fall out from the stuff above.



