perl.perl6.internals | Postings from July 2001
Opcode Dispatch
From: Bryan C. Warnock
Date: July 29, 2001 21:54
Subject: Opcode Dispatch
Message ID: 01073000545800.04399@wakko.idiocity.nut
Not that this is the most scientific testing in the world, but I did write a
couple variations of an opcode dispatch loop just to see how they compare.
Of course, I violated rule number one of writing fast code - I didn't run it
on a slow machine. Shame on me.
The setup: an 80K opcode stream consisting of 25 opcodes run through 4,000
times, with a terminating 26th opcode, and with 26 individual opcode dispatch
functions. Too small a dataset, but I wanted to get a feel for it.
Each of the functions took the opcode pointer and a data pointer. One
function handled either the buffer reset, or the program exit. The
remainder did some simple arithmetic on the addresses of both local and
argument variables, and added (or subtracted) that to some global variable,
which was used at exit. (In an effort to keep the busy work from being
optimized out.)
The two major variations were the Big Switch, and a function pointer table
lookup.
    int main () {
        char *buf = "....";
        char *opcode = buf;
        char *(*fn[])(char *, int *) = {
            opcodefA, ... , opcodefZ };    /* See why we love Perl? */
        int x;
        do {

Here's the switch, basically.

            switch (*opcode) {
            case 'A': opcode = opcodefA(opcode, &x); break;
            ...
            case 'Z': opcode = opcodefZ(opcode, &x); break;
            default: break;
            }

and here's the lookup:

            opcode = fn[(int)(*opcode - 'A')](opcode, &x);
I then ran each as compiled with -g, and then with -O2. I ran each program
six times, throwing away the first results, and averaging the remaining
five. For both cases, the optimized version ran about twice as fast as the
debuggable version. The lookup was slightly faster for unoptimized code,
while the switch was quicker after optimizations.
          -O2     -g
lookup:   8.9u    15.7u
switch:   7.65u   16.3u
I then chose an opcode to make a noop. For the switch, that meant changing
a case to be a simple opcode increment, but for the lookup, that involved
changing the opcode function to do nothing except return the incremented
pointer. So a noop in the lookup would still have the overhead of a
function call (unless the compiler was clever enough to optimize it away
through the function pointer). The lookup actually ran slightly slower when
optimized, while the switch sped up a little.
          -O2     -g
lookup:   9.1u    15.6u
switch:   7.25u   16.0u
I then added an array bounds check on the lookup (since I didn't want to be
executing arbitrary code on an invalid opcode, such as from a future
version of Perl. Which reminds me: we need a version opcode.) For the
switch, that was handled by the default case. For both, the result was a
noop. It had no discernible effect on either loop.
          -O2
lookup:   9.1u
switch:   7.25u
Some additional things to consider:
I wasn't checking the opcode pointer, assuming I always had a good one. (I
made sure I always had a good one by appending the termination opcode. I
don't know if we can get away with this in real life.) With real opcodes,
you also won't need to do arithmetic on the function table index. (However,
when I padded the function table and removed the subtraction, I didn't see
any noticeable change, so I left it alone.)
If we're going to be allowing replaceable opcodes, or allow pre- and post-
opcode functions, that may shift the focus a little. If you're replacing
the functions, having a lookup table allows you to bubble the new function
right to the top, whereas the switch requires the default opcode handler to
call the function for you. Furthermore, even with the switch you're going
to handle pre- and post- functions with a lookup anyway; but then so does
the lookup, so I guess they cancel out. The
advantage the switch has, on the other hand, is that not all opcodes should
be pluggable. (Since we're designing this as a software CPU, process
control opcodes, such as jumps, program termination, etc. shouldn't be
allowed to be overridden.) Those, if they are simple enough, can then be
inlined into the switch, much like the noop, and save some overhead on
function calls.
I haven't gotten events plugged in yet, but the event loop, if we're still
planning for a two-loop architecture, should go under similar design
considerations. However, events won't (we hope) occur as often as opcodes
do, and will probably be overridden more often than opcodes are. In that
particular case, we could very well have a mix-and-match situation, where
switched opcode dispatch is faster in the normal case, but a lookup dispatch
table is faster for the events. Dunno. Also, to bury a question
way down here where it won't be seen, are we having a fixed number of event
priorities, or is it unbounded?
Yes, I'm well aware of what Knuth says about premature optimizations.
Luckily, it's fairly trivial to switch (no pun intended) between the two,
provided they are both set up for that. (I'm not sure I could switch Perl 5
from
one to the other, for instance.) And I'm sure there are some things that
I'm not considering. Nick and Dan may have some additional insight.
--
Bryan C. Warnock
bwarnock@capita.com