The
ability to provide uniform shared-memory access to a significant
number of processors in a single SMP node brings us much
closer to the ideal PRAM parallel computer. In this paper,
we develop new techniques for designing a uniform shared-memory
algorithm from a PRAM algorithm and present the results
of an extensive experimental study demonstrating that the
resulting programs scale nearly linearly across a significant
range of processors and across the entire range of instance
sizes tested. This linear speedup with the number of processors
is one of the first ever attained in practice for intricate
combinatorial problems. The example we present in detail
here is for evaluating arithmetic expression trees using
the algorithmic techniques of list ranking and tree contraction;
this problem is not only of interest in its own right,
but is representative of a large class of irregular combinatorial
problems that have simple and efficient sequential implementations
and fast PRAM algorithms, but have no known efficient parallel
implementations. Our results thus offer promise for bridging
the gap between the theory and practice of shared-memory
parallel algorithms.