However, if array dimensions that are accessed indirectly are mapped to the processors within a node, the overheads of runtime parallelization can be avoided because the data is accessed through shared memory. Despite these differences, both models usually result in the same degree of parallelism. Thus, for many applications only minor performance differences can be observed. In particular, this is true for codes that are characterized by a high degree of locality and independent computations. In the following, we present an experimental evaluation of the new language extensions and the hybrid parallelization strategy as provided by VFC, using two benchmark codes on a Beowulf-type SMP PC cluster.

On the Origin2000, the SGI data placement directives [23] form a vendor-specific extension of OpenMP. For example, "affinity scheduling" of parallel loops is the counterpart to the ON clause of HPF. Compaq has also added a new set of directives to its Fortran for Tru64 UNIX that extend the OpenMP Fortran API to control the placement of data in memory and the placement of computations that operate on that data [7]. Chapman, Mehrotra, and Zima [10] have proposed a set of OpenMP extensions, similar to HPF mapping directives, for locality control.

Abstract. Merge sort is useful for progressively sorting large amounts of data, especially when the data can be partitioned and easily collected onto a few processors. Merge sort can be parallelized; however, conventional algorithms on distributed-memory computers perform poorly because the number of participating processors is halved at each successive merging stage, down to a single processor in the last stage. This paper presents a load-balanced parallel merge sort in which all processors take part in the merging throughout the computation.
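The halving of participating processors in the conventional scheme can be illustrated with a small sequential simulation. The sketch below is not the load-balanced algorithm of this paper; it only models the conventional tree-structured merge, under the assumption that each "processor" starts with one locally sorted chunk and that pairs of chunks are merged stage by stage. The function name `tree_merge_sort` and its parameters are hypothetical and chosen for illustration.

```python
import heapq
import random


def tree_merge_sort(data, num_procs):
    """Simulate a conventional tree-structured parallel merge sort.

    Hypothetical helper for illustration only. Returns the sorted list and
    the number of processors that perform a merge in each stage.
    """
    # Each simulated processor sorts its own nearly equal chunk locally.
    chunk = (len(data) + num_procs - 1) // num_procs
    runs = [sorted(data[i * chunk:(i + 1) * chunk]) for i in range(num_procs)]

    merges_per_stage = []
    while len(runs) > 1:
        merged = []
        merges = 0
        for i in range(0, len(runs), 2):
            if i + 1 < len(runs):
                # One processor of the pair merges both sorted runs.
                merged.append(list(heapq.merge(runs[i], runs[i + 1])))
                merges += 1
            else:
                # An unpaired run is carried to the next stage unchanged.
                merged.append(runs[i])
        runs = merged
        merges_per_stage.append(merges)
    return runs[0], merges_per_stage


values = [random.randrange(1000) for _ in range(64)]
result, active = tree_merge_sort(values, 8)
# With 8 simulated processors, active is [4, 2, 1]: half of the remaining
# processors drop out at every stage, and one processor does the final merge.
```

In this model the last stage merges the entire data set on a single processor while all others are idle, which is the imbalance the abstract refers to.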

