Section#17: Sorting on hypercubic networks
(CS838: Topics in parallel computing, CS1221, Tue, Mar 30, 1999, 8:00-9:15 a.m.)


The contents

  1. Even-Odd MergeSort
  2. Even-Even MergeSort
  3. Bitonic MergeSort
  4. MergeSort algorithms in hypercubic networks
    1. Implementation of Even-Odd MergeSort
    2. Implementation of Bitonic Sort
The most important sorting algorithms for hypercubic topologies are based on recursively structured sorting networks whose main components are networks for merging two sorted sequences. These algorithms are not asymptotically optimal, but they are sufficiently fast and very simple. Alternatively, they are called Batcher's algorithms.

Back to the beginning of the page Back to the CS838 class schedule

Even-Odd MergeSort

Even-Odd MergeSort (EOMS) sorts by recursively merging larger and larger monotonically sorted sequences. We will consider for simplicity only sorting into an ascending order. Consider an input unsorted sequence of 2N numbers. Assume for simplicity that N is a power of two. Clearly, it takes [ log N] parallel merge operations to sort 2N numbers and the length of merged sequences doubles each time. An EOMS network for sorting 2N numbers is basically a cascade of log N Even-Odd Merge networks. It has obviously recursive structure as implied by Figure , where EOMS(k) denotes the EOMS network for sorting input sequences of length k and EOM(k,k) denotes the network for Even-Odd Merge of two sequences of length k each.

CAPTION: The structure of Even-Odd MergeSort (a) and Merge (b) network.

Functional description of the EOMS algorithm implemented by the network on Figure (a) is then

{EOMergeSort}(a0,...,a2N-1)=
{EOMerge}({EOMergeSort}(a0,...,aN-1),{EOMergeSort}(aN,...,a2N-1)).
Clearly, the key trick of the EOMS algorithm is hidden in the structure of the merge network. It turns out that it can also be constructed recursively!!!

Consider two sorted sequences A=a0,a1,..,aN-1 and B=b0,b1,..,bN-1. The sorted sequence L={EOMerge}(A,B) of length 2N can be constructed as follows:

  1. Partition A and B into odd and even subsequences:
    even(A)=a0,a2,..,aN-2   odd(A)=a1,a3,..,aN-1
    even(B)=b0,b2,..,bN-2 odd(B)=b1,b3,..,bN-1
    Note that since A and B are sorted, so are their odd and even subsequences.
  2. Recursively,
  3. Interleave C and D to form a sequence L'=c0,d0,c1,d1,..,cN-1,dN-1 (see the last stage in Figure (b)).
  4. Apply compare-and-exchange to each pair ci,di.
The resulting Even-Odd Merge network is depicted in Figure (b). Its functional specification is as follows:
C={EOMerge}(even(A),odd(B))
D={EOMerge}(odd(A),even(B))
L'={Interleave}(C,D)
L={EOMerge}(A,B)={Pairwise_CE}(L')
The main trick of this merge algorithm was discovered first by Batcher: to merge A and B, we just need to interleave C and D and we are almost done. Formally, it can be proven as follows.

Lemma

Even-Odd Merge algorithm merges 2 sorted sequences of length N into a sorted sequence of length 2N in log N+1 parallel CE steps using N(log N+1) comparators.
Proof
(By 0-1 Sorting Lemma.) The lemma obviously holds for N=1. Let N=2k, k>= 1. Let us prove the complexity. Let dm(2N) be the depth of EOM(N,N). For N=1, we can do the merge using one comparator, hence dm(2)=1. Since EOM(N,N) is recursive and each level of recursion adds just one column of comparators, we have dm(2N)=dm(N)+1 which gives dm(2N)=log N+1=log(2N). The number of comparators of M(N,N) is cm(2N)=Ndm(2N)=Nlog(2N).

And the overall complexity of EOMS easily follows.

Lemma

EOMS sorts N numbers in O(log2N) parallel CE steps using O(Nlog2N) comparators.
Proof
Let ds(N) denote the depth of EOMS(N) ( Figure (a)). Then ds(2)=1 and ds(2N)=ds(N)+dm(2N). Since dm(N)=log N, it follows that ds(N)=log N(log N+1)/2. For the number of comparators cs(N), we have similarly cs(2)=1 and cs(2N)=2cs(N)+cm(2N) which gives cs(N)=Nds(N)/2=O(Nlog2N).

CAPTION: Unfolded Even-Odd Merge Sorting network. The numbers denote the depth of the corresponding Merge network. In all stages, the subsequences are sorted monotonically.

Back to the beginning of the page Back to the CS838 class schedule

Even-Even MergeSort

There exists another slightly modified merging scheme which is equivalent to EOMS in terms of the main idea and performance. It is called Even-Even MergeSort (EEMS). It differs from EOMS in just two details.
  1. EEMS merges recursively
  2. The final CE operation does not touch the first element c0 and the last element dN-1 (since c0 is always the least and dN-1 always the greatest number) and compares only odd-even pairs di,ci+1 for i=0,...,N-2.
Figure illustrates the structure and the way it differs from the Even-Odd MergeSort network. The functional specification of the Even-Even Merge network is therefore.
C={EEMerge}(even(A),even(B))
D={EEMerge}(odd(A),odd(B))
{EEMerge}(A,B)={Pairwise_CE}({Even_Odd_Interleave}(C,D))
{EEMergeSort}(a0,...,a2N-1)={EEMerge}({EEMergeSort}(a0,...,aN-1),{EEMergeSort}(aN,...,a2N-1))

CAPTION: The structure of Even-Even MergeSort (a) and Merge (b) network.

Back to the beginning of the page Back to the CS838 class schedule

Bitonic MergeSort

Bitonic Sort (BS) is yet another variant of MergeSort which has easier implementation on hypercubic networks. The parallel time and work complexity is the same as in EOMS. Instead of monotonically sorted sequences, BS works with bitonic sequences. A bitonic sequence has a valley and a peak.

Definition

A sequence of numbers A=a0,a1,...,aN-1 is bitonic if
  1. either there is an index i, 0<= i<= N-1, such that a0,...,ai is ascending and ai+1,...,aN-1 is descending,
  2. or there is a rotation of A such that the previous condition holds.
Instead of merging two ascending sequences of length N to produce an ascending sequence of length 2N, the basic operation used in Bitonic Sort is a merge of one ascending and one descending sequence, both of length N, into one ascending sequence of length 2N. It can be viewed as a transformation of a bitonic sequence into a monotonic one of the same length. This is called Bitonic Merge and its basic operation is called Bitonic Split.

Definition

If A=a0,a1,...,a2N-1 is bitonic, define
AL=min(a0,aN),min(a1,aN+1),...,min(aN-1,a2N-1)
and
AH=max(a0,aN),max(a1,aN+1),...,max(aN-1,a2N-1).
Then the concatenation ALAH is called the bitonic split of A.
The following observation is of key importance.

Lemma

If A=a0,a1,...,a2N-1 is bitonic, then
  1. AL and AH are again bitonic
  2. every number in AL is smaller than every number in AH.
Proof
Figure shows the Bitonic Split of A and the statement of the lemma easily follows from the figure.

CAPTION: The Bitonic Split transforms one bitonic sequence A into two bitonic sequences. (a) Cutting A into left half A1 and right half A2. (b) Overlapping A1 with A2. (c) The Bitonic Split.

We can easily observe that if Bitonic Split is applied recursively to an input bitonic sequence A of length N, the sequence will become sorted into ascending order after log N steps. This is exactly what the Bitonic Merge, denoted by BM and depicted on Figure (b), does. The Bitonic Sort algorithm works similarly as EOMS, but it uses Bitonic Merge. Assume again that N is a power of two.

  1. The input unsorted sequence of 2N numbers is first split into N pairs.
  2. By passing through a comparator, each pair becomes a sorted sequence of length 2. However, the comparators in the first column are alternatively of ascending and descending type. Hence after the first step, every quadruple consists of an ascending and descending half, i.e., it forms a bitonic sequence of length 4.
  3. In the next step, Bitonic Merge networks of depth 2 transform bitonic sequences of length 4 into monotonic sequences so that again adjacent sequences are alternatively ascending and descending. This step produces bitonic sequences of length 8, and so on.
  4. Finally, the last Bitonic Merge network uses bitonic split to produce a monotonic sequence of length 2N form a bitonic one.
Figure shows again the recursive structure of the Bitonic MergeSort network and the Bitonic Merge network. BS{}+(N) denotes the Bitonic MergeSort network sorting sequences of length N in ascending order (BS{}-(N) sorts in descending order) and BM(N) denotes the Bitonic Merge network transforming bitonic sequences of length N into monotonic ones.

CAPTION: Recursive structure of Bitonic MergeSort (a) and the Bitonic Merge (b) networks.

The Bitonic Merge network on Figure (b) produces ascending sequences, but if we reverse the sense of ordering of all comparators, we get the descending case. Functionally, the networks can be specified as follows.

(ALAH)={Bitonic_Split}(A)
{BMerge}(A)={BMerge}(AL){BMerge}(AH)
{BSort+}(a0,...,a2N-1)={BMerge}({BSort+}(a0,...,aN-1){BSort-}(aN,...,a2N-1))
If we unfold the bitonic-merge stages, we again get a cascade of comparator stages of growing depth, so that the parallel time and cost is the same as in EOMS.

CAPTION: Unfolded Bitonic Sorting network.

Back to the beginning of the page Back to the CS838 class schedule

MergeSort algorithms in hypercubic networks

Implementation of Even-Odd MergeSort

EOMS of N=2n numbers can be nicely implemented on ordinary butterfly oBFn in O(n2) steps. The implementation exploits the recursive structure of the butterfly, precisely matching the requirements of the algorithm.

CAPTION: Even-odd Merge operation of two sorted sequences A and B of length 4 implemented on oBF3.

Let us describe the butterfly implementation of the Even-Odd Merge of two sorted sequences.

CAPTION: Even-Odd MergeSort of 16 numbers on Q4.

If we unfold the recursion, we find out that data move in n steps forward and sequence B is reversed when it reaches the rightmost column, whereas sequence A gets there unchanged. During the backward n-step phase, the data use the cross edges whenever adjacent pairs are out of order. Analogously, Even-Odd Merge of sorted sequences of length 2k is implemented as exactly the same forth and back sweep of data through oBFn, but only using its first k stages. The implementation on an N-node hypercube is straightforward by contracting the rows of the ordinary butterfly into hypercube nodes, as shown on Figure . It is obvious that the hypercube implementation requires some extra steps after each merge operation to bring the corresponding even-odd subsequences into correct positions. A more detailed investigation reveals that basically

every other ascending subsequence is reversed into a descending one.
But this is exactly what the Bitonic Sort does for free!!!!

Implementation of Bitonic Sort

Bitonic Sort has almost the same implementation on the butterfly, adjacent butterflies just sort the corresponding subsequences alternatively as ascending and descending. But this means that in the hypercube, no extra permutation steps are needed and the number of parallel CE steps equals exactly the depth of the Bitonic Sort network (see Figure ). Note that the hypercube dimensions used by Bitonic Sort are used in the following order: 1, 2, 1, 3, 2, 1, 4, 3, 2, 1, ...

CAPTION: Bitonic MergeSort of 16 numbers on Q4.

Bitonic MergeSort has a simple implementation on shuffle-based networks. This follows from that fact that it always compares numbers with indexes that differ by exactly one bit in their representation and corresponding pairs of numbers can be brought together by shuffle operation on the SE graph.

Back to the beginning of the page Back to the CS838 class schedule