Section#17: Sorting on hypercubic networks
(CS838: Topics in parallel computing, CS1221, Tue, Mar 30, 1999, 8:009:15 a.m.)
The contents
 EvenOdd MergeSort
 EvenEven MergeSort
 Bitonic MergeSort
 MergeSort algorithms in hypercubic networks
 Implementation of EvenOdd MergeSort
 Implementation of Bitonic Sort
The most important sorting algorithms for hypercubic topologies are based on recursively structured sorting networks whose main components are networks for merging two sorted sequences. These algorithms are not asymptotically optimal, but they are sufficiently fast and very simple. Alternatively, they are called Batcher's algorithms.
EvenOdd MergeSort (EOMS) sorts by recursively merging larger and larger monotonically sorted sequences. We will consider for simplicity only sorting into an ascending order.
Consider an input unsorted sequence of 2N numbers. Assume for simplicity that N is a power of two.
 EOMS treats the input sequence as a sequence of N pairs.
 By passing through a comparator, each pair becomes a sorted sequence of length 2.
 These subsequences are then merged into N/2 sorted subsequences of length 4, and so on.
 In the last merge step, two ascending sequences of length N are merged into the final ascending sequence of length 2N.
Clearly, it takes [ log N] parallel merge operations to sort 2N numbers and the length of merged sequences doubles each time. An EOMS network for sorting 2N numbers is basically a cascade of log N EvenOdd Merge networks. It has obviously recursive structure as implied by Figure , where EOMS(k) denotes the EOMS network for sorting input sequences of length k and EOM(k,k) denotes the network for EvenOdd Merge of two sequences of length k each.
CAPTION: The structure of EvenOdd MergeSort (a) and Merge (b) network.
Functional description of the EOMS algorithm implemented by the network on Figure (a) is then
{EOMergeSort}(a_{0},...,a_{2N1})=
{EOMerge}({EOMergeSort}(a_{0},...,a_{N1}),{EOMergeSort}(a_{N},...,a_{2N1})).
Clearly, the key trick of the EOMS algorithm is hidden in the structure of the merge network. It turns out that it can also be constructed recursively!!!
Consider two sorted sequences A=a_{0},a_{1},..,a_{N1} and B=b_{0},b_{1},..,b_{N1}. The sorted sequence L={EOMerge}(A,B) of length 2N can be constructed as follows:
 Partition A and B into odd and even subsequences:
even(A)=a_{0},a_{2},..,a_{N2} odd(A)=a_{1},a_{3},..,a_{N1}
even(B)=b_{0},b_{2},..,b_{N2} odd(B)=b_{1},b_{3},..,b_{N1}
Note that since A and B are sorted, so are their odd and even subsequences.
 Recursively,
 merge even(A) with odd(B) into a sorted sequence C=c_{0},c_{1},..,c_{N1},
 merge odd(A) with even(B) into a sorted sequence D=d_{0},d_{1},..,d_{N1}.
 Interleave C and D to form a sequence L'=c_{0},d_{0},c_{1},d_{1},..,c_{N1},d_{N1} (see the last stage in Figure (b)).
 Apply compareandexchange to each pair c_{i},d_{i}.
The resulting EvenOdd Merge network is depicted in Figure (b). Its functional specification is as follows:
C={EOMerge}(even(A),odd(B))

D={EOMerge}(odd(A),even(B))

L'={Interleave}(C,D)

L={EOMerge}(A,B)={Pairwise_CE}(L')

The main trick of this merge algorithm was discovered first by Batcher: to merge A and B, we just need to interleave C and D and we are almost done. Formally, it can be proven as follows.
Lemma
EvenOdd Merge algorithm merges 2 sorted sequences of length N into a sorted sequence of length 2N in log N+1 parallel CE steps using N(log N+1) comparators.
Proof
(By 01 Sorting Lemma.) The lemma obviously holds for N=1. Let N=2^{k}, k>= 1.
Let us prove the complexity. Let d_{m}(2N) be the depth of EOM(N,N). For N=1, we can do the merge using one comparator, hence d_{m}(2)=1. Since EOM(N,N) is recursive and each level of recursion adds just one column of comparators, we have d_{m}(2N)=d_{m}(N)+1 which gives d_{m}(2N)=log N+1=log(2N). The number of comparators of M(N,N) is c_{m}(2N)=Nd_{m}(2N)=Nlog(2N).
And the overall complexity of EOMS easily follows.
Lemma
EOMS sorts N numbers in O(log^{2}N) parallel CE steps using O(Nlog^{2}N) comparators.
Proof
Let d_{s}(N) denote the depth of EOMS(N) ( Figure (a)). Then d_{s}(2)=1 and d_{s}(2N)=d_{s}(N)+d_{m}(2N). Since d_{m}(N)=log N, it follows that d_{s}(N)=log N(log N+1)/2. For the number of comparators c_{s}(N), we have similarly c_{s}(2)=1 and c_{s}(2N)=2c_{s}(N)+c_{m}(2N) which gives c_{s}(N)=Nd_{s}(N)/2=O(Nlog^{2}N).
CAPTION: Unfolded EvenOdd Merge Sorting network. The numbers denote the depth of the corresponding Merge network. In all stages, the subsequences are sorted monotonically.
There exists another slightly modified merging scheme which is equivalent to EOMS in terms of the main idea and performance. It is called EvenEven MergeSort (EEMS). It differs from EOMS in just two details.
 EEMS merges recursively
 even(A) with even(B) to produce C
 odd(A) with odd(B) to produce D
 The final CE operation does not touch the first element c_{0} and the last element d_{N1} (since c_{0} is always the least and d_{N1} always the greatest number) and compares only oddeven pairs d_{i},c_{i+1} for i=0,...,N2.
Figure illustrates the structure and the way it differs from the EvenOdd MergeSort network.
The functional specification of the EvenEven Merge network is therefore.
C={EEMerge}(even(A),even(B))

D={EEMerge}(odd(A),odd(B))

{EEMerge}(A,B)={Pairwise_CE}({Even_Odd_Interleave}(C,D))

{EEMergeSort}(a_{0},...,a_{2N1})={EEMerge}({EEMergeSort}(a_{0},...,a_{N1}),{EEMergeSort}(a_{N},...,a_{2N1}))

CAPTION: The structure of EvenEven MergeSort (a) and Merge (b) network.
Bitonic Sort (BS) is yet another variant of MergeSort which has easier implementation on hypercubic networks. The parallel time and work complexity is the same as in EOMS. Instead of monotonically sorted sequences, BS works with bitonic sequences. A bitonic sequence has a valley and a peak.
Definition
A sequence of numbers A=a_{0},a_{1},...,a_{N1} is bitonic if
 either there is an index i, 0<= i<= N1, such that a_{0},...,a_{i} is ascending and a_{i+1},...,a_{N1} is descending,
 or there is a rotation of A such that the previous condition holds.
Instead of merging two ascending sequences of length N to produce an ascending sequence of length 2N, the basic operation used in Bitonic Sort is a merge of one ascending and one descending sequence, both of length N, into one ascending sequence of length 2N. It can be viewed as a transformation of a bitonic sequence into a monotonic one of the same length. This is called Bitonic Merge and its basic operation is called Bitonic Split.
Definition
If A=a_{0},a_{1},...,a_{2N1} is bitonic, define
A_{L}=min(a_{0},a_{N}),min(a_{1},a_{N+1}),...,min(a_{N1},a_{2N1})
and
A_{H}=max(a_{0},a_{N}),max(a_{1},a_{N+1}),...,max(a_{N1},a_{2N1}).
Then the concatenation A_{L}A_{H} is called the bitonic split of A.
The following observation is of key importance.
Lemma
If A=a_{0},a_{1},...,a_{2N1} is bitonic, then
 A_{L} and A_{H} are again bitonic
 every number in A_{L} is smaller than every number in A_{H}.
Proof
Figure shows the Bitonic Split of A and the statement of the lemma easily follows from the figure.
CAPTION: The Bitonic Split transforms one bitonic sequence A into two bitonic sequences. (a) Cutting A into left half A_{1} and right half A_{2}. (b) Overlapping A_{1} with A_{2}. (c) The Bitonic Split.
We can easily observe that if Bitonic Split is applied recursively to an input bitonic sequence A of length N, the sequence will become sorted into ascending order after log N steps. This is exactly what the Bitonic Merge, denoted by BM and depicted on Figure (b), does. The Bitonic Sort algorithm works similarly as EOMS, but it uses Bitonic Merge. Assume again that N is a power of two.
 The input unsorted sequence of 2N numbers is first split into N pairs.
 By passing through a comparator, each pair becomes a sorted sequence of length 2. However, the comparators in the first column are alternatively of ascending and descending type. Hence after the first step, every quadruple consists of an ascending and descending half, i.e., it forms a bitonic sequence of length 4.
 In the next step, Bitonic Merge networks of depth 2 transform bitonic sequences of length 4 into monotonic sequences so that again adjacent sequences are alternatively ascending and descending. This step produces bitonic sequences of length 8, and so on.
 Finally, the last Bitonic Merge network uses bitonic split to produce a monotonic sequence of length 2N form a bitonic one.
Figure shows again the recursive structure of the Bitonic MergeSort network and the Bitonic Merge network. BS{}^{+}(N) denotes the Bitonic MergeSort network sorting sequences of length N in ascending order (BS{}^{}(N) sorts in descending order) and BM(N) denotes the Bitonic Merge network transforming bitonic sequences of length N into monotonic ones.
CAPTION: Recursive structure of Bitonic MergeSort (a) and the Bitonic Merge (b) networks.
The Bitonic Merge network on Figure (b) produces ascending sequences, but if we reverse the sense of ordering of all comparators, we get the descending case. Functionally, the networks can be specified as follows.
(A_{L}A_{H})={Bitonic_Split}(A)

{BMerge}(A)={BMerge}(A_{L}){BMerge}(A_{H})

{BSort^{+}}(a_{0},...,a_{2N1})={BMerge}({BSort^{+}}(a_{0},...,a_{N1}){BSort^{}}(a_{N},...,a_{2N1}))

If we unfold the bitonicmerge stages, we again get a cascade of comparator stages of growing depth, so that the parallel time and cost is the same as in EOMS.
CAPTION: Unfolded Bitonic Sorting network.
Implementation of EvenOdd MergeSort
EOMS of N=2^{n} numbers can be nicely implemented on ordinary butterfly oBF_{n} in O(n^{2}) steps. The implementation exploits the recursive structure of the butterfly, precisely matching the requirements of the algorithm.
CAPTION: Evenodd Merge operation of two sorted sequences A and B of length 4 implemented on oBF_{3}.
Let us describe the butterfly implementation of the EvenOdd Merge of two sorted sequences.
 Assume an oBF_{n} and two sorted sequences A and B of 2^{n1} numbers.
 Initially, A is input into the upper half of column 0 of oBF_{n} and B into the lower half.
 During the first step, the numbers are moved to column 1 using straight edges for A and cross edges for B.
 After the first step, we use recursion to form
 C by merging even(A) with odd(B) in the blue subbutterfly oBF_{n1} induced by the even rows (the numbering of rows starts from zero from the top)
 D by merging odd(A) with even(B) in the red subbutterfly oBF_{n1} induced by the odd rows.
 After the recursion has finished, the elements of C and D are located in nodes of column 1 of oBF_{n} in the order c_{0},d_{0},c_{1},d_{1},.... oBF_{n} arranges the elements of C and D so that they are interleaved and directly form the sequence L'!!!
 The final sequence L={EOMerge}(A,B) is produced by moving the data to nodes in column 0 of oBF_{n} using straight or cross edges depending on whether the numbers c_{i} and d_{i} are outoforder or not.
CAPTION: EvenOdd MergeSort of 16 numbers on Q_{4}.
If we unfold the recursion, we find out that data move in n steps forward and sequence B is reversed when it reaches the rightmost column, whereas sequence A gets there unchanged. During the backward nstep phase, the data use the cross edges whenever adjacent pairs are out of order. Analogously, EvenOdd Merge of sorted sequences of length 2^{k} is implemented as exactly the same forth and back sweep of data through oBF_{n}, but only using its first k stages.
The implementation on an Nnode hypercube is straightforward by contracting the rows of the ordinary butterfly into hypercube nodes, as shown on Figure . It is obvious that the hypercube implementation requires some extra steps after each merge operation to bring the corresponding evenodd subsequences into correct positions. A more detailed investigation reveals that basically
every other ascending subsequence is reversed into a descending one.
But this is exactly what the Bitonic Sort does for free!!!!
Implementation of Bitonic Sort
Bitonic Sort has almost the same implementation on the butterfly, adjacent butterflies just sort the corresponding subsequences alternatively as ascending and descending. But this means that in the hypercube, no extra permutation steps are needed and the number of parallel CE steps equals exactly the depth of the Bitonic Sort network (see Figure ). Note that the hypercube dimensions used by Bitonic Sort are used in the following order: 1, 2, 1, 3, 2, 1, 4, 3, 2, 1, ...
CAPTION: Bitonic MergeSort of 16 numbers on Q_{4}.
Bitonic MergeSort has a simple implementation on shufflebased networks. This follows from that fact that it always compares numbers with indexes that differ by exactly one bit in their representation and corresponding pairs of numbers can be brought together by shuffle operation on the SE graph.