Sorting on hypercubic networks

Section#17: Sorting on hypercubic networks
(CS838: Topics in parallel computing, CS1221, Tue, Mar 30, 1999, 8:00-9:15 a.m.)

The contents

Even-Odd MergeSort
Even-Even MergeSort
Bitonic MergeSort
MergeSort algorithms in hypercubic networks
1. Implementation of Even-Odd MergeSort
2. Implementation of Bitonic Sort

The most important sorting algorithms for hypercubic topologies are based on recursively structured sorting networks whose main components are networks for merging two sorted sequences. These algorithms are not asymptotically optimal, but they are sufficiently fast and very simple. Alternatively, they are called Batcher's algorithms.

Back to the beginning of the page Back to the CS838 class schedule

Even-Odd MergeSort

Even-Odd MergeSort (EOMS) sorts by recursively merging larger and larger monotonically sorted sequences. We will consider for simplicity only sorting into an ascending order. Consider an input unsorted sequence of 2N numbers. Assume for simplicity that N is a power of two.

EOMS treats the input sequence as a sequence of N pairs.
By passing through a comparator, each pair becomes a sorted sequence of length 2.
These subsequences are then merged into N/2 sorted subsequences of length 4, and so on.
In the last merge step, two ascending sequences of length N are merged into the final ascending sequence of length 2N.

Clearly, it takes [ log N] parallel merge operations to sort 2N numbers and the length of merged sequences doubles each time. An EOMS network for sorting 2N numbers is basically a cascade of log N Even-Odd Merge networks. It has obviously recursive structure as implied by Figure , where EOMS(k) denotes the EOMS network for sorting input sequences of length k and EOM(k,k) denotes the network for Even-Odd Merge of two sequences of length k each.

CAPTION: The structure of Even-Odd MergeSort (a) and Merge (b) network.

Functional description of the EOMS algorithm implemented by the network on Figure (a) is then

{EOMergeSort}(a₀,...,a_2N-1)=
{EOMerge}({EOMergeSort}(a₀,...,a_N-1),{EOMergeSort}(a_N,...,a_2N-1)).

Clearly, the key trick of the EOMS algorithm is hidden in the structure of the merge network. It turns out that it can also be constructed recursively!!!

Consider two sorted sequences A=a₀,a₁,..,a_N-1 and B=b₀,b₁,..,b_N-1. The sorted sequence L={EOMerge}(A,B) of length 2N can be constructed as follows:

Partition A and B into odd and even subsequences:
```
even(A)=a₀,a₂,..,a_N-2   odd(A)=a₁,a₃,..,a_N-1

even(B)=b₀,b₂,..,b_N-2   odd(B)=b₁,b₃,..,b_N-1
```
Note that since A and B are sorted, so are their odd and even subsequences.
Recursively,
- merge even(A) with odd(B) into a sorted sequence C=c₀,c₁,..,c_N-1,
- merge odd(A) with even(B) into a sorted sequence D=d₀,d₁,..,d_N-1.
Interleave C and D to form a sequence L'=c₀,d₀,c₁,d₁,..,c_N-1,d_N-1 (see the last stage in Figure (b)).
Apply compare-and-exchange to each pair c_i,d_i.

The resulting Even-Odd Merge network is depicted in Figure (b). Its functional specification is as follows:

C={EOMerge}(even(A),odd(B))

D={EOMerge}(odd(A),even(B))

L'={Interleave}(C,D)

L={EOMerge}(A,B)={Pairwise_CE}(L')

The main trick of this merge algorithm was discovered first by Batcher: to merge A and B, we just need to interleave C and D and we are almost done. Formally, it can be proven as follows.

Lemma

Even-Odd Merge algorithm merges 2 sorted sequences of length N into a sorted sequence of length 2N in log N+1 parallel CE steps using N(log N+1) comparators.

Proof
(By 0-1 Sorting Lemma.) The lemma obviously holds for N=1. Let N=2^k, k>= 1.

Consider two sorted binary sequences A and B of equal length N.
Assume that A consists of \alpha 0's followed by N-\alpha 1's and B consists of \beta 0's followed by N-\beta 1's.
Then the sequence C={EOMerge}(even(A),odd(B)) has \gamma=[\alpha/2]+[\beta/2] 0's, whereas the sequence D={EOMerge}(odd(A),even(B)) has \delta=[\alpha/2]+[\beta/2] 0's (see Figure ).
Since |\gamma-\delta|<= 1, it follows that the number of 0's in C and D can differ by at most one.

CAPTION: The proof of correctness of EOMS using the 0-1 Sorting Lemma.
So, if we form L'={Interleave}(C,D)=c₀,d₀,c₁,d₁,..,c_N-1,d_N-1, only three cases can occur.

Let us prove the complexity. Let d_m(2N) be the depth of EOM(N,N). For N=1, we can do the merge using one comparator, hence d_m(2)=1. Since EOM(N,N) is recursive and each level of recursion adds just one column of comparators, we have d_m(2N)=d_m(N)+1 which gives d_m(2N)=log N+1=log(2N). The number of comparators of M(N,N) is c_m(2N)=Nd_m(2N)=Nlog(2N).

And the overall complexity of EOMS easily follows.

Lemma

EOMS sorts N numbers in O(log²N) parallel CE steps using O(Nlog²N) comparators.

Proof
Let d_s(N) denote the depth of EOMS(N) ( Figure (a)). Then d_s(2)=1 and d_s(2N)=d_s(N)+d_m(2N). Since d_m(N)=log N, it follows that d_s(N)=log N(log N+1)/2. For the number of comparators c_s(N), we have similarly c_s(2)=1 and c_s(2N)=2c_s(N)+c_m(2N) which gives c_s(N)=Nd_s(N)/2=O(Nlog²N).

CAPTION: Unfolded Even-Odd Merge Sorting network. The numbers denote the depth of the corresponding Merge network. In all stages, the subsequences are sorted monotonically.

Back to the beginning of the page Back to the CS838 class schedule

Even-Even MergeSort

There exists another slightly modified merging scheme which is equivalent to EOMS in terms of the main idea and performance. It is called Even-Even MergeSort (EEMS). It differs from EOMS in just two details.

EEMS merges recursively
- even(A) with even(B) to produce C
- odd(A) with odd(B) to produce D
The final CE operation does not touch the first element c₀ and the last element d_N-1 (since c₀ is always the least and d_N-1 always the greatest number) and compares only odd-even pairs d_i,c_i+1 for i=0,...,N-2.

Figure illustrates the structure and the way it differs from the Even-Odd MergeSort network. The functional specification of the Even-Even Merge network is therefore.

C={EEMerge}(even(A),even(B))

D={EEMerge}(odd(A),odd(B))

{EEMerge}(A,B)={Pairwise_CE}({Even_Odd_Interleave}(C,D))

{EEMergeSort}(a₀,...,a_2N-1)={EEMerge}({EEMergeSort}(a₀,...,a_N-1),{EEMergeSort}(a_N,...,a_2N-1))

CAPTION: The structure of Even-Even MergeSort (a) and Merge (b) network.

Back to the beginning of the page Back to the CS838 class schedule

Bitonic MergeSort

Bitonic Sort (BS) is yet another variant of MergeSort which has easier implementation on hypercubic networks. The parallel time and work complexity is the same as in EOMS. Instead of monotonically sorted sequences, BS works with bitonic sequences. A bitonic sequence has a valley and a peak.

Definition

A sequence of numbers A=a₀,a₁,...,a_N-1 is bitonic if

either there is an index i, 0<= i<= N-1, such that a₀,...,a_i is ascending and a_i+1,...,a_N-1 is descending,
or there is a rotation of A such that the previous condition holds.

Instead of merging two ascending sequences of length N to produce an ascending sequence of length 2N, the basic operation used in Bitonic Sort is a merge of one ascending and one descending sequence, both of length N, into one ascending sequence of length 2N. It can be viewed as a transformation of a bitonic sequence into a monotonic one of the same length. This is called Bitonic Merge and its basic operation is called Bitonic Split.

Definition

If A=a₀,a₁,...,a_2N-1 is bitonic, define
A_L=min(a₀,a_N),min(a₁,a_N+1),...,min(a_N-1,a_2N-1)
and
A_H=max(a₀,a_N),max(a₁,a_N+1),...,max(a_N-1,a_2N-1).
Then the concatenation A_LA_H is called the bitonic split of A.

The following observation is of key importance.

Lemma

If A=a₀,a₁,...,a_2N-1 is bitonic, then

A_L and A_H are again bitonic
every number in A_L is smaller than every number in A_H.

Proof
Figure shows the Bitonic Split of A and the statement of the lemma easily follows from the figure.

CAPTION: The Bitonic Split transforms one bitonic sequence A into two bitonic sequences. (a) Cutting A into left half A₁ and right half A₂. (b) Overlapping A₁ with A₂. (c) The Bitonic Split.

We can easily observe that if Bitonic Split is applied recursively to an input bitonic sequence A of length N, the sequence will become sorted into ascending order after log N steps. This is exactly what the Bitonic Merge, denoted by BM and depicted on Figure (b), does. The Bitonic Sort algorithm works similarly as EOMS, but it uses Bitonic Merge. Assume again that N is a power of two.

The input unsorted sequence of 2N numbers is first split into N pairs.
By passing through a comparator, each pair becomes a sorted sequence of length 2. However, the comparators in the first column are alternatively of ascending and descending type. Hence after the first step, every quadruple consists of an ascending and descending half, i.e., it forms a bitonic sequence of length 4.
In the next step, Bitonic Merge networks of depth 2 transform bitonic sequences of length 4 into monotonic sequences so that again adjacent sequences are alternatively ascending and descending. This step produces bitonic sequences of length 8, and so on.
Finally, the last Bitonic Merge network uses bitonic split to produce a monotonic sequence of length 2N form a bitonic one.

Figure shows again the recursive structure of the Bitonic MergeSort network and the Bitonic Merge network. BS{}⁺(N) denotes the Bitonic MergeSort network sorting sequences of length N in ascending order (BS{}^-(N) sorts in descending order) and BM(N) denotes the Bitonic Merge network transforming bitonic sequences of length N into monotonic ones.

CAPTION: Recursive structure of Bitonic MergeSort (a) and the Bitonic Merge (b) networks.

The Bitonic Merge network on Figure (b) produces ascending sequences, but if we reverse the sense of ordering of all comparators, we get the descending case. Functionally, the networks can be specified as follows.

(A_LA_H)={Bitonic_Split}(A)

{BMerge}(A)={BMerge}(A_L){BMerge}(A_H)

{BSort⁺}(a₀,...,a_2N-1)={BMerge}({BSort⁺}(a₀,...,a_N-1){BSort^-}(a_N,...,a_2N-1))

If we unfold the bitonic-merge stages, we again get a cascade of comparator stages of growing depth, so that the parallel time and cost is the same as in EOMS.

CAPTION: Unfolded Bitonic Sorting network.

Back to the beginning of the page Back to the CS838 class schedule

MergeSort algorithms in hypercubic networks

Implementation of Even-Odd MergeSort

EOMS of N=2ⁿ numbers can be nicely implemented on ordinary butterfly oBF_n in O(n²) steps. The implementation exploits the recursive structure of the butterfly, precisely matching the requirements of the algorithm.

CAPTION: Even-odd Merge operation of two sorted sequences A and B of length 4 implemented on oBF₃.

Let us describe the butterfly implementation of the Even-Odd Merge of two sorted sequences.

Assume an oBF_n and two sorted sequences A and B of 2^n-1 numbers.
Initially, A is input into the upper half of column 0 of oBF_n and B into the lower half.
During the first step, the numbers are moved to column 1 using straight edges for A and cross edges for B.
After the first step, we use recursion to form
- C by merging even(A) with odd(B) in the blue subbutterfly oBF_n-1 induced by the even rows (the numbering of rows starts from zero from the top)
- D by merging odd(A) with even(B) in the red subbutterfly oBF_n-1 induced by the odd rows.
After the recursion has finished, the elements of C and D are located in nodes of column 1 of oBF_n in the order c₀,d₀,c₁,d₁,.... oBF_n arranges the elements of C and D so that they are interleaved and directly form the sequence L'!!!
The final sequence L={EOMerge}(A,B) is produced by moving the data to nodes in column 0 of oBF_n using straight or cross edges depending on whether the numbers c_i and d_i are out-of-order or not.

CAPTION: Even-Odd MergeSort of 16 numbers on Q₄.

If we unfold the recursion, we find out that data move in n steps forward and sequence B is reversed when it reaches the rightmost column, whereas sequence A gets there unchanged. During the backward n-step phase, the data use the cross edges whenever adjacent pairs are out of order. Analogously, Even-Odd Merge of sorted sequences of length 2^k is implemented as exactly the same forth and back sweep of data through oBF_n, but only using its first k stages. The implementation on an N-node hypercube is straightforward by contracting the rows of the ordinary butterfly into hypercube nodes, as shown on Figure . It is obvious that the hypercube implementation requires some extra steps after each merge operation to bring the corresponding even-odd subsequences into correct positions. A more detailed investigation reveals that basically

every other ascending subsequence is reversed into a descending one. But this is exactly what the Bitonic Sort does for free!!!!

Implementation of Bitonic Sort

Bitonic Sort has almost the same implementation on the butterfly, adjacent butterflies just sort the corresponding subsequences alternatively as ascending and descending. But this means that in the hypercube, no extra permutation steps are needed and the number of parallel CE steps equals exactly the depth of the Bitonic Sort network (see Figure ). Note that the hypercube dimensions used by Bitonic Sort are used in the following order: 1, 2, 1, 3, 2, 1, 4, 3, 2, 1, ...

CAPTION: Bitonic MergeSort of 16 numbers on Q₄.

Bitonic MergeSort has a simple implementation on shuffle-based networks. This follows from that fact that it always compares numbers with indexes that differ by exactly one bit in their representation and corresponding pairs of numbers can be brought together by shuffle operation on the SE graph.

Back to the beginning of the page Back to the CS838 class schedule

Section#17: Sorting on hypercubic networks(CS838: Topics in parallel computing, CS1221, Tue, Mar 30, 1999, 8:00-9:15 a.m.)

The contents

Even-Odd MergeSort

Even-Even MergeSort

Bitonic MergeSort

MergeSort algorithms in hypercubic networks

Implementation of Even-Odd MergeSort

Implementation of Bitonic Sort

Section#17: Sorting on hypercubic networks
(CS838: Topics in parallel computing, CS1221, Tue, Mar 30, 1999, 8:00-9:15 a.m.)