---------------------------------------------------------------------
CS 577 (Intro to Algorithms)
Lecture notes: Sorting lower bound
Shuchi Chawla
---------------------------------------------------------------------

(This document is best viewed with a fixed-width font.)

A lower bound on Comparison-based Sorting
=========================================

We saw in class that the mergesort algorithm sorts a list of size n in
time O(n log n). Several other sorting algorithms achieve the same time
bound, for example heapsort and (in expectation, with random pivots)
quicksort. Is it possible to do any better?

We will now see that any algorithm that is only allowed to compare
pairs of elements in the list must make Omega(n log n) comparisons, and
hence take Omega(n log n) time, to sort the list in the worst case.
Amazingly, this statement holds for *all* such algorithms: those that
we know, those that we don't know, and even those that have not yet
been invented.

Note that one can in fact construct faster algorithms by exploiting
special properties of the data, for example, when all of the elements
are "small" integers. (See homework 1.) Such algorithms are not purely
comparison-based, so the lower bound does not apply to them.

How can we go about proving this? We cannot assume anything about how
the algorithm works, for instance that it splits the input as mergesort
or quicksort does. Instead, we will think of the algorithm as playing a
game of 20 questions: each question is a comparison between two
elements, and the algorithm must determine the sorted order from the
answers alone.

Precisely, let's think of a comparison-based algorithm as a decision
tree. Fix n and suppose that the algorithm starts by comparing (say)
the first element in the list with the tenth element. The root of the
decision tree is labeled by this pair, 1 and 10. Assuming that all the
elements are distinct (this does not really affect the analysis), there
are just two possible outcomes of this comparison: element 1 is either
larger or smaller than element 10. The root therefore has two children,
each representing the future actions of the algorithm depending on the
outcome of the comparison.
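To make the decision-tree view concrete, here is a small Python sketch (an illustration, not part of the original notes) of a mergesort that may look at elements only through a comparison function. The helper names merge_sort, decision_path and worst_case_comparisons are hypothetical. We record only the yes/no answer to each comparison; since the algorithm is deterministic, that answer sequence is exactly the root-to-leaf path the input follows in the tree.

```python
from itertools import permutations


def merge_sort(items, less):
    """Mergesort that inspects elements only through less(x, y)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], less)
    right = merge_sort(items[mid:], less)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # Each call to less() is one internal node of the decision tree.
        if less(right[j], left[i]):
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def decision_path(items):
    """Sort items, recording only the answer to each comparison.

    Returns (path, sorted_list), where path is the tuple of True/False
    answers, i.e. the root-to-leaf path taken in the decision tree."""
    answers = []

    def less(x, y):
        answers.append(x < y)
        return answers[-1]

    result = merge_sort(items, less)
    return tuple(answers), result


def worst_case_comparisons(n):
    """Height of this mergesort's decision tree for n distinct elements:
    the longest path over all n! input orderings."""
    return max(len(decision_path(list(p))[0]) for p in permutations(range(n)))
```

For example, worst_case_comparisons(3) returns 3, and the 24 orderings of four elements trace 24 distinct paths. Different orderings must take different paths: if two of them produced the same answers, the algorithm would rearrange both in the same way, and at most one of them could come out sorted.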
Likewise, at the left child of the root, the algorithm compares some
pair of elements, and the left and right subtrees of this node represent
its future actions depending on the outcome of that comparison. The
leaves of the decision tree represent the output of the algorithm:
after some number of comparisons, the algorithm decides what the
correct ordering of the elements is.

For example, suppose that the list contains three elements A, B and C.
Then the following is a valid algorithm for sorting the list, displayed
in the form of a decision tree. Each leaf lists the elements from
largest to smallest.

                          A>B?
                   Yes /        \ No
                  A>C?            B>C?
             Yes /    \ No   Yes /    \ No
             B>C?      CAB   A>C?      CBA
        Yes /    \ No   Yes /    \ No
         ABC      ACB    BAC      BCA

This tree has 6 leaves, one for each possible ordering of the three
elements. In general, for an algorithm to be correct, its decision tree
must have at least n! leaves. Two different orderings of the input need
different rearrangements to become sorted, so they must end at
different leaves; since there are n! orderings, there are at least n!
leaves.

The worst-case number of comparisons made by the algorithm is the
largest number of comparisons it makes on any list of size n. In terms
of the decision tree, this is the length of the longest path from the
root to a leaf, that is, the height (depth) of the tree. Since every
comparison costs at least one step, this is also a lower bound on the
worst-case running time.

A binary tree of height h has at most 2^h leaves, so a binary tree with
n! leaves must have height at least log n! (all logarithms are base 2).
Therefore, any comparison-based sorting algorithm makes at least log n!
comparisons in the worst case.

To get a better handle on the expression log n!, note that n! <= n^n,
but also n! >= (n/2)^(n/2), because the n/2 largest factors of n! are
each at least n/2. So

    (n/2) log (n/2) <= log n! <= n log n,

and hence log n! = Omega(n log n) (in fact, Theta(n log n)).

----------------------------------------------------------------------
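The three-element decision tree can also be written directly as nested comparisons. The Python sketch below (function names sort3 and descending_order are hypothetical) returns, at each leaf, the names from largest to smallest, and runs the tree on all 6 orderings of three distinct values to confirm that every ordering is handled correctly and that each leaf is reached exactly once.

```python
from itertools import permutations


def sort3(a, b, c):
    """The three-element decision tree as nested if-statements.

    Each return statement is a leaf: the names A, B, C listed from
    largest to smallest. At most 3 comparisons are made."""
    if a > b:
        if a > c:
            return "ABC" if b > c else "ACB"
        return "CAB"
    if b > c:
        return "BAC" if a > c else "BCA"
    return "CBA"


def descending_order(a, b, c):
    """Reference answer computed by ordinary sorting, to check sort3."""
    values = {"A": a, "B": b, "C": c}
    return "".join(sorted(values, key=values.get, reverse=True))


# Run the tree on every ordering of three distinct values.
leaves = set()
for a, b, c in permutations((1, 2, 3)):
    assert sort3(a, b, c) == descending_order(a, b, c)
    leaves.add(sort3(a, b, c))
print(sorted(leaves))
```

The tree never asks more than 3 questions, which matches log 6 rounded up; on the CAB and CBA leaves it stops after only 2.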
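To see how tight the bounds on log n! are, the following Python sketch (helper names min_comparisons and log2_factorial are hypothetical) computes the exact worst-case bound ceil(log2 n!) with integer arithmetic, and compares log2 n! in floating point against (n/2) log2 (n/2) and n log2 n.

```python
import math


def min_comparisons(n):
    """ceil(log2(n!)), computed exactly with integers.

    For m >= 1, (m - 1).bit_length() is the smallest h with 2**h >= m,
    i.e. the smallest height of a binary tree with at least m leaves."""
    return (math.factorial(n) - 1).bit_length()


def log2_factorial(n):
    """log2(n!) in floating point, using lgamma(n + 1) = ln(n!)."""
    return math.lgamma(n + 1) / math.log(2)


for n in (4, 16, 256, 65536):
    lower = (n / 2) * math.log2(n / 2)  # from n! >= (n/2)^(n/2)
    exact = log2_factorial(n)
    upper = n * math.log2(n)            # from n! <= n^n
    print(f"n={n}: {lower:.1f} <= {exact:.1f} <= {upper:.1f}, "
          f"ratio {exact / upper:.3f}")
```

Here min_comparisons(3) is 3, matching the three-element tree, and min_comparisons(4) is 5. The printed ratio log2(n!)/(n log2 n) grows toward 1 as n increases, consistent with log n! = Theta(n log n). The integer version avoids rounding trouble when n! is a power of 2 (for example n = 2).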