Consider searching for a given value v in an array of size N.
There are 2 basic approaches: sequential search and
binary search.
Sequential search involves looking at each value in turn (i.e., start
with the value in array[0], then array[1], etc).
The algorithm quits and returns true if the current value
is v; it quits and returns false if it has looked at all of the values in
the array without finding v.
The code is given below, under Sequential Search.
If the values are in sorted order, then the algorithm can sometimes
quit and return false without having to look at all of the values in the array:
v is not in the array if the current value is greater than v.
The code for this version is also given below, under Sequential Search.
The worst-case time for a sequential search is always O(N).
When the values are in sorted order, a better approach than the
one given above is to use binary search.
The algorithm for binary search starts by looking at the middle item x.
If x is equal to v, it quits and returns true.
Otherwise, it uses the relative ordering of x and v to eliminate half
of the array (if v is less than x, then it can't be stored to the
right of x in the array;
similarly, if it is greater than x, it can't be stored to the left of x).
Once half of the array has been eliminated, the algorithm starts again
by looking at the middle item in the remaining half.
It quits when it finds v or when the entire array has been eliminated.
The code for binary search is given below, under Binary Search.
The worst-case time for binary search is proportional to log₂ N:
the number of times N can be divided in half before there is nothing left.
Using big-O notation, this is O(log N).
Note that binary search in an array is basically the same as doing a
lookup in a perfectly balanced binary-search tree (the root of a
balanced BST is the middle value).
In both cases, if the current value is not the one we're looking for,
we can eliminate half of the remaining values.
Why isn't it a good idea to use binary search to find a value in a
sorted linked list of values?
Consider sorting the values in an array A of size N.
Most sorting algorithms involve what are called comparison sorts;
i.e., they work by comparing values.
Comparison sorts can never have a worst-case running time less than O(N log N).
Simple comparison sorts are usually O(N²);
the more clever ones are O(N log N).
Interesting issues to consider when thinking about different sorting
algorithms include their worst-case running times, how they behave on an
already-sorted array, and how much extra space they need.
We will discuss four comparison-sort algorithms: selection sort, insertion sort, merge sort, and quick sort.
The idea behind selection sort is to repeatedly select the smallest remaining value and put it into its final place in the array.
The code for selection sort is given below, under Selection Sort.
And here's a picture illustrating how selection sort works:
What is the time complexity of selection sort?
Note that the inner loop executes a different number of times each time
around the outer loop, so we can't just multiply N * (time for inner loop).
However, we can notice that the inner loop executes N-1 times on the first
iteration of the outer loop, N-2 times on the second, and so on, down to 0
times on the last.
What if the array is already sorted when selection sort is called?
It is still O(N²); the two loops still execute the same
number of times, regardless of whether the array is sorted or not.
It is not necessary for the outer loop to go all the way from 0 to N-1.
Describe a small change to the code that avoids a small amount of unnecessary
work.
Where else might unnecessary work be done using the current code?
(Hint: think about what happens when the array is already sorted initially.)
How could the code be changed to avoid that unnecessary work?
Is it a good idea to make that change?
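One possible answer to these questions is sketched below (a sketch, not the only reasonable answer): the outer loop stops at index N-2, since after N-1 iterations the last value must already be in its final place, and the swap is skipped when the minimum is already at position k. Whether the second change is worthwhile depends on how often that test succeeds; it adds a comparison to every iteration.

```java
public class SelectionSortTweak {
    // Selection sort with two small changes: the outer loop stops at N-2
    // (the last value is already in place by then), and the swap is skipped
    // when the smallest remaining value is already at position k.
    public static void selectionSort(Comparable[] A) {
        int N = A.length;
        for (int k = 0; k < N - 1; k++) {   // N-1 iterations suffice
            int minIndex = k;
            for (int j = k + 1; j < N; j++) {
                if (A[j].compareTo(A[minIndex]) < 0) {
                    minIndex = j;
                }
            }
            if (minIndex != k) {            // avoid a useless swap
                Comparable tmp = A[minIndex];
                A[minIndex] = A[k];
                A[k] = tmp;
            }
        }
    }

    public static void main(String[] args) {
        Integer[] A = {29, 10, 14, 37, 13};
        selectionSort(A);
        System.out.println(java.util.Arrays.toString(A)); // [10, 13, 14, 29, 37]
    }
}
```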
The idea behind insertion sort is to process the values one at a time, inserting each one into its correct position relative to the values already processed.
The code is given below, under Insertion Sort.
Here's a picture illustrating how insertion sort works on the same array
used above for selection sort:
What is the time complexity of insertion sort?
Again, the inner loop can execute a different number of times for every
iteration of the outer loop. In the worst case (when the array starts out in
reverse sorted order), the kth iteration of the outer loop must move all k
previously inserted values.
As mentioned above, merge sort takes time O(N log N), which is quite a
bit better than the two O(N²) sorts described above (for example,
when N = 1,000,000, N² = 1,000,000,000,000, and N log₂ N = 20,000,000;
i.e., N² is 50,000 times larger than N log₂ N!).
The key insight behind merge sort is that it is possible to
merge two sorted arrays, each containing N/2 items, to form one
sorted array containing N items, in time O(N).
To do this merge, you just step through the two arrays, always choosing
the smaller of the two values to put into the final array (and only advancing
in the array from which you took the smaller value).
Here's a picture illustrating this merge process:
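This merge step can also be written as a small standalone method. The following sketch (using int arrays, with a method name of our own choosing) merges two sorted arrays in O(N) time:

```java
public class MergeDemo {
    // Merge two sorted arrays into one sorted array in O(N) time:
    // repeatedly copy the smaller of the two front values, advancing
    // only in the array the value came from.
    public static int[] merge(int[] a, int[] b) {
        int[] result = new int[a.length + b.length];
        int i = 0, j = 0, pos = 0;
        while (i < a.length && j < b.length) {
            if (a[i] <= b[j]) result[pos++] = a[i++];
            else result[pos++] = b[j++];
        }
        // one array has run out; copy the rest of the other
        while (i < a.length) result[pos++] = a[i++];
        while (j < b.length) result[pos++] = b[j++];
        return result;
    }

    public static void main(String[] args) {
        int[] merged = merge(new int[]{1, 4, 9}, new int[]{2, 3, 10});
        System.out.println(java.util.Arrays.toString(merged)); // [1, 2, 3, 4, 9, 10]
    }
}
```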
Now the question is, how do we get the two sorted arrays of size N/2?
The answer is to use recursion; to sort an array of length N: divide it in half, recursively sort each half, and then merge the two sorted halves.
An outline of the code for merge sort is given below.
It uses an auxiliary method with extra parameters that tell what part
of array A each recursive call is responsible for sorting.
Fill in the missing code in the mergeAux method.
Algorithms like merge sort -- that work by dividing the problem in
two, solving the smaller versions, and then combining the solutions --
are called divide and conquer algorithms.
Below is a picture illustrating the divide-and-conquer aspect of merge sort
using a new example array.
The picture shows the problem being divided up into smaller and smaller
pieces (first an array of size 8, then two halves each of size 4, etc).
Then it shows the "combine" steps: the solved problems of half size
are merged to form solutions to the larger problem.
(Note that the picture illustrates the conceptual ideas -- in an actual
execution, the small problems would be solved one after the other, not
in parallel.
Also, the picture doesn't illustrate the use of auxiliary arrays during the
merge steps.)
To determine the time for merge sort, it is helpful to visualize the calls
made to mergeAux as shown below (each node represents
one call, and is labeled with the size of the array to be sorted by that call):
The height of this tree is O(log N).
The total work done at each "level" of the tree (i.e., the work done by
mergeAux excluding the recursive calls) is O(N): the calls at each level
together merge all N items, doing a constant amount of work per item.
What happens when the array is already sorted (what is the running time
for merge sort in that case)?
Searching
Sequential Search
public static boolean sequentialSearch(Object[] A, Object v) {
    for (int k = 0; k < A.length; k++) {
        if (A[k].equals(v)) return true;
    }
    return false;
}
public static boolean sortedSequentialSearch(Comparable[] A, Comparable v) {
    // precondition: A is sorted (in ascending order)
    for (int k = 0; k < A.length; k++) {
        if (A[k].equals(v)) return true;
        if (A[k].compareTo(v) > 0) return false;
    }
    return false;
}
Binary Search
public static boolean binarySearch(Comparable[] A, Comparable v) {
    // precondition: A is sorted (in ascending order)
    return binarySearchAux(A, 0, A.length - 1, v);
}
private static boolean binarySearchAux(Comparable[] A, int low, int high, Comparable v) {
    // precondition: A is sorted (in ascending order)
    // postcondition: return true iff v is in an element of A in the range
    //                A[low] to A[high]
    if (low > high) return false;
    int middle = (low + high) / 2;
    if (A[middle].equals(v)) return true;
    if (v.compareTo(A[middle]) < 0) {
        // recursively search the left part of the array
        return binarySearchAux(A, low, middle - 1, v);
    }
    else {
        // recursively search the right part of the array
        return binarySearchAux(A, middle + 1, high, v);
    }
}
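The same search can also be written iteratively; instead of making recursive calls, low and high are updated in a loop (a sketch, equivalent in behavior to the recursive version):

```java
public class BinarySearchIter {
    // Iterative binary search: maintain the range [low..high] that could
    // still contain v; each step eliminates half of that range.
    public static boolean binarySearch(Comparable[] A, Comparable v) {
        int low = 0;
        int high = A.length - 1;
        while (low <= high) {
            int middle = (low + high) / 2;
            int cmp = v.compareTo(A[middle]);
            if (cmp == 0) return true;
            if (cmp < 0) high = middle - 1; // v can only be to the left
            else low = middle + 1;          // v can only be to the right
        }
        return false; // range is empty: v is not in A
    }

    public static void main(String[] args) {
        Integer[] A = {2, 5, 7, 11, 13};
        System.out.println(binarySearch(A, 11)); // true
        System.out.println(binarySearch(A, 6));  // false
    }
}
```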
Sorting
Selection sort and insertion sort have worst-case time O(N²).
Quick sort is also O(N²) in the worst case, but its expected
time is O(N log N).
Merge sort is O(N log N) in the worst case.
Selection Sort
The approach is as follows: for each index k from 0 to N-1, find the smallest value in A[k] through A[N-1] and swap it into A[k].
Note that after i iterations, A[0] through A[i-1] contain their final
values (so after N iterations, A[0] through A[N-1] contain their final
values and we're done!)
public static void selectionSort(Comparable[] A) {
    int j, k, minIndex;
    Comparable min;
    int N = A.length;
    for (k = 0; k < N; k++) {
        min = A[k];
        minIndex = k;
        for (j = k + 1; j < N; j++) {
            if (A[j].compareTo(min) < 0) {
                min = A[j];
                minIndex = j;
            }
        }
        A[minIndex] = A[k];
        A[k] = min;
    }
}
This is our old favorite sum:
N-1 + N-2 + ... + 3 + 2 + 1 + 0 = N(N-1)/2
which we know is O(N²).
Insertion Sort
As with selection sort, a nested loop is used;
however, a different invariant holds: after the ith time around the outer loop,
the items in A[0] through A[i] are in order relative to each other (but are
not necessarily in their final places).
Also, note that in order to insert an item into its place in the (relatively)
sorted part of the array, it is necessary to move some values to the right
to make room.
public static void insertionSort(Comparable[] A) {
    int k, j;
    Comparable tmp;
    int N = A.length;
    for (k = 1; k < N; k++) {
        tmp = A[k];
        j = k - 1;
        while ((j >= 0) && (A[j].compareTo(tmp) > 0)) {
            A[j + 1] = A[j]; // move one value over one place to the right
            j--;
        }
        A[j + 1] = tmp; // insert kth value in correct place relative to previous values
    }
}
So we get:
1 + 2 + ... + N-1
which is still O(N²).
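That sum describes the worst case, which occurs when the array starts out in reverse sorted order; on an already-sorted array the inner loop stops immediately, so insertion sort does only N-1 comparisons. A small instrumented sketch (the comparison counter is our own addition, not part of the notes) illustrates the difference:

```java
public class InsertionCount {
    // Insertion sort on an int array, returning the number of comparisons
    // made by the inner loop (a rough measure of the work done).
    public static int sortAndCount(int[] A) {
        int count = 0;
        for (int k = 1; k < A.length; k++) {
            int tmp = A[k];
            int j = k - 1;
            while (j >= 0) {
                count++;
                if (A[j] <= tmp) break;     // tmp is already in place
                A[j + 1] = A[j];            // shift one value to the right
                j--;
            }
            A[j + 1] = tmp;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(sortAndCount(new int[]{1, 2, 3, 4, 5})); // 4: sorted input, O(N) work
        System.out.println(sortAndCount(new int[]{5, 4, 3, 2, 1})); // 10: reversed input, 1+2+3+4
    }
}
```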
Merge Sort
The base case for the recursion is when the array to be sorted has
length 0 or 1 -- then it is already sorted, so there is nothing to do.
Note that the merge step (step 4) needs to use an auxiliary array (to avoid
overwriting its values).
The sorted values are then copied back from the auxiliary array to the
original array.
public static void mergeSort(Comparable[] A) {
    mergeAux(A, 0, A.length - 1); // call the aux. method to do all the work
}

private static void mergeAux(Comparable[] A, int low, int high) {
    // base case (also covers an empty range)
    if (low >= high) return;

    // recursive case

    // Step 1: Find the middle of the array (conceptually, divide it in half)
    int mid = (low + high) / 2;

    // Steps 2 and 3: Sort the 2 halves of A
    mergeAux(A, low, mid);
    mergeAux(A, mid + 1, high);

    // Step 4: Merge sorted halves into an auxiliary array
    Comparable[] tmp = new Comparable[high - low + 1];
    int left = low;      // index into left half
    int right = mid + 1; // index into right half
    int pos = 0;         // index into tmp

    while ((left <= mid) && (right <= high)) {
        // choose the smaller of the two values "pointed to" by left, right
        // copy that value into tmp[pos]
        // increment either left or right as appropriate
        // increment pos
        ...
    }

    // here when one of the two sorted halves has "run out" of values, but
    // there are still some in the other half; copy all the remaining values
    // to tmp
    // Note: only 1 of the next 2 loops will actually execute
    while (left <= mid) { ... }
    while (right <= high) { ... }

    // all values are in tmp; copy them back into A
    System.arraycopy(tmp, 0, A, low, tmp.length);
}
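For reference, here is one possible way to fill in the elided loops (a solution sketch; try the exercise yourself first). The base case is written as low >= high so that an empty array is handled as well:

```java
public class MergeSortComplete {
    public static void mergeSort(Comparable[] A) {
        mergeAux(A, 0, A.length - 1);
    }

    private static void mergeAux(Comparable[] A, int low, int high) {
        if (low >= high) return;              // base case: 0 or 1 values
        int mid = (low + high) / 2;
        mergeAux(A, low, mid);                // sort left half
        mergeAux(A, mid + 1, high);           // sort right half

        // merge the two sorted halves into an auxiliary array
        Comparable[] tmp = new Comparable[high - low + 1];
        int left = low, right = mid + 1, pos = 0;
        while (left <= mid && right <= high) {
            // copy the smaller of the two front values into tmp
            if (A[left].compareTo(A[right]) <= 0) tmp[pos++] = A[left++];
            else tmp[pos++] = A[right++];
        }
        // copy whatever remains in the half that hasn't run out
        while (left <= mid) tmp[pos++] = A[left++];
        while (right <= high) tmp[pos++] = A[right++];

        // all values are in tmp; copy them back into A
        System.arraycopy(tmp, 0, A, low, tmp.length);
    }

    public static void main(String[] args) {
        Integer[] A = {38, 27, 43, 3, 9, 82, 10};
        mergeSort(A);
        System.out.println(java.util.Arrays.toString(A)); // [3, 9, 10, 27, 38, 43, 82]
    }
}
```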
Therefore, the time for merge sort involves
O(N) work done at each "level" of the tree that represents the recursive calls.
Since there are O(log N) levels, the total worst-case time is O(N log N).