Searching and Sorting
=====================

Searching
---------
problem: given an array of values (of size N), determine whether
         a given value v is in the array

there are 2 approaches:

1. sequential search
   -----------------
   look at each value in order;
   quit and return true if the current value is v;
   quit and return false after looking at all values

        for (k = 0; k < N; k++) if (A[k] == v) return true;
        return false;

   Note: if the values are in sorted order, can sometimes quit early
         when v is not in the array:

        for (k = 0; k < N; k++) {
            if (A[k] == v) return true;
            if (A[k] > v) return false;   // all later values are larger too
        }
        return false;

   worst-case time is still O(N)

2. binary search (values must be in sorted order)
   -------------
   look at middle item x
   if x == v return true
   else eliminate half the array
   repeat until v found or entire array eliminated

        low = 0;
        high = N - 1;
        while (low <= high) {
            mid = (low + high)/2;
            if (A[mid] == v) return true;
            if (A[mid] > v) high = mid - 1;   // v can only be in the left half
            else low = mid + 1;               // v can only be in the right half
        }
        return false;

   worst-case time = O(log N)
                         ^     the # of times N can be divided in half
                         |     before there is nothing left

   Note: binary search in an array is basically the same as Lookup in
         a perfectly balanced binary-search tree

Sorting
-------
problem: given an array A of N values, arrange the values in sorted order

 o most sorting algorithms involve comparing values
 o the "obvious" algorithms are O(N^2)
 o the "clever" ones are O(N log N)    <-- i.e., N times (log N)
 o it is not possible to have a comparison sort with a worst-case time
   better than O(N log N)

If values are all in some fixed range, then it is possible to avoid
comparisons (and get a time better than N log N).
Example: values all ints in range 0 to 100

 o use an array of size 101 to record how many times each value occurs
   in array A
 o overwrite A with sorted values:

        // initialize auxiliary array
        for (k = 0; k <= 100; k++) tmp[k] = 0;

        // record # of occurrences of each value
        for (k = 0; k < N; k++) tmp[ A[k] ]++;

        // overwrite A with sorted values
        index = 0;
        for (k = 0; k <= 100; k++) {
            for (j = 0; j < tmp[k]; j++) {
                A[index] = k;
                index++;
            }
        }

                      ---------------------
                   A: | 3 | 5 | 3 | 2 | 1 |
                      ---------------------

                      --------------------------
                 tmp: | 0 | 1 | 1 | 2 | 0 | 1 | ...
                      --------------------------
                        ^   ^
                        |   |___ how many 1's there are in A
                        |__ how many 0's there are in A

                      ---------------------
      over-written A: | 1 | 2 | 3 | 3 | 5 |
                      ---------------------

Time = O(N + # of possible values)
if N is much larger than # of possible values, this is a win!

comparison sorts
----------------
 o best possible worst-case time is O(N log N)
 o many comparison sorting algorithms take time O(N^2)
 o interesting issues:
      does an algorithm always take its worst-case time?
      what happens on an already-sorted array?
 o we will discuss 4 algorithms:
      1. selection sort   worst-case O(N^2)
      2. insertion sort   worst-case O(N^2)
      3. quick sort       worst-case O(N^2) but expected time O(N log N)
      4. merge sort       worst-case O(N log N)
 o actual code will assume that the values are ints, but any type
   that allows comparisons would work

1. Selection Sort
   --------------
   Idea:
    o find the smallest value in A; put it in A[0]
    o find the 2nd smallest value in A; put it in A[1]
    o etc.

   Approach:
    o use one loop from 0 to N-1 (tells which position in A to fill)
    o loop invariant: after k iterations A[0] through A[k-1] contain
      their final values
      so after N iterations, A[0] through A[N-1] contain their final
      values and we're done!
    o each time around, use a nested loop to find the smallest value
      (and its index) in the unsorted part of the array
    o swap that value with A[k]

        void SelectionSort(int A[], int N) {
            int j, k, min, minIndex;

            for (k = 0; k < N; k++) {
                // find the smallest value in A[k] through A[N-1]
                min = A[k];
                minIndex = k;
                for (j = k+1; j < N; j++) {
                    if (A[j] < min) {
                        min = A[j];
                        minIndex = j;
                    }
                }
                // move it into its final position
                Swap(A[k], A[minIndex]);
            }
        }

   Time for Selection Sort
    o inner loop executes a different # of times each time around the
      outer loop, so can't just multiply ...

        1st iteration of outer loop: inner executes N - 1 times
        2nd     "      "    "    " :   "      "     N - 2 times
        .
        .
        .
        Nth     "      "    "    " :   "      "     0 times

        0 + 1 + 2 + ... + (N - 1) = N(N-1)/2, which is O(N^2)

    o Note: selection sort is always O(N^2)
      (on an already-sorted array every swap just exchanges A[k] with
      itself, but the code still looks at values O(N^2) times)
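
The comparison count can also be checked empirically. What follows is a minimal sketch, not part of the original notes: CountingSelectionSort and ExpectedComparisons are hypothetical helper names, and std::swap stands in for the notes' Swap. The sort is the same algorithm with a counter added to the inner loop, and the count is compared against the closed form N(N-1)/2.

```cpp
#include <cassert>
#include <utility>  // std::swap

// Hypothetical instrumented variant of SelectionSort (illustration only):
// sorts A[0..N-1] in ascending order and returns the number of times the
// test A[j] < min was evaluated in the inner loop.
long CountingSelectionSort(int A[], int N) {
    assert(N >= 0);
    long comparisons = 0;
    for (int k = 0; k < N; k++) {
        int min = A[k];
        int minIndex = k;
        for (int j = k + 1; j < N; j++) {
            comparisons++;              // one comparison per inner iteration
            if (A[j] < min) {
                min = A[j];
                minIndex = j;
            }
        }
        std::swap(A[k], A[minIndex]);   // a self-swap when minIndex == k
    }
    return comparisons;
}

// Closed form of (N-1) + (N-2) + ... + 1 + 0.
long ExpectedComparisons(int N) {
    return static_cast<long>(N) * (N - 1) / 2;
}
```

For N = 5 the function returns 10 comparisons for both an already-sorted array and a reverse-sorted one, which matches N(N-1)/2 and illustrates that the input order does not change the amount of work.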