Arrays that can change their size

Introduction
Despite the fact that there are several different kinds of data structures, most of them support a similar set of operations. Generally, we want to be able to add data to and remove data from our structures, as well as access data and search for it. Furthermore, especially in Java, there are several features we want our data structures to have. First of all, we want them to be able to hold any type of data. Quite often in Java, we write our structures to hold things of type Object. Because every object is-an Object, storing one in the structure is an upcast that we get for free. We just have to remember to perform the downcast when we get our data out of the structure. Throughout the semester, we will try to keep our discussion of the data structures as language-neutral as possible. A second feature we would like is that our structure can hold as many pieces of data as we need without running out of room: that is, we should be able to add without having to worry about overflow. A related error is underflow, which occurs when we try to access an element in an empty data structure.

A Dynamic Array
Background
Our first data structure is referred to as a dynamic array: an array which automatically increases the number of elements it can hold whenever the structure is too full. In Java, this sort of structure is referred to as a Vector. These are also commonly referred to as a bag. We will see how to implement this sort of structure, as well as look at the running times of the various operations we have to perform.
We begin by describing the operations we can perform on our dynamic array. First, we want the ability to add data anywhere in the array. This method will take an element to be inserted, as well as the index at which to put the element. Correspondingly, we should be able to remove data from anywhere in the array. This method takes one parameter (the index of the element to be removed) and returns the element which is removed. We also want to be able to search the array, given an item to search for as a parameter. The return value is the index at which the element is found, or -1 if the item is not in the array. Finally, we want to be able to access elements by using the get method, which takes an index and returns the value at that index.
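Since the notes mention that Java calls this structure a Vector, the sketch below maps our four operations onto java.util.Vector. Its add(int, E), remove(int), indexOf(Object), and get(int) methods play the roles of add, remove, search, and get. The class name VectorDemo exists only for this illustration.

```java
import java.util.Vector;

// Maps the four dynamic-array operations onto java.util.Vector.
class VectorDemo {
    static Vector<String> demo() {
        Vector<String> v = new Vector<>();
        v.add(0, "b");                  // add(item, index): insert "b" at index 0
        v.add(0, "a");                  // "b" shifts to index 1 -> [a, b]
        v.add(2, "d");                  // index == size appends -> [a, b, d]
        v.add(2, "c");                  // "d" shifts to index 3 -> [a, b, c, d]
        String removed = v.remove(1);   // remove(index) returns the element: "b"
        int found = v.indexOf("c");     // search(item): "c" is now at index 1
        int missing = v.indexOf("z");   // -1 when the item is absent
        String first = v.get(0);        // get(index): "a"
        System.out.println(removed + " " + found + " " + missing + " " + first); // b 1 -1 a
        return v;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [a, c, d]
    }
}
```

Because Vector is generic, values come back already typed as String, so the downcast our Object-based design requires is not needed here.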
To build this structure, we need several data values. First of all, we will need an array to hold all of our data. This array, which we will call A, can hold capacity elements, of which only the first size are actually in use. We further design our code so that there are no holes in the array. This means removing an element does not leave a gap: all elements following the removed one are shifted back one spot. Similarly, when adding an element, we do not simply overwrite existing data. Instead, we shift all elements from the given index onward over one spot, creating a hole at the indicated index, and then plug this hole with the data to be inserted.
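The sketch below shows the no-holes rule in Java. It stores Object elements, starts with an initial capacity of 2, and doubles the capacity when the array is full. The class name GapFreeArray is hypothetical, not from the notes. add opens a hole by moving elements right, and remove closes one by moving elements left.

```java
import java.util.Arrays;

// A minimal growable array that keeps its elements contiguous ("no holes").
// GapFreeArray and the initial capacity of 2 are assumptions for this sketch.
class GapFreeArray {
    private Object[] A = new Object[2]; // capacity is A.length
    private int size = 0;               // only A[0..size-1] is in use

    // Insert item at index, shifting A[index..size-1] one spot to the right.
    void add(Object item, int index) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        if (size == A.length) {
            A = Arrays.copyOf(A, 2 * A.length); // double the capacity, order unchanged
        }
        for (int i = size; i > index; i--) {
            A[i] = A[i - 1];
        }
        A[index] = item;
        size++;
    }

    // Remove and return the element at index, shifting later elements one spot left.
    // An empty array (size 0) fails the same range check.
    Object remove(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        Object old = A[index];
        for (int i = index; i < size - 1; i++) {
            A[i] = A[i + 1];
        }
        A[--size] = null; // clear the stale reference for the garbage collector
        return old;
    }

    int size() { return size; }

    int capacity() { return A.length; }

    @Override
    public String toString() {
        return Arrays.toString(Arrays.copyOf(A, size));
    }
}
```

Either shifting loop could be replaced by System.arraycopy. The Java documentation specifies that it copies correctly even when the source and destination ranges overlap within the same array.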
Below I illustrate some of the functionality of the DynamicArray by using an instance of a DynamicArray, called …
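Implementation
We now discuss how to actually implement the dynamic array. The data members we need are the array A, its capacity (the number of elements A can hold), and size (the number of elements currently stored). We begin with the get method: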
get(index)
    if size == 0
        error: underflow
    if index < 0 or index >= size
        error: index out of bounds
    return A[index]

Our first if statement checks to make sure there is actually a value in the array. If not, we issue an underflow error. Having survived the first check, we know there are values in the array, but we still need to make sure that the index the user is trying to access is within the range of the array. If not, we issue an error notifying the user that they are trying to do something illegal. It is important to note that we compare the index with size and not capacity: the slots from size to capacity - 1 exist in A but hold no data. Once we have gotten this far, we know we have a valid index, so we simply go ahead and return the value at this index.
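The get pseudocode translates almost line for line into Java. This sketch makes a few assumptions. The class name DynamicArraySketch and its varargs constructor are hypothetical conveniences for testing. Capacity is taken to be A.length. Underflow and out-of-range access are reported with IllegalStateException and IndexOutOfBoundsException, respectively.

```java
// A minimal sketch of get(index) for an Object-based dynamic array.
// DynamicArraySketch and the exception choices are assumptions, not from the notes.
class DynamicArraySketch {
    private Object[] A;   // capacity is A.length
    private int size;     // only A[0..size-1] holds data

    // Hypothetical convenience constructor: starts with copies of the given items.
    DynamicArraySketch(Object... items) {
        A = new Object[Math.max(1, items.length)];
        System.arraycopy(items, 0, A, 0, items.length);
        size = items.length;
    }

    Object get(int index) {
        if (size == 0) {
            throw new IllegalStateException("underflow: the array is empty");
        }
        if (index < 0 || index >= size) { // compare with size, not capacity
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return A[index];
    }
}
```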
Our next method is the search method. We can make no assumptions about the ordering of the data in the array, so we must perform a linear search. That is, we have to examine each element in turn, looking for the specified item. If the item is found, we return the index at which it was found. If it is not found, we return -1 as an indicator:
search(item)
    for i = 0 to size - 1
        if A[i] == item
            return i
    return -1

This method scans the array for the item. As soon as it is found, we return its index. If we scan the entire array without finding the item, we exit the for loop and return -1. Neither of these two methods is terribly exciting. Let's liven things up with the add method:
add(item, index)
    // verify the index is within range
    if index < 0 or index > size
        error: index out of bounds
    // make sure there is room for our new item
    if size == capacity
        expandArray()
    // shift all following elements over one spot
    for i = size down to index + 1
        A[i] = A[i-1]
    // store the new item and update the size
    A[index] = item
    size = size + 1

The major theme of the add method is making sure that we have room for our data at the given index. The first if statement checks that the location where the value is to be inserted is within a valid range. Note that it is possible to insert an item at index = size: this simply makes the new item the last thing in the array. We next make sure that we have room to insert our new item: if size and capacity are equal, the array is full, so we make room by expanding A. This expansion does not change the ordering of the values in A in any way: after the expansion, all elements are at the same index they used to be. We have simply doubled the capacity of the array. Code for this is given below.
expandArray()
    create an array tmp of capacity 2 * capacity
    for i = 0 to size - 1
        tmp[i] = A[i]
    A = tmp
    capacity = 2 * capacity

Our next step in the add method, the for loop, shifts the appropriate items in the array over. This is necessary because we do not overwrite the data; we simply make room for it. The values we want to shift are those at indices [index, size-1], and their new locations are [index+1, size]. (Wouldn't it be nice if there was a way to grab a chunk of memory and simply move it somewhere else?) You should verify that the for loop is correct. We then insert our new element and update the size of our dynamic array.
Finally, we discuss the problem of removing an element from our dynamic array. The parameter will be the index of the value we want removed. If we simply delete the item, we create a hole in our array. This hole has to be closed by shifting all elements from index+1 to size-1 one spot to the "left".
remove(index)
    oldValue = get(index)
    for i = index to size - 2
        A[i] = A[i+1]
    A[size - 1] = null
    size = size - 1
    return oldValue

Because we are going to return the removed value, we have to make sure to grab that item first. At first glance, it may appear that we are not doing the underflow and range checking that we did in other methods. In fact, we let the get method handle that for us. Once we have the value, we shift all of the following values over. We next set the former last slot to null: we prefer not to have duplicate references lying around, so that the garbage collector can do its job. Finally, we have to remember to decrease the size by one and return the removed value. It is important to note that remove does not change the capacity of the array at all.
Obviously, it is possible to create more methods than the ones listed above. For instance, it might be useful to have a set(...) method which changes the value at an index, or a method called trim(...) which changes the capacity of the array to match its size.

Time Analysis
As the methods are defined, only one operation runs in constant time: the get(...) method simply checks the index and returns a value. The search method may have to look through all of the elements in the array, which takes O(n) time. The shifts in the add and remove methods are also O(n): on average, we have to shift half of the elements over, which takes about n/2 steps, corresponding to O(n) performance.

Typical Usage
Although we allowed the user to add values wherever they wanted, this is not how the class would typically be used. More standard use is either to always insert values at the end, or to keep the data in sorted order.
The first is an example of what is typically done when you are trick-or-treating. When you go to a house and get your candy, you don't put your treat in any particular spot in your bag: you just throw it in there.
When you want to get a peanut butter cup, you paw through all of the sweets in your bag until you find one. Then you take it out and eat it. It is straightforward to modify the code to reflect this sort of behavior. The only method we need to rewrite is add:
add(item)
    if size == capacity
        expandArray()
    A[size] = item
    size = size + 1

Because we have eliminated all of the shifts, the running time of the new add is now usually O(1). I use the qualifier "usually" because there is one exception: when performing an add to a full array, we must first expand the array, which takes O(n) time. However, expansions are rare: because the capacity doubles each time, the total work of all expansions during n adds is less than 2n element copies. Therefore, we can think of the cost of the expansions as being spread out among all of the adds, which still leaves each add at O(1) time on average. This "spreading out" concept is known as amortization.
To remove an item, we generally have to search for it, and then use the returned index to remove the value. This pair of operations could be combined into a new method: one which takes an item to be removed, instead of an index. We write it using a combination of existing methods:
removeItem(item)
    index = search(item)
    if index == -1
        return false
    remove(index)
    return true

Sorted arrays
Quite often, rather than throwing our data anywhere, we want to keep it sorted. Perhaps a real world example will help motivate things: it's likely many of you have seen or even own a CD binder, a soft-sided binder which holds CDs. People tend to put their CDs into this binder in one of two ways. Some simply add their CDs to the end of the binder; then, when they want to find a CD, they have to flip through the entire book. Others keep their CDs in sorted order, so they can find a CD quickly because they have a better idea where to look. The problem with keeping things sorted, though, is that shifting all of these CDs around to make room can be pretty annoying.
Quite often, though, it may be advantageous to use a sorted array. You can keep the array in sorted order by using a technique similar to insertion sort when performing your add. This takes O(n) time, which is worse than the O(1) time it took to perform an add with the modified bag. The advantage comes when we perform a search: if the array is in sorted order, we can use binary search to find the value we are looking for, taking O(log n) time rather than the O(n) time of a lookup in our bag. So if we perform many more lookups than insertions, the sorted array may be better than the unsorted bag because of its faster lookups.
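The sketch below makes the sorted-array trade-off concrete. For brevity it stores ints rather than Objects, and the class name SortedIntArray is hypothetical. add walks backward from the end, shifting larger values right as in insertion sort, which takes O(n). search is a standard binary search, which takes O(log n).

```java
import java.util.Arrays;

// A minimal sorted dynamic array of ints (ints instead of Objects keep the sketch short).
// SortedIntArray and its method names are hypothetical, not from the notes.
class SortedIntArray {
    private int[] a = new int[2];
    private int size = 0;

    // Insertion-sort-style add: shift larger values right, then drop the item in. O(n).
    void add(int item) {
        if (size == a.length) {
            a = Arrays.copyOf(a, 2 * a.length); // double the capacity, like expandArray()
        }
        int i = size;
        while (i > 0 && a[i - 1] > item) {
            a[i] = a[i - 1];
            i--;
        }
        a[i] = item;
        size++;
    }

    // Binary search over a[0..size-1]: O(log n). Returns an index of item, or -1.
    int search(int item) {
        int lo = 0, hi = size - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2; // avoids overflow of lo + hi
            if (a[mid] == item) return mid;
            if (a[mid] < item) lo = mid + 1;
            else hi = mid - 1;
        }
        return -1;
    }

    int[] toArray() {
        return Arrays.copyOf(a, size);
    }

    public static void main(String[] args) {
        SortedIntArray s = new SortedIntArray();
        for (int x : new int[] {42, 7, 19, 3, 25}) {
            s.add(x);
        }
        System.out.println(Arrays.toString(s.toArray())); // [3, 7, 19, 25, 42]
        System.out.println(s.search(19)); // 2
        System.out.println(s.search(8));  // -1
    }
}
```

When values repeat, this binary search returns the index of some matching element, not necessarily the first one.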