Array based data structures

Arrays that can change their size

Introduction

Although there are several different kinds of data structures, most of them support similar operations. Generally, we want to be able to add data to our structures and remove data from them, as well as access and search the data they hold.

Furthermore, especially in Java, there are several features we want our data structures to have. First of all, we want them to be able to hold any type of data. Quite often in Java, we write our structures to hold things of type Object. Because every object in Java is an Object, these upcasts come for free. We just have to remember to perform the downcast when we get our data out of the structure. Throughout the semester, we will try to keep our discussion of the data structures as non-Java-specific as possible.
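To make the upcast and downcast concrete, here is a minimal Java sketch. The class name ObjectStorageDemo and the helper asString are made up for illustration; they are not part of the structure developed in these notes.

```java
// Hypothetical demo class; the names are assumptions made for this sketch.
class ObjectStorageDemo {
    // Downcast helper: throws ClassCastException if o is not a String.
    static String asString(Object o) {
        return (String) o;
    }

    public static void main(String[] args) {
        Object[] slots = new Object[2];
        slots[0] = "candy";             // upcast String -> Object is implicit
        slots[1] = Integer.valueOf(7);  // upcast Integer -> Object is implicit

        String s = asString(slots[0]);  // explicit downcast on the way out
        int n = (Integer) slots[1];     // downcast, then automatic unboxing
        System.out.println(s + " " + n);

        try {
            asString(slots[1]);         // wrong downcast: fails at run time
        } catch (ClassCastException e) {
            System.out.println("bad downcast");
        }
    }
}
```

The compiler accepts both downcasts. Only the run-time check notices that the second slot does not hold a String, which is why we have to remember what we stored.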

A second feature we would like is for our structure to hold as many pieces of data as we need without running out of room: that is, we should be able to add without having to worry about overflow. A related error, known as underflow, occurs when we try to access an element in an empty data structure.

A Dynamic Array

Background

dynamic array

Vector

We begin by describing the actions we can perform on our dynamic array. First, we want to be able to add data anywhere in the array: the add method takes the element to be inserted and the index at which to put it. Correspondingly, we should be able to remove data from anywhere in the array: the remove method takes one parameter (the index to be removed) and returns the element that was removed. We also want to be able to search the array: the search method takes the item to look for and returns the index at which the item is found, or -1 if the item is not in the array. Finally, we want to access elements with the get method, which takes an index and returns the value stored at that index.

To build this structure, we need several data values. First of all, we will need an array to hold all of our data. This array, which we will call A, will change in size when we need it to. To keep things simple, each expansion will double the capacity of the array. We will also need to keep track of the number of elements currently in the array (its size) and the total number of elements the array can hold (its capacity).

We further design our code so that there are no holes in the array. This means removing an element does not leave a gap: all elements following the removal are shifted back one spot. Similarly, when adding an element, we do not simply overwrite existing data. Instead, we shift all elements over one spot, creating a hole at the indicated index, and then plug this hole with the data to be inserted.

Below I illustrate some of the functionality of the DynamicArray, by using an instance of a DynamicArray, called d, which is initially empty (size == 0), and has a capacity of 3. We will be inserting letters into the array.

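Action          Returns   Array after the action
d.add(b, 0)               [b, _, _]
d.add(z, 0)               [z, b, _]
d.add(a, 1)               [z, a, b]
d.remove(0)     z         [a, b, _]
d.add(c, 2)               [a, b, c]
d.search(b)     1         [a, b, c]
d.search(z)     -1        [a, b, c]
d.get(2)        c         [a, b, c]
d.add(e, 3)               [a, b, c, e, _, _]

An underscore marks an unused slot. The final add finds size == capacity, so the capacity doubles from 3 to 6 before e is inserted.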
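The sequence of actions above can be reproduced with Java's built-in java.util.ArrayList, which is itself an array-based list. This is only a sketch for checking the trace: ArrayList is not the DynamicArray of these notes, its add method takes the index first, and indexOf plays the role of our search. The class name ArrayListTrace is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class that replays the trace with java.util.ArrayList.
class ArrayListTrace {
    static List<String> run() {
        List<String> d = new ArrayList<>(3); // initial capacity of 3
        d.add(0, "b");                 // [b]
        d.add(0, "z");                 // [z, b]
        d.add(1, "a");                 // [z, a, b]
        String removed = d.remove(0);  // returns "z", leaving [a, b]
        d.add(2, "c");                 // [a, b, c]
        int posB = d.indexOf("b");     // 1
        int posZ = d.indexOf("z");     // -1, z is no longer stored
        String atTwo = d.get(2);       // "c"
        d.add(3, "e");                 // [a, b, c, e]; storage grows internally
        System.out.println(removed + " " + posB + " " + posZ + " " + atTwo + " " + d);
        return d;
    }

    public static void main(String[] args) {
        run(); // prints: z 1 -1 c [a, b, c, e]
    }
}
```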

Implementation

An array, A which will hold all of our data
The number of elements currently in our array, which we refer to as the size
The number of elements we can store in A, aka the capacity
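In Java, the three data values listed above might be declared as follows. This is a sketch, not a definitive implementation: the class name DynamicArrayFields, the constructor, and the size() and capacity() accessors are assumptions. The constructor rejects a capacity below 1 because doubling a capacity of 0 would never create any room.

```java
// Hypothetical skeleton holding only the data values and two accessors.
class DynamicArrayFields {
    private Object[] A;    // backing array that holds all of our data
    private int size;      // number of elements currently in the array
    private int capacity;  // number of elements A can store

    DynamicArrayFields(int initialCapacity) {
        if (initialCapacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        A = new Object[initialCapacity];
        size = 0;
        capacity = initialCapacity;
    }

    int size() { return size; }          // elements stored so far
    int capacity() { return capacity; }  // slots available before expanding
}
```

In Java the capacity is always equal to A.length, so the separate field is redundant there; it is kept here to match the description above.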

get

get(index)
Input: the index of the value to get
Returns: the value at the specified index

    if size == 0
        error: underflow
    if index < 0 or index >= size
        error: index out of bounds
    return A[index]
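A possible Java rendering of get is sketched below. The class name GetSketch, the varargs constructor (used only to preload some data), and the choice of exception types are assumptions; the pseudocode only says "error".

```java
// Hypothetical class showing get on a preloaded array.
class GetSketch {
    private final Object[] A;
    private final int size;

    GetSketch(Object... items) {
        A = items.clone();
        size = items.length;
    }

    Object get(int index) {
        if (size == 0) {
            throw new IllegalStateException("underflow");
        }
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index out of bounds: " + index);
        }
        return A[index];
    }
}
```

For example, new GetSketch("a", "b").get(1) returns "b", while get(2) on the same object throws IndexOutOfBoundsException.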

size

capacity

Our next method is the search method. We can make no assumptions about the ordering of the data in the array, so we must perform a linear search. That is, we have to examine each element in turn looking for the specified item. If the item is found, return the index value of where it is found. If it is not found, return -1 as an indicator:

search(item)
Input: the item to find in the array
Returns: the index of the first occurrence of item; -1 if the item is not found

    for i = 0 to size - 1
        if A[i] == item
            return i
    return -1

This method scans the array from the front. As soon as the item is found, we return its index. If we reach the end of the for loop without having returned, the item is not in the array, and we return -1.
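When translating the linear search to Java, note that == on objects compares references, not contents. The sketch below therefore uses Objects.equals, which also copes with null entries. The class name SearchSketch and the choice to pass the array and size as parameters are assumptions made to keep the example self-contained.

```java
import java.util.Objects;

// Hypothetical holder for a standalone linear search.
class SearchSketch {
    // Returns the index of the first occurrence of item in A[0..size-1], or -1.
    static int search(Object[] A, int size, Object item) {
        for (int i = 0; i < size; i++) {
            if (Objects.equals(A[i], item)) {
                return i;
            }
        }
        return -1; // scanned every stored element without a match
    }
}
```

Only the first size slots are examined, so unused slots beyond the size never produce a match.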

Neither of the two methods above was terribly exciting. Let's liven things up with the add method:

add(item, index)
Input: the value to be inserted, item; the location to insert the item, index
Postcondition: Array A now contains item at index. All values formerly at locations index or greater have been shifted to an index one greater than their former index.

    // verify the index is within range
    if index < 0 or index > size
        error: index out of bounds
    // make sure there is room for our new item
    if size == capacity
        expandArray()
    // copy all following elements over
    for i=size down to index+1
        A[i] = A[i-1]
    // update the items in our array and the size
    A[index] = item
    size = size + 1

The major theme of the add method is making sure that we have room for our data at the given index. The first if statement checks that the location at which the value is to be inserted is within a valid range. Note that it is possible to insert an item at index = size: this simply makes the new item the last thing in the array. We next make sure that we have room to insert our new item: if the size and the capacity are equal, the array is full, so we make room by expanding A. This expansion does not change the ordering of the values in A in any way: after the expansion, all elements are at the same index they used to be. We have simply doubled the capacity of the array. Code for this is given below.

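expandArray()
Postcondition: The capacity of A has been doubled. Items in A are not reordered.

    Create an array of capacity 2*capacity, tmp
    // copy every element to the same index in the new array
    for i=0 to size-1
        tmp[i] = A[i]
    A = tmp
    // record the new capacity
    capacity = 2*capacity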
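The add and expandArray methods could be written in Java roughly as follows. This is a sketch under stated assumptions: the class name AddSketch, the constructor, and the exception type are made up, and the initial capacity is assumed to be at least 1. The sketch updates capacity after the copy, which the doubling postcondition requires.

```java
// Hypothetical class combining add with the doubling expansion.
class AddSketch {
    Object[] A;
    int size = 0;
    int capacity;

    AddSketch(int initialCapacity) { // assumed to be at least 1
        A = new Object[initialCapacity];
        capacity = initialCapacity;
    }

    void add(Object item, int index) {
        if (index < 0 || index > size) { // index == size appends at the end
            throw new IndexOutOfBoundsException("index out of bounds: " + index);
        }
        if (size == capacity) {
            expandArray();
        }
        for (int i = size; i > index; i--) { // shift right, last element first
            A[i] = A[i - 1];
        }
        A[index] = item;
        size++;
    }

    private void expandArray() {
        Object[] tmp = new Object[2 * capacity];
        for (int i = 0; i < size; i++) {
            tmp[i] = A[i]; // same index as before, so nothing is reordered
        }
        A = tmp;
        capacity = 2 * capacity;
    }
}
```

The shift runs from the back of the array toward index so that no element is overwritten before it has been copied.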

Finally, we discuss the problem of removing an element from our dynamic array. The parameter will be the index of the value we want removed. If we simply delete the item, we create holes in our array. These holes have to be compacted by shifting all elements from index+1 to size-1 one spot to the "left".

remove(index)
Input: the index of the value to be removed
Returns: the item formerly at index
Postcondition: The item at index has been removed and data formerly at locations greater than index have been shifted to compact the vacancy

    // get performs the underflow and bounds checks
    oldValue = get(index)
    for i=index to size-2
        A[i] = A[i+1]
    A[size - 1] = null
    size = size - 1
    return oldValue
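A Java sketch of remove is given below. The class name RemoveSketch, the varargs constructor used to preload data, and the exception types are assumptions. The checks that get would perform are written inline so that the example stands alone.

```java
// Hypothetical class showing remove on a preloaded array.
class RemoveSketch {
    Object[] A;
    int size;

    RemoveSketch(Object... items) {
        A = items.clone();
        size = items.length;
    }

    Object remove(int index) {
        if (size == 0) {
            throw new IllegalStateException("underflow");
        }
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index out of bounds: " + index);
        }
        Object oldValue = A[index];
        for (int i = index; i <= size - 2; i++) {
            A[i] = A[i + 1]; // shift each later element one spot to the left
        }
        A[size - 1] = null;  // clear the now-unused last slot
        size--;
        return oldValue;
    }
}
```

Setting the vacated slot to null matters in Java because it lets the garbage collector reclaim the removed object once nothing else refers to it.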

Obviously, it is possible to create more methods than the ones I have listed above. For instance, it might be useful to have a set(...) method which changes a value at an index, or a method called trim(...) which changes the capacity of the array to match the size of the array.

Time Analysis

Typical Usage

The first typical usage mirrors what you do when you are trick-or-treating. When you go to a house and get your candy, you don't put your treat in any particular spot in your bag: you just throw it in there. When you want a peanut butter cup, you paw through all of the sweets in your bag until you find one. Then you take it out and eat it.

It is straightforward to modify the code to reflect this sort of behavior. The only method that needs to be rewritten is add:

add(item)
Input: the new value to add to the array, item
Postcondition: item has been added as the last value in the array

    if size == capacity
        expandArray()
    A[size] = item
    size = size + 1
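Adding at the end never shifts existing elements, so the only copying happens inside expandArray. The sketch below counts those copies for n consecutive appends, assuming an initial capacity of 1 and the doubling rule. The class and method names are made up for this illustration.

```java
// Hypothetical counter for the copying work done by repeated appends.
class AppendCost {
    // Total element copies made by expansions while appending n items,
    // starting from an empty array of capacity 1 that doubles when full.
    static long copiesForAppends(int n) {
        int capacity = 1;
        int size = 0;
        long copies = 0;
        for (int k = 0; k < n; k++) {
            if (size == capacity) {
                copies += size; // expandArray copies every stored element
                capacity *= 2;
            }
            size++;             // the append itself writes one slot
        }
        return copies;
    }

    public static void main(String[] args) {
        System.out.println(copiesForAppends(1000)); // prints 1023
    }
}
```

The expansions copy 1 + 2 + 4 + ... elements, a sum that stays below 2n. Spread over the n appends, that is a constant amount of copying per add on average, even though an individual add that triggers an expansion takes O(n) time.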

amortization

To remove an item, we generally have to search for it, and then use the returned index to remove the value. This set of operations could be combined into a new method: one which takes an item to be removed, instead of an index. We write it using a combination of existing methods:

removeItem(item)
Input: the value to be removed, item
Returns: true if the item was found, false otherwise
Postcondition: the first occurrence of item has been removed from the array, and the hole created by the removal has been filled

    index = search(item)
    if index == -1
        return false
    remove(index)
    return true
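In Java, removeItem can be built from search and remove in the same way. The sketch below repeats compact versions of both methods so that it compiles on its own. The class name RemoveItemSketch and the varargs constructor are assumptions, and remove skips the bounds checks because removeItem only calls it with an index that search has just returned.

```java
import java.util.Objects;

// Hypothetical class combining search and remove into removeItem.
class RemoveItemSketch {
    Object[] A;
    int size;

    RemoveItemSketch(Object... items) {
        A = items.clone();
        size = items.length;
    }

    int search(Object item) {
        for (int i = 0; i < size; i++) {
            if (Objects.equals(A[i], item)) return i;
        }
        return -1;
    }

    // Assumes 0 <= index < size; removeItem guarantees this.
    Object remove(int index) {
        Object oldValue = A[index];
        for (int i = index; i <= size - 2; i++) A[i] = A[i + 1];
        A[size - 1] = null;
        size--;
        return oldValue;
    }

    boolean removeItem(Object item) {
        int index = search(item);
        if (index == -1) {
            return false; // nothing to remove
        }
        remove(index);
        return true;
    }
}
```

Because search returns the first occurrence, a second copy of the same item stays in the array until removeItem is called again.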

Sorted arrays

It's likely that many of you have seen or even own a CD binder: a soft-sided binder which holds CDs. People tend to fill such a binder in one of two ways. Some simply add their CDs to the end of the binder. Then, when they want to find a CD, they have to flip through the entire book. Other people like to keep their CDs in sorted order. That way, they can find a CD quickly because they have a better idea of where to look. The problem with keeping things sorted, though, is that shifting the locations of all of these CDs can be pretty annoying.

Quite often, though, it may be advantageous to use a sorted array. You can keep the array in sorted order by using a technique similar to the one used in insertion sort when performing your add. This takes O(n) time, which is worse than the amortized O(1) time it took to add to the modified bag. The advantage comes when we search: if the array is in sorted order, we can use binary search to find the value we are looking for. Thus, we can find an element in O(log n) time rather than the O(n) time it took to do a lookup in our bag. If we perform many more lookups than insertions, the sorted array may therefore be better than the unsorted bag because of the faster lookup.
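The trade-off can be sketched in Java as follows. For brevity the example stores int values rather than Objects and uses Arrays.copyOf for the doubling; the class name SortedArraySketch and the initial capacity of 4 are assumptions. The add method shifts larger elements to the right, as insertion sort does, and search is a standard binary search.

```java
import java.util.Arrays;

// Hypothetical sorted dynamic array of ints.
class SortedArraySketch {
    int[] A = new int[4];
    int size = 0;

    // O(n): shift every larger element right, then drop item into the gap.
    void add(int item) {
        if (size == A.length) {
            A = Arrays.copyOf(A, 2 * A.length); // double the capacity
        }
        int i = size;
        while (i > 0 && A[i - 1] > item) {
            A[i] = A[i - 1];
            i--;
        }
        A[i] = item;
        size++;
    }

    // O(log n): halve the candidate range on every comparison.
    int search(int item) {
        int lo = 0;
        int hi = size - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (A[mid] == item) return mid;
            if (A[mid] < item) lo = mid + 1;
            else hi = mid - 1;
        }
        return -1;
    }
}
```

Adding 5, 1, 3, 9, 7 in that order leaves the first five slots holding 1, 3, 5, 7, 9, after which search(7) returns 3 and search(4) returns -1. If the array contains duplicates, this binary search returns the index of some occurrence, not necessarily the first.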