When you declare a variable with a reference type, you get space for
a reference (pointer) to that type, not for the type itself.
You must use "new" to get space for the type itself.
This is illustrated below.
Remember that class objects are also reference types.
For example, if you declare a variable of type Sequence, you
only get space for a pointer to a sequence; no actual sequence exists
until you use "new".
This is illustrated below, assuming the array implementation of sequences,
and assuming that the Sequence constructor initializes the
items array to be of size 3.
Note that because it is an array of Objects, each array element is
(automatically) initialized to null (shown using a diagonal line
in the picture).
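Here is a small sketch of the same idea in code. The class names SequenceSketch and ReferenceDemo are hypothetical stand-ins for the array-based Sequence class. Only the items array is modeled, and its size of 3 is taken from the description above.

```java
// Hypothetical stand-in for the array-based Sequence class;
// only the items array is modeled here.
class SequenceSketch {
    Object[] items;

    SequenceSketch() {
        // "new" allocates the array itself; each of its 3 elements starts out null
        items = new Object[3];
    }
}

class ReferenceDemo {
    public static void main(String[] args) {
        SequenceSketch s;            // declares a reference variable; no object exists yet
        s = new SequenceSketch();    // "new" creates the actual object
        for (Object item : s.items) {
            System.out.println(item); // prints "null" three times
        }
    }
}
```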
An important consequence of the fact that non-primitive types are really
pointers is that assigning from one variable to another can cause
aliasing (two different names refer to the same object).
For example:
Note that in this example, the assignment to B[1] changed not only that value,
but also the value in A[1] (because A and B were pointing to the same
array)!
However, an assignment to B itself (not to an element of the array pointed
to by B) has no effect on A:
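The runnable sketch below shows both effects. The class name AliasingDemo and the values 5 and 7 are illustrative choices, not taken from the original example.

```java
class AliasingDemo {
    // Returns {A[1] after assigning through B, A[1] after B itself is reassigned}.
    static int[] run() {
        int[] A = new int[3];
        int[] B = A;          // A and B are aliases: both point to one array
        B[1] = 5;             // changes the shared array, so A[1] is now 5
        int afterElementWrite = A[1];

        B = new int[3];       // B now points to a different array
        B[1] = 7;             // changes only the new array; A is unaffected
        return new int[] { afterElementWrite, A[1] };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println(r[0] + " " + r[1]);   // prints "5 5"
    }
}
```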
A similar situation arises when a non-primitive value is passed as an
argument to a method.
For example, consider the following 4 statements, and the definition
of method changeArray:
Note that the method call causes A and X to be aliases (they both point
to the same array).
Therefore, the assignment X[1] = 10 changes both X[1]
and A[1].
However, the assignment X = new int[2] only changes X, not A,
so when the call to changeArray finishes, the value of A[1] is
still 10, not 0.
For each line of the code shown below, draw the corresponding
conceptual picture.
Note that a linked list consists of one or more nodes.
Each node contains some data (in this example, item 1, item 2,
etc.) and a pointer.
For each node other than the last one, the pointer points to the next
node in the list.
For the last node, the pointer is null (indicated in the example using
a diagonal line).
To implement linked lists in Java, we will define a Listnode
class, to be used to represent the individual nodes of the list.
Since we are defining the Listnode class just for the use of
the Sequence class, we will put the Listnode class
definition in the same file as the Sequence class definition
(which means that we will not make Listnode a public
class). Also, we will not make the Listnode methods and fields
either public or private. This means that the methods and fields will
have the default access, package access, so they will be accessible
to any code in the same package, which includes the Sequence class
code (which is in the same file).
To understand this better, consider writing code to create a linked
list with two nodes, containing "ant" and "bat", respectively, pointed
to by a variable named L.
First we need to declare variable L;
here's the declaration together with a picture showing what we have so far:
To make L point to the first node of the list, we need to use new to
allocate space for that node.
We want its data field to contain "ant" and (for now) we
don't care about
its next field, so we'll use the 1-argument Listnode
constructor (which sets the next field to null):
To add the second node to the end of the list we need to create the new node
(with "bat" in its data field and null in its next field), and we need to
set the next field of the first node to point to the new one:
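The three steps (declare L, create the first node, link in the second) can be sketched as a complete program. The Listnode class is copied from these notes. BuildList and build are hypothetical names used only for this sketch.

```java
// Copy of the Listnode class from these notes.
class Listnode {
    Object data;
    Listnode next;

    Listnode(Object d) { this(d, null); }

    Listnode(Object d, Listnode n) {
        data = d;
        next = n;
    }
}

class BuildList {
    static Listnode build() {
        Listnode L;                   // only a pointer; no node exists yet
        L = new Listnode("ant");      // first node; its next field is null
        L.next = new Listnode("bat"); // second node, linked after the first
        return L;
    }

    public static void main(String[] args) {
        for (Listnode tmp = build(); tmp != null; tmp = tmp.next) {
            System.out.println(tmp.data);   // prints "ant", then "bat"
        }
    }
}
```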
Assume that the list shown above (with nodes "ant" and "bat") has been
created.
Question 1.
Question 2.
And here's the code:
Note that it is vital to first copy the value of n.next
into tmp.next (step 2(a)) before setting n.next to point to
the new node (step 2(b)).
If we set n.next first, we would lose our only pointer to the
rest of the list after node n!
Also note that, in order to follow the steps shown in the picture above,
we needed to use variable tmp to create the new node
(in the picture, step 1 shows the new node just "floating" there, but
that isn't possible -- we need to have some variable point to it so that
we can set its next field, and so that we can set n.next to
point to it).
However, we could in fact accomplish steps 1 and 2 with a single
statement that creates the new node, fills in its data and
next fields, and sets n.next to point to the new node!
Here is that amazing statement:
Draw pictures like the ones given above, to illustrate what happens
when node n is the last node in the list.
Does the statement
Now consider the worst-case running time for this add operation.
Whether we use the single statement or the sequence of three statements,
we are really doing the same thing:
Note that the fact that n.next is still pointing to a node
in the list doesn't matter -- n has been removed from the list,
because it cannot be reached from L.
It should be clear that in order to implement the remove operation, we
first need to have a pointer to the node before node n.
The only way to get to that node is to start at the beginning of the
list.
We want to keep moving along the list as long as the current node's
next field is not pointing to node n.
Here's the appropriate code:
Note that this kind of code (moving along a list until some condition holds)
is very common.
For example, similar code would be used to implement a lookup
operation on a linked list (an operation that determines whether there
is a node in the list that contains a given piece of data).
Note also that there is one case when the code given above will not
work.
When n is the very first node in the list, the picture is
like this:
In this case, the test (tmp.next != n) will always be true (no node's
next field points to n), so the loop keeps advancing, and
eventually we will "fall off the end" of the list (i.e., tmp will
become null, and we will get a NullPointerException at runtime when we
try to dereference a null pointer).
We will take care of that case in a minute;
first, assuming that n is not the first node in the list, here's the
code that removes n from the list:
How can we test whether n is the first node in the list, and
what should we do in that case?
If n is the first node, then L will be pointing to it,
so we can test whether L == n.
The following before and after pictures illustrate removing node n
when it is the first node in the list:
Here's the complete code for removing node n from a linked list,
including the special case when n is the first node in the list:
What is the worst-case running time for this remove operation?
If node n is the first node in the list, then we simply
change one variable (L), which takes constant time.
However, in the general case, we must traverse the list to find the
node before n, and in the worst case (when n is the
last node in the list), this requires time proportional to
the number of nodes in the list.
Once the node before n is found, the remove operation involves
just one assignment (to the next field of that node), which
takes constant time.
So the worst-case running time for this operation on a list with N
nodes is O(N).
Note that
if your linked lists do include a header node, there is no need for the
special case code given above for the remove operation;
node n can never be the first node in the list, so there
is no need to check for that case.
Similarly, having a header node can simplify the code that adds a node
before a given node n.
Note that if you do decide to use a header node, you must remember
to initialize an empty list to contain one (dummy) node, you must
remember not to include the header node in the count of "real" nodes in the
list (e.g., if you implement a size operation), and you must
remember to ignore the header node in operations like lookup.
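Here is a minimal sketch of removal when the list has a header node. It makes three assumptions: the Listnode class from these notes (copied below), a dummy node called header (an empty list is just a header whose next field is null), and a node n that is a real node in the list. HeaderDemo and removeWithHeader are hypothetical names. Every real node has a node before it, so the special case disappears.

```java
// Copy of the Listnode class from these notes.
class Listnode {
    Object data;
    Listnode next;

    Listnode(Object d) { this(d, null); }

    Listnode(Object d, Listnode n) {
        data = d;
        next = n;
    }
}

class HeaderDemo {
    // header is a dummy node; the real items start at header.next.
    // Assumes n is a real node in the list (never the header itself).
    static void removeWithHeader(Listnode header, Listnode n) {
        Listnode tmp = header;                 // start at the dummy node
        while (tmp.next != n) tmp = tmp.next;  // find the node before n
        tmp.next = n.next;                     // also works when n is the first real node
    }
}
```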
Look back at the definition of
the Sequence class in
the first set of notes and think about which fields and/or methods
need to be changed before reading any further.
Clearly, the type of the items field needs to change, since the
items will no longer be stored in an array.
Instead, we will need to maintain a pointer to the first node in the list,
so the new declaration will be:
Given these two changes, let's think again about the three Sequence methods
that were discussed assuming the array implementation:
We have already discussed how to add a new node to a linked list following
a given node.
The only question is how best to handle adding a new node at the end
of the list.
A straightforward approach would be to traverse the list, looking for
the last node (i.e., use a variable tmp as was done above
in the code that looked for the node before node n).
Once that node is found, the new node can be inserted immediately after it.
The disadvantage of this approach is that it requires O(N) time to add a
node to the end of a sequence with N items.
An alternative is to add a lastNode field (often called a
tail pointer) to the Sequence class,
and to implement the methods that modify the linked list so that
lastNode always points to the last node in the list (or is null
if the list is empty).
There is more opportunity for error (since several methods will need to
ensure that the lastNode field is kept up to date), but the
use of the lastNode field will mean that the worst-case running time
for addAfter is always O(1).
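Here is a sketch of how addAfter might use a lastNode (tail) pointer. It assumes the Listnode class from these notes (copied below), the items, current, lastNode, and numItems fields described in the text, and no header node. The class name TailSequence and the methods getCurrent (returning Object), size, and toString are illustrative assumptions.

```java
// Copy of the Listnode class from these notes.
class Listnode {
    Object data;
    Listnode next;

    Listnode(Object d) { this(d, null); }

    Listnode(Object d, Listnode n) {
        data = d;
        next = n;
    }
}

// Hypothetical linked-list Sequence with a lastNode (tail) pointer.
class TailSequence {
    private Listnode items;     // first node, or null if the sequence is empty
    private Listnode current;   // node holding the current item, or null
    private Listnode lastNode;  // last node, or null if the sequence is empty
    private int numItems;       // number of items in the sequence

    // Adds it after the current item if there is one, otherwise at the end.
    // Either way the new item becomes current. Every step is O(1).
    public void addAfter(Object it) {
        // the node to insert after: current if it exists, else the last node
        Listnode prev = (current != null) ? current : lastNode;
        Listnode newNode;
        if (prev == null) {
            // empty sequence: the new node becomes the whole list
            newNode = new Listnode(it);
            items = newNode;
        } else {
            newNode = new Listnode(it, prev.next);
            prev.next = newNode;
        }
        if (newNode.next == null) {
            lastNode = newNode;     // the new node is now the last one
        }
        current = newNode;
        numItems++;
    }

    public Object getCurrent() {
        return current.data;        // assumes there is a current item
    }

    public int size() {
        return numItems;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (Listnode tmp = items; tmp != null; tmp = tmp.next) {
            if (sb.length() > 0) sb.append(", ");
            sb.append(tmp.data);
        }
        return sb.toString();
    }
}
```

No branch traverses the list, which is where the O(1) bound comes from.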
Here's a picture of the "ant, bat, cat" sequence, when "bat" is the current
item, and the implementation includes a lastNode pointer:
To ensure O(1) worst-case running time for removeCurrent, we could add a
beforeCurrent field to the Sequence class;
this field would be a pointer that always points to the node in the list
just before the current node.
If this field is maintained,
method removeCurrent could use it without having to traverse
the list, and the time for removeCurrent would always be O(1).
Here's a picture of the "ant, bat, cat" sequence, when "bat" is the
current item, and both lastNode and beforeCurrent
pointers are maintained:
Note that (unless you also include a header node), special-case code is
still needed when the node to be removed is the first node in
the list.
Of course, before deciding to adopt the beforeCurrent field, you
should think about which methods would need to be modified to maintain that
field, and whether that maintenance would make any of the other methods
significantly less efficient.
If so, you would need to decide whether the trade-off (a more efficient
removeCurrent method versus some other, less efficient method)
is worthwhile.
In terms of ease of implementation, straightforward implementations of both
the array and linked-list versions seem reasonably easy.
However, the methods for the linked-list version seem to require more
special cases, and
achieving O(1) times for adding and removing items in the
linked-list version requires maintaining extra pointers (to the last node
and to the node before the current one).
Assume that sequences are implemented using linked lists with just a
pointer to the first and current nodes in the list (no additional pointers),
and with no header node.
How much time is required (using Big-O notation) to remove the first
item from a sequence? to remove the last item from a sequence?
How do these times compare to the times required for the same
operations when the sequence is implemented using an array?
Another way to fix the problem is to use a doubly linked list.
Here's the conceptual picture:
Each node in a doubly linked list contains three fields:
the data, and two pointers. One pointer points to the previous node
in the list, and the other pointer points to the next node in the list.
The previous pointer of the first node, and the next pointer of the
last node are both null.
Here's the Java class definition for a doubly linked list node:
To remove a given node n from a doubly linked list,
we need to change the prev field of the node to its right,
and we need to change the next field of the node to its left,
as illustrated below.
Here's the code for removing node n:
Introduction
The first set of notes discussed how to implement the Sequence class
using an array to store the items in the sequence.
Here we discuss how to implement the Sequence class using a
linked-list to store the items.
However, before talking about linked lists, we will review the difference
between primitive and non-primitive types in Java.
Java Types
Java has two "categories" of types: primitive types (such as int, char,
boolean, and double) and reference types (arrays and classes).
When you declare a variable with a primitive type, you get enough space
to hold a value of that type.
Here's some code involving a primitive type, and the corresponding
conceptual picture:
int[] A = new int[3];
A[1] = 6;
changeArray(A);
System.out.print(A[1]);   // prints 10

public static void changeArray(int[] X) {
    X[1] = 10;        // X and A are aliases, so this also changes A[1]
    X = new int[2];   // changes only X; A still points to the original array
}
The picture below illustrates what happens when this code executes.
int[] X = new int[2];
X[0] = 1;
int[] Y = new int[3];
Y[0] = 2;
Y = X;
Y[0] = 3;
Intro to Linked Lists
Here's a conceptual picture of a linked list containing N items,
pointed to by a variable named L:
class Listnode {
    // fields
    Object data;
    Listnode next;

    // methods
    // 2 constructors
    Listnode(Object d) {
        this(d, null);
    }

    Listnode(Object d, Listnode n) {
        data = d;
        next = n;
    }
}
Note that the next field of a Listnode is itself of type Listnode.
That works because in Java, every non-primitive type is really a
pointer;
so a variable or field of type Listnode really holds a pointer that is
either null or points to a piece of storage (allocated at runtime) that
consists of two fields named data and next.
Write code to change the contents of the second node's data field from
"bat" to "cat".
Write code to insert a new node with "rat" in its data field
between the two existing nodes.
Linked List Operations
Before thinking about how to implement sequences using linked lists,
let's consider some basic operations on linked lists:
Adding a node
Assume that we are given n (a pointer to a node in a list) and newdat
(the data to be stored in the new node), and that the goal is to add a
new node containing newdat immediately after n.
To do this we must perform the following steps:
Step 1: create the new node using the given data
Step 2: "link it in":
   (a) make the new node's next field point to whatever
       n's next field was pointing to
   (b) make n's next field point to the new node.
Here's the conceptual picture:
Listnode tmp = new Listnode(newdat); // Step 1
tmp.next = n.next; // Step 2(a)
n.next = tmp; // Step 2(b)
n.next = new Listnode(newdat, n.next); // steps 1, 2(a), and 2(b)
n.next = new Listnode(newdat, n.next);
still work correctly?
We will assume that storage allocation via new takes constant time.
Setting the values of the three fields also takes constant time,
so the whole operation is a constant-time (O(1)) operation.
In particular, the time required to add a new node immediately after
a given node is independent of
the number of nodes already in the list.
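The add operation can be packaged as a method. Below is a minimal sketch using the single-statement form. It assumes the Listnode class from these notes (copied below). AddAfterDemo and addAfterNode are hypothetical names. Nothing in the method depends on the length of the list.

```java
// Copy of the Listnode class from these notes.
class Listnode {
    Object data;
    Listnode next;

    Listnode(Object d) { this(d, null); }

    Listnode(Object d, Listnode n) {
        data = d;
        next = n;
    }
}

class AddAfterDemo {
    // Adds a new node containing newdat immediately after node n: O(1).
    static void addAfterNode(Listnode n, Object newdat) {
        // Java evaluates the right-hand side (including its read of n.next)
        // before storing into n.next, so the new node's next field gets n's
        // old successor: steps 1, 2(a), and 2(b) in one statement.
        n.next = new Listnode(newdat, n.next);
    }

    public static void main(String[] args) {
        Listnode L = new Listnode("ant", new Listnode("bat"));
        addAfterNode(L, "cow");   // list is now ant, cow, bat
        for (Listnode tmp = L; tmp != null; tmp = tmp.next) {
            System.out.println(tmp.data);
        }
    }
}
```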
Removing a node
To remove a given node n from a linked list, we need to
change the next field of the node that comes immediately
before n in the list to point to whatever n's
next field was pointing to.
Here's the conceptual picture:
Listnode tmp = L;
while (tmp.next != n) tmp = tmp.next; // find the node before n
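The same "move along the list until some condition holds" pattern gives the lookup operation mentioned earlier. Here is a minimal sketch. It assumes the Listnode class from these notes (copied below). ListLookup and contains are hypothetical names, and comparing items with java.util.Objects.equals is a design choice of this sketch.

```java
import java.util.Objects;

// Copy of the Listnode class from these notes.
class Listnode {
    Object data;
    Listnode next;

    Listnode(Object d) { this(d, null); }

    Listnode(Object d, Listnode n) {
        data = d;
        next = n;
    }
}

class ListLookup {
    // Returns true if some node in the list starting at L holds data equal to x.
    static boolean contains(Listnode L, Object x) {
        Listnode tmp = L;
        // move along the list until x is found or we fall off the end
        while (tmp != null && !Objects.equals(tmp.data, x)) {
            tmp = tmp.next;
        }
        return tmp != null;
    }
}
```

The loop tests tmp != null before it looks at tmp.data. That order is what keeps it from dereferencing a null pointer at the end of the list.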
Listnode tmp = L;
while (tmp.next != n) tmp = tmp.next; // find the node before n
tmp.next = n.next; // remove n from the linked list
if (L == n) {
    // special case: n is the first node in the list
    L = n.next;
}
else {
    // general case: find the node before n, then "unlink" n
    Listnode tmp = L;
    while (tmp.next != n) tmp = tmp.next;
    tmp.next = n.next;
}
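The complete removal code can also be packaged as a method. Java passes the value of L (a pointer) to a method, so assigning to L inside the method would not change the caller's variable. This sketch therefore returns the (possibly new) first node, and a caller would write L = RemoveDemo.remove(L, n). RemoveDemo and remove are hypothetical names. Listnode is copied from these notes, and n is assumed to be in the list.

```java
// Copy of the Listnode class from these notes.
class Listnode {
    Object data;
    Listnode next;

    Listnode(Object d) { this(d, null); }

    Listnode(Object d, Listnode n) {
        data = d;
        next = n;
    }
}

class RemoveDemo {
    // Removes node n (assumed to be in the list) and returns the new first node.
    static Listnode remove(Listnode L, Listnode n) {
        if (L == n) {
            // special case: n is the first node, so the list now starts after it
            return n.next;
        }
        // general case: find the node before n, then "unlink" n
        Listnode tmp = L;
        while (tmp.next != n) tmp = tmp.next;
        tmp.next = n.next;
        return L;
    }
}
```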
Using a header node
There is an alternative to writing special-case code to handle removing
the first node in a list.
That alternative is to use a header node: a dummy node at the
front of the list that is there only to reduce the need for special-case
code in the linked-list operations.
For example, the picture below shows how the list "ant", "bat", "cat" would
be represented using a linked list without and with a header node:
The Sequence Class
Now let's consider what changes need to be made to the definition of
the Sequence class in order to change from the array implementation to the
linked-list implementation.
Remember, we only want to change the implementation (the "internal"
part of the sequence abstract data type), not the interface (the
"external" part of the abstract data type).
That means that the signatures of the public methods will not
change;
nor will the descriptions of what those methods do.
The only thing that will change is how the sequence is represented,
and how the methods are implemented.
private Listnode items; // pointer to the first node in the list of items
What about the current field?
We could still implement it using an integer;
however, that would make some operations less efficient.
For example, recall that adding a node to a linked list after a given
node n takes O(1) time.
However, if we implement current as an integer,
the code for addAfter would first have to
advance a pointer down the list current times, in order
to have a pointer to the node after which the new node should be added.
That would make its worst-case running time O(N) instead of O(1) for
a list with N nodes.
Therefore, a better choice is to implement the current
field as a Listnode (a pointer to the node that contains the
current item).
In that case, here's a picture of how the sequence "ant, bat, cat", with
current item "bat", would be represented:
In the discussion below, we will assume that the linked list used to
store the items in the sequence does not have a header node.
The Sequence constructor
The Sequence constructor needs to initialize the three Sequence fields
(items, current, and numItems) so that the sequence is empty.
The items field should be initialized to represent an empty
linked list;
i.e., it should be initialized to null.
As for the array implementation, the current field needs to
be implemented so that methods isCurrent, advance,
getCurrent, and removeCurrent can tell whether there is
a current item in the sequence.
The only reasonable test those methods can make to determine whether there
is a current item is to compare current with null (a non-null
value means that there is a current item, and null means that there
is no current item).
So current should also be initialized to null to indicate
that there is initially no current item.
Finally, numItems should be initialized, as before, to zero.
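Collecting the field declarations and the constructor described above gives a sketch like the following. It assumes the Listnode class from these notes (copied below) and an int numItems field. The isCurrent method is included only to show the null test on current, and its boolean return type is an assumption. The other Sequence methods are omitted.

```java
// Copy of the Listnode class from these notes.
class Listnode {
    Object data;
    Listnode next;

    Listnode(Object d) { this(d, null); }

    Listnode(Object d, Listnode n) {
        data = d;
        next = n;
    }
}

class Sequence {
    private Listnode items;    // pointer to the first node in the list of items
    private Listnode current;  // pointer to the node holding the current item
    private int numItems;      // number of items in the sequence

    public Sequence() {
        items = null;          // an empty linked list
        current = null;        // no current item yet
        numItems = 0;
    }

    public boolean isCurrent() {
        return current != null;   // non-null means there is a current item
    }
}
```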
addAfter
Recall that method addAfter adds a given value after the current
item if there is a current item, and otherwise adds the value at the end of
the sequence.
In both cases, the newly added item becomes the current item.
removeCurrent
Method removeCurrent can be implemented using the
code given
above for removing a given node from a linked list (plus changing the
current pointer to point to the node after the one that was removed, or
to be null if the removed node was the last node in the list).
The only problem is that it requires O(N) time in the worst case to
remove an item from a sequence with N items (because it is first necessary
to locate the node before the one to be removed).
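Here is a sketch of removeCurrent built from the removal code above. It assumes the following:

* the Listnode class (copied below);
* the fields items, current, and numItems;
* no header node, lastNode, or beforeCurrent field;
* a current item exists when the method is called.

The class name LinkedSequence is hypothetical. Its fields are left package-private only so that a list can be set up directly for experimentation.

```java
// Copy of the Listnode class from these notes.
class Listnode {
    Object data;
    Listnode next;

    Listnode(Object d) { this(d, null); }

    Listnode(Object d, Listnode n) {
        data = d;
        next = n;
    }
}

class LinkedSequence {
    Listnode items;    // first node, or null if the sequence is empty
    Listnode current;  // node holding the current item, or null
    int numItems;      // number of items in the sequence

    // Removes the current item; the item after it (if any) becomes current.
    // Assumes there is a current item. O(N) in the worst case.
    public void removeCurrent() {
        if (items == current) {
            // special case: current is the first node
            items = current.next;
        } else {
            // general case: traverse to find the node before current
            Listnode tmp = items;
            while (tmp.next != current) tmp = tmp.next;
            tmp.next = current.next;
        }
        current = current.next;   // null if the removed node was the last one
        numItems--;
    }
}
```

The while loop is the only part whose cost grows with the length of the list.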
Comparison: Sequences via Arrays versus via Linked Lists
When comparing the sequence implementations using linked lists and using
arrays, we should consider:
In terms of space, each implementation has its advantages and disadvantages:
In terms of time:
Linked List Variations
There are several variations on the basic idea of linked lists.
Here we will discuss two of them: doubly linked lists and circular linked lists.
Doubly linked lists
Recall that, given (only) a pointer to a node n in a linked list with
N nodes, removing node n takes time O(N) in the worst case,
because it is necessary to traverse the list looking for the node
just before n.
One way to fix this problem is to require two pointers: a pointer
to the node to be removed, and also a pointer to the node just before
that one.
class DblListnode {
    // fields
    DblListnode prev;
    Object data;
    DblListnode next;

    // methods
    // 3 constructors
    DblListnode() {
        this(null, null, null);
    }

    DblListnode(Object d) {
        this(null, d, null);
    }

    DblListnode(DblListnode p, Object d, DblListnode n) {
        prev = p;
        data = d;
        next = n;
    }
}
// Step 1: change the prev field of the node after n
n.next.prev = n.prev;
// Step 2: change the next field of the node before n
n.prev.next = n.next;
Unfortunately, this code doesn't work (causes an attempt to dereference
a null pointer) if n is either the first or the last node in
the list.
We can add code to test for these special cases, or we can use a
circular, doubly linked list, as discussed below.
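One way to add the special-case tests is sketched below. It uses a trimmed copy of the DblListnode class from these notes, with only the 3-argument constructor. A method cannot change the caller's pointer to the first node, so the hypothetical removeNode method returns the (possibly new) first node. A caller would then write L = DblRemoveDemo.removeNode(L, n).

```java
// Trimmed copy of the DblListnode class from these notes.
class DblListnode {
    DblListnode prev;
    Object data;
    DblListnode next;

    DblListnode(DblListnode p, Object d, DblListnode n) {
        prev = p;
        data = d;
        next = n;
    }
}

class DblRemoveDemo {
    // Removes n from the list whose first node is first; returns the new first node.
    static DblListnode removeNode(DblListnode first, DblListnode n) {
        if (n.next != null) {
            n.next.prev = n.prev;   // Step 1, skipped when n is the last node
        }
        if (n.prev != null) {
            n.prev.next = n.next;   // Step 2, skipped when n is the first node
        } else {
            first = n.next;         // n was the first node
        }
        return first;
    }
}
```

No traversal is needed, so this version runs in O(1) time even with the extra tests.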