Heap Management


Overview

In general, the heap is used for dynamically allocated objects. However, it might be used for other kinds of objects, too. For example, activation records might be allocated on the heap for a multi-threaded language, where calls and returns do not follow a stack protocol (i.e., a "return" is not necessarily from the most recently called subprogram, because the most recently called subprogram could be in one thread, while the return was in another).

Different languages use different syntax for the allocation of storage for dynamically created objects: for example, Pascal uses new(p), C uses the library function malloc, and C++ and Java use the new operator.

In some languages, deallocation is done by the programmer: for example, Pascal uses dispose, C uses free, and C++ uses delete.

In other languages (e.g., Java), deallocation is done "automatically" (not under the programmer's control): storage is reclaimed (for later reuse) when it is "dead"; i.e., when it is no longer accessible via some variable in the program.
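As a concrete illustration, here is a small C example of explicit allocation and deallocation (the Node type is just for illustration; the Java equivalents are shown in comments):

    #include <stdlib.h>

    typedef struct Node { int data; struct Node *next; } Node;

    int main(void) {
        Node *p = malloc(sizeof(Node));  /* allocation; in Java: p = new Node(); */
        p->data = 0;
        free(p);                         /* explicit deallocation; in Java there is no
                                            free -- dead storage is reclaimed automatically */
        return 0;
    }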

We will first look at basic techniques for implementing the low-level operations on the heap (how to satisfy requests for storage, and what to do when storage is freed). Then we will consider some of the problems of programmer-controlled and of automatic deallocation. Finally, we will look at some different techniques for doing automatic deallocation.

Basic Techniques

Available storage is managed using a free list: a list of available "chunks" of free storage. Some special location is used to hold the address of the first item on the list; each item includes:

  1. the size of the chunk,
  2. the address of the next item on the list, and
  3. the chunk itself.
Actually, the field that holds the address of the next list item is also part of the chunk itself. The size field, however, is not; that field stays "attached" to the chunk, but should not be overwritten by the programmer's code. (In some languages, like C, the programmer can actually overwrite the value in this field; this is usually the result of a logical error, but could also be a deliberate attempt to breach some kind of security.)
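Here is a minimal C sketch of how a freelist item might be declared; the field names and the use of a struct are illustrative assumptions that match the pictures below:

    #include <stddef.h>

    /* A free chunk starts with its size field; the "next" pointer occupies
     * the first bytes of the chunk itself.  The size field stays attached
     * to the chunk even after it has been allocated.                       */
    typedef struct FreeChunk {
        size_t size;             /* size of the chunk (not counting this field) */
        struct FreeChunk *next;  /* next item on the freelist (NULL if last)    */
    } FreeChunk;

    static FreeChunk *first_free;    /* special location: head of the freelist */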

Here is a series of pictures to illustrate the way the freelist works. Note that alignment issues are ignored in this example (we assume that an allocated chunk of storage can start at any address). Also, we assume that the heap starts at location 0, which is not a realistic assumption, but is fine for the purposes of this example.

Initially, the freelist might look like this:

              0    4   ...                         103
+---+      +------------------------------------------+
|   |      |     |   |                                |
| o------->| 100 | \ | ...                            |
|   |      |     |   |                                |
+---+      +------------------------------------------+

first       size  next
free
Now assume that a request to allocate 20 bytes is received. The first 20 bytes (after the size field) would be used to satisfy the request (i.e., the address "4" would be returned), and the heap updated to look like this:
              0    4 ...    23   24    28 ...          103
+---+      +------------------+ +-------------------------+
|   |      |     |            | |    |   |                |
| o---+    |  20 |            | | 76 | \ |                |
|   | |    |     |            | |    |   |                |
+---+ |    +------------------+ +-------------------------+
first |     size                 size next
free  |                           ^
      |                           |
      +----------------------------
The single chunk of available storage has been split into two parts: the first part was used to satisfy the storage request; it still has a "size" field, but the value has been updated to reflect the size of the allocated chunk. The second part is the storage that is now available. The "first free" pointer has been updated to point to this chunk, and its "size" and "next" fields have been set.

Now assume that a request for 10 bytes is received. Here is the situation after that request has been satisfied:

              0    4 ...    23   24   28 ... 37   38   42  46...103
+---+      +------------------+ +--------------+ +---------------+
|   |      |     |            | |    |         | |    |   |      |
| o---+    |  20 |            | | 10 |         | | 62 | \ |      |
|   | |    |     |            | |    |         | |    |   |      |
+---+ |    +------------------+ +--------------+ +---------------+
first |     size                 size             size next
free  |                                            ^
      |                                            |
      +---------------------------------------------
Finally, assume that the first chunk of storage that was allocated is now freed (the chunk starting at location 4). That chunk of storage would be added to the front of the freelist (since that is cheaper than adding it to the middle or the end), and the picture would be like this:
              0    4   8 ... 23  24   28 ... 37   38   42  46...103
+---+      +------------------+ +--------------+ +---------------+
|   |      |     |   |        | |    |         | |    |   |      |
| o---+    |  20 | o |        | | 10 |         | | 62 | \ |      |
|   | |    |     | | |        | |    |         | |    |   |      |
+---+ |    +-------|----------+ +--------------+ +---------------+
first |     size  next           size             size next
free  |      ^     |                               ^
      |      |     |                               |
      +------+     +-------------------------------+

Operations on the Freelist

The operations on the freelist that need to be supported are:
  1. When space is requested, find a satisfactory chunk.
  2. When space is freed, return it to the freelist.
A good implementation of those operations should satisfy the following goals:
  1. Only fail to satisfy a request for a chunk of n bytes of storage if there are fewer than n free bytes.
  2. Do both operations quickly.

Some questions to consider are:

  1. Given a request for n bytes, which n bytes to return?
  2. Given a "free" of a chunk, how to coalesce it with neighboring free chunks? (This issue would arise, for example, if the chunk of size 10 in the above example were freed.)

Techniques for allocation

The answer to the first question is that there are a number of different schemes for deciding how to allocate a chunk of size n:

Best Fit: Find the chunk on the freelist with the smallest size greater than or equal to n. The idea is to preserve larger chunks (i.e., do not break them up if it is not necessary). However, it has several disadvantages:

  1. It may require a search of the entire freelist (so may be slow).
  2. It tends to leave lots of little pieces of free storage on the list, which may be useless until coalesced.

First Fit: Use the first chunk with size greater than or equal to n. This technique will generally be faster than Best Fit; however, it may produce little pieces of free storage at the front of the list, which will slow down later searches.

Circular First Fit: Make the freelist circular (i.e., have the last item point back to the first item). When a request for n bytes is made, satisfy it using the first chunk with size greater than or equal to n, but then change the "first free" pointer to point to the chunk following the one that was returned.

Note: if the list is singly linked, then it will not, in general, be possible to return the very first chunk, because there will be no way to fix the "next" pointer of the previous item. This problem can be solved by making the list doubly linked (which does not lower the amount of available storage, since the pointer fields are part of the chunk used to satisfy an allocation request). Another possibility is to have special-case code for the case where there is just one item on the list, and otherwise to start the search from the second item, keeping a "trailing" pointer to permit the previous item's "next" field to be updated.
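Here is a rough C sketch of First Fit using the FreeChunk declaration sketched earlier; it keeps a trailing pointer as just described, and for simplicity it returns the whole chunk rather than splitting it (alignment and minimum chunk sizes are also ignored):

    /* First Fit: return the first chunk whose size is >= n.
     * prev trails cur so the chosen chunk can be unlinked.  */
    void *allocate(size_t n) {
        FreeChunk *prev = NULL;
        FreeChunk *cur  = first_free;
        while (cur != NULL && cur->size < n) {
            prev = cur;
            cur  = cur->next;
        }
        if (cur == NULL)
            return NULL;                          /* no chunk is big enough      */
        if (prev == NULL)
            first_free = cur->next;               /* chosen chunk was the head   */
        else
            prev->next = cur->next;               /* unlink it from the freelist */
        return (char *)cur + sizeof(size_t);      /* address just past the size field */
    }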

Techniques for coalescing

There are also several possible ways to solve the second problem (how to coalesce freed storage). One approach is to use a doubly linked list (i.e., each list item has a "previous" as well as a "next" pointer). Also, one bit of the "size" field is reserved to indicate whether the chunk is "free" or "in-use". Now when a chunk is freed, we can check the "free-bit" of the storage that immediately follows the freed chunk (using the freed chunk's "size" field to locate the "size" field of the following chunk of storage). If that following storage is free, then the two chunks can be coalesced. For example, suppose the situation is like this:

           +------------------------------------+   +--------+
           |                                    |   |        |
           v                                    |   |        v
+---+    +-----------------+ +---------+ +------|---|----+ +--------------+
|   |    |   |   |   |     | |    |    | |    | | | | |  | |   |   |   |  |   
| o---+  |   | \ | o |     | | 10 |    | | 20 | o | o |  | |   | o | \ |  |
|   | |  |   |   | | |     | |    |    | |    |   |   |  | |   | | |   |  |
+---+ |  +---------|-------+ +---------+ +---------------+ +-----|--------+
first |  size prev next       size       size prev next    size prev next
free  |    ^       |                      ^ ^                    |
      |    |       |                      | |                    |
      +----+       +----------------------+ +--------------------+
and now the chunk of size 10 is freed. That chunk can be coalesced with the following chunk (of size 20), producing this situation:
             +------------------------+    +--------------+
             |                        |    |              |
             v                        |    |              v
+---+      +-----------------+ +------|----|----------+ +----------------+
|   |      |   |   |   |     | |    | | |  | |        | |   |    |   |   |   
| o---+    |   | \ | o |     | | 34 | o |  o |        | |   | o  | \ |   |
|   | |    |   |   | | |     | |    |   |    |        | |   | |  |   |   |
+---+ |    +---------|-------+ +----------------------+ +-----|----------+
first |    size prev next       size prev next          size prev next
free  |      ^       |          ^ ^                           |
      |      |       |          | |                           |
      +------+       +----------+ +---------------------------+
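Here is a rough C sketch of the check-and-merge step for the case just shown (a freed chunk being coalesced with the chunk that follows it). The header layout and helper names are assumptions; unlinking the absorbed chunk from the doubly linked freelist, and checking that there really is a following chunk, are omitted:

    /* Assumed layout: each chunk begins with a size_t header whose low-order
     * bit is the free bit, so the stored size is always even.               */
    #define FREE_BIT ((size_t)1)
    #define HDR_SIZE (sizeof(size_t))

    static size_t chunk_size(void *c) { return *(size_t *)c & ~FREE_BIT; }
    static int    is_free(void *c)    { return (*(size_t *)c & FREE_BIT) != 0; }

    /* Merge the newly freed chunk at p with the chunk that follows it,
     * if that following chunk is also free.                             */
    static void coalesce_with_next(void *p) {
        void *next = (char *)p + HDR_SIZE + chunk_size(p);
        if (is_free(next)) {
            /* absorb the following chunk, including its header */
            *(size_t *)p = (chunk_size(p) + HDR_SIZE + chunk_size(next)) | FREE_BIT;
            /* the absorbed chunk must also be removed from the freelist
             * (easy with a doubly linked list); omitted here             */
        }
    }

With a 4-byte header this matches the picture above: coalescing the chunks of size 10 and 20 yields a single chunk of size 10 + 4 + 20 = 34.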
Note:

To allow a newly freed chunk to be coalesced with a free chunk that precedes it in memory (as well as with one that follows it) we need to maintain two "size" fields in every chunk: one at the end of the chunk as well as the one at the beginning. In that case, when a chunk is freed, we will know that the immediately preceding 4 bytes are a "size" field (with a "free-bit"); we can use the free-bit to tell whether the preceding memory is available for coalescing, and we can use the value of the size field to know the extent of the previous list item.

Here is an example. Assume that we start with this situation:

           +-------------------------------------+  
           |                                     |  
           v                                     |  
+---+    +-----------------+ +-----------+ +-----|-----------+ +-----------+
|   |    |  |   |   |   |  | |   |   |   | |   | | |   |  |  | |   |   |   |
| o---+  |  | \ | o |   |  | |   |   |   | |20 | o | \ |  |20| |16 |   |16 |
|   | |  |  |   | | |   |  | |   |   |   | |   |   |   |  |  | |   |   |   |
+---+ |  +--------|--------+ +-----------+ +-----------------+ +-----------+
first | size prev next  size  size    size size prev next  size size     size
free  |    ^      |                          ^
      |    |      |                          |
      +----+      +--------------------------+
Now assume that the last chunk of memory in the picture is freed. The "free-bit" in the 4 bytes immediately to the left of the size field of the newly freed chunk will indicate that the preceding chunk is also free, and can be coalesced. The result is shown below.
           +-------------------------------------+  
           |                                     |  
           v                                     |  
+---+    +-----------------+ +-----------+ +-----|-------------------------+
|   |    |  |   |   |   |  | |   |   |   | |   | | |   |               |   |
| o---+  |  | \ | o |   |  | |   |   |   | |44 | o | \ |               |44 |
|   | |  |  |   | | |   |  | |   |   |   | |   |   |   |               |   |
+---+ |  +--------|--------+ +-----------+ +-------------------------------+
first | size prev next  size  size    size size prev next               size
free  |    ^      |                          ^
      |    |      |                          |
      +----+      +--------------------------+
Note that this coalescing only requires updating two size fields (the left field of the preceding chunk, and the right field of the newly freed chunk). The new size is the sum of the two old sizes plus 8 (because the right size field of the first chunk and the left size field of the second chunk get "reclaimed"). No pointers need to be changed at all, so this is a faster operation than coalescing with a following chunk. However, it has the disadvantage of requiring an extra size field in every chunk.

Freelists for Fixed-Size Chunks

For languages like Pascal, storage is allocated for fixed-size chunks whose sizes correspond to the pointer types in the program. It is possible to determine at compile time exactly what size chunks may be requested when the program runs. In this case, another strategy can be used: If there are N different possible chunk sizes, divide the heap into N "mini-heaps". Maintain a separate freelist for each possible chunk size, and return the first chunk from that freelist when a chunk of the appropriate size is requested. The freelists can be maintained as usual (using a linked list), or a set of bitmaps can be kept (one for each "mini-heap") with each bit corresponding to one chunk.

This has the following advantages over the previously discussed approaches:

  1. Allocation is fast: there is no need to search for a large-enough chunk, because the first chunk on the appropriate freelist is always exactly the right size.
  2. Deallocation is fast, too: a freed chunk is simply put back on its freelist, and no coalescing is ever needed.
  3. There is no fragmentation within a mini-heap, since every chunk in it is the same size.
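To make the scheme concrete, here is a minimal C sketch of per-size freelists; the number of sizes, the sizes themselves, and the function names are all hypothetical:

    #include <stddef.h>

    /* One freelist per possible chunk size (sizes known at compile time).
     * Free chunks of a given size are linked through their first word.   */
    #define NUM_SIZES 3
    static const size_t chunk_sizes[NUM_SIZES] = { 8, 16, 24 };   /* hypothetical */
    static void *freelists[NUM_SIZES];          /* head of each per-size freelist */

    void *alloc_fixed(int size_index) {
        void *chunk = freelists[size_index];
        if (chunk != NULL)
            freelists[size_index] = *(void **)chunk;   /* pop the first chunk        */
        return chunk;                                  /* NULL if mini-heap is empty */
    }

    void free_fixed(int size_index, void *chunk) {
        *(void **)chunk = freelists[size_index];       /* push back onto the freelist */
        freelists[size_index] = chunk;
    }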

Deallocation

Problems with Explicit Deallocation

Recall that in some languages (Pascal, C, C++), deallocation is "explicit" (under programmer control), while in other languages (Java) it is done "automatically". The main reason to prefer automatic deallocation is that it is easy for the programmer to make mistakes in their deallocation code, which can lead to errors that are very hard to track down.

Storage Leaks

One potential problem is storage leaks; i.e., some storage is never freed, even though it is inaccessible (and so will never be used again by the program). The problem with storage leaks is that they cause a program to use more memory than necessary. This can slow down execution, or, in the worst case, if the program runs out of memory completely, can cause it to crash.

Here is an example of C code that causes a storage leak:

    Listnode *p = malloc( sizeof(Listnode) );
          .
          .   // no copy from p in this code
          .
    p = ...;
When the second assignment to p is executed it over-writes the address of the allocated chunk of storage that was stored in p. That storage becomes inaccessible; the program can no longer use it, but it cannot be freed for reuse.

Dangling pointers

A second potential problem is the use of dangling pointers. A dangling pointer is one that points to storage that has been freed. This is a problem because if the pointer is dereferenced for reading, garbage may be read (causing incorrect behavior at some future point in the execution); if the pointer is dereferenced for writing, it may mess up the freelist, or (if the storage has been re-allocated since it was freed) may corrupt other, seemingly unrelated values. This kind of error is especially difficult to track down.

Here is an example of C code that illustrates a dangling pointer:

    Listnode *p, *q;
    p = malloc( sizeof(Listnode) );
    q = p;
       .
       . // no assignment to q in this code
       .
    free(p);
       .
       . // no assignment to q in this code
       .
    *q = ...
In this example, q becomes a dangling pointer when p is freed. The final write into the memory pointed to by q might corrupt the freelist, or (if the storage was reallocated between the free of p and the dereference of q) might corrupt some object pointed to by another pointer.

A technique for detecting uninitialized and dangling pointers

In some languages, the compiler can generate code to detect (at run time) an attempt to dereference an uninitialized or dangling pointer. One way to do this is by including a new "invisible" field (like the size field) as part of every chunk of storage, as well as including a new "invisible" field associated with every pointer. The two fields are called the lock and the key, respectively.

The technique works as follows:

  1. When a chunk of storage is allocated, a previously unused value is stored in its lock field, and the same value is stored in the key field of the pointer that is given the chunk's address.
  2. Whenever one pointer is copied into another, the key field is copied along with the pointer value.
  3. When a chunk of storage is freed, its lock field is changed (e.g., set to some value that is never used as a lock).
  4. Before every pointer dereference, the compiler generates code to check that the pointer's key matches the lock of the storage it points to; if they do not match, the pointer must be uninitialized or dangling, and a runtime error is reported.

Note that uninitialized pointers can either have their keys set to some special value (e.g., -1), or the key fields can be uninitialized. In the former case, we are sure to catch an attempt to dereference an uninitialized pointer (since a -1 key won't match any lock); in the latter case we may miss some errors (if by coincidence the value in the uninitialized pointer is an address whose "lock" field happens to match the value in the pointer's (uninitialized) key field). However, that is unlikely, and it may be preferable to save the time that would be needed to initialize all key fields.
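Here is a rough C simulation of the checks the compiler might generate; the struct layouts, the counter used to generate lock values, and the function names are all assumptions made for illustration:

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        unsigned lock;             /* "invisible" lock field of the chunk          */
        /* ... the user-visible fields of the object would follow ...              */
    } Chunk;

    typedef struct {
        Chunk   *addr;             /* the pointer value itself                     */
        unsigned key;              /* "invisible" key field of the pointer         */
    } CheckedPtr;

    static unsigned next_lock = 1;          /* 0 is reserved for freed chunks;
                                               (unsigned)-1 for uninitialized keys */

    CheckedPtr checked_alloc(size_t object_size) {   /* generated for an allocation */
        CheckedPtr p;
        p.addr = malloc(sizeof(Chunk) + object_size);
        p.addr->lock = next_lock;
        p.key = next_lock++;
        return p;
    }

    void checked_deref(CheckedPtr p) {      /* generated before every dereference */
        if (p.addr == NULL || p.key != p.addr->lock) {
            fprintf(stderr, "error: uninitialized or dangling pointer\n");
            exit(1);
        }
    }

    void checked_free(CheckedPtr p) {       /* generated for free/dispose */
        checked_deref(p);
        p.addr->lock = 0;                   /* later dereferences via any copy fail */
        free(p.addr);
    }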

Note also that this technique requires that every pointer have a key field, including pointers that are inside dynamically allocated objects. This means that allocation must be done according to the type of the object being allocated (as is done in Pascal, C++, and Java) so that space for the key fields can be included. In C, it is not only possible to allocate storage by requesting a specific number of bytes (rather than using the "sizeof" operator), it is also possible to store pointers in non-pointer variables such as integers (via casting). These kinds of language features make it difficult for a compiler to ensure that techniques like this lock-and-key approach work correctly.

Automatic Deallocation

There are two basic problems that must be solved in order to do automatic storage deallocation:
  1. How to determine whether a chunk of storage is no longer accessible to the program, and
  2. How to make deallocation as efficient as possible; in particular how to avoid long pauses in the program's execution when deallocation is being done.
And there are two basic approaches to doing automatic deallocation:
  1. Reference counting, and
  2. Garbage Collection.

Reference counting

Reference counting involves including yet another "invisible" field in every chunk of storage: its reference count field. The value of that field is the number of pointers that point to the chunk. The value is initialized to 1 when the chunk is allocated, and is updated on every pointer assignment: when a pointer is made to point to the chunk (e.g., by an assignment "p = q"), the chunk's reference count is incremented; when a pointer that pointed to the chunk is overwritten (or goes out of scope), the chunk's reference count is decremented. When a reference count becomes zero, it means that no pointers are pointing to the object, so it can be returned to free storage. At that time, if the object itself contains pointers, then the reference counts of the objects that they point to must in turn be decremented. Note that this requires being able to recognize pointers in a chunk of storage (e.g., by knowing its type).
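Here is a small C sketch of the code that might be generated for a pointer assignment "p = q" under reference counting; the object layout and the function names are illustrative assumptions:

    typedef struct Obj {
        int         refcount;    /* "invisible" reference count field            */
        struct Obj *next;        /* example of a pointer field inside the object */
        /* ... user data ...                                                     */
    } Obj;

    void rc_decrement(Obj *o) {
        if (o != NULL && --o->refcount == 0) {
            rc_decrement(o->next);   /* decrement the counts of objects o points to */
            /* ... return o's storage to the freelist here ...                      */
        }
    }

    /* generated for the assignment "p = q" */
    void rc_assign(Obj **p, Obj *q) {
        if (q != NULL) q->refcount++;    /* q's target gains a reference */
        rc_decrement(*p);                /* p's old target loses one     */
        *p = q;
    }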

There are two important problems with reference counting:

  1. Every write into a pointer requires a test to see whether the old value was null, and also requires that one or two reference counts be updated; this may slow the program down quite a bit.
  2. Cyclic structures cannot be deallocated. This is illustrated by the following (Pascal) code:
           var p: Nodeptr;  { p is a pointer to a node }
           new(p);          { p points to newly allocated storage
                              for one node; its reference count is 1 }
           p^.next := p;    { the next field of the node also points to the
                              node itself, so now its reference count is 2 }
           p := nil;        { p's value is over-written, so the node's
                              reference count is decremented (from 2 to 1).
                              In fact, the node is inaccessible (it points to
                              itself and no other pointer points to it), but we
                              can't tell that just from the reference count. }
           

Garbage collection

The basic idea behind garbage collection is to wait until there is little or no storage left, then: (1) find all of the heap storage that is still accessible to the program, and (2) reclaim all of the rest (the garbage), making it available to satisfy future allocation requests. There are many different approaches to doing garbage collection (this is an active area of current research). We will discuss two:
  1. Mark and Sweep
  2. Stop and Copy

Mark and Sweep

The Mark and Sweep technique has two phases:

  1. The mark phase finds and marks all accessible objects.
  2. The sweep phase sweeps through the heap, collecting all of the garbage (the inaccessible objects) and putting them back on the freelist.
The mark and sweep technique requires a new "invisible" bit in each chunk of storage: its mark bit (this can be one bit of the chunk's "size" field). This bit is cleared in every chunk when garbage collection starts, and is set when the chunk is found to be accessible. The mark phase works as follows: every pointer in the stack, in the registers, and in the static-data area is followed, and the heap object it points to is marked; then the pointers inside each newly marked object are followed in turn (recursively, or using an explicit worklist), until every accessible object has been marked.
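Here is a rough C sketch of the mark phase; it assumes (purely for illustration) that every object records how many pointer fields it has, so that the collector can find the pointers inside it:

    /* Hypothetical object layout for marking. */
    typedef struct HeapObj {
        int             marked;      /* the mark bit                       */
        int             num_ptrs;    /* number of pointer fields in ptrs[] */
        struct HeapObj *ptrs[];      /* the object's pointer fields        */
    } HeapObj;

    void mark(HeapObj *o) {
        if (o == NULL || o->marked) return;   /* already visited */
        o->marked = 1;
        for (int i = 0; i < o->num_ptrs; i++)
            mark(o->ptrs[i]);                 /* mark everything reachable from o */
    }

    /* The mark phase calls mark() on every root pointer (stack, registers, and
     * static data).  The sweep phase then scans the entire heap, putting every
     * unmarked chunk back on the freelist and clearing the marks of the rest.  */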

Stop and Copy

For the Stop and Copy technique, the heap is divided into two parts: "old" space and "new" space. Old space is used for allocation, and new space is used for garbage collection. There is no free list; instead, a "first-free" pointer is maintained that points to the first free location in "old" space. When a chunk of n bytes is requested, the location pointed to by the first-free pointer is returned, and the first-free pointer is incremented by n (actually, "invisible" size fields are still maintained as part of each allocated chunk, so allocating a chunk would have to include maintaining that field).

When the "old" space is full, or almost full, the stop and copy garbage collection begins. It finds all accessible objects (by following pointers from the static-data area, etc. as for the mark and sweep technique), but instead of marking them, it copies them to "new" space. Once all accessible objects have been copied, the roles of the "old" and "new" space are reversed; the first-free pointer points to the first free location in the "old" space (the location just after the last copied object).

Below are two pictures to illustrate the idea. The stack is shown on the left; it contains 2 pointers to heap objects. The heap is shown on the right. Initially, it contains 6 chunks of allocated storage (labeled A - F) in the "old" space. (The first-free pointer points to the small remaining chunk of storage in the "old" space.) Chunk C itself contains a pointer (pointing to chunk D). In the second picture, the three accessible chunks have been copied to what used to be "new" space, leaving behind all garbage. The first-free pointer now points to the first free location in what used to be "new" space, and is now "old" space.

        <-------- old space --------> <-------- new space -------->

        +---------------------------------------------------------+
        | A | B | C o | D | E | F |  |                            |
        +-----------|---------------------------------------------+
                 ^  |  ^       ^  ^
 |   |           |  |  |       |  |
 | o-------------+  +--+       |  first
 |   |                         |  free
 | o---------------------------+
 |   |
 +---+
 stack


           <------ new space -------> <------- old space ------>

           +---------------------------------------------------+
           |                         | C o | D | F |           |
           +---------------------------------------------------+
                                       ^  |  ^  ^   ^
 |   |                                 |  |  |  |   |
 | o-----------------------------------+  +--+  |   first
 |   |                                          |   free
 | o--------------------------------------------+
 |   |
 +---+
 stack
We have glossed over an important part of the stop-and-copy approach: when a chunk of accessible storage is copied, it is vital that all pointers pointing to that storage be updated (to point to its new location in "new" space). It is easy enough to update the pointer that we follow to find the accessible chunk, but what about other pointers (either on the stack, or in accessible heap objects) that point to the same object? The answer is that when an object is copied from "old" to "new" space, a forwarding pointer is left behind; i.e., the address of the object in "new" space. When we follow a pointer P that points to the same object, we must recognize that it has been replaced with a forwarding pointer, and we must copy the value of the forwarding pointer into pointer P. One way to distinguish an object from a forwarding pointer is to set the invisible size field to 0 to indicate a forwarding pointer (this works because an object will never have size 0, and because we don't need the size field in "old" space any more once the object has been copied to "new" space).
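Here is a rough C sketch of the copy-and-forward step; the header layout (a size field, with 0 meaning "already copied") and the names are assumptions, and each object is assumed to be large enough to hold a forwarding pointer:

    #include <stddef.h>
    #include <string.h>

    typedef struct {
        size_t size;    /* object size in bytes; 0 means "already copied",
                           in which case the forwarding pointer follows      */
    } Header;

    static char *free_ptr;     /* first free location in "new" space */

    /* Copy the object at old to "new" space (if it has not been copied yet)
     * and return its new address; callers overwrite the pointer they
     * followed with this value.                                             */
    void *forward(void *old) {
        Header *h = (Header *)old;
        if (h->size == 0)                      /* already copied: just      */
            return *(void **)(h + 1);          /* follow the forwarding ptr */
        size_t total    = sizeof(Header) + h->size;
        void  *new_addr = memcpy(free_ptr, old, total);   /* copy to new space */
        free_ptr += total;
        h->size = 0;                           /* mark the old copy as forwarded */
        *(void **)(h + 1) = new_addr;          /* leave the forwarding pointer   */
        return new_addr;
    }

Each root pointer P is then updated with "P = forward(P)", and the pointers inside the copied objects are updated the same way by scanning "new" space.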

The example given above is repeated below, but this time we assume that object F contains a pointer to C (as well as there being a pointer to C from the stack). The first picture shows the situation before garbage collection. The second picture shows the situation after the top-most stack pointer (the one pointing to C) has been followed; C has been copied to "new" space, a forwarding pointer has been left behind, and the stack pointer has been updated. The third picture shows the final situation after garbage collection has finished; all accessible storage has been copied, all pointers to accessible storage have been updated, and the roles of "old" and "new" space reversed.

        <-------- old space ---------> <------ new space ------>

                 +----------------+
                 |                |
                 v                |
        +-------------------------|----------------------------+
        | A | B | C o | D | E | F o | |                        |
        +-----------|------------------------------------------+
                 ^  |  ^       ^     ^
 |   |           |  |  |       |     |
 | o-------------+  +--+       |    first
 |   |                         |    free
 | o---------------------------+
 |   |
 +---+
 stack

       <--------- old space ---------> <------ new space ------>

                 +---------------+
                 |    +----------|-----+
                 v    v          |     |
        +------------------------|-------|---------------------+
        | A | B |  o | D | E | F o | | C o |                   |
        +----------|-------------------------------------------+
                   |          ^       ^^    ^
                   +----------|-------+|    |
                              |        |   first
 |   |                        |        |   free
 | o--------------------------|--------+
 |   |                        |
 | o--------------------------+
 |   |
 +---+
 stack

       <-------- new space --------> <-------- old space ------>

                                     +------------+
                                     |            |
                                     |  +--+      |
                                     v  |  v      |
        +-------------------------------|---------|--------------+
        |                           | C o | D | F o |            |
        +--------------------------------------------------------+
                                     ^         ^     ^
                                     |         |     |
                                     |         |     first
 |   |                               |         |     free
 | o---------------------------------+         |
 |   |                                         |
 | o-------------------------------------------+
 |   |
 +---+
 stack

Stop and Copy garbage collection is currently considered the best approach. It has a number of advantages compared to mark and sweep:

  1. Allocation is very cheap: there is no freelist to search; a request is satisfied just by returning the first-free pointer and incrementing it.
  2. There is no fragmentation: each collection compacts all of the accessible objects at one end of the space.
  3. The work done during a collection is proportional to the amount of accessible storage, not to the size of the whole heap, because garbage is never visited.

Deutsch-Bobrow deferred reference counting

There is a technique called deferred reference counting that combines some of the features of (normal) reference counting and garbage collection. An important insight behind this technique is that much of the (time) overhead of reference counting happens because of traversals of heap data structures, using a local variable as a "temporary" pointer. For example, consider the following code that traverses the linked list pointed to by L:
Listptr tmp = L;
while (tmp != null) {
   ... do something with tmp->data ...
   tmp = tmp->next;
}
(Note: "tmp->next" is C syntax; it refers to the "next" field of the object pointed to by tmp.)

If normal reference counting is used, then before the loop (when the value in L is copied into tmp), the reference count of the first item on the list is incremented. The assignment "tmp = tmp->next" inside the loop causes the following changes to be made on each iteration:

  1. The reference count of the list item pointed to by tmp is decremented (because tmp is about to be over-written).
  2. The reference count of the next item on the list is incremented (because tmp now points to it, as well as the "next" field of the previous item on the list).
After the loop finishes, all reference counts are back to where they started; a lot of extra work has been done for nothing!

To avoid this kind of extra work, deferred reference counting works as follows: reference counts only track pointers that are stored in the heap; pointers in local variables (i.e., on the stack or in registers), like tmp above, do not contribute to the counts and can be assigned with no reference-count updates at all. Because of this, a count of zero no longer proves that an object is inaccessible (it may still be pointed to from the stack), so such objects are not reclaimed immediately; instead they are remembered (e.g., in a table of zero-count objects), and from time to time the stack is scanned and every remembered object that is not reachable from the stack is reclaimed.

Note that this approach requires the compiler to generate different code for different kinds of assignments: For example, if p is a local variable of type "pointer to list", then assignments to p itself (e.g., "p = new list;", or "p = q;") do not involve any updates to reference counts. However, assignments like "p->next = new list;" do require reference count updates, since "p->next" is a location in the heap.
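Here is a small C sketch of the difference in generated code, reusing the Obj declaration from the reference-counting sketch above; the zero-count table and its helper function are assumptions made for illustration:

    void add_to_zero_count_table(Obj *o);   /* hypothetical: remembers objects whose
                                               count of heap references is zero      */

    /* For an assignment to a LOCAL pointer variable, such as "tmp = tmp->next",
     * the compiler generates just the assignment -- no reference-count updates. */

    /* For an assignment to a pointer IN THE HEAP, such as "p->next = q": */
    void heap_assign(Obj **field, Obj *q) {
        if (q != NULL) q->refcount++;
        if (*field != NULL && --(*field)->refcount == 0)
            add_to_zero_count_table(*field);   /* may still be reachable from the
                                                  stack, so do not free it yet    */
        *field = q;
    }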

How to identify pointers

Most of the automatic deallocation techniques discussed above require that it be possible to recognize pointers at runtime. There are several possible ways to do this:
  1. Every word includes a one-bit tag (0 means "not a pointer", and 1 means "is a pointer"). This has a number of consequences:
    • Values (including addresses) cannot use this bit, so the ranges of possible values are smaller than normal.
    • Operations must preserve this bit. This means that hardware support is necessary.
    • On method entry, this bit must be initialized for all local variables.
    • When a chunk of storage is allocated, this bit must be initialized for all of the fields in the allocated object.
  2. Again, every word has a tag, but instead of storing the tag in the word itself, it is maintained in a separate bit-map (that includes one bit for every word in the heap, the stack, and the static-data area). In this case, no bit is "stolen" (so the range of values is not restricted, and nothing special needs to be done to make sure that operations don't clobber the special bit). However, it is still necessary to initialize the bit on method entry and on storage allocation.
  3. A final possibility is to associate with each variable and each allocated object (rather than with each word) a tag telling its type (which could be implemented as an index into an array of type descriptors, maintained at runtime). While the tag would require more than a single bit, this approach might save space because only one tag is required for an entire object, rather than one bit per word.
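Here is a small C sketch of the third approach; the descriptor layout, the fixed limit on pointer fields, and the names are all hypothetical:

    /* A run-time table of type descriptors records where the pointer
     * fields of each object type live.                                 */
    typedef struct {
        int num_ptr_fields;
        int ptr_offsets[8];      /* byte offsets of the pointer fields
                                    (hypothetical fixed limit of 8)     */
    } TypeDescriptor;

    extern TypeDescriptor type_table[];

    typedef struct {
        int  type_index;         /* per-object tag: index into type_table */
        char data[];             /* the object's fields                   */
    } TaggedObj;

    /* A collector can visit every pointer in an object o like this: */
    void visit_pointers(TaggedObj *o, void (*visit)(void **)) {
        TypeDescriptor *d = &type_table[o->type_index];
        for (int i = 0; i < d->num_ptr_fields; i++)
            visit((void **)(o->data + d->ptr_offsets[i]));
    }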

Summary

The important concepts covered in this set of notes are:

  1. How the heap is managed using a freelist: how allocation requests are satisfied (best fit, first fit, circular first fit), and how freed storage is coalesced with its neighbors.
  2. The problems caused by explicit (programmer-controlled) deallocation: storage leaks and dangling pointers, and the lock-and-key technique for detecting uses of uninitialized and dangling pointers.
  3. The two basic approaches to automatic deallocation: reference counting (including its problems with cycles, and the deferred variant) and garbage collection (mark and sweep, and stop and copy).
  4. How pointers can be identified at run time (tag bits in each word, a separate bit map, or a per-object type tag).