Heap Management


Overview

In general, the heap is used for dynamically allocated objects. However, it might be used for other kinds of objects, too. For example, activation records might be allocated on the heap for a multi-threaded language, where calls and returns do not follow a stack protocol (i.e., a "return" is not necessarily from the most recently called subprogram, because the most recently called subprogram could be in one thread, while the return was in another).

Different languages use different syntax for the allocation of storage for dynamically created objects: for example, Pascal uses new(p), C uses the library function malloc, and C++ and Java use the new operator.

In some languages, deallocation is done by the programmer: for example, Pascal uses dispose, C uses free, and C++ uses delete.

In other languages (e.g., Java), deallocation is done "automatically" (not under the programmer's control): storage is reclaimed (for later reuse) when it is "dead"; i.e., when it is no longer accessible via some variable in the program.
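As a concrete illustration, here is a small C example of explicit allocation and deallocation (the Node type is just for illustration; the Java equivalents are shown in comments):

    #include <stdlib.h>

    typedef struct Node { int data; struct Node *next; } Node;

    int main(void) {
        Node *p = malloc(sizeof(Node));  /* allocation; in Java: p = new Node(); */
        p->data = 0;
        free(p);                         /* explicit deallocation; in Java there is no
                                            free -- dead storage is reclaimed automatically */
        return 0;
    }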

We will first look at basic techniques for implementing the low-level operations on the heap (how to satisfy requests for storage, and what to do when storage is freed). Then we will consider some of the problems of programmer-controlled and of automatic deallocation. Finally, we will look at some different techniques for doing automatic deallocation.

Basic Techniques

Available storage is managed using a free list: a list of available "chunks" of free storage. Some special location is used to hold the address of the first item on the list; each item includes:

  1. the size of the chunk,
  2. the address of the next item on the list, and
  3. the chunk itself.
Actually, the field that holds the address of the next list item is also part of the chunk itself. The size field, however, is not; that field stays "attached" to the chunk, but should not be overwritten by the programmer's code. (In some languages, like C, the programmer can actually overwrite the value in this field; this is usually the result of a logical error, but could also be a deliberate attempt to breach some kind of security.)
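Here is a minimal C sketch of how a freelist item might be declared; the field names and the use of a struct are illustrative assumptions that match the pictures below:

    #include <stddef.h>

    /* A free chunk starts with its size field; the "next" pointer occupies
     * the first bytes of the chunk itself.  The size field stays attached
     * to the chunk even after it has been allocated.                       */
    typedef struct FreeChunk {
        size_t size;             /* size of the chunk (not counting this field) */
        struct FreeChunk *next;  /* next item on the freelist (NULL if last)    */
    } FreeChunk;

    static FreeChunk *first_free;    /* special location: head of the freelist */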

Here is a series of pictures to illustrate the way the freelist works. Note that alignment issues are ignored in this example (we assume that an allocated chunk of storage can start at any address). Also, we assume that the heap starts at location 0, which is not a realistic assumption, but is fine for the purposes of this example.

Initially, the freelist might look like this:

              0    4   ...                         103
+---+      +------------------------------------------+
|   |      |     |   |                                |
| o------->| 100 | \ | ...                            |
|   |      |     |   |                                |
+---+      +------------------------------------------+

first       size  next
free
Now assume that a request to allocate 20 bytes is received. The first 20 bytes (after the size field) would be used to satisfy the request (i.e., the address "4" would be returned), and the heap updated to look like this:
              0    4 ...    23   24    28 ...          103
+---+      +------------------+ +-------------------------+
|   |      |     |            | |    |   |                |
| o---+    |  20 |            | | 76 | \ |                |
|   | |    |     |            | |    |   |                |
+---+ |    +------------------+ +-------------------------+
first |     size                 size next
free  |                           ^
      |                           |
      +----------------------------
The single chunk of available storage has been split into two parts: the first part was used to satisfy the storage request; it still has a "size" field, but the value has been updated to reflect the size of the allocated chunk. The second part is the storage that is now available. The "first free" pointer has been updated to point to this chunk, and its "size" and "next" fields have been set.

Now assume that a request for 10 bytes is received. Here is the situation after that request has been satisfied:

              0    4 ...    23   24   28 ... 37   38   42  46...103
+---+      +------------------+ +--------------+ +---------------+
|   |      |     |            | |    |         | |    |   |      |
| o---+    |  20 |            | | 10 |         | | 62 | \ |      |
|   | |    |     |            | |    |         | |    |   |      |
+---+ |    +------------------+ +--------------+ +---------------+
first |     size                 size             size next
free  |                                            ^
      |                                            |
      +---------------------------------------------
Finally, assume that the first chunk of storage that was allocated is now freed (the chunk starting at location 4). That chunk of storage would be added to the front of the freelist (since that is cheaper than adding it to the middle or the end), and the picture would be like this:
              0    4   8 ... 23  24   28 ... 37   38   42  46...103
+---+      +------------------+ +--------------+ +---------------+
|   |      |     |   |        | |    |         | |    |   |      |
| o---+    |  20 | o |        | | 10 |         | | 62 | \ |      |
|   | |    |     | | |        | |    |         | |    |   |      |
+---+ |    +-------|----------+ +--------------+ +---------------+
first |     size  next           size             size next
free  |      ^     |                               ^
      |      |     |                               |
      +------+     +-------------------------------+

Operations on the Freelist

The operations on the freelist that need to be supported are:
  1. When space is requested, find a satisfactory chunk.
  2. When space is freed, return it to the freelist.
A good implementation of those operations should satisfy the following goals:
  1. Only fail to satisfy a request for a chunk of n bytes of storage if there are fewer than n free bytes.
  2. Do both operations quickly.

Some questions to consider are:

  1. Given a request for n bytes, which n bytes to return?
  2. Given a "free" of a chunk, how to coalesce it with neighboring free chunks? (This issue would arise, for example, if the chunk of size 10 in the above example were freed.)

Techniques for allocation

The answer to the first question is that there are a number of different schemes for deciding how to allocate a chunk of size n:

Best Fit: Find the chunk on the freelist with the smallest size greater than or equal to n. The idea is to preserve larger chunks (i.e., do not break them up if it is not necessary). However, it has several disadvantages:

  1. It may require a search of the entire freelist (so may be slow).
  2. It tends to leave lots of little pieces of free storage on the list, which may be useless until coalesced.

First Fit: Use the first chunk with size greater than or equal to n. This technique will generally be faster than Best Fit; however, it may produce little pieces of free storage at the front of the list, which will slow down later searches.

Circular First Fit: Make the freelist circular (i.e., have the last item point back to the first item). When a request for n bytes is made, satisfy it using the first chunk with size greater than or equal to n, but then change the "first free" pointer to point to the chunk following the one that was returned.

Note: if the list is singly linked, then it will not, in general, be possible to return the very first chunk, because there will be no way to fix the "next" pointer of the previous item. This problem can be solved by making the list doubly linked (which does not lower the amount of available storage, since the pointer fields are part of the chunk used to satisfy an allocation request). Another possibility is to have special-case code for the case where there is just one item on the list, and otherwise to start the search from the second item, keeping a "trailing" pointer to permit the previous item's "next" field to be updated.
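Here is a rough C sketch of First Fit using the FreeChunk declaration sketched earlier; it keeps a trailing pointer as just described, and for simplicity it returns the whole chunk rather than splitting it (alignment and minimum chunk sizes are also ignored):

    /* First Fit: return the first chunk whose size is >= n.
     * prev trails cur so the chosen chunk can be unlinked.  */
    void *allocate(size_t n) {
        FreeChunk *prev = NULL;
        FreeChunk *cur  = first_free;
        while (cur != NULL && cur->size < n) {
            prev = cur;
            cur  = cur->next;
        }
        if (cur == NULL)
            return NULL;                          /* no chunk is big enough      */
        if (prev == NULL)
            first_free = cur->next;               /* chosen chunk was the head   */
        else
            prev->next = cur->next;               /* unlink it from the freelist */
        return (char *)cur + sizeof(size_t);      /* address just past the size field */
    }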

Techniques for coalescing

There are also several possible ways to solve the second problem (how to coalesce freed storage). One approach is to use a doubly linked list (i.e., each list item has a "previous" as well as a "next" pointer). Also, one bit of the "size" field is reserved to indicate whether the chunk is "free" or "in-use". Now when a chunk is freed, we can check the "free-bit" of the storage that immediately follows the freed chunk (using the freed chunk's "size" field to locate the "size" field of the following chunk of storage). If that following storage is free, then the two chunks can be coalesced. For example, suppose the situation is like this:

           +------------------------------------+   +--------+
           |                                    |   |        |
           v                                    |   |        v
+---+    +-----------------+ +---------+ +------|---|----+ +--------------+
|   |    |   |   |   |     | |    |    | |    | | | | |  | |   |   |   |  |   
| o---+  |   | \ | o |     | | 10 |    | | 20 | o | o |  | |   | o | \ |  |
|   | |  |   |   | | |     | |    |    | |    |   |   |  | |   | | |   |  |
+---+ |  +---------|-------+ +---------+ +---------------+ +-----|--------+
first |  size prev next       size       size prev next    size prev next
free  |    ^       |                      ^ ^                    |
      |    |       |                      | |                    |
      +----+       +----------------------+ +--------------------+
and now the chunk of size 10 is freed. That chunk can be coalesced with the following chunk (of size 20), producing this situation:
             +------------------------+    +--------------+
             |                        |    |              |
             v                        |    |              v
+---+      +-----------------+ +------|----|----------+ +----------------+
|   |      |   |   |   |     | |    | | |  | |        | |   |    |   |   |   
| o---+    |   | \ | o |     | | 34 | o |  o |        | |   | o  | \ |   |
|   | |    |   |   | | |     | |    |   |    |        | |   | |  |   |   |
+---+ |    +---------|-------+ +----------------------+ +-----|----------+
first |    size prev next       size prev next          size prev next
free  |      ^       |          ^ ^                           |
      |      |       |          | |                           |
      +------+       +----------+ +---------------------------+
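Here is a rough C sketch of the check-and-merge step for the case just shown (a freed chunk being coalesced with the chunk that follows it). The header layout and helper names are assumptions; unlinking the absorbed chunk from the doubly linked freelist, and checking that there really is a following chunk, are omitted:

    /* Assumed layout: each chunk begins with a size_t header whose low-order
     * bit is the free bit, so the stored size is always even.               */
    #define FREE_BIT ((size_t)1)
    #define HDR_SIZE (sizeof(size_t))

    static size_t chunk_size(void *c) { return *(size_t *)c & ~FREE_BIT; }
    static int    is_free(void *c)    { return (*(size_t *)c & FREE_BIT) != 0; }

    /* Merge the newly freed chunk at p with the chunk that follows it,
     * if that following chunk is also free.                             */
    static void coalesce_with_next(void *p) {
        void *next = (char *)p + HDR_SIZE + chunk_size(p);
        if (is_free(next)) {
            /* absorb the following chunk, including its header */
            *(size_t *)p = (chunk_size(p) + HDR_SIZE + chunk_size(next)) | FREE_BIT;
            /* the absorbed chunk must also be removed from the freelist
             * (easy with a doubly linked list); omitted here             */
        }
    }

With a 4-byte header this matches the picture above: coalescing the chunks of size 10 and 20 yields a single chunk of size 10 + 4 + 20 = 34.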
Note:

To allow a newly freed chunk to be coalesced with a free chunk that precedes it in memory (as well as with one that follows it) we need to maintain two "size" fields in every chunk: one at the end of the chunk as well as the one at the beginning. In that case, when a chunk is freed, we will know that the immediately preceding 4 bytes are a "size" field (with a "free-bit"); we can use the free-bit to tell whether the preceding memory is available for coalescing, and we can use the value of the size field to know the extent of the previous list item.

Here is an example. Assume that we start with this situation:

           +-------------------------------------+  
           |                                     |  
           v                                     |  
+---+    +-----------------+ +-----------+ +-----|-----------+ +-----------+
|   |    |  |   |   |   |  | |   |   |   | |   | | |   |  |  | |   |   |   |
| o---+  |  | \ | o |   |  | |   |   |   | |20 | o | \ |  |20| |16 |   |16 |
|   | |  |  |   | | |   |  | |   |   |   | |   |   |   |  |  | |   |   |   |
+---+ |  +--------|--------+ +-----------+ +-----------------+ +-----------+
first | size prev next  size  size    size size prev next  size size     size
free  |    ^      |                          ^
      |    |      |                          |
      +----+      +--------------------------+
Now assume that the last chunk of memory in the picture is freed. The "free-bit" in the 4 bytes immediately to the left of the size field of the newly freed chunk will indicate that the preceding chunk is also free, and can be coalesced. The result is shown below.
           +-------------------------------------+  
           |                                     |  
           v                                     |  
+---+    +-----------------+ +-----------+ +-----|-------------------------+
|   |    |  |   |   |   |  | |   |   |   | |   | | |   |               |   |
| o---+  |  | \ | o |   |  | |   |   |   | |44 | o | \ |               |44 |
|   | |  |  |   | | |   |  | |   |   |   | |   |   |   |               |   |
+---+ |  +--------|--------+ +-----------+ +-------------------------------+
first | size prev next  size  size    size size prev next               size
free  |    ^      |                          ^
      |    |      |                          |
      +----+      +--------------------------+
Note that this coalescing only requires updating two size fields (the left field of the preceding chunk, and the right field of the newly freed chunk). The new size is the sum of the two old sizes plus 8 (because the right size field of the first chunk and the left size field of the second chunk get "reclaimed"). No pointers need to be changed at all, so this is a faster operation than coalescing with a following chunk. However, it has the disadvantage of requiring an extra size field in every chunk.

Freelists for Fixed-Size Chunks

For languages like Pascal, storage is allocated for fixed-size chunks whose sizes correspond to the pointer types in the program. It is possible to determine at compile time exactly what size chunks may be requested when the program runs. In this case, another strategy can be used: If there are N different possible chunk sizes, divide the heap into N "mini-heaps". Maintain a separate freelist for each possible chunk size, and return the first chunk from that freelist when a chunk of the appropriate size is requested. The freelists can be maintained as usual (using a linked list), or a set of bitmaps can be kept (one for each "mini-heap") with each bit corresponding to one chunk.

This has the following advantages over the previously discussed approaches:

  1. Allocation is fast: there is no need to search for a large-enough chunk, because the first chunk on the appropriate freelist is always exactly the right size.
  2. Deallocation is fast, too: a freed chunk is simply put back on its freelist, and no coalescing is ever needed.
  3. There is no fragmentation within a mini-heap, since every chunk in it is the same size.
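To make the scheme concrete, here is a minimal C sketch of per-size freelists; the number of sizes, the sizes themselves, and the function names are all hypothetical:

    #include <stddef.h>

    /* One freelist per possible chunk size (sizes known at compile time).
     * Free chunks of a given size are linked through their first word.   */
    #define NUM_SIZES 3
    static const size_t chunk_sizes[NUM_SIZES] = { 8, 16, 24 };   /* hypothetical */
    static void *freelists[NUM_SIZES];          /* head of each per-size freelist */

    void *alloc_fixed(int size_index) {
        void *chunk = freelists[size_index];
        if (chunk != NULL)
            freelists[size_index] = *(void **)chunk;   /* pop the first chunk        */
        return chunk;                                  /* NULL if mini-heap is empty */
    }

    void free_fixed(int size_index, void *chunk) {
        *(void **)chunk = freelists[size_index];       /* push back onto the freelist */
        freelists[size_index] = chunk;
    }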

Deallocation

Problems with Explicit Deallocation

Recall that in some languages (Pascal, C, C++), deallocation is "explicit" (under programmer control), while in other languages (Java) it is done "automatically". The main reason to prefer automatic deallocation is that it is easy for the programmer to make mistakes in their deallocation code, which can lead to errors that are very hard to track down.

Storage Leaks

One potential problem is storage leaks; i.e., some storage is never freed, even though it is inaccessible (and so will never be used again by the program). The problem with storage leaks is that they cause a program to use more memory than necessary. This can slow down execution, or, in the worst case, if the program runs out of memory completely, can cause it to crash.

Here is an example of C code that causes a storage leak:

    Listnode *p = malloc( sizeof(Listnode) );
          .
          .   // no copy from p in this code
          .
    p = ...;
When the second assignment to p is executed it over-writes the address of the allocated chunk of storage that was stored in p. That storage becomes inaccessible; the program can no longer use it, but it cannot be freed for reuse.

Dangling pointers

A second potential problem is the use of dangling pointers. A dangling pointer is one that points to storage that has been freed. This is a problem because if the pointer is dereferenced for reading, garbage may be read (causing incorrect behavior at some future point in the execution); if the pointer is dereferenced for writing, it may mess up the freelist, or (if the storage has been re-allocated since it was freed) may corrupt other, seemingly unrelated values. This kind of error is especially difficult to track down.

Here is an example of C code that illustrates a dangling pointer:

    Listnode *p, *q;
    p = malloc( sizeof(Listnode) );
    q = p;
       .
       . // no assignment to q in this code
       .
    free(p);
       .
       . // no assignment to q in this code
       .
    *q = ...
In this example, q becomes a dangling pointer when p is freed. The final write into the memory pointed to by q might corrupt the freelist, or (if the storage was reallocated between the free of p and the dereference of q) might corrupt some object pointed to by another pointer.

A technique for detecting uninitialized and dangling pointers

In some languages, the compiler can generate code to detect (at run time) an attempt to dereference an uninitialized or dangling pointer. One way to do this is by including a new "invisible" field (like the size field) as part of every chunk of storage, as well as including a new "invisible" field associated with every pointer. The two fields are called the lock and the key, respectively.

The technique works as follows:

  1. When a chunk of storage is allocated, a previously unused value is stored in its lock field, and the same value is stored in the key field of the pointer that is given the chunk's address.
  2. Whenever one pointer is copied into another, the key field is copied along with the pointer value.
  3. When a chunk of storage is freed, its lock field is changed (e.g., set to some value that is never used as a lock).
  4. Before every pointer dereference, the compiler generates code to check that the pointer's key matches the lock of the storage it points to; if they do not match, the pointer must be uninitialized or dangling, and a runtime error is reported.

Note that uninitialized pointers can either have their keys set to some special value (e.g., -1), or the key fields can be uninitialized. In the former case, we are sure to catch an attempt to dereference an uninitialized pointer (since a -1 key won't match any lock); in the latter case we may miss some errors (if by coincidence the value in the uninitialized pointer is an address whose "lock" field happens to match the value in the pointer's (uninitialized) key field). However, that is unlikely, and it may be preferable to save the time that would be needed to initialize all key fields.
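Here is a rough C simulation of the checks the compiler might generate; the struct layouts, the counter used to generate lock values, and the function names are all assumptions made for illustration:

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        unsigned lock;             /* "invisible" lock field of the chunk          */
        /* ... the user-visible fields of the object would follow ...              */
    } Chunk;

    typedef struct {
        Chunk   *addr;             /* the pointer value itself                     */
        unsigned key;              /* "invisible" key field of the pointer         */
    } CheckedPtr;

    static unsigned next_lock = 1;          /* 0 is reserved for freed chunks;
                                               (unsigned)-1 for uninitialized keys */

    CheckedPtr checked_alloc(size_t object_size) {   /* generated for an allocation */
        CheckedPtr p;
        p.addr = malloc(sizeof(Chunk) + object_size);
        p.addr->lock = next_lock;
        p.key = next_lock++;
        return p;
    }

    void checked_deref(CheckedPtr p) {      /* generated before every dereference */
        if (p.addr == NULL || p.key != p.addr->lock) {
            fprintf(stderr, "error: uninitialized or dangling pointer\n");
            exit(1);
        }
    }

    void checked_free(CheckedPtr p) {       /* generated for free/dispose */
        checked_deref(p);
        p.addr->lock = 0;                   /* later dereferences via any copy fail */
        free(p.addr);
    }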

Note also that this technique requires that every pointer have a key field, including pointers that are inside dynamically allocated objects. This means that allocation must be done according to the type of the object being allocated (as is done in Pascal, C++, and Java) so that space for the key fields can be included. In C, it is not only possible to allocate storage by requesting a specific number of bytes (rather than using the "sizeof" operator), it is also possible to store pointers in non-pointer variables such as integers (via casting). These kinds of language features make it difficult for a compiler to ensure that techniques like this lock-and-key approach work correctly.

Automatic Deallocation

There are two basic problems that must be solved in order to do automatic storage deallocation:
  1. How to determine whether a chunk of storage is no longer accessible to the program, and
  2. How to make deallocation as efficient as possible; in particular how to avoid long pauses in the program's execution when deallocation is being done.
And there are two basic approaches to doing automatic deallocation:
  1. Reference counting, and
  2. Garbage Collection.

Reference counting

Reference counting involves including yet another "invisible" field in every chunk of storage: its reference count field. The value of that field is the number of pointers that point to the chunk. The value is initialized to 1 when the chunk is allocated, and is updated on every pointer assignment: when a pointer is made to point to the chunk (e.g., by an assignment "p = q"), the chunk's reference count is incremented; when a pointer that pointed to the chunk is overwritten (or goes out of scope), the chunk's reference count is decremented. When a reference count becomes zero, it means that no pointers are pointing to the object, so it can be returned to free storage. At that time, if the object itself contains pointers, then the reference counts of the objects that they point to must in turn be decremented. Note that this requires being able to recognize pointers in a chunk of storage (e.g., by knowing its type).
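Here is a small C sketch of the code that might be generated for a pointer assignment "p = q" under reference counting; the object layout and the function names are illustrative assumptions:

    typedef struct Obj {
        int         refcount;    /* "invisible" reference count field            */
        struct Obj *next;        /* example of a pointer field inside the object */
        /* ... user data ...                                                     */
    } Obj;

    void rc_decrement(Obj *o) {
        if (o != NULL && --o->refcount == 0) {
            rc_decrement(o->next);   /* decrement the counts of objects o points to */
            /* ... return o's storage to the freelist here ...                      */
        }
    }

    /* generated for the assignment "p = q" */
    void rc_assign(Obj **p, Obj *q) {
        if (q != NULL) q->refcount++;    /* q's target gains a reference */
        rc_decrement(*p);                /* p's old target loses one     */
        *p = q;
    }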

There are two important problems with reference counting:

  1. Every write into a pointer requires a test to see whether the old value was null, and also requires that one or two reference counts be updated; this may slow the program down quite a bit.
  2. Cyclic structures cannot be deallocated. This is illustrated by the following (Pascal) code:
           var p: Nodeptr;  { p is a pointer to a node }
           new(p);          { p points to newly allocated storage
                              for one node; its reference count is 1 }
           p^.next := p;    { the next field of the node also points to the
                              node itself, so now its reference count is 2 }
           p := nil;        { p's value is over-written, so the node's
                              reference count is decremented (from 2 to 1).
                              In fact, the node is inaccessible (it points to
                              itself and no other pointer points to it), but we
                              can't tell that just from the reference count. }
           

Garbage collection

The basic idea behind garbage collection is to wait until there is little or no storage left, then: (1) find all of the heap storage that is still accessible to the program, and (2) reclaim all of the rest (the garbage), making it available to satisfy future allocation requests. There are many different approaches to doing garbage collection (this is an active area of current research). We will discuss two:
  1. Mark and Sweep
  2. Stop and Copy

Mark and Sweep

The Mark and Sweep technique has two phases:

  1. The mark phase finds and marks all accessible objects.
  2. The sweep phase sweeps through the heap, collecting all of the garbage (the inaccessible objects) and putting them back on the freelist.
The mark and sweep technique requires a new "invisible" bit in each chunk of storage: its mark bit (this can be one bit of the chunk's "size" field). This bit is cleared in every chunk when garbage collection starts, and is set when the chunk is found to be accessible. The mark phase works as follows: every pointer in the stack, in the registers, and in the static-data area is followed, and the heap object it points to is marked; then the pointers inside each newly marked object are followed in turn (recursively, or using an explicit worklist), until every accessible object has been marked.
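Here is a rough C sketch of the mark phase; it assumes (purely for illustration) that every object records how many pointer fields it has, so that the collector can find the pointers inside it:

    /* Hypothetical object layout for marking. */
    typedef struct HeapObj {
        int             marked;      /* the mark bit                       */
        int             num_ptrs;    /* number of pointer fields in ptrs[] */
        struct HeapObj *ptrs[];      /* the object's pointer fields        */
    } HeapObj;

    void mark(HeapObj *o) {
        if (o == NULL || o->marked) return;   /* already visited */
        o->marked = 1;
        for (int i = 0; i < o->num_ptrs; i++)
            mark(o->ptrs[i]);                 /* mark everything reachable from o */
    }

    /* The mark phase calls mark() on every root pointer (stack, registers, and
     * static data).  The sweep phase then scans the entire heap, putting every
     * unmarked chunk back on the freelist and clearing the marks of the rest.  */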

Stop and Copy

For the Stop and Copy technique, the heap is divided into two parts: "old" space and "new" space. Old space is used for allocation, and new space is used for garbage collection. There is no free list; instead, a "first-free" pointer is maintained that points to the first free location in "old" space. When a chunk of n bytes is requested, the location pointed to by the first-free pointer is returned, and the first-free pointer is incremented by n (actually, "invisible" size fields are still maintained as part of each allocated chunk, so allocating a chunk would have to include maintaining that field).

When the "old" space is full, or almost full, the stop and copy garbage collection begins. It finds all accessible objects (by following pointers from the static-data area, etc. as for the mark and sweep technique), but instead of marking them, it copies them to "new" space. Once all accessible objects have been copied, the roles of the "old" and "new" space are reversed; the first-free pointer points to the first free location in the "old" space (the location just after the last copied object).

Below are two pictures to illustrate the idea. The stack is shown on the left; it contains 2 pointers to heap objects. The heap is shown on the right. Initially, it contains 6 chunks of allocated storage (labeled A - F) in the "old" space. (The first-free pointer points to the small remaining chunk of storage in the "old" space.) Chunk C itself contains a pointer (pointing to chunk D). In the second picture, the three accessible chunks have been copied to what used to be "new" space, leaving behind all garbage. The first-free pointer now points to the first free location in what used to be "new" space, and is now "old" space.

        <-------- old space --------> <-------- new space -------->

        +---------------------------------------------------------+
        | A | B | C o | D | E | F |  |                            |
        +-----------|---------------------------------------------+
                 ^  |  ^       ^  ^
 |   |           |  |  |       |  |
 | o-------------+  +--+       |  first
 |   |                         |  free
 | o---------------------------+
 |   |
 +---+
 stack


           <------ new space -------> <------- old space ------>

           +---------------------------------------------------+
           |                         | C o | D | F |           |
           +---------------------------------------------------+
                                       ^  |  ^  ^   ^
 |   |                                 |  |  |  |   |
 | o-----------------------------------+  +--+  |   first
 |   |                                          |   free
 | o--------------------------------------------+
 |   |
 +---+
 stack
We have glossed over an important part of the stop-and-copy approach: when a chunk of accessible storage is copied, it is vital that all pointers pointing to that storage be updated (to point to its new location in "new" space). It is easy enough to update the pointer that we follow to find the accessible chunk, but what about other pointers (either on the stack, or in accessible heap objects) that point to the same object? The answer is that when an object is copied from "old" to "new" space, a forwarding pointer is left behind; i.e., the address of the object in "new" space. When we follow a pointer P that points to the same object, we must recognize that it has been replaced with a forwarding pointer, and we must copy the value of the forwarding pointer into pointer P. One way to distinguish an object from a forwarding pointer is to set the invisible size field to 0 to indicate a forwarding pointer (this works because an object will never have size 0, and because we don't need the size field in "old" space any more once the object has been copied to "new" space).
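Here is a rough C sketch of the copy-and-forward step; the header layout (a size field, with 0 meaning "already copied") and the names are assumptions, and each object is assumed to be large enough to hold a forwarding pointer:

    #include <stddef.h>
    #include <string.h>

    typedef struct {
        size_t size;    /* object size in bytes; 0 means "already copied",
                           in which case the forwarding pointer follows      */
    } Header;

    static char *free_ptr;     /* first free location in "new" space */

    /* Copy the object at old to "new" space (if it has not been copied yet)
     * and return its new address; callers overwrite the pointer they
     * followed with this value.                                             */
    void *forward(void *old) {
        Header *h = (Header *)old;
        if (h->size == 0)                      /* already copied: just      */
            return *(void **)(h + 1);          /* follow the forwarding ptr */
        size_t total    = sizeof(Header) + h->size;
        void  *new_addr = memcpy(free_ptr, old, total);   /* copy to new space */
        free_ptr += total;
        h->size = 0;                           /* mark the old copy as forwarded */
        *(void **)(h + 1) = new_addr;          /* leave the forwarding pointer   */
        return new_addr;
    }

Each root pointer P is then updated with "P = forward(P)", and the pointers inside the copied objects are updated the same way by scanning "new" space.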

The example given above is repeated below, but this time we assume that object F contains a pointer to C (as well as there being a pointer to C from the stack). The first picture shows the situation before garbage collection. The second picture shows the situation after the top-most stack pointer (the one pointing to C) has been followed; C has been copied to "new" space, a forwarding pointer has been left behind, and the stack pointer has been updated. The third picture shows the final situation after garbage collection has finished; all accessible storage has been copied, all pointers to accessible storage have been updated, and the roles of "old" and "new" space reversed.

        <-------- old space ---------> <------ new space ------>

                 +----------------+
                 |                |
                 v                |
        +-------------------------|----------------------------+
        | A | B | C o | D | E | F o | |                        |
        +-----------|------------------------------------------+
                 ^  |  ^       ^     ^
 |   |           |  |  |       |     |
 | o-------------+  +--+       |    first
 |   |                         |    free
 | o---------------------------+
 |   |
 +---+
 stack

       <--------- old space ---------> <------ new space ------>

                 +---------------+
                 |    +----------|-----+
                 v    v          |     |
        +------------------------|-------|---------------------+
        | A | B |  o | D | E | F o | | C o |                   |
        +----------|-------------------------------------------+
                   |          ^       ^^    ^
                   +----------|-------+|    |
                              |        |   first
 |   |                        |        |   free
 | o--------------------------|--------+
 |   |                        |
 | o--------------------------+
 |   |
 +---+
 stack

       <-------- new space --------> <-------- old space ------>

                                     +------------+
                                     |            |
                                     |  +--+      |
                                     v  |  v      |
        +-------------------------------|---------|--------------+
        |                           | C o | D | F o |            |
        +--------------------------------------------------------+
                                     ^         ^     ^
                                     |         |     |
                                     |         |     first
 |   |                               |         |     free
 | o---------------------------------+         |
 |   |                                         |
 | o-------------------------------------------+
 |   |
 +---+
 stack

Stop and Copy garbage collection is currently considered the best approach. It has a number of advantages compared to mark and sweep:

  1. Allocation is very cheap: there is no freelist to search; a request is satisfied just by returning the first-free pointer and incrementing it.
  2. There is no fragmentation: each collection compacts all of the accessible objects at one end of the space.
  3. The work done during a collection is proportional to the amount of accessible storage, not to the size of the whole heap, because garbage is never visited.

Deutsch-Bobrow deferred reference counting

There is a technique called deferred reference counting that combines some of the features of (normal) reference counting and garbage collection. An important insight behind this technique is that much of the (time) overhead of reference counting happens because of traversals of heap data structures, using a local variable as a "temporary" pointer. For example, consider the following code that traverses the linked list pointed to by L:
Listptr tmp = L;
while (tmp != null) {
   ... do something with tmp->data ...
   tmp = tmp->next;
}
(Note: "tmp->next" is C syntax; it refers to the "next" field of the object pointed to by tmp.)

If normal reference counting is used, then before the loop (when the value in L is copied into tmp), the reference count of the first item on the list is incremented. The assignment "tmp = tmp->next" inside the loop causes the following changes to be made on each iteration:

  1. The reference count of the list item pointed to by tmp is decremented (because tmp is about to be over-written).
  2. The reference count of the next item on the list is incremented (because tmp now points to it, as well as the "next" field of the previous item on the list).
After the loop finishes, all reference counts are back to where they started; a lot of extra work has been done for nothing!

To avoid this kind of extra work, deferred reference counting works as follows: reference counts only track pointers that are stored in the heap; pointers in local variables (i.e., on the stack or in registers), like tmp above, do not contribute to the counts and can be assigned with no reference-count updates at all. Because of this, a count of zero no longer proves that an object is inaccessible (it may still be pointed to from the stack), so such objects are not reclaimed immediately; instead they are remembered (e.g., in a table of zero-count objects), and from time to time the stack is scanned and every remembered object that is not reachable from the stack is reclaimed.

Note that this approach requires the compiler to generate different code for different kinds of assignments: For example, if p is a local variable of type "pointer to list", then assignments to p itself (e.g., "p = new list;", or "p = q;") do not involve any updates to reference counts. However, assignments like "p->next = new list;" do require reference count updates, since "p->next" is a location in the heap.
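Here is a small C sketch of the difference in generated code, reusing the Obj declaration from the reference-counting sketch above; the zero-count table and its helper function are assumptions made for illustration:

    void add_to_zero_count_table(Obj *o);   /* hypothetical: remembers objects whose
                                               count of heap references is zero      */

    /* For an assignment to a LOCAL pointer variable, such as "tmp = tmp->next",
     * the compiler generates just the assignment -- no reference-count updates. */

    /* For an assignment to a pointer IN THE HEAP, such as "p->next = q": */
    void heap_assign(Obj **field, Obj *q) {
        if (q != NULL) q->refcount++;
        if (*field != NULL && --(*field)->refcount == 0)
            add_to_zero_count_table(*field);   /* may still be reachable from the
                                                  stack, so do not free it yet    */
        *field = q;
    }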

How to identify pointers

Most of the automatic deallocation techniques discussed above require that it be possible to recognize pointers at runtime. There are several possible ways to do this:
  1. Every word includes a one-bit tag (0 means "not a pointer", and 1 means "is a pointer"). This has a number of consequences:
    • Values (including addresses) cannot use this bit, so the ranges of possible values are smaller than normal.
    • Operations must preserve this bit. This means that hardware support is necessary.
    • On method entry, this bit must be initialized for all local variables.
    • When a chunk of storage is allocated, this bit must be initialized for all of the fields in the allocated object.
  2. Again, every word has a tag, but instead of storing the tag in the word itself, it is maintained in a separate bit-map (that includes one bit for every word in the heap, the stack, and the static-data area). In this case, no bit is "stolen" (so the range of values is not restricted, and nothing special needs to be done to make sure that operations don't clobber the special bit). However, it is still necessary to initialize the bit on method entry and on storage allocation.
  3. A final possibility is to associate with each variable and each allocated object (rather than with each word) a tag telling its type (which could be implemented as an index into an array of type descriptors, maintained at runtime). While the tag would require more than a single bit, this approach might save space because only one tag is required for an entire object, rather than one bit per word.
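Here is a small C sketch of the third approach; the descriptor layout, the fixed limit on pointer fields, and the names are all hypothetical:

    /* A run-time table of type descriptors records where the pointer
     * fields of each object type live.                                 */
    typedef struct {
        int num_ptr_fields;
        int ptr_offsets[8];      /* byte offsets of the pointer fields
                                    (hypothetical fixed limit of 8)     */
    } TypeDescriptor;

    extern TypeDescriptor type_table[];

    typedef struct {
        int  type_index;         /* per-object tag: index into type_table */
        char data[];             /* the object's fields                   */
    } TaggedObj;

    /* A collector can visit every pointer in an object o like this: */
    void visit_pointers(TaggedObj *o, void (*visit)(void **)) {
        TypeDescriptor *d = &type_table[o->type_index];
        for (int i = 0; i < d->num_ptr_fields; i++)
            visit((void **)(o->data + d->ptr_offsets[i]));
    }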

Summary

The important concepts covered in this set of notes are:

  1. How the heap is managed using a freelist: how allocation requests are satisfied (best fit, first fit, circular first fit), and how freed storage is coalesced with its neighbors.
  2. The problems caused by explicit (programmer-controlled) deallocation: storage leaks and dangling pointers, and the lock-and-key technique for detecting uses of uninitialized and dangling pointers.
  3. The two basic approaches to automatic deallocation: reference counting (including its problems with cycles, and the deferred variant) and garbage collection (mark and sweep, and stop and copy).
  4. How pointers can be identified at run time (tag bits in each word, a separate bit map, or a per-object type tag).