CS536 Spring 1998 Notes for April 27.

Code generation for Classes and Structures

In C and C++, the address of a member of a class or struct is found by computing the member's offset within the structure and adding it to the address of the structure itself. For example:

class c
{
	int a;
	int sum(int b) { return a+b; }
};

The first thing that happens here is that, at compile time, we figure out the size of this class. Since an int on most architectures is 4 bytes, we start with 4 bytes for the size. A member function, the next element in the class, does not occupy space within the class instance; it is treated like a global function. The exception to this is virtual member functions.
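Setting virtual members aside for a moment, the size claim above can be checked with a small sketch. The class names with_method and without_method and the function check_class_size are made up for illustration, and the "usually 4 bytes" figure assumes a typical platform with a 4-byte int.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classes for illustration: the same data member,
// with and without a non-virtual member function.
class without_method
{
public:
    int a;
};

class with_method
{
public:
    int a;
    int sum(int b) { return a + b; }
};

// Size the compiler computed for the class (usually 4 bytes).
std::size_t size_of_with_method()
{
    return sizeof(with_method);
}

// Small self-check that can be called from main().
void check_class_size()
{
    with_method w;
    w.a = 2;
    assert(w.sum(3) == 5);                                   // the function still works
    assert(sizeof(with_method) == sizeof(without_method));   // and costs no space
}
```

The member function is compiled once, much like a global function that receives the object's address as a hidden argument, so each instance only has to hold a.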

class shape
{
public:
	color mycolor;
	float unit_cost;  // assumed: price per unit of area
	float cost() { return unit_cost * Area(); }
	virtual float Area() = 0;
};

class square : public shape
{
	float side;
	float Area() { return side * side; }
};

class circle : public shape
{
	float radius;
	float Area() { return 3.141592653589793 * radius * radius; }
};

Here we see a class hierarchy with a virtual function. A virtual function is one where, given only a pointer to the generic base class, we can call the function and get the version appropriate to the specific type of object. For example:

int main()
{
     shape *obj;
     obj = getShape();  // Some function that returns us a shape
     cout << obj->Area();
     return 0;
}

This little program calculates the area of a shape without our knowing at compile time what kind of shape it is. The compiler implements this with a function pointer. Normally functions are not part of a class's size, but in this case a pointer is stored in each object so that the call reaches the correct version. (Most compilers store a single pointer to a per-class table of function pointers, the vtable, rather than one pointer per virtual function.)
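To make the function pointer idea concrete, here is a hand-written sketch of roughly what a compiler might generate for the shape hierarchy. Every name in it (shape_vtable, shape_obj, square_obj, circle_obj, call_area, and so on) is hypothetical, and real compilers differ in the details.

```cpp
#include <cassert>

struct shape_obj;

// One table of function pointers per "class".
struct shape_vtable
{
    float (*area)(const shape_obj *self);
};

// Every object begins with a pointer to its class's table.
struct shape_obj
{
    const shape_vtable *vptr;
};

struct square_obj
{
    shape_obj base;   // must be the first member
    float side;
};

struct circle_obj
{
    shape_obj base;   // must be the first member
    float radius;
};

float square_area(const shape_obj *self)
{
    // Valid because base is the first member of square_obj.
    const square_obj *s = reinterpret_cast<const square_obj *>(self);
    return s->side * s->side;
}

float circle_area(const shape_obj *self)
{
    const circle_obj *c = reinterpret_cast<const circle_obj *>(self);
    return 3.14159265f * c->radius * c->radius;
}

const shape_vtable square_table = { square_area };
const shape_vtable circle_table = { circle_area };

// Roughly what obj->Area() turns into: load the table pointer from
// the object, load the function pointer from the table, and call it.
float call_area(const shape_obj *obj)
{
    assert(obj != nullptr && obj->vptr != nullptr);
    return obj->vptr->area(obj);
}
```

Note that each object grows by one pointer (vptr), which is the extra size the notes refer to, while the tables themselves exist only once per class.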

Unions and Variant Records

A union is a structure where all of the members share the same memory space.

union
{
     int a;
     double d;
} u;

This will create a block of memory that is 8 bytes long (assuming a 4-byte int and an 8-byte double). When a is referenced, it uses the first 4 bytes of the union. When d is referenced, it uses the entire 8 bytes. This can be dangerous, since we could do this:

u.d = 12.34;
u.a = 1234;
cout << u.d;

The results of this are architecture specific, since assigning to the int part of the union changes the first 4 bytes of the double. If the double is represented in IEEE floating point, then on a big-endian machine this clobbers the sign, the exponent, and part of the mantissa; on a little-endian machine it clobbers only the low-order mantissa bits. In addition to unions, Pascal had a notion of variant records:

RECORD
      CASE T: INTEGER OF
      1: (A: INTEGER);
      2: (D: REAL)
END

This structure works just like a union, except that the tag field T records which variant is currently in use, so the record is meant to act as an int or as a double, but not both at the same time.
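The same idea can be sketched in C++ by pairing a union with an explicit tag. This is only an illustration under assumed names (int_or_double, make_int, make_double, get_int, as_double); it is not how any particular Pascal compiler lays out a variant record.

```cpp
#include <cassert>

// Hypothetical C++ counterpart of a variant record: the tag t says
// which member of the union currently holds a value.
struct int_or_double
{
    int t;          // 1 means a is active, 2 means d is active
    union
    {
        int a;
        double d;
    };
};

int_or_double make_int(int value)
{
    int_or_double v;
    v.t = 1;
    v.a = value;
    return v;
}

int_or_double make_double(double value)
{
    int_or_double v;
    v.t = 2;
    v.d = value;
    return v;
}

// Checked access: refuses to read the int when the double is active.
int get_int(const int_or_double &v)
{
    assert(v.t == 1);
    return v.a;
}

// Reads only the member selected by the tag.
double as_double(const int_or_double &v)
{
    return (v.t == 1) ? static_cast<double>(v.a) : v.d;
}
```

Checking the tag before every access is what prevents the clobbering problem shown with the plain union, at the cost of a few extra bytes per object.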

Arrays

One-dimensional arrays are easy to represent in memory as linear chunks. When generating code to reference into an array, there are several things we must do. First, we need to calculate the size of the array. This can be done by simply multiplying the number of elements by the size of each element.

sizeof(A) == num_elements * size_per_element

so

int a[10]

yields

sizeof(a) == 10 * 4 == 40

Some problems arise on architectures where variables are required to be word aligned. We must introduce 'padding' in order to make every element begin on a word boundary.

Run-time vs. Compile-time

Compile-time

This means that the operation happens when the code is being compiled. This is good, since the work never has to be done again once the code is compiled. For example, the offset of a[4] can be computed at compile time. However, it isn't always possible to compute everything at compile time.

Run-time

When we don't have enough information to pre-calculate things, we must generate code that will calculate values at run-time, e.g. for a[i].

So, getting back to arrays, how does one go about calculating the address of a[i]? The answer is that we simply take the address of a and add the offset of element i to it. The offset is i times the size of each element.

ADR(a[i]) == ADR(a) + i * sizeof(int)
ADR(a[i]) == ADR(a) + i * 4
which could equal 100 + 4 * i, if a starts at address 100
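The address formula can be checked against what the compiler itself does. The sketch below is only an illustration; the function names element_address and matches_compiler are made up.

```cpp
#include <cassert>
#include <cstddef>

// Computes the address of element i by hand: the base address plus
// i times the element size, measured in bytes.
const void *element_address(const int *base, std::size_t i)
{
    assert(base != nullptr);
    const char *bytes = reinterpret_cast<const char *>(base);
    return bytes + i * sizeof(int);
}

// Compares the hand computation with the compiler's own &a[i].
bool matches_compiler(const int *a, std::size_t i)
{
    return element_address(a, i) == static_cast<const void *>(&a[i]);
}
```

When i is a constant such as 4, the compiler can fold i * sizeof(int) into a fixed offset at compile time; when i is a variable, the multiply and add must be emitted as run-time code.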

Some Optimizations

We can optimize in some specific cases, for example, loops.

for(i = 0; i < 1000000; ++i)
{
       a[i] = 0;  // aka: *(adr(a) + i * 4) = 0
}

If we look at what is actually happening here, we can see that a lot of this calculation is being performed over and over again to calculate nearly the same thing. We can optimize this to:

END = ADDR(A) + sizeof(A);
for(LOC = ADDR(A); LOC < END; LOC = LOC + 4)
{
      *LOC = 0;
}
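In real C++ the two loops might look like the sketch below. The function names clear_indexed and clear_pointer are hypothetical, and a modern optimizing compiler will often perform this transformation on its own.

```cpp
#include <cassert>
#include <cstddef>

// Indexed version: conceptually recomputes base + i * sizeof(int)
// on every pass through the loop.
void clear_indexed(int *a, std::size_t n)
{
    assert(a != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i)
        a[i] = 0;
}

// Pointer version: the address is bumped by one element per pass
// and compared against an end address computed once.
void clear_pointer(int *a, std::size_t n)
{
    assert(a != nullptr || n == 0);
    int *end = a + n;                       // ADDR(A) + sizeof(A)
    for (int *loc = a; loc < end; ++loc)    // ++loc advances sizeof(int) bytes
        *loc = 0;
}
```

The multiplication inside the loop has been replaced by a single addition per iteration, and the separate counter i is no longer needed.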

Now consider the case of arrays of structures. Say that we wanted to make an array of the structure below.

struct
{
      int i;
      char c;
} b[10];

The problem that is going to appear is that the structure itself holds only 5 bytes of data, but each element must be word aligned. This means that it is going to be padded to 8 bytes, creating a rather space-inefficient storage mechanism. One way to combat this problem is to store each of the two members in a separate array. Although this would be slower, since reaching both members of one element takes an extra address computation and memory reference, it would be more space efficient. There are also caching problems with this, since the two members of one element are no longer adjacent in memory.
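The two layouts can be compared with sizeof. This is a sketch under the assumption of a typical machine with a 4-byte, 4-byte-aligned int; the type and function names (record, records_aos, records_soa, payload_bytes, padded_bytes, element_offset) are invented for the example.

```cpp
#include <cassert>
#include <cstddef>

// The structure from the notes, given a hypothetical name.
struct record
{
    int i;
    char c;
};

// Array-of-structures layout: each element carries its own padding.
struct records_aos
{
    record b[10];
};

// Separate-arrays layout: one array per member, no per-element padding.
struct records_soa
{
    int i[10];
    char c[10];
};

// Bytes of data actually stored per element (5 with a 4-byte int).
std::size_t payload_bytes() { return sizeof(int) + sizeof(char); }

// Bytes each array element occupies once padding is added (typically 8).
std::size_t padded_bytes() { return sizeof(record); }

// Byte offset of element k in the array-of-structures layout.
std::size_t element_offset(std::size_t k)
{
    assert(k < 10);
    return k * sizeof(record);
}
```

On such a machine records_aos typically takes 80 bytes, while records_soa needs 50 bytes of data plus a little padding at the very end.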

Some Programming Language Issues

Designers of the C/C++ languages decided that arrays would use 0 indexing. This means that the array's first element is element 0. Other languages allow an arbitrary lower bound. You can do things like:

Array[1900 ... 1998]

The thought behind 0 indexing was that it was faster. This is not necessarily true, since we can optimize lookups into arrays with arbitrary bounds by doing some pre-computing.

ADR(B) + size * i - 1900 * size    // This is how we compute the address
1000 + 4 * i - 7600                // with ADR(B) = 1000 and size = 4
-6600 + 4 * i                      // This is just as fast as 0 indexing.
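The pre-computation can be sketched in C++ as follows. The names first_year, last_year, bias, byte_offset and element are assumptions made for the example; the point is only that the 1900 * size term is a constant, so each access still costs one multiply and one add.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical array indexed from 1900 to 1998 inclusive.
const int first_year = 1900;
const int last_year = 1998;

// Folded once at compile time: 1900 * 4 == 7600 with a 4-byte int.
const std::ptrdiff_t bias =
    first_year * static_cast<std::ptrdiff_t>(sizeof(int));

// Byte offset of element `index` from the start of the storage:
// index * size - 1900 * size.
std::ptrdiff_t byte_offset(int index)
{
    assert(index >= first_year && index <= last_year);
    return index * static_cast<std::ptrdiff_t>(sizeof(int)) - bias;
}

// Returns a reference to the element with the given year index.
int &element(int *storage, int index)
{
    char *bytes = reinterpret_cast<char *>(storage);
    return *reinterpret_cast<int *>(bytes + byte_offset(index));
}
```

A compiler for a language with arbitrary bounds can go one step further and fold the constant into the base address itself, which is the -6600 in the worked numbers above.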

Multi-Dimensional Arrays

Multi-dimensional arrays are almost the same as linear arrays, except that we add one more term when computing the address: the row length times the row number. This would change on a column-major system, though. Row major is where the array is organized such that the rows are filled in first, so the elements of each row are adjacent in memory. Column-major order fills the columns first, so the elements of each column are adjacent. Row major is much more popular.
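Both orderings can be written as small index functions. The sketch below uses made-up names (row_major_index, column_major_index, agrees_with_compiler) and counts offsets in elements rather than bytes; multiply by the element size to get a byte offset.

```cpp
#include <cassert>
#include <cstddef>

// Row major: skip `row` full rows, then move `col` elements along.
std::size_t row_major_index(std::size_t row, std::size_t col, std::size_t cols)
{
    assert(col < cols);
    return row * cols + col;
}

// Column major: skip `col` full columns, then move `row` elements down.
std::size_t column_major_index(std::size_t row, std::size_t col, std::size_t rows)
{
    assert(row < rows);
    return col * rows + row;
}

// C and C++ store built-in 2-D arrays in row-major order, so the hand
// computation should agree with the compiler's own indexing.
bool agrees_with_compiler(std::size_t row, std::size_t col)
{
    static int m[3][4];
    assert(row < 3 && col < 4);
    const char *base = reinterpret_cast<const char *>(&m);
    const char *by_hand = base + row_major_index(row, col, 4) * sizeof(int);
    return by_hand == reinterpret_cast<const char *>(&m[row][col]);
}
```

For a 3 by 4 array, element (1, 2) lands at offset 6 in row-major order and at offset 7 in column-major order.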