On the matter of indentation and whitespace.
aka
Coding Guidelines
Bolo
Introduction
The purpose of this note is to introduce you to a uni-
form scheme for indenting and formatting code. This is not
supposed to be a rigid standard that can not be broken; nei-
ther is it an attempt to put a straitjacket on you and con-
fine your thinking. What it is is an attempt to make the
code that everyone writes for the system easily understand-
able, at least at the obvious formatting level, by everyone
else.
Code any way you like in your own personal code. When
you are working on the system, please try to use the following
guidelines. And that is EXACTLY what these are: GUIDELINES.
If everyone follows them, the code is easy to read
for everyone. And when you are trying to figure out what is
going wrong, at least having a common style gets rid of one
layer of obfuscation.
Warning
Don't be over-zealous about rewriting existing code to
have the new indentation. It wreaks havoc with the CVS
diffs, since you can't "look through" a formatting change
easily. Instead, apply the guidelines to new code that you
write, and to methods and class declarations that you
substantially rewrite.
If you partially rewrite something and are also going
to add new things into it, please don't do the rewrite and
the indentation changes at the same time. Instead, just do
a set of formatting changes only to the existing code and
check that in with a note that the check is only a bunch of
formatting changes. Afterwards, go add the new stuff to the
class. It is a lot easier to deal with changes like that
when the format change is different from
Why have Coding Guidelines?
One of the big deals about indentation is that it
allows one to easily understand the structure of the code
without having to read it line by line. Poorly indented
code is like reading a book or paper with bad grammar or
structure. You can read it, but it is painful and time
consuming, not something that you do easily and enjoy.
Why is this important? Well, other people need to look
at your code. And quite often it is code that they have
never seen before, or that they rarely look at. And on top
of it, they are probably looking at the code because they
are tracking down a bug and trying to get it fixed. Which
means that they have a lot of code to look at. If they have
to take a lot of time to understand your code, often curso-
rily, just to verify that it works correctly, it cuts down
tremendously on the actual time spent finding and fixing
bugs! Instead, if they can just browse over your well writ-
ten and formatted code, verify that it seems to be doing the
right thing, then they can move on to the next thing they
have to look at to track down the bug. When you are looking
at a stack traceback of two or three items, this isn't a big
deal. When you are looking at 30 stack tracebacks, each
20-30 items deep, stuff like this is a big deal, and it can
end up consuming a lot of time and enthusiasm and patience
of the person doing the debugging.
What is Coupling?
In this note, and in some others, I'll mention coupling
a lot. But, you ask, what the *^%^ is it? It is the
dependency of one portion of a system on another portion of
a system. It exists at several levels. To be brief, the
levels are include file, linking, class declaration or
interface, and class definition or implementation.
At some point you have to have some sort of use of
other components of a system, or otherwise you don't have a
system. At the same time, you must be very careful about
what portions of components you expose to be seen by others.
As the amount of exposed stuff increases, the coupling of
the system increases, as well as its complexity. This
affects things in several ways, and they are usually all
bad.
o For example, excessive include-file coupling can mean
that touching an innocuous include file somewhere
causes the whole system to recompile. Or it can mean that
several include files depend upon other include files to
include stuff that they themselves require.
o Excessive coupling at link time means that you need to
drag in all sorts of libraries and objects that you
don't really want or need.
o Coupling at class declaration level means that the dec-
laration of a class requires exposing, via include
files, the definition of classes that the class in
question uses, either in the definition of the class,
or in the interface of the class.
o Lastly, coupling at the implementation level is when a
class uses another class. To build a system, you need
a certain amount of coupling at this level. However,
the interface between the classes needs to be well
designed to avoid making the class couple to the imple-
mentation of another class, instead of to the interface
of another class.
When coupling occurs, parts of a system become depen-
dent, needlessly, on other parts. It also means that you
can't test just a portion of the system, because it is cou-
pled too closely to the rest of the system and can't be run
in isolation. In a word, coupling is bad and you should
really try to avoid it.
Guidelines
Here we go ...
o Indents are for normal 8 char tab stops. It is what
everyone has available. It is what all the printers
and tools and everything use. It shows enough separation
that it is easy to match indent levels.
o Don't indent at the file level because of namespaces.
o For functions/methods, the open brace is at the start of
the line after the definition.
o For normal control structures, such as if/while/else
the open brace follows the if/while/else.
o Closing braces are on a line by themselves at the same
indent level as the matching if/while/then/else state-
ment.
o The brace to start a function or method definition
should be on a line by itself, in the first column,
following the function name and arguments. The end
brace matches it, in the first column.
o The one place I recommend violating the previous guide-
line is in the case of methods declared inside a class
declaration. That is someplace that whitespace is a
precious commodity. In that case, put the brace after
the name and arguments. The same goes for the ':'
initializers in this case. If you can fit everything
nicely on one line, even better, do it!
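Here is a rough sketch of how I read the brace rules above. The names Counter and clamp are made up for illustration; only the layout matters.

```cpp
#include <cassert>

/*
 * Methods declared inside the class keep the brace on the same line,
 * because vertical space in a class declaration is precious.
 */
class Counter {
        int     _count;

public:
        Counter() : _count(0) { }
        int count() const { return _count; }
        void bump(int n);
};

/* Outside the class, the function braces sit alone in column one. */
void Counter::bump(int n)
{
        while (n > 0) {
                _count++;
                n--;
        }
}

int clamp(int value, int low, int high)
{
        int     result = value;

        assert(low <= high);
        if (value < low) {
                result = low;
        }
        else if (value > high) {
                result = high;
        }
        return result;
}
```

The open brace follows the if/while/else, and each closing brace sits on its own line at the indent level of the statement it closes.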
o Decl arguments are on the decl line, and if you have to
introduce a line break, the following arguments should
match the indentation of the first. If it is a really
long function name, such that the decls would wrap any
which way you try, break the first argument line and
just indent everything a bit so that the decls fit on a
line without wrapping.
o Always put c preprocessor (cpp) commands in the first
column. Don't indent them. Also, don't put the '#' in
the first column and indent the body of the cpp com-
mand.
o Separate the cpp command from the argument; don't just
blast them together like #include"foobar.h".
o Use member variable ':' inits. The ':' should be in
the first column. The inits should be in order of dec-
laration, and specified one per line.
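A small sketch of the preprocessor and ':' init layout, as I understand the rules above. Buffer and SKETCH_DEBUG are hypothetical names, and placing the continuation inits two columns in is my own assumption about how to line them up.

```cpp
#include <cassert>
#include <string>

class Buffer {
        std::string     _name;
        int             _size;
        int             _used;

public:
        Buffer(const std::string &name, int size);
        const std::string &name() const { return _name; }
        int free_space() const { return _size - _used; }
};

/* The ':' is in the first column; inits follow declaration order. */
Buffer::Buffer(const std::string &name, int size)
: _name(name),
  _size(size),
  _used(0)
{
#ifdef SKETCH_DEBUG
        /* The directive stays in column one even inside a block. */
        assert(size >= 0);
#endif
}
```

Keeping the inits in declaration order matches the order in which C++ actually initializes the members, so what you read is what runs.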
o Always indent stuff; don't just shove it all on one
line. For example, 'if (err) return err;' does nothing
to make the structure of the code easier to read and
understand. The indentation gives a hint that
something is going on and needs to be looked at. However,
the one-line form can be used to effect in a small function
or method, where there isn't a lot to read. In a larger
function or method, such a non-indented structure is
something begging to be ignored.
o Don't use extra spaces to separate tokens in the code,
such as around parentheses in expressions and such.
They actually make things more difficult to read.
o I recommend placing data members first in a class dec-
laration. Follow the data members with internal, pri-
vate methods. Follow the internal methods with pro-
tected methods. And last, expose the user interface to
the class, the public methods.
o A big block comment is usually telling you that some-
thing is important and should be read. Lesser comments
provide info about what is going on. One liners give
you a hint about something that isn't obvious.
o Don't use big block comments often, especially in class
definitions. Or in the midst of code. There is a con-
stant battle going on trying to stuff enough informa-
tion on a "screen" or a "page" so that people can
encompass the code and understand it better. Adding
large comments in the middle just spreads the code farther
apart and makes it more difficult to understand its
entire structure.
o Separate blocks of declarations from blocks of code.
If code falls in with declarations it is often glossed
over as part of the declaration.
o When you have a block of declarations, or sometimes even
a single declaration, it is good to separate the declaration
names from the declaration types. Do this with
a tab or two so all the names line up. It makes the
variable declarations easier to read.
o Don't use the "C++" style of declaration modifiers that
Stroustrup uses in his style. To be brief, that
groups modifiers with the declaration type instead of
the declaration name. Instead, do the normal "C" style
of declaring where modifiers, such as & and * are
grouped against the declared name. This immediately
raises a flag to a reader that something is different
about the declaration. With the other way, these
important hints are often lost in the noise.
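As a sketch of the last few declaration rules taken together, consider the following. The function count_char is made up for the example. Note that 'const char *p, *start;' declares two pointers, while 'const char* p, start;' would quietly declare start as a plain char, which is one reason to keep the '*' against the name.

```cpp
#include <cassert>
#include <cstddef>

std::size_t count_char(const char *text, char wanted)
{
        /* Names line up in one column; '*' and '&' stay with the name. */
        const char      *p, *start;
        char            c;
        std::size_t     hits = 0;
        std::size_t     &total = hits;

        /* A blank line separates the declarations from the code. */
        start = text;
        for (p = start; *p != '\0'; p++) {
                c = *p;
                if (c == wanted) {
                        total++;
                }
        }
        return hits;
}
```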
o C++ public:, private:, and protected: keywords in class
declarations should not be indented, so they stand out
clearly.
o If you use goto's, the labels you use should be unindented.
Unindenting them by 1/2 level works well in this case.
o If you have friend declarations inside of a class,
unindent them a half indent. Also you should explain
why these classes are friends of the class in question.
o It aids understanding of code considerably if you start
all data members with an underscore. When you see that
there is no doubt about what is happening, or where
that variable magically came from.
o Often you have externally visible methods in a class,
which are just wrappers for internal methods that do
the real work. In this case, prefix the name of the
internal method with an underscore.
o If it is non-obvious why an internal method exists, you
could always prefix something to it to indicate why it
is internal. Such a prefix can also make understanding
easier, so people don't inadvertently use the internal
method incorrectly. For example, _unlocked_method() to
indicate that the method assumes the caller in the
class has provided locking as necessary.
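Here is one way the class layout and naming rules above might look in practice. Lock is a hypothetical stand-in that only tracks whether it is held, and Account is invented for the example; a real system would use its own locking primitive.

```cpp
#include <cassert>

/* Hypothetical stand-in for a real mutex; it only tracks its state. */
class Lock {
        bool    _held;

public:
        Lock() : _held(false) { }
        void acquire() { _held = true; }
        void release() { _held = false; }
        bool held() const { return _held; }
};

class Account {
        /* Data members first, each starting with an underscore. */
        Lock    _lock;
        long    _balance;

private:
        /* Internal method: assumes the caller already holds _lock. */
        long _unlocked_deposit(long amount);

public:
        Account() : _balance(0) { }
        long deposit(long amount);
};

long Account::_unlocked_deposit(long amount)
{
        assert(_lock.held());
        _balance += amount;
        return _balance;
}

/* The public method is a wrapper that provides the locking. */
long Account::deposit(long amount)
{
        long    result;

        _lock.acquire();
        result = _unlocked_deposit(amount);
        _lock.release();
        return result;
}
```

The underscore and the "unlocked" prefix together tell a reader both that the method is internal and what it expects of its caller.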
o If you have a method or function that isn't implemented,
don't just let it sit there and do nothing, or
return that it succeeded. If it returns an error, have
it return the unimplemented error. If it doesn't
return an error, crash the system. It may not be ele-
gant, but it will get your attention. Otherwise, peo-
ple will wonder why in the world everything is appar-
ently working but not producing the correct results.
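A minimal sketch of both cases. The Store class and its error codes are hypothetical; the system's real "unimplemented" error would take the place of NOT_IMPLEMENTED.

```cpp
#include <cassert>
#include <cstdlib>

class Store {
public:
        /* Hypothetical error codes for this sketch. */
        enum Error {
                OK = 0,
                NOT_IMPLEMENTED
        };

        Error compact();
        void rebuild_index();
};

/* Returns an error, so report that nothing was actually done. */
Store::Error Store::compact()
{
        return NOT_IMPLEMENTED;
}

/* Cannot return an error, so stop where it will be noticed. */
void Store::rebuild_index()
{
        std::abort();
}
```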
o #defines are bad news. They pollute the global names-
pace, and invade the context of all classes. Then what
happens is that people start using them because they
are conveniently there. And, portions of the system
become coupled together. In C++, the best way to avoid
#define use is to define enumerations at the class
level. This firmly scopes that information, and also
ensures that the correct values are being used.
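For instance, rather than a '#define MAX_RETRIES 3' that leaks into every file, the value can live in the class that owns it. Connection and MAX_RETRIES are names invented for this sketch.

```cpp
#include <cassert>

class Connection {
        int     _retries;

public:
        /* Scoped to the class: outsiders say Connection::MAX_RETRIES. */
        enum Limits {
                MAX_RETRIES = 3
        };

        Connection() : _retries(0) { }
        bool retry();
};

bool Connection::retry()
{
        if (_retries >= MAX_RETRIES) {
                return false;
        }
        _retries++;
        return true;
}
```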
o Global variables are another thing that is bad. Global
class instances are even worse. Why? Well, global
variables are unencapsulated state. Global class dec-
larations mean that a global constructor needs to be
run for a class. The ordering and error catching of
those global class instances is all random. You can't
recover gracefully from errors, or ensure an ordering
that works correctly. They cause real problems and I
encourage not using them. Instead, they should be
scoped inside a class at the very least. However, see
the next entry ...
o What I said above for global variables goes equally
well for static class members, for all the same rea-
sons. The better thing to do, if you really need the
equivalent of a class static, is to make a class that
holds all the things that would be static in a class.
Each instance of the class can have a reference to the
"holder" class. Doing things this way also guarantees
that you can instantiate multiple, independent versions
of the class and its holder class. In some cases, where
there are many instances and memory use becomes an issue,
perhaps a class static pointing to the "holder" class
is in order, or even a global variable. That breaks the
ability to have multiple independent versions, but it is an
optimization that can be done at a later date.
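A sketch of the "holder" idea, with hypothetical names Widget and WidgetShared. What would have been a class static (the next id to hand out) lives in the holder, and each Widget keeps a reference to it.

```cpp
#include <cassert>

/* Holds what would otherwise be class statics of Widget. */
class WidgetShared {
        int     _next_id;

public:
        WidgetShared() : _next_id(1) { }
        int allocate_id() { return _next_id++; }
};

class Widget {
        WidgetShared    &_shared;
        int             _id;

public:
        Widget(WidgetShared &shared);
        int id() const { return _id; }
};

Widget::Widget(WidgetShared &shared)
: _shared(shared),
  _id(shared.allocate_id())
{
}
```

Two holders give two fully independent families of widgets, which a class static could not do.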
o Eliminate include file coupling by ensuring that the
include file for a class includes all include files
that a class needs for its own declaration. Don't
include things that the class needs for its implementation,
though, because that exposes the implementation,
or parts of it, to the outside world. While that may
not increase coupling, it certainly does increase the
amount of work the compiler has to do to compile a
given file. Multiply that by the number of other
source files including that file, directly or indirectly,
and it is a big overhead.
o If you just need some classes in the interface of a
class, don't #include the include files for those
classes. Instead, use C++'s ability to have a forward
declaration for a class. You'll need to #include the
proper include files in the implementation of the
class, but at least all that junk won't need to be seen
by the rest of the world.
o Think hard about putting instances of one class in the
declaration of another class. When you do that, you
create a coupling. Sometimes it is not really bad,
sometimes it is necessary, for example, with a template
class. Perhaps it is better to use a pointer to that
class, or to use an implementation-only class to hold
random information. This way you don't need to expose
portions of the implementation to users of the class in
question.
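To make the last few items concrete, here is a single-file sketch that pretends to be two files. Parser and Lexer are hypothetical classes, and the "parser.h" and "parser.cpp" markers are only comments. The point is that the header part names Lexer with a forward declaration and holds it by pointer, so users of Parser never see Lexer's definition.

```cpp
#include <cassert>
#include <string>

/* ---- what would be in parser.h ---- */

/* Forward declaration; no #include of the Lexer header is needed. */
class Lexer;

class Parser {
        Lexer   *_lexer;

        /* Declared but not defined, so a Parser cannot be copied. */
        Parser(const Parser &);
        Parser &operator=(const Parser &);

public:
        Parser(const std::string &text);
        ~Parser();
        int count_words();
};

/* ---- what would be in parser.cpp, which includes the Lexer header ---- */

class Lexer {
        std::string             _text;
        std::string::size_type  _pos;

public:
        Lexer(const std::string &text) : _text(text), _pos(0) { }
        bool next_word();
};

/* Skips blanks, then consumes one word; false when none is left. */
bool Lexer::next_word()
{
        while (_pos < _text.size() && _text[_pos] == ' ') {
                _pos++;
        }
        if (_pos >= _text.size()) {
                return false;
        }
        while (_pos < _text.size() && _text[_pos] != ' ') {
                _pos++;
        }
        return true;
}

Parser::Parser(const std::string &text)
: _lexer(new Lexer(text))
{
}

Parser::~Parser()
{
        delete _lexer;
}

int Parser::count_words()
{
        int     words = 0;

        while (_lexer->next_word()) {
                words++;
        }
        return words;
}
```

If Lexer changes, only the implementation file needs to recompile, not everything that includes the Parser header.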
o Be very careful about exposing data types used in the
implementation of a class in the interface of that
class. Especially data types provided by a third-party
software package, or even the underlying operating sys-
tem. Doing that exposes users of your class to that
third party package or the OS. It is better to declare
your own types, system-wide if need be, and use them
instead.
o Don't optimize code prematurely by using inline direc-
tives, or by coding things directly in the class defi-
nition. Instead, place the implementation in the .cpp
file. If profiling shows that something needs to be
optimized, then that can be done, on-purpose, later on.
o If you do have code that is inline, or have moved
something from a .cpp to a .h so it can be inline, do
not put big chunks of code in the class definition.
Declare the methods inline and put them after the class
declaration in the include file. And code them normally,
just as they would be in the .cpp file; don't try to
compact them.
o If a method does any decision making, versus just act-
ing as a dumb accessor, think several times before
putting it inline or in the class declaration. If you
ever need to change something, crunch, suddenly you
need to recompile a larger amount of code. Same thing
holds for constructors and destructors, especially if a
class has pointers or other things that might need to
be debugged in the future.
o Don't bother with copying virtual keywords in method
declarations from an inherited-from class that declares
a method virtual. In other words, the virtual declara-
tion of a method should only exist in the class where
the method is virtualized.
o Factoring of code is important. Even if it is only used
in one place, and dragged in inline to make it effi-
cient, factoring of chunks of code can greatly add to
the understanding of something. If something is used
multiple times, it is also an excellent candidate for
factoring.
If you find yourself copying code, don't. This is an
excellent indication that the code needs to be factored
into a method or function so it can be used by multiple
callers.
o If you virtualize any methods in a class declaration,
the destructor for that class must also be virtualized.
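A short sketch of why the destructor has to be virtual. Shape and Square are invented names, and the counter exists only so the example can show that both destructors ran when the object was deleted through a base-class pointer.

```cpp
#include <cassert>

class Shape {
protected:
        int     &_destroyed;

public:
        Shape(int &destroyed) : _destroyed(destroyed) { }
        virtual ~Shape() { _destroyed++; }
        virtual int sides() const { return 0; }
};

/* The virtual keyword is not repeated in the derived class. */
class Square : public Shape {
public:
        Square(int &destroyed) : Shape(destroyed) { }
        ~Square() { _destroyed++; }
        int sides() const { return 4; }
};

int destroy_through_base(Shape *shape)
{
        int     sides = shape->sides();

        delete shape;
        return sides;
}
```

Without the virtual on ~Shape(), deleting a Square through a Shape pointer is undefined behavior, and in practice the Square destructor is usually skipped.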
o When you are designing or implementing a component,
remember that the interface is everything. Given a
good interface, you can write an absolutely sucky,
stupid implementation to prototype something and get it
off the ground and running. When the implementation
shows its limitations, there is only one place you
will need to revisit to improve things, and that is the
implementation of that component alone. You won't end
up changing everything that uses the class to adapt to
its new implementation.
o It is said that everything is in a name. That is
entirely true when writing code. A good name can make
the difference between instantly understanding what is
going on, and spending lots of time trying to under-
stand what is going on. Take the time and think about
a good name for what you are doing. Don't name some-
thing after how it does it, but rather after what it
does.
Newer Items
o Something to mention is that I often do not re-indent
some code. Rather I leave it as it is and work in its
existing indentation. Typically this is because the
code needs to be totally redesigned and rewritten, and
there just isn't the time to do it. So, you keep
patching it and trying to make minor cleanups, once you
are aware of some of the issues involved, that will
make the eventual rewrite easier.
o If a class has any complex state associated with it,
add a print() method and an operator<<() to it. If it
is needed during debugging, perhaps even add an extern
"C" function that can be used from the debugger to
print the object. This print method makes it much eas-
ier to determine what the object is doing, and to
determine if it is in a valid state. This is true
whether you are using the debugger, trying to debug a
problem, or whatever.
o If you are going to make a class printable, give the
class an ostream &T::print(ostream &) const method.
Also add an ostream &operator<<(ostream &, const T &)
function that is used to print the class. This way the
output operator does not need to be a friend of the
class. It also means the print method can be written
as a normal method with access to class members and
statics, instead of something else.
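Put together, the two printing items might look like the sketch below. Point, to_text, and debug_print_point are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

class Point {
        int     _x;
        int     _y;

public:
        Point(int x, int y) : _x(x), _y(y) { }
        std::ostream &print(std::ostream &o) const;
};

std::ostream &Point::print(std::ostream &o) const
{
        return o << "Point(" << _x << ", " << _y << ")";
}

/* Not a friend: it only calls the public print() method. */
std::ostream &operator<<(std::ostream &o, const Point &p)
{
        return p.print(o);
}

/* Callable by its plain name from a debugger. */
extern "C" void debug_print_point(const Point *p)
{
        std::cerr << *p << std::endl;
}

std::string to_text(const Point &p)
{
        std::ostringstream      out;

        out << p;
        return out.str();
}
```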
o The case ...: components of a switch statement are
indented at the same level as the switch itself. The
stanzas of the case are at the next indent level.
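In other words, something like this, where Color and color_name are made up for the illustration:

```cpp
#include <cassert>
#include <string>

enum Color {
        RED,
        GREEN,
        BLUE
};

std::string color_name(Color c)
{
        std::string     name;

        switch (c) {
        case RED:
                name = "red";
                break;
        case GREEN:
                name = "green";
                break;
        default:
                name = "other";
                break;
        }
        return name;
}
```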
o If you are using exceptions, be certain to firewall
exceptions properly. You need to do this when entering
and leaving code that does not know about exceptions,
such as libraries and system calls, and callbacks. It
may be necessary to convert an exception to an error so
it can be passed through the component properly, and
then reconvert the error to an exception once exception
land has been reached. If this is not done, the code
that doesn't know about exceptions, but that deals with
errors quite well, will not be able to do normal error
handling and will appear broken.
o Do Not use exceptions. Use errors instead. If you are
calling code that might generate an exception, you need
to generate a firewall to make certain errors are han-
dled correctly.
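A minimal sketch of such a firewall. The Status codes and both function names are assumptions for the example; checked_divide stands in for any code that might throw.

```cpp
#include <cassert>
#include <stdexcept>

/* Hypothetical error codes used on the non-exception side. */
enum Status {
        STATUS_OK = 0,
        STATUS_RANGE,
        STATUS_FAILED
};

/* Code that reports problems by throwing. */
int checked_divide(int a, int b)
{
        if (b == 0) {
                throw std::domain_error("divide by zero");
        }
        return a / b;
}

/*
 * Firewall: no exception leaves this function.  Callers that only
 * understand error codes can call it safely.
 */
Status safe_divide(int a, int b, int &result)
{
        try {
                result = checked_divide(a, b);
        }
        catch (const std::domain_error &) {
                return STATUS_RANGE;
        }
        catch (...) {
                return STATUS_FAILED;
        }
        return STATUS_OK;
}
```

The catch (...) at the end is what makes it a firewall; anything unexpected still comes out as an error code instead of unwinding through code that can't cope with it.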
o If there are exceptions being used in the system, be
very careful to wrap all stateful constructs in some
sort of C++ scoped object so that the state can be
undone properly if an exception goes off.
o If exceptions are used, or it is possible that an
exception can go through a chunk of code, it is necessary
to ensure that all classes used that are exposed to
exceptions in this manner can work in an arbitrary manner.
This means they have to clean up whatever state
that they have and work properly in the face of arbitrary
(non-planned) use.
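One way to wrap a piece of state in a scoped object is sketched below. ScopedValue and work_at_depth are hypothetical; the state here is just an int, but the same shape works for locks, flags, and the like.

```cpp
#include <cassert>
#include <stdexcept>

/* Restores an int to its previous value when the scope ends. */
class ScopedValue {
        int     &_target;
        int     _saved;

        /* Declared but not defined, so the guard cannot be copied. */
        ScopedValue(const ScopedValue &);
        ScopedValue &operator=(const ScopedValue &);

public:
        ScopedValue(int &target, int value);
        ~ScopedValue() { _target = _saved; }
};

ScopedValue::ScopedValue(int &target, int value)
: _target(target),
  _saved(target)
{
        _target = value;
}

void work_at_depth(int &depth, bool fail)
{
        ScopedValue     guard(depth, depth + 1);

        if (fail) {
                throw std::runtime_error("work failed");
        }
}
```

Whether work_at_depth() returns normally or an exception goes off, the destructor puts depth back the way it was.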
o Code the assignment operator so it works correctly in
the case of self-assignment. In most cases, you can do
something like this:
        const A &A::operator=(const A &r)
        {
                if (this == &r)
                        return *this;
                /* the actual assignment */
                return *this;
        }
o Don't say this-> all the time. C++ does it for you!
And, if you are trying to overload an argument name
with something built into the class by this, don't even
try.
o Some guidelines for #if use. Use #ifdef notyet if this
is code that is intended to be used but isn't used yet.
The notyet indicates it is for the future. Use #if 0
and #else as appropriate to show various alternatives
of a piece of code. The one that is currently active
should compile, and it should be in a #if 1 as the
first thing, or a #else as the last. A comment in each
stanza may be appropriate. If you are disabling code
because something else is broken, explain what is bro-
ken.
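For example, under these rules a function with an alternative implementation and a piece of future code might look like this. The checksum function is invented for the sketch, and it assumes nothing defines notyet.

```cpp
#include <cassert>

int checksum(const unsigned char *data, int len)
{
        int     sum = 0;
        int     i;

#if 1
        /* Active version: simple additive checksum. */
        for (i = 0; i < len; i++) {
                sum += data[i];
        }
#else
        /* Alternative: XOR checksum, kept for comparison. */
        for (i = 0; i < len; i++) {
                sum ^= data[i];
        }
#endif

#ifdef notyet
        /* Intended for later: fold the length into the result. */
        sum += len;
#endif
        return sum;
}
```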
o If you find something disgusting or wrong, mark it with
an XXX mark and describe what it is that is wrong.
People can then see the XXX comments and know that they
might need to check out this thing further. It provides
an indication that something is funny and may need to be
looked at. But it isn't such a big issue that it needs
to be brought up in a design or other meeting ... well,
maybe it does. One typical use is a bug that can't be
fixed because of another problem.
o Format code and comments to fit into 80 characters. It
is the "standard" size of printers, terminals, xterms,
and everything else. If you want to be really nice,
only use 79 characters so the 80th character doesn't
cause a line wrap in emacs.
o Sometimes if a wrapped line is short, and breaking it
would just result in something equally ugly, leave it.
o Don't put multiple statements on a line.
o Don't use // comments for block comments. Only use
them for single line comments. An especially bad format
to avoid is multiple indented // comments trailing
the end of the line. It basically makes the comment
impossible to edit easily without an editor that will
edit the comment for you, and it also forces the comment
to be very columnar and take up lots of vertical
space. If you have something to say about something,
say it in a block comment before the something.
o Do not do ad-hoc argument parsing. Use getopt.
o Don't use void indiscriminately. A 'void *' is a
pointer to an unknown data type. You can't do arith-
metic on a 'void *'. A 'char *' is a pointer to a
chunk of memory. You can do arithmetic on a 'char *'.
The lesson on this is that you may want or need to use
a 'void *' in your interfaces so pointers don't need to
be cast. However, you should use 'char *' internally
to point to areas of memory, since the areas have a
length and you probably want to do pointer arithmetic
on them. The two types are completely different, use
them correctly.
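A small sketch of the split between the two pointer types. The function fill_bytes is made up for the example.

```cpp
#include <cassert>
#include <cstddef>

/*
 * The interface takes 'void *' so callers pass any buffer without a
 * cast.  Inside, 'char *' is used because the code walks the bytes.
 */
void fill_bytes(void *area, std::size_t len, char value)
{
        char    *p = static_cast<char *>(area);
        char    *end = p + len;

        while (p < end) {
                *p = value;
                p++;
        }
}
```

A caller can hand in an array of ints, a struct, or anything else, and only the one cast inside the function is needed.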
o Don't try to combine data values and error values in
function returns. If you need to return error codes,
pass in a parameter (by reference) to return the value.
Return the error code via the function return value.
Mixing the two is just bad news. It makes it very dif-
ficult to do uniform error handling, and it grossly
limits your data types.
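As an illustration, here is a sketch of a number parser that keeps the two apart. ParseError and parse_int are hypothetical names. Because the error comes back separately, -1 is just another valid data value and not a secret failure code.

```cpp
#include <cassert>
#include <sstream>
#include <string>

enum ParseError {
        PARSE_OK = 0,
        PARSE_BAD_NUMBER
};

/* The error is the return value; the data comes back by reference. */
ParseError parse_int(const std::string &text, int &value)
{
        std::istringstream      in(text);
        int                     parsed;

        in >> parsed;
        if (in.fail() || !in.eof()) {
                return PARSE_BAD_NUMBER;
        }
        value = parsed;
        return PARSE_OK;
}
```

On failure the caller's value is left untouched.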
o Never assume that there will be only one instance of
your class. Always implement the general case. Just
because you can't see a need for it now doesn't mean
there isn't a need. Implementing for the general case
will also provide a cleaner solution that is easier to
maintain and extend.
o Don't use ********* comment lines in ordinary usage.
Stuff like that should be reserved for BIG EYE OPENING
DISASTER comments or other indications that something
really important or difficult or bad is going on.
o Only deal with those errors that you know how to han-
dle. Handle those errors. Either pass other errors up
to the caller to deal with, or ABORT because you don't
know what to do. If you just let the error slide you
will have big time problems tracking down what is going
wrong.
o Use streams and '<<' and '>>' operators when possible.
It isolates the formatting from type changes to the
arguments and just makes it work. It also gets rid of
the overhead of interpreting the format string, since
the compiler can generate the format code directly. If
you still need to use printf-like formatting, use
form() instead, and output to the I/O streams.
o Don't use stdio I/O, use iostreams I/O.
o C++ is NOT Pascal. There can be multiple exits from a
function or method. It can considerably increase the
readability and comprehension of code to use this capability.
For example, return early when various
conditions are not met, so that the "main body" of the
function contains what really happens. Compare that with
nesting go and no-go if statements and burying
the "proper" behavior of the function inside levels of
indentation and control structure.
o With regards to complex control structure ... If a
function or method is complex enough it can simplify
things to put all the "safety checks" and environmental
changes in a wrapper function. This divorces all the
setup and shutdown complexity from that of the task
being done, so that it is obvious what is being done.
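The last two items can be sketched together. The names below are invented for the example: copy_bytes is the wrapper that does the safety checks and returns early on each failure, and copy_bytes_unchecked is the task itself.

```cpp
#include <cassert>
#include <cstddef>

enum CopyError {
        COPY_OK = 0,
        COPY_BAD_ARGUMENT,
        COPY_TOO_SMALL
};

/* The task itself: no checks, so what it does is obvious. */
void copy_bytes_unchecked(char *to, const char *from, std::size_t len)
{
        std::size_t     i;

        for (i = 0; i < len; i++) {
                to[i] = from[i];
        }
}

/* The wrapper: each failed check returns at once. */
CopyError copy_bytes(char *to, std::size_t room,
                     const char *from, std::size_t len)
{
        if (to == 0 || from == 0) {
                return COPY_BAD_ARGUMENT;
        }
        if (len > room) {
                return COPY_TOO_SMALL;
        }
        copy_bytes_unchecked(to, from, len);
        return COPY_OK;
}
```

Nothing in the wrapper is nested more than one level deep, and the real work is not buried inside the checks.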
Afterwards
This set of guidelines is trying to dump out as much
information as I can think about how to do things well. It
actually extends well past the area of formatting code to
design decisions about code, overall code structure, and
other issues. At some point, I feel that I have barely grazed
the surface of the kinds of issues that you should consider
when designing and writing code. All that is above, and way
more, goes on in my head when I look at code, write code,
design code, design systems, and do everything else that I
do. I hope I have at least provided some idea of the things
you should consider to do a better job designing and imple-
menting a system.
I'll try to do a better job organizing this document in
the future, but it was tough enough just trying to dig up
all the things to consider and get it written down :)