On the matter of indentation and whitespace.
aka Coding Guidelines

Bolo

Introduction

The purpose of this note is to introduce you to a uniform scheme for indenting and formatting code. This is not supposed to be a rigid standard that cannot be broken; neither is it an attempt to put a straitjacket on you and confine your thinking. What it is, is an attempt to make the code that everyone writes for the system easily understandable, at least at the obvious formatting level, by everyone else.

Code any way you like in your own personal code. When you are working on the system, please try to use the following guidelines. And that is EXACTLY what these are: GUIDELINES. If everyone follows them, the code is easy to read for everyone. And when you are trying to figure out what is going wrong, at least having a common style gets rid of one layer of obfuscation.

Warning

Don't be over-zealous about rewriting existing code to have the new indentation. It wreaks havoc with the CVS diffs, since you can't "look through" a formatting change easily. Instead, apply the guidelines to new code that you write, and to methods and class declarations that you substantially rewrite.

If you partially rewrite something and are also going to add new things into it, please don't do the rewrite and the indentation changes at the same time. Instead, just do a set of formatting changes only to the existing code and check that in with a note that the check-in is only a bunch of formatting changes. Afterwards, go add the new stuff to the class. It is a lot easier to deal with changes like that when the format change is separate from the functional change.

Why have Coding Guidelines?

One of the big deals about indentation is that it allows one to easily understand the structure of the code without having to read it line by line. Reading poorly indented code is like reading a book or paper with bad grammar or structure. You can read it, but it is painful and time consuming, not something that you do easily and enjoy.
Why is this important? Well, other people need to look at your code. And quite often it is code that they have never seen before, or that they rarely look at. And on top of it, they are probably looking at the code because they are tracking down a bug and trying to get it fixed. Which means that they have a lot of code to look at. If they have to take a lot of time to understand your code, often cursorily, just to verify that it works correctly, it cuts down tremendously on the actual time spent finding and fixing bugs! Instead, if they can just browse over your well written and formatted code and verify that it seems to be doing the right thing, then they can move on to the next thing they have to look at to track down the bug.

When you are looking at a stack traceback of two or three items, this isn't a big deal. When you are looking at 30 stack tracebacks, each 20-30 items deep, stuff like this is a big deal, and it can end up consuming a lot of the time, enthusiasm, and patience of the person doing the debugging.

What is Coupling?

In this note, and in some others, I'll mention coupling a lot. But, you ask, what the *^%^ is it? It is the dependency of one portion of a system on another portion of a system. It exists at several levels. To be brief, the levels are include file, linking, class declaration or interface, and class definition or implementation.

At some point you have to have some sort of use of other components of a system, or otherwise you don't have a system. At the same time, you must be very careful about what portions of components you expose to be seen by others. As the amount of exposed stuff increases, the coupling of the system increases, as well as its complexity. This affects things in several ways, and they are usually all bad.

o For example, excessive include-file coupling can mean that touching an innocuous include file somewhere causes the whole system to recompile.
  Or it can mean that several include files depend upon other include files to include stuff that they themselves require.
o Excessive coupling at link time means that you need to drag in all sorts of libraries and objects that you don't really want or need.
o Coupling at class declaration level means that the declaration of a class requires exposing, via include files, the definition of classes that the class in question uses, either in the definition of the class, or in the interface of the class.
o Lastly, coupling at the implementation level is when a class uses another class. To build a system, you need a certain amount of coupling at this level. However, the interface between the classes needs to be well designed to avoid making the class couple to the implementation of another class, instead of to the interface of another class.

When coupling occurs, parts of a system become dependent, needlessly, on other parts. It also means that you can't test just a portion of the system, because it is coupled too closely to the rest of the system and can't be run in isolation. In a word, coupling is bad and you should really try to avoid it.

Guidelines

Here we go ...

o Indents are for normal 8 char tab stops. It is what everyone has available. It is what all the printers and tools and everything use. It shows enough separation that it is easy to match indent levels.
o Don't indent at the file level because of namespaces.
o For functions/methods, the open brace is at the start of the line after the definition.
o For normal control structures, such as if/while/else, the open brace follows the if/while/else.
o Closing braces are on a line by themselves at the same indent level as the matching if/while/else statement.
o The brace to start a function or method definition should be on a line by itself, in the first column, following the function name and arguments. The end brace matches it, in the first column.
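To make the brace and indentation guidelines above concrete, here is a minimal sketch. The function and its names are made up for illustration, and eight spaces stand in for one tab stop:

```cpp
#include <cassert>

/*
 * Hypothetical function, invented for illustration only.
 * The brace that opens a function sits alone in the first column,
 * and the closing brace matches it there.
 */
int sum_of_magnitudes(const int *values, int count)
{
        int     total = 0;
        int     i = 0;

        /*
         * For control structures the open brace follows the keyword.
         * Each closing brace is alone on its line, at the indent
         * level of the statement it closes.
         */
        while (i < count) {
                if (values[i] < 0) {
                        total -= values[i];
                }
                else {
                        total += values[i];
                }
                i++;
        }
        return total;
}

// Usage check, also hypothetical.
void check_sum_of_magnitudes()
{
        const int       values[] = {3, -4, 5};

        assert(sum_of_magnitudes(values, 3) == 12);
}
```

The nesting can be read off the left margin alone, which is the whole point of the exercise.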
o The one place I recommend violating the previous guideline is in the case of methods declared inside a class declaration. That is someplace where whitespace is a precious commodity. In that case, put the brace after the name and arguments. The same goes for the ':' initializers in this case. If you can fit everything nicely on one line, even better, do it!
o Declaration arguments are on the declaration line, and if you have to introduce a line break, the following arguments should match the indentation of the first. If it is a really long function name, such that the declarations would wrap whichever way you try, break the first argument line and just indent everything a bit so that the declarations fit on a line without wrapping.
o Always put C preprocessor (cpp) commands in the first column. Don't indent them. Also, don't put the '#' in the first column and indent the body of the cpp command.
o Separate the cpp command from its argument; don't just blast them together, as in #include"foobar.h".
o Use member variable ':' inits. The ':' should be in the first column. The inits should be in order of declaration, and specified one per line.
o Always indent stuff; don't just shove it all on one line, for example 'if (err) return err;'. That does nothing to make the code easier to read or its structure easier to understand. The indentation gives a hint that something is going on and needs to be looked at. However, the one-line form can be used to good effect in a small function or method, where there isn't a lot to read. In a larger function or method, such a non-indented structure is something begging to be ignored.
o Don't use extra spaces to separate tokens in the code, such as around parentheses in expressions and such. They actually make things more difficult to read.
o I recommend placing data members first in a class declaration. Follow the data members with internal, private methods. Follow the internal methods with protected methods.
  And last, expose the user interface to the class, the public methods.
o A big block comment is usually telling you that something is important and should be read. Lesser comments provide info about what is going on. One liners give you a hint about something that isn't obvious.
o Don't use big block comments often, especially in class definitions. Or in the midst of code. There is a constant battle going on trying to stuff enough information on a "screen" or a "page" so that people can encompass the code and understand it better. Adding large comments in the middle just spreads the code farther apart and makes it more difficult to understand its entire structure.
o Separate blocks of declarations from blocks of code. If code falls in with declarations it is often glossed over as part of the declaration.
o When you have a block of declarations, or sometimes even a single declaration, it is good to separate the declaration names from the declaration types. Do this with a tab or two so all the names line up. It makes the variable declarations easier to read.
o Don't use the "C++" style of declaration modifiers that Stroustrup uses in his style. To be brief, that groups modifiers with the declaration type instead of the declaration name. Instead, do the normal "C" style of declaring, where modifiers such as & and * are grouped against the declared name. This immediately raises a flag to a reader that something is different about the declaration. With the other way, these important hints are often lost in the noise.
o C++ public:, private:, and protected: keywords in class declarations should not be indented, so they stand out clearly.
o If you use goto's, the labels you use should be unindented. Half a level works well in this case.
o If you have friend declarations inside of a class, unindent them a half indent. Also you should explain why these classes are friends of the class in question.
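Pulling several of the class-layout guidelines above together, a sketch might look like the following. The Buffer class and all of its members are hypothetical, made up only to show the layout, and eight spaces stand in for one tab stop:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class, invented for illustration only.
class Buffer {
        // Data members first, names lined up, '*' against the name.
        char            *data;
        std::size_t     size;
        std::size_t     used;

        // Internal, private methods next.
        bool            has_room() const;

protected:
        // Protected methods follow the private ones.
        void            reset();

public:
        // The public interface comes last.
        Buffer(char *storage, std::size_t storage_size);

        bool            append(char c);
        std::size_t     length() const { return used; }
};

// The ':' is in the first column; inits are one per line, in order.
Buffer::Buffer(char *storage, std::size_t storage_size)
: data(storage),
  size(storage_size),
  used(0)
{
}

bool Buffer::has_room() const
{
        return used < size;
}

void Buffer::reset()
{
        used = 0;
}

bool Buffer::append(char c)
{
        if (!has_room()) {
                return false;
        }
        data[used++] = c;
        return true;
}

// Usage check, also hypothetical.
void check_buffer()
{
        char    storage[1];
        Buffer  buffer(storage, sizeof(storage));

        assert(buffer.append('x'));
        assert(!buffer.append('y'));
}
```

The access keywords stick out at the left margin, and a reader scanning the name column can find a member without reading the types.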
o It aids understanding of code considerably if you start all data members with an underscore. When you see that, there is no doubt about what is happening, or where that variable magically came from.
o Often you have externally visible methods in a class which are just wrappers for internal methods that do the real work. In this case, prefix the name of the internal method with an underscore.
o If it is non-obvious why an internal method exists, you could always prefix something to it to indicate why it is internal. Such a prefix can also make understanding easier, so people don't inadvertently use the internal method incorrectly. For example, _unlocked_method() indicates that the method assumes the caller in the class has provided locking as necessary.
o If you have a method or function that isn't implemented, don't just let it sit there and do nothing, or return that it succeeded. If it returns an error, have it return the unimplemented error. If it doesn't return an error, crash the system. It may not be elegant, but it will get your attention. Otherwise, people will wonder why in the world everything is apparently working but not producing the correct results.
o #defines are bad news. They pollute the global namespace, and invade the context of all classes. Then what happens is that people start using them because they are conveniently there. And portions of the system become coupled together. In C++, the best way to avoid #define use is to define enumerations at the class level. This firmly scopes that information, and also ensures that the correct values are being used.
o Global variables are another thing that is bad. Global class instances are even worse. Why? Well, global variables are unencapsulated state. Global class declarations mean that a global constructor needs to be run for a class. The ordering and error catching of those global class instances is all random.
  You can't recover gracefully from errors, or ensure an ordering that works correctly. They cause real problems and I encourage not using them. Instead, they should be scoped inside a class at the very least. However, see the next entry ...
o What I said above for global variables goes equally well for static class members, for all the same reasons. The better thing to do, if you really need the equivalent of a class static, is to make a class that holds all the things that would be static in a class. Each instance of the class can have a reference to the "holder" class. Doing things this way also guarantees that you can instantiate multiple, independent versions of the class and its holder class. In some cases, where there are many instances and memory use becomes an issue, perhaps a class static pointing to the "holder" class is in order, or even a global variable, although that gives up the ability to have multiple independent versions. But that is an optimization that can be done at a later date.
o Eliminate include file coupling by ensuring that the include file for a class includes all include files that the class needs for its own declaration. Don't include things that the class needs for its implementation, though, because that exposes the implementation, or parts of it, to the outside world. While that may not increase coupling, it certainly does increase the amount of work the compiler has to do to compile a given file. Multiply that by the number of other source files including that file, directly or indirectly, and it is a big overhead.
o If you just need some classes in the interface of a class, don't #include the include files for those classes. Instead, use C++'s ability to have a forward declaration for a class. You'll need to #include the proper include files in the implementation of the class, but at least all that junk won't need to be seen by the rest of the world.
o Think hard about putting instances of one class in the declaration of another class.
  When you do that, you create a coupling. Sometimes it is not really bad, and sometimes it is necessary, for example with a template class. Perhaps it is better to use a pointer to that class, or to use an implementation-only class to hold random information. This way you don't need to expose portions of the implementation to users of the class in question.
o Be very careful about exposing data types used in the implementation of a class in the interface of that class. Especially data types provided by a third-party software package, or even the underlying operating system. Doing that exposes users of your class to that third-party package or the OS. It is better to declare your own types, system-wide if need be, and use them instead.
o Don't optimize code prematurely by using inline directives, or by coding things directly in the class definition. Instead, place the implementation in the .cpp file. If profiling shows that something needs to be optimized, then that can be done, on purpose, later on.
o If you have code that is inline, or have moved something from a .cpp to a .h so it can be inline, do not put big chunks of code in the class definition. Declare the methods inline and put them after the class declaration in the include file. And code them normally, just as they would be in the .cpp file; don't try to compact them.
o If a method does any decision making, versus just acting as a dumb accessor, think several times before putting it inline or in the class declaration. If you ever need to change something, crunch, suddenly you need to recompile a larger amount of code. The same thing holds for constructors and destructors, especially if a class has pointers or other things that might need to be debugged in the future.
o Don't bother with copying virtual keywords in method declarations from an inherited-from class that declares a method virtual.
  In other words, the virtual declaration of a method should only exist in the class where the method is virtualized.
o Factoring of code is important. Even if it is only used in one place, and dragged in inline to make it efficient, factoring of chunks of code can greatly add to the understanding of something. If something is used multiple times, it is also an excellent candidate for factoring. If you find yourself copying code, don't. This is an excellent indication that the code needs to be factored into a method or function so it can be used by multiple callers.
o If you virtualize any methods in a class declaration, the destructor for that class must also be virtualized.
o When you are designing or implementing a component, remember that the interface is everything. Given a good interface, you can write an absolutely sucky, stupid implementation to prototype something and get it off the ground and running. When the implementation shows its limitations, there is only one place you will need to revisit to improve things, and that is the implementation of that component alone. You won't end up changing everything that uses the class to adapt to its new implementation.
o It is said that Everything is in a name. That is entirely true when writing code. A good name can make the difference between instantly understanding what is going on, and spending lots of time trying to understand what is going on. Take the time and think about a good name for what you are doing. Don't name something after how it does it, but rather after what it does.

Newer Items

o Something to mention is that I often do not re-indent some code. Rather I leave it as it is and work in its existing indentation. Typically this is because the code needs to be totally redesigned and rewritten, and there just isn't the time to do it. So, you keep patching it and trying to make minor cleanups, once you are aware of some of the issues involved, that will make the eventual rewrite easier.
o If a class has any complex state associated with it, add a print() method and an operator<<() to it. If it is needed during debugging, perhaps even add an extern "C" function that can be used from the debugger to print the object. This print method makes it much easier to determine what the object is doing, and to determine if it is in a valid state. This is true whether you are using the debugger, trying to debug a problem, or whatever.
o If you are going to make a class printable, give the class an ostream &T::print(ostream &) const method. Also add an ostream &operator<<(ostream &, const T &) function that is used to print the class. This way the output operator does not need to be a friend of the class. It also means the print method can be written as a normal method with access to class members and statics, instead of something else.
o The case ...: components of a switch statement are indented at the same level as the switch itself. The stanzas of the case are at the next indent level.
o If you are using exceptions, be certain to firewall exceptions properly. You need to do this when entering and leaving code that does not know about exceptions, such as libraries, system calls, and callbacks. It may be necessary to convert an exception to an error so it can be passed through the component properly, and then reconvert the error to an exception once exception land has been reached. If this is not done, the code that doesn't know about exceptions, but that deals with errors quite well, will not be able to do normal error handling and will appear broken.
o Do Not use exceptions. Use errors instead. If you are calling code that might generate an exception, you need to generate a firewall to make certain errors are handled correctly.
o If there are exceptions being used in the system, be very careful to wrap all stateful constructs in some sort of C++ scoped object so that the state can be undone properly if an exception goes off.
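The print() and operator<<() guideline above is the easiest of these to show in code. What follows is only a sketch: the Range class, its members, and the print_range() debugger hook are hypothetical names, and the signatures follow the ones given in the guideline with std:: spelled out.

```cpp
#include <cassert>
#include <iostream>
#include <ostream>
#include <sstream>

// Hypothetical class, invented for illustration only.
class Range {
        int     _low;
        int     _high;

public:
        Range(int low, int high) : _low(low), _high(high) { }

        // A normal const method; it can see the private members.
        std::ostream    &print(std::ostream &o) const;
};

std::ostream &Range::print(std::ostream &o) const
{
        return o << "Range(" << _low << ", " << _high << ")";
}

// A free function, not a friend: it only calls the public print().
std::ostream &operator<<(std::ostream &o, const Range &r)
{
        return r.print(o);
}

// Hypothetical hook a debugger can call by its unmangled name.
extern "C" void print_range(const Range *r)
{
        std::cerr << *r << std::endl;
}

// Usage check, also hypothetical.
void check_range_print()
{
        std::ostringstream      out;

        out << Range(1, 5);
        assert(out.str() == "Range(1, 5)");
}
```

Because operator<<() goes through print(), the class needs no friend declaration, and a subclass could later override the formatting if print() were made virtual.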
o If exceptions are used, or it is possible that an exception can go through a chunk of code, it is necessary to ensure that all classes that are exposed to exceptions in this manner can work in an arbitrary manner. This means they have to clean up whatever state they have and work properly in the face of arbitrary (non-planned) use.
o Code the assignment operator so it works correctly in the case of self-assignment. In most cases, you can do something like this:

        const A &A::operator=(const A &r)
        {
                if (this == &r)
                        return *this;
                /* the actual assignment */
                return *this;
        }

o Don't say this-> all the time. C++ does it for you! And if you are using this-> so that an argument name can overload the name of something built into the class, don't even try.
o Some guidelines for #if use. Use #ifdef notyet if this is code that is intended to be used but isn't used yet. The notyet indicates it is for the future. Use #if 0 and #else as appropriate to show various alternatives of a piece of code. The one that is currently active should compile, and it should be in a #if 1 as the first thing, or a #else as the last. A comment in each stanza may be appropriate. If you are disabling code because something else is broken, explain what is broken.
o If you find something disgusting or wrong, mark it with an XXX mark and describe what it is that is wrong. People can then see the XXX comments and know that they might need to check this thing out further. It provides an indication that something is funny and may need to be looked at. But it isn't such a big issue that it needs to be brought up in a design or other meeting ... well, maybe it does. One typical use is a bug that can't be fixed because of another problem.
o Format code and comments to fit into 80 characters. It is the "standard" size of printers, terminals, xterms, and everything else. If you want to be really nice, only use 79 characters so the 80th character doesn't cause a line wrap in emacs.
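As a sketch of the #if and XXX conventions described above, assuming a made-up scale() function and a made-up reason for the disabled alternative:

```cpp
#include <cassert>

// Hypothetical function, invented for illustration only.
int scale(int value)
{
#if 1
        /* Active alternative: plain multiplication. */
        return value * 2;
#else
        /*
         * XXX Disabled alternative. The reason given here is made up:
         * the shift misbehaves for negative values.
         */
        return value << 1;
#endif
}

#ifdef notyet
/* Intended for future use; compiled only once notyet is defined. */
int scale_by(int value, int factor)
{
        return value * factor;
}
#endif

// Usage check, also hypothetical.
void check_scale()
{
        assert(scale(21) == 42);
}
```

The cpp commands stay in the first column, the active stanza comes first under #if 1, and each stanza carries a comment saying what it is and why it is on or off.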
o Sometimes if a wrapped line is short, and breaking it would just result in something equally ugly, leave it.
o Don't put multiple statements on a line.
o Don't use // comments for block comments. Only use them for single line comments. An especially bad format to avoid is multiple indented // comments trailing the end of the line. It basically makes the comment impossible to edit easily without an editor that will edit the comment for you, and it also forces the comment to be very columnar and take up lots of vertical space. If you have something to say about something, say it in a block comment before that something.
o Do not do ad-hoc argument parsing. Use getopt.
o Don't use void indiscriminately. A 'void *' is a pointer to an unknown data type. You can't do arithmetic on a 'void *'. A 'char *' is a pointer to a chunk of memory. You can do arithmetic on a 'char *'. The lesson here is that you may want or need to use a 'void *' in your interfaces so pointers don't need to be cast. However, you should use 'char *' internally to point to areas of memory, since the areas have a length and you probably want to do pointer arithmetic on them. The two types are completely different; use them correctly.
o Don't try to combine data values and error values in function returns. If you need to return error codes, pass in a parameter (by reference) to return the value. Return the error code via the function return value. Mixing the two is just bad news. It makes it very difficult to do uniform error handling, and it grossly limits your data types.
o Never assume that there will be only one instance of your class. Always implement the general case. Just because you can't see a need for it now doesn't mean there isn't a need. Implementing for the general case will also provide a cleaner solution that is easier to maintain and extend.
o Don't use ********* comment lines in ordinary usage.
  Stuff like that should be reserved for BIG EYE OPENING DISASTER comments or other indications that something really important or difficult or bad is going on.
o Only deal with those errors that you know how to handle. Handle those errors. Either pass other errors up to the caller to deal with, or ABORT because you don't know what to do. If you just let the error slide you will have big time problems tracking down what is going wrong.
o Use streams and '<<' and '>>' operators when possible. It isolates the formatting from type changes to the arguments and just makes it work. It also gets rid of the overhead of interpreting the format string, since the compiler can generate the format code directly. If you still need to use printf-like formatting, use form() instead, and output to the I/O streams.
o Don't use stdio I/O, use iostreams I/O.
o C++ is NOT Pascal. There can be multiple exits from a function or method. It can considerably increase the readability and comprehension of code to use this capability. For example, return early when various conditions are not met, and then the "main body" of the function contains what really happens. Versus, for example, nesting go and no-go if statements and burying the "proper" behavior of the function inside levels of indentation and control structure.
o With regards to complex control structure ... If a function or method is complex enough, it can simplify things to put all the "safety checks" and environmental changes in a wrapper function. This divorces all the setup and shutdown complexity from that of the task being done, so that it is obvious what is being done.

Afterwards

This set of guidelines is trying to dump out as much information as I can think of about how to do things well. It actually extends well past the area of formatting code to design decisions about code, overall code structure, and other issues.
At some point, I feel that I have barely grazed the surface of the kinds of issues that you should consider when designing and writing code. All that is above, and way more, goes on in my head when I look at code, write code, design code, design systems, and do everything else that I do.

I hope I have at least provided some idea of the things you should consider to do a better job designing and implementing a system. I'll try to do a better job organizing this document in the future, but it was tough enough just trying to dig up all the things to consider and get it written down :)
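As one last sketch, here is how two of the guidelines above might combine: returning the error code through the function return value with the data coming back through a reference parameter, and using early returns so the main body is not buried. The Error enumeration and the average() function are hypothetical, made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical error codes, kept in an enumeration, not #defines.
enum Error {
        ERR_NONE = 0,
        ERR_BAD_ARGUMENT,
        ERR_EMPTY_INPUT
};

/*
 * Hypothetical function. The error travels in the return value and
 * the data comes back through the reference parameter.
 */
Error average(const int *values, std::size_t count, int &result)
{
        long    total = 0;

        // Early exits deal with the conditions that are not met ...
        if (values == 0) {
                return ERR_BAD_ARGUMENT;
        }
        if (count == 0) {
                return ERR_EMPTY_INPUT;
        }

        // ... so the main body is not nested inside if statements.
        for (std::size_t i = 0; i < count; i++) {
                total += values[i];
        }
        result = static_cast<int>(total / static_cast<long>(count));
        return ERR_NONE;
}

// Usage check, also hypothetical.
void check_average()
{
        const int       values[] = {2, 4, 9};
        int             result = 0;

        assert(average(values, 3, result) == ERR_NONE);
        assert(result == 5);
}
```

Every caller can test the return value the same way, and the full range of int stays available for the result.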