Hopscotch with the Devil
Today's post is how not to write stupid code.
The other day I ran across this construct (but with far worse whitespace layout) in some code I was attempting to maintain:
if (func1() == OK) { /* lots of code 1 */ if (func2() == OK) { /* lots of code 2 */ if (func3() == OK) { /* lots of code 3 */ if (func4() == OK) { /* lots of code 4 */ if (func5() == OK) { /* lots of code 5 */ } else { /* do some error condition 5 */ return; } } else { /* do some error condition 4 */ return; } } else { /* do some error condition 3 */ return; } } else { /* do some error condition 2 */ return; } } else { /* do some error condition 1 */ return; }
For the comments where it dictates code should be, put up to a few dozen lines in each block.
The code layout for this particular mental form is an utter disaster.
This is how it was rewritten:
if (func1() == NOT_OK) { /* do some error condition 1 */ return; } /* lots of code 1 */ if (func2() == NOT_OK) { /* do some error condition 2 */ return; } /* lots of code 2 */ if (func3() == NOT_OK) { /* do some error condition 3 */ return; } /* lots of code 3 */ if (func4() == NOT_OK) { /* do some error condition 4 */ return; } /* lots of code 4 */ if (func5() == NOT_OK) { /* do some error condition 5 */ return; } /* lots of code 5 */
The first major problem in the first example is the notion of where the PC goes on the common execution path. In the first example, this is very explicitly defined and the common path leads the programmer into a deeper set of nested if statements. This explicit definition ends up directly opposing the implicit understanding of the common code execution path already embued to any decent programmer of codes: the next basic block down in the file.
The second major problem is where the PC goes on the uncommon execution of the code. :) If the first function failed, it could be potentially hundreds to thousands of lines later that the PC hops skipping a LOT of code to somewhere that will be cumbersome to find, bouncing the brace nonwithstanding.
Here is a picture I drew of the code path execution of the first example as it would appear on a page of text.
Legend: '*' means a conditional branch, 'u' means an execution path in an uncommonly hit condition, '#s' means a starting point in the common execution path, '#e' means an end of the common excution path, 'v' means a place execution will hop in an uncommon execution path, '.' means a logical connection, 'c' means where the execution goes in a commonly hit condition, and '|','\' means a path of execution.
#s | | | *c | \ u *c . | \ . u *c . . | \ . . u *c . . . | \ . . . u *c . . . . | \ . . . . u | . . . . . goto #e [ This skip here is abhorrent. ] . . . . v . . . . \ . . . v return . . . \ . . v return . . \ . v return . \ v return \ return #e | | |
This picture shows exactly the confusion, the uncommon 'u' code paths are pushed onto a mental stack as place holders the programmer must remember to match to corresponding 'v' blocks while going down the common path of execution from #s to the inner most branch. Then from the inner most branch a terrible skip of the common execution path occurs to #e, skipping a lot code of that you have to determine isn't in #e's code path since the 'v' blocks are so far away from the initiating conditional.
In the second example, only in the exceptional cases of uncommon code paths do the path of the code deviate from the intial depth of execution.
Here is a picture I drew of the code path execution of the second example almost as it would appear on a page of text:
#s | | *u-return c | *u-return c | *u-return c | *u-return c | *u-return c | | #e
Ah, much better and much more natural.... #s goes directly to #e and the programmer is free to ignore the uncommon conditions of the execution path. I said almost because the return blocks will be directly underneath the initiating conditionals on the page of text, but even so they are still easy to skip in practice and a natural ability for programmers.
This is the hallmark of clear code--keeping the common path of execution on, or very near, the first depth (where higher depth is defined as going farther to the right on the page because of conditionals) line through the function.
The second code execution path tree is simply an isomorphism of the first tree, however, the mapping of that tree onto the 2D page of text and preferred top to bottom execution is much, much clearer.
Moral of the story: Aggresively minimize the depth and skip distance of a common path of execution.
End of Line.