This research was conducted by Ben Liblit, Andrew Begel, and Eve Sweetser. The paper was published at the 18th Annual Psychology of Programming Workshop (PPIG 2006).
There are unconfirmed rumors that this paper received a “most gratuitously large dataset” award, presumably for its analysis of 45 million lines of Windows 2003 source code. We thank the awards committee for their recognition that anything worth doing is worth overdoing. ☺
Programming a computer is a complex, cognitively rich process. This paper examines ways in which human cognition is reflected in the text of computer programs. We concentrate on naming: the assignment of identifying labels to programmatic constructs. Naming is arbitrary, yet programmers do not select names arbitrarily. Rather, programmers choose and use names in regular, systematic ways that reflect deep cognitive and linguistic influences. This, in turn, allows names to carry semantic cues that aid in program understanding and support the larger software development process.
The full paper is available as a single PDF document. A suggested BibTeX citation record is also available.