
CS 368-1 (2010 Summer) — Day 7 Homework

Due Friday, July 23rd, at the start of class.


Analyze a Perl script and report on a few simple statistics.


Your script takes the name of a Perl script as input; you can ask the user for the filename or simply hard-code it into your script.
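
A minimal sketch of the input step, assuming the hard-coded option. The helper name read_lines and the file name homework5.pl are hypothetical, chosen only for illustration; the sketch shows one way to open a file and collect its lines, not the analysis itself.

```perl
use strict;
use warnings;

# Read a whole file and return its lines with trailing newlines removed.
# (read_lines is a hypothetical helper name, not part of the assignment.)
sub read_lines {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Cannot open $filename: $!\n";
    my @lines = <$fh>;    # list context: one element per line
    close($fh);
    chomp(@lines);        # strip the newline from every element
    return @lines;
}

# Hard-coded, hypothetical file name; substitute the script to analyze.
my $filename = 'homework5.pl';

if (-r $filename) {
    my @lines = read_lines($filename);
    print "Read ", scalar(@lines), " lines from $filename\n";
} else {
    print "Cannot read $filename\n";
}
```

Prompting the user instead would mean reading the name from STDIN and removing its trailing newline before calling open.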

The script reads the file and analyzes it for three things (see below for more details):

Lines of “real” code: We want to count the number of lines of code. But, certain lines in the file are not really code. If a line contains only a comment (and no other code), do not count it. If a line is “blank” — that is, is an empty string or contains only whitespace characters — do not count it. All other lines are code and should be counted. For example:

# Get user input
my $line = <STDIN>;

# Remove trailing newline
chomp($line);

That code sample contains two “real” lines of code.

Subroutine names: List all of the subroutines that are defined in the script. That is, look for lines with the sub keyword and the subroutine name. Only keep the subroutine name. Assume that the sub keyword and the subroutine name are on the same line, but that the opening curly brace ({) may be on the same line or not. Also, be sure to think about where whitespace can occur.

Scalar variable names: List all of the scalar variables that are defined or used in the script. This is the hardest pattern to describe with a regular expression in this homework. Remember that all scalars begin with a dollar sign ($), but not every expression that begins with a dollar sign includes the name of a scalar variable. For example, look at the following code:

my $foo = 1;
my @array;
$array[$foo] = 'hello';

In that sample, foo is a scalar variable name, but array is not, even though we see $array… in the third line. How do you describe the difference?

Example Output

I ran the script against my solution for homework 5, and this is what I got:

Lines of Code:     51
Subroutine Names:  convert, get_number, get_unit
Scalar Names:      character, from_unit, i, length, meters, ok, result, to_from, to_unit, unit, units, value


Repeated Matches

When looking for scalar variables, keep in mind that there may be more than one scalar variable used in a line. How do you loop through them? As with repeated substitutions, use the g modifier:

while ($string =~ /...(...).../g) {
    # do something with each match within $string
}
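
To see the loop in action on something unrelated to the homework patterns, here is a small sketch that collects every run of digits in a string. The helper name find_numbers and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Collect every run of digits in a string, one match per loop iteration.
# (find_numbers is a hypothetical helper used only for this illustration.)
sub find_numbers {
    my ($string) = @_;
    my @numbers;
    # With /g in scalar context, each evaluation resumes where the
    # previous match ended, so the loop visits every match in turn.
    while ($string =~ /(\d+)/g) {
        push @numbers, $1;    # $1 holds the text captured by the parentheses
    }
    return @numbers;
}

my @found = find_numbers('3 apples, 12 pears, and 7 plums');
print join(', ', @found), "\n";    # prints "3, 12, 7"
```

Without the g modifier, the condition would match the first number again on every pass and the loop would never end.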

Matching Word Boundaries

There is one more matching expression that you may want to use in this homework. In a regular expression, \b matches the boundary between word-like characters (i.e., \w ones) and non-word-like characters. Interestingly, \b does not match a character, but rather an imaginary point between characters. For example, the regular expression:

/\bcat\b/

matches the word cat, because there is a \b boundary before the c and after the t. It also matches cat's, because there is a \b boundary between the t and the single quote ('). But it does not match scat, because there is no \b boundary between the s and the c.
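
The boundary behavior can be checked directly. The sketch below assumes the pattern under discussion is /\bcat\b/; the helper name has_word_cat and the test strings (including scat, a word with an s directly before the c) are illustrative choices.

```perl
use strict;
use warnings;

# Return 1 if $text contains "cat" as a whole word, 0 otherwise.
# (has_word_cat is a hypothetical helper used only for this illustration.)
sub has_word_cat {
    my ($text) = @_;
    return $text =~ /\bcat\b/ ? 1 : 0;
}

# Try the pattern against a few strings and report each result.
for my $text ('cat', "cat's", 'scat', 'the cat sat') {
    printf "%-12s %s\n", $text, has_word_cat($text) ? 'match' : 'no match';
}
```

Of these four strings, only scat should report no match, since s and c are both \w characters and no boundary lies between them.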

If this is not clear, there are other (slightly harder) ways to do the homework without \b.


Do the work yourself, consulting reasonable reference materials as needed; any reference material that gives you a complete or nearly complete solution to this problem or a similar one is not OK to use. Asking the instructors for help is OK; asking other students for help is not.

Hand In

A printout of your output on a single sheet of paper. Be sure to put your own name in the initial comment block of the code. Identifying your work is important, or you may not receive appropriate credit.