Homework 2
CS536-S24 Intro to PLs and Compilers

Question 1 | Question 2 | Question 3


Question 1

This question is about the language of regular expressions. One way to define that language is using a context-free grammar (CFG). To make things a little simpler, rather than allowing any characters as operands in regular expressions, we will restrict ourselves to only allowing letters (that way we don't have to worry about how to specifiy characters that are the same as operators or things like newlines). We will also allow ε (epsilon) in our regular expressions. The operators for our language of regular expressions are:

In a regular expression, * and + have the same, highest precedence, "followed by" has middle precedence, and | has the lowest precedence. All of the operators are left associative.

Part a

Write an unambiguous CFG for this language of regular expressions so that parse trees correctly reflect the precedences and associativities of the operators. Use lower-case names for nonterminals and use the following terminals:

LTR      // any letter 
EPS      // epsilon
OR       // |
STAR     // *
PLUS     // +
LPAR     // left paren
RPAR     // right paren

Part b

Draw a parse tree for the string:

a+b*|c*(d+f*)+|ε

Use LTR(a) in the parse tree to mean "the LTR token for the letter a" (and similarly for the other letters in the string).


Question 2

This question is about the language of set expressions defined as follows:

  1. A a comma-separated list of zero or more names enclosed in curly braces is a set expression.

  2. If S1 and S2 are both set expressions, then so are each of the following:
    S1 ∪ S2
    S1 ∩ S2
    S1 − S2
    ( S1 )

In a set expression, parenthesis has the highest precedence; intersection and union have the same second-highest precedence; subtraction has the lowest precedence. Intersection, union, and subtraction are all left associative.

The following table are tokens for terminals:

NAME   // one name in a set
∪      // union
∩      // intersection
−      // minus
(      // left paren
)      // right paren
{      // left curly brace
}      // right curly brace
,      // comma

For this question you are not required to write CFG for this language; some CFGs are offered below. For each CFG, do the following check:

  1. If there exists one string that is a legal set expression (given our definition above), but is not in the language of the CFG, give one example.
  2. If there exists one string that is not a set regular expression (given our definition above), but is in the language of the CFG, give one example.
  3. If the CFG is ambiguous, drawing two different parse trees for some string in the language of the CFG.
  4. If the CFG is correct, claim "It is correct".

Note that the terminals are LPAREN, RPAREN, MINUS, UNION, INTERSECT, LCURLY, RCURLY, COMMA, and NAME.

CFG 1:

exp → exp MINUS term | term
term → term UNION factor | term INTERSECT factor | LPAREN exp RPAREN | factor
factor → LCURLY list RCURLY
list → NAME | list COMMA NAME

CFG 2:

exp → LCURLY RCURLY | LCURLY list RCURLY | term
term → term MINUS factor | factor
factor → factor UNION set | factor INTERSECT set | set
set → LPAREN set RPAREN | exp
list → epsilon | NAME | list COMMA NAME

CFG 3:

exp → exp MINUS term | term
term → term UNION factor | term INTERSECT factor | LPAREN exp RPAREN | factor
factor → LCURLY RCURLY | LCURLY list RCURLY
list → NAME | list COMMA NAME

CFG 4:

exp → exp MINUS term | term
term → factor UNION factor | term INTERSECT factor | factor
factor → LPAREN exp RPAREN | LCURLY RCURLY  | LCURLY list RCURLY
list → NAME | list COMMA NAME

CFG 5:

exp → exp MINUS term | term
term → term UNION factor | term INTERSECT factor | factor
factor → LPAREN exp RPAREN | LCURLY RCURLY  | LCURLY list RCURLY
list → list COMMA NAME | NAME

Question 3

For this question you will define a syntax-directed translation for the CFG given below, which defines a very simple programming language.

program → MAIN LPAREN RPAREN LCURLY list RCURLY

list → list oneItem
     | epsilon
      
oneItem → decl
        | stmt
        
decl → BOOL ID SEMICOLON
     | INT ID SEMICOLON
          
stmt → ID ASSIGN exp SEMICOLON 
     | IF LPAREN exp RPAREN stmt
     | LCURLY list RCURLY 

exp → exp PLUS exp
    | exp LESS exp
    | exp EQUALS exp
    | ID
    | BOOLLITERAL
    | INTLITERAL

Part a

Write a syntax-directed translation for the CFG given above, so that the translation of an input program is the set of variables used somewhere in the program. Note: here the term use is in contrast to declaration; any appearance of a variable that is not a variable declaration counts as a use of that variable. For the example code in Part b, the translation should be { x, a, b }.

Your translation rules should use the following notation:

Note that you should not try to use something like "{ a, b }" to mean a set with two elements; instead, use set union to combine two sets that each contain one element.

Use the notation that was used in class and in the on-line readings; i.e., use nonterminal.trans to mean the translation of a nonterminal, and terminal.value to mean the value of a terminal. Assume that ID.value is a String (the name of the identifier). Use subscripts for translation rules that include the same nonterminal or the same terminal more than once.

Part b

Draw a parse tree for the program given below and annotate each nonterminal in the tree with its translation.

main ( ) { 
    int x;
    bool y;
    x = a;
    int z;
    if (x == a) {
        int x;
        b = x < 18;
    }
}

Last Updated: 2/13/2024     © 2024 CS 536