Theory of finite automata and regular expressions
- The language { [^i ]^i | i >= 1 } (i opening brackets followed by i closing brackets) is NOT regular
- If R is regular, -R (complement of R) is regular
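EX: C-style comments, /* ... */ (the empty comment /* */ is ok).
"/*" -("*/") "*/" looks like it should work, but it is completely
wrong. -("*/") excludes only the exact string */, so the comment body
can still contain */ (e.g. /* a */ b */ would match as a single comment).
The correct form is "/*" -(V* "*/" V*) "*/", where V is the character
vocabulary; it excludes every body with the two characters */ anywhere in it.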
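A minimal sketch in Python of how complementation handles the comment example. It assumes characters are grouped into three classes ('*', '/', and everything else) and that the DFA is complete, meaning every state has a move on every class; complementing by flipping final states only works under that assumption. The names `DELTA`, `is_comment`, and `is_comment_wrong` are hypothetical.

```python
# Sketch: complementing a complete DFA by swapping final and non-final states.
# Assumption: characters are grouped into the classes '*', '/', and "other".

def char_class(c):
    return c if c in "*/" else "other"

# DFA for V* "*/" V* (strings that contain "*/" somewhere).
# State 0: no progress, 1: just saw '*', 2: have seen "*/" (accepting trap).
DELTA = {
    (0, "*"): 1, (0, "/"): 0, (0, "other"): 0,
    (1, "*"): 1, (1, "/"): 2, (1, "other"): 0,
    (2, "*"): 2, (2, "/"): 2, (2, "other"): 2,
}
STATES = {0, 1, 2}
CONTAINS_FINAL = {2}
# The DFA is complete, so flipping the final states gives -(V* "*/" V*).
NOT_CONTAINS_FINAL = STATES - CONTAINS_FINAL

def run(finals, s):
    """Run the DFA from start state 0 and report whether it ends in `finals`."""
    state = 0
    for c in s:
        state = DELTA[(state, char_class(c))]
    return state in finals

def is_comment(s):
    """'/*' -(V* '*/' V*) '*/': the body may not contain '*/' anywhere."""
    if len(s) < 4 or not (s.startswith("/*") and s.endswith("*/")):
        return False
    return run(NOT_CONTAINS_FINAL, s[2:-2])

def is_comment_wrong(s):
    """'/*' -('*/') '*/': the body may be anything except exactly '*/'."""
    if len(s) < 4 or not (s.startswith("/*") and s.endswith("*/")):
        return False
    return s[2:-2] != "*/"
```

With these definitions, `is_comment("/* a */ b */")` is False while `is_comment_wrong("/* a */ b */")` is True, which is exactly the failure described above.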
- If R is regular and s is a subset of R, this does
NOT mean that s is regular.
EX: if V = the ASCII vocabulary, then V* (which is regular) denotes every
string (books, stock reports, etc.), including a lot of strings we do
not want.
V* is a superset of the language in the first item above,
{ [^i ]^i | i >= 1 }, yet that subset is not regular. Smaller languages
are not necessarily easier to recognize.
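An informal illustration in Python, not a proof: membership in V* needs no memory at all, while recognizing the subset { [^i ]^i } needs a counter that can grow without bound, which a finite automaton with a fixed number of states cannot supply. The function names are hypothetical.

```python
def in_v_star(s):
    # V* over ASCII: every ASCII string is accepted, so no state is needed
    # beyond checking that each character belongs to the vocabulary.
    return all(ord(c) < 128 for c in s)

def in_brackets(s):
    # { [^i ]^i | i >= 1 }: must remember i, an unbounded count of '['.
    i = 0
    while i < len(s) and s[i] == "[":
        i += 1
    return i >= 1 and s[i:] == "]" * i
```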
- If R1 is regular and R2 is regular, then R1 intersection
R2 is regular.
EX: let CSXtokens = all CSX tokens and VVV = all three-character strings
(V V V). CSXtokens intersection VVV narrows the field to only the
three-character CSX tokens.
- ( ) denotes a non-final state
- (( )) denotes a final state
- --x--> denotes a transition on input x
- <==x denotes a loop back to the same state on input x
- ( )\-->( ) denotes that both of these states are reached from the same
originating state
R1: -->(1)--a-->((2))<==a
R2: -->(3)--a-->((4))--b-->((5))
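R1 intersection R2: -->(1,3)--a-->((2,4))--a-->(2,?)\--b-->(?,5)
The '?' means that automaton has no transition on that symbol, so the pair
state can never be final (a pair is final only when both of its components
are final) and can never lead to a final state; we can go ahead and delete it.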
R1 intersection R2: -->(1,3)--a-->((2,4))
that is, a+ intersection (a|ab) = a.
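A sketch of the product construction behind this example, in Python. It assumes partial DFAs stored as dictionaries, where a missing transition plays the role of '?'; pairs with a '?' component are simply never created, which matches deleting them above. All names are hypothetical.

```python
from itertools import product

# Partial DFAs from the notes; a missing entry means "no transition" (the '?').
R1 = {"start": 1, "final": {2}, "delta": {(1, "a"): 2, (2, "a"): 2}}     # a+
R2 = {"start": 3, "final": {4, 5}, "delta": {(3, "a"): 4, (4, "b"): 5}}  # a | ab

def intersect(d1, d2, alphabet):
    """Product construction: a pair state is final only if both parts are final.
    Pairs where either side has no move (the '?' states) are never created."""
    start = (d1["start"], d2["start"])
    delta, seen, work = {}, {start}, [start]
    while work:
        p, q = work.pop()
        for c in alphabet:
            p2 = d1["delta"].get((p, c))
            q2 = d2["delta"].get((q, c))
            if p2 is None or q2 is None:
                continue  # e.g. (2,?) or (?,5): can never reach a final pair
            delta[((p, q), c)] = (p2, q2)
            if (p2, q2) not in seen:
                seen.add((p2, q2))
                work.append((p2, q2))
    final = {s for s in seen if s[0] in d1["final"] and s[1] in d2["final"]}
    return {"start": start, "final": final, "delta": delta}

def accepts(dfa, s):
    state = dfa["start"]
    for c in s:
        state = dfa["delta"].get((state, c))
        if state is None:
            return False
    return state in dfa["final"]

both = intersect(R1, R2, "ab")
# All strings over {a, b} of length 0..3.
words = ["".join(w) for n in range(4) for w in product("ab", repeat=n)]
print([w for w in words if accepts(both, w)])  # ['a']
```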
We can also prove closure under intersection using complementation and union
(De Morgan's law), since regular languages are closed under both:
R1 intersection R2 = -(-R1 union -R2)
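- If R is regular, then R^rev (where 'rev' reverses every string)
is regular.
EX: (xyz)^rev = zyx
This is easiest to see by drawing a finite automaton (FA) for R and
reversing every arrow: the old final states become start states and the
old start state becomes the only final state (if there are several old
final states, add a new start state with epsilon transitions to them).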
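A sketch of the reversal construction in Python, assuming an NFA representation with transition triples and a set of start states (several start states stand in for a new start state with epsilon moves). It reverses R2 = a|ab from the intersection example and should accept exactly a|ba. Names are hypothetical.

```python
from itertools import product

# NFA: set of start states, set of final states, transitions as (src, symbol, dst).
R2 = {"starts": {3}, "finals": {4, 5}, "edges": {(3, "a", 4), (4, "b", 5)}}  # a | ab

def reverse(nfa):
    """Flip every arrow; old finals become starts and old starts become finals."""
    return {
        "starts": set(nfa["finals"]),
        "finals": set(nfa["starts"]),
        "edges": {(dst, sym, src) for (src, sym, dst) in nfa["edges"]},
    }

def accepts(nfa, s):
    # Track the set of states the NFA could be in after each symbol.
    current = set(nfa["starts"])
    for c in s:
        current = {dst for (src, sym, dst) in nfa["edges"] if src in current and sym == c}
    return bool(current & nfa["finals"])

rev = reverse(R2)
words = ["".join(w) for n in range(4) for w in product("ab", repeat=n)]
print([w for w in words if accepts(rev, w)])  # ['a', 'ba']
```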
Parsing and Context Free Grammar (CFG)
---->SCANNER--tokens-->PARSER--structural representation of the program-->
The parser uses a Context Free Grammar (CFG) the same way the scanner uses
regular expressions.
- CFG: specifies valid programs in terms of tokens. A parser built from it
should accept syntactically valid token sequences and should also detect
invalid ones.
- CFG: has rules (productions) that define the proper token sequences, in
terms of two kinds of symbols:
- terminals (tokens); these are recognized by the scanner
- non-terminals (placeholders)
- The left hand side of every production is always a single non-terminal.
- The right hand side of any production can be any sequence of 0 or more
terminals and non-terminals.
- non-terminal -> {sequence of 0 or more terminals and non-terminals}
- Sample:
PROG  -> { STMTS }
STMTS -> STMTS ; STMT
       | STMT
STMT  -> ID = EXPR
EXPR  -> ID
       | EXPR + ID
The following derivation shows how this sample grammar generates a token sequence:
[STMTS] -> [STMTS ; STMT] -> [STMT ; STMT] -> [ID = EXPR ; ID = EXPR] ->
[ID = ID ; ID = EXPR + ID] -> [ID = ID ; ID = ID + ID]
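The same derivation can be replayed mechanically. This Python sketch encodes the sample grammar as a dictionary and always expands the leftmost non-terminal (a leftmost derivation, so the intermediate forms differ slightly from the ones above, which expand two non-terminals in one step). The list of alternative indices is a hypothetical script chosen to reach the same final token sequence.

```python
# Sample grammar from the notes; each non-terminal maps to its alternatives.
GRAMMAR = {
    "PROG":  [["{", "STMTS", "}"]],
    "STMTS": [["STMTS", ";", "STMT"], ["STMT"]],
    "STMT":  [["ID", "=", "EXPR"]],
    "EXPR":  [["ID"], ["EXPR", "+", "ID"]],
}

def leftmost_step(form, choice):
    """Replace the leftmost non-terminal in `form` with alternative number `choice`."""
    for i, sym in enumerate(form):
        if sym in GRAMMAR:
            return form[:i] + GRAMMAR[sym][choice] + form[i + 1:]
    raise ValueError("no non-terminal left to expand")

form = ["STMTS"]
# Hypothetical choice script: which alternative to use at each leftmost step.
for choice in [0, 1, 0, 0, 0, 1, 0]:
    form = leftmost_step(form, choice)
    print(" ".join(form))
# Last line printed: ID = ID ; ID = ID + ID
```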
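- Any token sequence that can be derived from the start symbol using the CFG
rules is syntactically valid; anything that cannot be derived is invalid.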
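Going the other way, a parser checks whether a token sequence can be derived at all. Below is a recursive-descent recognizer for the sample grammar, a swapped-in technique that these notes do not describe: recursive descent cannot handle left-recursive rules directly, so it assumes the equivalent rewrites STMTS -> STMT (; STMT)* and EXPR -> ID (+ ID)*, which generate the same token sequences. Tokens are assumed to be plain strings such as "ID" and "=", and the name `parse_prog` is hypothetical.

```python
def parse_prog(tokens):
    """Return True if `tokens` is derivable from PROG, else False.
    Left recursion is replaced by loops:
      STMTS -> STMT (; STMT)*     EXPR -> ID (+ ID)*"""
    pos = 0

    def expect(tok):
        # Consume `tok` if it is the next token.
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == tok:
            pos += 1
            return True
        return False

    def stmt():
        # STMT -> ID = EXPR, with EXPR -> ID (+ ID)*
        if not (expect("ID") and expect("=") and expect("ID")):
            return False
        while expect("+"):
            if not expect("ID"):
                return False
        return True

    # PROG -> { STMTS }
    if not expect("{") or not stmt():
        return False
    while expect(";"):
        if not stmt():
            return False
    return expect("}") and pos == len(tokens)
```

For example, `parse_prog(["{", "ID", "=", "ID", "}"])` returns True, while `parse_prog(["{", "ID", "=", "}"])` returns False because an EXPR must start with ID.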