Theory of finite automata and regular expressions

{ [^i ]^i | i >= 1 } is NOT regular
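A minimal sketch of why this language needs more than a finite automaton: the recognizer below has to keep an unbounded count of the [ characters it has seen, which a fixed, finite set of states cannot do. The function name is_balanced_block is a hypothetical name chosen for illustration.

```python
def is_balanced_block(s: str) -> bool:
    """Return True if s is i '[' characters followed by i ']' characters, i >= 1."""
    pos = 0
    opens = 0
    # Count the leading '[' characters; this count has no upper bound.
    while pos < len(s) and s[pos] == "[":
        opens += 1
        pos += 1
    closes = 0
    # Count the ']' characters that follow.
    while pos < len(s) and s[pos] == "]":
        closes += 1
        pos += 1
    # The whole string must be consumed and the two counts must match.
    return pos == len(s) and opens >= 1 and opens == closes
```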
If R is regular, -R (complement of R) is regular
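EX: comments of the form /* ... */.
"/*" -("*/") "*/" looks like it should work, but it is wrong. -("*/") excludes only the single string */, so the comment body could still contain */ inside a longer string. The body must exclude every string with the two characters */ anywhere in it: "/*" -(V* "*/" V*) "*/", where V is the vocabulary.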
If R is regular and s is a subset of R, this does NOT mean that s is regular.
EX: if V = the ASCII vocabulary, then V* denotes every string (books, stock reports, etc.), which includes far more than is usually wanted.
V* is regular and is a superset of the language in #1 (above), yet that language is not regular, so a subset of a regular set need not be regular. A smaller set is not necessarily easier to describe.
If R1 is regular and R2 is regular, then R1 intersection R2 is regular.
EX: let CSX = all CSX tokens and VVV = all three-letter tokens. Then CSX intersection VVV narrows both sets down to only the three-letter CSX tokens.
- ( ) denotes a regular state
- (( )) denotes a final state
- --x--> denotes a transition (here the transition is on x)
- <==x denotes a loop back to the same state
- ( )\-->( ) denotes that both of these states have the same originating state
R1: -->(1)--a-->((2))<==a
R2: -->(3)--a-->((4))--b-->((5))
R1 intersection R2: -->(1,3)--a-->((2,4))--a-->(2,?)\--b-->(?,5)
A '?' means that one of the two machines has no transition on that symbol, so the combined state can never lead to a state that is final in both machines, and we can go ahead and delete it.
R1 intersection R2: -->(1,3)--a-->((2,4)) ; or a+ intersection (a|ab) = a.
We can also prove this using complementation (De Morgan's law): R1 intersection R2 = -(-R1 union -R2)
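The paired-state diagram for R1 and R2 can be built mechanically. The sketch below is one way to do it, assuming each FA is stored as a tuple (start, finals, delta), where delta maps (state, symbol) to the next state and a missing entry plays the role of '?'. The names intersect, accepts, R1, R2 and BOTH are chosen here for illustration.

```python
def intersect(fa1, fa2):
    """Build the FA for the intersection by pairing up states."""
    start1, finals1, delta1 = fa1
    start2, finals2, delta2 = fa2
    # Only symbols that both machines use can appear in a common string.
    symbols = {sym for (_, sym) in delta1} & {sym for (_, sym) in delta2}
    start = (start1, start2)
    finals, delta = set(), {}
    seen, todo = {start}, [start]
    while todo:
        p, q = todo.pop()
        # A paired state is final only if both halves are final.
        if p in finals1 and q in finals2:
            finals.add((p, q))
        for sym in symbols:
            # Skip pairs where either machine has no move (the '?' case).
            if (p, sym) in delta1 and (q, sym) in delta2:
                nxt = (delta1[(p, sym)], delta2[(q, sym)])
                delta[((p, q), sym)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return start, finals, delta


def accepts(fa, text):
    """Run the FA on text; a missing transition means reject."""
    start, finals, delta = fa
    state = start
    for ch in text:
        if (state, ch) not in delta:
            return False
        state = delta[(state, ch)]
    return state in finals


R1 = (1, {2}, {(1, "a"): 2, (2, "a"): 2})   # a+
R2 = (3, {4, 5}, {(3, "a"): 4, (4, "b"): 5})  # a | ab
BOTH = intersect(R1, R2)
```

For these two machines the result has the single transition (1,3)--a-->(2,4) with (2,4) final, which matches the simplified diagram.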
If R is regular, R^rev (where 'rev' reverses all strings) is regular.
EX: (xyz)^rev = zyx
This can be more easily seen by drawing a finite automaton (FA), reversing the directional arrows, and swapping the start and final states.
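A small sketch of the reversal idea, assuming an FA is stored as (starts, finals, edges) with edges given as (source, symbol, destination) triples. Reversing the arrows also means the old final states become start states and the old start states become final. The result may be nondeterministic, so the simulator tracks a set of current states. All names here are hypothetical.

```python
def reverse_fa(starts, finals, edges):
    """Flip every arrow and swap the roles of start and final states."""
    flipped = {(dst, sym, src) for (src, sym, dst) in edges}
    return set(finals), set(starts), flipped


def fa_accepts(fa, text):
    """Simulate a (possibly nondeterministic) FA on text."""
    starts, finals, edges = fa
    current = set(starts)
    for ch in text:
        # Follow every edge labelled ch out of any current state.
        current = {dst for (src, sym, dst) in edges
                   if src in current and sym == ch}
    return bool(current & set(finals))


# FA for the single string xyz: -->(0)--x-->(1)--y-->(2)--z-->((3))
XYZ = ({0}, {3}, {(0, "x", 1), (1, "y", 2), (2, "z", 3)})
ZYX = reverse_fa(*XYZ)
```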

Parsing and Context Free Grammar (CFG)

---->SCANNER--tokens-->PARSER--structural representation of the language-->
The parser uses a context-free grammar (CFG) the same way the scanner uses regular expressions.

CFG: specifies valid programs in terms of tokens. It should accept every syntactically valid token sequence and reject every invalid one.
CFG: has rules (productions) for defining proper token sequences, in terms of 2 sorts of objects:
1. terminals (tokens); these are recognized by the scanner
2. non-terminals (placeholders)
The left-hand side of any production is always a single non-terminal.
The right-hand side of any production can be any sequence of 0 or more terminals and non-terminals.
non-terminal -> {sequence of 0 or more terminals and non-terminals}
Sample:
PROG  -> { STMTS }
STMTS -> STMTS ; STMT
       | STMT
STMT  -> ID = EXPR
EXPR  -> ID
       | EXPR + ID
Following is an idea of how this sample grammar can be used:
[STMTS] -> [STMTS ; STMT] -> [STMT ; STMT] -> [ID = EXPR ; ID = EXPR] -> [ID = ID ; ID = EXPR + ID] -> [ID = ID ; ID = ID + ID]
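The same derivation can be replayed one production at a time. The sketch below stores the sample grammar as a dictionary and always expands the leftmost non-terminal, so it takes more, smaller steps than the line above. The names GRAMMAR, leftmost_step and derive are hypothetical, and treating the braces in the PROG rule as literal tokens is an assumption.

```python
# Each non-terminal maps to its list of alternatives (right-hand sides).
# Assumption: "{" and "}" in the PROG rule are literal tokens.
GRAMMAR = {
    "PROG": [["{", "STMTS", "}"]],
    "STMTS": [["STMTS", ";", "STMT"], ["STMT"]],
    "STMT": [["ID", "=", "EXPR"]],
    "EXPR": [["ID"], ["EXPR", "+", "ID"]],
}


def leftmost_step(form, alt):
    """Replace the leftmost non-terminal in form with alternative number alt."""
    for i, sym in enumerate(form):
        if sym in GRAMMAR:  # anything not in GRAMMAR is a terminal
            return form[:i] + GRAMMAR[sym][alt] + form[i + 1:]
    raise ValueError("no non-terminal left to expand")


def derive(start, alts):
    """Apply the chosen alternatives in order and return every form."""
    form = [start]
    trace = [form]
    for alt in alts:
        form = leftmost_step(form, alt)
        trace.append(form)
    return trace


# Alternative numbers chosen to reproduce the derivation in the notes.
TRACE = derive("STMTS", [0, 1, 0, 0, 0, 1, 0])
for form in TRACE:
    print(" ".join(form))
```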

Anything generated from the CFG rules is syntactically valid.
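Going the other way, a token sequence is valid exactly when the rules can generate it. The sketch below checks this by hand for the sample grammar only, using the observation that STMTS generates one or more STMT separated by ; and that EXPR generates one or more ID separated by +. It is not a general parser; the function names are hypothetical, and the braces are again assumed to be literal tokens.

```python
def is_stmt(toks):
    """STMT -> ID = EXPR, where EXPR is ID followed by zero or more '+ ID'."""
    if len(toks) < 3 or toks[0] != "ID" or toks[1] != "=":
        return False
    expr = toks[2:]
    # ID (+ ID)* always has odd length: ID at even positions, + at odd ones.
    if len(expr) % 2 == 0:
        return False
    return all(tok == ("ID" if i % 2 == 0 else "+")
               for i, tok in enumerate(expr))


def is_valid_program(tokens):
    """PROG -> { STMTS }, where STMTS is one or more STMT separated by ';'."""
    if len(tokens) < 2 or tokens[0] != "{" or tokens[-1] != "}":
        return False
    statements, current = [], []
    for tok in tokens[1:-1]:
        if tok == ";":
            statements.append(current)
            current = []
        else:
            current.append(tok)
    statements.append(current)
    # An empty piece (for example after a trailing ';') fails is_stmt.
    return all(is_stmt(s) for s in statements)
```

For example, is_valid_program("{ ID = ID ; ID = ID + ID }".split()) accepts, while a trailing ; or a missing ID is rejected.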