CS 520
Fall 2017
Course Log
Lecture 1
Wed 9/6/17
Sipser 0.1 <- textbook section most relevant to this lecture
3 handouts (course info, reserve books, topics list). See our
web page: pages.cs.wisc.edu/~cs5201
Intro to course topics:
Finite state machines
A finite state machine, reading left to right,
cannot decide if a string of left and right parentheses is
balanced.
Computability theory
Grew out of Hilbert's program to find a procedure that
could decide the truth or falsity of any mathematical statement.
To prove there is no such procedure requires a definition
of "algorithm."
Turing machine: Finite state control + memory (no bound on
how much could be used).
Main result: Most "natural" questions about programs (e.g.
"does it always stop?") cannot be decided by algorithms.
Polynomial-time computation
Can we find a good interpretation for "efficient algorithms"?
Idea: consider algorithms with run time bounded by a polynomial
in length of the input.
Main result: there is a large class of combinatorial problems
(now called NP-complete) that includes many natural questions
(graph coloring, traveling salesman, etc) with an amazing
property: either all of them have a poly time algorithm or
none of them do.
Lecture 2
Fri 9/8/17
Review of (mostly discrete) math concepts
Sipser 0.2, 0.4
Sets
everything built from the membership idea, e.g.
1 is a member of {1,2,3}.
subsets, set equality
set operations: intersection, union, difference
Cartesian Product
ordered pairs (can be defined using sets)
A x B
Binary Relations
can visualize using a 2d array or directed graph
Functions
special kind of binary relation
domain, range
informally, a rule for transforming inputs into outputs
Sequences
just functions with domain {1,...,n} for some n
or {1,2,3,...}.
oeis.org = database of integer sequences
Induction and Recursion
Recursive defn for Fibonacci numbers
(Strong) induction principle: To prove something for all n>=0,
we can prove it for n=0, and then show that its truth for
all m < n implies its truth for n.
Operations on languages:
Concatenation: LM = { xy : x in L, y in M }
Star: L^* = { x1 x2 ... xk : each xi in L, k >= 0 }
L^+ is same thing but with k>=1.
Examples: L=M={a,aa}. LM = {aa,aaa,aaaa} (note the "collision").
L [empty set] = [empty set] - understand why this is
If L = {a^2, a^5}, then
L^* = {a^j : j=0, 2, or j >= 4}. - you can prove this by induction
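These operations are easy to experiment with on finite examples. A small Python sketch (mine, not from the course; it can only compute a finite slice of L*, since L* itself is infinite):

```python
def concat(L, M):
    """Concatenation: LM = { xy : x in L, y in M }."""
    return {x + y for x in L for y in M}

def star_up_to(L, n):
    """The strings of L* built from at most n factors (a finite slice of L*)."""
    result = {""}          # k = 0 contributes the empty string
    level = {""}
    for _ in range(n):
        level = concat(level, L)
        result |= level
    return result

L = {"a", "aa"}
print(sorted(concat(L, L), key=len))   # ['aa', 'aaa', 'aaaa'] -- the "collision"
```

Taking L = {a^2, a^5}, the lengths appearing in star_up_to(L, n) fill in to {0, 2} plus everything >= 4 as n grows, matching the claim above.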
Lecture 4
Wed 9/13/17
Homework 1 handed out. Due Friday 9/22.
Regular Expressions
Sipser 1.3, 4.2 subsection "The Diagonalization Method"
Background
McCulloch and Pitts 1943 - nets of "neurons" (now called
threshold gates)
Kleene 1951 - characterized the patterns recognizable by
an arbitrary net (could have cycles)
Regular Expressions
Defined recursively (Sipser Defn 1.52)
Atomic R.E.'s: symbols, plus empty string and empty set
Compound R.E.'s: built using union, concatenation, star
Each R.E. is a name for a language.
Precedence rules allow one to avoid parentheses
Examples
A language is regular iff it has a R.E.
Theorem: Not every language is regular.
Proof: Use Cantor's diagonal argument to show that
the number of languages is strictly greater
than the number of R.E.'s (which are themselves
strings in some enlarged alphabet).
Lecture 5
Fri 9/15/17
Review of regular expressions, regular languages (= languages
that can be defined by regular expressions).
Defn: |x|, the length of x, is the number of symbols it has.
So, for example, |0011| = 4.
Pumping lemma for regular languages:
Let L be regular. There is a number c_L (the pumping
length) such that
if w in L and |w| >= c_L, then w can be factored as
w = xyz, where |y| >= 1 and x y^k z in L for all k >= 0.
Illustration: L = all strings in {0,1}* with an even number of 1's.
I can use c_L = 2. Suppose w in L, and |w| >= 2. There are
two possibilities: a) w has a 0 in it, or b) w is all 1's.
For a), let w = x0z, so that y=0. For b), let x=z=empty string,
y = w.
The pumping length is not unique. For example, I think that c_L=1
works for this example.
This theorem is often used to show that a particular language
is not regular.
Proof: Use induction on regular expressions. Suppose that L
is regular. We can recursively define a pumping length for
each subexpression for the regular expression that defines L.
Base cases: If L = a, take c_L = 2.
If L = epsilon, take c_L = 1.
If L = emptyset, take c_L = 1.
Recursive steps:
If L = M*, take c_L = 1. To see this works,
suppose w in L and |w| >= 1. Then, for some k >= 1,
w = w1 w2 ... wk,
with each wi in M and some |wj| >= 1. So let
x = w1 ... w_{j-1}
y = wj
z = w_{j+1} ... wk.
If L = M u N, take c_L = max{c_M, c_N}.
If w in L with |w| >= c_L, then either
w in M and |w| >= c_M
or
w in N and |w| >= c_N
By induction, there are suitable x,y,z in either case.
If L = MN, let c_L = c_M + c_N. Suppose w in L
and |w| >= c_M + c_N. Then w = uv with u in M and v in N,
and either |u| >= c_M or |v| >= c_N. Suppose the first case
holds. By induction we can write u = xyz
with x y^k z in M for all k >= 0, where |y| >= 1. Then
x y^k zv in L
for all k>=0. The other case is similar and left
to you.
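As a finite sanity check (not a proof, and not from the lecture), the factorization from the even-number-of-1's illustration can be tested exhaustively on short strings:

```python
from itertools import product

def in_L(w):
    """L = strings over {0,1} with an even number of 1's."""
    return w.count("1") % 2 == 0

def factor(w):
    """The factorization from the illustration, valid when |w| >= 2."""
    if "0" in w:                        # case a): pump a single 0
        i = w.index("0")
        return w[:i], "0", w[i + 1:]
    return "", w, ""                    # case b): w is all 1's; pump all of w

for n in range(2, 8):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        if in_L(w):
            x, y, z = factor(w)
            assert x + y + z == w and len(y) >= 1
            assert all(in_L(x + y * k + z) for k in range(4))
print("pumping verified on all strings of length 2..7")
```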
Finite Automata
Start with an example: Switches at the top and bottom
of a staircase, which control the light above it:
Two states: conducting (C) and not conducting (N)
We can flip the switch at the bottom or the one at the top.
Call these inputs B and T.
Ways to describe system's behavior:
Transition table
State diagram
Suppose the initial state is N. The set L of all input
sequences that leave the state in N is a regular
language.
L = all strings in {B,T}* of even length
= ( (B u T)(B u T) )* .
We will soon prove that a language is definable by a finite
automaton iff it has a regular expression.
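The staircase example can be sketched as a two-state DFA in Python (the state names and code are mine, not the handout's):

```python
# States: 'N' (not conducting) and 'C' (conducting).
# Flipping either switch (input B or T) toggles the state.
delta = {('N', 'B'): 'C', ('N', 'T'): 'C',
         ('C', 'B'): 'N', ('C', 'T'): 'N'}

def final_state(w, start='N'):
    q = start
    for a in w:
        q = delta[(q, a)]
    return q

# The inputs leaving the system in state N are exactly the even-length strings.
assert final_state("") == 'N'
assert final_state("BT") == 'N'
assert final_state("BBT") == 'C'
```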
Lecture 6
Mon 9/18/17
(Deterministic) finite automata
Reference: Sipser 1.1
Formal defn
Defn: The machine accepts the input x1x2...xn (xi are symbols) iff
there is a path labeled by x1x2...xn going from the initial
state to one of the final states.
If there is no such path, the input is rejected.
Example 1: A machine to decide if the number of 1's in a 01 string
is odd or even.
Key task in such a design problem is to figure out what
must be remembered about a prefix (or initial segment)
of the input word.
Here, it is whether the prefix has odd or even parity.
So we need 2 states.
Sipser Fig. 1.19 has a picture of the resulting machine.
We make q_even the accepting state.
Example 2: Base-10 numbers divisible by 3.
Input alphabet is {0,1,...,9}. Leading 0's are allowed.
The idea behind the design is that x1 x2 ... xn is
divisible by 3 iff x1 + x2 + ... + xn == 0 (mod 3).
This works because 10 == 1 (mod 3).
The machine will have 3 states, indicating the remainder
after dividing the input by 3.
Transition rule: delta(q_r, x) = q_{(r + x) mod 3}.
q_0 is the initial state and the only final state.
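A sketch of the same machine in Python (code mine, not the lecture's); the state is just the remainder r:

```python
def divisible_by_3(digits):
    """Run the 3-state DFA: state q_r remembers (number so far) mod 3."""
    r = 0                        # start in q_0
    for d in digits:
        # delta(q_r, d) = q_{(r + d) mod 3}; this equals (10*r + d) mod 3
        # because 10 == 1 (mod 3).
        r = (r + int(d)) % 3
    return r == 0                # accept iff we end in the final state q_0

assert divisible_by_3("0123")        # 123 = 3 * 41; leading 0's are allowed
assert not divisible_by_3("1000")
```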
Both of these machines are deterministic: the behavior is completely
determined by the input.
Rabin and Scott (1959) introduced nondeterminism for finite automata
State transitions are permitted, not obligatory
From a given state, multiple (or even no) transitions are allowed
for the same symbol.
Acceptance is defined the same way. However, it is not obvious
how to verify that there is no path from initial to final states.
Very useful for certain kinds of patterns. Example:
L = {a,b,c,d}* abcd
is recognized by a 5-state NFA (with a self-loop on q0):
     a,b,c,d
     +-----+
     |     |
     v     |
  -> q0 --a--> q1 --b--> q2 --c--> q3 --d--> q4 (final)
We can think of it as reading some input and making a guess about
where the final abcd begins.
Lecture 7
Wed 9/20/17
Nondeterministic Finite Automata (Sipser 1.2)
Formal defn: Just like DFAs but the transition function is
delta : Q x Sigma_epsilon -> P(Q)
where Sigma_epsilon = Sigma with the empty string added
P(Q) = set of all subsets of Q (power set).
Example: NFA with 3 states to recognize (ab u a)+.
The table for delta is:
         a          b        epsilon
q0    {q1,q2}     empty       empty
q1     empty      {q2}        empty
q2     empty      empty       {q0}
q0 is initial, q2 is the only final state
To decide if this machine accepts, we can do breadth-first search.
Ex: tree for the input w = aba.
           q0
         /  |  \
       q1  q2  q0        after reading a
       |
       q2  q0            "    "     b   (reached from q1)
       |
       q1  q2  q0        "    "     a   (reached from q0)
Since a final state is reachable we say that the NFA
accepts aba.
To make this more efficient, we can just store the subset
of reachable states for each level. This is basically dynamic
programming.
{q0}
  |  a
{q0,q1,q2}
  |  b
{q0,q2}
  |  a
{q0,q1,q2}
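The subset-per-level idea is easy to code. A Python sketch (mine, not the course's) for this 3-state NFA, taking delta(q2, epsilon) = {q0}:

```python
delta = {                         # (state, symbol) -> set of states; '' is epsilon
    ('q0', 'a'): {'q1', 'q2'},
    ('q1', 'b'): {'q2'},
    ('q2', ''):  {'q0'},
}

def closure(S):
    """All states reachable from S by 0 or more epsilon-moves."""
    stack, seen = list(S), set(S)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ''), set()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(w):
    S = closure({'q0'})
    for a in w:                   # one subset per level, as in the diagram
        S = closure({r for q in S for r in delta.get((q, a), set())})
    return 'q2' in S              # q2 is the only final state

assert accepts("aba")
assert accepts("a") and accepts("ab")
assert not accepts("b") and not accepts("abb")
```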
Big Theorem: The following are equivalent
1) L is regular
2) L = L(N) for some NFA N
3) L = L(M) for some DFA M
Note: L(N) is the set of strings that are accepted by N.
We'll prove 1) ==> 2) ==> 3) ==> 1)
Proof that 1)==>2):
We will define, for each RE phi, a nondeterministic finite
automaton N_phi for which L(N_phi) matches phi. Each N_phi
will have a unique final state.
Base cases:
phi is a symbol a:   o --a--> (o)
phi is epsilon:      o --epsilon--> (o)
phi is emptyset:     o     (o)     (no edges)
Recursive steps:
phi is (phi1 o phi2): connect N_phi1 and N_phi2 in series
using epsilon edges
phi is (phi1 u phi2): connect N_phi1 and N_phi2 in parallel
(similarly)
phi is (phi1*):
               epsilon
          +---------------+
          |               |
          v               |
  o -----> [o  N_phi1  o] -----> (o)
  |                               ^
  |            epsilon            |
  +-------------------------------+
(the unlabeled edges into and out of N_phi1 are also epsilon edges)
Note that the bypass loop is outside N_phi1.
What goes wrong if you connect it from N_phi1's
initial state to its final state?
Lecture 8
Fri 9/22/17
Transforming NFAs to DFAs (Sipser 1.2, "Equivalence of NFAs and DFAs")
Transforming DFAs to REs (Hopcroft/Ullman, Thm. 2.4)
Review of DFA, NFA models
From NFAs to DFAs
Let q be a state of the NFA, S a subset of states
Closure E(q) = all states reachable from q using 0 or more
epsilon-transitions
E(S) = union over q in S of E(q)
= all states reachable from some q in S using
0 or more epsilon-transitions
We can compute E(q) by breadth-first search out of q - just
ignore all the transitions except those labeled epsilon
Using the 3state example from last lecture,
E({q2}) = {q0,q2}
E({q1,q2}) = {q0,q1,q2}
To make a DFA M accepting the same language as N, we let
the states of M be the *subsets* of Q, the state set of N.
(There are <= 2^|Q| of these.)
The transitions are as given in the pf of Thm 1.39 in the
text. Basically the rule is:
From the state set S, take all singleedge transitions
labeled by the symbol a. This gives you a new set T.
Then form the closure E(T).
Usually when you do this you only need to enumerate the
closed reachable subsets.
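A Python sketch of this construction (mine, not the text's), enumerating only the reachable closed subsets:

```python
def closure(delta, S):
    """E(S): all states reachable from S by 0 or more epsilon ('') moves."""
    stack, seen = list(S), set(S)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ''), set()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def subset_construction(delta, start, alphabet):
    """Return the DFA start state and transition table over closed subsets."""
    q0 = closure(delta, {start})
    dfa, seen, todo = {}, {q0}, [q0]
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = closure(delta, {r for q in S for r in delta.get((q, a), set())})
            dfa[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return q0, dfa

# The 3-state NFA for (ab u a)+ from Lecture 7:
nfa = {('q0', 'a'): {'q1', 'q2'}, ('q1', 'b'): {'q2'}, ('q2', ''): {'q0'}}
q0, dfa = subset_construction(nfa, 'q0', 'ab')
print(len({q0} | set(dfa.values())))    # 4 reachable subsets (incl. the empty one)
```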
From DFAs to REs
Use dynamic programming (this is in Hopcroft & Ullman, pp. 33-34).
Let the states of M be {1,...,n}. The key idea is this:
Consider paths from i to j that only use intermediate states
(that is, neither the beginning nor the end) <= k. Such a
path will go through k a finite number of times, and between
those visits to k, will use intermediate states <= k-1.
Call the corresponding set of strings R_{i,j}^(k).
We'll show by induction on k that each such set is regular.
This suffices since L(M) is a finite union of such sets, where
i is the initial state, and j is one of the final states.
Basis:
R_{i,j}^(0) = the union of all {a} with delta(i,a) = j,
plus (additionally) {epsilon} (if i=j).
Induction step: go from k-1 to k
R_{i,j}^(k) = R_{i,j}^(k-1)
union
R_{i,k}^(k-1) o (R_{k,k}^(k-1))* o R_{k,j}^(k-1).
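A direct Python transcription of the recurrence (code mine), building regular-expression strings; no simplification is attempted, so the output is correct but verbose:

```python
def dfa_to_re(n, delta, start, finals):
    """States are 1..n; delta maps (state, symbol) -> state."""
    # Basis: R[i][j] holds R_{i,j}^(0).
    R = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            parts = sorted(a for (p, a), q in delta.items() if p == i and q == j)
            if i == j:
                parts.append('eps')
            R[i][j] = '(' + ' u '.join(parts) + ')' if parts else 'emptyset'
    # Induction step: allow intermediate states up to k.
    for k in range(1, n + 1):
        R2 = [[None] * (n + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                R2[i][j] = ('(' + R[i][j] + ' u ' +
                            R[i][k] + '(' + R[k][k] + ')*' + R[k][j] + ')')
        R = R2
    return ' u '.join(R[start][j] for j in finals)

# Example (mine): 2-state DFA for "even number of 1's"; state 1 is initial and final.
re_str = dfa_to_re(2, {(1, '0'): 1, (1, '1'): 2,
                       (2, '0'): 2, (2, '1'): 1}, 1, [1])
print(re_str)
```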
Lecture 9
Mon 9/25/17
Review of regular language transformations
To define a regular language, we can use:
A regular expression
A nondeterministic finite automaton
A deterministic finite automaton
This tells us that the class of regular languages is a natural one,
and insensitive to tweaks in the definition.
The NFA for a regular expression is built up recursively
Example we did: (a u b)* o a
Try Sipser Example 1.56, and Example 1.58. Note that his
recursive construction for the NFA is different from
ours in that his machines can have multiple final states.
To replace a nondeterministic finite automaton by a deterministic
one, use subsets.
Example: The 3state automaton from Lecture 7. This accepts
(ab u a)+ = (ab u a)* (ab u a)
Starting with the closure of the initial state (here
{q0}), we enumerate reachable closed subsets. Remember that
E(S) is the closure of S - all the states you can get to
by starting in S and following 0 or more epsilon-edges.
Q0 = {q0}. Explore out of this.
delta(Q0, a) = closure of {q1,q2} = {q0, q1, q2} = Q1.
delta(Q0, b) = emptyset, call this Q2.
Now we explore out of Q1
delta(Q1, a) contains delta(Q0,a) = {q0, q1, q2}
So delta(Q1, a) is Q1.
delta(Q1,b) = closure of q2 = {q0, q2} = Q3
Explore out of Q2
delta(Q2,a) = delta(Q2,b) = emptyset = Q2
Explore out of Q3
delta(Q3, a) contains delta(Q0,a) = {q0, q1, q2}
So delta(Q3, a) is Q1.
delta(Q3, b) = emptyset = Q2.
This gives us a DFA with 4 states and 8 transition edges.
The final states are subsets that contain *some* final
state of the original machine. These are Q1 and Q3.
[Draw the picture of the DFA]
The number of states in the DFA is <= 2^[# of states in the NFA].
This exponential blowup can't be avoided (examples later)
Alternative method for replacing a DFA by a regular expression.
(Called State Elimination by Sipser, pp. 69 ff. See the proof
of Lemma 1.60.)
Worked example: Start with the 4 state example done above.
Remove dead state Q2, then add new (unique) final state by
adding epsilonedges from Q1 and Q3. Then successively
remove Q3 and Q1.
At each stage you get an automaton-like diagram whose edges
are labeled by regular expressions. Stop when you have
-> o --R--> (o)
This will terminate since the number of states decreases
with each step.
Note: The DFA for a RE can be exponentially large.
Lecture 10
Wed 9/27/17
Today: Regularity-preserving operations (Hopcroft/Ullman 3.2, Ex. 3.4g,
Sipser Thms 1.45-1.49)
An improved pumping lemma (Sipser 1.4)
There are many theorems that say "If L is regular and we do something
to it to make a new language M, then M is also regular." Despite what
you might think, these are useful for studying nonregular languages.
1) If L is regular so is its complement {w in Sigma* : w not in L}
Proof idea: Use a DFA for L and make a state final (resp. nonfinal)
if it is nonfinal (resp. final).
Exercise for you: Could I have used an NFA?
2) If L1, L2 are regular, so is their intersection.
Pf: Use De Morgan's law (the prime denotes complement)
L1 n L2 = (L1' u L2')'
This can also be proved using DFA's: run the two machines
in parallel, and accept iff both of them accept.
3) The class of regular languages is closed under arbitrary Boolean
operations.
E.g. if L1, L2 are regular, so is
L1 (+) L2 = {w : w belongs to exactly one of L1, L2}
= (L1 n L2') u (L1' n L2).
4) If L is regular so is L^R = {w : the reverse of w is in L}.
Proof idea: Use an NFA for L, which (without loss of generality)
has exactly one final state. Reverse all the transitions,
and make the initial state final (and vice versa).
This can also be proved using regular expressions. It is a good
exercise to do this (using recursion of course).
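The parallel-running idea in 2) can be sketched directly in Python (the code and example machines are mine):

```python
def product_accepts(d1, q1, F1, d2, q2, F2, w):
    """Run two DFAs in parallel; the product state is the pair (q1, q2)."""
    for a in w:
        q1, q2 = d1[(q1, a)], d2[(q2, a)]
    return q1 in F1 and q2 in F2     # intersection: accept iff both accept

# Example: (even number of 1's) intersect (length divisible by 3), over {0,1}.
parity = {('e', '0'): 'e', ('e', '1'): 'o',
          ('o', '0'): 'o', ('o', '1'): 'e'}
length3 = {(r, a): (r + 1) % 3 for r in range(3) for a in '01'}

assert product_accepts(parity, 'e', {'e'}, length3, 0, {0}, "101101")
assert not product_accepts(parity, 'e', {'e'}, length3, 0, {0}, "1111")
```

For union or symmetric difference, only the acceptance condition on the pair changes; the transitions are identical.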
An improved pumping lemma:
Let L be regular. Then for some p > 0, any w in L of length >= p
can be expressed as w = xyz, such that:
for all k >= 0, x y^k z in L
|y| >= 1
|xy| <= p <- this condition is new.
Rather than memorize this, you should just remember the proof.
Let M be a DFA for L. Suppose M has p states. Let M accept
w, where |w| >= p. After M reads the first p symbols of w,
it will have visited p+1 states. Two of these must be the
same (by the pigeonhole principle). From this you can
get the factorization:
x = the part read up to the repeated state
y = the part read during the "cycle"
z = the rest of it
Proving Languages nonRegular
Let PL(L) be the above statement. We know
L regular ==> PL(L)
so
notPL(L) ==> L is not regular.
The statement of the pumping lemma is complicated! It has
four quantifiers:
there exists p>0 such that
for all w in L with |w| >= p
there exists a factorization w = xyz (with |y| >= 1, |xy| <= p) such that
for all k >= 0, x y^k z in L
To negate it we "flip" each quantifier to get
for all p>0
there exists w in L with |w| >= p such that
for all factorizations w = xyz (with |y| >= 1, |xy| <= p)
there exists k >= 0 with x y^k z not in L
Observe that we also negated the innermost statement
so that it now asserts something is not in L.
This mechanical process for negating statements with quantifiers
is very useful and should be mastered. To practice, try it
on the following: How do I say that the function f is not
continuous at x?
Lecture 11
Fri 9/29/17
Today: Finish pumping lemma
Some nonregular languages
Myhill-Nerode (Hopcroft/Ullman pp. 65-70)
Review of the "textbook" pumping lemma and its negation
A common strategy for showing that L is not regular
1. Apply some regularity preserving operation to get a simpler
language L', that would be regular if L was.
2. Use the pumping lemma to show that L' isn't regular.
Example: L = strings of a's and b's with equal numbers of each
L' = L n a*b* = { a^n b^n : n >= 0 }.
We now argue that notPL is true of L'. The structure of
the argument exactly mirrors the structure of the statement.
stmt        what we write
forall  ->  Choose any p > 0.
exists  ->  Let w be a^p b^p.
forall  ->  Choose any factorization w = xyz with
            |y| >= 1 and |xy| <= p.
exists  ->  Let k = 3.
Then x y^k z has more a's than b's.
You can think of this as providing a winning strategy
against an opponent who thinks that L' is regular.
Another example: L = Even-length palindromes = { w w^R : w in Sigma* }
This is regular if |Sigma| = 1
Not regular if |Sigma| > 1
You should write out the proofs of these.
Before showing L isn't regular, intersect it
with a*bba* first (this gives { a^n b b a^n : n >= 0 }).
Problems with the pumping lemma: not definitive (there are nonregular
languages that satisfy it) and can require tedious case analysis
(trying all factorizations xyz).
Myhill-Nerode Theorem: A reliable test for regularity.
The underlying idea:
Let L be some language. We want to decide membership in L,
by reading from left to right. Suppose w = xz and I have
read x. What must I "remember" about x?
Hard to think about abstractly (what does it mean to "remember"
something?) but I can simplify the question to:
Suppose x and y are two strings.
Under what conditions must my "memories" of x and y
be different?
Answer: If there is a distinguisher, that is, a string
z for which
xz belongs to L
yz doesn't belong to L
(or the other way around).
This is the key idea behind Myhill-Nerode. It is already enough
to show some languages are not regular.
Example: L = { a^n b^n : n >= 0 }
If m != n, then x=a^m and y=a^n have the distinguisher
z = b^m.
So any lefttoright algorithm to recognize L
needs access to unbounded memory. In other words,
no DFA can recognize L.
Lecture 12
Mon 10/2/17
Today: Myhill-Nerode theorem
Minimal finite automata
(Sipser Exs. 1.51 and 1.52, Hopcroft/Ullman pp. 65-71)
Let L ⊆ Sigma* be a language, and x and y strings in Sigma*.
Defn: A string z is a distinguisher for x,y if exactly one of
xz, yz is in L.
We will write x ~L y when x and y have no distinguisher.
Lemma (Ex. 1.51 in Sipser): ~L is an equivalence relation.
Proof: We must verify that ~L is
Reflexive: x ~L x (it is impossible for xz to be in and not in L)
Symmetric: x ~L y ==> y ~L x (the defn of ~L is symmetric)
Transitive: x ~L y and y ~L z ==> x ~L z.
Let x ~L y and y ~L z. It is required to prove that
for all w, xw and zw are either both in or both out of L.
Choose any w.
If xw in L, then yw in L, so zw in L.
If xw not in L, then yw not in L, so zw not in L.
In either case, both xw and zw are in L, or neither is.
So x ~L z.
Recall that any equivalence relation on S partitions S into disjoint
equivalence classes. x's equivalence class consists of all the
elements of S that are equivalent to x.
[Draw the picture.]
The number of equivalence classes is called the *index*.
Example: Let S = Z (the integers). Call two numbers equivalent
if their difference is divisible by n. All numbers that yield the
same remainder after division by n form an equivalence class.
Theorem (Myhill-Nerode): L is regular iff the index of ~L is finite.
Proof (==>) [This is Problem 1.52a of our text]
Let L (regular) be recognized by the DFA M.
Say that x ~~M y if M sends x,y to the same state.
This is finer than ~L, that is, if x ~~M y then x ~L y. (Why:
any distinguisher z for x and y would show that M didn't work.)
So index( ~L ) <= index ( ~~M ) <= # of states for M < oo.
Proof (<==) [This is Problem 1.52b]
We will make a DFA M for L, which uses the alphabet Sigma.
The states of M are the ~L-equivalence classes.
Let [w] denote the equivalence class of w (under ~L). Observe that
either all the elements of [w] are in L, or none of them are.
Initial state: q0 = [epsilon]
F = { [w] : w is in L }
delta([x],a) = [xa].
We need to check that the next state doesn't depend
on which representative we pick for [x]. Suppose that
x ~L x'. Then for all z, xz and x'z are both in L or both out.
In particular, for all w, xaw and x'aw are both in L or both
out of L. So xa ~L x'a, i.e. [xa] = [x'a].
To finish the proof, show (by induction on n) that
delta([epsilon], a1 a2...an ) = [a1 a2...an]
Then, x in L iff delta(q0,x) in F, i.e. M accepts x.
The proof showed that the DFA made from the ~L-equivalence classes
uses the minimum possible number of states.
This suggests a procedure for finding a DFA with minimal state count.
The idea is to start with a lower bound on the number of states
needed, and make it larger, until it equals the index of ~L.
Start with any DFA M for L. Assume that M has both final and nonfinal
states. (If not, one state suffices.) Also assume that every state
of M is reachable.
Begin with the partition Q = F + F'.
While there are two states q,r in the same part P, and a letter a
with delta(q,a) and delta(r,a) in different parts,
Use a to split P, grouping the states q of P by the part that delta(q,a) lies in.
It's easy to see this terminates, since each pass through the loop
increases the index of the partition.
Correctness is more subtle. There is a proof in Hopcroft & Ullman
(see Thm 3.11, page. 71).
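A Python sketch of the splitting loop (code and example mine; Hopcroft & Ullman prove the procedure's correctness):

```python
def refine(states, alphabet, delta, finals):
    """Split the partition {F, F'} until no letter distinguishes two states
    that share a part. Returns the list of parts (as sets)."""
    parts = [p for p in (set(finals), set(states) - set(finals)) if p]
    changed = True
    while changed:
        changed = False
        for p in parts:
            for a in alphabet:
                # group the states of p by the part their a-successor lies in
                groups = {}
                for q in p:
                    dest = next(i for i, r in enumerate(parts)
                                if delta[(q, a)] in r)
                    groups.setdefault(dest, set()).add(q)
                if len(groups) > 1:           # the letter a splits p
                    parts.remove(p)
                    parts.extend(groups.values())
                    changed = True
                    break
            if changed:
                break
    return parts

# Example (mine): a DFA for "ends in 1" with a redundant extra state C.
delta = {('A', '0'): 'A', ('A', '1'): 'B',
         ('B', '0'): 'A', ('B', '1'): 'C',
         ('C', '0'): 'A', ('C', '1'): 'C'}
parts = refine('ABC', '01', delta, {'B', 'C'})
print(parts)   # B and C are never split: the machine reduces to 2 states
```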
Lecture 13
Wed 10/4/17
Midterm exam will be Wed 10/25 7:15 PM, room TBA.
Homework #2 will be due Fri 10/13.
Example of DFA minimization procedure
Automata with output (HU Sec. 2.7 - on course web site)
Review of minimization procedure for DFAs
Example: Modified mod-2 counter, with 4 states A, B, C, D.
Transition table (A is the initial state; the final
states are A and C):
            0   1
    -> (A)  A   B
        B   C   C
       (C)  C   D
        D   D   A
Initial partition:
A C | B D
delta(B,0) is final and delta(D,0) is not, so we split
B from D:
A C | B | D
1 splits A from C, since delta(A,1) = B and
delta(C,1) = D are in separate parts. We get:
A | C | B | D
This tells us that the machine cannot be reduced.
Exercise for you: Which strings does it accept?
Finite Automata as Computers
Remember our course is presenting three "models of computation":
finitestate, unlimited (e.g. Turing machines), polynomialtime.
Up to this point, our finitestate devices don't do very much.
They only give one bit of output (whether the string is in
a language or not). Theoretically, all function computation
can be reduced to such yes/no decisions ("is the 4th bit of
f(x) a 0 or a 1?"), but this is inelegant and seems to violate
the spirit of finitestate computation.
We'll discuss two models for a "finitestate computer":
Moore Machines (after E.F. Moore, of Bell Labs and then UW-Madison)
Q, Sigma, delta, q0 same as a standard DFA (no final state set)
Delta = output alphabet
lambda : Q -> Delta (so outputs are associated with states)
What it does: if the input is a1 a2... an (ai in Sigma)
and the states visited are q0 q1 ... qn, the output is
lambda(q0) lambda(q1) ... lambda(qn).
If f_M is the function computed by the Moore machine M,
then f_M is length-increasing:
|f_M(x)| = |x| + 1
Example (see Hopcroft/Ullman p. 43): Moore machine to
find the remainder mod 3 for a binary number. Based on
the idea that, when x is a binary number,
x0 = 2x
x1 = 2x + 1
Systematic examination of the output reveals that
the last symbol in f(x) is x mod 3.
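A Python sketch of this Moore machine (code mine, following the Hopcroft/Ullman idea):

```python
def moore_mod3(bits):
    """Moore machine: state q_r remembers (binary value so far) mod 3,
    and the output function is lambda(q_r) = r."""
    r = 0
    out = [r]                       # lambda is emitted for the initial state too
    for b in bits:
        r = (2 * r + int(b)) % 3    # x0 = 2x and x1 = 2x + 1, taken mod 3
        out.append(r)
    return out

out = moore_mod3("1011")            # 1011 is 11 in binary
assert len(out) == len("1011") + 1  # |f_M(x)| = |x| + 1
assert out[-1] == 11 % 3            # the last output symbol is x mod 3
```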
Mealy Machines (named for G. H. Mealy, of Bell Labs):
Output is associated with state transitions:
lambda : Q x Sigma -> Delta
What gets computed is
lambda(q0,a1) lambda(q1,a2) ... lambda(q_{n-1},an),
so this is length-preserving:
|f_M(x)| = |x|.
We can think of a Mealy machine as a "substitution with
memory."
Example: WWII Enigma encryption machine
Equivalence: If M is a Moore machine and N is a Mealy machine,
we say that M is equivalent to N if (for some b and all x in
Sigma*)
f_M(x) = b f_N(x).
Theorem: For every Moore machine, there is an equivalent Mealy
machine, and vice versa.
Lecture 14
Fri 10/6/17
Discussion of problems in HW 1
Regular expressions in the real world
Reference: Section 4.1 (the grep family) of Kernighan and Pike
The Unix Programming Environment, Prentice-Hall 1984. Now on
the 520 web site.
Some history
1960s: Text editors (led to today's vim, notepad, etc.)
Necessary to find and print lines containing given words
More generally, lines containing a *pattern*
~1968: Ken Thompson's qed interprets "pattern" as regular expression
Standalone pattern matcher (grep) still in common use.
E.g.
grep pattern file
finds all lines with a string matching the pattern
grep -v pattern file
finds all lines not containing the pattern
Thompson's approach: run the text against an NFA derived from
the RE. See his paper in CACM, 11:6, June 1968.
Eventually regular expressions were put into programming languages
like Java, Python, Perl, ...
grep's RE syntax (now standard):
Most symbols such as a,b,... stand for themselves
Concatenate to make strings, e.g. xyz
Metacharacters (used to specify structure of the RE)
| (pipe) means OR (or union)
( ) (parens) for grouping
^ forces match at beginning of line
$ forces match at end of line
. (wild card) matches any single character
[ ] (set brackets), e.g. [A-Z] matches any capital letter
These can be combined, e.g.
color|colour is equivalent to col(o|ou)r
Repetition symbols
R* Kleene star (as usual)
R+ means 1 or more copies of R
R? means 0 or 1 copies of R
R{n} means n repetitions of R
Extended REs
Many implementations allow back references, e.g. in
(.+)\1
the RE in parens matches any nonempty string, and the \1
refers to another copy of what was used in the match
These can define nonregular languages, for example
(a+)(b+)\1\2
matches L = {a^m b^n a^m b^n : m, n >= 1}. This is not
regular, by the pumping lemma.
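This is easy to check in Python, whose re module supports back references (example mine):

```python
import re

# (a+)(b+)\1\2 : the \1 and \2 must repeat exactly what the groups matched.
pat = re.compile(r'^(a+)(b+)\1\2$')

assert pat.match('abab')              # a^1 b^1 a^1 b^1
assert pat.match('aabbbaabbb')        # a^2 b^3 a^2 b^3
assert not pat.match('aabab')         # the two a-blocks differ
assert not pat.match('abba')
```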
Some examples. Since the shell interprets many symbols
before executing the command, it is wise to put the pattern
in single quotes.
grep 'abc'
matches any line containing abc
grep -v 'abc'
matches any line not containing abc
grep '[A-Z]*$'
matches everything (!) but
grep '[A-Z][A-Z]*$'
matches lines ending with a nonempty block
of capital letters
grep '^\(abc\)\1$'
matches a line ending with abcabc
but not the line xabcabc
(My version of grep requires that backreferenced
expressions be "tagged" by using escaped parens.
Using plain parens gives an error message.)
Lecture 15
Mon 10/9/17
New Topic: Computability Theory
Sipser 3.1, 3.3
Rogers 1.1, 2.2 has a nice intro
Some history
~ 1900: Development of formal logic
Hilbert's challenge: Find a method to decide if a sentence
of first-order logic (can only apply forall, exists to
individuals) is true under any possible interpretation.
E.g. forall x ( R(x) -> (R(x) OR S(x)) )
holds no matter what R,S mean.
~ 1930: Godel showed that any sufficiently expressive system
for proving statements about the integers would have
true, but unprovable statements.
~ 1935: Church (US) and Turing (UK) showed that Hilbert's sought-for
method cannot exist.
To show this requires a definition of "algorithm."
In this course we will study Turing's definition. The defn
of Church (and many others) leads to the same class of
computable functions.
What is an algorithm? Informally, it should exhibit the following
properties.
1. It is discrete and finite (e.g. a piece of text written using
a finite alphabet).
2. It can be executed by an agent who just follows precise directions.
No creativity or guesswork is needed.
3. Computation is deterministic (no random choices).
4. There is a way to store and retrieve intermediate results.
5. There is no a priori limit on the time or memory needed for a
computation.
Turing Machine (1936):
This was Turing's idealization of what a computer (then a human
being) executing an algorithm would do.
Infinite tape, divided into cells. Each cell holds one symbol.
Finitestate device that can read/write symbols and move
left or right.
Time is discrete: At each step, the machine will
Consult its internal state and symbol scanned
Write a new symbol
Move left or right one cell
Change its state
(Moves that would force the machine to fall off its
tape are ignored.)
Formal defn similar to the automata we've studied
See Sipser 3.1 for details, in particular Defn. 3.3
Note that each TM embodies one algorithm.
Warning: Details of the model are not standard.
Configuration: snapshot of a computation
We will abbreviate
      [ q ]
        |
        v
A B C   D ... Z _ _ _ ...
(the underscore _ stands for a blank; the head is on D)
as
A B C q D ... Z
Note: We assume that Q n Gamma is empty, so that the configuration
is just a string that is uniquely readable. Sipser (p. 169)
ensures readability by separating q, in effect making the
configuration a triple.
Lecture 16
Wed 10/11/17
Review of Turing machine model (Sipser 3.1)
Finite automata as Turing machines (Sipser 4.1)
Turing Machines
Review of Sipser Defn 3.3
TM "instructions" are quintuples: qi s qj t Delta says that
the machine, when in state qi and reading symbol s, will
write symbol t, move by Delta (one square L or R), and then
go into state qj. There is no "program counter"  the state
does the job of one.
If computation is a movie, then a single frame is called
a configuration.
If C,C' are configurations, then C yields C' if the TM could have
gone from C to C' in one step (meaning it executed one quintuple.)
Conventions: accept/reject configs yield nothing
left moves off end of tape are not executed
Defn (important): M {accepts, rejects} w in Sigma* if there is a sequence
of configurations, each yielding the next, such that
q0w = C0, C1, C2, ..., Ct = x q_accept y or x q_reject y
t is the number of steps, or the length, of the computation.
For any TM M and input word w, there are three mutually exclusive
possibilities:
M accepts w
M rejects w
M on input w runs forever.
Defn: L ⊆ Sigma* is
Decidable (aka recursive) if for some TM M,
w in L ==> M accepts w
w not in L ==> M rejects w
Recognizable (aka recursively enumerable) if for some TM M,
w in L <==> M accepts w
(If w is not in L, M can reject or run forever.)
Note: This terminology is by no means standard. Many people say
computably enumerable for recognizable, for example.
Lecture 17
Fri 10/13/17
HW 2 Due
Review of Turing machine model
Church's Thesis (Sipser 3.3)
Turing machine:
Finite state control + infinite tape
Executes step by step, symbolically
Accepts/rejects by final state
L ⊆ Sigma* is
Decidable if for some TM M,
w in L ==> M accepts w
w not in L ==> M rejects w
Recognizable if for some TM M,
w in L <==> M accepts w
(If w is not in L, M can reject or run forever.)
Language classes:
REGULAR ⊂ DECIDABLE ⊂ RECOGNIZABLE ⊂ ARBITRARY
We showed last time that every regular language is decidable.
Every decidable language is recognizable (use the same M).
There are decidable languages that are not regular.
Example: L = { a^n b^n : n >= 0 }
Sketch of a TM design to decide membership in L
Tape alphabet = { a, a', b, b', [] (blank) }.
Think of a' as a marked version of a, similarly for b' and b.
The machine will sweep from left to right, marking the first
a and the first b that it encounters.
It accepts if it reaches a blank without seeing any a's
or b's. (This handles the case n=0 in L.)
It rejects if it marks an a but not a b, or vice versa
Otherwise, it moves back to the left and makes another
sweep.
[Examples.]
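Invoking Church's thesis, the sweeping TM can be mimicked by an ordinary program (sketch mine; the explicit a*b* check stands in for the machine rejecting when it sees symbols out of order):

```python
def tm_accepts(w):
    """Decide membership in { a^n b^n : n >= 0 } by repeated marking sweeps."""
    if 'ba' in w:                 # an a after a b: reject (input must be a*b*)
        return False
    tape = list(w)
    while True:
        i = next((k for k, s in enumerate(tape) if s == 'a'), None)
        j = next((k for k, s in enumerate(tape) if s == 'b'), None)
        if i is None and j is None:
            return True           # nothing unmarked is left (handles n = 0)
        if i is None or j is None:
            return False          # an unmatched a or b remains
        tape[i], tape[j] = "a'", "b'"   # mark one a and one b; sweep again

assert tm_accepts('') and tm_accepts('aabb')
assert not tm_accepts('aab') and not tm_accepts('abab')
```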
We can extend the idea of TM computation to include functions.
Defn: f : Sigma* > Sigma* is computable (sometimes called
recursive) if, for some TM M,
w in Sigma* ==> M on input w halts in a final configuration
with just f(w) on its tape.
Thm: Every function computed by a {Moore, Mealy} machine is
computable.
Proof: modify the proof (given last time) that every regular
language is decidable. The Turing machine will exactly mimic
the action of the finitestate machine.
There are computable functions that cannot be computed by a Mealy
or Moore machine.
Example: f(w) = ww.
A TM for f can just copy the input, symbol by symbol,
to the right of w. (It can use a similar "marking"
idea to keep track of what has been written.)
|f(w)| = 2|w|, but any Mealy or Moore machine transforms
w into a string of length <= |w| + 1.
Church's Thesis (after the American logician Alonzo Church):
The formal concept of computation by a Turing machine exactly matches
our informal concept of computation by an algorithm.
Note that this is not a mathematical statement, but rather a statement
about what (most) people believe.
Rogers (~1965) used this to simplify proofs. Rather than explicitly
describe a TM to do some task, just describe algorithms precisely
but informally. (Any reader who isn't satisfied with such an argument
is invited to fill in the details.)
This has now become standard in computability theory.
Observe that our description of the TM to recognize a^n b^n
took advantage of Church's thesis.
Lecture 18
Mon 10/16/17
Church's Thesis and Proofs
The Evidence for Church's Thesis
Review of Church's thesis (see 10/13 lecture)
If we adopt Church's thesis, we can simplify proofs.
Example: For an infinite language L ⊆ Sigma*, the following are
equivalent.
1) L can be enumerated by some methodical procedure. This
means that the procedure makes a list which will eventually
contain every w in L, but none of the words that are not in L.
(Words can repeat.)
2) L is recognizable by some Turing machine M, which means that
M accepts w iff w is in L.
Proof that 1) ==> 2): On input w, M starts the enumeration procedure.
If w ever appears in the list, M stops in
an accepting state.
Proof that 2) ==> 1): This uses an idea called *dovetailing*.
Let w1, w2, w3, ... be an enumeration of Sigma*,
such as the following. List the strings in
groups by length, using dictionary order to
decide the ordering within each length. Thus,
when Sigma = {0,1}, the enumeration is
epsilon, 0, 1, 00, 01, 10, 11, ...
Suppose that M recognizes L.
For n = 1,2,3,...
Run M on w1,...,wn for n steps each.
Output all wi that were accepted.
The idea behind the procedure is that any
element w of L will appear on the list, and
then on all subsequent ones. Eventually
n will be large enough that M's computation
on w can run to termination.
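A Python sketch of dovetailing (code mine). The recognizer is represented abstractly by a hypothetical accepts_within(w, n) predicate meaning "M accepts w within n steps":

```python
from itertools import count, product

def all_strings(alphabet='01'):
    """epsilon, 0, 1, 00, 01, 10, 11, ... (grouped by length, dictionary order)."""
    yield ''
    for n in count(1):
        for tup in product(alphabet, repeat=n):
            yield ''.join(tup)

def enumerate_language(accepts_within, alphabet='01'):
    """Yield every word of L eventually (with repetitions, as allowed)."""
    for n in count(1):
        gen = all_strings(alphabet)
        for _ in range(n):
            w = next(gen)
            if accepts_within(w, n):   # run M on w for n steps
                yield w

# Stand-in recognizer (mine): accepts the all-1 strings, "taking" |w|+1 steps.
recognizer = lambda w, n: len(w) >= 1 and set(w) == {'1'} and len(w) + 1 <= n

found = []
for w in enumerate_language(recognizer):
    found.append(w)
    if '11' in found:
        break
assert set(found) == {'1', '11'}   # '1' appears (repeatedly) before '11' does
```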
The Evidence for Church's Thesis
1. All attempts to extend the Turing machine definition to include
extra features have not extended the ultimate power of such
a device.
2. Many alternative formal definitions of computation have been
proposed. All known such alternatives have the same power
as the Turing machine.
Examples for 1:
a. Two-way infinite tape: Simulate with a two-track tape, using
the top track for one direction and the bottom track for
the other.
b. k>1 tapes, say k = 2: We can simulate a two-tape machine with
two tracks on a one-tape machine. The only difficulty is when
the heads are ordered to move in opposite directions. Then,
one instruction by the 2-tape machine requires the one-tape
machine to shift the two tracks relative to each other. This
causes a slowdown which is of no consequence in computability
theory.
c. Random access: We can put the addresses on one track of a single
tape, and the data word (unconstrained as to length) for an
address starting right below it.
d. Two-dimensional tape, say with addresses (i >= 0 increases
upward; j = 0,1,2,3,4 across the bottom row, which is i = 0):
      14
       9 13
       5  8 12 ...
       2  4  7 11
       0  1  3  6 10
The address for (0,j) is sum_{k=0}^j k = (j+1 choose 2)
So the address for (i,j) is (i+j+1 choose 2) + i.
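A quick check of the addressing formula (the code is mine):

```python
from math import comb

def address(i, j):
    """Address of cell (i, j) when cells are listed diagonal by diagonal."""
    return comb(i + j + 1, 2) + i

# These match the triangle of addresses drawn above.
assert [address(0, j) for j in range(5)] == [0, 1, 3, 6, 10]   # row i = 0
assert [address(1, j) for j in range(4)] == [2, 4, 7, 11]      # row i = 1
assert address(2, 0) == 5 and address(4, 0) == 14
```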
Upcoming:
10/18: Unsolvability of the Halting Problem (Sipser 4.2)