CS 520
Fall 2017
Course Log
Lecture 1
Wed 9/6/17
Sipser 0.1   <- textbook section most relevant to this lecture
3 handouts (course info, reserve books, topics list). See our
web page: pages.cs.wisc.edu/~cs520-1
Intro to course topics:
Finite state machines
A finite state machine, reading left to right,
cannot decide if a string of left and right parentheses is
balanced.
Main result: Many "natural" ways of defining the patterns
recognizable by finite machines are equivalent.
Computability theory
Grew out of Hilbert's program to find a procedure that
could decide the truth or falsity of any mathematical statement.
To prove there is no such procedure requires a definition
of "algorithm."
Turing machine: Finite state control + memory (no bound on
how much could be used).
Main result: Most "natural" questions about programs (e.g.
"does it always stop?") cannot be decided by algorithms.
Polynomial-time computation
Can we find a good interpretation for "efficient algorithms"?
Idea: consider algorithms with run time bounded by a polynomial
in length of the input.
Main result: there is a large class of combinatorial problems
(now called NP-complete) that includes many natural questions
(graph coloring, traveling salesman, etc) with an amazing
property: either all of them have a poly time algorithm or
none of them do.
Lecture 2
Fri 9/8/17
Review of (mostly discrete) math concepts
Sipser 0.2, 0.4
Sets
everything built from the membership idea, e.g.
1 is a member of {1,2,3}.
subsets, set equality
set operations: intersection, union, difference
Cartesian Product
ordered pairs (can be defined using sets)
A x B
Binary Relations
can visualize using a 2d array or directed graph
Functions
special kind of binary relation
domain, range
informally, a rule for transforming inputs into outputs
Sequences
just functions with domain {1,...,n} for some n
or {1,2,3,...}.
oeis.org = database of integer sequences
Induction and Recursion
Recursive defn for Fibonacci numbers
(Strong) induction principle: To prove something for all n>=0,
we can prove it for n=0, and then show that its truth for
all m < n implies its truth for n.
Lecture 3
Mon 9/11/17
Operations on languages
Concatenation: LM = { xy : x in L, y in M }
Star: L* = { x1 x2 ... xk : each xi in L, k >= 0 }
L^+ is the same thing but with k >= 1.
Examples: L=M={a,aa}. LM = {aa,aaa,aaaa} (note the "collision").
L emptyset = emptyset - understand why this is
L = {a^2, a^5}
L* = {a^j : j=0, j=2, or j >= 4}. - you can prove this by induction
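These operations are easy to experiment with for small finite languages. Here is a minimal Python sketch (my own illustration, not from the lecture; `concat` and `star_up_to` are made-up names):

```python
def concat(L, M):
    """Concatenation LM = { xy : x in L, y in M }."""
    return {x + y for x in L for y in M}

def star_up_to(L, depth):
    """The strings of L* expressible as x1...xk with k <= depth."""
    result, level = {""}, {""}        # k = 0 contributes the empty string
    for _ in range(depth):
        level = concat(level, L)
        result |= level
    return result

L = M = {"a", "aa"}
print(sorted(concat(L, M)))   # ['aa', 'aaa', 'aaaa'] -- note the "collision"
print(concat(L, set()))       # set() -- concatenating with the empty set
```

Note that |LM| can be smaller than |L| * |M|, exactly because of such collisions.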
Lecture 4
Wed 9/13/17
Homework 1 handed out. Due Friday 9/22.
Regular Expressions
Sipser 1.3, 4.2 subsection "The Diagonalization Method"
Background
McCulloch and Pitts 1943 - nets of "neurons" (now called
threshold gates)
Kleene 1951 - characterized the patterns recognizable by
an arbitrary net (could have cycles)
Regular Expressions
Defined recursively (Sipser Defn 1.52)
Atomic R.E.'s: symbols, plus empty string and empty set
Compound R.E.'s: built using union, concatenation, star
Each R.E. is a name for a language.
Precedence rules allow one to avoid parentheses
Examples
A language is regular iff it has a R.E.
Theorem: Not every language is regular.
Proof: Use Cantor's diagonal argument to show that
the number of languages is strictly greater
than the number of R.E.'s (which are themselves
strings in some enlarged alphabet).
Lecture 5
Fri 9/15/17
Review of regular expressions, regular languages (= languages
that can be defined by regular expressions).
Defn: |x|, the length of x, is the number of symbols it has.
So, for example, |0011| = 4.
Pumping lemma for regular languages:
Let L be regular. There is a number c_L (the pumping
length) such that
if w in L, |w| >= c_L, w can be factored as
w = xyz, where |y| >= 1 and x y^k z in L for all k >= 0.
Illustration: L = all strings in {0,1}* with an even number of 1's.
I can use c_L = 2. Suppose w in L, and |w| >= 2. There are
two possibilities: a) w has a 0 in it, or b) w is all 1's.
For a), let w = x0z, so that y=0. For b), let x=z=empty string,
y = w.
The pumping length is not unique. For example, I think that c_L=1
works for this example.
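The two-case factorization strategy above can be checked mechanically. A small sketch (my illustration, not part of the lecture; `in_L` and `factor` are hypothetical helpers):

```python
def in_L(w):
    """L = strings over {0,1} with an even number of 1's."""
    return w.count("1") % 2 == 0

def factor(w):
    """Return (x, y, z) with w == x + y + z and |y| >= 1, per cases a) and b)."""
    if "0" in w:                        # case a): pump a single 0
        i = w.index("0")
        return w[:i], "0", w[i + 1:]
    return "", w, ""                    # case b): w is all 1's; pump all of it

for w in ["0110", "1111", "1010"]:
    if in_L(w) and len(w) >= 2:         # c_L = 2
        x, y, z = factor(w)
        assert all(in_L(x + y * k + z) for k in range(5))
```

Pumping a 0 never changes the number of 1's, and pumping an all-1's word adds 1's in even batches, which is why both cases stay in L.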
This theorem is often used to show that a particular language
is not regular.
Proof: Use induction on regular expressions. Suppose that L
is regular. We can recursively define a pumping length for
each subexpression for the regular expression that defines L.
Base cases: If L = a, take c_L = 2.
If L = epsilon, take c_L = 1.
If L = emptyset, take c_L = 1.
Recursive steps:
If L = M*, take c_L = 1. To see this works,
suppose w in L and |w| >= 1. Then, for some k >= 1,
w = w1 w2 ... wk,
with each wi in M and some |wj| >= 1. So let
x = w1 ... w_{j-1}
y = wj
z = w_{j+1} ... wk.
If L = M u N, take c_L = max{c_M, c_N}.
If w in L with |w| >= c_L, then either
w in M and |w| >= c_M
or
w in N and |w| >= c_N
By induction, there are suitable x,y,z in either case.
If L = MN, let c_L = c_M + c_N. Suppose w in L
and |w| >= c_M + c_N. Then w = uv with u in M, v in N, and either
|u| >= c_M or |v| >= c_N. Suppose the first case
holds. By induction we can write u = xyz
with x y^k z in M for all k >= 0, where |y| >= 1. Then
x y^k zv in L
for all k>=0. The other case is similar and left
to you.
Finite Automata
Start with an example: Switches at the top and bottom
of a staircase, which control the light above it:
Two states: conducting (C) and not conducting (N)
We can flip the switch at the bottom or the one at the top.
Call these inputs B and T.
Ways to describe system's behavior:
Transition table
State diagram
Suppose the initial state is N. The set L of all input
sequences that leave the state in N is a regular
language.
L = all strings in {B,T}* of even length
= ( (B u T)(B u T) )* .
We will soon prove that a language is definable by a finite
automaton iff it has a regular expression.
Lecture 6
Mon 9/18/17
(Deterministic) finite automata
Reference: Sipser 1.1
Formal defn
Defn: The machine accepts the input x1x2...xn (xi are symbols) iff
there is a path labeled by x1x2...xn going from the initial
state to one of the final states.
If there is no such path, the input is rejected.
Example 1: A machine to decide if the number of 1's in a 01 string
is odd or even.
Key task in such a design problem is to figure out what
must be remembered about a prefix (or initial segment)
of the input word.
Here, it is whether the prefix has odd or even parity.
So we need 2 states.
Sipser Fig. 1.19 has a picture of the resulting machine.
We make q_even the accepting state.
Example 2: Base-10 numbers divisible by 3.
Input alphabet is {0,1,...,9}. Leading 0's are allowed.
The idea behind the design is that x1x2....xn is
divisible by 3 iff x1 + x2 + ... + xn == 0 (mod 3).
This works because 10 == 1 (mod 3).
The machine will have 3 states, indicating the remainder
after dividing the input by 3.
Transition rule: delta(q_i, a) = q_{(i + a) mod 3} for each digit a.
q_0 is the initial state and the only final state.
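A sketch of this machine in code (my illustration; the function name is made up). The state is just the running digit sum mod 3:

```python
def divisible_by_3(digits):
    """Simulate the 3-state DFA: state i means 'digits so far sum to i mod 3'."""
    state = 0                        # q_0 is the initial state
    for d in digits:                 # d is a digit 0..9
        state = (state + d) % 3      # delta(q_i, d) = q_{(i + d) mod 3}
    return state == 0                # accept iff we end in q_0

assert divisible_by_3([1, 2, 3])     # 123 = 3 * 41
assert not divisible_by_3([1, 0, 0])
```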
Both of these machines are deterministic: the behavior is completely
determined by the input.
Rabin and Scott (1959) introduced nondeterminism for finite automata
State transitions are permitted, not obligatory
From a given state, multiple (or even no) transitions are allowed
for the same symbol.
Acceptance is defined the same way. However, it is not obvious
how to verify that there is no path from initial to final states.
Very useful for certain kinds of patterns. Example:
L = {a,b,c,d}* abcd
is recognized by a 5-state NFA (with a self-loop on q0
labeled a,b,c,d):

   --> q0 --a--> q1 --b--> q2 --c--> q3 --d--> ((q4))  (final)
We can think of it as reading some input and making a guess about
where the final abcd begins.
Lecture 7
Wed 9/20/17
Nondeterministic Finite Automata (Sipser 1.2)
Formal defn: Just like DFAs but the transition function is
delta : Q x Sigma_epsilon -> P(Q)
where Sigma_epsilon = Sigma with the empty string added
P(Q) = set of all subsets of Q (power set).
Example: NFA with 3 states to recognize (ab u a)+.
The table for delta is:
           a          b        epsilon
    q0   {q1,q2}    empty      empty
    q1    empty     {q2}       empty
    q2    empty     empty      {q0}
q0 is initial, q2 is the only final state
To decide if this machine accepts we can do breadth-first search.
Ex: tree for the input w = aba (dashed edges are epsilon-moves).

               q0
             /    \
           q1      q2 - - q0        after reading a
           |
           q2 - - q0                "     "     b
                  |
                /   \
              q1     q2 - - q0      "     "     a
Since a final state is reachable we say that the NFA
accepts aba.
To make this more efficient, we can just store the subset
of reachable states for each level. This is basically dynamic
programming.
{q0}
  | a
{q0,q1,q2}
  | b
{q0,q2}
  | a
{q0,q1,q2}
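The level-by-level computation is easy to code directly. A sketch (mine, not the lecture's; it assumes the epsilon-transition delta(q2, epsilon) = {q0}, which is what the tree above uses):

```python
delta = {                       # (state, symbol) -> set of states; "" = epsilon
    ("q0", "a"): {"q1", "q2"},
    ("q1", "b"): {"q2"},
    ("q2", ""): {"q0"},
}

def closure(S):
    """All states reachable from S using 0 or more epsilon-moves."""
    stack, seen = list(S), set(S)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), set()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(w):
    S = closure({"q0"})
    for a in w:
        step = set()
        for q in S:                     # one symbol of lookahead per level
            step |= delta.get((q, a), set())
        S = closure(step)
    return "q2" in S                    # q2 is the only final state

assert accepts("aba") and accepts("a") and accepts("abab")
assert not accepts("b") and not accepts("")
```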
Big Theorem: The following are equivalent
1) L is regular
2) L = L(N) for some NFA N
3) L = L(M) for some DFA M
Note: L(N) is the set of strings that are accepted by N.
We'll prove 1) ==> 2) ==> 3) ==> 1)
Proof that 1)==>2):
We will define, for each RE phi, a nondeterministic finite
automaton N_phi for which L(N_phi) matches phi. Each N_phi
will have a unique final state.
Base cases:
phi is a symbol a:   o --a--> (o)
phi is epsilon:      o --epsilon--> (o)
phi is emptyset:     o        (o)     (no edges at all)
Recursive steps:
phi is (phi1 o phi2): connect N_phi1 and N_phi2 in series
using epsilon edges
phi is (phi1 u phi2): connect N_phi1 and N_phi2 in parallel
(similarly)
phi is (phi1*):
              epsilon
           +-----------+
           v           |
   o ---> [o  N_phi1  o] ---> (o)
   |                           ^
   |         epsilon           |
   +---------------------------+
Note that the bypass loop is outside N_phi1.
What goes wrong if you connect it from N_phi1's
initial state to its final state?
Lecture 8
Fri 9/22/17
Transforming NFAs to DFAs (Sipser 1.2, "Equivalence of NFAs and DFAs");
Transforming DFAs to REs (Hopcroft/Ullman, Thm. 2.4)
Review of DFA, NFA models
From NFAs to DFAs
Let q be a state of the NFA, S a subset of states
Closure E(q) = all states reachable from q using 0 or more
epsilon-transitions
E(S) = union on q in S of E(q)
= all states reachable from some s in S using
0 or more epsilon-transitions
We can compute E(q) by breadth-first search out of q - just
ignore all the transitions except those labeled epsilon
Using the 3state example from last lecture,
E({q2}) = {q0,q2}
E({q1,q2}) = {q0,q1,q2}
To make a DFA M accepting the same language as N, we let
the states of M be the *subsets* of N's state set Q.
(There are 2^|Q| of these.)
The transitions are as given in the pf of Thm 1.39 in the
text. Basically the rule is:
From the state set S, take all singleedge transitions
labeled by the symbol a. This gives you a new set T.
Then form the closure E(T).
Usually when you do this you only need to enumerate the
closed reachable subsets.
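A sketch of the whole construction (my illustration; `nfa_to_dfa` is a made-up name), which explores only the reachable closed subsets as suggested:

```python
from itertools import chain

def nfa_to_dfa(delta, start, finals, alphabet):
    def closure(S):
        """E(S): everything reachable from S by epsilon-transitions."""
        stack, seen = list(S), set(S)
        while stack:
            q = stack.pop()
            for r in delta.get((q, ""), set()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return frozenset(seen)

    start_set = closure({start})
    states, edges, todo = {start_set}, {}, [start_set]
    while todo:                          # explore only reachable closed subsets
        S = todo.pop()
        for a in alphabet:
            # single-edge transitions on a, then take the closure
            T = closure(set(chain.from_iterable(
                    delta.get((q, a), set()) for q in S)))
            edges[(S, a)] = T
            if T not in states:
                states.add(T)
                todo.append(T)
    return states, edges, start_set, {S for S in states if S & finals}

# the 3-state NFA from Lecture 7 ("" = epsilon)
delta = {("q0", "a"): {"q1", "q2"}, ("q1", "b"): {"q2"}, ("q2", ""): {"q0"}}
states, edges, q0, F = nfa_to_dfa(delta, "q0", {"q2"}, "ab")
print(len(states), len(F))   # 4 reachable subsets, 2 of them final
```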
From DFAs to REs
Use dynamic programming (this is in Hopcroft & Ullman, pp. 33-34).
Let the states of M be {1,...,n}. The key idea is this:
Consider paths from i to j whose intermediate states
(that is, neither the beginning nor the end) are all <= k. Such a
path will go through k a finite number of times, and between
those visits to k, will use intermediate states <= k-1.
Call the corresponding set of strings R_{i,j}^(k).
We'll show by induction on k that each such set is regular.
This suffices since L(M) is a finite union of such sets, where
i is the initial state, and j is one of the final states.
Basis:
R_{i,j}^(0) = the union of all {a} with delta(i,a) = j,
plus (additionally) {epsilon} (if i=j).
Induction step: go from k-1 to k
R_{i,j}^(k) = R_{i,j}^(k-1)
union
R_{i,k}^(k-1) o (R_{k,k}^(k-1))* o R_{k,j}^(k-1).
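The recurrence translates directly into code. A sketch (my illustration; it builds unsimplified RE strings, so the output is verbose but follows the formula exactly):

```python
def dfa_to_re(n, delta, start, finals):
    """States are 1..n; delta maps (state, symbol) -> state."""
    # Basis: R[i][j] holds R_{i,j}^(0) as a string
    R = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            syms = [a for (p, a), q in delta.items() if p == i and q == j]
            if i == j:
                syms.append("eps")           # epsilon for the empty path
            R[i][j] = "(" + " u ".join(syms) + ")" if syms else "emptyset"
    # Induction: allow intermediate state k as well
    for k in range(1, n + 1):
        Rnew = [[None] * (n + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                Rnew[i][j] = ("(" + R[i][j] + " u " + R[i][k]
                              + "(" + R[k][k] + ")*" + R[k][j] + ")")
        R = Rnew
    return " u ".join(R[start][j] for j in finals)

# Two-state DFA over {a}: 1 -a-> 2, 2 -a-> 1; accepts odd numbers of a's
re_str = dfa_to_re(2, {(1, "a"): 2, (2, "a"): 1}, 1, [2])
```

Without simplification the strings grow quickly; the state-elimination method of Lecture 9 usually gives shorter answers by hand.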
Lecture 9
Mon 9/25/17
Review of regular language transformations
To define a regular language, we can use:
A regular expression
A nondeterministic finite automaton
A deterministic finite automaton
This tells us that the class of regular languages is a natural one,
and insensitive to tweaks in the definition.
The NFA for a regular expression is built up recursively
Example we did: (a u b)* o a
Try Sipser Example 1.56, and Example 1.58. Note that his
recursive construction for the NFA is different from
ours in that his machines can have multiple final states.
To replace a nondeterministic finite automaton by a deterministic
one, use subsets.
Example: The 3-state automaton from Lecture 7. This accepts
(ab u a)+ = (ab u a)* (ab u a)
Starting with the closure of the initial state (here
{q0}), we enumerate reachable closed subsets. Remember that
E(S) is the closure of S - all the states you can get to
by starting in S and following 0 or more epsilon-transitions.
Q0 = {q0}. Explore out of this.
delta(Q0, a) = closure of {q0,q2} = {q0, q1, q2} = Q1.
delta(Q0, b) = emptyset, call this Q2.
Now we explore out of Q1
delta(Q1, a) contains delta(Q0,a) = {q0, q1, q2}
So delta(Q1, a) is Q1.
delta(Q1,b) = closure of {q2} = {q0, q2} = Q3
Explore out of Q2
delta(Q2,a) = delta(Q2,b) = emptyset = Q2
Explore out of Q3
delta(Q3, a) contains delta(Q0,a) = {q0, q1, q2}
So delta(Q3, a) is Q1.
delta(Q3, b) = emptyset = Q2.
This gives us a DFA with 4 states and 8 transition edges.
The final states are subsets that contain *some* final
state of the original machine. These are Q1 and Q3.
[Draw the picture of the DFA]
The number of states in the DFA is <= 2^[# of states in the NFA].
This exponential blowup can't be avoided (examples later)
Alternative method for replacing a DFA by a regular expression.
(Called State Elimination by Sipser, pp. 69 ff. See the proof
of Lemma 1.60.)
Worked example: Start with the 4 state example done above.
Remove dead state Q2, then add new (unique) final state by
adding epsilon-edges from Q1 and Q3. Then successively
remove Q3 and Q1.
At each stage you get an automaton-like diagram whose edges
are regular expressions. Stop when you have
        R
--> o ----> (o)
This will terminate since the number of states decreases
with each step.
Note: The DFA for a RE can be exponentially large.
Lecture 10
Wed 9/27/17
Today: Regularity preserving operations (Hopcroft/Ullman 3.2, Ex. 3.4g,
Sipser Thms 1.45-1.49)
An improved pumping lemma (Sipser 1.4)
There are many theorems that say "If L is regular and we do something
to it to make a new language M, then M is also regular." Despite what
you might think, these are useful for studying nonregular languages.
1) If L is regular so is its complement {w in Sigma* : w not in L}
Proof idea: Use a DFA for L and make a state final (resp. nonfinal)
if it is nonfinal (resp. final).
Exercise for you: Could I have used an NFA?
2) If L1, L2 are regular, so is their intersection.
Pf: Use De Morgan's law (the prime denotes complement)
L1 n L2 = (L1' u L2')'
This can also be proved using DFA's: run the two machines
in parallel, and accept iff both of them accept.
3) The class of regular languages is closed under arbitrary Boolean
operations.
E.g. if L1, L2 are regular, so is
L1 (+) L2 = {w : w belongs to exactly one of L1, L2}
= (L1 n L2') u (L1' n L2).
4) If L is regular so is L^R = {w : the reverse of w is in L}.
Proof idea: Use an NFA for L, which (without loss of generality)
has exactly one final state. Reverse all the transitions,
and make the initial state final (and vice versa).
This can also be proved using regular expressions. It is a good
exercise to do this (using recursion of course).
An improved pumping lemma:
Let L be regular. Then for some p > 0, any w in L of length >= p
can be expressed as w = xyz, such that:
for all k >= 0, x y^k z in L
|y| >= 1
|xy| <= p   <- this condition is new.
Rather than memorize this, you should just remember the proof.
Let M be a DFA for L. Suppose M has p states. Let M accept
w, where |w| >= p. After M reads the first p symbols of w,
it will have visited p+1 states. Two of these must be the
same (by the pigeonhole principle). From this you can
get the factorization:
x = the part read up to the repeated state
y = the part read during the "cycle"
z = the rest of it
Proving Languages nonRegular
Let PL(L) be the above statement. We know
L regular ==> PL(L)
so
notPL(L) ==> L is not regular.
The statement of the pumping lemma is complicated! It has
four quantifiers:
there exists p > 0 such that
for all w in L with |w| >= p
there exists a factorization w = xyz such that
for all k >= 0, x y^k z in L
To negate it we "flip" each quantifier to get
for all p > 0
there exists w in L with |w| >= p such that
for all factorizations w = xyz
there exists k >= 0 with x y^k z not in L
Observe that we also negated the innermost statement
so that it now asserts something is not in L.
This mechanical process for negating statements with quantifiers
is very useful and should be mastered. To practice, try it
on the following: How do I say that the function f is not
continuous at x?
Lecture 11
Fri 9/29/17
Today: Finish pumping lemma
Some nonregular languages
Myhill-Nerode (Hopcroft/Ullman pp. 65-70)
Review of the "textbook" pumping lemma and its negation
A common strategy for showing that L is not regular
1. Apply some regularity preserving operation to get a simpler
language L', that would be regular if L was.
2. Use the pumping lemma to show that L' isn't regular.
Example: L = strings of a's and b's with equal numbers of each
L' = L n a*b* = { a^n b^n : n >= 0 }.
We now argue that notPL is true of L'. The structure of
the argument exactly mirrors the structure of the statement.
stmt            what we write
forall    -->   Choose any p > 0.
exists    -->   Let w be a^p b^p.
forall    -->   Choose any factorization w = xyz with
                |y| >= 1 and |xy| <= p.
exists    -->   Let k = 3.
                Then x y^k z has more a's than b's.
You can think of this as providing a winning strategy
against an opponent who thinks that L' is regular.
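The winning strategy can be played out exhaustively by machine for L' = { a^n b^n }. A sketch (my illustration; `refute` is a made-up name):

```python
def in_Lprime(s):
    """L' = { a^n b^n : n >= 0 }."""
    n = len(s) // 2
    return s == "a" * n + "b" * n

def refute(p):
    """For pumping length p, show every legal factorization of a^p b^p fails."""
    w = "a" * p + "b" * p
    for i in range(p + 1):               # i = |x|
        for j in range(i + 1, p + 1):    # j = |xy|, so |y| >= 1 and |xy| <= p
            x, y, z = w[:i], w[i:j], w[j:]
            if in_Lprime(x + y * 3 + z):
                return False             # some factorization survives pumping
    return True                          # every factorization fails

assert all(refute(p) for p in range(1, 8))
```

The inner check always fails because |xy| <= p forces y to be all a's, so pumping changes the number of a's but not the number of b's.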
Another example: L = Even length palindromes = { w w^R : w in Sigma* }
This is regular if |Sigma| = 1
Not regular if |Sigma| > 1
You should write out the proofs of these.
Before showing L isn't regular, intersect it
with a*bba* first (this gives { a^n b b a^n : n >= 0 }).
Problems with the pumping lemma: not definitive (there are nonregular
languages that satisfy it) and can require tedious case analysis
(trying all factorizations xyz).
Myhill-Nerode Theorem: A reliable test for regularity.
The underlying idea.
Let L be some language. We want to decide membership in L,
by reading from left to right. Suppose w = xz and I have
read x. What must I "remember" about x?
Hard to think about abstractly (what does it mean to "remember"
something?) but I can simplify the question to:
Suppose x and y are two strings.
Under what conditions must my "memories" of x and y
be different?
Answer: If there is a distinguisher, that is, a string
z for which
xz belongs to L
yz doesn't belong to L
(or the other way around).
This is the key idea behind Myhill-Nerode. It is already enough
to show some languages are not regular.
Example: L = { a^n b^n : n >= 0 }
If m != n, then x=a^m and y=a^n have the distinguisher
z = b^m.
So any lefttoright algorithm to recognize L
needs access to unbounded memory. In other words,
no DFA can recognize L.
Lecture 12
Mon 10/2/17
Today: Myhill-Nerode theorem
Minimal finite automata
(Sipser Exs. 1.51 and 1.52, Hopcroft/Ullman pp. 65-71)
Let L ⊆ Sigma* be a language, and x and y strings in Sigma*
Defn: A string z is a distinguisher for x,y if exactly one of
xz, yz is in L.
We will write x ~L y when x and y have no distinguisher.
Lemma (Ex. 1.51 in Sipser): ~L is an equivalence relation.
Proof: We must verify that ~L is
Reflexive: x ~L x (it is impossible for xz to be in and not in L)
Symmetric: x ~L y ==> y ~L x (the defn of ~L is symmetric)
Transitive: x ~L y and y ~L z ==> x ~L z.
Let x ~L y and y ~L z. It is required to prove that
forall w, xw and zw are either both in or both out of L
Choose any w.
If xw in L, then yw in L, so zw in L.
If xw not in L, then yw not in L, so zw not in L.
In either case, both xw and zw are in L, or neither is.
So x ~L z.
Recall that any equivalence relation on S partitions S into disjoint
equivalence classes. x's equivalence class consists of all the
elements of S that are equivalent to x.
[Draw the picture.]
The number of equivalence classes is called the *index*.
Example: Let S = Z (the integers). Call two numbers equivalent
if their difference is divisible by n. All numbers that yield the
same remainder after division by n form an equivalence class.
Theorem (Myhill-Nerode): L is regular iff the index of ~L is finite.
Proof (==>) [This is Problem 1.52a of our text]
Let L (regular) be recognized by the DFA M.
Say that x ~~M y if M sends x, y to the same state.
This is finer than ~L; that is, if x ~~M y then x ~L y. (Why:
any distinguisher z for x and y would show that M didn't work.)
So index( ~L ) <= index( ~~M ) <= # of states of M < oo.
Proof (<==) [This is Problem 1.52b]
We will make a DFA M for L, which uses the alphabet Sigma.
The states of M are the ~L-equivalence classes.
Let [w] denote the equivalence class of w (under ~L). Observe that
either all the elements of [w] are in L, or none of them are.
Initial state: q0 = [epsilon]
F = { [w] : w is in L }
delta([x],a) = [xa].
We need to check that the next state doesn't depend
on which representative we pick for [x]. Suppose that
x ~L x'. Then for all z, xz and x'z are both in L or both out.
In particular, for all w, xaw and x'aw are both in L or both
out of L. So xa ~L x'a, i.e. [xa] = [x'a].
To finish the proof, show (by induction on n) that
delta([epsilon], a1 a2...an ) = [a1 a2...an]
Then, x in L iff delta(q0,x) in F, i.e. M accepts x.
The proof showed that the DFA made from the ~L-equivalence classes
uses the fewest states possible.
This suggests a procedure for finding a DFA with minimal state count.
The idea is to start with a lower bound on the number of states
needed, and make it larger, until it equals the index of ~L.
Start with any DFA M for L. Assume that M has both final and nonfinal
states. (If not, one state suffices.) Also assume that every state
of M is reachable.
Begin with the partition Q = F + F'.
While there are two states q,r in the same part P, and a letter a
with delta(q,a) and delta(r,a) in different parts,
Use a to split P, according to the values of delta(q,a).
It's easy to see this terminates, since each pass through the loop
increases the index of the partition.
Correctness is more subtle. There is a proof in Hopcroft & Ullman
(see Thm 3.11, p. 71).
Lecture 13
Wed 10/4/17
Midterm exam will be Wed 10/25 7:15 PM, room TBA.
Homework #2 will be due Fri 10/13.
Example of DFA minimization procedure
Automata with output (HU Sec. 2.7 - on course web site)
Review of minimization procedure for DFAs
Example: Modified mod 2 counter, with 4 states A, B, C, D.
Initial state A; final states A and C.
Transition table (0 loops at A, C, and D; 1 cycles A -> B -> C -> D -> A,
with B also going to C on 0):
              0    1
      (A)     A    B
       B      C    C
      (C)     C    D
       D      D    A
Initial partition:
{A, C} | {B, D}
delta(B,0) is final and delta(D,0) is not, so we split
B from D:
{A, C} | {B} | {D}
1 splits A from C, since delta(A,1) = B and
delta(C,1) = D are in separate parts. We get:
{A} | {C} | {B} | {D}
This tells us that the machine cannot be reduced.
Exercise for you: Which strings does it accept?
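A sketch of the splitting loop in code (my illustration; the transition table below is my reading of the diagram above, so treat it as an assumption):

```python
def minimize_partition(states, alphabet, delta, finals):
    """Refine {F, F'} until no letter splits any part; return the parts."""
    parts = [set(finals), set(states) - set(finals)]
    parts = [P for P in parts if P]
    changed = True
    while changed:
        changed = False
        for P in parts:
            for a in alphabet:
                # index of the part that delta(q, a) lands in
                def part_of(q):
                    return next(i for i, X in enumerate(parts)
                                if delta[(q, a)] in X)
                groups = {}
                for q in P:
                    groups.setdefault(part_of(q), set()).add(q)
                if len(groups) > 1:            # the letter a splits P
                    parts.remove(P)
                    parts.extend(groups.values())
                    changed = True
                    break
            if changed:
                break
    return parts

# assumed transition table for the example; final states A and C
delta = {("A", "0"): "A", ("A", "1"): "B", ("B", "0"): "C", ("B", "1"): "C",
         ("C", "0"): "C", ("C", "1"): "D", ("D", "0"): "D", ("D", "1"): "A"}
parts = minimize_partition("ABCD", "01", delta, {"A", "C"})
print(sorted(len(P) for P in parts))   # [1, 1, 1, 1]: nothing merges
```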
Finite Automata as Computers
Remember our course is presenting three "models of computation":
finite-state, unlimited (e.g. Turing machines), polynomial-time.
Up to this point, our finitestate devices don't do very much.
They only give one bit of output (whether the string is in
a language or not). Theoretically, all function computation
can be reduced to such yes/no decisions ("is the 4th bit of
f(x) a 0 or a 1?"), but this is inelegant and seems to violate
the spirit of finite-state computation.
We'll discuss two models for a "finite-state computer":
Moore Machines (after E.F. Moore, of Bell Labs and then UW-Madison)
Q, Sigma, delta, q0 same as a standard DFA (no final state set)
Delta = output alphabet
lambda : Q -> Delta (so outputs are associated with states)
What it does: if the input is a1 a2... an (ai in Sigma)
and the states visited are q0 q1 ... qn, the output is
lambda(q0) lambda(q1) ... lambda(qn).
If f_M is the function computed by the Moore machine M,
then f_M is length-increasing:
|f_M(x)| = |x| + 1
Example (see Hopcroft/Ullman p. 43): Moore machine to
find the remainder mod 3 for a binary number. Based on
the idea that, when x is a binary number,
x0 = 2x
x1 = 2x + 1
Systematic examination of the output reveals that
the last symbol in f(x) is x mod 3.
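A sketch of this Moore machine (my illustration; `moore_mod3` is a made-up name). The output letter is just the current state:

```python
def moore_mod3(bits):
    """Moore machine: states 0,1,2 hold (binary prefix read so far) mod 3."""
    state = 0
    out = [state]                        # lambda(q0) is emitted before reading
    for b in bits:                       # reading bit b maps x to 2x + b
        state = (2 * state + b) % 3
        out.append(state)
    return out

print(moore_mod3([1, 0, 1]))   # [0, 1, 2, 2]; last symbol = 5 mod 3 = 2
```

Note the output has length |x| + 1, as the length-increasing property requires.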
Mealy Machines (named for G. H. Mealy, of Bell Labs):
Output is associated with state transitions:
lambda : Q x Sigma -> Delta
What gets computed is
lambda(q0,a1) lambda(q1,a2) ... lambda(q_{n-1},an),
so this is length preserving:
|f_M(x)| = |x|.
We can think of a Mealy machine as a "substitution with
memory."
Example: WWII Enigma encryption machine
Equivalence: If M is a Moore machine and N is a Mealy machine,
we say that M is equivalent to N if (for some b and all x in
Sigma*)
f_M(x) = b f_N(x).
Theorem: For every Moore machine, there is an equivalent Mealy
machine, and vice versa.
Lecture 14
Fri 10/6/17
Discussion of problems in HW 1
Regular expressions in the real world
Reference: Section 4.1 (the grep family) of Kernighan and Pike
The Unix Programming Environment, Prentice-Hall 1984. Now on
the 520 web site.
Some history
1960s: Text editors (led to today's vim, notepad, etc.)
Necessary to find and print lines containing given words
More generally, lines containing a *pattern*
~1968: Ken Thompson's qed interprets "pattern" as regular expression
Standalone pattern matcher (grep) still in common use.
E.g.
grep pattern file
finds all lines with a string matching the pattern
grep -v pattern file
finds all lines not containing the pattern
Thompson's approach: run the text against an NFA derived from
the RE. See his paper in CACM, 11:6, June 1968.
Eventually regular expressions were put into programming languages
like Java, Python, Perl, ...
grep's RE syntax (now standard):
Most symbols such as a,b,... stand for themselves
Concatenate to make strings, e.g. xyz
Metacharacters (used to specify structure of the RE)
| (pipe) means OR (or union)
( ) (parens) for grouping
^ forces match at beginning of line
$ forces match at end of line
. (wild card) matches any single character
[ ] (set brackets), e.g. [A-Z] matches any capital letter
These can be combined, e.g.
color|colour is equivalent to col(o|ou)r
Repetition symbols
R* Kleene star (as usual)
R+ means 1 or more copies of R
R? means 0 or 1 copies of R
R{n} means n repetitions of R
Extended REs
Many implementations allow back references, e.g. in
(.+)\1
the RE in parens matches any nonempty string, and the \1
refers to another copy of what was used in the match
These can define nonregular languages, for example
(a+)(b+)\1\2
matches L = { a^m b^n a^m b^n : m, n >= 1 }. This is not
regular, by the pumping lemma.
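Python's re module also supports back references, so the claim is easy to try (standard library API; the test strings are mine):

```python
import re

# (a+)(b+)\1\2 requires the a-block and b-block to repeat exactly
pattern = re.compile(r"(a+)(b+)\1\2")
assert pattern.fullmatch("aabbbaabbb")        # m=2, n=3
assert not pattern.fullmatch("aabbbabbb")     # second a-block has wrong length
```

So back references buy real expressive power: no DFA or NFA can check this repetition.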
Some examples. Since the shell interprets many symbols
before executing the command, it is wise to put the pattern
in single quotes.
grep 'abc'
matches any line containing abc
grep -v 'abc'
matches any line not containing abc
grep '[A-Z]*$'
matches everything (!) but
grep '[A-Z][A-Z]*$'
matches lines ending with a nonempty block
of capital letters
grep '^\(abc\)\1$'
matches the line consisting exactly of abcabc,
but not the line xabcabc
(My version of grep requires that backreferenced
expressions be "tagged" by using escaped parens.
Using plain parens gives an error message.)
Lecture 15
Mon 10/9/17
New Topic: Computability Theory
Sipser 3.1, 3.3
Rogers 1.1 has a nice intro
Some history
~ 1900: Development of formal logic
Hilbert's challenge: Find a method to decide if a sentence
of first-order logic (can only apply forall, exists to
individuals) is true under any possible interpretation.
E.g. forall x ( R(x) > (R(x) OR S(x)) )
holds no matter what R,S mean.
~ 1930: Godel showed that any sufficiently expressive system
for proving statements about the integers would have
true, but unprovable statements.
~ 1935: Church (US) and Turing (UK) showed that Hilbert's sought-for
method cannot exist.
To show this requires a definition of "algorithm."
In this course we will study Turing's definition. The defn
of Church (and many others) leads to the same class of
computable functions.
What is an algorithm? Informally, it should exhibit the following
properties.
1. It is discrete and finite (e.g. a piece of text written using
a finite alphabet).
2. It can be executed by an agent who just follows precise directions.
No creativity or guesswork is needed.
3. Computation is deterministic (no random choices).
4. There is a way to store and retrieve intermediate results.
5. There is no a priori limit on the time or memory needed for a
computation.
Turing Machine (1936):
This was Turing's idealization of what a computer (then a human
being) executing an algorithm would do.
Infinite tape, divided into cells. Each cell holds one symbol.
Finitestate device that can read/write symbols and move
left or right.
Time is discrete: At each step, the machine will
Consult its internal state and symbol scanned
Write a new symbol
Move left or right one cell
Change its state
(Moves that would force the machine to fall off its
tape are ignored.)
Formal defn similar to the automata we've studied
See Sipser 3.1 for details, in particular Defn. 3.3
Note that each TM embodies one algorithm.
Warning: Details of the model are not standard.
Configuration: snapshot of a computation
We will abbreviate

               [ q ]
                 |
                 v
    A  B  C      D  ...  Z  _  _  _ ...

(the underscore _ stands for a blank)
as
A B C q D ... Z
Note: We assume that Q n Gamma is empty, so that the configuration
is just a string that is uniquely readable. Sipser (p. 169)
ensures readability by separating q, in effect making the
configuration a triple.
Lecture 16
Wed 10/11/17
Review of Turing machine model (Sipser 3.1)
Finite automata as Turing machines (Sipser 4.1)
Turing Machines
Review of Sipser Defn 3.3
TM "instructions" are quintuples: (qi, s, qj, t, Delta) says that
the machine, when in state qi and reading symbol s, will
write symbol t, move by Delta (one square L or R), and then
go into state qj. There is no "program counter"  the state
does the job of one.
If computation is a movie, then a single frame is called
a configuration.
If C,C' are configurations, then C yields C' if the TM could have
gone from C to C' in one step (meaning it executed one quintuple.)
Conventions: accept/reject configs yield nothing
left moves off end of tape are not executed
Defn (important): M {accepts, rejects} w in Sigma* if there is a sequence
of configurations, each yielding the next, such that
q0w = C0, C1, C2, ..., Ct = x q_accept y or x q_reject y
t is the number of steps, or the length, of the computation.
For any TM M and input word w, there are three mutually exclusive
possibilities:
M accepts w
M rejects w
M on input w runs forever.
Defn: L ⊆ Sigma* is
Decidable (aka recursive) if for some TM M,
w in L ==> M accepts w
w not in L ==> M rejects w
Recognizable (aka recursively enumerable) if for some TM M,
w in L <==> M accepts w
(If w is not in L, M can reject or run forever.)
Note: This terminology is by no means standard. Many people say
computably enumerable for recognizable, for example.
Lecture 17
Fri 10/13/17
HW 2 Due
Review of Turing machine model
Church's Thesis (Sipser 3.3)
Turing machine:
Finite state control + infinite tape
Executes step by step, symbolically
Accepts/rejects by final state
L ⊆ Sigma* is
Decidable if for some TM M,
w in L ==> M accepts w
w not in L ==> M rejects w
Recognizable if for some TM M,
w in L <==> M accepts w
(If w is not in L, M can reject or run forever.)
Language classes:
REGULAR ⊂ DECIDABLE ⊂ RECOGNIZABLE ⊂ ARBITRARY
We showed last time that every regular language is decidable.
Every decidable language is recognizable (use the same M).
There are decidable languages that are not regular.
Example: L = { a^n b^n : n >= 0 }
Sketch of a TM design to decide membership in L
Tape alphabet = { a, a', b, b', [] (blank) }.
Think of a' as a marked version of a, similarly for b' and b.
The machine will sweep from left to right, marking the first
a and the first b that it encounters.
It accepts if it reaches a blank without seeing any a's
or b's. (This handles the case n=0 in L.)
It rejects if it marks an a but not a b, or vice versa
Otherwise, it moves back to the left and makes another
sweep.
[Examples.]
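The sweeping idea can be simulated on a Python list instead of a tape. A sketch (my illustration, not a literal TM; the shape check that the real machine does during its sweeps is done up front here for brevity):

```python
def accepts_anbn(w):
    """Decide membership in { a^n b^n : n >= 0 } by repeated marking sweeps."""
    if "ba" in w:                        # an a after a b: wrong shape, reject
        return False
    tape = list(w)
    while True:
        # find the first unmarked a and the first unmarked b
        a = next((i for i, c in enumerate(tape) if c == "a"), None)
        b = next((i for i, c in enumerate(tape) if c == "b"), None)
        if a is None and b is None:
            return True                  # everything matched (includes n = 0)
        if a is None or b is None:
            return False                 # an a without a b, or vice versa
        tape[a], tape[b] = "a'", "b'"    # mark one of each, then sweep again

assert accepts_anbn("aabb") and accepts_anbn("")
assert not accepts_anbn("aab") and not accepts_anbn("ba")
```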
We can extend the idea of TM computation to include functions.
Defn: f : Sigma* -> Sigma* is computable (sometimes called
recursive) if, for some TM M,
w in Sigma* ==> M on input w halts in a final configuration
with just f(w) on its tape.
Thm: Every function computed by a {Moore, Mealy} machine is
computable.
Proof: modify the proof (given last time) that every regular
language is decidable. The Turing machine will exactly mimic
the action of the finitestate machine.
There are computable functions that cannot be computed by a Mealy
or Moore machine.
Example: f(w) = ww.
A TM for f can just copy the input, symbol by symbol,
to the right of w. (It can use a similar "marking"
idea to keep track of what has been written.)
|f(w)| = 2|w|, but any Mealy or Moore machine transforms
w into a string of length <= |w| + 1.
Church's Thesis (after the American logician Alonzo Church):
The formal concept of computation by a Turing machine exactly matches
our informal concept of computation by an algorithm.
Note that this is not a mathematical statement, but rather a statement
about what (most) people believe.
Rogers (~1965) used this to simplify proofs. Rather than explicitly
describe a TM to do some task, just describe algorithms precisely
but informally. (Any reader who isn't satisfied with such an argument
is invited to fill in the details.)
This has now become standard in computability theory.
Observe that our description of the TM to recognize a^n b^n
took advantage of Church's thesis.
Lecture 18
Mon 10/16/17
Church's Thesis and Proofs
The Evidence for Church's Thesis
Review of Church's thesis (see 10/13 lecture)
If we adopt Church's thesis, we can simplify proofs.
Example: For an infinite language L ⊆ Sigma*, the following are
equivalent.
1) L can be enumerated by some methodical procedure. This
means that the procedure makes a list which will eventually
contain every w in L, but none of the words that are not in L.
(Words can repeat.)
2) L is recognizable by some Turing machine M, which means that
M accepts w iff w is in L.
Proof that 1) ==> 2): On input w, M starts the enumeration procedure.
If w ever appears in the list, M stops in
an accepting state.
Proof that 2) ==> 1): This uses an idea called *dovetailing*.
Let w1, w2, w3, ... be an enumeration of Sigma*,
such as the following. List the strings in
groups by length, using dictionary order to
decide the ordering within each length. Thus,
when Sigma = {0,1}, the enumeration is
epsilon, 0, 1, 00, 01, 10, 11 , ...
Suppose that M recognizes L.
For n = 1,2,3,...
Run M on w1,...,wn for n steps each.
Output all wi that were accepted.
The idea behind the procedure is that any
element w of L will appear on the list, and
then on all subsequent ones. Eventually
n will be large enough that M's computation
on w can run to termination.
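The dovetailing procedure can be sketched in Python, with a function accepts_within(w, n) standing in for "M accepts w within n steps" (an assumption replacing an actual TM simulator; it always halts, and is True for some n exactly when M accepts w):

```python
from itertools import count, islice, product

def all_strings(sigma=('0', '1')):
    # Enumerate Sigma* by length, dictionary order within each length:
    # epsilon, 0, 1, 00, 01, 10, 11, ...
    yield ''
    for n in count(1):
        for tup in product(sigma, repeat=n):
            yield ''.join(tup)

def enumerate_language(accepts_within):
    # Dovetailing: for n = 1, 2, 3, ... run the recognizer on
    # w1, ..., wn for n steps each, and output whatever was accepted.
    for n in count(1):
        for w in islice(all_strings(), n):
            if accepts_within(w, n):
                yield w            # repeats are allowed

# Toy stand-in recognizer: accepts even-length strings,
# "finishing" after |w| steps.
accepts = lambda w, n: len(w) % 2 == 0 and n >= len(w)
listed = list(islice(enumerate_language(accepts), 12))
```

Every accepted word shows up once n is large enough for its computation to finish, even though other computations may run forever.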
The Evidence for Church's Thesis
1. All attempts to extend the Turing machine definition to include
extra features have not extended the ultimate power of such
a device.
2. Many alternative formal definitions of computation have been
proposed. All known such alternatives have the same power
as the Turing machine.
Examples for 1:
a. Two-way infinite tape: Simulate with a two-track tape, using
the top track for one direction and the bottom track for
the other.
b. k>1 tapes, say k = 2: We can simulate a two-tape machine with
two tracks on a one-tape machine. The only difficulty is when
the heads are ordered to move in opposite directions. Then,
one instruction by the 2-tape machine requires the one-tape
machine to shift the two tracks relative to each other. This
causes a slowdown which is of no consequence in computability
theory.
c. Random access: We can put the addresses on one track of a single
tape, and the data word (unconstrained as to length) for an
address starting right below it.
d. Two-dimensional tape, say with addresses
            14
             9 13
             5  8 12 ...
             2  4  7 11
i >= 0       0  1  3  6 10
        j =  0  1  2  3  4
The address for (0,j) is sum_{k=0}^j k = (j+1 choose 2).
So the address for (i,j) is (i+j+1 choose 2) + i.
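The diagonal addresses in the table can be checked with a short computation (a sketch; with i as the row index and j as the column index as drawn, the address of cell (i,j) works out to (i+j+1 choose 2) + i):

```python
from math import comb

def address(i, j):
    # Diagonal numbering of the quarter-plane: row i, column j.
    return comb(i + j + 1, 2) + i

# Row i = 0 reads 0 1 3 6 10 and column j = 0 reads 0 2 5 9 14,
# matching the table; a 20 x 20 corner gets all-distinct addresses.
assert [address(0, j) for j in range(5)] == [0, 1, 3, 6, 10]
assert [address(i, 0) for i in range(5)] == [0, 2, 5, 9, 14]
assert len({address(i, j) for i in range(20) for j in range(20)}) == 400
```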
Examples for 2 include general recursive functions (Kleene),
the lambda-calculus (Church), string rewriting systems (Markov).
Lecture 19
Wed 10/18/17
Bit Encoding (Sipser p. 145 [1st edn], p. 157 [2nd], p. 185 [3rd])
The Halting Problem (Sipser 4.2)
Bit Encoding
All "discrete math" objects can be represented with bit strings.
A course in data structures will discuss this at length. Here
I'll just give a few examples.
N (natural numbers) = {0, 1, 10, 11, 100, ... }
Kurt Godel's representation for finite sequences of natural numbers
x1,x2,...,xn -> 2^{x1+1} 3^{x2+1} ... pn^{xn+1}
(pn is the nth prime).
e.g. 0,1,1,2 is represented as 2^1 * 3^2 * 5^2 * 7^3 = 154350
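A quick sketch of the encoding and its inverse (using a naive trial-division prime generator, which is fine for a demonstration):

```python
def primes():
    # 2, 3, 5, 7, 11, ... by trial division.
    found = []
    n = 2
    while True:
        if all(n % p != 0 for p in found):
            found.append(n)
            yield n
        n += 1

def godel_encode(xs):
    # x1,...,xn  ->  2^(x1+1) * 3^(x2+1) * ... * p_n^(xn+1)
    result = 1
    for x, p in zip(xs, primes()):
        result *= p ** (x + 1)
    return result

def godel_decode(code):
    # Recover the sequence by reading off the exponent of each prime.
    xs = []
    for p in primes():
        if code == 1:
            return xs
        e = 0
        while code % p == 0:
            code //= p
            e += 1
        xs.append(e - 1)
```

For example, godel_encode([0, 1, 1, 2]) gives 154350, and decoding recovers the sequence.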
Using sequences, we can do a lot more. For example, the graph G
given by
v1  v2  v3
has the adjacency matrix
0 1 1
1 0 0
1 0 0
which can be "flattened" into 0,1,1,1,0,0,1,0,0.
In particular, note that any Turing machine can be represented
using a sequence of 5m numbers, for some m. (The factor 5 is
because we used quintuples to describe our machines.)
Let us pick some bit encoding once and for all.
Defn: is the bit string that represents the object M.
represents the sequence M1,...,Mk.
The Halting Problem
Given a TM M and an input string w, does M halt on w?
Let K = { <M,w> : M halts on w }.
Thm 1: K is recognizable by a Turing machine.
Proof: Here is a TM for the job.
On input <M,w> (just reject if the format is incorrect)
Simulate the action of M on the word w.
If this computation stops, accept the input.
Thm 2: K is not decidable.
Proof: Use diagonalization. (We used this on 9/12 to
prove that there are nonregular languages.)
Suppose we had a Turing machine H to decide
membership in K. Use that to make D:
On input <M> (M is a TM)
Run H on <M,<M>>.
If M on <M> would halt, loop forever.
If M on <M> would loop, halt.
Then we know that (for any M)
D(<M>) will {loop, halt} iff M {halts, loops} on <M>.
Taking M=D:
D(<D>) will {loop, halt} iff D {halts, loops} on <D>.
This is a contradiction, so there is no such H.
Via the same kind of argument, Sipser shows that
A_{TM} = { <M,w> : M accepts w } is not decidable.
In contrast, A_{DFA} = { <M,w> : the DFA M accepts w }
is.
Thm 3: K' = { <M,w> : M doesn't halt on w } is not
(Turing-)recognizable.
(You should think of K' as the complement of K, although
it isn't exactly that, because {0,1}* - K contains all
kinds of indecipherable junk.)
Proof: Suppose it were. Then (by a result proved 10/16)
there would be enumerators for K and for K'. Suppose
their listings are
K = { w1, w2, w3, ... }
K' = { w1', w2', w3', ... }
To decide if w is in K we could run the following procedure:
For n = 1,2,3,...
Obtain wn and wn'.
If w = wn, accept w and halt.
If w = wn', reject w and halt.
(This works because w will be in exactly one of the lists
we see.)
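The procedure can be sketched with Python generators standing in for the two enumerators (the toy languages below are stand-ins; any pair of complementary enumerable lists works the same way):

```python
from itertools import count

def decide_from_enumerators(w, enum_lang, enum_comp):
    # w appears in exactly one of the two lists, so alternating
    # between them must terminate with the right answer.
    g1, g2 = enum_lang(), enum_comp()
    while True:
        if next(g1) == w:
            return True            # accept
        if next(g2) == w:
            return False           # reject

# Toy pair: K = binary strings containing a 1; K' = strings of 0's only.
def enum_K():
    for n in count(1):
        for i in range(2 ** n):
            s = format(i, '0%db' % n)
            if '1' in s:
                yield s

def enum_K_comp():
    for n in count(0):
        yield '0' * n
```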
Using the same argument we can prove a more general result:
A language L ⊆ Sigma* is decidable iff L and its complement
are recognizable. This is Thm 4.22 in Sipser, and Thm V.II
in Rogers.
Some language classes:
REGULAR ⊆ DECIDABLE ⊆ RECOGNIZABLE ⊆ ARBITRARY
We now know that all containments are proper.
Lecture 20
Fri 10/20/17
HW2 to be returned next Monday (please come early)
Exam: Wed 10/25, 7:15 PM, 105 Psychology
Bring one 8.5 x 11 sheet of notes (both sides)
No calculators, cell phones, or other electronics.
Today's topics: Universal Turing Machines (in Sipser 4.2)
Reductions (Sipser 5.3)
Universal Turing Machines
Last time: Unsolvability of the halting problem for TMs
Key ingredient: Ability of one TM to simulate any other one.
We can summarize this with a program U:
On input <M,w>
Set up the initial configuration for M w/ input w
While M is not in a final state (q_accept, q_reject):
Run M for one step to get the next configuration
If M is in its accept state, accept w.
If M is in its reject state, reject w.
U can be realized as a Turing machine. We call any
such TM a *universal* Turing machine
Analog in modern software: an interpreter for P-programs (P is
Java, C++, Python, ...) that is itself written in P. This is
clearly possible.
Reductions
Review of what it means for a TM to compute f : Sigma* -> Sigma*
(see lecture 10/13/17).
Defn: Let A,B ⊆ Sigma*. (That is, A and B are languages over the
finite alphabet Sigma.) A is reducible to B if for some computable
function f : Sigma* -> Sigma*,
w in A ==> f(w) in B
w not in A ==> f(w) not in B
[Draw "fried eggs" picture here.]
Another way to state the condition is: f(A) ⊆ B, f(A') ⊆ B'.
This is often called mapping reducibility or many-one reducibility.
Notation: A <=_m B means that A is reducible to B.
Reductions have been used in mathematics for a long time:
Example: A = { <a,b,c> : a != 0, ax^2 + bx + c = 0 has a real soln }
B = { <d> : d >= 0 }.
(The variables a,b,c,d denote integers here.)
The discriminant f(a,b,c) = b^2 - 4ac reduces A to B.
So A <=_m B.
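The reduction is easy to run: deciding A becomes one arithmetic computation followed by the (trivial) decision procedure for B:

```python
def f(a, b, c):
    # The reduction: map an instance <a,b,c> of A to an instance <d> of B.
    return b * b - 4 * a * c

def in_B(d):
    # B = { <d> : d >= 0 } is trivially decidable.
    return d >= 0

def in_A(a, b, c):
    # Decide A by composing the reduction with the decider for B.
    assert a != 0
    return in_B(f(a, b, c))
```

For instance, x^2 - 3x + 2 = 0 has real roots (x = 1, 2), while x^2 + 1 = 0 does not.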
The notation A <=_m B suggests that A is no harder to decide than B,
in a certain sense.
Thm 1: If A <=_m B and B is decidable, so is A.
Proof: To decide if w is in A, compute v = f(w). Feed v into
a decision procedure for B. Report whatever this
procedure says.
Cor 1: If A <=_m B and A is not decidable, neither is B.
Proof: Use proof by contradiction. If B were decidable, so would
A be, by Thm 1.
Thm 2, Cor 2: Same as Thm 1, Cor 1, but for recognizable languages.
Proofs: Very similar, left to you.
Lecture 21
Mon 10/23/17
HW 2 returned
Discussion of problems from HW 2
Lecture 22
Wed 10/25/17
Two-hour midterm exam: 7:15 PM, 105 Psychology
1 sheet (8.5 x 11, front and back) of notes allowed.
Discussion of problems from 2011 CS 520 exam
Lecture 23
Fri 10/27/17
Total and partial functions
References: Hopcroft & Ullman 7.3, 7.7
Rogers 5.1, 5.2
Review of Turing machines as devices to compute functions.
Output convention: when the computation stops, there is one
string containing no blanks (the output) on the tape.
Motivation: By design, each Turing machine defines a language (the
set of strings that it accepts). We'd like to extend this idea
to include functions. Unfortunately, we can't tell a priori whether
any given computation (given as a Turing machine M and an input word w
for M) will stop or not. Partial functions give us a way out of this
problem.
Defn: For sets A and B, a partial function from A to B is a function
f : D -> B, where D is a subset of A. D is called the *domain* of f.
If you like the "ordered pair" definition, a partial function is just
a subset of A x B, such that there is at most one b for each a in A.
This subset could be empty.
Example: sqrt(x) for real x. The domain is the set of nonnegative real
numbers. Try to find more examples on your own.
Defn: f is a partial computable function (aka partial recursive function)
if there is a D ⊆ Sigma* and a Turing machine M such that:
w in D ==> On input w, M halts with f(w) (a string in Sigma*)
and nothing else on its tape.
w not in D ==> On input w, M either loops or halts without leaving
an output.
f is a total computable function if its domain is Sigma*.
Every Turing machine defines a partial computable function f_M.
<M> is called an *index* or *program* for f_M.
f_M could well be empty: just have M loop forever.
Can we design a programming language that gives exactly the total
computable functions? No!
Theorem: { <M> : f_M is total } is not recognizable.
Proof: Diagonalize. If this were recognizable, there would
be a procedure to enumerate it. Suppose I did have a list of
the programs for the total functions:
M_1, M_2, ...
I can enumerate the elements of Sigma* :
w_1, w_2, ...
Let a be a symbol from Sigma. Define a new total function f by
f(w_i) = f_{M_i}(w_i)a ( != f_{M_i}(w_i), by design )
Since this differs on at least one input from every function in
my list, it cannot be on the list.
Two useful theorems:
1) A ⊆ Sigma* is recognizable iff it is the domain of some partial
computable function.
2) A ⊆ Sigma* is recognizable iff it is the range of some partial
computable function.
This is often strengthened as follows: a nonempty A is recognizable iff
it is the range of some total recursive function f. (You can think
of f as an enumerator for A.)
Here is a proof of 1). The proof for 2) is similar.
(=>) Suppose A is recognizable by the TM M. Create a new TM M' that,
on input w:
Runs M on w.
If this halts in the accept state, erases the
scratch work, prints some symbol, and halts.
If this halts in the reject state, loops.
Then A is the domain for f_{M'}.
(<=) Let A be the domain of f_M. Create a new TM M' that, on input w:
Runs M on w.
If this halts with an output (=f_M(w)), accepts w.
If it halts without an output, rejects w.
Lecture 24
Mon 10/30/17
Review of reductions
Rice's Theorem: Sipser, Exercise 5.28 (and its solution)
Hopcroft/Ullman pp. 185-189
Reductions
Definitions as in lecture 10/20/17.
Reductions have two uses: 1) to provide a method for deciding
membership in A (if B is decidable), and 2) to allow us to prove
that B is undecidable (if A is undecidable).
In computability theory, the sets A and B are often composed
of programs (i.e. strings <M> where M is a Turing machine), or
computational tasks (encoded pairs <M,w>, where M is a TM and
w is an input word for M).
Example: Let A = {2x2 linear systems w/ rational coeffs,
having a unique solution}
B = {nonzero rational numbers}
The determinant gives a reduction from A to B.
More precisely: Suppose the system is
ax + by = e
cx + dy = f
Then the function which sends (a,b,c,d,e,f) to ad - bc
is a reduction from A to B.
Using Reduction to Prove Undecidability
K = { <M,w> : The computation of M halts on w }. We proved earlier
that this is not decidable.
L = { <M> : M computes the constant function f(w) == 0 }. (We'll assume
that 0 belongs to the input/output alphabet Sigma.)
Let's reduce K to L. This shows that L is undecidable.
The mapping is
<M,w> -> M', the following program:
On input x,
Run M on w.
If (and when) this halts, return 0.
Note that we have mapped computational tasks to programs, and
the pair <M,w> is "hard wired" into the program.
Then
<M,w> in K ==> M halts on w ==> M' in L
<M,w> not in K ==> M doesn't halt on w ==> M' not in L
as needed for a reduction.
Rice's Theorem
Let P be a nontrivial input/output property of programs.
(This means some programs have it and some don't, and that
P is defined solely from the function the program computes.)
Then P is undecidable.
Proof: W/out loss of generality, P doesn't hold for programs
that compute the empty function (otherwise, swap P with
its complement).
Choose f, computed by some program with property P.
Let <M,w> -> the following program, M'':
On input x,
Run M on w.
If this halts, return f(x).
As before, we see that
<M,w> in K ==> M'' has property P
<M,w> not in K ==> M'' does not have property P
Since K is undecidable, we can't decide if an arbitrary program
has property P or not.
The Futility of Software Engineering
Rice's theorem implies that various "semantic" questions about programs
(that is, questions about the input/output behavior of Turing machines)
cannot be answered by algorithms. For example:
Formal                                  Informal
------                                  --------
Does M compute a total function?        Can it ever loop?
Is f_M defined anywhere?                Could it ever return an answer?
For given pairs (xi,yi), i=1..n,        Does it work on the test data?
  does f_M(xi) = yi (all i)?
For a particular f, does f_M = f?       Is it correct (in the sense
                                        of matching a specification)?
Lecture 25
Wed 11/1/17
Topic du jour: Kleene's Recursion Theorem (Sipser 6.1)
Fixed Points
Self-reproducing Programs
Socratic Programming
An Extension of the Program Concept
Let Sigma = {0,1} and w in Sigma*. Let
f_w = the partial function computed by TM M, if w = <M>;
      the empty function, if there is no such M.
That is, we allow *any* string to be thought of as a program.
Strings that are incomprehensible (don't encode a Turing machine)
are just names for the empty function.
The Fixed-Point Theorem:
Theorem: Let F : Sigma* -> Sigma* be a (total) computable function.
Then, for some w, f_w = f_{F(w)}.
Informally, there is no perfect "program shredder".
Examples to see why this is so: F erases every program,
F appends 1 to the output of every program, etc.
Note that w need not be a literal fixed point (F(w) equals w), but
is a kind of semantic fixed point, in that the programs w and F(w)
have identical behavior.
Proof: (From Machtey and Young, An Introduction to the General
Theory of Algorithms, pp. 110111. I have changed their notation
to be compatible with ours.)
There is a total computable function G such that:
f_{G(x)} = f_{f_x(x)}, if the output of f_x on x is defined;
           the empty function, otherwise.
G is a mapping from programs to programs. Indeed, G(x) is:
On input z
Run f_x on x. If it gives the output w,
Run f_w on z, and report the output (if any).
Note that G is total. We can compose computable functions, so
there is a u such that
F o G = f_u
(The expression on the left means: "Do G, then apply F to whatever
output was produced.") Since F,G are total, so is f_u.
Let w = G(u). Then:
f_w = f_{G(u)} = f_{f_u(u)} = f_{F(G(u))} = f_{F(w)}.
Note: This form of the recursion theorem is apparently due to Rogers.
SelfReproducing Programs
Corollary 1: There is a program (TM description) that prints itself
and halts.
Proof: Let F(x) be the program "Output x."
Then, by the fixed point theorem, there is a w such that
        f_w       =      f_{F(w)}
         ^                  ^
         |                  |
   the partial fn     the constant function, equal
   computed by        to w for every input.
   program w
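Corollary 1 is not specific to Turing machines: in any reasonable language one can write a program whose output is exactly its own source text. A classic Python example (one of many possible quines):

```python
# A self-reproducing program: its output is its own two-line source.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Saved to a file containing exactly these two lines (without the comment), the program prints that file's contents verbatim: %r splices in the repr of s, reproducing line 1, and %% becomes the literal % of line 2.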
Socratic Programming
Corollary 2: We can write programs that refer to their own descriptions.
We'll skip the proof. See Theorem 6.3 in Sipser.
That is, there are programs of the following kind:
M: On input x,
Obtain M^ such that f_{M^} = f_M.
Do any other computations, using x and M^ as data.
The term "Socratic" (just made up for this lecture) comes from
the philosopher's dictum: Know Thyself.
As an application, we can give a proof of Rice's theorem, which
stands independently of the one from last lecture.
[Review Rice's Theorem, from lecture 10/30/17.]
Proof of Rice:
Let P be a nontrivial inputoutput property of programs,
and assume we could decide P. Choose a program N that has P
and another program N' that does not.
By Corollary 2, we could then make the following program
M: On input x
Obtain an M^ with f_M = f_{M^}.
Test M^ for property P
If yes, compute N'(x)
If no, compute N(x)
This is a contradiction, since M has property P iff it doesn't.
Lecture 26
Fri 11/3/17
Today: Godel's incompleteness theorem (Sipser 6.2)
See also Nagel and Newman, Godel's Proof, NYU Press, 1958.
For a more detailed view, see Enderton, A Mathematical Introduction
to Logic, Academic Press, 1972.
Recall the Hilbert Program: Find methods to decide if an arbitrary
mathematical statement is true or false. Part of the problem is to
say formally what counts as a mathematical statement.
Godel (1931) showed this is unattainable. We'll show how one of Godel's
key results, the Incompleteness Theorem, follows from the recursion
theorem. (This reverses historical order: the recursion theorem is
from the 1940s.)
A Language for Mathematics
Variables: x,y,z, etc.
Boolean Connectives: ^ (AND), v (OR), ~ (NOT).
Quantifiers: forall, exists
Relation Symbols: R,S,T,... (each having a fixed arity, or
number of arguments)
Function Symbols: f,g,h,... (ditto)
Brackets: (, ), [, ], etc.
Note that these are not "symbols" in the sense of our course,
since the list is infinite. To get around this, we can encode
each logical symbol using 0's and 1's.
A formula is a "wellformed" statement, using the above symbols, e.g.
forall x R(x,y).
This one has y free and x bound (within the scope of a quantifier).
Sentence: formula with no free variables, e.g.
forall x R(x,x)
Models and Truth
A model, or structure, gives "meaning" to sentences and formulas.
Has a universe U, plus a relation for each relation symbol,
and a function for each function symbol.
Example: (N,+,=), where N is the set of natural numbers, + the
addition function, and = the predicate for equality of
individuals.
M |= sigma means that the sentence sigma is true in M.
This can be defined using recursion on formulas (details skipped).
The idea of defining truth this way is due to Tarski.
Th(M), the theory of M, consists of all sentences true
in M.
Example: For (N,+,=)
exists x (x+x=x) is true (use x=0)
forall x (x+x=x) is not (x=1 is a counterexample)
In 1929 Presburger showed that Th(N,+,=) is decidable. (See
Sipser Thm 6.12.)
Proofs
Let us equip our system with a notion of formal proof.
Details skipped, but here are the properties we will need:
1) The set of axioms is decidable (often it is finite).
2) We can decide if a sequence pi is a proof of the statement sigma.
3) In any model satisfying the axioms, provable statements
are true. (This is often called soundness.)
Books on mathematical logic (e.g. Enderton) give the details of this.
Note: Godel showed how sequences can be defined using only + and *.
So our mathematical language is rich enough to include statements
about proofs, Turing machines, computations, ...
Godel's Incompleteness Theorem (1931): For M = (N,+,*,=), some true
statement in Th(M) has no proof.
Proof: by the recursion theorem, there is a program (TM)
M: On input w,
Obtain M^ with identical input-output behavior as M
Search (by examining all sequences) for a proof of
"M^ does not accept 0" <-- Call this stmt psi
If one is found, stop and accept w.
psi is not provable:
If psi has a proof, then M accepts all w, hence it accepts 0
So M^, which behaves identically to M, also accepts 0.
But this violates soundness.
psi is true:
No proof of psi exists. Therefore M won't accept 0, so
M^, which behaves identically, won't accept 0 either.
This result was revolutionary for its time and is disturbing even
for today.
Lecture 27
Mon 11/6/17
Discussion of midterm exam problems
Lecture 28
Wed 11/8/17
Discussion of problems from HW3
New Topic: Computational Complexity
Sipser 7.1 ff.
So far we have studied
Finite State Machines: too weak
Turing Machines: too powerful
Complexity theory comes from the attempt to find a middle ground
between these two extremes.
Earlier attempts along the same lines included subrecursive
hierarchies, and the study of TMs with limited architectures
(such as pushdown automata).
1965: Hartmanis and Stearns (GE Labs) studied Turing machines
with limited running time. This eventually led to the modern
idea of polynomial time computation.
Running Times for Turing Machines
A TM computation is a sequence of configurations (tape +
location/state of the machine):
C_0 |- C_1 |- ... |- C_t
We say that such a computation uses t steps, or takes time t.
This allows us to assign a cost (in this case, the number of steps)
to any TM computation.
Example: Recognizing L = { wcw }. It is assumed that w in {0,1}*,
and c != 0,1.
A onetape TM can do this by repeatedly sweeping left
to right. With each sweep the TM checks off one pair
of symbols that match. For the input 00110c00110,
the tape would be
_
0 0 1 1 0 c 0 0 1 1 0
_ _
0 0 1 1 0 c 0 0 1 1 0
_ _ _
0 0 1 1 0 c 0 0 1 1 0
_ _ _ _
0 0 1 1 0 c 0 0 1 1 0
...
When |w| = n, the number of steps is about 2n^2: Each sweep,
forward and back, takes about 2n steps, and there are n
sweeps.
This could be done more quickly, by using more state
so that B symbols can be remembered at once. For B=2, the
computation then becomes
_ _
0 0 1 1 0 c 0 0 1 1 0
_ _ _ _
0 0 1 1 0 c 0 0 1 1 0
_ _ _ _ _
0 0 1 1 0 c 0 0 1 1 0
_ _ _ _ _ _ _ _
0 0 1 1 0 c 0 0 1 1 0
...
This seems to be faster. To make this precise, observe
that
time per sweep ~~ 2n + B
# of sweeps ~~ n/B
so the total time is about
(2n+B)*n/B = 2 n^2 / B + n
Consideration of examples like this, for which there
seems to be no "best" TM, suggests that we should
base our theory of complexity on growth rates rather
than absolute step counts.
BigO Notation
This is a common way to express resource bounds in contexts
where we have chosen to ignore constant factors.
Defn: f is O(g) if: There is a C>0 making f(n) <= C g(n)
for all sufficiently large n.
This is a one-sided version of the limit concept from
calculus.
This is generally used when f and g are >= 0,
at least when n is large.
Examples: 5n^3 - 15 = O(n^3) but is not O(n^2).
n log n = O(n^2)
n! is not O(n^k), for any k. (By Stirling's formula)
There are more examples and discussion of this
in Sipser, following Defn. 7.2.
Timebased complexity classes
TIME(f) = { L ⊆ Sigma* : L can be decided by a one-tape TM
that uses O(f(n)) steps on any input
of length n. }
So we showed that { wcw } is in TIME(n^2).
P = Union_{k>=1} TIME(n^k).
P is short for "polynomial time."
Note that a time bound like O(n log n) is polynomial
time, even though n log n isn't a polynomial. (Why isn't
it a polynomial? Repeatedly differentiating a polynomial
gives 0 in a finite number of steps, whereas this isn't
the case for n log n.)
Lecture 29
Fri 11/10/17
Today: polynomial time computation (Sipser 7.2)
Review of configurations, computation step (= execution of one
quintuple in a TM description).
P = { L ⊆ Sigma* | for some k, there is a Turing machine, using O(n^k)
steps on inputs of length n, that decides membership
in L}
FP = { f : Sigma* -> Sigma* | for some k, there is a Turing machine,
using O(n^k) steps on inputs of length n,
that computes f }
Note that P is a class of decision problems, FP is a class
of functions.
Why do we focus on polynomial time?
Main motivation: distinguish "feasible" or "efficient" algorithms
from the "merely finite" ones.
Key paper: Jack Edmonds, Paths, Trees, and Flowers (~1965).
Gave a poly-time alg to find a maximum matching in
graphs.
Matching = set of edges that don't share any endpoints.
If G = (V,E), we could test all subsets of E for
the "forbidden" subgraph *--*--* (two edges sharing an
endpoint). This isn't poly time, since there are 2^|E|
such subsets.
Why polynomials, instead of some other growth rate family?
1. Common models of computation can simulate each other with
polynomial overhead.
Example: A k-tape TM computation using t steps can be simulated
by a 1-tape TM using O(t^2) steps.
The idea is to put each of the k tapes on a separate
track of the single tape. If the read/write heads move
in opposite directions, realignment is necessary.
Since no read/write head can move more than t steps
from its initial position, the realignment can be done
with O(t) steps. Total time for the simulated computation
is O(t^2).
2. Many practical algorithms have polynomially bounded run times.
Example: sorting n mbit numbers. The input length is N=m*n.
For any standard sorting algorithm, the number of key comparisons
is O(n^2). Each key comparison uses O(m) steps (compare bits
from the left, and stop when there is a disagreement). The
total work is O(n^2 m) = O(N^2).
3. The class of polynomials is closed under composition. More
generally, if we use an O(n^{k1}) algorithm as a subroutine,
and call it O(n^{k2}) times, the total run time is still bounded
by a polynomial in n.
Lecture 30
Mon 11/13/17
Today: Designing poly time algorithms (Sipser 7.1)
Unary vs binary encoding of numbers
P = decision problems solvable in poly time on a multitape TM
FP = functions computable in the same way
It is convenient to allow multitape TMs. This doesn't change
what P and FP are, since they can be simulated by single-tape
machines with quadratic overhead.
The Big Picture:
REGULAR ⊆ P ⊆ DECIDABLE ⊆ RECOGNIZABLE ⊆ 2^(Sigma*)
All inclusions are proper.
Note that these are classes of languages. Informally we think
of decision problems using ordinary discrete math objects: graphs,
natural numbers, arrays, etc.
Designing polytime algorithms
Do this in two steps: 1) Specify an algorithm using poly(n)
high-level steps; 2) show that each high-level step can be
realized in poly(n) steps on a multitape TM.
Example:
PATH = { <G,s,t> : the directed graph G has a (directed) path
from vertex s to vertex t }.
We'll encode the graph as an adjacency matrix. This is an n x n 0-1
matrix (a_ij) with a_ij = 1 iff there is an edge vi -> vj.
Vertices are encoded as binary integers in the range 0..n1.
So the input length (in bits) is n^2 + O(log n) ~ n^2.
Example:
    v0 ------> v1
     |        / |
     |       /  |          ( 0 1 1 0 )
     |      /   |      A = ( 0 0 1 1 )
     |     /    |          ( 0 0 0 0 )
     v    v     v          ( 0 0 0 0 )
    v2          v3
                           s = 00,  t = 01
So we can think of the input as the string
0110 0011 0000 0000 00 01
(Spacing has been added. However, this isn't strictly necessary.
If the input length is n^2 + O(log n), then n is uniquely
determined, as long as the input is sufficiently long.
Determining n allows the string to be parsed.)
A highlevel algorithm for PATH:
Put a mark on s
Repeat
for each edge v>w, if v is marked but w is not,
put a mark on w
Until the number of marks has not changed
Accept iff t is marked.
This is a standard algorithm and can be seen to make
<= |V|*|E| edge queries.
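A direct Python rendering of the high-level algorithm (each round scans all n^2 matrix entries, and there are at most |V| rounds, so the work is polynomial):

```python
def path(A, s, t):
    # Mark s; repeatedly propagate marks along edges until nothing
    # changes; accept iff t ends up marked.
    n = len(A)
    marked = [False] * n
    marked[s] = True
    changed = True
    while changed:
        changed = False
        for v in range(n):
            for w in range(n):
                if A[v][w] and marked[v] and not marked[w]:
                    marked[w] = True
                    changed = True
    return marked[t]

# The 4-vertex example graph from this lecture.
A = [[0, 1, 1, 0],
     [0, 0, 1, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
```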
Realizing this algorithm on a TM
Let |V| = n, |E| = m.
We can dedicate one tape to a bit map of length n for the
marks.
We can run through the edges by making a pass through the
n^2 bits that store A. When we do this, we keep two indices
(on another tape) i,j. These can be incremented odometer
style: j changes with every step, and when the register for j
overflows, the register for i is incremented.
It is now easy to access the bit map, to see if (say) vi is
marked. We make a copy of i on a separate tape, and decrement
it until it equals 0, moving a pointer into the bit map
to the right with each decrement step.
The number of TM steps to do this test is O(n log n). (Each
decrement and zero test is O(log n), and i <= n.)
All of the other highlevel steps are polynomial (in n) time
as well.
Unary vs. binary notation
We can test n for primality using trial division:
n>1 is prime iff for all d with 1 < d < n, d does not divide n.
Lecture 31
Wed 11/15/17
Verifiers (Sipser 7.3)
Defn: A verifier for L ⊆ Sigma* is a polynomial-time TM V such that
w in L <==> for some c in Sigma*, V accepts <w,c>.
c is called a certificate (or witness, proof,...) of membership
in L.
Pictorially, we imagine a projection that "forgets" c:
c ^
  |
  |         * * *
  |        *  *  * *
  |       *  *******   <-- acceptable pairs
  |        ******  *
  |         ****
  |
  +----(/////////)--------> w
           ^
           |
      elements of L
Nondeterministic Algorithms
We extend our (informal) notion of algorithm to include
instructions of the form
Guess x in A,
where A is some explicitly given finite set. The set A can
depend on previous steps taken by the algorithm, or it can
be given once and for all. (Examples: {0,1}, all strings of
a particular length, all simple paths that extend some path
previously chosen, ...)
The algorithm accepts the input if there is some sequence
of guesses leading to an accepting state.
Example: Here is a nondeterministic algorithm for PATH.
On input w = <G,s,t>:
Guess a set P of edges (<= 2^|E| such sets)
Accept if P can be arranged into a path from
s to t.
It's worth thinking about how a set P can be checked
for "pathiness." One (not very efficient) way is to
consider all orderings of P, and then check each to
see if an initial segment of it leads from s to t.
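The slow "try all orderings" check can be written out as follows (exponential in |P|, as noted, but it shows that pathiness is checkable at all):

```python
from itertools import permutations

def forms_path(P, s, t):
    # Try every ordering of the guessed edge set P and see whether
    # some initial segment of it walks from s to t.
    for order in permutations(P):
        at = s
        for (u, v) in order:
            if u != at:
                break              # this ordering doesn't continue the walk
            at = v
            if at == t:
                return True
    return False
```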
You can think of a nondeterministic algorithm as one that
attempts to construct a certificate piece by piece.
Nondeterministic Turing Machines (NTMs)
These are just like ordinary Turing machines, but with
allowable transitions specified by
delta : Q x Sigma -> P( Q x Sigma x {L,R} ).
Everything else is the same. In particular a NTM accepts
w iff some sequence of configurations, starting with the
machine in the initial state and scanning the first
symbol of w, leads to a configuration with an accepting state.
We can think of an NTM's computation as a tree of configurations:
            *           <-- starting config for w
          / | \
         *  *  *
         .  .  | \
         .  .  *  *
         .  .  .  .
This can be infinite. If the machine is deterministic, this
tree is linear:
            *
            |
            *
            |
            *
            .
            .
            .
The machine accepts if some path leads from the root to an
accepting configuration. Thus, the path forms a certificate.
A verifier can take w and the path, and check that each
link in the path obeys the specification of the machine.
The branching factor of the tree is bounded (in fact it is
<= 2|Q||Sigma|).
There is a nice way to visualize nondeterministic computation.
Imagine that each leaf of the tree has a green light if it is
accepting, and a red light if it is rejecting. The input w
is accepted iff there is a green light on somewhere in the tree.
Note that each input gives a different tree. The number of steps
of the (nondeterministic) computation is just the height of
the tree. Usually this will be finite.
Nondeterministic Polynomial Time
NP = { L ⊆ Sigma* : for some k>0 and NTM M that uses
O(n^k) steps on inputs of length n,
L is exactly the strings accepted by M }.
This definition has two components:
All computation trees are shallow
L is characterized by the presence of an "accepting" tree.
Alternative defn using verifiers (you should prove these are
equivalent):
L belongs to NP if for some k>0 and polynomial time verifier V,
w in L <==> for some c with |c| <= |w|^k, V accepts <w,c>.
P ⊆ NP, since we could just have a poly time algorithm that
ignores the certificate. (Alternatively, any deterministic
TM is also an NTM.)
It's not known if these two classes are different, but conventional
wisdom holds this to be unlikely.
Lecture 32
Fri 11/17/17
Review of NP (a class of decision problems)
Examples of problems in NP (Sipser 7.3)
1. Composite Numbers
n>1 is composite (not prime, not a unit) iff
for some d, 1 < d < n, n mod d = 0.
d is the certificate. Note length(d) <= length(n)
The verifier can simply use the standard long division
algorithm to compute the remainder for n divided by d.
We usually assume that n and d are given in binary.
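As a concrete illustration (Python is not part of these notes, and the function name is ours), the verifier just checks that the certificate d is a nontrivial divisor:

```python
def verify_composite(n: int, d: int) -> bool:
    """Verifier for COMPOSITES: accept iff the certificate d
    is a nontrivial divisor of n, i.e. 1 < d < n and d | n."""
    return 1 < d < n and n % d == 0

# d = 3 certifies that 15 is composite; no d certifies the prime 13.
```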
2. Cliques
G =(V,E) is an undirected graph.
Let n = |V|, m = |E| as usual.
A clique is a set of mutually adjacent vertices. This means
that for every v,w in the set, (v,w) is in E.
[Example]
The term k-clique is also used. So a 1-clique is a vertex,
a 2-clique is an edge, a 3-clique a triangle, and so on.
CLIQUE = { <G,k> : The graph G has a clique of size k }.
Observe that the elements of CLIQUE are strings that encode
pairs consisting of a graph and a number (given in binary).
CLIQUE belongs to NP. This is because the set of vertices
comprising the clique can be used as a certificate.
We only care about k <= n. The number of edges to test
is (k choose 2) <= n^2.
3. The Subset Sum Problem
The data for a subset sum problem consists of natural numbers
x1,...,xk,t (all >= 1). The decision problem is to ascertain
if some of the xi's can be added together to get exactly t.
(If the list of xi's contains multiple copies of the same number,
we can only use that many copies of it.)
Example: U.S. electoral college. There are k=51 electoral units
(50 states, plus the District of Columbia). The ith unit has
xi electoral votes. The total number of votes is 538 (one for
each member of Congress, plus 3 for D.C.). Is a tie possible?
This is the subset sum problem with t=269. (Apparently, the answer
is yes.)
This problem is in NP. As a certificate, we can take a 0-1
vector z1,...,zk for which sum_{i=1}^k zi xi = t.
This is an example of an integer programming feasibility problem.
(Integer programming is like linear programming, but with the
variables restricted to take integer values.)
We usually think of the numbers as given in binary.
For unary data, the decision problem is in P. It can be solved
using dynamic programming.
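The dynamic program mentioned above can be sketched in Python (the function name is ours). It runs in O(k*t) time, which is polynomial when the numbers are written in unary:

```python
def subset_sum(xs, t):
    """Dynamic program for SUBSET SUM: reachable[s] is True iff some
    sub-collection of the pieces seen so far sums to exactly s."""
    reachable = [True] + [False] * t
    for x in xs:
        # Descend so each occurrence of a piece is used at most once.
        for s in range(t, x - 1, -1):
            if reachable[s - x]:
                reachable[s] = True
    return reachable[t]
```

The electoral-tie question would be subset_sum(votes, 269) over the 51 vote counts.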
4. Hamiltonian Paths
Let G = (V,E) be a directed graph.
Path = sequence v_0,v_1,...,v_k of vertices for which
(v_i, v_{i+1}) is an edge, for all i=0,...,k-1.
[Example]
A Hamiltonian Path from s to t is a path with v_0 = s and v_k = t
that uses each vertex exactly once. When s=t, this is called
a Hamiltonian Cycle.
The name honors William Rowan Hamilton, a 19th century Irish
scientist who contributed to mathematics and mathematical physics.
HAMILTONIAN PATH = { <G,s,t> : The digraph G has a Hamiltonian
path from s to t }
This is in NP, since the sequence of vertices along the path
can serve as a certificate.
Observe that the set of vertices along the path would not do,
since the only eligible set is V. In contrast to the
subset sum and clique problems, the ordering is crucial.
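A verifier for the path certificate checks exactly the properties listed above. A Python sketch (names are ours, for illustration only):

```python
def verify_ham_path(V, E, s, t, path):
    """Verifier for HAMILTONIAN PATH: the certificate is the sequence
    of vertices.  Check that it starts at s, ends at t, uses every
    vertex exactly once, and follows directed edges (pairs in E)."""
    return (len(path) == len(V) and set(path) == set(V)
            and path[0] == s and path[-1] == t
            and all((v, w) in E for v, w in zip(path, path[1:])))

# A 3-cycle digraph: 1 -> 2 -> 3 -> 1.
V = {1, 2, 3}
E = {(1, 2), (2, 3), (3, 1)}
```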
Lecture 33
Mon 11/20/17
NP-completeness
The satisfiability problem
(Sipser 7.4)
Review of P and NP
We know that P [ NP but they may be equal. If we can't
resolve this, can we at least identify the "obstruction"
to showing that P = NP?
Review of reductions. Defined in lecture 10/20/17
If f : Sigma* -> Sigma* reduces A to B, and is computable
in polynomial time, we write A <=_p B. The notation is
meant to suggest that A is "no harder" than B, from the
perspective of polytime computation.
Transitivity: if A <=_p B and B <=_p C, then A <=_p C.
Theorem: if A <=_p B and B is in P, then A is in P.
Proof: See Sipser, Thm 7.31
Corollary: If A <=_p B and A is not in P, then neither is B.
You should do this yourself as an exercise.
A Key Definition
L [ Sigma* is NP-complete if:
1) L belongs to NP
2) For every L' in NP, L' <=_p L.
If 2) holds, we call L NP-hard. (This terminology is often
extended to include optimization problems, e.g. "the traveling
salesman problem is NP-hard.")
A way to picture this: Make an (infinite) directed graph with
nodes the languages in Sigma*, and edges A -> B whenever A <=_p B.
  languages in NP  |  the rest
        *          |
    ...  \         |
          v        |
    *---->*        |
          ^        |
    ...  /         |
        *          |
NP-complete means something on the left side of the
border that everything else over there points to.
Review of Propositional Logic
Variables x0, x1, x2, ...
Binary Operators ^ (AND), v (OR)
Unary Operator ~ (NOT)
A formula is an expression built from the above pieces.
Expressions can be defined using recursion, much as
we defined regular expressions. (Do this)
Example: phi = (x ^ ~y) v (~y v x).
An expression is *satisfiable* if there is a way to set
its variables to T and F so as to make its evaluation equal T.
The settings x = T, y = F satisfy phi.
Satisfiability is decidable: we can use truth tables.
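The truth-table decision procedure is easy to sketch in Python (the formula is represented as a Boolean function, a stand-in for a parsed expression; names are ours). It is exponential in the number of variables, but it is clearly an algorithm:

```python
from itertools import product

def truth_table_sat(phi, n):
    """Decide satisfiability by brute force: try all 2^n rows of the
    truth table of the n-variable formula phi."""
    return any(phi(*row) for row in product([False, True], repeat=n))

# phi = (x ^ ~y) v (~y v x) from the lecture; x=T, y=F satisfies it.
phi = lambda x, y: (x and not y) or ((not y) or x)
```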
Cook-Levin Theorem (early 70s):
The language consisting of (suitably encoded) satisfiable
Boolean formulas is NPcomplete.
We will prove this next time.
Lecture 34
Wed 11/22/17
Cook-Levin Theorem
Proof follows Sipser 7.4
No Class (Thanksgiving break)
Fri 11/24/17
Lecture 35
Mon 11/27/17
NP-completeness of restricted satisfiability problems
Review of Cook-Levin Theorem (discussed last time)
Conjunctive Normal Form (CNF)
The formula must be an AND of clauses
Each clause is an OR of literals
A literal is a variable or the negation of one
Example: (x v ~y) ^ (~x v y v z)
Intuitively, CNF formulas are hard to satisfy because we must
turn on each clause simultaneously, using the same assignment
to the variables.
We will use the following result without proof.
Every Boolean function can be realized by a formula in
conjunctive normal form. (This is often proved in courses
on logic design, such as CS 352.)
CNFSAT
We can reduce SAT to CNFSAT, as follows:
Let phi be the formula we want to test for satisfiability.
Example: phi = (X ^ ~Y) v (~X ^ Y)
Introduce a new variable for each subformula, and put these
all into a tree:
               R (v)
             /       \
         S (^)       (^) T
         /   \       /   \
        X   (~) U   (~) V  Y
             |       |
             Y       X
Now, assert that the root is 1, and, for each internal node,
its evaluation correctly follows the values on its children.
For the above example, we'd get
psi = R
      ^ [CNF for R = S v T]
      ^ [CNF for S = X ^ U]
      ^ [CNF for T = V ^ Y]
      ^ [CNF for U = ~Y]
      ^ [CNF for V = ~X]
Then, any satisfying assignment for phi extends to a unique
satisfying assignment for psi. Conversely, any satisfying
assignment for psi restricts to a satisfying assignment for
phi.
This shows SAT <=_p CNFSAT, and since CNFSAT is in NP,
it is NP-complete.
3SAT
This is just satisfiability of CNF formulas with exactly
3 literals per clause.
The above reduction gives us a CNF formula with 2 or 3
literals in each clause. If a clause has too few variables,
we can introduce a new variable, e.g.
(x v y) becomes (x v y v z) ^ (x v y v ~z).
This preserves satisfiability.
So 3SAT is NP-complete as well.
Restrictions like this are important because they simplify the
job of mapping satisfiability (in some form) to other problems.
Lecture 36
Wed 11/29/17
NP-complete graph problems (Sipser 7.5)
To find "new" NPC problems, we take A, known to be NPC,
and B, a set in NP, and show that A <=_p B. This then proves
that B is NPC.
Clique
Let G=(V,E) be an undirected graph
A clique is a set of mutually adjacent vertices.
Ex:
A---B
| / |
C   D
has the clique {A,B,C}
Typically, large cliques are indicative of some structure, e.g.
in a graph formed from a social network.
The decision problem CLIQUE
Input: Graph G=(V,E), integer k >= 0.
Question: does G have a clique with k vertices?
For the above graph, the answer is yes when k <= 3, and no
when k>=4.
This is in NP. The vertices forming the clique can serve as the
certificate.
CNFSAT <=_p CLIQUE.
Proof: Let phi = C1 ^ C2 ^ ... ^ Cm be an instance of CNFSAT,
where the Ci are clauses with variables chosen from x1,...,xn.
Define G=(V,E) by
V = { (lambda,i) : the literal lambda occurs in Ci }
E = { {(lambda,i),(mu,j)} : i != j, and lambda != ~mu. }
and let k = # of clauses.
Example: phi is (x v y) ^ (~x v ~y) ^ (~x v y)
[Draw the graph - it has 6 vertices and 8 edges.]
k = 3 so we are looking for triangles. One such is
        (y,1)
        /   \
       /     \
  (~x,2)---(~x,3)
We can set these literals (y, ~x) to 1 and satisfy phi.
In general, when S is a kclique we can set the literals appearing
in S to 1 and satisfy phi. Conversely, if we have a satisfying
assignment for phi, choosing one literal that is set to 1 for
each clause gives us a clique.
So phi is in CNFSAT <==> G has a clique of size equal to the
number of clauses in phi.
We conclude that CLIQUE is NP-complete.
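The construction in the proof is mechanical. A Python sketch (names and literal encoding are ours) that rebuilds the lecture's 6-vertex, 8-edge example:

```python
from itertools import combinations

def cnf_to_clique(clauses):
    """CNFSAT <=_p CLIQUE.  A literal is a string like "x" or "~x".
    Vertices are (literal, clause-index) pairs; edges join literals in
    distinct clauses that are not complementary.  k = # of clauses."""
    neg = lambda lam: lam[1:] if lam.startswith("~") else "~" + lam
    V = [(lam, i) for i, c in enumerate(clauses) for lam in c]
    E = {frozenset((u, v)) for u, v in combinations(V, 2)
         if u[1] != v[1] and u[0] != neg(v[0])}
    return V, E, len(clauses)

# The lecture's example: (x v y) ^ (~x v ~y) ^ (~x v y).
V, E, k = cnf_to_clique([["x", "y"], ["~x", "~y"], ["~x", "y"]])
```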
Independent Set
Defn: In the graph G=(V,E), a set S of vertices is independent
if no two vertices of S are adjacent.
In the 4vertex graph above, {A,D} is an independent set.
INDEPENDENT SET (decision problem)
Input: Graph G = (V,E), integer k >= 0
Question: Does G have an independent set of size k?
This is in NP, since the independent set can be the certificate.
Fact: A clique in G = (V,E) is an independent set in the complement graph
(V,E'), where E' = { {x,y} : x != y and {x,y} is not in E }
For example, the graph above has the complement
A   B
 \
  \
C---D
{A,B,C} is independent in this graph and a clique
in the original graph.
This gives us a reduction from CLIQUE to INDEPENDENT SET:
Complement the graph, and keep the parameter the same.
So we conclude that INDEPENDENT SET is NP-complete.
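The reduction is one line of work: complement the edge set, keep k. A Python sketch (names are ours), run on the 4-vertex example above:

```python
from itertools import combinations

def clique_to_indep_set(V, E, k):
    """CLIQUE <=_p INDEPENDENT SET: a k-clique in (V, E) is exactly
    a k-element independent set in the complement (V, E')."""
    Ec = {frozenset((v, w)) for v, w in combinations(sorted(V), 2)
          if frozenset((v, w)) not in E}
    return V, Ec, k

# Edges A-B, A-C, B-C, B-D complement to A-D, C-D.
V = ["A", "B", "C", "D"]
E = {frozenset(e) for e in [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]}
_, Ec, _ = clique_to_indep_set(V, E, 3)
```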
Vertex Cover
Defn: A vertex cover in G=(V,E) is a set of vertices that
touches all the edges.
Example:
A---B
| / |
C   D
has the vertex cover {B,C}.
If you like, think of the graph as an art gallery. The management
wants to put guards on the vertices so they can "see" all the edges.
The associated decision problem is
VERTEX COVER
Input: Graph G = (V,E), integer k >= 0
Question: Does G have a vertex cover of size k?
This is in NP. To show it is NP-complete, we use the following
fact: S is a clique in G=(V,E) iff S' = V - S is a vertex cover
in G'=(V,E').
As we saw, the complement of the graph above is
A   B
 \
  \
C---D
This has the vertex cover {D}.
The required reduction is (G,k) -> (G', |V|-k).
Packing vs. Covering
Each of these problems has a natural optimization problem.
Since any subset of a clique is a clique, we can try to find
a largest clique. The same is true for independent sets.
On the other hand, any superset of a vertex cover is a vertex
cover, so it is natural to look for a smallest vertex cover.
The optimization versions of CLIQUE and INDEPENDENT SET are
"packing" problems, in which we want to choose as many components
as we can, subject to constraints.
The optimization version of VERTEX COVER is a "covering" problem,
in which we want to choose as few components as we can and still
meet our requirements.
Lecture 37
Fri 12/1/17
Discussion of Homework 3 answers
Lecture 38
Mon 12/4/17
NP-complete problems from integer programming:
Subset Sum - Sipser 7.5
0-1 Integer Programming (Feasibility) - Karp 1972
NP-complete problems we know so far:
SAT (Cook-Levin Thm)
       |
       v
    CNFSAT
     /    \
    v      v
CLIQUE    3SAT
    |
    v
INDEPENDENT SET
    |
    v
VERTEX COVER
Background
Linear programming: optimization of a linear function (in R^n)
subject to linear inequalities
Ex: max 2x+y s.t.
x+y <= 1, x,y >= 0.
[draw this]
This can be done in polynomial time (Khachiyan, 1979).
An older algorithm, called the simplex method, is effective
on practical problems but has an exponential worst case.
If a pt P = (x1,...,xn) satisfies the inequalities we call it
*feasible*.
Integer programming: Same as LP, but we only allow points with
integer coords to be used.
This seems more difficult, since the optimum might now be
in the interior of the region defined by the inequalities.
An effective (guaranteed to terminate) algorithm for this
was only found in the 1960s.
Subset Sum
Input: Nonnegative integers a1,...,am (called the pieces)
Target t >= 0 (also an integer)
Question: Does some collection of the pieces sum exactly to t?
Note that we can use pieces multiple times, if we are provided
with enough copies. E.g. 5, 10, 13, 13 with target 26 has
the solution 13+13.
Subset Sum is in NP (the pieces used form the certificate)
Reduction from 3SAT: As in Sipser 7.5
This result isn't as strong as we'd like since it relies on
the construction of huge numbers.
0-1 Integer Programming (Feasibility)
Input: Integer matrix C, integer column vector d
Question: Is there a 0-1 column vector x such that Cx >= d?
This is in NP (x is the certificate). To show it's NPC we
show that CNFSAT reduces to it.
Let phi be a CNF formula, e.g. (x v y') ^ (x' v y v z')
We will have one row of C for each clause. There is a coefficient
of +1 for each variable used positively, -1 for each variable
used negatively, and 0 for variables not used. The right
hand side for this row is 1 - [# of variables that are complemented].
Here is the system for our example:
     C         x         d
[  1 -1  0 ]  [x]      [  0 ]
[ -1  1 -1 ]  [y]  >=  [ -1 ]
              [z]
To justify the choice of right hand side, note that
a 0-1 assignment to x1,...,x_{r+s} satisfies the clause in which
the first r variables occur positively and the last s variables
occur negatively
iff
sum_{i=1}^r xi + sum_{j=r+1}^{r+s} (1 - xj) >= 1
Rearranging this produces
sum_{i=1}^r xi + sum_{j=r+1}^{r+s} (-xj)
>= 1 - [# of complemented variables]
which is exactly what the system gives for the row coming
from that clause.
So the 0-1 vector x satisfies phi iff Cx >= d.
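The row-building rule is easy to mechanize. A Python sketch (function name and literal encoding are ours) that reproduces the system above:

```python
def cnf_to_ip(clauses, variables):
    """CNFSAT <=_p 0-1 integer programming: one row per clause, with
    +1 for a positive literal, -1 for a negated one, and right-hand
    side 1 - (number of negated literals in the clause)."""
    C, d = [], []
    for clause in clauses:
        row, negs = [0] * len(variables), 0
        for lam in clause:
            if lam.startswith("~"):
                row[variables.index(lam[1:])] = -1
                negs += 1
            else:
                row[variables.index(lam)] = 1
        C.append(row)
        d.append(1 - negs)
    return C, d

# The lecture's example (x v ~y) ^ (~x v y v ~z):
C, d = cnf_to_ip([["x", "~y"], ["~x", "y", "~z"]], ["x", "y", "z"])
```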
Lecture 39
Wed 12/6/17
Search Problems vs Decision Problems (Sipser Exs. 7.38-7.40)
NP-completeness deals with decision problems, instances of which
have yes/no answers.
Example: If phi(x1,...,xn) is a Boolean formula, we ask
if there is a sequence of ai's in {0,1} for which
phi(a1,a2,...,an) = 1.
Sometimes this is not enough. We'd like to construct the
satisfying assignment.
Thm: If P=NP, there is a polytime algorithm that, given phi,
computes a satisfying assignment to phi or reports that
none exists.
Proof: The assignments form a binary tree. E.g. for n=3 the tree is
                       *
              x1 = 0 /   \ 1
                    *     *
            x2 = 0 / \   / \ 1
                  *   * *   *
          x3 = 0 /\   /\ /\   /\ 1
and the right-left-left branch is a1=1, a2=a3=0.
If P=NP, there is a polytime algorithm that checks satisfiability.
We use this to guide the search for an assignment.
Input: Boolean formula phi(x1,x2,...,xn)
Ask A if phi is in SAT (only needed at top level)
If no, report "unsatisfiable" and stop
If yes, ask A if phi(0,x2,...,xn) is in SAT.
If yes, proceed recursively with phi(0,x2,...,xn)
If no, proceed recursively with phi(1,x2,...,xn)
The base case is phi(a1,a2,...,an), which is guaranteed
satisfiable.
This algorithm calls A n+1 times, so it is poly time if A is.
Notes: 1) This style of algorithm is often called self-reduction.
2) The algorithm actually finds the lexicographically least
satisfying assignment.
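The search procedure can be sketched in Python. Here the assumed poly-time decider A is played by a brute-force SAT check, so the self-reduction logic can actually be run on small formulas (all names are ours):

```python
from itertools import product

def sat_oracle(phi, n):
    """Stand-in for the assumed poly-time SAT decider A (brute force)."""
    return any(phi(*row) for row in product([0, 1], repeat=n))

def self_reduce(phi, n):
    """Self-reduction: find the lexicographically least satisfying
    assignment with n+1 oracle calls, or return None.  At each step,
    fix the next variable to 0 if that stays satisfiable, else to 1."""
    if not sat_oracle(phi, n):
        return None
    a = []
    for i in range(n):
        prefix0 = tuple(a + [0])
        if sat_oracle(lambda *xs, p=prefix0: phi(*p, *xs), n - i - 1):
            a.append(0)
        else:
            a.append(1)
    return a
```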
Self-reduction for finding a largest clique
Review of cliques in undirected graphs (11/29/17).
It's natural to try to find a largest clique.
Thm: Assume P=NP. (So the presence of cliques of a given size can
be tested for in poly time.) Then there is a polytime alg that
finds maximum size cliques.
Pf: Let G = (V,E) with n = |V| >= 1.
Test for cliques of size n, n-1, ... until k*, the size of the
largest clique in G, is known. Let r = n - k*.
Then, there is a sequence v1,v2,...,vr in V whose removal leaves
a clique of size k*.
We use self-reduction to find this removal sequence.
If r=0, return G (it is already a k*-clique).
Else
for v in V
Test if G - v has a k*-clique
If yes, proceed recursively with G - v.
If no, continue with the next v.
G - v is the graph that results by the removal of v and every
edge it is incident to.
Invariant: G has a clique with k* vertices.
The number of vertices decreases by 1 with each recursive call,
so the algorithm terminates with an optimal (largest) clique.
How many times do we use the clique tester: At most
n times (to find k*)
+ (n - k*) x n times (for testing vertices for removability)
<= n^2 + n = O(n^2)
So the procedure to construct a max clique runs in poly time.
You might want to think about which clique (or, equivalently, what
removal sequence) is produced by this method.
Lecture 40
Fri 12/8/17
Randomized Algorithms (Sipser 12.2)
The Complexity Class RP
Recall that if L is in NP, there is an M in P and a polynomial p()
such that:
x in L <==> exists y [ |y| <= p(|x|) and <x,y> in M ]
We think of M as checking the alleged certificate y.
If the certificates are plentiful enough, we might be able to just
guess one.
Informal example: f(x1,...,xn) = poly in n variables (with integer
coeffs) that is not identically zero. Then, in
a sense we won't make precise, for almost every
(a1,a2,...,an) in R^n, f(a1,a2,...,an) != 0.
[Draw a picture for n=2.]
The idea is that the locus of f=0 is lower dimensional
so if we throw a dart at the picture, we have almost
no chance of hitting it.
RP = the NP languages for which certificates are easy to guess
Formally, L is in RP when there are M and p() as above, such that:
x in L ==> for >= 1/2 of the y with |y| <= p(|x|), <x,y> in M
x not in L ==> for none of the y with |y| <= p(|x|), <x,y> in M
Note the asymmetry, just as in the definition of NP. The difference
is that we have required the set of certificates to be uniformly
"fatter."
We have P [ RP [ NP. It is not known if any of these containments
are proper, but if any of them are, P != NP.
The "R" in RP stands for "random." The definition suggests a
randomized procedure to test membership in L: Choose a random string y
from {0,1}^{p(|x|)}, and say "yes" if M accepts <x,y>. This has a
one-sided error property: An answer of "yes" can be trusted absolutely,
but an answer of "no" only provides evidence of nonmembership.
Is Anything in RP?
Recall
COMPOSITES = { n>1 (in binary) : n is a composite number }
= { n>1 : exists d, 2 <= d <= n-1 s.t. n mod d = 0 }
Using this definition, the density of certificates (nontrivial divisors)
is too small to put COMPOSITES in RP. Indeed, if n = pq with p and
q primes, only 2 out of n-2 d's will work.
Here is a better compositeness test (MR stands for Miller and Rabin,
who wrote about it in the 1970s):
Input n (odd, > 1)
Let n - 1 = m * 2^k, with m odd.
Choose a, 1 <= a <= n - 1, at random
Compute (using arithmetic mod n)
x0 = a^m
x1 = x0^2 (= a^{2m})
x2 = x1^2 (= a^{4m})
...
xk = x_{k-1}^2 (= a^{n-1})
If xk == 1, and the last xi that is not 1 (if any) is == -1,
say "prime"
If not, say "composite."
This is a polynomial-time test, if we implement the power by
repeated squaring.
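The test translates almost line for line into Python; the built-in pow(a, m, n) does the modular exponentiation by repeated squaring. The function name and trial count are our choices:

```python
import random

def mr_test(n: int, trials: int = 20) -> str:
    """Miller-Rabin compositeness test for odd n > 2.  Repeating the
    one-sided test drives the error below (1/2)^trials."""
    m, k = n - 1, 0
    while m % 2 == 0:
        m, k = m // 2, k + 1        # n - 1 = m * 2^k, m odd
    for _ in range(trials):
        a = random.randrange(1, n)
        x = pow(a, m, n)            # x0 = a^m mod n
        if x in (1, n - 1):
            continue                # this base says "prime"
        for _ in range(k - 1):
            x = pow(x, 2, n)        # squaring chain
            if x == n - 1:
                break               # last non-1 value is -1: "prime"
        else:
            return "composite"      # the chain never hit -1: a witness
    return "prime"
```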
Correctness relies on two results that we will not prove here.
1) If n is prime, then a^n == a (mod n). (Use induction on a
and the binomial theorem.) Consequently, if a is relatively
prime to n, it can be cancelled from the congruence, to
yield a^{n1} == 1 (mod n). This is sometimes called
Fermat's Little Theorem.
2) If n is prime, and x^2 == 1 (mod n), then x == +-1 (mod n).
(To prove this, factor x^2  1 and then note that if a prime
divides a product, it must divide one of the factors.)
It follows from 1) and 2) that when n is prime, MR will correctly
respond that n is prime.
The corresponding assertion for composite n is more difficult.
The intuitive idea is that the squaring chain provides a sort of
obstacle course that the base a must pass, and that most bases
are unlikely to pass it. Formally:
Theorem: When n is composite, Pr[MR says "prime"] <= 1/2.
This, with a stronger constant of 1/4, was proved by Rabin.
Sipser (Thm 10.9) proves a bound of 1/2, for the case n=pq.
An extension to general n appears in the algorithms text of Cormen,
Leiserson, and Rivest.
To summarize: For the MR test,
n composite ==> Pr[MR says "composite"] >= 1/2
n prime ==> Pr[MR says "composite"] = 0
So COMPOSITES is in RP. (Actually, more is true. It was proved
in 2003 that COMPOSITES is actually in P.)
From a practical point of view, an error probability of 1/2 is
unsatisfactory. But we can try the test t times, choosing a
independently and uniformly each time. Since we err only if
n is composite and all t tests say "prime", we have
Pr[ an error is made ] <= (1/2)^t -> 0 rapidly in t.
Lecture 41
Mon 12/11/17
Approximation Algorithms (Sipser 12.1)
Many NP-complete problems come from combinatorial optimization.
In these we aim to select the "best" configuration from
a finite, but large, number of choices.
Best known of these is probably the Traveling Salesman Problem
(TSP). We start with n cities, a list of the (n choose 2)
distances between them, and try to find a tour (cycle hitting
each city once) of minimum total distance. The obvious solution
is to test all (n1)! tours. The decision version of TSP
("is there a tour of cost <= B?") is NP-complete.
Idea: If we can't solve the optimization problem exactly,
can we at least find an efficient algorithm with guaranteed
performance, compared to the optimum? (It is not obvious how to
do this, since we don't know the cost of the optimum.)
Investigation of this question is still ongoing.
Vertex Cover
G = (V,E) undirected.
We seek a smallest S [ V that "hits" all edges.
A simple "greedy" procedure A (for "avaricious")
S = empty set
Initially all edges are unmarked.
While there is an edge sharing no vertex with a marked edge
Choose one and mark it
Add both of its ends to S.
Example:
A---B        optimal VC = {B,C}, size 2.
| / |
C---D
pick {A,B} and mark it
A and B are now in S
A-x-B
| / |
C---D
only can pick {C,D}
S is now A,B,C,D
A-x-B
| / |
C-x-D
done! computed VC has size 4
For this example |computed VC| = 2 |optimum VC|
Theorem: The cover produced by A is no more than twice the
size of the optimum.
Proof: Let M be the set of marked edges, S the cover produced
by A. Then |S| = 2|M| (each marked edge contributed 2 vertices
to S).
S is a vertex cover (any uncovered edge would have been marked).
Since the best VC covers all edges, it certainly covers the
marked edges, and therefore
|optimum VC| >= [# of marked edges] = |M| = |S|/2.
(Why the inequality? By design, the marked edges don't
meet each other, and we need a vertex to cover each.)
So |S| <= 2 |optimum VC|.
The example of n disjoint edges shows that 2 is the best possible
approximation factor for this procedure.
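The marking procedure amounts to greedily building a maximal matching and taking both endpoints of every matched edge. A Python sketch (names are ours), run on the example above:

```python
def greedy_vertex_cover(E):
    """Procedure A: repeatedly pick an edge disjoint from everything
    marked so far and add both its endpoints.  The result is a vertex
    cover of size 2|M| for a matching M, so at most twice optimal."""
    S = set()
    for v, w in E:                   # any edge order gives the bound
        if v not in S and w not in S:
            S |= {v, w}              # edge (v, w) is "marked"
    return S

# The lecture's example: edges A-B, A-C, B-C, B-D, C-D.
E = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]
```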
Maximum Cut
G = (V,E) as before.
A *cut* is a partition of V into two disjoint sets S,T.
The size of a cut is the number of crossing edges (one end in
S, the other in T).
We seek a cut of maximum size.
Example: This graph has a cut of size 4, namely
{A,D} and {B,C}.
A---B
| / |
C---D
You should verify that for s=0,2,3,4 (and only these)
there are cuts of size s.
Here is a simple "hill climbing" method HC for finding a large
cut:
Start with any cut V = S u T.
If there is a v in V whose "defection" improves the cut,
move v from S to T (or vice versa).
Repeat until no such improvements are possible.
On the example:
A---B        S = {A}, T = {B,C,D}. Cut size = 2.
| / |
C---D
If B (or C) defects from T to S, the
cut size becomes 3. If D defects,
we get a cut of size 4. Although not
required by the procedure, let's choose D.
A-*-B        Final cut: S = {A,D}, T = {B,C}.
* / *        (Best possible: the crossing edges,
C-*-D        marked *, are A-B, A-C, B-D, C-D.)
Theorem: For HC, |final cut| >= (1/2) |optimum cut|.
Proof: For each v,
[# of crossing edges hitting v] >= [# of noncrossing edges hitting v].
(If this weren't true HC would have improved the cut.) Sum on v to get
2 * [# of crossing edges] >= 2 * [# of noncrossing edges].
(Each edge is counted twice, once for each of its ends.)
After canceling the 2's we find at least as many crossing edges
as noncrossing edges, so
[# of crossing edges] >= |E|/2 >= |optimum cut| / 2.
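HC can be sketched in a few lines of Python (names are ours). It recomputes the cut size on each trial flip, which is fine for a sketch; a careful implementation would track each vertex's crossing degree incrementally:

```python
def cut_size(E, S):
    """Number of edges with exactly one end in S."""
    return sum((u in S) != (v in S) for u, v in E)

def hill_climb_cut(V, E, S):
    """Local search HC: move a vertex to the other side whenever that
    strictly increases the cut; stop at a local optimum, which is
    guaranteed to cross at least |E|/2 of the edges."""
    S = set(S)
    improved = True
    while improved:
        improved = False
        for v in sorted(V):
            T = S ^ {v}              # flip v to the other side
            if cut_size(E, T) > cut_size(E, S):
                S, improved = T, True
    return S

# The lecture's graph: edges A-B, A-C, B-C, B-D, C-D.
E = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]
```

Note that which local optimum HC reaches depends on the order in which defections are tried; the guarantee is only the factor of 2.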
Lecture 42
Wed 12/13/17
Homework problems and review
Fri 12/15 10:05 AM, 168 Noland
Final exam
Please bring two 8.5 x 11 sheets of notes (both sides)
Do not bring calculators, cell phones, computers, ...