\documentclass[11pt]{article}
\usepackage{tikz,subfig}
\include{lecture}
\begin{document}
\lecture{17}{11/03/2011}{Space-Bounded Derandomization}{Chetan Rao}
%\draft
In the previous lecture, we discussed \textit{Expanders} and
two methods to realize \textit{error reduction} using expanders, namely:
\begin{itemize}
\item \textit{Deterministic Error Reduction} - uses a deterministic
algorithm ($A_d$). The number of successive runs of $A_d$ required to reduce
the error to $\epsilon$ is polynomial in $1/\epsilon$, which is
larger than the $O(\log(1/\epsilon))$ runs needed with independent trials.
\item \textit{Randomized-efficient Error Reduction} - uses an algorithm ($A_r$)
that requires additional random bits. The number of runs of $A_r$ required
to reduce the error to $\epsilon$ is logarithmic in $(1/\epsilon)$. In addition
to the random bits used by $A_r$, the procedure requires extra random bits
logarithmic in $(1/\epsilon)$.
\end{itemize}
\noindent The randomized result was obtained by viewing random bit
sequences as vertices of an expander graph and performing a
random walk upon choosing a start vertex uniformly at random,
and casting a majority vote. The error (probability of majority
vote resulting in error) exponentially decreases with the length
of the random walk. We also saw a stronger statement based on
Chernoff bounds for random walks on expander graphs.
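The random-walk procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the complete graph below is only a stand-in for a constant-degree expander, and the names \texttt{algorithm}, \texttt{labels}, and \texttt{graph} are illustrative, not from the lecture.

```python
import random

def expander_walk_majority(algorithm, x, labels, graph, walk_len, rng):
    """Error reduction via a random walk: run `algorithm` on the random
    string attached to each vertex visited by the walk and take a
    majority vote.  `labels[v]` is the bit string at vertex v and
    `graph[v]` its neighbor list (a toy stand-in for an expander)."""
    v = rng.randrange(len(labels))        # start vertex chosen uniformly
    votes = 0
    for _ in range(walk_len):
        votes += 1 if algorithm(x, labels[v]) else -1
        v = rng.choice(graph[v])          # one step of the random walk
    return votes > 0
```

Each walk step consumes only $\log d$ fresh random bits instead of a full $r$-bit string, which is the source of the savings over independent trials.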
In this lecture, we will discuss space-bounded derandomization.
We construct a Pseudorandom Generator (PRG) for space-bounded computations
based on expanders. The idea is to decrease the required number of seed
(random) bits and simulate the algorithm on all possibilities of seed
values.
The following section (Section \ref{17:sec:PRG}) defines Pseudorandom
Generators (PRGs) and their various parameters. Section \ref{17:sec:uses}
outlines the use of pseudorandom generators in complexity theory.
Section \ref{17:sec:SBD} concludes with the construction of an efficient
PRG for $\BPL$ with seed length $O((\log n)^2)$.
\section{Pseudorandom Generators} \label{17:sec:PRG}
\begin{definition}
An $\varepsilon$-PRG for a class $\mathcal{A}$ of algorithms is a
sequence $(G_r)_{r=1}^{\infty}$ of deterministic procedures where
$G_r: \{0,1\}^{\ell(r)} \rightarrow \{0,1\}^{r}$ such that:
\begin{equation}
\label{17:eqn:PRG1}
(\forall A \in \mathcal{A})\hspace{3 pt}(\forall^{\infty} x)\hspace{3 pt}
\|A(x,U_r) - A(x,G_r(U_{\ell(r)}))\|_1 < 2\varepsilon,
\end{equation}
where $r$ is the number of random bits $A$ uses on $x$
(and is also the length of the output of $G_r$),
$U_n$ denotes $n$ bits taken from the uniform distribution,
$\ell(r)$ is the seed length (discussed below) and
$\forall^{\infty}$ means ``for all except finitely many.''
\end{definition}
Note that if $A$ is a decision algorithm, Equation \ref{17:eqn:PRG1}
is equivalent to:
\begin{equation}
(\forall^\infty x)\hspace{5 pt}
|\text{Pr} [A(x, U_r) \text{ accepts}] -
\text{Pr} [A(x,G_r(U_{\ell(r)})) \text{ accepts}]|
< \varepsilon
\end{equation}
There are three important parameters to the above definition.
\begin{itemize}
\item \textit{Seed length $\ell(r)$}: the number of random bits required as
input to the pseudorandom generator to generate a pseudorandom bit
sequence of length $r$. We want this quantity to be small.
\item \textit{Error $\varepsilon$}: the deviation from the original randomized
algorithm. For example, if the original algorithm has a probability
of error $\frac{1}{3}$ and $\varepsilon = \frac{1}{6}$, the probability
of error for the new algorithm will be $< \frac{1}{2}$ (by triangle
inequality). We want $\varepsilon$ to be small, but it suffices that it
be ``small enough'' given the error reduction techniques discussed in
previous lectures.
\item \textit{Complexity}: measured in terms of the output length $r$.
We want PRGs of low complexity, so that generating the pseudorandom bits
does not add too much to the total cost of running a randomized algorithm
or of its deterministic simulation using the PRG.
\end{itemize}
\section{Uses of PRGs} \label{17:sec:uses}
Pseudorandom generators are used to generate a long pseudorandom string from
a short uniformly random seed, allowing us to reduce the amount of
randomness required to run a randomized algorithm. As a side effect they
can reduce the complexity of a deterministic simulation of a
randomized algorithm, by explicitly computing the probability of
acceptance over all $2^{\ell(r)}$ possible PRG seeds (recall $\ell(r) < r$).
Namely, if $G$ is a $\frac{1}{6}$-PRG for $\BPTIME(t)$ computable in
$\DTIME(t')$, then
\begin{equation}
\BPTIME(t) \subseteq \DTIME(2^{\ell(t)} \cdot (t'(t) + t))
\end{equation}
This is by cycling over all random seeds, running the algorithm on the
output of $G$ for each, and outputting the majority answer.
For each seed value, the random string must be generated, taking
$t'(t)$ time, and the algorithm must be run, for an additional
$t$ steps. Since this enumerates all possible seeds and the
cumulative error is $< \frac{1}{2}$, a majority vote
provides the correct answer.
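The cycling argument can be sketched directly. The generator and decision algorithms here are placeholders (a real $G$ would have $\ell \ll r$; the identity generator is used only to keep the example self-contained):

```python
from itertools import product

def derandomized(A, x, G, ell):
    """Deterministic simulation of a randomized decision algorithm A:
    run A on G(seed) for every one of the 2^ell seeds and accept iff a
    strict majority of the runs accept."""
    answers = [A(x, G("".join(s))) for s in product("01", repeat=ell)]
    return 2 * sum(answers) > len(answers)
```

The running time is $2^{\ell}$ times the cost of one run of $A$ plus one evaluation of $G$, matching the $\DTIME$ bound above.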
Similarly, if $G$ is a $\frac{1}{6}$-PRG for $\BPSPACE(s)$ computable in
$\DSPACE(s')$, then
\begin{equation}
\BPSPACE(s) \subseteq \DSPACE(\ell(2^s) + s'(2^s) + s)
\end{equation}
Given a PRG for $\BPTIME(t)$ computable in polynomial time with seed length
$\ell(t) = O(\log t)$, we would obtain $\BPP \subseteq \P$. Similarly, given
a PRG with logarithmic seed length computable in log space,
$\BPL \subseteq$ L. Such pseudorandom generators are not known to exist,
but constructing them is one approach to proving these containments.
\section{Space-Bounded Derandomization} \label{17:sec:SBD}
Although we do not yet know how to construct a $\log$ space computable
PRG for $\BPL$ with $O(\log r)$ seed length, there are nontrivial constructions
approaching this goal. We now present a construction based on expanders that
yields a PRG for $\BPL$ with seed length $O((\log n)^2)$.
\begin{theorem}
\label{17:th:prg}
There exists an $\varepsilon$-PRG for $\BPSPACE(s)$ with
\begin{equation}
\ell(r) = O(\log{\frac{r}{s}} \cdot (s + \log{\frac{1}{\varepsilon}}))
\end{equation}
computable in space $O(\ell(r))$.
\end{theorem}
\begin{corollary} \label{17:cor:1}
There is a $\frac{1}{6}$-PRG for $\BPL$ with $\ell(r) = O(\log^2{r})$
and computable in space $O(\log^2{r})$.
\end{corollary}
Corollary \ref{17:cor:1} immediately implies that
$\BPL \subseteq \DSPACE(\log^2{n})$. Earlier, it was already known that
$\BPL \subseteq$ $\NC^2$, but this theorem shows it can be done with PRGs
as well. A variation of the PRG can be used in a different way to show that
$\BPL\subseteq\DSPACE(\log^{1.5}{n})$ which is the best known bound.
The idea behind the proof of Theorem \ref{17:th:prg} is dividing a
space-bounded randomized computation into $2^k$ phases. Each phase uses
$r'$ random bits, where $r' = \frac{r}{2^k}$. Since the machine
operates in space $s$, only $s$ bits need to pass from phase to phase.
By pairing these blocks and using an expander to produce the random
bits for each pair, we can reduce the overall amount of randomness
used by the machine. Consider an expander with degree $d$ and $2^{r'}$
vertices. We let $G_{2r'}$ produce $2r'$ pseudorandom bits by
choosing a vertex in the expander at random, then moving to a random
neighbor (this is equivalent to selecting an edge at random and using
its endpoints). $G_{2r'}$ requires a seed length of $r' + \log{d}$ random bits
for each block pair; if this is $< 2r'$ we have reduced the amount of
randomness. This process is diagrammed in Figure \ref{17:fig:blocks}.
\begin{figure}
\centering
\input{figs/17.blocks.1.pstex_t}
\vspace{24 pt}
\input{figs/17.blocks.2.pstex_t}
\caption{Dividing computation into blocks, with $s$ bits passing between
each block. The original computation is shown above, and below it is
shown with random bits of adjacent blocks coming from picking adjacent
vertices in an expander.}
\label{17:fig:blocks}
\end{figure}
If the expander used is good enough, the output from the modified
block pair will not differ greatly from the output of the original.
We rely on the expander mixing lemma to prove this.
Call the distribution of input (output resp.) states to a block pair
$S_{in}$ ($S_{out}$)
and the random inputs to the pair $\rho_{\text{left}}$ and $\rho_{\text{right}}$.
There are two distributions to consider for ($\rho_{\text{left}}$,$\rho_{\text{right}}$):
\begin{itemize}
\item Random: $U_{2r'}$ - the original randomized input.
\item Pseudo-random: $G_{2r'}(U_{r'},U_{\log{d}})$ - output from our expander.
Note that $G_{2r'}(\rho,\sigma) = (\rho,\, \text{$\sigma$-th neighbor of
$\rho$ in the expander})$.
\end{itemize}
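In code, $G_{2r'}$ is just one application of the expander's neighbor (rotation) map. A sketch with a toy rotation map, which has the right interface but is of course not an actual expander:

```python
def G_pair(rho, sigma, neighbor):
    """G_{2r'}(rho, sigma) = (rho, sigma-th neighbor of rho): stretch a
    seed of r' + log d bits into 2r' pseudorandom bits."""
    return rho + neighbor(rho, sigma)

def toy_neighbor(rho, sigma):
    """Degree-2 rotation map on {0,1}^{r'}: step by 1 or 3 around a
    cycle (an interface stand-in, NOT a real expander)."""
    n = len(rho)
    step = [1, 3][int(sigma, 2)]
    return format((int(rho, 2) + step) % 2 ** n, "0{}b".format(n))
```

For example, with $\rho = \texttt{010}$ and $\sigma = \texttt{1}$ the output is \texttt{010101}: six output bits from a four-bit seed, matching $r' + \log d = 3 + 1$.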
The following lemma bounds the difference in output distribution between
the two scenarios.
\begin{lemma}
\label{17:lem:key}
For any distribution $S_{in}$ on $s$ bits, where $\lambda$ is the
second largest eigenvalue in absolute value of the expander,
\begin{equation}
||S_{out}(S_{in}, U_{2r'}) - S_{out}(S_{in}, G_{2r'}(U_{r'},U_{\log d}))||_1
\leq 2^s \cdot\lambda
\end{equation}
\end{lemma}
We prove this lemma in the next lecture, but for now we finish the description
of the PRG and the proof of its properties using this lemma.
We first want to bound the difference in output
distribution of running the algorithm on purely random bits versus running
the algorithm by grouping pairs of blocks and producing the random bits
from the expander.
Consider hybrid distributions, where $D_i$ is
the distribution formed by using the random distribution for the
first $2i$ blocks, then switching to the pseudo-random distribution for
the remainder. Thus $D_{2^{k-1}}$ is perfectly random, and
$D_0$ is entirely pseudo-random. The difference between these
two distributions is the difference between the randomized
algorithm and our pseudorandom version. One key point to note is that the
difference between any two consecutive hybrid distributions is bounded as
follows.
\begin{claim} \label{17:claim:1}
$\|D_i - D_{i-1}\|_1 \leq 2^s \cdot\lambda$.
\end{claim}
\proof The hybrid distributions $D_i$ and $D_{i-1}$ differ only in the
$i$-th block pair, which receives purely random bits under $D_i$ and
pseudorandom bits under $D_{i-1}$. By Lemma \ref{17:lem:key}, the $L_1$
distance between the two distributions on the $s$-bit state leaving this
block pair is at most $2^s \cdot \lambda$. The rest of the computation is a
deterministic function of that state together with inputs that are
identically distributed under both hybrids, and applying a deterministic
function $f$ can never increase $L_1$ distance: for any distributions $X$
and $Y$,
\begin{equation*}
\|f(X)-f(Y)\|_1
= \sum_z \Bigl| \sum_{x :\, f(x)=z} \bigl(\Pr[X=x] - \Pr[Y=x]\bigr) \Bigr|
\leq \sum_x \bigl|\Pr[X=x] - \Pr[Y=x]\bigr|
= \|X-Y\|_1.
\end{equation*}
Thus $\|D_i - D_{i-1}\|_1 \leq 2^s \cdot\lambda$. \hfill{$\Box$} \vspace*{2mm}
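The data-processing step used in the proof, $\|f(X)-f(Y)\|_1 \leq \|X-Y\|_1$, is easy to check numerically. A minimal sketch with distributions as dictionaries; the example distributions in the test are arbitrary:

```python
def push_forward(dist, f):
    """Distribution of f(X) when X has distribution `dist`
    (a dict mapping outcomes to probabilities)."""
    out = {}
    for x, p in dist.items():
        out[f(x)] = out.get(f(x), 0.0) + p
    return out

def l1(p, q):
    """L1 distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

Mapping both distributions through a deterministic $f$ can only merge outcomes, so differences combine before the absolute value is taken and the distance can only drop.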
Hence, from the triangle inequality and Claim \ref{17:claim:1} we find:
\begin{equation}
\label{17:eq:difference}
\|D_{2^{k-1}} - D_0\|_1 = \left\| \sum_{i=1}^{2^{k-1}} (D_i - D_{i-1}) \right\|_1
\leq \sum_{i=1}^{2^{k-1}} \|D_i - D_{i-1}\|_1
\leq 2^{k-1} \cdot 2^s \cdot \lambda
\end{equation}
This provides a bound on the error introduced by the first step
of the derandomization. The amount of randomness has been reduced
from $2r'$ for each block pair to $r'+ \log{d}$, a savings of roughly
$r'$ as $d$ is constant. This is not a large savings but note that
we have reduced our original block chain to an easier instance of the
same problem - one with $2^{k-1}$ blocks, each taking $r'+ \log{d}$
random bits. These new computational blocks can be paired, with the
$r' + \log{d}$ random bits being generated by the expander as
described above. Note that the expander we use now has more vertices.
Pairing blocks recursively (as shown in Figure \ref{17:fig:three})
results in a PRG with the following parameters:
\begin{itemize}
\item $\varepsilon < 2^k \cdot 2^s \cdot \lambda$. This bound is
found by summing \eqref{17:eq:difference} over all levels of recursion.
\item $\ell(r) = r' + k \cdot \log{d}$. Each reduction requires an
additional $\log{d}$ random bits.
\item $O(\ell(r))$ space complexity. To compute a given output bit of the
PRG, we must compute neighbor relations in a series of expanders. Each
of these can be computed in linear space, so the amount of space used
at the topmost level dominates. Hence the total space used by the PRG
is $O(\ell(r))$.
\end{itemize}
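The recursive pairing can be written as a short recursive procedure. In this sketch every level has degree $2$ (one edge bit per level) and the rotation map is a toy stand-in; the real construction plugs in the rotation map of a powered expander at each level:

```python
def toy_rotation(v, sigma):
    """Stand-in rotation map on 2^{|v|} vertices of degree 2
    (NOT an expander; it merely has the right interface)."""
    n = len(v)
    return format((int(v, 2) + int(sigma, 2) + 1) % 2 ** n, "0{}b".format(n))

def recursive_prg(seed, level, neighbor=toy_rotation):
    """Level-k generator: a seed of r' + k bits stretches to r' * 2^k
    output bits.  The last seed bit picks an edge at the top level; the
    level-(k-1) generator is applied to both endpoints of that edge."""
    if level == 0:
        return seed                       # base case: output r' raw bits
    body, sigma = seed[:-1], seed[-1]
    return (recursive_prg(body, level - 1, neighbor)
            + recursive_prg(neighbor(body, sigma), level - 1, neighbor))
```

With $r' = 2$ and $k = 2$, a four-bit seed yields eight output bits, matching $\ell(r) = r' + k \cdot \log d$. Note that the lower levels reuse the same edge bits in both halves; this sharing is exactly what the hybrid argument accounts for.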
\begin{figure}
\centering
\input{figs/17.blocks.3.pstex_t}
\caption{Recursively pairing blocks and applying the expander. $k$
expansions cover the entire computation.}
\label{17:fig:three}
\end{figure}
As defined above $r$ and $r'$ are related through $r' = \frac{r}{2^k}$.
The important terms in the parameters for this PRG are $\lambda$ and
$d$. Any constant-degree expander has some constant $\lambda$, so the
factor $2^{k+s}$ eventually swamps it, resulting in an error bound
$\varepsilon > 1$. To make $\lambda$ shrink as $s$ grows, we begin with a
constant-degree, constant-$\lambda$ expander and raise it to the $t$-th
power. Allowing multi-edges in this graph results in a simple expression
of the new degree and the second largest eigenvalue in absolute value, namely
$\lambda(G^t) = (\lambda(G))^t$, and $d(G^t) = (d(G))^t$. Since we want
the error of our PRG to be less than $O(\varepsilon)$ we must satisfy
\begin{equation}
2^{k+s} \cdot \lambda_0^t < O(\varepsilon)
\end{equation}
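The identity $\lambda(G^t) = (\lambda(G))^t$ can be checked numerically on a small graph. For $K_4$ the walk matrix is $(J - I)/3$, so every vector orthogonal to the uniform vector is an eigenvector with eigenvalue $-1/3$; a sketch in pure Python (the matrix helpers are illustrative, not from the lecture):

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_vec(A, v):
    """Matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Random-walk matrix of K_4: uniform over the 3 neighbors.
W = [[0.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]
W3 = mat_mul(mat_mul(W, W), W)        # walk matrix of G^3 (multi-edges allowed)
v = [1.0, -1.0, 0.0, 0.0]             # orthogonal to the uniform vector
```

Applying \texttt{W3} to $v$ scales it by $(-1/3)^3$, confirming that powering drives $\lambda$ down exponentially in $t$ while the degree grows only to $d^t$.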
Here $\lambda_0$ denotes the second largest eigenvalue (in absolute value)
of the base expander. We rearrange to derive the value of $t$ that must be
used, and plug in $d^t$ as the degree to determine the seed length:
\begin{align}
&t = \Theta(k + s + \log{\frac{1}{\varepsilon}}) \\
&\ell(r) = \frac{r}{2^k} + k \cdot \Theta(k + s + \log{\frac{1}{\varepsilon}})
\cdot \log{d}
\end{align}
We know that $k \leq s$, since there are at most $2^s$ blocks
in our construction and each block uses at least one random bit.
Therefore, the seed length can be written as
\begin{equation}
\ell(r) = \frac{r}{2^k} + k \cdot \Theta(s + \log{\frac{1}{\varepsilon}})
\end{equation}
The second term grows with $k$ while the first descends. We have remarked
before that setting the two terms equal and solving for $k$ gives a result
that is minimal to within constant factors.
We use $k = \log{\frac{r}{s}}$. The seed length becomes
\begin{equation}
\ell(r) = O(\log{\frac{r}{s}} \cdot (s + \log{\frac{1}{\varepsilon}}))
\end{equation}
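The balancing of the two terms can be checked numerically. This evaluates the bound with all hidden constants folded into a single hypothetical factor \texttt{c}; it is an illustration, not part of the construction:

```python
import math

def seed_length(r, s, eps, k, c=1.0):
    """The two-term bound r/2^k + c*k*(s + log(1/eps)) as a function of
    the recursion depth k (all hidden constants folded into c)."""
    return r / 2 ** k + c * k * (s + math.log2(1.0 / eps))

def best_depth(r, s, eps):
    """Recursion depth minimizing the bound, over 1 <= k <= s."""
    return min(range(1, s + 1), key=lambda k: seed_length(r, s, eps, k))
```

For $r = 2^{20}$, $s = 20$, $\varepsilon = 1/6$, the minimizing depth is within one of $\log_2(r/s) \approx 15.7$, as claimed.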
finishing the proof of Theorem \ref{17:th:prg}. All that remains is to prove
Lemma \ref{17:lem:key}.
\medskip
Notice that in our construction each block was treated as a black
box. The only connection between blocks was the $s$ bits representing
the state of the machine. The algorithm relies on only these $s$
bits being transmitted between blocks, but places no limit on the
computations performed by each block individually. This PRG therefore
works for any algorithm which can be divided into $2^k$ blocks with
limited communication from block to block, even if each block
uses unbounded space.
\section{Next Lecture}
In the next lecture, we will prove the Key Lemma (Lemma \ref{17:lem:key})
using the Expander Mixing Lemma. We will also look into similar results in
the time bounded setting.
\section{References}
\noindent Michael E. Saks and Shiyu Zhou.
\emph{$\BP_{H}\SPACE(S) \subseteq \DSPACE(S^{3/2})$.}
Journal of Computer and System Sciences, 58(2):376–403, 1999.
\section*{Acknowledgements}
In writing the notes for this lecture, I perused the notes by Jake Rosin
for Lecture 14 from the Spring 2007 offering of CS~810, and the notes by
Amanda Hittson for Lecture 15 from the Spring 2010 offering of CS~710.
Figure credit goes to Jake Rosin.
\end{document}