\documentclass[11pt]{article}
\include{lecture}
\usepackage{subfigure}
\begin{document}
\lecture{2}{9/7/10}{From Classical to Quantum Model of Computation}{Tyson Williams}
Last class we introduced two models for deterministic computation. We discussed Turing Machines, which are models of sequential computation, and then families of uniform circuits, which are models of parallel computation. In both models, we required the operators to be physically realizable and imposed a uniformity condition; namely, that the state transitions could be described by a finite set of rules independent of the input.
In this lecture, we shall develop a model for probabilistic computation, from which our model for quantum computation will follow.
\section{Model for Probabilistic Computation}
\subsection{Overview}
Probabilistic computers can use randomness to determine which operations to perform on their inputs. Thus, the state at any given moment and the final output of a computation are both random variables. One way to represent a state $\ket{\psi}$ of an $m$-bit system is as a probability distribution over base states, $\ket{s}$ for $s \in \{0,1\}^m$,
\begin{align*}
\ket{\psi} = \sum_{s \in \{0,1\}^m} p_s \ket{s} \qquad 0 \leq p_s \leq 1, \sum_{s \in \{0,1\}^m} p_s = 1
\end{align*}
where $p_s$ denotes the probability of observing base state $\ket{s}$. These state vectors have an $L_1$ norm of 1.
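Concretely, such a state can be held as a table of probabilities. The following short Python sketch (the variable names and example values are ours, purely illustrative) checks both conditions:

```python
# A probabilistic state over m = 2 bits: a map from each base state
# s in {0,1}^2 to its probability p_s (values chosen arbitrarily).
psi = {"00": 0.5, "01": 0.25, "10": 0.25, "11": 0.0}

# Every coefficient lies in [0, 1] ...
assert all(0.0 <= p <= 1.0 for p in psi.values())

# ... and the L1 norm (the sum of the probabilities) equals 1.
l1_norm = sum(psi.values())
assert abs(l1_norm - 1.0) < 1e-12
```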
Since the output is now a random variable, we require a computation to provide the correct answer with high probability. That is, given relation $R$, input $x$, and output $y$,
\begin{align*}
(\forall x) \Pr \left[ (x,y) \in R \right] \geq 1-\epsilon
\end{align*}
where $\epsilon$ denotes the probability of error. If $\epsilon$ is smaller than the probability of some other bad event, such as the computer crashing during the computation, then we are satisfied. In contrast, $\epsilon = 1/2$ is no good for decision problems, because an algorithm that just flips a fair coin and returns the result already achieves it. If $R$ is a function, then $\epsilon = 1/3$ is good enough: we can rerun the algorithm a polynomial number of times, take the majority answer, and achieve exponentially small error via the Chernoff bound. In fact, any $\epsilon$ bounded away from $1/2$ suffices.
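The majority-vote amplification can be made concrete: the majority of $n$ independent runs errs only if more than half the runs err, a binomial tail that decays exponentially in $n$. A small Python sketch (the function name is ours):

```python
from math import comb

def majority_error(eps, n):
    """Probability that the majority of n independent runs is wrong,
    when each run errs independently with probability eps (n odd)."""
    assert n % 2 == 1
    return sum(comb(n, k) * eps**k * (1 - eps)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# With per-run error 1/3, the overall error drops rapidly with n.
errs = [majority_error(1/3, n) for n in (1, 11, 51)]
assert errs[0] > errs[1] > errs[2]
assert errs[2] < 0.05
```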
\subsection{Local Operations}
In the probabilistic setting, a transition operator can depend on probabilistic outcomes (i.e., coin flips). Thus, the local effect of a transition operator can be described as the multiplication of a (left) stochastic matrix $T$ with a state vector $\ket{\psi}$,
\begin{align*}
(\forall j) \sum_i T_{ij} = 1 \qquad 0 \leq T_{ij} \leq 1.
\end{align*}
We interpret $T_{ij}$ as the probability of entering state $i$ after applying $T$ to state $j$. As before, the state after an operation is $T\ket{\psi}$ because
\begin{align*}
\left(\ket{\psi_{\text{after}}}\right)_i = \left(T\ket{\psi_{\text{before}}}\right)_i = \sum_j T_{ij} \left(\ket{\psi_{\text{before}}}\right)_j.
\end{align*}
The matrix for a deterministic operator, which is an all zeros matrix except for a single 1 per column, is just a special case of a stochastic matrix. See Figure \ref{02:fig:coin_flips} for examples of stochastic matrices for a fair coin flip and a biased coin flip.
\begin{figure}[ht]
\centering
\subfigure[Fair coin flip]{$\Qcircuit @C=1em @R=.7em {& \gate{C} & \qw} = \begin{bmatrix} \frac{1}{2} & \frac{1}{2}\\ \frac{1}{2} & \frac{1}{2} \end{bmatrix}$}
\qquad
\qquad
\subfigure[Biased coin flip]{$\Qcircuit @C=1em @R=.7em {& \gate{C_p} & \qw} = \begin{bmatrix} p & p\\ 1-p & 1-p \end{bmatrix}$}
\caption{Coin flip gates}
\label{02:fig:coin_flips}
\end{figure}
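As a sanity check, one can verify in a few lines of Python that the two gates in Figure \ref{02:fig:coin_flips} are left stochastic and that each maps any base state to the coin's output distribution (the helper \texttt{apply} is our own, purely illustrative):

```python
def apply(T, psi):
    """Left-multiply state vector psi by transition matrix T (list of rows)."""
    return [sum(T[i][j] * psi[j] for j in range(len(psi)))
            for i in range(len(T))]

p = 0.7  # an arbitrary bias
C_fair = [[0.5, 0.5], [0.5, 0.5]]
C_p = [[p, p], [1 - p, 1 - p]]

# Both matrices are left stochastic: each column sums to 1.
for T in (C_fair, C_p):
    for j in range(2):
        assert abs(sum(T[i][j] for i in range(2)) - 1.0) < 1e-12

# Applied to either base state, the biased gate outputs (p, 1 - p).
assert apply(C_p, [1.0, 0.0]) == [p, 1 - p]
assert apply(C_p, [0.0, 1.0]) == [p, 1 - p]
```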
The following exercise shows that, in a strong sense, coin flips are the only genuinely probabilistic operations we need.
\begin{exercise}
Given a probabilistic circuit, $C$, of size $t$ and depth $d$, there is an equivalent probabilistic circuit $C'$ of size $O(t)$ and depth $O(d)$ such that the first level of $C'$ consists only of biased coin flips and all other levels of $C'$ are deterministic. Here, equivalent means that for any input $x$ the distribution of outputs $y$ is the same for $C$ and $C'$.
\end{exercise}
\subsection{Uniformity Condition}
We can think of a deterministic Turing Machine as having a Boolean measure of validity associated with every possible transition between configurations. A 1 signifies a completely valid transition, while a 0 denotes a completely invalid transition:
\begin{align*}
\delta: (Q \backslash \{q_{\text{halt}}\} \times \Gamma) \times (Q \times \Gamma \times \{L,P,R\}) \rightarrow \{0,1\}
\end{align*}
A probabilistic TM will have a probability of validity associated with every transition:
\begin{align*}
\delta_p: (Q \backslash \{q_{\text{halt}}\} \times \Gamma) \times (Q \times \Gamma \times \{L,P,R\}) \rightarrow [0,1]
\end{align*}
It is important to note that, in order to satisfy the uniformity condition, these probabilities must be easily definable. In particular, we require that the $n^{\text{th}}$ bit of any bias be computable in time $\mathrm{poly}(n)$. If we did not impose this constraint, we could use the probabilities to encode information, such as ``$0.$'' followed by the characteristic sequence of the halting language. To decide whether the $n^{\text{th}}$ Turing machine halts, we could repeatedly sample from such a biased coin flip gate in order to estimate $p$. Once we are confident in the value of the $n^{\text{th}}$ bit, we return that bit, thereby deciding the halting language.
This uniformity condition allows for an infinite number of basic operations. If this is a problem, we can instead take the fair coin flip gate as the only source of randomness and use it to obtain good estimates of any biased coin flip gates we need. However, we would then have to relax the universality condition: instead of being required to sample exactly from the distribution of any probabilistic circuit, we would only be required to sample approximately. We will discuss this notion of universality in the next lecture.
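One standard way to realize a biased flip from fair flips (a folklore construction, sketched here with our own naming) is to generate a uniform number bit by bit and compare it against the binary expansion of $p$, stopping at the first disagreement; note that the requirement that \texttt{p\_bit} be easily computable is exactly the uniformity condition above:

```python
import random

def biased_from_fair(p_bit, fair=lambda: random.randint(0, 1)):
    """Simulate a p-biased coin using only fair coin flips: lazily
    compare a uniform binary expansion 0.b1 b2 ... against the bits
    of p, returning 1 exactly when the uniform number falls below p.
    Expected number of fair flips used: 2."""
    n = 1
    while True:
        b = fair()
        if b != p_bit(n):
            return 1 if b < p_bit(n) else 0
        n += 1

# p = 3/4 = 0.11000... in binary (an arbitrary example bias).
p_bit = lambda n: 1 if n <= 2 else 0
random.seed(0)  # seeded only so the sketch is reproducible
freq_one = sum(biased_from_fair(p_bit) for _ in range(10000)) / 10000
assert 0.70 < freq_one < 0.80  # empirical frequency is close to 3/4
```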
\subsection{A More Abstract View}
We define a \emph{pure state}, $\ket{\psi}$, as a convex combination of base states, $\ket{s}$. That is, $\ket{\psi} = \sum_s p_s\ket{s}$, where $p_s$ is the probability of being in base state $\ket{s}$, $\sum_s p_s = 1$, and $0 \leq p_s \leq 1$. A \emph{mixed state} is a discrete probability distribution over pure states.
We can think of the probabilistic model as allowing two operations on any pure state $\ket{\psi}$.
\begin{enumerate}
\item Local, stochastic transformations, as specified by probabilistic transition matrices. These are $L_1$ preserving.
\item A terminal observation, which is a probabilistic process that transforms a pure state into a base state after all transformations have been applied. That is, $\ket{\psi} \rightarrow \ket{s}$, where base state $\ket{s}$ is observed with probability $p_s$.
\end{enumerate}
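A terminal observation can be sketched as sampling from the state's distribution (pure Python; the helper name and seed are ours, the latter only for reproducibility):

```python
import random

def observe(psi, rng=random):
    """Terminal observation: collapse a state, given as a map from base
    states to probabilities, to one base state drawn with probability p_s."""
    r = rng.random()
    total = 0.0
    for s, p in psi.items():
        total += p
        if r < total:
            return s
    return s  # guard against floating-point rounding at the top end

random.seed(0)
psi = {"0": 0.25, "1": 0.75}
samples = [observe(psi) for _ in range(10000)]
assert set(samples) <= {"0", "1"}
freq = samples.count("1") / len(samples)
assert 0.70 < freq < 0.80  # close to p_1 = 0.75
```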
\begin{exercise}
What happens if we allow observations at any point in time, that is, in between transitions? As motivation, consider the problem of composing two procedures, both of which observe their respective states after their transformations are complete.
\end{exercise}
\section{Model for Quantum Computation}
\subsection{Overview}
As with the probabilistic model, the state of a system is described by a superposition of base states, but here:
\begin{enumerate}
\item the coefficients are complex numbers (usually denoted by $\alpha$ because it stands for an amplitude)
\item vectors have an $L_2$ norm of 1 (i.e., $\sum_s |\alpha_s|^2 = 1$)
\end{enumerate}
A \emph{qubit} is the quantum analog of a classical bit and satisfies the above two conditions. The interpretation is that $\Pr [\text{observing } \ket{s}] = |\alpha_s|^2$. Note that this is a valid interpretation because the above conditions define a valid probability distribution.
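For a single qubit, both conditions are easy to check numerically (an arbitrary example state, not from the lecture):

```python
from math import sqrt

# A qubit with complex amplitudes alpha_0 = 1/sqrt(2), alpha_1 = i/sqrt(2).
alpha = {"0": complex(1 / sqrt(2), 0), "1": complex(0, 1 / sqrt(2))}

# The L2 norm is 1 ...
assert abs(sum(abs(a) ** 2 for a in alpha.values()) - 1.0) < 1e-12

# ... so the squared magnitudes form a valid probability distribution.
probs = {s: abs(a) ** 2 for s, a in alpha.items()}
assert abs(probs["0"] - 0.5) < 1e-12
assert abs(probs["1"] - 0.5) < 1e-12
```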
\subsection{Local Operations}
For consistency of interpretation, global operations have to preserve the 2-norm. It is necessary and sufficient that local operations are unitary transformations. That is,
\begin{align*}
T^*T = I = TT^*,
\end{align*}
where $T^*$ is the conjugate transpose\footnote{The notation $T^*$ for the conjugate transpose is more common in linear algebra while $T^\dagger$ is more common in quantum mechanics.} of $T$. Unitary matrices have a full orthonormal basis of eigenvectors, each with an eigenvalue satisfying $|\lambda| = 1$. Since the determinant is the product of the eigenvalues, $|\det T| = 1$ as well.
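The unitarity condition is mechanical to check. The sketch below (helper names ours) confirms that the reversible NOT gate is unitary, while the fair coin flip matrix from the probabilistic setting is not:

```python
def conj_transpose(T):
    """Conjugate transpose T* of a matrix given as a list of rows."""
    return [[T[i][j].conjugate() for i in range(len(T))]
            for j in range(len(T[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def is_unitary(T, tol=1e-12):
    """Check T*T = I entrywise."""
    n = len(T)
    P = matmul(conj_transpose(T), T)
    return all(abs(P[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))

# The NOT gate (a permutation of base states) is unitary ...
assert is_unitary([[0, 1], [1, 0]])
# ... while the fair coin flip's stochastic matrix is not.
assert not is_unitary([[0.5, 0.5], [0.5, 0.5]])
```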
\begin{example}
Does the classical ``coin-flip'' transformation describe a valid quantum gate? No, because its transition matrix is not unitary. It does not even have full rank.
\end{example}
The quantum analog of a fair coin flip is the Hadamard gate. It is described by the following matrix, which \emph{is} unitary:
\begin{align*}
\Qcircuit @C=1em @R=.7em {& \gate{H} & \qw}
= \frac{1}{\sqrt{2}}
\begin{bmatrix}
1 & 1\\
1 & -1
\end{bmatrix}
\end{align*}
If we apply the Hadamard gate to base states, we get the intuitive ``fair coin'' result. That is, regardless of which base state we are in, we end up with 50\% probability of being in base state $\ket{0}$ and 50\% probability of being in base state $\ket{1}$:
\begin{align*}
H(\ket{0}) &= \frac{1}{\sqrt{2}}\ket{0} + \frac{1}{\sqrt{2}}\ket{1}\\
H(\ket{1}) &= \frac{1}{\sqrt{2}}\ket{0} - \frac{1}{\sqrt{2}}\ket{1}
\end{align*}
What if we apply the Hadamard gate to a superposition of base states?
\begin{align*}
H\left(\frac{1}{\sqrt{2}}\ket{0} + \frac{1}{\sqrt{2}}\ket{1}\right) &= \frac{1}{2}(\ket{0} + \ket{1}) + \frac{1}{2}(\ket{0} - \ket{1}) = \ket{0}\\
H\left(\frac{1}{\sqrt{2}}\ket{0} - \frac{1}{\sqrt{2}}\ket{1}\right) &= \frac{1}{2}(\ket{0} + \ket{1}) - \frac{1}{2}(\ket{0} - \ket{1}) = \ket{1}
\end{align*}
Unlike in the probabilistic setting, we do not necessarily get a ``fair coin'' result. The above is an example of destructive interference, the key ingredient of quantum algorithm design. Quantum algorithms that run faster than their classical counterparts make constructive use of destructive interference, effectively canceling out wrong computation paths.
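These four computations can be checked numerically (a small sketch; the helper \texttt{apply} is ours). Applying $H$ twice returns every state to where it started, since $H^2 = I$:

```python
from math import sqrt

def apply(T, psi):
    """Multiply gate matrix T by amplitude vector psi = [alpha_0, alpha_1]."""
    return [sum(T[i][j] * psi[j] for j in range(2)) for i in range(2)]

s = 1 / sqrt(2)
H = [[s, s], [s, -s]]

# H on base states: uniform amplitudes, as in the text.
plus = apply(H, [1.0, 0.0])   # (|0> + |1>)/sqrt(2)
minus = apply(H, [0.0, 1.0])  # (|0> - |1>)/sqrt(2)

# H on those superpositions: interference collapses them back.
assert max(abs(a - b) for a, b in zip(apply(H, plus), [1.0, 0.0])) < 1e-12
assert max(abs(a - b) for a, b in zip(apply(H, minus), [0.0, 1.0])) < 1e-12
```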
The transformation matrix for the quantum analog of a biased coin flip is
\begin{align*}
\begin{bmatrix}
\sqrt{p} & \sqrt{1 - p}\\
\sqrt{1 - p} & -\sqrt{p}
\end{bmatrix},
\end{align*}
which is unitary and maps the base state $\ket{0}$ to $\sqrt{p}\ket{0} + \sqrt{1-p}\ket{1}$, so that observing the result yields $\ket{0}$ with probability $p$.
Another prevalent quantum gate is the rotation
\begin{align*}
\Qcircuit @C=1em @R=.7em {& \gate{R_\theta} & \qw}
= \begin{bmatrix}
1 & 0 \\
0 & e^{i\theta}
\end{bmatrix},
\end{align*}
which effectively adds $\theta$ to the phase of the 1-component.
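Since $R_\theta$ only changes the phase of the 1-component, it leaves all observation probabilities untouched; a quick check with an arbitrary angle (our own toy example):

```python
import cmath

def R(theta):
    """Phase rotation gate: adds theta to the phase of the 1-component."""
    return [[1, 0], [0, cmath.exp(1j * theta)]]

theta = cmath.pi / 4  # arbitrary angle
psi = [1 / 2 ** 0.5, 1 / 2 ** 0.5]
out = [sum(R(theta)[i][j] * psi[j] for j in range(2)) for i in range(2)]

# Both observation probabilities |alpha|^2 are unaffected ...
assert abs(abs(out[0]) ** 2 - 0.5) < 1e-12
assert abs(abs(out[1]) ** 2 - 0.5) < 1e-12
# ... but the 1-component has picked up phase theta.
assert abs(cmath.phase(out[1]) - theta) < 1e-12
```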
\begin{example}
Can we use deterministic gates in the quantum setting? Consider the NAND gate, viewed as replacing its first bit with the NAND of both bits. The matrix associated with this transformation is not unitary, as both $\ket{00}$ and $\ket{10}$ map to the same output state, $\ket{10}$. In general, deterministic gates are unitary if and only if they are permutations of the base states, that is, if and only if they are reversible.
\end{example}
An important gate is the CNOT gate, which is shown schematically in Figure \ref{02:fig:cnot_gate}.
\begin{figure}[ht]
\begin{align*}
\Qcircuit @C=1em @R=.7em {
\lstick{b_1} & \targ & \rstick{b_1 \oplus b_2} \qw\\
\lstick{b_2} & \ctrl{-1} & \rstick{b_2} \qw}
\end{align*}
\caption{CNOT gate}
\label{02:fig:cnot_gate}
\end{figure}
The matrix associated with this transformation is given below:
\begin{align*}
T
= \begin{bmatrix}
1 & 0 & 0 & 0\\
0 & 0 & 0 & 1\\
0 & 0 & 1 & 0\\
0 & 1 & 0 & 0
\end{bmatrix}
\end{align*}
This gate flips its first input bit if the second bit, also known as the control bit, is a 1; otherwise it leaves the first input bit unchanged. Note that if $b_1 = 0$, then the CNOT gate effectively copies $b_2$.
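Because the CNOT gate merely permutes base states, its action is easy to tabulate (a sketch using a plain Python function of our own naming):

```python
def cnot(b1, b2):
    """CNOT: flip b1 when the control bit b2 is 1; leave b2 unchanged."""
    return (b1 ^ b2, b2)

# Action on the four base states |b1 b2>.
assert cnot(0, 0) == (0, 0)
assert cnot(0, 1) == (1, 1)
assert cnot(1, 0) == (1, 0)
assert cnot(1, 1) == (0, 1)

# When b1 = 0, CNOT copies b2 into the first bit.
assert all(cnot(0, b)[0] == b for b in (0, 1))

# CNOT is its own inverse, hence reversible (a permutation of base states).
assert all(cnot(*cnot(a, b)) == (a, b) for a in (0, 1) for b in (0, 1))
```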
\subsection{Simulating classical gates}
Even though classical gates, such as the NAND gate, do not translate directly into the quantum setting, they can be simulated. Given a transformation
\begin{align*}
f: \{0,1\}^* \rightarrow \{0,1\},
\end{align*}
we can define a new transformation
\begin{align*}
\tilde{f}: \{0,1\}^* \times \{0,1\} \rightarrow \{0,1\}^* \times \{0,1\}: (x,b) \mapsto (x, b \oplus f(x)).
\end{align*}
Essentially, $\tilde{f}$ maintains a copy of its input in order to make the transformation reversible. One can perform this transformation on all classical gates.
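The construction of $\tilde{f}$ is mechanical. The sketch below (names ours) lifts NAND and verifies that the lifted map is its own inverse, hence a permutation:

```python
def make_reversible(f):
    """Lift f: {0,1}^* -> {0,1} to f~(x, b) = (x, b XOR f(x)),
    which is reversible (in fact its own inverse)."""
    def f_tilde(x, b):
        return (x, b ^ f(x))
    return f_tilde

# Example: NAND on two bits, an irreversible classical gate.
nand = lambda x: 1 - (x[0] & x[1])
rev_nand = make_reversible(nand)

inputs = [((a, b), c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
# Applying f~ twice returns every input: f~ is a permutation.
assert all(rev_nand(*rev_nand(x, b)) == (x, b) for x, b in inputs)
```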
\begin{example}
A reversible NAND gate is shown schematically in Figure \ref{02:fig:rev_nand}. The additional third bit, which we need to simulate the classical gate, is called an \emph{ancilla bit}.
\begin{figure}[ht]
\begin{align*}
\Qcircuit @C=1em @R=0em {
\lstick{b_1} & \multigate{2}{\text{R-NAND}} & \rstick{b_1} \qw\\
\lstick{b_2} & \ghost{\text{R-NAND}} & \rstick{b_2} \qw\\
\lstick{b_3} & \ghost{\text{R-NAND}} & \rstick{b_3 \oplus \overline{b_1 \wedge b_2}} \qw}
\end{align*}
\caption{Reversible NAND gate}
\label{02:fig:rev_nand}
\end{figure}
\end{example}
We can apply the above idea to an entire classical circuit. Sometimes, the ``garbage'' output due to the ancilla bits is problematic, as it is not defined by the original classical transformation. Specifically, this garbage output will prevent the destructive interference from happening as desired. We can circumvent this difficulty by copying the output of the circuit and then running the circuit in reverse as illustrated in Figure \ref{02:fig:circuit}.
\begin{figure}[ht]
\begin{align*}
\Qcircuit @C=1em @R=0.8em
{
\lstick{0} & /^t \qw & \multigate{3}{C'} & \push{g} \qw & /^t \qw & \qw & \qw & \multigate{3}{C'^{-1}} & \rstick{0} \qw\\
\lstick{0} & /^i \qw & \ghost{C'} & \push{z} \qw & /^i \qw & \qw & \qw & \ghost{C'^{-1}} & \rstick{0} \qw\\
\lstick{x} & /^n \qw & \ghost{C'} & \push{y} \qw & /^k \qw & \ctrl{1} & \qw & \ghost{C'^{-1}} & \rstick{x} \qw\\
\lstick{0} & /^k \qw & \ghost{C'} & \qw & \qw & \targ & \qw & \ghost{C'^{-1}} & \rstick{y} \qw \\
}
\end{align*}
\caption{Computation of $\tilde{f}$ for arbitrary classical circuit $C$.}
\label{02:fig:circuit}
\end{figure}
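The uncomputation trick in Figure \ref{02:fig:circuit} can be sketched with classical bits (a toy example with our own naming): XOR-ing computed values into zero-initialized work bits is reversible, and repeating the same XORs undoes them, so after copying out the answer the work register returns to all zeros.

```python
def forward(x, work):
    """XOR the partial ANDs of x into the work bits; reversible because
    XOR-ing a fixed value into a register is its own inverse."""
    work = list(work)
    acc = 1
    for i, bit in enumerate(x):
        acc &= bit
        work[i] ^= acc
    return work

x = (1, 1, 0, 1)
work = forward(x, [0] * len(x))   # compute: work[-1] holds AND(x)
out = 0 ^ work[-1]                # CNOT-copy the answer into a fresh bit
clean = forward(x, work)          # uncompute: the same XORs cancel
assert clean == [0] * len(x)      # all garbage is gone
assert out == 0                   # AND(1, 1, 0, 1) = 0
```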
\begin{theorem}
If $f$ can be computed by a deterministic circuit of size $t$ and depth $d$, then $\tilde{f}$ can be computed by a reversible circuit of size $O(t)$ and depth $O(d)$ using $O(t)$ ancilla bits.
\end{theorem}
There are transformations that use space more efficiently than the one specified by the above theorem, but this efficiency comes at the expense of time. It is an open question whether one can reversibly simulate a classical circuit with only constant-factor overhead in both time and space.
\end{document}