\documentclass[11pt]{article}
\include{lecture}
\newcommand{\zM}{\mathbb{Z}_M}
\newcommand{\zm}{\mathbb{Z}_m}
\begin{document}
\lecture{11}{09/28/2010}{Applications of Eigenvalue Estimation }{Balasubramanian Sivan}
Last class we saw eigenvalue estimation as an application of phase estimation. We also discussed an application of eigenvalue estimation to Grover's algorithm. Today, we complete our discussion of Grover's algorithm from the perspective of eigenvalue estimation, and see two other applications, namely, quantum Fourier transform over $\zM$ when $M \neq 2^m$, and solving systems of linear equations over the reals.
\section{Grover's algorithm - an eigenvalue estimation perspective}
Recall that in the eigenvalue estimation problem, we are given a unitary operator $U$ and one of its eigenvectors. Our goal is to get a good approximation for the eigenvalue of $U$ corresponding to this eigenvector. Some applications of eigenvalue estimation are:
\begin{enumerate}
\item An implementation of Grover's algorithm.
\item Approximate quantum Fourier transform over $\zM$ when $M$ is not a power of 2 (We will see that the approach we used when $M$ was a power of 2 doesn't quite work here.)
\item An efficient algorithm for ``solving'' sparse and well-conditioned systems of linear equations in time polylogarithmic in the number of dimensions of the system. We mentioned this application in our first lecture. See Section~\ref{11:section:systemOfEquations} for why we put ``solving'' in quotes.
\item Order finding: given $m$ and $a$, find the order of $a$ in $\zm$. We remark that order finding is a critical component of Shor's algorithm for factoring integers.
\end{enumerate}
We now proceed to discuss Grover's algorithm from the eigenvalue estimation perspective. Recall that the task of Grover's algorithm is to compute an input $x$ such that $f(x) = 1$, where $f:\{0,1\}^m \rightarrow \{0,1\}$.
In the version of Grover's algorithm that we saw in Lecture 6, we had to know $t$, the number of strings $x$ such that $f(x) = 1$, in order to run in time $O(\sqrt{N/t})$ and find an $x$ that maps to 1. In yesterday's class, we mentioned a way to compute the value of $t$ approximately using eigenvalue estimation. The circuit we used yesterday for estimating $t$ was an $(n+m)$-qubit circuit. We refer to the first $n$ qubits as the top part, and the remaining $m$ qubits as the bottom part. Yesterday we used the result obtained from the first $n$ qubits to estimate $t$. Today, we will observe the other part, namely the remaining $m$ qubits.
The circuit we used yesterday (see Figure~\ref{11:fig:groverEigenvalue}) was simply the circuit for the eigenvalue estimation procedure, with the operator being the Grover iterate, and the input being $\ket{+}^{\otimes n}\ket{+}^{\otimes m}$. From Exercise 2 in Lecture 10, we know that the Grover iterate has two eigenvectors, namely, $\ket{\varphi_+} = \frac{1}{\sqrt{2}}\ket{B} + \frac{i}{\sqrt{2}}\ket{C}$ and $\ket{\varphi_-} = \frac{i}{\sqrt{2}}\ket{B} + \frac{1}{\sqrt{2}}\ket{C}$, with respective eigenvalues $e^{2i\theta}$ and $e^{-2i\theta}$, where $\theta$ is the angle made by the uniform superposition state $\frac{1}{\sqrt{N}}\sum_x \ket{x}$ with the $B$ axis (i.e., the horizontal axis), and $\sin(\theta) = \sqrt{\frac{t}{N}}$, with $N = 2^m$. Here $\ket{B}$ and $\ket{C}$ are, respectively, the uniform superpositions of all the inputs that map to zero and of all the inputs that map to one, i.e., we have
\begin{align*}
\ket{B} &= \frac{1}{\sqrt{N - t}} \sum_{f(x) = 0} \ket{x}\\
\ket{C} &= \frac{1}{\sqrt{t}} \sum_{f(x) = 1} \ket{x}
\end{align*}
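As a sanity check on these eigenvalues, the following sketch (not part of the lecture; it assumes the standard form of the Grover iterate, namely inversion about the uniform superposition composed with the phase oracle, and a hypothetical marked set) builds $G$ for a small instance and verifies that $e^{2i\theta}$ appears in its spectrum:

```python
import numpy as np

# Illustration (not from the lecture): build the Grover iterate for a small
# instance and check that e^{2 i theta}, with sin(theta) = sqrt(t/N), is in
# its spectrum.
m = 3
N = 2**m
marked = {1, 6}                        # hypothetical inputs with f(x) = 1
t = len(marked)

# Phase oracle: flips the sign of marked basis states.
O = np.diag([-1.0 if x in marked else 1.0 for x in range(N)])
# Diffusion: inversion about the uniform superposition.
u = np.full(N, 1/np.sqrt(N))
D = 2*np.outer(u, u) - np.eye(N)
G = D @ O                              # the Grover iterate

theta = np.arcsin(np.sqrt(t/N))
eigs = np.linalg.eigvals(G)
target = np.exp(2j*theta)
closest = eigs[np.argmin(np.abs(eigs - target))]
print(abs(closest - target))           # ≈ 0
```

On the two-dimensional subspace spanned by $\ket{B}$ and $\ket{C}$, $G$ acts as a rotation by $2\theta$; the remaining eigenvalues are $\pm 1$.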
\begin{figure}[htb]
\centering
\[
\Qcircuit @C=.7em @R=.4em {
\lstick{\ket{+}} & \qw & \ctrl{4} & \qw & \qw & \cdots & & \qw & \multigate{3}{F^{-1}} & \meter & \qw \\
\lstick{\ket{+}} & \qw & \qw & \ctrl{3} & \qw & \cdots & & \qw & \ghost{F^{-1}} & \meter & \qw \\
\vdots & & & & & & & & \ghost{F^{-1}} & \vdots & \\
\lstick{\ket{+}} & \qw & \qw & \qw & \qw & \cdots & & \ctrl{1} & \ghost{F^{-1}} & \meter & \qw \\
\lstick{\ket{+}^{\otimes m}} & {/^m} \qw & \gate{G} & \gate{G^2} & \qw & \cdots & & \gate{G^{2^{n-1}}} & \qw & \meter & \qw
}
\]
\caption{Grover's algorithm through the eigenvalue estimation circuit. The first $n$ qubits of the circuit act as control wires for Grover iterates. The inverse Fourier transform is applied only on these first $n$ qubits. The Grover iterates act on the remaining $m$ qubits.}
\label{11:fig:groverEigenvalue}
\end{figure}
What is the state of the system before observation? It is the state resulting from applying the eigenvalue estimation circuit to $\ket{+}^{\otimes n}\ket{+}^{\otimes m}$, i.e.,
\begin{align}
\alpha_{+}\ket{\widetilde{2\theta}}\ket{\varphi_+} + \alpha_{-}\ket{\widetilde{-2\theta}}\ket{\varphi_{-}}\label{11:eqn:eigenvalueGroverOutput}
\end{align}
where $\ket{\widetilde{x}}$ is a superposition concentrated around $x$, resulting from phase/eigenvalue estimation. What do we get if we observe the second register? First, note that if we were to observe $\ket{\varphi_+}$ alone or $\ket{\varphi_-}$ alone, then with probability one half we would observe an input that maps to 1. This is because both $\ket{\varphi_+}$ and $\ket{\varphi_-}$ have squared amplitude $1/2$ on $\ket{C}$. But we only get to observe the second register for the state in~\eqref{11:eqn:eigenvalueGroverOutput}.
If $\ket{\widetilde{2\theta}}$ and $\ket{\widetilde{-2\theta}}$ were orthogonal, the probability of observing an input that maps to one would be the sum of two contributions: observing such an input together with a constituent base state of $\ket{\widetilde{2\theta}}$, and observing one together with a constituent base state of $\ket{\widetilde{-2\theta}}$. This sum is exactly $|\alpha_+|^2/2 + |\alpha_-|^2/2 = 1/2$. But since $\ket{\widetilde{2\theta}}$ and $\ket{\widetilde{-2\theta}}$ are not necessarily orthogonal, to bound the probability of observing an input that maps to 1, we upper bound the overlap between $\ket{\widetilde{2\theta}}$ and $\ket{\widetilde{-2\theta}}$.
From our analysis of the phase estimation procedure, we know that the probability that the value $\widetilde{2\theta}$ we observe is more than $\delta$ away from $2\theta$ is $O(\frac{1}{\delta 2^n})$. Using this, we note that
\begin{align*}
\Pr [\widetilde{2\theta} \notin (0,4\theta)] &= O\left(\frac{1}{\theta 2^n}\right)\\
\Pr [\widetilde{-2\theta} \notin (-4\theta,0)] &= O\left(\frac{1}{\theta 2^n}\right)
\end{align*}
If $0 < \theta < \pi/4$, then $(0,4\theta)$ and $(-4\theta,0)$ do not overlap. From the above equations, the total probability mass of base states constituting $\ket{\widetilde{2\theta}}$ that lie outside $(0,\pi)$ is at most $O(\frac{1}{\theta 2^n})$. Similarly, the total probability mass of base states constituting $\ket{\widetilde{-2\theta}}$ that lie outside $(-\pi,0)$ is at most $O(\frac{1}{\theta 2^n})$. Hence $\Pr[\text{success}] \geq \frac{1}{2} - O(\frac{1}{\theta 2^n})$. So our algorithm is the following:
\begin{enumerate}
\item For $n = 1,2,\dots $
\begin{enumerate}
\item Run eigenvalue estimation circuit on $\ket{+}^{\otimes n}\ket{+}^{\otimes m}$, with the Grover iterate operator.
\item If $f$(observed $y$) = 1, then halt.
\end{enumerate}
\end{enumerate}
Once $2^n$ exceeds $\frac{c}{\theta}$ for a suitable constant $c$, we have a good probability of success, say $1/3$. If we denote by $i^*$ the trial number after which the probability of success exceeds $1/3$, then the number of oracle calls up to trial $i^*$, as we saw in Lecture 6, is $O(\sqrt{\frac{2^m}{t}})$. The expected number of oracle calls after trial $i^*$ is given by
\begin{align*}
\sum_{i > i^*}2^i\left(\frac{2}{3}\right)^{i-i^*}
\end{align*}
i.e., trial number $i > i^*$ happens only when we fail in all the previous trials; in particular, we fail in all the trials after $i^*$, each of which happens with probability at most $2/3$. Hence the above expression. This series does not converge. The trick we adopted in Lecture 6 was to change our doubling to ``$\lambda$-ing'' for some appropriate $\lambda$ that ensures convergence of the series. But we cannot do that here, because the only parameter we can choose is $n$, and increasing $n$ by 1 automatically doubles the number of iterations. So instead of going from $n$ to $n+1$ immediately after failing at $n$, we try each $n$ twice before going to the next $n$. Now, the trial number after which the probability of success exceeds $1/3$ is $2i^*$. Up to trial $2i^*$, the number of oracle calls is $O(\sqrt{\frac{2^m}{t}})$. The expected number of oracle calls after trial number $2i^*$ is as follows:
\begin{align*}
\sum_{i > 2i^*}2^{\lceil\frac{i}{2}\rceil}\left(\frac{2}{3}\right)^{i-2i^*}.
\end{align*}
This is, up to constant factors, a geometric series with ratio $\frac{2\sqrt{2}}{3} < 1$, and thus it converges. The leading term $2^{i^*+1}\cdot\frac{2}{3}$ is $O(\sqrt{\frac{2^m}{t}})$, so the total expected number of oracle calls remains $O(\sqrt{\frac{2^m}{t}})$.
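As a quick numerical check (with an arbitrary choice of $i^*$, not from the lecture), the partial sums of this series stabilize, and every other term shrinks by exactly $(2\sqrt{2}/3)^2 = 8/9$:

```python
import math

# Illustration with an arbitrary i*: the series
# sum_{i > 2 i*} 2^{ceil(i/2)} (2/3)^{i - 2 i*} converges, since pairing
# consecutive terms gives a plain geometric series with ratio 8/9.
i_star = 5

def term(i):
    return 2**math.ceil(i/2) * (2/3)**(i - 2*i_star)

partial = sum(term(i) for i in range(2*i_star + 1, 2*i_star + 401))

# Every other term shrinks by exactly 2 * (2/3)^2 = 8/9 = (2*sqrt(2)/3)^2.
two_step_ratio = term(2*i_star + 3) / term(2*i_star + 1)
print(two_step_ratio, partial)
```

The partial sum is a constant multiple of the leading term $2^{i^*+1}\cdot\frac{2}{3}$, consistent with the $O(\sqrt{2^m/t})$ bound.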
Thus, our final algorithm is as follows:
\begin{enumerate}
\item For $n = 1,2,\dots $\\
Repeat twice:
\begin{enumerate}
\item Run eigenvalue estimation circuit on $\ket{+}^{\otimes n}\ket{+}^{\otimes m}$, with the Grover iterate operator.
\item If $f$(observed $y$) = 1, then halt.
\end{enumerate}
\end{enumerate}
Note that this algorithm is the same as our algorithm in Lecture 6, except that
\begin{itemize}
\item We did not have the top portion, namely the part corresponding to the first $n$ qubits, in our old circuit. The top portion of the new circuit does two things: it serves as control for the Grover iterates, and it applies an inverse Fourier transform to the first $n$ qubits.
\item In the earlier algorithm, for any given trial, we did not pick the number of iterations of Grover iterate to be $2^n$ exactly. Instead, we picked a number uniformly at random from 1 to $2^n$ to be the number of times Grover iterate must be applied.
\end{itemize}
Now, in our new circuit, since we do not use the first $n$ answer bits, we need not apply the inverse Fourier transform. Removing the inverse Fourier transform does not affect the probability of observing $y$ in the last $m$ qubits, for any given base state $y$: the probability of observing $y$ is the sum of the squared magnitudes of the amplitudes of the states $\ket{x}\ket{y}$, where $\ket{x}$ ranges over $n$-qubit base states, and this quantity is not altered by applying or removing a unitary, such as the inverse Fourier transform, on the first $n$ qubits. So, the only remaining difference from the algorithm in Lecture 6 is the control part. But note that using control wires in a uniform superposition over all states from $0$ to $2^n - 1$ for the Grover iterates is equivalent to picking the number of applications of the Grover iterate uniformly at random between $0$ and $2^n-1$. So, both algorithms are the same.
\section{Quantum Fourier transform over $\zM$ when $M$ is not a power of two}
We saw in an earlier lecture how to effect the quantum Fourier transform over $\zM$ when $M$ is an exact power of 2. Today we discuss the case of general $M$. When $M$ is not a power of 2, the first thing we do is to embed $\{0,1,2,\dots,M-1\}$ into an $n$-qubit system with base states in $\{0,1\}^n$; for example, choose $n = \lceil \log M\rceil$. When $M$ is not an exact power of two, this means that we do not use some of the base states in $\{0,1\}^n$, namely those corresponding to $x \geq M$. Our goal is to effect the following transformation $F_M$:
\begin{align*}
F_M: \ket{x} \longrightarrow \begin{cases} \ket{\widehat{x}} = \frac{1}{\sqrt{M}} \sum_{y=0}^{M-1}e^{\frac{2\pi ixy}{M}}\ket{y} &\text{ for $0 \leq x < M$ }\\
\ket{x} &\text{ otherwise}\end{cases}
\end{align*}
We do not care about the action of $F_M$ for $x \geq M$.
It is worth pausing here to reflect that the idea of tensor products that we used for QFT when $M$ is an exact power of 2, does not work here. We wrote the Fourier transform of a state $ \ket{x}$ as the tensor product $\otimes_{j=1}^n \ket{y_j}$, where
\[ \ket{y_j}=\frac{\ket{0}+e^{2\pi i(\cdotp x_{n-j+1}\ldots x_n)}\ket{1}}{\sqrt{2}}.\] Note that we needed $2^n$ base states to use the tensor product idea.
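For contrast, the tensor-product factorization for $M = 2^n$ can be verified directly. The following sketch (an illustration, not from the lecture) compares the definition of $\ket{\widehat{x}}$ with the product of the single-qubit states above:

```python
import numpy as np

# Illustration: for M = 2^n, the Fourier transform of |x> factors into a
# tensor product of single-qubit states, the j-th using the binary
# fraction 0.x_{n-j+1}...x_n.
n = 3
M = 2**n
x = 5                                   # any basis state 0 <= x < M

# Direct definition: hat{x}[y] = e^{2 pi i x y / M} / sqrt(M).
direct = np.exp(2j*np.pi*x*np.arange(M)/M) / np.sqrt(M)

bits = [(x >> (n - 1 - k)) & 1 for k in range(n)]   # x = x_1 x_2 ... x_n
prod = np.array([1.0 + 0j])
for j in range(1, n + 1):
    # frac = 0.x_{n-j+1} ... x_n as a binary fraction
    frac = sum(bits[n - j + i - 1] * 2**(-i) for i in range(1, j + 1))
    qubit = np.array([1.0, np.exp(2j*np.pi*frac)]) / np.sqrt(2)
    prod = np.kron(prod, qubit)

print(np.allclose(direct, prod))        # True
```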
To realize the transformation $F_M$, we first achieve a transformation from $\ket{x}\ket{0}\ket{0}$ to $\ket{x}\ket{\widehat{x}}\ket{0}$ as follows:
\begin{align*}
&\ket{x}\ket{0}\ket{0} \longrightarrow \ket{x}\left(\frac{1}{\sqrt{M}}\sum_{y=0}^{M-1}\ket{y}\right)\ket{0}
\longrightarrow \ket{x}\left(\frac{1}{\sqrt{M}}\sum_{y=0}^{M-1}\ket{y}\ket{xy}\right)
\longrightarrow \\
&\ket{x}\left(\frac{1}{\sqrt{M}}\sum_{y=0}^{M-1}e^{\frac{2\pi ixy}{M}}\ket{y}\ket{xy}\right)
\longrightarrow \ket{x}\left(\frac{1}{\sqrt{M}}\sum_{y=0}^{M-1}e^{\frac{2\pi ixy}{M}}\ket{y}\right)\ket{0} = \ket{x}\ket{\widehat{x}}\ket{0}
\end{align*}
Note that operation one just produces a uniform superposition of base states from 0 to $M-1$. Operation two is a multiplication into a fresh register, a deterministic operation that can be carried out reversibly. Operation three is a controlled rotation. Operation four simply undoes the reversible simulation of the multiplication.
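The four operations can be traced through on a state vector classically. The sketch below (an illustration with arbitrary small $M$ and $x$; register three is sized just large enough to hold the product $xy$) tracks registers two and three, the first register staying $\ket{x}$ throughout:

```python
import numpy as np

# Illustration: trace the four operations for a fixed classical x.
M, x = 6, 4
R = M * M                                        # room for the product x*y

state = np.zeros((M, R), dtype=complex)          # amplitudes indexed by (y, z)
state[0, 0] = 1.0                                # |0>|0>

# Operation 1: uniform superposition over y in register two.
new = np.zeros_like(state); new[:, 0] = 1/np.sqrt(M); state = new
# Operation 2: reversibly compute the product x*y into register three.
new = np.zeros_like(state)
for y in range(M):
    new[y, x*y] = state[y, 0]
state = new
# Operation 3: controlled rotation by e^{2 pi i z / M} on register three's value z.
for z in range(R):
    state[:, z] *= np.exp(2j*np.pi*z/M)
# Operation 4: uncompute the product, returning register three to |0>.
new = np.zeros_like(state)
for y in range(M):
    new[y, 0] = state[y, x*y]
state = new

xhat = np.exp(2j*np.pi*x*np.arange(M)/M) / np.sqrt(M)
print(np.allclose(state[:, 0], xhat))            # True
```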
Thus we have effected a transformation from $\ket{x}\ket{0}\ket{0}$ to $\ket{x}\ket{\widehat{x}}\ket{0}$, which is the same as having effected the transformation from $\ket{x}\ket{0}$ to $\ket{x}\ket{\widehat{x}}$. But this is not enough for us, as the $\ket{x}$ in the first register will meddle with the interference pattern that must be exhibited by $\ket{\widehat{x}}$. We need our final state to be $\ket{\widehat{x}}\ket{0}$. Note that it is enough to get the final state $\ket{0}\ket{\widehat{x}}$, as we can then swap the two registers. We now show how to transform $\ket{x}\ket{\widehat{x}}$ to $\ket{0}\ket{\widehat{x}}$.
Actually, we show a unitary transformation from $\ket{0}\ket{\widehat{x}}$ to $\ket{x}\ket{\widehat{x}}$; this is enough, as unitary transformations are reversible. To achieve this, we first note that $\ket{\widehat{x}}$ is an eigenvector of the following operator $U.$
\begin{align*}
U: \ket{x} \longrightarrow \begin{cases} \ket{x-1 \bmod M} &\text{ for $0 \leq x < M$}\\
\ket{x} &\text{ otherwise}\end{cases}
\end{align*}
To see that $\ket{\widehat{x}}$ is an eigenvector of $U$, note that
\begin{align*}
U\ket{\widehat{x}} &= \frac{1}{\sqrt{M}} \sum_{y=0}^{M-1} e^{\frac{2\pi ixy}{M}}\ket{y-1 \bmod M}\\
&= \frac{1}{\sqrt{M}} \sum_{y=0}^{M-1} e^{\frac{2\pi ix(y+1)}{M}}\ket{y}\\
&= e^{\frac{2\pi ix}{M}}\ket{\widehat{x}}.
\end{align*}
Thus, $\ket{\widehat{x}}$ is an eigenvector of $U$ with eigenvalue $e^{\frac{2\pi ix}{M}}$. In the notation we used for eigenvalue estimation, $\omega = \frac{x}{M}$. So, we run eigenvalue estimation on $\ket{\widehat{x}}$ with $U$ as the operator. This gives us $\ket{\widetilde{x}}\ket{\widehat{x}}$. Recall that the notation $\ket{\widetilde{x}}$ denotes a state concentrated around $\ket{x}$. If $\ket{\widetilde{x}}$ were exactly $\ket{x}$, we would have achieved the required transformation; but we will not get $\ket{x}$ exactly. For $k$ bits of accuracy, we need to apply our operator $U$ about $2^k$ times. Fortunately, for our operator $U$, iterates are easy to compute, because we have
\begin{align*}
U^r: \ket{x} \longrightarrow \begin{cases} \ket{x-r \bmod M} &\text{ for $0 \leq x < M$}\\
\ket{x} &\text{ otherwise}\end{cases}
\end{align*}
So we can afford high values of $k$. But is any error tolerated at all? The answer is yes: successive applications of unitary operators never make errors grow, so any further operations that we perform on this approximate state only carry the tiny error forward, without blowing it up.
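The claims about $U$ and its iterates are easy to check numerically. The sketch below (an illustration, not from the lecture) represents $U$ as a cyclic-shift matrix on $\zM$:

```python
import numpy as np

# Illustration: the cyclic shift U|x> = |x-1 mod M> has hat{x} as an
# eigenvector with eigenvalue e^{2 pi i x / M}, and U^r is a shift by r.
M = 7
U = np.zeros((M, M))
for col in range(M):
    U[(col - 1) % M, col] = 1.0        # U maps |col> to |col - 1 mod M>

x = 3
xhat = np.exp(2j*np.pi*x*np.arange(M)/M) / np.sqrt(M)
print(np.allclose(U @ xhat, np.exp(2j*np.pi*x/M) * xhat))   # True

# U^r is itself a single shift by r mod M, so iterates are cheap.
r = 5
shift_r = np.zeros((M, M))
for col in range(M):
    shift_r[(col - r) % M, col] = 1.0
print(np.allclose(np.linalg.matrix_power(U, r), shift_r))   # True
```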
\section{``Solving'' systems of linear equations}\label{11:section:systemOfEquations}
We explain why we wrote ``solving'' in quotes by describing the problem and the solution we develop. Consider a system of linear equations $Ax = b$, where $A$ is an invertible $N\times N$ matrix. Our goal is to effect the following transformation: start with a normalized state $\ket{b}$ on $n$ ($= \log N$) qubits, and transform it into the normalized solution $\ket{x}$. As with the Fourier transform, what we have at the end is a state that contains all the information about the solution to the system, but we do not have access to all that information: after all, all we can do is make a measurement. Nevertheless, the transformation to $\ket{x}$ might be useful in some situations. We could, for example, apply a linear transformation to $\ket{x}$ and then make a measurement.
We also require that the matrix $A$ is Hermitian, i.e., $A^* = A$. Note that $A$ being Hermitian is not a restriction, as we can get around it by solving the following system instead:
\begin{align*}
\left[\begin{array}{cc}
0 & A\\
A^{*} & 0
\end{array}\right]\left[\begin{array}{c}x'\\ x \end{array}\right] = \left[\begin{array}{c}b\\ 0 \end{array}\right].
\end{align*}
The coefficient matrix in the above equation is clearly Hermitian. Here $x'$ is a set of auxiliary variables we do not care about; when $A$ is invertible, the bottom block forces $x' = 0$.
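A quick numerical check of this embedding (with a random $A$, purely for illustration):

```python
import numpy as np

# Illustration: embedding a non-Hermitian A into the Hermitian block matrix
# [[0, A], [A*, 0]] and solving recovers the original solution x.
rng = np.random.default_rng(0)
N = 4
A = rng.standard_normal((N, N)) + 1j*rng.standard_normal((N, N))
b = rng.standard_normal(N) + 1j*rng.standard_normal(N)

H = np.block([[np.zeros((N, N)), A],
              [A.conj().T, np.zeros((N, N))]])

rhs = np.concatenate([b, np.zeros(N)])
sol = np.linalg.solve(H, rhs)           # sol = [x', x]
x = sol[N:]
print(np.allclose(A @ x, b))            # True
```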
The quantum algorithm for this problem runs in time $\widetilde{O}((\log N)s^2\kappa^2\frac{1}{\epsilon})$, where the $\widetilde{O}$ notation is just the $O$ notation with polylogarithmic terms ignored. In this particular case, polylogarithmic terms in $s,\kappa$ and $\epsilon$ have been ignored. We now explain the parameters in the expression for run-time.
\begin{enumerate}
\item $s$ is the sparseness of the matrix $A$, i.e., $A$ contains at most $s$ nonzero elements per row. We require that these non-zero elements can be located efficiently, i.e., in time negligible compared to the run-time mentioned above.
\item We assume $\Omega(1) \leq ||A|| \leq 1$ and $||A^{-1}|| \leq \kappa$. Note that for a Hermitian matrix, the 2-norm is just the largest eigenvalue in absolute value. We require the largest eigenvalue of $A$ to be at most 1 in absolute value; if this is not the case, we rescale the matrix, which requires a good estimate of the largest eigenvalue. $\kappa$ is the standard notation for the condition number in numerical analysis. We use it here for the following reason: the condition number of a matrix is $||A||\cdot ||A^{-1}||$, i.e., the ratio of the largest to the smallest eigenvalue in absolute value. Since $||A|| \leq 1$ and $||A^{-1}|| \leq \kappa$, the condition number is at most $\kappa$. We need the condition number to be small to have an edge over a classical algorithm for solving the system: $\kappa$ could potentially be as large as $N$, in which case we derive no benefit from running a quantum algorithm.
\item $\epsilon$ is the error parameter, i.e., we would like our final distribution to be at most $\epsilon$ away from the exact distribution.
\end{enumerate}
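The norm conditions above can be illustrated numerically. The sketch below (random Hermitian $A$, purely for illustration) rescales $A$ so that $||A|| = 1$ and checks the condition-number bound:

```python
import numpy as np

# Illustration: rescale a Hermitian A so that ||A|| = 1, then check that the
# condition number ||A|| * ||A^{-1}|| is at most kappa = ||A^{-1}||.
rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                               # Hermitian (real symmetric)

A = A / np.max(np.abs(np.linalg.eigvalsh(A)))   # now largest |eigenvalue| = 1
norm_A = np.linalg.norm(A, 2)                   # 2-norm = largest |eigenvalue|
kappa = np.linalg.norm(np.linalg.inv(A), 2)
cond = norm_A * kappa
print(norm_A, cond <= kappa + 1e-9)
```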
The run-time of a naive classical algorithm is $O(\text{poly}(N)\kappa\polylog(\frac{1}{\epsilon}))$. Thus, the dependence on $\kappa$ is better in the classical algorithm than in the quantum algorithm, and the dependence on $\epsilon$ is exponentially better classically. But the run-time of the classical algorithm grows polynomially in $N$, which is exponentially worse than the quantum algorithm's $\log N$ growth.
\subsection{Idea behind the algorithm}
We now discuss the idea behind the algorithm. The first point to observe is that the matrix $A$ is Hermitian, and thus, it has a full basis of eigenvectors, all with real eigenvalues. Thus, any state, in particular our $\ket{b}$, can be written as
\begin{align}
\ket{b} = \sum_{j}\beta_j\ket{\varphi_j}\label{11:eqn:systemRHS}
\end{align}
where the $\ket{\varphi_j}$'s form the eigenbasis of $A$, with $\lambda_j$'s being their eigenvalues. Thus our target state $\ket{x}$ can be written as
\begin{align}
\ket{x} = \sum_j \beta_j\lambda_j^{-1}\ket{\varphi_j}\label{11:eqn:solnSet}.
\end{align}
To see that this is right, note that applying $A$ to both sides yields~\eqref{11:eqn:systemRHS}. Equation~\eqref{11:eqn:solnSet} suggests that the next goal is to find these $\lambda_j$'s. Given $\ket{\varphi_j}$, we need to be able to extract $\lambda_j$. In order to do this, we need an operator $U$ with eigenvector $\ket{\varphi_j}$ and eigenvalue $e^{i\lambda_j} = e^{2\pi i\omega_j}$. (For phase estimation, we require $\omega_j$ to be between 0 and 1; that is why we restricted the largest eigenvalue to be at most 1.) Now, what is the operator with the above described property? It is $U = e^{iA}$, i.e., the matrix $iA$ exponentiated, where $e^X$ is defined through its Taylor expansion:
\begin{align}
e^X = \sum_{k=0}^{\infty} \frac{X^k}{k!}\label{11:eqn:matrixExponentiation}.
\end{align}
We know that the series in~\eqref{11:eqn:matrixExponentiation} converges for every number $X$ in the complex plane. In fact, it converges when $X$ is a matrix as well.
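Both facts, convergence of the truncated series and the inherited eigenvectors, can be checked numerically. The following sketch (an illustration, not from the lecture) truncates the Taylor series and compares against the eigendecomposition of a random Hermitian $A$:

```python
import numpy as np

# Illustration: partial sums of sum_k X^k / k! converge for a matrix X, and
# e^{iA} has the eigenvectors of A with eigenvalues e^{i lambda_j}.
rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                        # Hermitian

X = 1j * A
expX = np.zeros((4, 4), dtype=complex)
term = np.eye(4, dtype=complex)
for k in range(60):                      # 60 terms are plenty at this norm
    expX += term
    term = term @ X / (k + 1)

lam, V = np.linalg.eigh(A)               # columns of V are eigenvectors of A
ok = all(np.allclose(expX @ V[:, j], np.exp(1j*lam[j]) * V[:, j])
         for j in range(4))
print(ok)                                             # True
print(np.allclose(expX @ expX.conj().T, np.eye(4)))   # True: e^{iA} is unitary
```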
\begin{exercise}
Prove the following statements.
\begin{itemize}
\item The series defined in~\eqref{11:eqn:matrixExponentiation} converges for every matrix $X$.
\item If $X$ has an eigenvector $\ket{\varphi_j}$ with eigenvalue $\lambda_j$, then $e^X$ too has $\ket{\varphi_j}$ as an eigenvector, with $e^{\lambda_j}$ as the corresponding eigenvalue.
\item If $X$ and $Y$ commute, i.e., $XY = YX$, then, $e^Xe^Y = e^{X+Y}$.
\item If $A$ is Hermitian, and if $U = e^{iA}$, then $U^* = e^{-iA^*}$ and $UU^* = I$.
\end{itemize}
\end{exercise}
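The spectral picture in~\eqref{11:eqn:solnSet} can also be checked classically. The sketch below (random Hermitian $A$, purely for illustration) divides the eigenbasis coefficients of $b$ by the eigenvalues and recovers the solution:

```python
import numpy as np

# Illustration: decompose b in the eigenbasis of a random Hermitian A, divide
# each coefficient by its eigenvalue, and recover the solution of A x = b.
rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                        # Hermitian, generically invertible
b = rng.standard_normal(4)

lam, V = np.linalg.eigh(A)               # columns of V are the |phi_j>
beta = V.conj().T @ b                    # b = sum_j beta_j |phi_j>
x = V @ (beta / lam)                     # x = sum_j beta_j / lambda_j |phi_j>

print(np.allclose(A @ x, b))             # True
```

The quantum algorithm must of course perform this division by $\lambda_j$ coherently, which is what the next lecture takes up.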
\section{Next time}
In the next class, we will see how to use $e^{iA}$ in effecting the transformation we need for solving the system. We will also discuss another application of eigenvalue estimation, namely order finding.
\end{document}