College of Computer Science & Software Engineering, Shenzhen University, Shenzhen 518000, China.
Kazan Federal University, Kazan 42008, Russia.
Abstract
This is a review of quantum methods for machine learning problems that consists of two parts. The first part, "quantum tools", presented some of the fundamentals and introduced several quantum tools based on known quantum search algorithms. This second part of the review presents several classification problems in machine learning that can be accelerated with quantum subroutines. We have chosen supervised learning tasks as typical classification problems to illustrate the use of quantum methods for classification.
References
F. Rosenblatt, The perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review, vol. 65, no. 6, pp. 386-408, 1958.
D. Anguita, S. Ridella, F. Rivieccio, and R. Zunino, Quantum optimization for training support vector machines, Neural Networks, vol. 16, nos. 5&6, pp. 763-770, 2003.
A. Kapoor, N. Wiebe, and K. Svore, Quantum perceptron models, in Advances in Neural Information Processing Systems, Barcelona, Spain, 2016, pp. 3999-4007.
[8] N. Wiebe, A. Kapoor, and K. M. Svore, Quantum perceptron models, in Neural Information Processing Systems, Barcelona, Spain, 2016, pp. 4006-4014.
[9] M. Schuld, I. Sinayskiy, and F. Petruccione, Quantum computing for pattern classification, in Pacific Rim International Conference on Artificial Intelligence, Springer, 2014, pp. 208-220.
Y. Ruan, X. L. Xue, H. Liu, J. N. Tan, and X. Li, Quantum algorithm for k-nearest neighbors classification based on the metric of Hamming distance, International Journal of Theoretical Physics, vol. 56, no. 11, pp. 3496-3507, 2017.
N. Wiebe, A. Kapoor, and K. M. Svore, Quantum algorithms for nearest-neighbor methods for supervised and unsupervised learning, Quantum Information & Computation, vol. 15, nos. 3&4, pp. 316-356, 2015.
K. Fukunaga and P. M. Narendra, A branch and bound algorithm for computing k-nearest neighbors, IEEE Transactions on Computers, vol. C-24, no. 7, pp. 750-753, 1975.
P. Rebentrost, M. Mohseni, and S. Lloyd, Quantum support vector machine for big data classification, Physical Review Letters, vol. 113, no. 13, p. 130503, 2014.
Ablayev F, Ablayev M, Huang JZ, et al. On Quantum Methods for Machine Learning Problems Part II: Quantum Classification Algorithms. Big Data Mining and Analytics, 2020, 3(1): 56-67. https://doi.org/10.26599/BDMA.2019.9020018
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
3.1 SVM
The SVM is a well-known algorithm for the classification problem[1]. For the classification function $f$, we can write the following:
$$f(x) = \mathrm{sign}(\langle w, x \rangle - b),$$
where $x \in \mathbb{R}^{n}$, the coefficients $w \in \mathbb{R}^{n}$ and $b \in \mathbb{R}$ are parameters of the algorithm, $\langle w, x \rangle$ is the inner product of the vectors, and the hyperplane $\langle w, x \rangle - b = 0$ separates the two classes.
As the plane, we choose the one farthest from the nearest vectors. The vectors closest to the plane are called "support vectors" and should be such that $\langle w, x \rangle - b = \pm 1$ or, equivalently, $y_j(\langle w, x_j \rangle - b) = 1$. To find the parameters of the plane, we should maximize the distance $\frac{2}{\|w\|}$ between the two classes or, equivalently, minimize $\frac{1}{2}\langle w, w \rangle$. The minimization problem is solved under the following conditions:
$$y_j(\langle w, x_j \rangle - b) \geq 1, \quad j \in \{1, \dots, N\},$$
where $(x_1, y_1), \dots, (x_N, y_N)$ is the training set with labels $y_j \in \{-1, +1\}$.
Based on the Kuhn-Tucker theorem, we can replace the problem by the dual formulation of the Lagrange function:
$$L(w, b, \lambda) = \frac{1}{2}\langle w, w \rangle - \sum_{j=1}^{N}\lambda_j\big(y_j(\langle w, x_j \rangle - b) - 1\big),$$
with the following conditions for $\lambda_j$, $j \in \{1, \dots, N\}$:
• $\lambda_j \geq 0$;
• $\lambda_j = 0$, or $y_j(\langle w, x_j \rangle - b) = 1$.
At the same time, we have the following conditions:
$$\frac{\partial L}{\partial w} = 0, \qquad \frac{\partial L}{\partial b} = 0.$$
Therefore, we obtain the next relation between $w$ and $\lambda$:
$$w = \sum_{j=1}^{N}\lambda_j y_j x_j.$$
Finally, we obtain the following optimization problem on the variables $\lambda = (\lambda_1, \dots, \lambda_N)$:
$$F(\lambda) = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\lambda_i\lambda_j y_i y_j \langle x_i, x_j \rangle - \sum_{j=1}^{N}\lambda_j \to \min,$$
with the following conditions for $\lambda_j$, $j \in \{1, \dots, N\}$:
• $\lambda_j \geq 0$;
• $\sum_{j=1}^{N}\lambda_j y_j = 0$.
In the case of classes that are not linearly separable, we can add errors $\xi_j \geq 0$; in that case $y_j(\langle w, x_j \rangle - b) \geq 1 - \xi_j$. The minimization function is replaced by $\frac{1}{2}\langle w, w \rangle + C\sum_{j=1}^{N}\xi_j$, where $C$ is a constant that is a parameter of the algorithm. The Lagrange function in Eq. (4) is replaced by
$$L(w, b, \xi, \lambda, \mu) = \frac{1}{2}\langle w, w \rangle + C\sum_{j=1}^{N}\xi_j - \sum_{j=1}^{N}\lambda_j\big(y_j(\langle w, x_j \rangle - b) - 1 + \xi_j\big) - \sum_{j=1}^{N}\mu_j\xi_j,$$
with $\lambda_j \geq 0$ and $\mu_j \geq 0$. The relation between $w$ and $\lambda$ can be obtained using an equation similar to Eq. (6). At the same time, we obtain the additional equality $\lambda_j + \mu_j = C$. Because $\mu_j \geq 0$, we get $\lambda_j \leq C$. The final minimization problem is
$$F(\lambda) = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\lambda_i\lambda_j y_i y_j \langle x_i, x_j \rangle - \sum_{j=1}^{N}\lambda_j \to \min,$$
with the following conditions for $\lambda_j$, $j \in \{1, \dots, N\}$:
• $0 \leq \lambda_j \leq C$;
• $\sum_{j=1}^{N}\lambda_j y_j = 0$.
If the input data is not linearly separable, then we can replace the inner product $\langle x_i, x_j \rangle$ by another function $K(x_i, x_j)$, which is called the kernel of the SVM algorithm. Typically, researchers choose the following functions as possible kernels: $\langle x_i, x_j \rangle^{q}$ and $(\langle x_i, x_j \rangle + c)^{q}$ for some constant $c$; or $\exp\!\left(-\frac{\|x_i - x_j\|^{2}}{2\sigma^{2}}\right)$ for some constant $\sigma$.
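As an illustration, the following Python sketch evaluates these two standard kernels for a pair of vectors; the constants c, q, and sigma are illustrative parameters, not values prescribed in the text.

```python
import numpy as np

def polynomial_kernel(x1, x2, c=1.0, q=2):
    """Polynomial kernel (<x1, x2> + c)^q."""
    return (np.dot(x1, x2) + c) ** q

def gaussian_kernel(x1, x2, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||x1 - x2||^2 / (2 sigma^2))."""
    return np.exp(-np.linalg.norm(x1 - x2) ** 2 / (2 * sigma ** 2))

x1, x2 = np.array([1.0, 0.0]), np.array([0.5, 0.5])
print(polynomial_kernel(x1, x2), gaussian_kernel(x1, x2))
```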
In the general case, the function $F(\lambda)$ can have several local minima, and gradient-based methods are unhelpful. In that case, the problem is NP-hard. We can apply discretization and then use brute force or similar algorithms. We can also apply Grover's search to this problem and obtain a quadratic speed-up. A description of the algorithm follows.
Suppose that $\varepsilon$ is a given precision for our solution. Then we can split the segment $[0, C]$ into points with interval $\varepsilon$; these points will be the possible solutions. There are $d = \lceil C/\varepsilon \rceil + 1$ points for one $\lambda_j$, so the $i$-th point is $i\varepsilon$, for $i \in \{0, \dots, d - 1\}$. Enumerating all possible solutions with precision $\varepsilon$, the solution number is $t = \sum_{j=1}^{N} i_j d^{\,j-1}$ for $\lambda_j = i_j\varepsilon$ and $t \in \{0, \dots, d^{N} - 1\}$. We can obtain $t$ from $(i_1, \dots, i_N)$ and vice versa.
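A minimal sketch of this enumeration, assuming the mixed-radix encoding described above (the helper names and the sample values are illustrative):

```python
def encode(indices, d):
    """Map a tuple (i_1, ..., i_N), with 0 <= i_j < d, to a single index t."""
    t = 0
    for j, i in enumerate(indices):
        t += i * d ** j
    return t

def decode(t, d, N):
    """Map an index t back to the tuple (i_1, ..., i_N)."""
    indices = []
    for _ in range(N):
        indices.append(t % d)
        t //= d
    return indices

d, N, eps = 5, 3, 0.25          # d points per variable, N variables, precision eps
t = encode([2, 0, 4], d)
assert decode(t, d, N) == [2, 0, 4]
lambdas = [i * eps for i in decode(t, d, N)]   # lambda_j = i_j * eps
```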
With the function defined as $g(t) = -F(\lambda(t))$, the problem is a maximum search for the function $g$. The quantum version of the algorithm is presented as Algorithm 1 (QSVM represents Quantum Support Vector Machine).
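The following sketch shows, classically, the evaluation that the search oracle would perform for a decoded candidate $\lambda(t)$: check the constraints and compute the dual objective from a precomputed Gram matrix. It illustrates the bookkeeping only and is not the quantum subroutine itself; all names are illustrative.

```python
import numpy as np

def neg_dual_objective(lam, C, y, gram):
    """Return g = -F(lambda) if the constraints hold, otherwise -infinity.

    lam  : candidate vector (lambda_1, ..., lambda_N) decoded from the index t
    gram : precomputed matrix with gram[i, j] = <x_i, x_j> (or K(x_i, x_j))
    """
    lam = np.asarray(lam, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(lam < 0) or np.any(lam > C) or abs(np.dot(lam, y)) > 1e-9:
        return -np.inf                     # constraint violated
    F = 0.5 * lam @ (np.outer(y, y) * gram) @ lam - lam.sum()
    return -F

# A brute-force maximum over all d^N candidates corresponds to the classical algorithm;
# the quantum algorithm replaces that loop by the maximum search subroutine.
```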
The complexity of the quantum algorithm is presented in Property 1 and can be compared with the complexities given in the subsequent properties.
Property 1. The expected quantum query complexity of the quantum version of the learning phase for the SVM algorithm is $O\!\left(N^{2}n + N^{2}\sqrt{d^{N}}\right)$.
Proof. $O(N^{2}n)$ is required for precomputing all of the inner products. The maximum search algorithm then computes the function $g$. The complexity of computing $g(t)$ is $O(N^{2})$ and the size of the search space is $d^{N}$. Hence, the complexity of the maximum search is $O\!\left(N^{2}\sqrt{d^{N}}\right)$ and the total complexity is $O\!\left(N^{2}n + N^{2}\sqrt{d^{N}}\right)$.
For comparison, the classical algorithm has the following complexity.
Property 2. The query complexity of the learning phase for the classical SVM algorithm is $O\!\left(N^{2}n + N^{2}d^{N}\right)$.
Note that a different approach to the solution, given in Ref. [14], achieves an exponential speed-up.
3.2 Perceptron
The perceptron is a simple Artificial Neural Network (ANN). It can also be considered as a building block for ANNs[2] and as a modification of the SVM algorithm[3]. Wiebe et al.[7, 8] suggested a quantum algorithm for training a classical perceptron that shows a quadratic speed-up. The algorithm is based on Grover's search algorithm[6].
In the classical model, the function $f$ can be written as follows:
$$f(x) = \mathrm{sign}(\langle w, x \rangle - b),$$
where $x$ is the input, and $w$ and $b$ are parameters of the algorithm; $\langle w, x \rangle$ is the inner product of the vectors, the hyperplane $\langle w, x \rangle - b = 0$ separates the two classes, and $w$ and $b$ should be chosen such that $f(x_j) = y_j$ for every training pair $(x_j, y_j)$. There is an assumption that all training vectors are separated by a margin of $\gamma$ in the feature space.
The idea of the classical algorithm is to invoke random sampling of $w$ and check whether the hyperplane classifies the training set correctly. The classical algorithm requires $O(N)$ queries to search for the existence of a pair $(x_j, y_j)$ that is misclassified; that is, $f(x_j) \neq y_j$. For some constant $c$, $\frac{c}{\gamma}\log\frac{1}{\varepsilon}$ samples need to be checked to get a correct one with error probability $\varepsilon$.
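A classical sketch of this sampling idea, under the stated assumptions (labels in {-1, +1}); the Gaussian sampling of w and the trial budget are illustrative choices, not the exact scheme of Refs. [7, 8].

```python
import numpy as np

def misclassified_index(w, X, y):
    """Return the index of a training pair that w misclassifies, or None."""
    preds = np.sign(X @ w)
    bad = np.nonzero(preds != y)[0]
    return None if bad.size == 0 else int(bad[0])   # classical search costs O(N)

def sample_perceptron(X, y, trials=200, seed=0):
    """Sample candidate hyperplanes until one classifies the whole set correctly."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        w = rng.normal(size=X.shape[1])
        if misclassified_index(w, X, y) is None:
            return w
    return None   # no perfect separator found within the trial budget
```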
Property 3. The query complexity of sampling algorithm A for the perceptron is $O\!\left(\frac{N}{\gamma}\log\frac{1}{\varepsilon}\right)$, where the error probability $\varepsilon$ is a constant and the training vectors are separated by a margin of $\gamma$ in the feature space.
Quantum algorithm for perceptron training. The quantum algorithm operates with a quantum register of $\lceil \log_2 N \rceil$ qubits that indexes the training vectors. As a function for the oracle, we use $g(j)$, where $g$ is a Boolean-valued function that is $1$ if and only if the vector $x_j$ is misclassified. We use the amplitude amplification version of Grover's search algorithm to find misclassified vectors. The query complexity of the procedure is $O(\sqrt{N})$. As in the classical case, we invoke random sampling of $w$ for $O\!\left(\frac{1}{\gamma}\log\frac{1}{\varepsilon}\right)$ times. The algorithm is presented as Algorithm 2.
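The Grover subroutine only needs a Boolean predicate over training indices; a classical emulation of that predicate (illustrative names, not the quantum oracle itself) looks as follows.

```python
import numpy as np

def make_oracle(w, X, y):
    """Return g with g(j) = 1 iff training vector x_j is misclassified by the hyperplane w."""
    def g(j):
        return int(np.sign(X[j] @ w) != y[j])
    return g

# Grover's search / amplitude amplification queries g on a superposition of all indices j
# and finds a marked (misclassified) index with O(sqrt(N)) oracle calls, whereas a
# classical scan needs O(N) calls in the worst case.
```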
As was proven in Ref. [11], if the number of repetitions is chosen as above for a given $\varepsilon$, then the algorithm returns the result with error probability at most $\varepsilon$. The complexity of the algorithm is presented in the following property.
Property 4. The query complexity of Algorithm 2 is $O\!\left(\frac{\sqrt{N}}{\gamma}\log\frac{1}{\varepsilon}\right)$, where the error probability $\varepsilon$ is a constant and the training vectors are separated by a margin of $\gamma$ in the feature space.
4.1 Training set superposition construction subroutine
A utility procedure will be used in the quantum algorithms of this section. The procedure is an algorithm in the quantum branching program model, and it is required to store a superposition of training vectors in a single quantum state of $n$ qubits; that is, it needs to obtain the following quantum state:
$$|T\rangle = \frac{1}{\sqrt{N}}\sum_{j=1}^{N}|x^{j}\rangle,$$
where $x^{1}, \dots, x^{N}$ are the binary training vectors of length $n$.
Trugenberger[15] introduced an algorithm for generating this state unitarily from a simple initial state of $2n + 2$ qubits.
Property 5. If $n$ is the input vector size and $N$ is the number of input vectors in a training set, then there is an algorithm A for constructing the quantum state $|T\rangle$. The algorithm has the following complexity properties:
• the time complexity is $O(Nn)$;
• the number of qubits is $2n + 2$;
• each bit of the input is read twice.
Let us construct the algorithm A as the proof of the property.
We use three registers. The first is a register $p$ of $n$ qubits for storing input vectors $x^{j}$; the second register $u$ is a utility register of two qubits; and the third is a register $m$ of $n$ qubits that is holding memory. The full initial state is
$$|\psi_{0}\rangle = |\underbrace{0\dots0}_{n};\, 01;\, \underbrace{0\dots0}_{n}\rangle.$$
This state is split into two terms: one corresponding to the already stored input vectors and another ready to process a new input vector from the training set. These two parts are distinguished by the state of the second utility qubit $u_{2}$. The state $|0\rangle$ is for the stored input vectors, and $|1\rangle$ is readiness for processing.
Before processing the vector $x^{j}$, the algorithm reads it and stores it in the register $p$. This can be done using the CNOT operator: the controlling bit is an input bit $x^{j}_{i}$ and the target bit is the $i$-th qubit of $p$, where $i \in \{1, \dots, n\}$. If $p_i = 0$ before the procedure, then $p_i = x^{j}_{i}$ after the procedure. If $p_i = x^{j}_{i}$ before the procedure, then $p_i = 0$ after the procedure. Therefore, we can use the same procedure for erasing the register. That is why we consider an input with the duplication of each vector $x^{j}$; that is, the input is $x^{1}, x^{1}, x^{2}, x^{2}, \dots, x^{N}, x^{N}$. For the next part of the procedure, we assume that $p = x^{j}$.
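A small classical sketch of this load-and-erase trick: XOR-ing the register with the same bit string twice restores it, which is why each training vector appears twice in the input sequence (the names are illustrative).

```python
def xor_into(register, bits):
    """CNOT-style update: flip register[i] wherever bits[i] == 1."""
    return [r ^ b for r, b in zip(register, bits)]

p = [0, 0, 0, 0]
x = [1, 0, 1, 1]
p = xor_into(p, x)     # load:  p == x
assert p == x
p = xor_into(p, x)     # erase: p is back to all zeros
assert p == [0, 0, 0, 0]
```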
For each input vector $x^{j}$ to be stored, the operations described below are performed. The first one is
$$\prod_{i=1}^{n}\mathrm{CCNOT}_{p_{i}, u_{2}, m_{i}},$$
where $p_{i}$ and $u_{2}$ are the control qubits, and $m_{i}$ is the target qubit.
This operation simply copies the input vector into the memory register of the processing term, identified by $u_{2} = 1$.
The next operation works as follows. If the contents of the input vector and the memory register are identical, then all of the qubits of the memory register are set to $|1\rangle$ (this will be exactly the case only for the processing term):
$$\prod_{i=1}^{n}\mathrm{NOT}_{m_{i}}\,\mathrm{CNOT}_{p_{i}, m_{i}},$$
where in the CNOT operator, $p_{i}$ is the control qubit and $m_{i}$ is the target qubit.
The next operation changes the first utility qubit of the processing term to a $|1\rangle$, while keeping it unchanged for the stored input vectors term:
$$\mathrm{C}^{n}\mathrm{NOT}_{m_{1},\dots,m_{n},\,u_{1}},$$
where $m_{1}, \dots, m_{n}$ are control qubits and $u_{1}$ is the target qubit.
This is followed by the controlled normalization operator:
$$\mathrm{C}S^{N+1-j}_{u_{1}, u_{2}}, \qquad S^{t} = \begin{pmatrix} \sqrt{\frac{t-1}{t}} & \frac{1}{\sqrt{t}} \\ -\frac{1}{\sqrt{t}} & \sqrt{\frac{t-1}{t}} \end{pmatrix},$$
for $t = N + 1 - j$. This operation separates out the new input vector to be stored with the already correct normalization factor.
The utility qubit $u_{1}$ is then restored to its original value:
$$\mathrm{C}^{n}\mathrm{NOT}_{m_{1},\dots,m_{n},\,u_{1}},$$
where $m_{1}, \dots, m_{n}$ are control qubits and $u_{1}$ is the target qubit.
The next operation restores the memory register to its original state:
$$\prod_{i=1}^{n}\mathrm{CNOT}_{p_{i}, m_{i}}\,\mathrm{NOT}_{m_{i}},$$
where in the CNOT operator, $p_{i}$ is the control qubit and $m_{i}$ is the target qubit.
When these operations are complete, the state is
$$\frac{1}{\sqrt{N}}\sum_{k=1}^{j}|x^{j};\,00;\,x^{k}\rangle + \sqrt{\frac{N-j}{N}}\,|x^{j};\,01;\,x^{j}\rangle.$$
The third register of the processing term (the second term in the equation above) is then restored to the initial value $|0\dots0\rangle$:
$$\prod_{i=1}^{n}\mathrm{CCNOT}_{p_{i}, u_{2}, m_{i}},$$
where $p_{i}$ and $u_{2}$ are control qubits, and $m_{i}$ is the target qubit.
At this point, a new input vector can be loaded into register $p$ and the sequence of operations reiterated. At the end of the whole process, the $m$-register is exactly in the state $|T\rangle$.
The time complexity of the algorithm is $O(Nn)$.
The algorithm is presented as Algorithm 4.1, where $X$ is a training set, $N$ is the length of the training set, and $n$ is the length of each input vector in the training set. The algorithm constructs a superposition $|T\rangle$ of the input vectors of the training set $X$.
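For intuition, a classical sketch of the target state: the amplitude vector over all $2^n$ basis states with weight $1/\sqrt{N}$ on each (distinct) training bit string. This is a plain numpy construction, not the unitary storage procedure itself.

```python
import numpy as np

def training_superposition(training_set, n):
    """Amplitude vector of |T> = (1/sqrt(N)) * sum_j |x^j> for distinct binary vectors x^j."""
    N = len(training_set)
    state = np.zeros(2 ** n)
    for x in training_set:                       # x is a tuple of n bits
        index = int("".join(map(str, x)), 2)     # basis-state index of |x>
        state[index] = 1.0 / np.sqrt(N)
    return state

state = training_superposition([(0, 1, 1), (1, 0, 1), (1, 1, 0)], n=3)
print(np.nonzero(state)[0], np.sum(state ** 2))   # three basis states, total probability 1
```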
In the next two subsections, we describe two quantum $k$-NN algorithms. The first is from Ref. [9] and the second is from Ref. [10]. Both algorithms can be split into two stages: the preliminary stage and the main stage.
The similarity of these two algorithms lies in the preliminary stage that prepares a superposition of the training data using Algorithm 4.1. Their differences are as follows:
•They have different mechanisms for computing the distance between vectors;
•The first algorithm changes the superposition of the training data in the main stage, while the second algorithm does not change it until the measurement.
Since the preliminary stage of both algorithms has the time complexity $O(Nn)$, these two algorithms have the same time complexity as their classical counterparts. If a more efficient way to construct the superposition, for example in time polylogarithmic in $N$, or to receive the superposition from a quantum memory device, were to be found, the performance of these quantum algorithms would be independent of the number of training vectors. It is impossible to achieve such an effect with a classical algorithm.
Additionally, if a method of storing the training vectors used a smaller number of qubits, without affecting the proportions between the features of the vectors, then it would be possible to reduce the memory size for these algorithms.
4.2 Schuld-Sinayskiy-Petruccione (SSP) algorithm
The algorithm, presented in Ref. [9], is a quantum version of the classical weighted nearest neighbor algorithm. The idea of the quantum version of the algorithm is as follows. First, we prepare a superposition of all vectors from the training set as one quantum state, using the procedure from Section 4.1. Second, we compute the Hamming distance between each training vector and the test vector, and store it in the amplitude of each training vector in the superposition. Having done that, if we measure the class qubit, we will get the appropriate class with high probability. If we repeat the algorithm enough times, then we can get the probability distribution, which carries information about the average closeness of members of each class to the test vector.
Only the preliminary stage of the algorithm (the procedure in Section 4.1) reads input variables; the main stage is a quantum circuit that does not read input data.
Property 6. If $n$ is the size of an input vector, $N$ is the number of input vectors in a training set, and $l$ is the number of classes, then the algorithm SSP-kNN for classifying a test vector has the following complexity properties:
• the time complexity of the preliminary stage is $O(Nn)$ and that of the main stage is $O(n)$;
• the number of qubits is $2n + \lceil\log_{2} l\rceil + 1$.
The SSP algorithm can be described as follows.
The first step of the algorithm is to prepare a superposition of all of the training vectors in one quantum state, using the training set superposition construction algorithm from Section 4.1.
The algorithm for constructing the superposition of vectors of the training set has $O(Nn)$ time complexity. The test vector is represented as $y = (y_{1}, \dots, y_{n})$.
The second step is to prepare the following initial state:
$$|\psi_{0}\rangle = \frac{1}{\sqrt{N}}\sum_{j=1}^{N}|y;\, x^{j}, c^{j};\, 0\rangle.$$
This state is made up of three registers: the first contains the test vector ($n$ qubits), the second contains the superposition of the training vectors ($n + \lceil\log_{2} l\rceil$ qubits), and the third contains a utility qubit. The part of the second register that stores the class $c^{j}$ and contains $\lceil\log_{2} l\rceil$ qubits will be called the "class register". Initially, we assign $|0\rangle$ to the last utility qubit.
The third step of the algorithm is to apply the Hadamard gate to the utility qubit:
$$|\psi_{1}\rangle = \frac{1}{\sqrt{2N}}\sum_{j=1}^{N}|y;\, x^{j}, c^{j}\rangle\big(|0\rangle + |1\rangle\big).$$
The fourth step is to prepare a state for obtaining the Hamming distance. We apply the CNOT gate with the first register as the control register and the second register as the target register, and then apply the NOT gate to reverse the states in the second register:
$$|\psi_{2}\rangle = \frac{1}{\sqrt{2N}}\sum_{j=1}^{N}|y;\, \bar{d}^{\,j}, c^{j}\rangle\big(|0\rangle + |1\rangle\big),$$
where $\bar{d}^{\,j}_{i} = \mathrm{NOT}(y_{i}\oplus x^{j}_{i})$; that is, $\bar{d}^{\,j}_{i} = 1$ if and only if $y_{i} = x^{j}_{i}$.
The fifth step is to apply the unitary operator $U$, where
$$U = \exp\!\left(\frac{\mathrm{i}\pi}{2n}H\right), \qquad H = \sum_{k=1}^{n}\frac{\mathbb{1} + \sigma_{z}^{(k)}}{2}\otimes\sigma_{z}^{(u)},$$
in which $\sigma_{z}$ is a Pauli-Z matrix, $\sigma_{z}^{(k)}$ acts on the $k$-th qubit of the second register, and $\sigma_{z}^{(u)}$ acts on the utility qubit. The operator assigns the number of zeros in the second register as a phase in the case of a $|0\rangle$ value of the utility qubit. In the case of a $|1\rangle$ value of the utility qubit, the operator assigns the same value, but as a negative phase. Note that the number of zeros is the number of unequal bits for $y$ and $x^{j}$; that is, the Hamming distance between these vectors. Denoting this distance as $d_{H}(y, x^{j})$, the state after applying $U$ is
$$|\psi_{3}\rangle = \frac{1}{\sqrt{2N}}\sum_{j=1}^{N}\left(\mathrm{e}^{\mathrm{i}\frac{\pi}{2n}d_{H}(y, x^{j})}|y;\, \bar{d}^{\,j}, c^{j}\rangle|0\rangle + \mathrm{e}^{-\mathrm{i}\frac{\pi}{2n}d_{H}(y, x^{j})}|y;\, \bar{d}^{\,j}, c^{j}\rangle|1\rangle\right).$$
The sixth step is to apply a Hadamard gate on the utility qubit, and write the phase information of the $j$-th state into amplitudes:
$$|\psi_{4}\rangle = \frac{1}{\sqrt{N}}\sum_{j=1}^{N}\left(\cos\!\left(\frac{\pi}{2n}d_{H}(y, x^{j})\right)|y;\, \bar{d}^{\,j}, c^{j}\rangle|0\rangle + \sin\!\left(\frac{\pi}{2n}d_{H}(y, x^{j})\right)|y;\, \bar{d}^{\,j}, c^{j}\rangle|1\rangle\right).$$
The seventh step is to measure the utility qubit. If the test vector is close to the training vectors, then the probability of a $|0\rangle$ result is high; otherwise, the probability of a $|1\rangle$ result is high. The probability of getting $|0\rangle$ is
$$P(0) = \frac{1}{N}\sum_{j=1}^{N}\cos^{2}\!\left(\frac{\pi}{2n}d_{H}(y, x^{j})\right).$$
In other words, if there are many training vectors for which the Hamming distance between them and the test vector is small, then $P(0)$ is high.
If we obtain a $|0\rangle$ result, then the next step is to measure the "class register". We have a higher probability of getting a class for classes that have training vectors with smaller Hamming distances from the test vector. After obtaining a $|0\rangle$ result from the utility qubit measurement, we have the following state:
$$|\psi_{5}\rangle = \frac{1}{\sqrt{N P(0)}}\sum_{j=1}^{N}\cos\!\left(\frac{\pi}{2n}d_{H}(y, x^{j})\right)|y;\, \bar{d}^{\,j}, c^{j}\rangle|0\rangle.$$
The probability of obtaining a $c$-result from the "class register" measurement is
$$P(c \mid 0) = \frac{1}{N P(0)}\sum_{j:\, c^{j} = c}\cos^{2}\!\left(\frac{\pi}{2n}d_{H}(y, x^{j})\right).$$
The class has a higher probability where more training vectors of this class have a small Hamming distance from the test vector.
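The measurement statistics described above can be simulated classically from the Hamming distances alone; a small sketch, assuming binary vectors and arbitrary class labels (all names are illustrative):

```python
import numpy as np

def ssp_probabilities(train_X, train_c, y):
    """Return P(utility qubit = 0) and P(class | 0) from the cos^2 amplitudes."""
    n, N = len(y), len(train_X)
    weights = np.array([
        np.cos(np.pi * sum(xi != yi for xi, yi in zip(x, y)) / (2 * n)) ** 2
        for x in train_X
    ])                                                      # cos^2(pi * d_H / (2n)) per vector
    p0 = weights.sum() / N                                  # probability of the 0-result
    p_class = {c: weights[[ci == c for ci in train_c]].sum() / (N * p0)
               for c in sorted(set(train_c))}
    return p0, p_class

p0, p_class = ssp_probabilities([(0, 0, 1), (1, 1, 1), (1, 0, 1)], ["a", "b", "a"], (0, 0, 1))
print(p0, p_class)
```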
Since for small angles $\alpha$ we have $\cos^{2}(\alpha)\approx 1 - \alpha^{2}$ and $\sin^{2}(\alpha)\approx\alpha^{2}$, the following statement is true.
(1) If the Hamming distance to the test vector is small for $k$ vectors among the $N$ vectors of the training set, then the approximation above yields a double inequality that bounds the probability $P(0)$ from both sides.
(2) If $k_{c}$ vectors among the above $k$ vectors belong to class $c$, then the same approximation yields a double inequality for the probability $P(c \mid 0)$, conditioned on obtaining the $|0\rangle$ result.
For computing the time complexity of the algorithm: there are $n$ NOT gates and $n$ CNOT gates in the fourth step; the fifth step requires $O(n)$ gates for the unitary operator implementation due to Ref. [16]; and there are two Hadamard gates, in Steps 3 and 6. The total time complexity of the main stage of the algorithm is therefore $O(n)$.
The algorithm is presented as Algorithm 4.2 and is labelled SSP-kNN, where $X$ is a training set, $N$ is the length of the training set, $n$ is the length of each vector in the training set, and $y$ is the test vector. If we obtain a $|1\rangle$-result from the utility qubit measurement, then we are unable to obtain a class, and the algorithm returns a special value that represents "unknown".
This algorithm is run as many times as necessary to obtain a precise picture from the measurement results.
5 Quantum Nearest Neighbors Algorithm with Quadratic Speed-up
In this section, we present a classification method that is developed for non-binary classification problems. Wiebe et al.[11] suggested a quantum version of the NN algorithm with a quadratic speed-up. This quantum algorithm uses the minimum search algorithm[5] and subroutines for computing distances between vectors[11].
In the classical algorithm, the function can be written as the $k$-NN rule for $k = 1$:
$$f(y) = c^{j^{*}}, \qquad j^{*} = \arg\min_{j\in\{1,\dots,N\}}\|y - x^{j}\|.$$
The classical algorithm is a linear search among $N$ vectors for the minimum of $\|y - x^{j}\|$. The procedure for computing the distance between two vectors has complexity $O(n)$, so the algorithm has the following complexity.
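For comparison with the quantum version, a classical sketch of this linear search (the Euclidean distance call stands in for the distance subroutine; names are illustrative):

```python
import numpy as np

def nearest_neighbor_classify(train_X, train_c, y):
    """1-NN: scan all N training vectors, O(n) per distance, O(Nn) in total."""
    best_j = min(range(len(train_X)),
                 key=lambda j: np.linalg.norm(np.asarray(y) - np.asarray(train_X[j])))
    return train_c[best_j]

label = nearest_neighbor_classify([[0.0, 1.0], [1.0, 0.0]], ["a", "b"], [0.1, 0.9])
print(label)   # "a"
```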
Property 8. The classical query complexity of the $1$-NN algorithm is $O(Nn)$.
Quantum version of the nearest neighbor algorithm. First, the minimum search algorithm[5] is used; second, distance computing algorithms are used for computing the distance between the test vector $y$ and the training vectors $x^{j}$. Both of these methods were described in the first part of the paper.
The assumptions are as follows:
(1) The input vectors $x^{j}$ and $y$ are $d$-sparse for some $d \leq n$. In other words, each vector has only $d$ non-zero elements.
(2) Quantum oracles are provided in the form
$$\mathcal{O}:\ |j\rangle|i\rangle|0\rangle \mapsto |j\rangle|i\rangle|x^{j}_{i}\rangle, \qquad \mathcal{F}:\ |j\rangle|l\rangle \mapsto |j\rangle|f(j, l)\rangle,$$
where $f(j, l)$ is the position of the $l$-th non-zero element of $x^{j}$.
(3) There is an $r_{\max}$ such that $|x^{j}_{i}| \leq r_{\max}$ for any $i$ and $j$.
(4) Each vector is normalized to 1, for convenience.
It is known that for any two vectors $u$ and $v$, the following equality holds:
$$\|u - v\|^{2} = \|u\|^{2} + \|v\|^{2} - 2\langle u, v\rangle,$$
where $\langle u, v\rangle$ is the inner product of $u$ and $v$. Therefore, since each vector is normalized to 1, we have $\|u - v\|^{2} = 2 - 2\langle u, v\rangle$, and the distance can be obtained from an estimate of the inner product.
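A quick numerical check of this identity, and of its normalized form, for sparse vectors; the (index, value) representation is only an illustration of sparsity, not the oracle interface itself.

```python
import numpy as np

def to_dense(sparse, n):
    """Build a dense length-n vector from (index, value) pairs."""
    v = np.zeros(n)
    for i, val in sparse:
        v[i] = val
    return v

n = 8
u = to_dense([(0, 0.6), (3, 0.8)], n)           # 2-sparse, unit norm
v = to_dense([(3, 1.0)], n)                     # 1-sparse, unit norm
lhs = np.linalg.norm(u - v) ** 2
rhs = np.linalg.norm(u) ** 2 + np.linalg.norm(v) ** 2 - 2 * np.dot(u, v)
print(np.isclose(lhs, rhs), np.isclose(lhs, 2 - 2 * np.dot(u, v)))   # True True
```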
The quantum algorithm is presented as Algorithm 6.
The algorithm has the complexity given in the following property.
Property 9. The quantum query complexity of the quantum version of the nearest neighbor algorithm is $\tilde{O}(\sqrt{N})$, up to factors polynomial in $d$, $r_{\max}$, and the inverse of the required precision.
Comments. As already discussed, the algorithm works for $d$-sparse vectors, where $d \leq n$. This means that in the general case, we realize the speed-up if $d$ is significantly smaller than $n$. This setup is reasonable for a range of problems, as problems with $d$-sparse vectors in training and testing sets are not unusual.
Another restriction of the algorithm is the bound $r_{\max}$ on the absolute values of the elements. Typically, a preprocessing procedure can be performed that normalizes the data and changes $r_{\max}$. The complexity of this procedure is $O(Nn)$, but it will be performed only once, so we do not take the complexity of the preprocessing procedure into account in the case of a large test set.
6 Conclusion
This paper surveys quantum algorithms for binary classification and discusses quantum nearest neighbor algorithms, analyzing them and revealing their quadratic speed-up over classical algorithms.