Department of Computer Science and Technology and Key Lab of Intelligent Computing and Signal Processing, Anhui University, Hefei 230601, China
Abstract
Mining from ambiguous data is very important in data mining. This paper discusses one of the tasks of mining from ambiguous data, known as the multi-instance problem. In the multi-instance problem, each pattern is a labeled bag that consists of a number of unlabeled instances. A bag is negative if all instances in it are negative, and positive if it contains at least one positive instance. Because the instances in a positive bag are not individually labeled, each positive bag is ambiguous. The mining aim is to classify unseen bags. The main idea of existing multi-instance algorithms is to find the true positive instances in positive bags, convert the multi-instance problem into a supervised problem, and obtain the labels of test bags by predicting the labels of their instances. In this paper, we mine multi-instance data from another point of view, i.e., by excluding the false positive instances in positive bags and predicting the label of an entire unknown bag. We propose an algorithm called Multi-Instance Covering kNN (MICkNN) for mining multi-instance data. Briefly, the constructive covering algorithm is first utilized to reorganize the structure of the original multi-instance data. Then, the kNN algorithm is applied to discriminate the false positive instances. In the test stage, we label a test bag directly according to the similarity between the unseen bag and the sphere neighbors obtained in the first two steps. Experimental results demonstrate that the proposed algorithm is competitive with most state-of-the-art multi-instance methods in both classification accuracy and running time.
References
[1] T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez, Solving the multiple instance problem with axis-parallel rectangles, Artificial Intelligence, vol. 89, no. 1, pp. 31-71, 1997.
B. B. Ni, Z. Song, and S. C. Yan, Web image mining towards universal age estimator, in Proceedings of the 17th ACM International Conference on Multimedia, ACM, 2009, pp. 85-94.
A. Zafra, C. Romero, S. Ventura, and E. Herrera-Viedma, Multi-instance genetic programming for web index recommendation, Expert Systems with Applications, vol. 36, no. 9, pp. 11470-11479, 2009.
S. Andrews, I. Tsochantaridis, and T. Hofmann, Support vector machines for multiple-instance learning, Advances in Neural Information Processing Systems, vol. 15, pp. 561-568, 2002.
Z. H. Zhou, Y. Y. Sun, and Y. F. Li, Multi-instance learning by treating instances as non-I.I.D. samples, in Proceedings of the 26th Annual International Conference on Machine Learning, ACM, 2009, pp. 1249-1256.
Z. Jorgensen, Y. Zhou, and M. Inge, A multiple instance learning strategy for combating good word attacks on spam filters, The Journal of Machine Learning Research, vol. 9, no. 6, pp. 1115-1146, 2008.
J. B. Bi, Y. X. Chen, and J. Z. Wang, A sparse support vector machine approach to region-based image categorization, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, vol. 1, pp. 1121-1128.
Z. Y. Fu, A. Robles-Kelly, and J. Zhou, MILIS: Multiple instance learning with instance selection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 5, pp. 958-977, 2011.
S. H. Feng and D. Xu, Transductive multi-instance multi-label learning algorithm with application to automatic image annotation, Expert Systems with Applications, vol. 37, no. 1, pp. 661-670, 2010.
D. T. Nguyen, C. D. Nguyen, R. Hargraves, L. A. Kurgan, and K. J. Cios, mi-DS: Multiple-instance learning algorithm, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 43, no. 1, pp. 143-154, 2013.
B. Babenko, M. H. Yang, and S. Belongie, Visual tracking with online multiple instance learning, in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 983-990.
B. Babenko, M. H. Yang, and S. Belongie, Robust object tracking with online multiple instance learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1619-1632, 2011.
C. Leistner, A. Saffari, and H. Bischof, MIForests: Multiple-instance learning with randomized trees, in Proceedings of the 11th European Conference on Computer Vision: Part VI, Springer, 2010, pp. 29-42.
Y. Xie, Y. Y. Qu, C. H. Li, and W. S. Zhang, Online multiple instance gradient feature selection for robust visual tracking, Pattern Recognition Letters, vol. 33, no. 9, pp. 1075-1082, 2012.
M. Bellare, T. Ristenpart, and S. Tessaro, Multi-instance security and its application to password-based cryptography, in Advances in Cryptology - CRYPTO 2012, Springer, 2012, pp. 312-329.
Q. Zhang and S. A. Goldman, EM-DD: An improved multiple-instance learning technique, Advances in Neural Information Processing Systems, vol. 14, pp. 1073-1080, 2001.
Y. J. Han, Q. Tao, and J. Wang, Avoiding false positive in multi-instance learning, Advances in Neural Information Processing Systems, vol. 23, pp. 1-8, 2010.
M. Y. Kim and F. Torre, Gaussian processes multiple instance learning, in Proceedings of the 27th International Conference on Machine Learning, 2010, pp. 535-542.
[23] F. X. Li and C. Sminchisescu, Convex multiple-instance learning by estimating likelihood ratio, Advances in Neural Information Processing Systems, vol. 23, pp. 1360-1368, 2010.
J. T. Kwok and P. M. Cheung, Marginalized multi-instance kernels, in Proceedings of the 20th International Joint Conference on Artificial Intelligence, Morgan Kaufmann Publishers Inc., 2007, pp. 901-906.
[26] D. Zhang, F. Wang, L. Si, and T. Li, MIC: Maximum margin multiple instance clustering, in Proceedings of the 21st International Joint Conference on Artificial Intelligence, Morgan Kaufmann Publishers Inc., 2009, pp. 1339-1344.
[27] C. H. Li, I. Gondra, and L. Liu, An efficient parallel neural network-based multi-instance learning algorithm, The Journal of Supercomputing, vol. 62, no. 2, pp. 724-740, 2012.
[28] H. Y. Wang, Q. Yang, and H. B. Zha, Adaptive p-posterior mixture-model kernels for multiple instance learning, in Proceedings of the 25th International Conference on Machine Learning, ACM, 2008, pp. 1136-1143.
[29] J. Wang and J. D. Zucker, Solving the multiple-instance problem: A lazy learning approach, in Proceedings of the 17th International Conference on Machine Learning, Morgan Kaufmann Publishers Inc., 2000, pp. 1119-1125.
[30] T. Deselaers and V. Ferrari, A conditional random field for multiple-instance learning, in Proceedings of the 27th International Conference on Machine Learning, Morgan Kaufmann Publishers Inc., 2010, pp. 1119-1125.
[31] B. Babenko, N. Verma, P. Dollár, and S. J. Belongie, Multiple instance learning with manifold bags, in Proceedings of the 28th International Conference on Machine Learning, Morgan Kaufmann Publishers Inc., 2011, pp. 81-88.
[32] D. Zhang, Y. Liu, L. Si, J. Zhang, and R. D. Lawrence, Multiple instance learning on structured data, Advances in Neural Information Processing Systems, vol. 24, pp. 145-153, 2011.
[33] L. X. Jiang, Z. H. Cai, D. H. Wang, and H. Zhang, Bayesian citation-KNN with distance weighting, International Journal of Machine Learning and Cybernetics, pp. 1-7, 2013.
[34] O. Maron, Learning from ambiguity, Ph.D. dissertation, Massachusetts Institute of Technology, USA, 1998.
[35] L. Zhang and B. Zhang, A geometrical representation of McCulloch-Pitts neural model and its applications, IEEE Transactions on Neural Networks, vol. 10, no. 4, pp. 925-929, 1999.
[36] W. S. McCulloch and W. Pitts, A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biology, vol. 5, no. 4, pp. 115-133, 1943.
Y. Li, D. M. J. Tax, R. P. W. Duin, and M. Loog, Multiple-instance learning as a classifier combining problem, Pattern Recognition, vol. 46, no. 3, pp. 865-874, 2013.
Fig. 1 The shape of a molecule changes as it rotates internal bonds[1].
2.1 Constructive covering algorithm
Constructive Covering Algorithm (CCA) was proposed by Zhang and Zhang[35] in 1999 as a constructive supervised learning algorithm for the McCulloch-Pitts neural model[36]. The main idea of CCA is to first map all patterns in the data set onto a d-dimensional sphere and then use sphere neighbors (covers) to divide the patterns.
Given a classification data set $X=\{(x_1,y_1),(x_2,y_2),\ldots,(x_N,y_N)\}$, where $x_i\in\mathbb{R}^d$ and $y_i\in\{1,2,\ldots,s\}$ represents the class of pattern $x_i$. Let $N$ and $d$ be the size and dimension of $X$, and let $x_i$ be a pattern in $X$. Let $X^j$ be the subset of $X$ which contains all the patterns of the $j$-th class, $j=1,2,\ldots,s$. The classification model created by CCA is a cover set $C=\{C_1,C_2,\ldots,C_m\}$, where each cover $C_i$ is described by its center $c_i$ and its radius $r_i$.
Since the dimension of the instances in an MI data set is identical, all input vectors can be restricted to a $d$-dimensional hyper-sphere $S^d$; the input vectors form a bounded set $D$ of the $d$-dimensional space.
Define a transformation $T$:
$$T(x)=\left(x,\sqrt{R^2-\|x\|^2}\right) \quad (1)$$
Use $T$ to do a projection $T: D \rightarrow S^d$, where $S^d$ is a $d$-dimensional sphere in the $(d+1)$-dimensional space. $R$ is the radius of the sphere; the value of $R$ must be greater than or equal to the maximum norm $\|x\|$ over all instances. $\sqrt{R^2-\|x\|^2}$ is the additional component of $x$, i.e., the value of the $(d+1)$-th dimension after projection, as shown in Fig. 2.
Fig. 2 Input vectors of instance $x$ and their projections.
After all the instances in $D$ are projected upward onto $S^d$ by the transformation $T$, construct a series of positive covers ("positive sphere neighbors") that only consist of instances from the positive bags and a series of negative covers ("negative sphere neighbors") that only consist of instances from the negative bags. This procedure is as follows.
To generate sphere neighbors that only contain instances belonging to the same class, first select an instance $x_i$ randomly. Let $A$ be the set of instances that have the same label as $x_i$; the set of instances that have the opposite label from $x_i$ is then denoted by $B$.
Then we find the nearest instance to $x_i$ in $B$ and the furthest instance to $x_i$ in $A$ that is still closer to $x_i$ than any instance in $B$; $d_1$ and $d_2$ measure the closeness of $x_i$ to these two instances, respectively. Note that in CCA the second instance must be strictly closer to $x_i$ than the first one.
The above procedure can be expressed as Eqs. (2) and (3):
$$d_1=\max\{\langle x_i,x_j\rangle \mid x_j\in B\} \quad (2)$$
$$d_2=\min\{\langle x_i,x_j\rangle \mid \langle x_i,x_j\rangle>d_1,\ x_j\in A\} \quad (3)$$
where $\langle x_i,x_j\rangle$ denotes the inner product between instances $x_i$ and $x_j$. Note that a bigger inner product indicates a smaller distance on the sphere, which is why the nearest instance in $B$ corresponds to the maximum in Eq. (2) and the furthest qualifying instance in $A$ to the minimum in Eq. (3).
Next, we calculate the radius of a sphere neighbor so that the sphere neighbor contains only instances belonging to the same class (see Fig. 3). Let $\theta$ be the radius of a sphere neighbor, measured by the same inner product. The following formula is utilized to calculate the value of $\theta$:
$$\theta=\frac{d_1+d_2}{2} \quad (4)$$
Fig. 3 A sphere neighbor for classification.
The basic constructive covering algorithm is shown in Algorithm 1.
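As an illustration, the following is a rough Python sketch of this cover construction, assuming a NumPy array X of instances (one per row) and a label vector y; the projection radius R is taken as the largest instance norm, the cover radius follows Eq. (4), and the function names project and build_covers are illustrative rather than the paper's.

```python
import numpy as np

def project(X, R):
    """Project d-dimensional instances onto the sphere S^d of radius R (Eq. (1))."""
    extra = np.sqrt(np.maximum(R ** 2 - np.sum(X ** 2, axis=1), 0.0))
    return np.hstack([X, extra[:, None]])

def build_covers(X, y):
    """Constructive covering: return a list of (center, threshold, label) triples.
    Closeness is measured by the inner product on the sphere (bigger = closer).
    Assumes both classes are present in y."""
    X = np.asarray(X, dtype=float)
    R = np.linalg.norm(X, axis=1).max()
    S = project(X, R)
    uncovered = set(range(len(S)))
    covers = []
    while uncovered:
        i = next(iter(uncovered))                          # pick an uncovered instance
        ip = S @ S[i]                                      # inner products with x_i
        diff = [j for j in range(len(S)) if y[j] != y[i]]
        same = [j for j in range(len(S)) if y[j] == y[i] and j != i]
        d1 = max(ip[j] for j in diff)                      # nearest heterogeneous instance, Eq. (2)
        inside = [ip[j] for j in same if ip[j] > d1]
        d2 = min(inside) if inside else ip[i]              # furthest homogeneous instance inside, Eq. (3)
        theta = (d1 + d2) / 2.0                            # cover radius as inner-product threshold, Eq. (4)
        covers.append((S[i], theta, y[i]))
        # same-class instances whose inner product exceeds theta are now covered
        uncovered -= {j for j in uncovered if y[j] == y[i] and ip[j] >= theta}
    return covers
```

In MICkNN, the labels passed to build_covers are simply the bag labels of the instances, so the resulting covers split into positive and negative sphere neighbors.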
We are inspired by an inherent feature of CCA: its learning result is a series of covers $C=\{C_1,C_2,\ldots,C_m\}$, each of which only contains samples belonging to the same class. It is this feature that makes CCA very suitable for breaking through and reorganizing the structure of MI data.
The k-Nearest Neighbor (kNN) algorithm[37] is one of the most popular learning algorithms in data mining. kNN is well known for its relatively simple implementation and decent results. The main idea of the kNN algorithm is to find a set of $k$ objects in the training data that are close to the test pattern, and to base the assignment of a label on the predominance of a particular class in this neighborhood. kNN is a lazy learning technique based on the distances of, and voting among, the $k$ nearest neighbors. Given a training set and a test pattern $x$, kNN computes the similarity (distance) between $x$ and its $k$ nearest neighbors, and the label of $x$ is assigned by majority vote among these neighbors.
The basic kNN algorithm is shown in Algorithm 2.
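For reference, a minimal sketch of such a kNN classifier (Euclidean distance, majority vote; names are illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """Label x_test by majority vote among its k nearest training patterns."""
    dists = np.linalg.norm(X_train - x_test, axis=1)   # Euclidean distance to every training pattern
    nearest = np.argsort(dists)[:k]                    # indices of the k closest patterns
    votes = Counter(y_train[j] for j in nearest)
    return votes.most_common(1)[0][0]                  # the predominant class wins
```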
3.2 Exclude the noise in positive bags
An example of the original distribution of MI data in the two-dimensional space is given in Fig. 4.
Fig. 4 The original distribution of MI bags before preprocessing.
Unfilled shapes represent feature vectors of positive bags and filled shapes represent feature vectors of negative bags. All points of the same shape denote feature vectors of the same bag. For example, the bag drawn as unfilled regular hexagons is a positive bag with four instances. As Fig. 4 depicts, because of the false positive instances in each positive bag, the instances in a positive bag follow no distribution pattern that they would have to abide by. Specifically, it is difficult to find a linear or non-linear dividing line that separates the positive bags from the negative bags precisely. Moreover, the distribution of instances within the same positive bag is random and discrete. From our perspective, it is precisely this irregular structure of the positive bags, i.e., their ambiguity, that leads to the inherent difficulty of the MI problem.
The learning result of CCA is a cover set $C=\{C_1,C_2,\ldots,C_m\}$, and it is this feature that makes CCA very suitable for breaking through and restructuring the structure of MI data. By regarding the covers in this set as the new structure of MI bags, the original structure of the bags is reorganized from irregular to regular, so that the false positive instances can be excluded.
In order to discriminate the noise in the positive bags, we construct a cover set $C^+$ that only contains instances from the positive bags and a cover set $C^-$ that only contains instances from the negative bags. Thus, the irregular, discrete structure of the bags can be broken through by converting the original data set into the cover set $C=C^+\cup C^-$, as depicted in Fig. 5. The new structure is very convenient for excluding the false positive instances in the positive sphere neighbors by using the kNN algorithm. The details are as follows.
Fig. 5 The bag structure of the MI data set after preprocessing. Instances from negative bags are covered by thick lines; instances from positive bags are covered by thin lines.
First, construct the cover set $C=C^+\cup C^-$ by using CCA. After this procedure, kNN is utilized to exclude the false positive sphere neighbors, i.e., the positive covers that mainly consist of false positive instances. In this kNN step, a cover obtained by CCA is considered as a whole. For each cover $C_i\in C^+$, if the majority of its $k$ nearest covers belong to the set $C^-$, we delete it from $C^+$ and add it to $C^-$. The distance between two covers can be calculated by using the Hausdorff distance. Edgar[38] and Wang and Zucker[29] pointed out two kinds of Hausdorff distance, namely the maximal Hausdorff distance (maxHD) and the minimal Hausdorff distance (minHD).
According to the definition, given two sets of instances $A=\{a_1,a_2,\ldots,a_m\}$ and $B=\{b_1,b_2,\ldots,b_n\}$, the maxHD is defined as
$$\mathrm{maxHD}(A,B)=\max\{h(A,B),\,h(B,A)\}$$
where
$$h(A,B)=\max_{a\in A}\min_{b\in B}\|a-b\|$$
The minHD is defined as
$$\mathrm{minHD}(A,B)=\min_{a\in A}\min_{b\in B}\|a-b\|$$
where $\|a-b\|$ is the Euclidean distance between instance $a$ and instance $b$.
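Both definitions translate directly into code; the following sketch assumes each cover or bag is given as a NumPy array with one instance per row.

```python
import numpy as np

def max_hausdorff(A, B):
    """maxHD(A, B) = max{h(A, B), h(B, A)} with h(A, B) = max_a min_b ||a - b||."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # pairwise Euclidean distances
    return max(D.min(axis=1).max(), D.min(axis=0).max())

def min_hausdorff(A, B):
    """minHD(A, B): the smallest Euclidean distance between any a in A and b in B."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return D.min()
```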
The complete description of excluding the noise in positive bags is illustrated in Algorithm 3.
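A minimal sketch of this exclusion step, under the reading above, might look as follows; it ranks covers by minHD (maxHD could be substituted), reuses the min_hausdorff helper from the previous sketch, and its function name and default k are illustrative.

```python
def exclude_false_positive_covers(pos_covers, neg_covers, k=3):
    """pos_covers / neg_covers: lists of NumPy arrays, one array of instances per cover.
    Returns (C_plus, C_minus) after moving false positive covers to the negative side."""
    kept_pos, all_neg = [], list(neg_covers)
    for cover in pos_covers:
        neighbors = [(c, +1) for c in pos_covers if c is not cover] + \
                    [(c, -1) for c in neg_covers]
        # rank the other covers by their minHD distance to this cover
        neighbors.sort(key=lambda pair: min_hausdorff(cover, pair[0]))
        labels = [lab for _, lab in neighbors[:k]]
        if labels.count(-1) > labels.count(+1):
            all_neg.append(cover)      # majority of nearest covers are negative: false positive cover
        else:
            kept_pos.append(cover)
    return kept_pos, all_neg
```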
After preprocessing the MI data in the first stage, the original structure of the MI data set has been converted into the positive cover set $C^+$ and the negative cover set $C^-$. In addition, a fair number of false positive instances in $C^+$ have been excluded.
3.3 Predict the labels of test bags
In this stage, a positive cover in $C^+$ or a negative cover in $C^-$ is treated as the new structure of a bag, and the kNN algorithm at the bag level is utilized to predict the real labels of the unknown bags. Because a large number of noisy instances in the positive bags have been excluded during the procedures above, it is quite convenient to predict the labels of test bags by using the kNN algorithm at the bag level. We compute the similarity between each test bag and its $k$ nearest covers; if there are more negative covers around a test bag, the label of the test bag is negative, and otherwise it is positive. The complete description of the prediction algorithm is illustrated in Algorithm 4.
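A minimal bag-level sketch consistent with this description, again measuring the similarity between a test bag (a NumPy array of its instances) and each cover with minHD:

```python
def predict_bag(test_bag, pos_covers, neg_covers, k=3):
    """Label a test bag by voting among its k nearest covers (bag-level kNN)."""
    scored = [(min_hausdorff(test_bag, c), +1) for c in pos_covers] + \
             [(min_hausdorff(test_bag, c), -1) for c in neg_covers]
    scored.sort(key=lambda pair: pair[0])              # closest covers first
    votes = [label for _, label in scored[:k]]
    return -1 if votes.count(-1) > votes.count(+1) else +1
```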