Regular Paper Issue
A Heuristic Sampling Method for Maintaining the Probability Distribution
Journal of Computer Science and Technology 2021, 36(4): 896-909
Published: 05 July 2021
Abstract

Sampling is a fundamental method for generating data subsets. Because many data analysis methods are built on probability distributions, maintaining distributions during sampling helps to ensure good data analysis performance. However, sampling a minimum subset while maintaining probability distributions remains an open problem. In this paper, we decompose a joint probability distribution into a product of conditional probabilities based on Bayesian networks and use the chi-square test to formulate a sampling problem in which the sampled subset must pass the distribution test, ensuring that the distribution is maintained. Furthermore, a heuristic sampling algorithm is proposed to generate the required subset by designing two scoring functions: one based on the chi-square test and the other based on likelihood functions. Experiments on four types of datasets with a size of 60000 show that when the significance level, α, is set to 0.05, the algorithm can exclude 99.9%, 99.0%, 93.1%, and 96.7% of the samples for the Bayesian networks ASIA, ALARM, HEPAR2, and ANDES, respectively. When subsets of the same size are sampled, the subset generated by our algorithm passes all the distribution tests and the average distribution difference is approximately 0.03; by contrast, the subsets generated by random sampling pass only 83.8% of the tests, and the average distribution difference is approximately 0.24.
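
The distribution test mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it checks a single categorical variable rather than the conditional distributions of a Bayesian network, and the function names (`chi_square_stat`, `passes_test`), the toy data, and the hard-coded critical-value table are all assumptions made for illustration. It only shows what it means for a subset to pass a chi-square test at α = 0.05.

```python
from collections import Counter

# Upper 5% critical values of the chi-square distribution,
# keyed by degrees of freedom (standard table values).
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def chi_square_stat(subset, population):
    """Pearson chi-square statistic of subset counts against
    the category proportions observed in the population."""
    pop_counts = Counter(population)
    sub_counts = Counter(subset)
    n_pop, n_sub = len(population), len(subset)
    stat = 0.0
    for value, count in pop_counts.items():
        expected = n_sub * count / n_pop      # expected count in the subset
        observed = sub_counts.get(value, 0)   # actual count in the subset
        stat += (observed - expected) ** 2 / expected
    return stat


def passes_test(subset, population):
    """True if the subset is not rejected at alpha = 0.05
    (hypothetical helper; supports 2 to 4 categories only)."""
    df = len(set(population)) - 1
    return chi_square_stat(subset, population) <= CHI2_CRIT_05[df]


# Toy data (assumed): one variable with three categories.
population = ["a"] * 600 + ["b"] * 300 + ["c"] * 100
good = ["a"] * 60 + ["b"] * 30 + ["c"] * 10   # same proportions as population
bad = ["a"] * 100                             # only one category sampled

print(passes_test(good, population))
print(passes_test(bad, population))
```

In this toy setting the proportional subset is accepted and the single-category subset is rejected. A method like the one described in the abstract would apply such tests to each factor of the decomposed joint distribution rather than to one marginal.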

Open Access Issue
Machine Knowledge and Human Cognition
Big Data Mining and Analytics 2020, 3(4): 292-299
Published: 16 November 2020
Abstract

Intelligent machines are knowledge systems with a unique knowledge structure and function. In this paper, we discuss the characteristics and forms of machine knowledge, the relationship between knowledge and human cognition, and approaches to acquiring machine knowledge. These issues are of great significance to the development of artificial intelligence.
