Abstract
Stroke prediction and prevention are important priorities in healthcare because of the significant morbidity and mortality associated with stroke. In this study, we investigate the use of Generative Adversarial Networks (GANs) to augment a stroke dataset and evaluate the effect on prediction performance. The original dataset contains patient medical records and demographic attributes used to predict stroke occurrence. We trained a GAN on these data and generated synthetic samples to augment the training set. Five machine learning classifiers were trained on both the original and the augmented datasets: decision tree, k-nearest neighbors, random forest, Support Vector Machine (SVM), and logistic regression. Experiments indicate statistically significant improvements in accuracy, F1-score, specificity, and sensitivity with GAN augmentation across all models. The random forest classifier achieved the highest average accuracy, 0.981 on the augmented data versus 0.967 on the original data. These results indicate that GANs are effective for addressing class imbalance and enable more robust stroke prediction from limited real-world data, and they demonstrate the potential of data augmentation and generative models to enhance healthcare Artificial Intelligence (AI) applications.
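For readers less familiar with the four evaluation metrics named above, the following is a minimal sketch of how they can be computed from binary predictions. It is an illustration only, not the evaluation code used in this study: the function name `binary_metrics` and the toy labels are hypothetical, and the label `1` is assumed to denote a stroke case.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, and F1 for binary labels.

    Assumption: label 1 marks the positive class (stroke), label 0 the negative class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def safe_div(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = safe_div(tp, tp + fn)  # recall on the positive class
    specificity = safe_div(tn, tn + fp)  # recall on the negative class
    precision = safe_div(tp, tp + fp)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    accuracy = safe_div(tp + tn, tp + tn + fp + fn)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical toy labels (not data from the study): 3 stroke cases out of 10.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

On an imbalanced dataset, accuracy alone can look high even when few positive cases are detected, which is why sensitivity, specificity, and F1-score are reported alongside it.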