Brain Stroke Prediction Using ANN Model

Abstract

Stroke is a serious medical condition that occurs when the brain does not receive enough blood, causing brain cells to die. If a stroke is not diagnosed correctly, it can result in brain injury, paralysis, or even death. Because stroke symptoms often appear suddenly and can be triggered by unpredictable conditions, predicting and preventing stroke is challenging. Therefore, studying the interrelationships among risk factors in patients’ health records and understanding their impact on stroke prediction is crucial. In this paper, we address this problem by proposing an Artificial Neural Network (ANN) model that achieved 100% accuracy in predicting stroke on a publicly available Kaggle dataset.

Introduction

The different parts of the body and their functions form the foundation of human life. Stroke is a hazardous condition that claims human lives and frequently occurs after the age of 65. Just as heart attacks affect how the heart works, strokes affect the brain. Strokes result either from a restriction of the blood supply to the brain or from the rupture and bleeding of a blood vessel in the brain. When a vessel ruptures or is blocked, blood and oxygen cannot reach the brain’s tissues. Stroke currently ranks as the fifth leading cause of death in both industrialized and developing nations [1].

There has been limited research on predicting brain strokes. This paper aims to demonstrate how machine learning algorithms, boosting techniques, and artificial neural networks (ANN) can predict when a brain stroke will occur. The primary contribution of this research is the application of various algorithms to a publicly available dataset from the Kaggle website, in order to compare them and identify the best approach for predicting the onset of stroke. In this study, we use neural networks as classifiers to predict the presence of stroke based on various associated characteristics.

Related Work

Numerous researchers have previously used machine learning to forecast strokes. Govindarajan et al. [2] classified stroke disorders in 507 individuals using text mining and machine learning classifiers. They tested a variety of machine learning methods, including an Artificial Neural Network (ANN), and found that the stochastic gradient descent (SGD) algorithm gave the highest accuracy, 95 percent.

Amini et al. [3] conducted research to predict stroke occurrence. They classified 50 risk factors, including stroke, diabetes, cardiovascular disease, smoking, hyperlipidemia, and alcohol consumption, in 807 healthy and unhealthy individuals, and identified the C4.5 decision tree algorithm (95 percent accuracy) and the K-nearest neighbor algorithm (94 percent accuracy) as the most accurate methods.

Cheng et al. [4] conducted a study to estimate the prognosis of ischemic stroke. Using data from 82 ischemic stroke patients, they developed two ANN models, which achieved accuracies of 79 and 95 percent.

Data Collection

The dataset used in our project was collected from Kaggle. It contains 665 records and 11 features: we used 10 of them as input features and the stroke feature as the output (target) feature [5].
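Separating the 10 input features from the stroke target and holding out 20% of the 665 records for testing (the 80/20 split described in the Result section) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the records and their "age" and "hypertension" columns are hypothetical, and split_features_target and train_test_split are illustrative helper names (scikit-learn offers a comparable model_selection.train_test_split with a test_size parameter).

```python
import random


def split_features_target(records, target="stroke"):
    """Separate each record (a dict) into its input features and its target label."""
    X = [[value for key, value in record.items() if key != target] for record in records]
    y = [record[target] for record in records]
    return X, y


def train_test_split(X, y, test_fraction=0.2, seed=42):
    """Shuffle the row indices once, then hold out test_fraction of them for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(X) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)


# Hypothetical records: two made-up input columns plus the "stroke" target.
records = [
    {"age": 40 + i % 40, "hypertension": i % 2, "stroke": int(i % 7 == 0)}
    for i in range(665)
]
X, y = split_features_target(records)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(len(X_train), len(X_test))  # 532 133
```

Shuffling a single index list and applying it to both X and y keeps each feature row paired with its own label.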

Methodology

Preprocessing: Data preprocessing is the process of preparing raw data and making it suitable for a machine learning model. It is the first and a crucial step in creating a machine learning model, because a machine learning project does not always start with clean, well-formatted data.
i) Label Encoding: Label encoding converts categorical labels into numeric form so that they can be fed into a machine learning model. It is an important data preprocessing step for supervised learning techniques. In this method, each value in a categorical column is replaced with a number from 0 to N-1, where N is the number of distinct values. LabelEncoder is a utility class that normalizes labels so that they contain only values between 0 and n_classes-1.
ii) One-hot Encoding: In this technique, the integer-encoded variable is removed and a new binary variable is added for each unique integer value. These binary variables are often called “dummy variables” in other fields, such as statistics. In one-hot encoding, the label-encoded data is thus split into n columns, where n is the total number of unique labels generated during label encoding.
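iii) Min-max scaling: MinMaxScaler rescales the dataset so that all feature values lie in the range [0, 1], using x' = (x - min) / (max - min) for each feature. The scaling is done independently for each feature. Because it depends on each feature's minimum and maximum, MinMaxScaler is sensitive to outliers: a few extreme values can compress all inliers into a narrow range.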
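The three preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the smoking-status values are hypothetical, label_encode, one_hot and min_max_scale are illustrative helper names, and in practice scikit-learn's LabelEncoder, OneHotEncoder and MinMaxScaler would typically be used instead.

```python
def label_encode(values):
    """Map each distinct category to an integer in 0..N-1, in sorted order."""
    classes = sorted(set(values))
    index = {label: i for i, label in enumerate(classes)}
    return [index[v] for v in values], classes


def one_hot(codes, n_classes):
    """Turn integer codes into n_classes binary 'dummy' columns."""
    return [[1 if j == code else 0 for j in range(n_classes)] for code in codes]


def min_max_scale(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to 0.0 (assumed convention)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical categorical column.
smoking = ["never smoked", "smokes", "formerly smoked", "never smoked"]
codes, classes = label_encode(smoking)
print(classes)                   # ['formerly smoked', 'never smoked', 'smokes']
print(codes)                     # [1, 2, 0, 1]
print(one_hot(codes, len(classes)))
print(min_max_scale([40.0, 60.0, 80.0]))  # [0.0, 0.5, 1.0]
```

When scaling, the minimum and maximum are normally computed on the training split only and then reused for the test split, so that no information from the test data leaks into training.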

Classification

ANN: Neural networks are loosely inspired by how the human brain learns. An Artificial Neural Network consists of neurons organized into layers. Each neuron has weights and a bias, which are the parameters tuned during training. The output of each layer is passed on to the next layer, and each layer applies a nonlinear activation function to its output, which allows the network to learn nonlinear relationships between the input features and the output.
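A minimal plain-Python sketch of the forward pass described above: each neuron computes a weighted sum of its inputs plus a bias and applies a nonlinear activation, and each layer's output feeds the next layer. The layer sizes, the weights, and the use of ReLU in the hidden layer are hypothetical, since the paper does not report its architecture; only the single sigmoid output neuron comes from the Analysis section.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: each neuron computes activation(w . x + b)."""
    return [
        activation(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
        for neuron_w, b in zip(weights, biases)
    ]


def forward(x, params):
    """Pass the input through each layer in turn and return the last layer's output."""
    out = x
    for weights, biases, activation in params:
        out = dense(out, weights, biases, activation)
    return out


# Hypothetical network: 3 inputs -> 2 hidden ReLU neurons -> 1 sigmoid output neuron.
params = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1], relu),
    ([[1.0, -1.0]], [0.0], sigmoid),
]
print(forward([1.0, 0.5, 0.2], params))  # a single probability between 0 and 1
```

Training consists of adjusting the numbers in params so that the output probability matches the labels; the forward computation itself stays the same.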

Evaluation Metrics

Accuracy: Accuracy is a metric that describes how the model performs across all classes. It is useful when all classes are of equal importance. Accuracy is calculated by dividing the number of correct predictions by the total number of predictions:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$

True Positives (TP): These are the cases where the model correctly predicts positive values, meaning both the actual class value and the predicted class value are “yes.”
True Negatives (TN): These are the cases where the model correctly predicts negative values, meaning both the actual class value and the predicted class value are “no.”
False Positives (FP): These occur when the actual class is negative, but the model incorrectly predicts it as positive.
False Negatives (FN): These occur when the actual class is positive, but the model incorrectly predicts it as negative.
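The definitions and the accuracy formula above can be checked with a short plain-Python sketch. The label lists are hypothetical, with 1 standing for stroke (positive) and 0 for no stroke (negative); confusion_counts and accuracy are illustrative helper names.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted yes, actually yes
            else:
                fp += 1  # predicted yes, actually no
        else:
            if actual == positive:
                fn += 1  # predicted no, actually yes
            else:
                tn += 1  # predicted no, actually no
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical labels and predictions.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)            # 2 4 1 1
print(accuracy(tp, tn, fp, fn))  # 0.75
```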

Precision: Precision is the ratio of the number of positive samples correctly classified as positive to the total number of samples classified as positive. It measures the model’s accuracy when it classifies a sample as positive.
Precision = TP / (TP + FP)
Recall: Recall is the ratio of the number of positive samples correctly classified as positive to the total number of positive samples. It measures the model’s ability to detect positive samples: the higher the recall, the more positive samples are detected.
Recall = TP / (TP + FN)
F1-score: The F1 score is the harmonic mean of precision and recall, so it takes both false positives and false negatives into account.
F1-score = 2 * (Precision * Recall) / (Precision + Recall)
Confusion Matrix: A confusion matrix is a performance measurement for machine learning classification problems in which the output can be two or more classes. For binary classification, it is a 2x2 table containing the four combinations of predicted and actual values (TP, TN, FP, and FN).
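Precision, recall, F1-score and the binary confusion matrix can be computed from the four counts as sketched below. The counts are hypothetical, the helper names are illustrative, and returning 0.0 when a denominator is zero is an assumed convention (scikit-learn's metric functions expose a zero_division parameter for this case). The matrix layout, with actual classes as rows and predicted classes as columns, matches the layout scikit-learn uses for labels 0 and 1.

```python
def precision(tp, fp):
    """TP / (TP + FP); 0.0 when nothing was predicted positive (assumed convention)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """TP / (TP + FN); 0.0 when there are no actual positives (assumed convention)."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def confusion_matrix(tp, tn, fp, fn):
    """2x2 table: rows are the actual class (0, 1), columns the predicted class (0, 1)."""
    return [[tn, fp],
            [fn, tp]]


# Hypothetical counts for a test set of 100 samples.
tp, tn, fp, fn = 3, 90, 2, 5
p, r = precision(tp, fp), recall(tp, fn)
print(p, r)                              # 0.6 0.375
print(f1(p, r))                          # about 0.4615 (= 6/13)
print(confusion_matrix(tp, tn, fp, fn))  # [[90, 2], [5, 3]]
```

With these hypothetical counts, accuracy would be (3 + 90) / 100 = 0.93 even though recall is only 0.375, which illustrates why precision, recall and F1-score are useful alongside accuracy.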

Result

Because our dataset contained text data, we had to preprocess it and convert the text into numeric data. After preprocessing, we split the data into training and test sets, using 80% of the data for training and 20% for testing. We then trained an ANN with a single neuron in the output layer. The accuracy and loss curves are shown in the figure below:

After calculating the accuracy and loss of the model, we obtained an accuracy of 100%. The confusion matrix and the precision, recall, and F1-score of the ANN classifier are shown below:

Similarly, we tried an output layer with 2 neurons using the softmax activation function. To do this, we used the same dataset and preprocessed it again, this time splitting the output column with one-hot encoding into one column for output 0 and another for output 1. After splitting the data into training and test sets and fitting the model, we obtained the following accuracy and loss curves:

Analysis

When the proposed method for recognizing stroke patients is compared with previous studies in this field, its novelty becomes apparent. Our model achieved 100 percent accuracy using a single neuron with a sigmoid activation function in the output layer, which is higher than most previous work in this field.
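The two output-layer designs compared in this paper, a single sigmoid neuron and two softmax neurons with one-hot targets, can be contrasted with the plain-Python sketch below. The raw output scores (logits) and the 0.5 decision threshold are assumptions for illustration; the paper does not state its threshold.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    m = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_single_neuron(z, threshold=0.5):
    """One sigmoid output neuron: P(stroke) = sigmoid(z), then apply the threshold."""
    return 1 if sigmoid(z) >= threshold else 0


def predict_two_neurons(logits):
    """Two softmax output neurons: pick the column with the larger probability."""
    probs = softmax(logits)
    return max(range(len(probs)), key=lambda i: probs[i])


def one_hot_target(label):
    """Target for the two-neuron variant: 0 -> [1, 0], 1 -> [0, 1]."""
    return [1, 0] if label == 0 else [0, 1]


print(predict_single_neuron(1.2), predict_two_neurons([0.2, 1.4]))  # 1 1
print(one_hot_target(0), one_hot_target(1))                         # [1, 0] [0, 1]
```

For two classes, softmax([a, b])[1] equals sigmoid(b - a), so both designs can express the same decision rule; they differ mainly in how the target is encoded and in the loss used during training.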

Discussion

As we progressed from data collection through preprocessing and feature selection to model construction and application, we applied the proposed algorithm and the steps described above to obtain the results. We trained the model on the dataset for 100 epochs. With too few epochs, the model is not trained well, so it must be trained sufficiently to reach better accuracy than the other algorithms. With 100 epochs, we obtained an accuracy of 100 percent, meaning that the model classified every sample it was evaluated on correctly, which is a great boost to the health care industry.
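To make the notion of an epoch concrete, the sketch below trains a single sigmoid neuron for 100 epochs, where one epoch is one full pass over the training data. It assumes plain full-batch gradient descent on binary cross-entropy with a hypothetical learning rate of 0.5 and hypothetical one-feature toy data; the paper does not specify its optimizer, loss, or learning rate, so this is not the authors' training setup.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_single_neuron(xs, ys, epochs=100, lr=0.5):
    """Full-batch gradient descent on binary cross-entropy for one sigmoid neuron.

    One epoch is one pass over every training example. Returns the weights,
    the bias, and the mean loss of each epoch (measured before that epoch's update).
    """
    n, n_features = len(xs), len(xs[0])
    w, b = [0.0] * n_features, 0.0
    history = []
    for _ in range(epochs):
        grad_w, grad_b, loss = [0.0] * n_features, 0.0, 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            p_safe = min(max(p, 1e-12), 1 - 1e-12)  # keeps log() finite
            loss -= y * math.log(p_safe) + (1 - y) * math.log(1 - p_safe)
            err = p - y  # derivative of the loss w.r.t. the neuron's pre-activation
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
        history.append(loss / n)
    return w, b, history


# Hypothetical one-feature toy data with 0/1 labels.
xs = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
ys = [0, 0, 0, 1, 1, 1]
w, b, history = train_single_neuron(xs, ys, epochs=100)
print(round(history[0], 4), round(history[-1], 4))  # the loss falls over the 100 epochs
```

Because this toy data is perfectly separable, the training loss keeps falling as epochs are added; in practice the number of epochs is usually chosen by also monitoring loss on held-out data, since training loss alone does not show how well the model generalizes.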

Conclusion

The proposed stroke prediction decision support system supports and guides clinicians in making the best, most accurate, and quickest judgments possible while also minimizing total treatment costs. By predicting strokes at an early stage, the proposed strategy reduces treatment costs and improves quality of life. By employing Artificial Neural Networks, we achieved an accuracy of 100 percent on a particular dataset, a strong result that can help reduce deaths from stroke by allowing necessary help to be provided at an early stage.