
Background 

Stroke is a disease that affects the arteries leading to and within the brain. It occurs when something blocks the blood supply to part of the brain or when a blood vessel in the brain bursts. In either case, parts of the brain become damaged or die. A stroke can cause lasting brain damage, long-term disability, or even death (refer to https://www.cdc.gov/stroke/about.htm). 

Mindful Healthcare, or “MindCare”, is one of the main medical institutions in Singapore providing treatment for patients suffering from stroke. Like most healthcare providers, one of MindCare’s primary objectives is to improve patients’ quality of life and reduce healthcare costs through more effective diagnoses and treatments. Hence, MindCare has embarked on the use of data mining to predict, based on patients’ demographic profile, lifestyle, and medical conditions, whether they would suffer from a stroke within 3 years of being admitted to MindCare for stroke-related symptoms.

 

 

A dataset containing 1582 patient records (of which 273 patients, or 17.3%, suffered from a stroke within 3 years) was extracted from the database for analysis. The description of the dataset is given in Table 1.
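The class balance quoted above is worth keeping in mind when the models are compared later. As a quick check, the sketch below recomputes the stroke rate from the two counts in the text and derives the accuracy of a trivial rule that predicts "No" for every patient. The variable names are illustrative only.

```python
# Counts taken from the dataset description above.
total_patients = 1582
stroke_patients = 273

stroke_rate = stroke_patients / total_patients

# A rule that always predicts "No" is correct for every non-stroke patient.
baseline_accuracy = (total_patients - stroke_patients) / total_patients

print(f"Stroke rate: {stroke_rate:.1%}")
print(f"Accuracy of always predicting 'No': {baseline_accuracy:.1%}")
```

Because roughly five in six patients did not suffer a stroke, a high overall accuracy on its own says little about how well a model identifies the "Yes" cases.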

 

Table 1. Description of MindCare’s Dataset
Variable Name  Variable Description  Possible Values
PatID  Patient Identification Number  0001 to 1582
Gender  Gender  Female, Male
Age  Age at the point of admission.  18 to 82
MaritalStatus  Marital status of the patient at the point of admission.  Single, Married
WorkStatus  Whether the patient is working at the point of admission.  No, Yes
Smoke  Whether the patient is a smoker at the point of admission.  Formerly Smoked, Never Smoked, Smoke, Unknown
BMI  The body mass index (BMI) of the patient at the point of admission.  17 to 65
AvgGlucose  The average glucose level in the patient’s blood at the point of admission.  55 to 272
Hypertension  Whether the patient suffers from hypertension at the point of admission.  No, Yes
HeartDisease  Whether the patient suffers from heart disease at the point of admission.  No, Yes
Stroke  Whether the patient suffered from a stroke within 3 years of admission.  Yes, No

 

For predictive modelling, three decision trees were constructed using CART, C5.0 and CHAID. The settings for the data partitioning and decision trees are as follows:

 

Data Partition 

Training the model: 80% 

Testing the model: 20% 

No changes to the other settings 

CHAID 

Minimum records in parent branch (%): 5% 

Minimum records in child branch (%): 2% 

No changes to the other settings 

C5.0 

Minimum records per child branch: 25 

No changes to the other settings 

CART 

Minimum records in parent branch (%): 5% 

Minimum records in child branch (%): 2% 

Overfit prevention set (%): 0% 

No changes to the other settings 
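To make the percentage-based settings concrete, the sketch below splits 1582 record IDs 80/20 at random and converts the 5% and 2% minimums into approximate record counts. It assumes the percentages are taken relative to the training partition, and it uses Python's random module with an arbitrary seed. It illustrates the idea only and does not reproduce the modelling tool's partition node, so the partition sizes in Appendix A may differ.

```python
import random

TOTAL_RECORDS = 1582
TRAIN_FRACTION = 0.80

# Hypothetical patient IDs 1..1582, shuffled with an arbitrary seed.
patient_ids = list(range(1, TOTAL_RECORDS + 1))
rng = random.Random(42)
rng.shuffle(patient_ids)

n_train = round(TOTAL_RECORDS * TRAIN_FRACTION)
train_ids = patient_ids[:n_train]
test_ids = patient_ids[n_train:]

# Assumption: percentage minimums are relative to the training partition.
min_parent = 0.05 * n_train
min_child = 0.02 * n_train

print(f"Training records: {len(train_ids)}, testing records: {len(test_ids)}")
print(f"5% parent minimum: about {min_parent:.1f} records")
print(f"2% child minimum: about {min_child:.1f} records")
```

Under that assumption, the 2% child minimum used for CHAID and CART works out to about 25 records, which is comparable to the absolute minimum of 25 records per child branch set for C5.0.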

Answer the following questions based on the data mining stream and outputs provided in Appendix A:

1(a) Calculate the overall accuracy rate, and the accuracy rates and hit rates both for patients who suffered from stroke and for those who did not.

 

1(b) Which model would you select for deployment? Discuss all the information that you have considered in making your decision. 

 

1(c) Based on the selected decision tree in (b), discuss the rules that are associated with patients who suffered from stroke. 

 

1(d) Based on the rules discussed in (c), appraise the deployment of the model by suggesting how these rules can be deployed to reduce the risk of patients suffering from stroke. 

 

1(e) Discuss one potential issue with binning “Age” into 3 bins of “Youth”, “Adult” and “Senior” for modelling. 

 

 

1(f) With reference to the variables listed in Table 1, discuss one (1) possible limitation of the data and provide an example.
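As a small illustration of what the binning mentioned in 1(e) does to the data, the sketch below maps ages to three bins. The cut-offs (below 35 for "Youth", 35 to 59 for "Adult", 60 and above for "Senior") are hypothetical choices made only for this example; the question does not specify them.

```python
def bin_age(age: int) -> str:
    """Map an age to one of three bins using hypothetical cut-offs."""
    if age < 35:
        return "Youth"
    if age < 60:
        return "Adult"
    return "Senior"


# Ages within the 18 to 82 range given in Table 1.
for age in (18, 34, 35, 59, 60, 82):
    print(age, bin_age(age))
```

With these cut-offs, patients aged 34 and 35 land in different bins while patients aged 35 and 59 share one, so differences in age within a bin are no longer visible to the model and the outcome depends on where the boundaries are drawn.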

 

 

(Note: Appendix A is for assessment purposes only and does not comprise medical advice or actual treatments.)

 

 

 

Appendix A

[Image: data mining stream with an Excel source node, Stroke model nodes and an Analysis node; one branch is labelled “For Identification of the Champion Model” and another “For Deployment”.]

 

 

 

[Image: Excel source node for MindCare_StrokePatient.xlsx.]

 

Table 2: Measurement and Role Settings

                               

 

[Image: Analysis node output, “Analysis of [Stroke]”, showing results for the output field Stroke under Individual Models.]

Figure 3: Analysis Node Output and Lift Charts

(“Yes” represents patients who suffered from stroke and “No” patients who did not suffer from stroke)
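The rates asked for in 1(a) can be computed from a coincidence (confusion) matrix of actual against predicted classes. The sketch below assumes that the "accuracy rate" for a class is the share of predictions of that class that are correct, and that the "hit rate" is the share of actual members of that class that are predicted correctly. If the course defines these terms differently, its definitions take precedence. The counts are made up for illustration and are not taken from Appendix A.

```python
def class_rates(matrix: dict, label: str) -> tuple:
    """Return (accuracy rate, hit rate) for one class.

    matrix[actual][predicted] holds record counts.
    """
    correct = matrix[label][label]
    predicted_total = sum(row[label] for row in matrix.values())
    actual_total = sum(matrix[label].values())
    return correct / predicted_total, correct / actual_total


def overall_accuracy(matrix: dict) -> float:
    """Share of all records whose predicted class equals the actual class."""
    correct = sum(matrix[c][c] for c in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


# Hypothetical counts for a 300-record testing partition.
example = {
    "Yes": {"Yes": 30, "No": 20},
    "No": {"Yes": 10, "No": 240},
}

print(f"Overall accuracy: {overall_accuracy(example):.1%}")
for label in ("Yes", "No"):
    accuracy_rate, hit_rate = class_rates(example, label)
    print(f"{label}: accuracy rate {accuracy_rate:.1%}, hit rate {hit_rate:.1%}")
```

In this made-up example the overall accuracy is 90%, yet only 30 of the 50 actual "Yes" patients are identified, which is why the per-class rates are reported alongside the overall figure.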

 

 

[Image: CHAID tree for Stroke. Node 0: No 82.743% (1309), Yes 17.257% (273), Total 100.000% (1582). First split on Age (Adj. P-value=0.000, Chi-square=321.029, df=4), with intervals including (45.000, 57.000]; the remaining nodes are truncated.]

Figure 4: Visualisation of CHAID (For Deployment)

 

 

 

 

 

[Image: C5.0 tree for Stroke. Node 0: No 82.743% (1309), Yes 17.257% (273), Total 100.000% (1582). First split on HeartDisease; one child node shows No 39.815% (43) and Yes 60.185% (65), the other shows No 85.889% (1266); the remaining nodes are truncated.]

 

Figure 5: Visualisation of C5.0 (For Deployment)
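One way to read a node in these trees is through its lift: the stroke rate inside the node divided by the stroke rate in the whole dataset. The sketch below uses the counts that are legible for Figure 5 (a child node of the HeartDisease split with 65 "Yes" and 43 "No" records, against 273 "Yes" out of 1582 at the root). Which branch of the split this node belongs to is not legible, and the function name is illustrative.

```python
def lift(node_yes: int, node_total: int, all_yes: int, all_total: int) -> float:
    """Stroke rate in a node relative to the overall stroke rate."""
    return (node_yes / node_total) / (all_yes / all_total)


node_yes, node_no = 65, 43
node_lift = lift(node_yes, node_yes + node_no, all_yes=273, all_total=1582)

print(f"Node stroke rate: {node_yes / (node_yes + node_no):.1%}")
print(f"Lift: {node_lift:.2f}")
```

A lift well above 1 means the node concentrates stroke cases relative to the dataset as a whole, which is the kind of segment a deployment rule would target.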