Research conducted by the Division of Endocrinology, Diabetes, and Metabolism at Ohio State University suggests that hospital readmission is an important contributor to total medical expenditure and is an emerging indicator of quality of care. Many factors, such as patient demographics, diagnostic procedures, and medications, influence patient readmission. The goal of this project is to analyze the key factors that affect readmission and to build a classification model that predicts whether a patient will be readmitted based on those factors. The work presented here is limited to diabetic patient readmission. We started with an exploratory analysis of the features in the dataset to detect interesting patterns. The statistical testing section highlights the results of the hypothesis tests we performed to assess the significance of features for the readmission outcome. Finally, we used machine learning to build several classification models on the data. The accuracy of the classifiers is outlined in the Machine Learning section.
Tools & Techniques Used:
R was used primarily for data preparation. Python was used to perform the hypothesis tests, and machine learning was performed using the Weka GUI.
For the visualizations, we used D3.js and the Google Charts API.
The data contains over 50 features representing patient and hospital outcomes. Information was extracted from the database for encounters that satisfied the following criteria:
The data contains attributes such as patient number, race, gender, age, admission type, time in hospital, medical specialty of the admitting physician, number of lab tests performed, HbA1c test result, diagnosis, number of medications, diabetic medications, and the number of outpatient, inpatient, and emergency visits in the year before the hospitalization.
The original dataset can be found here.
This graph shows how often patients are readmitted. The frequency trends downward as the number of readmissions increases. We considered patients who were readmitted more than 5 times to be outliers and excluded them from further analysis.
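The outlier rule described above can be sketched in Python. This is only an illustration, not the project's actual preparation code (which was written in R). The column name `patient_nbr`, the helper `drop_frequent_readmitters`, and the toy records are hypothetical, and the sketch assumes that a patient's readmission count equals their total number of encounters minus one.

```python
from collections import Counter


def drop_frequent_readmitters(encounters, max_readmissions=5,
                              patient_key="patient_nbr"):
    """Keep only encounters of patients readmitted at most `max_readmissions` times."""
    # Count how many encounters each patient has in the data.
    visits = Counter(row[patient_key] for row in encounters)
    # Assumption: readmissions = total encounters - 1,
    # because the first encounter is not a readmission.
    return [
        row for row in encounters
        if visits[row[patient_key]] - 1 <= max_readmissions
    ]


# Toy data (hypothetical): patient "A" has 7 encounters (6 readmissions),
# patient "B" has 3 encounters (2 readmissions).
encounters = (
    [{"patient_nbr": "A", "encounter_id": i} for i in range(1, 8)]
    + [{"patient_nbr": "B", "encounter_id": i} for i in range(8, 11)]
)

kept = drop_frequent_readmitters(encounters)
print(len(kept), "encounters kept")
```

With the default threshold, every encounter of patient "A" is dropped and every encounter of patient "B" is kept.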
The original data contained all patient admissions. Building machine learning models requires each row to be an independent instance. Because the same patient could be admitted multiple times, this assumption would be violated. Hence, the data prepared for the machine learning part contained only the first encounter of every patient.
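A minimal Python sketch of the first-encounter filter follows. It is not the project's own code (data preparation was done in R). The column names `patient_nbr` and `encounter_id`, the helper `first_encounters`, and the sample rows are hypothetical, and the sketch assumes that a lower encounter identifier means an earlier encounter. If the data carried an admission date, that would be the safer ordering key.

```python
def first_encounters(encounters, patient_key="patient_nbr",
                     order_key="encounter_id"):
    """Return one row per patient: the encounter with the smallest `order_key`."""
    first = {}
    for row in encounters:
        pid = row[patient_key]
        # Assumption: a smaller order_key value means an earlier encounter.
        if pid not in first or row[order_key] < first[pid][order_key]:
            first[pid] = row
    return list(first.values())


# Toy data (hypothetical): patient "A" appears twice, patient "B" once.
rows = [
    {"patient_nbr": "A", "encounter_id": 12, "readmitted": "NO"},
    {"patient_nbr": "B", "encounter_id": 7, "readmitted": "NO"},
    {"patient_nbr": "A", "encounter_id": 3, "readmitted": "YES"},
]

unique_rows = first_encounters(rows)
for row in unique_rows:
    print(row["patient_nbr"], row["encounter_id"])
```

Each patient then contributes exactly one row, so rows are no longer linked through repeated admissions of the same patient.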