Binary classification: Heart Disease

  • Technologies used: scikit-learn, Python, NumPy, pandas, Matplotlib, Seaborn, XGBoost
  • GitHub URL: Project Link

For this project, a binary classification model is trained on the Heart Disease dataset to predict the likelihood of heart disease from a set of key indicators.

Four models were trained: a basic logistic regression model, a decision tree, a random forest, and a gradient boosting model. Of the four, the gradient boosting model (built with the XGBoost library) performed best. The trained model is provided in the file xgb_model_eta=0.1_max_depth=6_min_child_weight=1.bin, which can be loaded with pickle.
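Loading the file with pickle could look roughly like the sketch below. The helper name load_artifact is hypothetical, and this summary does not say whether the file holds a bare model or a model bundled with a preprocessor, so the sketch simply returns whatever object was stored. Unpickling an XGBoost model also requires the xgboost package to be installed in the loading environment.

```python
import pickle
from pathlib import Path

# File name taken from the project description above.
MODEL_PATH = Path("xgb_model_eta=0.1_max_depth=6_min_child_weight=1.bin")


def load_artifact(path):
    """Return whatever object was pickled into `path` (hypothetical helper)."""
    # Only unpickle files you trust: pickle can run arbitrary code on load.
    with open(path, "rb") as f_in:
        return pickle.load(f_in)


if __name__ == "__main__" and MODEL_PATH.exists():
    artifact = load_artifact(MODEL_PATH)
    # Inspect the type first, since the file's exact contents are not documented here.
    print(type(artifact))
```

Checking the type of the loaded object before calling it is a cheap way to find out whether the file contains only the model or a tuple of objects.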

The exploratory data analysis and model selection were done in a Jupyter Notebook, notebook.ipynb.

The model training script was exported to train.py.

A Flask app was created in predict.py; it can be deployed with any WSGI server. This project was developed and tested with Gunicorn.
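The actual contents of predict.py are not reproduced here, so the following is only a minimal sketch of how a Flask prediction service of this kind might be structured. The /predict route, the JSON field names, the 0.5 decision threshold, the create_app factory and the dummy_predict_proba stand-in are all assumptions made for illustration, not the project's real interface.

```python
from flask import Flask, jsonify, request


def create_app(predict_proba):
    """Build a Flask app around `predict_proba`, a callable that maps a dict of
    patient indicators to a probability in [0, 1] (hypothetical interface)."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])  # route name is an assumption
    def predict():
        patient = request.get_json()
        probability = float(predict_proba(patient))
        return jsonify({
            "heart_disease_probability": probability,
            "heart_disease": probability >= 0.5,  # illustrative threshold
        })

    return app


def dummy_predict_proba(patient):
    """Stand-in for the real model so the sketch runs without the .bin file."""
    # "smoking" is a made-up field name used only for this example.
    return 0.8 if patient.get("smoking") == "yes" else 0.2


# A Flask application object is a WSGI callable, so any WSGI server can serve it.
app = create_app(dummy_predict_proba)
```

If predict.py exposes its Flask object under the name app (an assumption), Gunicorn's MODULE:VARIABLE syntax would start the service with a command such as gunicorn predict:app.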