This folder contains the scripts I used for my Master's graduation project.
In this project I tried to classify, in a supervised setting, which older cancer patients develop a postoperative complication, using a broad set of variables. Partial Least Squares Discriminant Analysis (PLS-DA) was used to classify this diverse and imbalanced dataset. The data ranged from activity data extracted from Fitbits to medical data gathered from the patients' hospital files.
The format of the dataset is based on the data exports from the project's Redcap page. For more information, please contact Maarten Lahr (Department of Epidemiology, UMCG) or Barbara van Leeuwen (Department of Surgery, UMCG).
Before running the scripts, store the following data files in one folder:
- Surgery and admission + Complications (both Redcap instruments combined in one .csv)
- Data SACM
- Baseline Assessment (T1=b)
- Completion Data
The following files are stored on the Surgery hard drive in the digital UMCG environment:
- 8x combined PA files
- Patient uuid with email and patientnumber.csv (Eurecat, raw .csv)
- PhysicalActivities_umcg.csv (Eurecat, raw .csv)
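Before starting, it can help to confirm that the downloaded files are actually in the data folder. The snippet below is a minimal sketch, not part of the project scripts: `missing_files` and `EXPECTED_FILES` are hypothetical names, and the check assumes all files have been copied into a single folder. Only the two file names spelled out in full above are included; the exact names of the other exports depend on how they were saved, so add them yourself.

```python
from pathlib import Path

# Only the two file names given in full in this README.
# Extend this list with the names of your own Redcap exports (assumption:
# everything has been copied into one folder).
EXPECTED_FILES = [
    "Patient uuid with email and patientnumber.csv",
    "PhysicalActivities_umcg.csv",
]


def missing_files(folder, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).is_file()]
```

Calling `missing_files("path/to/data")` returns an empty list when every expected file is present, and otherwise the names that still need to be downloaded.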
After downloading all data, the scripts should be run in the following order:
- DemoParser.py: transforms the Redcap output into a usable dataframe with the patients' demographics
- DateParser.py: uses several Redcap exports to create a dataframe with all important dates and the number of days between events
- FinalDF_Parser.py: creates a dataframe with the scores from all test moments
- EurecatParser.py: creates the physical activity data
- FinalCombiner.py: combines all datasets into one final dataframe
- PLS.py: performs the PLS-DA and plots the R2-Q2, Wold’s R, PRESS and ROC-AUC plots
- ScatterBoxplotter.py: plots scatter-boxplots from the final dataset
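The run order above can also be automated. The following is a minimal sketch, not part of the project itself: `run_pipeline` is a hypothetical helper, and it assumes every script can be started without command-line arguments from the folder that contains it. If a script expects input paths to be edited by hand, do that first.

```python
import subprocess
import sys
from pathlib import Path

# Run order as listed in this README.
SCRIPTS = [
    "DemoParser.py",
    "DateParser.py",
    "FinalDF_Parser.py",
    "EurecatParser.py",
    "FinalCombiner.py",
    "PLS.py",
    "ScatterBoxplotter.py",
]


def run_pipeline(script_dir=".", scripts=SCRIPTS):
    """Run each script in order with the current Python interpreter.

    Stops at the first script that exits with a non-zero status.
    """
    script_dir = Path(script_dir).resolve()
    for name in scripts:
        print(f"Running {name} ...")
        # check=True raises subprocess.CalledProcessError on failure,
        # so later scripts never run on incomplete intermediate data.
        subprocess.run([sys.executable, name], cwd=script_dir, check=True)
```

Calling `run_pipeline("path/to/scripts")` runs all seven scripts in sequence. Stopping at the first failure is deliberate, since each script builds on the output of the previous ones.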