Scripts to create a dataset from Redcap outputs to use for a PLS-DA classification.
Dijkhof 12a8d9bf57 README.md 5 months ago
DatesParser.py the rest of the scripts 5 months ago
DemoParser.py the rest of the scripts 5 months ago
FinalDF_Parser.py the rest of the scripts 5 months ago
PAParser.py the rest of the scripts 5 months ago
PLS.py test 5 months ago
README.md README.md 5 months ago
ScatterBoxplotter.py the rest of the scripts 5 months ago

README.md

This folder contains the scripts I used for my Master's graduation project.

In this project I attempted a supervised classification of older cancer patients who develop a postoperative complication, using a broad set of variables. Partial Least Squares Discriminant Analysis (PLS-DA) was used to classify this diverse and imbalanced dataset. The data ranged from activity data extracted from Fitbits to medical data gathered from the patients' hospital files.

The specific format of the dataset is based on the data exports from the project's Redcap page. For more information, please contact Maarten Lahr (Department of Epidemiology, UMCG) or Barbara van Leeuwen (Department of Surgery, UMCG).

Before running the scripts, store the following data files together in one folder:

  • Demographics
  • Surgery and admission + Complications (both Redcap instruments combined in one .csv)
  • Data SACM
  • Baseline Assessment (T1=b)
  • Completion Data

The following files are stored on the Surgery hard drive in the digital UMCG environment:

  • 8x combined PA files
  • Patient uuid with email and patientnumber.csv (Eurecat, raw .csv)
  • PhysicalActivities_umcg.csv (Eurecat, raw .csv)
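Because the scripts expect all inputs in one place, it can help to check the folder before starting. The sketch below is an illustration only: the function find_missing is hypothetical, and apart from the two Eurecat file names listed above, the file names in EXPECTED_FILES are assumptions that should be replaced with the names of your actual exports.

```python
from pathlib import Path

# Only the last two names come from this README; the others are
# hypothetical placeholders for the Redcap exports.
EXPECTED_FILES = [
    "Demographics.csv",            # assumed name
    "SurgeryAndComplications.csv", # assumed name
    "DataSACM.csv",                # assumed name
    "BaselineAssessment.csv",      # assumed name
    "CompletionData.csv",          # assumed name
    "Patient uuid with email and patientnumber.csv",
    "PhysicalActivities_umcg.csv",
]


def find_missing(folder, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).is_file()]
```

Calling find_missing("path/to/data") returns an empty list when every expected file is present, and otherwise the names that still need to be downloaded. The 8 combined PA files are left out here because their names are not given.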

After downloading all data, the scripts should be run in the following order:

  1. DemoParser.py - Script to transform the Redcap output into a usable dataframe with the patients' demographics

  2. DatesParser.py - Script that uses several Redcap exports to create a dataframe with all important dates and the number of days between events

  3. FinalDF_Parser.py - Script to create a dataframe with the scores from all test moments

  4. EurecatParser.py - Script to create physical activity data

  5. FinalCombiner.py - Script that combines all datasets into one final dataframe

  6. PLS.py - Script that performs the PLS-DA and produces the R2-Q2, Wold’s R, PRESS and ROC-AUC plots

  7. ScatterBoxplotter.py - Script to plot scatter-boxplots from the final dataset
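The run order above can also be scripted. The following is a minimal sketch, not part of the project: run_pipeline is a hypothetical helper, and it assumes every script runs without command-line arguments from the folder that holds the scripts. The script names follow the steps above, with the second one spelled DatesParser.py as in the repository file listing; adjust the list if your checkout differs.

```python
import subprocess
import sys
from pathlib import Path

# Order taken from the numbered steps above.
SCRIPT_ORDER = [
    "DemoParser.py",
    "DatesParser.py",  # spelling as in the repository file listing
    "FinalDF_Parser.py",
    "EurecatParser.py",
    "FinalCombiner.py",
    "PLS.py",
    "ScatterBoxplotter.py",
]


def run_pipeline(script_dir=".", scripts=SCRIPT_ORDER, dry_run=False):
    """Run each script in order; stop at the first one that fails.

    With dry_run=True nothing is executed and only the commands are returned.
    """
    commands = []
    for name in scripts:
        command = [sys.executable, str(Path(script_dir) / name)]
        commands.append(command)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later steps never run on incomplete output.
            subprocess.run(command, check=True)
    return commands
```

Calling run_pipeline(dry_run=True) first shows the commands without running them, which is a cheap way to confirm the paths before starting the full sequence.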