<p>So, we can draw at least two conclusions immediately. From a data scientist's perspective, the data looks clean: it contains only the values <code>M</code> and <code>F</code>. From a researcher's perspective, there are slightly more men. Nothing we didn’t already know.</p>
<p>The data is already quite clean, but we still need to transform some variables. The <code>bacteria</code> column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code> function of the <code>dplyr</code> package makes this really easy:</p>
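<p>As a rough sketch of this kind of transformation, assuming a toy data frame and a hypothetical lookup vector <code>mo_lookup</code> that merely stands in for the package's own name-to-ID translation, <code>mutate()</code> can overwrite the text column with IDs:</p>

```r
library(dplyr)

# Toy data; only the column name `bacteria` is taken from the text above
data <- data.frame(
  patient_id = c("A1", "B2", "C3"),
  bacteria   = c("Escherichia coli", "Staphylococcus aureus", "Escherichia coli"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table for illustration only; this is NOT the
# translation the package itself performs
mo_lookup <- c(
  "Escherichia coli"      = "B_ESCHR_COL",
  "Staphylococcus aureus" = "B_STPHY_AUR"
)

# Replace the free-text names with their IDs in place
data <- data %>%
  mutate(bacteria = unname(mo_lookup[bacteria]))
```

<p>Overwriting the column in place keeps the rest of the data untouched, so later steps can rely on <code>bacteria</code> holding IDs only.</p>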
<aclass="sourceLine"id="cb14-22"title="22"><spanclass="co">#> Table 1: Intrinsic resistance in Enterobacteriaceae (1293 changes)</span></a>
<aclass="sourceLine"id="cb14-22"title="22"><spanclass="co">#> Table 1: Intrinsic resistance in Enterobacteriaceae (1241 changes)</span></a>
<aclass="sourceLine"id="cb14-23"title="23"><spanclass="co">#> Table 2: Intrinsic resistance in non-fermentative Gram-negative bacteria (no changes)</span></a>
<aclass="sourceLine"id="cb14-24"title="24"><spanclass="co">#> Table 3: Intrinsic resistance in other Gram-negative bacteria (no changes)</span></a>
<aclass="sourceLine"id="cb14-25"title="25"><spanclass="co">#> Table 4: Intrinsic resistance in Gram-positive bacteria (2812 changes)</span></a>
<aclass="sourceLine"id="cb14-25"title="25"><spanclass="co">#> Table 4: Intrinsic resistance in Gram-positive bacteria (2713 changes)</span></a>
<aclass="sourceLine"id="cb14-26"title="26"><spanclass="co">#> Table 8: Interpretive rules for B-lactam agents and Gram-positive cocci (no changes)</span></a>
<aclass="sourceLine"id="cb14-27"title="27"><spanclass="co">#> Table 9: Interpretive rules for B-lactam agents and Gram-negative rods (no changes)</span></a>
<aclass="sourceLine"id="cb14-28"title="28"><spanclass="co">#> Table 10: Interpretive rules for B-lactam agents and other Gram-negative bacteria (no changes)</span></a>
@ -455,9 +462,9 @@
<aclass="sourceLine"id="cb14-38"title="38"><spanclass="co">#> Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no changes)</span></a>
<aclass="sourceLine"id="cb14-39"title="39"><spanclass="co">#> Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no changes)</span></a>
<aclass="sourceLine"id="cb14-41"title="41"><spanclass="co">#> => EUCAST rules affected 7,447 out of 20,000 rows</span></a>
<aclass="sourceLine"id="cb14-41"title="41"><spanclass="co">#> => EUCAST rules affected 7,301 out of 20,000 rows</span></a>
<aclass="sourceLine"id="cb14-42"title="42"><spanclass="co">#> -> added 0 test results</span></a>
<aclass="sourceLine"id="cb14-43"title="43"><spanclass="co">#> -> changed 4,105 test results (0 to S; 0 to I; 4,105 to R)</span></a></code></pre></div>
<aclass="sourceLine"id="cb14-43"title="43"><spanclass="co">#> -> changed 3,954 test results (0 to S; 0 to I; 3,954 to R)</span></a></code></pre></div>
<aclass="sourceLine"id="cb16-3"title="3"><spanclass="co">#></span><spanclass="al">NOTE</span><spanclass="co">: Using column `bacteria` as input for `col_mo`.</span></a>
<aclass="sourceLine"id="cb16-4"title="4"><spanclass="co">#></span><spanclass="al">NOTE</span><spanclass="co">: Using column `date` as input for `col_date`.</span></a>
<aclass="sourceLine"id="cb16-5"title="5"><spanclass="co">#></span><spanclass="al">NOTE</span><spanclass="co">: Using column `patient_id` as input for `col_patient_id`.</span></a>
<aclass="sourceLine"id="cb16-6"title="6"><spanclass="co">#> => Found 5,692 first isolates (28.5% of total)</span></a></code></pre></div>
<p>So only 28.5% of the isolates are suitable for resistance analysis! We can now filter on it with the <code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code> function, also from the <code>dplyr</code> package:</p>
<aclass="sourceLine"id="cb16-6"title="6"><spanclass="co">#> => Found 5,669 first isolates (28.3% of total)</span></a></code></pre></div>
<p>So only 28.3% of the isolates are suitable for resistance analysis! We can now filter on it with the <code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code> function, also from the <code>dplyr</code> package:</p>
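<p>To illustrate the idea of flagging and then filtering, here is a deliberately simplified sketch on toy data. It assumes that the 'first' isolate is simply the earliest one per patient and species; the real <code>first_isolate()</code> function applies further criteria that this sketch ignores. The names <code>isolates</code> and <code>data_1st</code> are hypothetical.</p>

```r
library(dplyr)

# Toy isolates: three from patient M1, one from patient Y4
isolates <- data.frame(
  date       = as.Date(c("2010-01-04", "2010-01-27", "2010-05-07", "2010-02-11")),
  patient_id = c("M1", "M1", "M1", "Y4"),
  bacteria   = c("B_ESCHR_COL", "B_ESCHR_COL", "B_ESCHR_COL", "B_ESCHR_COL"),
  stringsAsFactors = FALSE
)

# Simplified stand-in for the flagging step: mark the earliest isolate
# per patient and species (an assumption, not the package's algorithm)
isolates <- isolates %>%
  group_by(patient_id, bacteria) %>%
  mutate(first = date == min(date)) %>%
  ungroup()

# Keep only the flagged rows
data_1st <- isolates %>%
  filter(first == TRUE)
```

<p>Storing the flag as a column first, rather than filtering right away, makes it possible to inspect which isolates were dropped before committing to the reduced data set.</p>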
<p>For future use, the above two syntaxes can be shortened with the <code><a href="../reference/first_isolate.html">filter_first_isolate()</a></code> function:</p>
@ -509,10 +516,10 @@
<tbody>
<trclass="odd">
<tdalign="center">1</td>
<tdalign="center">2010-06-20</td>
<tdalign="center">Y4</td>
<tdalign="center">2010-01-04</td>
<tdalign="center">M1</td>
<tdalign="center">B_ESCHR_COL</td>
<tdalign="center">S</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
@ -520,10 +527,10 @@
</tr>
<trclass="even">
<tdalign="center">2</td>
<tdalign="center">2010-07-31</td>
<tdalign="center">Y4</td>
<tdalign="center">2010-01-27</td>
<tdalign="center">M1</td>
<tdalign="center">B_ESCHR_COL</td>
<tdalign="center">S</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
<tdalign="center">S</td>
<tdalign="center">S</td>
@ -531,8 +538,8 @@
</tr>
<trclass="odd">
<tdalign="center">3</td>
<tdalign="center">2010-08-26</td>
<tdalign="center">Y4</td>
<tdalign="center">2010-05-07</td>
<tdalign="center">M1</td>
<tdalign="center">B_ESCHR_COL</td>
<tdalign="center">S</td>
<tdalign="center">S</td>
@ -542,8 +549,8 @@
</tr>
<trclass="even">
<tdalign="center">4</td>
<tdalign="center">2010-12-11</td>
<tdalign="center">Y4</td>
<tdalign="center">2010-06-09</td>
<tdalign="center">M1</td>
<tdalign="center">B_ESCHR_COL</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
@ -553,8 +560,8 @@
</tr>
<trclass="odd">
<tdalign="center">5</td>
<tdalign="center">2010-12-30</td>
<tdalign="center">Y4</td>
<tdalign="center">2010-07-23</td>
<tdalign="center">M1</td>
<tdalign="center">B_ESCHR_COL</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
@ -564,30 +571,30 @@
</tr>
<trclass="even">
<tdalign="center">6</td>
<tdalign="center">2011-04-02</td>
<tdalign="center">Y4</td>
<tdalign="center">2010-09-29</td>
<tdalign="center">M1</td>
<tdalign="center">B_ESCHR_COL</td>
<tdalign="center">R</td>
<tdalign="center">I</td>
<tdalign="center">S</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
<tdalign="center">S</td>
<tdalign="center">FALSE</td>
</tr>
<trclass="odd">
<tdalign="center">7</td>
<tdalign="center">2011-04-06</td>
<tdalign="center">Y4</td>
<tdalign="center">2010-10-14</td>
<tdalign="center">M1</td>
<tdalign="center">B_ESCHR_COL</td>
<tdalign="center">S</td>
<tdalign="center">S</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
<tdalign="center">FALSE</td>
</tr>
<trclass="even">
<tdalign="center">8</td>
<tdalign="center">2011-04-07</td>
<tdalign="center">Y4</td>
<tdalign="center">2010-10-15</td>
<tdalign="center">M1</td>
<tdalign="center">B_ESCHR_COL</td>
<tdalign="center">S</td>
<tdalign="center">S</td>
@ -597,29 +604,29 @@
</tr>
<trclass="odd">
<tdalign="center">9</td>
<tdalign="center">2011-05-28</td>
<tdalign="center">Y4</td>
<tdalign="center">2010-10-16</td>
<tdalign="center">M1</td>
<tdalign="center">B_ESCHR_COL</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
<tdalign="center">S</td>
<tdalign="center">FALSE</td>
</tr>
<trclass="even">
<tdalign="center">10</td>
<tdalign="center">2011-09-09</td>
<tdalign="center">Y4</td>
<tdalign="center">2010-11-16</td>
<tdalign="center">M1</td>
<tdalign="center">B_ESCHR_COL</td>
<tdalign="center">S</td>
<tdalign="center">S</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
<tdalign="center">TRUE</td>
<tdalign="center">S</td>
<tdalign="center">FALSE</td>
</tr>
</tbody>
</table>
<p>Only 2 isolates are marked as ‘first’ according to the CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates based on their antibiogram. The <code><a href="../reference/key_antibiotics.html">key_antibiotics()</a></code> function adds a vector with 18 key antibiotics: 6 broad-spectrum ones, 6 narrow-spectrum ones for Gram negatives and 6 narrow-spectrum ones for Gram positives. These can be defined by the user.</p>
<p>Only 1 isolate is marked as ‘first’ according to the CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates based on their antibiogram. The <code><a href="../reference/key_antibiotics.html">key_antibiotics()</a></code> function adds a vector with 18 key antibiotics: 6 broad-spectrum ones, 6 narrow-spectrum ones for Gram negatives and 6 narrow-spectrum ones for Gram positives. These can be defined by the user.</p>
<p>If a column exists with a name like ‘key(…)ab’, the <code><a href="../reference/first_isolate.html">first_isolate()</a></code> function will automatically use it and determine the first weighted isolates. Mind the NOTEs in the output below:</p>
<aclass="sourceLine"id="cb19-7"title="7"><spanclass="co">#></span><spanclass="al">NOTE</span><spanclass="co">: Using column `patient_id` as input for `col_patient_id`.</span></a>
<aclass="sourceLine"id="cb19-8"title="8"><spanclass="co">#></span><spanclass="al">NOTE</span><spanclass="co">: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics = FALSE to prevent this.</span></a>
<aclass="sourceLine"id="cb19-9"title="9"><spanclass="co">#> [Criterion] Inclusion based on key antibiotics, ignoring I.</span></a>
<aclass="sourceLine"id="cb19-10"title="10"><spanclass="co">#> => Found 15,866 first weighted isolates (79.3% of total)</span></a></code></pre></div>
<aclass="sourceLine"id="cb19-10"title="10"><spanclass="co">#> => Found 15,859 first weighted isolates (79.3% of total)</span></a></code></pre></div>
<tableclass="table">
<thead><trclass="header">
<thalign="center">isolate</th>
@ -647,10 +654,10 @@
<tbody>
<trclass="odd">
<tdalign="center">1</td>
<tdalign="center">2010-06-20</td>
<tdalign="center">Y4</td>
<tdalign="center">2010-01-04</td>
<tdalign="center">M1</td>
<tdalign="center">B_ESCHR_COL</td>
<tdalign="center">S</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
@ -659,10 +666,10 @@
</tr>
<trclass="even">
<tdalign="center">2</td>
<tdalign="center">2010-07-31</td>
<tdalign="center">Y4</td>
<tdalign="center">2010-01-27</td>
<tdalign="center">M1</td>
<tdalign="center">B_ESCHR_COL</td>
<tdalign="center">S</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
<tdalign="center">S</td>
<tdalign="center">S</td>
@ -671,20 +678,20 @@
</tr>
<trclass="odd">
<tdalign="center">3</td>
<tdalign="center">2010-08-26</td>
<tdalign="center">Y4</td>
<tdalign="center">2010-05-07</td>
<tdalign="center">M1</td>
<tdalign="center">B_ESCHR_COL</td>
<tdalign="center">S</td>
<tdalign="center">S</td>
<tdalign="center">S</td>
<tdalign="center">S</td>
<tdalign="center">FALSE</td>
<tdalign="center">FALSE</td>
<tdalign="center">TRUE</td>
</tr>
<trclass="even">
<tdalign="center">4</td>
<tdalign="center">2010-12-11</td>
<tdalign="center">Y4</td>
<tdalign="center">2010-06-09</td>
<tdalign="center">M1</td>
<tdalign="center">B_ESCHR_COL</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
@ -695,8 +702,8 @@
</tr>
<trclass="odd">
<tdalign="center">5</td>
<tdalign="center">2010-12-30</td>
<tdalign="center">Y4</td>
<tdalign="center">2010-07-23</td>
<tdalign="center">M1</td>
<tdalign="center">B_ESCHR_COL</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
@ -707,32 +714,32 @@
</tr>
<trclass="even">
<tdalign="center">6</td>
<tdalign="center">2011-04-02</td>
<tdalign="center">Y4</td>
<tdalign="center">2010-09-29</td>
<tdalign="center">M1</td>
<tdalign="center">B_ESCHR_COL</td>
<tdalign="center">R</td>
<tdalign="center">I</td>
<tdalign="center">S</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
<tdalign="center">S</td>
<tdalign="center">FALSE</td>
<tdalign="center">TRUE</td>
</tr>
<trclass="odd">
<tdalign="center">7</td>
<tdalign="center">2011-04-06</td>
<tdalign="center">Y4</td>
<tdalign="center">2010-10-14</td>
<tdalign="center">M1</td>
<tdalign="center">B_ESCHR_COL</td>
<tdalign="center">S</td>
<tdalign="center">S</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
<tdalign="center">FALSE</td>
<tdalign="center">TRUE</td>
</tr>
<trclass="even">
<tdalign="center">8</td>
<tdalign="center">2011-04-07</td>
<tdalign="center">Y4</td>
<tdalign="center">2010-10-15</td>
<tdalign="center">M1</td>
<tdalign="center">B_ESCHR_COL</td>
<tdalign="center">S</td>
<tdalign="center">S</td>
@ -743,11 +750,11 @@
</tr>
<trclass="odd">
<tdalign="center">9</td>
<tdalign="center">2011-05-28</td>
<tdalign="center">Y4</td>
<tdalign="center">2010-10-16</td>
<tdalign="center">M1</td>
<tdalign="center">B_ESCHR_COL</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
<tdalign="center">S</td>
<tdalign="center">FALSE</td>
@ -755,23 +762,23 @@
</tr>
<trclass="even">
<tdalign="center">10</td>
<tdalign="center">2011-09-09</td>
<tdalign="center">Y4</td>
<tdalign="center">2010-11-16</td>
<tdalign="center">M1</td>
<tdalign="center">B_ESCHR_COL</td>
<tdalign="center">S</td>
<tdalign="center">S</td>
<tdalign="center">R</td>
<tdalign="center">S</td>
<tdalign="center">TRUE</td>
<tdalign="center">S</td>
<tdalign="center">FALSE</td>
<tdalign="center">TRUE</td>
</tr>
</tbody>
</table>
<p>Instead of 2, now 8 isolates are flagged. In total, 79.3% of all isolates are marked ‘first weighted’, which is 50.9 percentage points more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.</p>
<p>Instead of 1, now 9 isolates are flagged. In total, 79.3% of all isolates are marked ‘first weighted’, which is 51 percentage points more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.</p>
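<p>The weighting idea can be sketched in a few lines of base R. This is only an illustration under a simplifying assumption: each isolate is compared with the one directly before it (same patient, same species), and it counts as a different strain when any key antibiotic switches between S and R, ignoring I. The helper <code>profile_differs()</code> and the toy vector <code>keyab</code> are hypothetical and do not reproduce the package's actual implementation.</p>

```r
# Compare two antibiogram strings (one character per antibiotic: S, I or R).
# Returns TRUE when any position switches between S and R; I is ignored.
profile_differs <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  any((x == "S" & y == "R") | (x == "R" & y == "S"))
}

# Toy antibiograms of one patient and one species, already sorted by date
keyab <- c("SRSS", "SRSS", "SSSS", "RSSS", "RISS")

# The first isolate always counts; every later one counts only when its
# profile differs from the isolate directly before it
first_weighted <- c(
  TRUE,
  mapply(profile_differs, keyab[-1], keyab[-length(keyab)], USE.NAMES = FALSE)
)
```

<p>In this toy series the second isolate repeats the first profile and the fifth differs from the fourth only by an I, so neither is flagged, while the third and fourth are.</p>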
<p>As with <code><a href="../reference/first_isolate.html">filter_first_isolate()</a></code>, there’s a shortcut for this new algorithm too:</p>
<p>The functions <code>portion_R</code>, <code>portion_RI</code>, <code>portion_I</code>, <code>portion_IS</code> and <code>portion_S</code> can be used to determine the portion of a specific antimicrobial outcome. They can be used on their own:</p>
<p>Or can be used in conjunction with <code><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by()</a></code> and <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarise()</a></code>, both from the <code>dplyr</code> package:</p>
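<p>Both usage patterns can be sketched with a hand-rolled helper. The function <code>portion_resistant()</code> below is a hypothetical stand-in that assumes the portion is simply the share of R among all non-missing results; the package functions may apply additional rules. The toy data frame and its columns <code>hospital</code> and <code>amox</code> are made up for illustration.</p>

```r
library(dplyr)

# Hypothetical stand-in: share of "R" among all non-missing results
portion_resistant <- function(x) {
  x <- x[!is.na(x)]
  sum(x == "R") / length(x)
}

# Toy data: amoxicillin results from two hospitals
toy <- data.frame(
  hospital = c("A", "A", "A", "B", "B", "B", "B"),
  amox     = c("R", "S", "R", "S", "S", "R", "S"),
  stringsAsFactors = FALSE
)

# Used on its own, on a single column
overall <- portion_resistant(toy$amox)

# Used per group, together with group_by() and summarise()
per_hospital <- toy %>%
  group_by(hospital) %>%
  summarise(amoxicillin = portion_resistant(amox))
```

<p>Because the helper takes a plain vector and returns a single number, the same function works unchanged in both settings.</p>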
<a href="#spss-sas-stata" class="anchor"></a>SPSS / SAS / Stata</h2>
<p>SPSS (Statistical Package for the Social Sciences) is probably the most well-known software package for statistical analysis. SPSS is easier to learn than R, because in SPSS you only have to click a menu to run parts of your analysis. Because of its user-friendliness, it is taught at universities and is particularly useful for students who are new to statistics. From my experience, I would guess that pretty much all (bio)medical students know it by the time they graduate. SAS and Stata are statistical packages popular in big industries.</p>
</div>
<div id="compared-to-r" class="section level2">
<h2 class="hasAnchor">
<a href="#compared-to-r" class="anchor"></a>Compared to R</h2>
<p>As said, SPSS is easier to learn than R. But SPSS, SAS and Stata come with major downsides when compared with R:</p>
<ul>
<li>
<p><strong>R is highly modular.</strong></p>
<p>The <a href="https://cran.r-project.org/web/packages/">official R network (CRAN)</a> features almost 14,000 packages at the time of writing, our <code>AMR</code> package being one of them. All these packages were peer-reviewed before publication. Aside from this official channel, there are also developers who choose not to submit to CRAN, but rather keep their packages in their own public repository, like GitLab or GitHub. So there may even be a lot more than 14,000 packages out there.</p>
<p>The bottom line is, you can really extend it yourself or ask somebody to do this for you. Take for example our <code>AMR</code> package. SPSS, SAS and Stata will never know what a valid MIC value is (so data might not be clean) or what the Gram stain of <em>E. coli</em> is. Or the fact that all species of <em>Klebsiella</em> are resistant to amoxicillin.</p>
</li>
<li>
<p><strong>R is extremely flexible.</strong></p>
<p>Because you write the syntax yourself, you can do anything you want. The flexibility in transforming, gathering, grouping, summarising and drawing plots is endless - with SPSS, SAS or Stata you are bound to their algorithms and styles. It may be a bit flexible, but you can never create that very specific publication-ready plot without using other (paid) software.</p>
</li>
<li>
<p><strong>R can be easily automated.</strong></p>
<p>Over the last few years, <a href="https://rmarkdown.rstudio.com/">R Markdown</a> has really developed in interesting ways. With R Markdown, you can very easily reproduce your reports, whether it’s to Word, PowerPoint, a website, a PDF document or just the raw data to Excel. I use this a lot to generate monthly reports automatically. Just write the code once and enjoy the automatically updated reports at any interval you like.</p>
<p>For an even more professional environment, you could create <a href="https://shiny.rstudio.com/">Shiny apps</a>: live manipulation of data using a custom-made website. The web design knowledge needed (JavaScript, CSS, HTML) is almost <em>zero</em>.</p>
</li>
<li>
<p><strong>R has a huge community.</strong></p>
<p>Many R users just ask questions on websites like <a href="https://stackoverflow.com">stackoverflow.com</a>, the largest online community for programmers. At the time of writing, around <a href="https://stackoverflow.com/questions/tagged/r?sort=votes">275,000 R questions</a> have been asked on this platform (which covers questions and answers for any programming language). In my own experience, most questions are answered within a couple of minutes.</p>
</li>
<li>
<p><strong>R understands any data type, including SPSS/SAS/Stata.</strong></p>
<p>And that’s not vice versa, I’m afraid. You can import data from any source into R. As said, from SPSS/SAS/Stata (<a href="https://haven.tidyverse.org/">link</a>), but also from Excel (<a href="https://readxl.tidyverse.org/">link</a>), from flat files like CSV, TXT or TSV (<a href="https://readr.tidyverse.org/">link</a>), or directly from databases or data warehouses anywhere in the world (<a href="https://dbplyr.tidyverse.org/">link</a>). You can even scrape websites to download tables that are live on the internet (<a href="https://github.com/hadley/rvest">link</a>).</p>
<p>And the best part - you can export from R to all data formats as well. So you can import an SPSS file, do your analysis neatly in R and export back to SPSS. Although you might omit that very last step.</p>
</li>
<li>
<p><strong>R is completely free and open-source.</strong></p>
<p>No strings attached. It was created and is being maintained by volunteers who believe that (data) science should be open and publicly available to everybody. SPSS, SAS and Stata are quite expensive. IBM SPSS Statistics only comes with subscriptions nowadays, varying <a href="https://www.ibm.com/products/spss-statistics/pricing">between USD 1,300 and USD 8,500</a> per computer <em>per year</em>. SAS Analytics Pro costs <a href="https://www.sas.com/store/products-solutions/sas-analytics-pro/prodPERSANL.html">around USD 10,000</a> per computer. Stata also has a business model with subscription fees, varying <a href="https://www.stata.com/order/new/bus/single-user-licenses/dl/">between USD 600 and USD 1,200</a> per computer per year, but lower prices come with a limitation of the number of variables you can work with.</p>
<p>If you are working at a midsized or small company, you can save it tens of thousands of dollars by using R instead of SPSS - gaining even more functions and flexibility. And all R enthusiasts can do as much PR as they want (like I do here), because nobody is officially associated or affiliated with R. It is really free.</p>
</li>
</ul>
<p>If you sometimes write syntaxes in SPSS to run a complete analysis or to ‘automate’ some of your work, you should perhaps do this in R. You will notice that writing syntaxes in R is a lot more nifty and clever than in SPSS.</p>
<a href="#import-data-from-spsssasstata" class="anchor"></a>Import data from SPSS/SAS/Stata</h2>
<div id="rstudio" class="section level3">
<h3 class="hasAnchor">
<a href="#rstudio" class="anchor"></a>RStudio</h3>
<p>To work with R, probably the best option is to use <a href="https://www.rstudio.com/products/rstudio/">RStudio</a>. It is an open-source and free desktop environment which not only allows you to run R code, but also supports project management, version management, package management and a convenient import menu to work with other data sources. You can also run <a href="https://www.rstudio.com/products/rstudio/">RStudio Server</a>, which is nothing less than the complete RStudio software available as a website (e.g. in your corporate network or at home).</p>
<p>To import a data file, just click <em>Import Dataset</em> in the Environment tab:</p>
<p><img src="../import1.png"></p>
<p>If additional packages are needed, RStudio will ask you if they should be installed beforehand.</p>
<p>In the window that opens, you can define all options (parameters) that should be used for import and you’re ready to go:</p>
<p><img src="../import2.png"></p>
<p>If you want named variables to be imported as factors so it resembles SPSS more, use <code><a href="https://haven.tidyverse.org/reference/as_factor.html">as_factor()</a></code>.</p>
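<p>The same import can also be done in code instead of through the menu. The sketch below assumes the <code>haven</code> package is installed; it first writes a tiny SPSS file to a temporary location (the variables <code>gender</code> and <code>age</code> are made up) so that the example is self-contained, then reads it back and converts the labelled variable to a factor:</p>

```r
library(haven)

# Write a small SPSS file first, so the example does not depend on an
# existing file; `gender` carries SPSS-style value labels
path <- tempfile(fileext = ".sav")
write_sav(
  tibble::tibble(
    gender = labelled(c(1, 2, 1), labels = c(Male = 1, Female = 2)),
    age    = c(34, 51, 47)
  ),
  path
)

# Import the file; labelled variables arrive as numeric codes plus labels
spss_data <- read_sav(path)

# Convert the labelled variable to a factor, which resembles SPSS more
spss_data$gender <- as_factor(spss_data$gender)
```

<p>With your own data, only the <code>read_sav()</code> and <code>as_factor()</code> steps are needed; <code>write_sav()</code> is the function to use if you want to export back to SPSS.</p>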