SPSS tutorial

pull/67/head
parent efb2af9e3a
commit 9209787a53

@ -13,7 +13,7 @@ Authors@R: c(
given = c("Christian", "F."),
family = "Luz",
email = "c.f.luz@umcg.nl",
role = c("aut", "rev"),
role = "aut",
comment = c(ORCID = "0000-0001-5809-5995")),
person(
given = c("Erwin", "E.", "A."),

@ -41,6 +41,9 @@ navbar:
- text: 'Work with WHONET data'
icon: 'fa-globe-americas'
href: 'articles/WHONET.html'
- text: 'Import data from SPSS/SAS/Stata'
icon: 'fa-file-upload'
href: 'articles/SPSS.html'
- text: 'Apply EUCAST rules'
icon: 'fa-exchange-alt'
href: 'articles/EUCAST.html'

@ -121,6 +121,13 @@
Work with WHONET data
</a>
</li>
<li>
<a href="articles/SPSS.html">
<span class="fa fa-file-upload"></span>
Import data from SPSS/SAS/Stata
</a>
</li>
<li>
<a href="articles/EUCAST.html">
<span class="fa fa-exchange-alt"></span>

@ -83,6 +83,13 @@
Work with WHONET data
</a>
</li>
<li>
<a href="../articles/SPSS.html">
<span class="fa fa-file-upload"></span>
Import data from SPSS/SAS/Stata
</a>
</li>
<li>
<a href="../articles/EUCAST.html">
<span class="fa fa-exchange-alt"></span>
@ -320,67 +327,67 @@
</tr></thead>
<tbody>
<tr class="odd">
<td align="center">2016-08-15</td>
<td align="center">E9</td>
<td align="center">Hospital C</td>
<td align="center">2017-12-27</td>
<td align="center">C5</td>
<td align="center">Hospital A</td>
<td align="center">Staphylococcus aureus</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">M</td>
</tr>
<tr class="even">
<td align="center">2013-02-21</td>
<td align="center">I8</td>
<td align="center">2012-11-20</td>
<td align="center">Y5</td>
<td align="center">Hospital D</td>
<td align="center">Streptococcus pneumoniae</td>
<td align="center">Escherichia coli</td>
<td align="center">S</td>
<td align="center">I</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">M</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">F</td>
</tr>
<tr class="odd">
<td align="center">2015-05-12</td>
<td align="center">F4</td>
<td align="center">2011-06-25</td>
<td align="center">I4</td>
<td align="center">Hospital B</td>
<td align="center">Escherichia coli</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">M</td>
</tr>
<tr class="even">
<td align="center">2013-09-20</td>
<td align="center">M3</td>
<td align="center">2016-10-16</td>
<td align="center">N10</td>
<td align="center">Hospital A</td>
<td align="center">Escherichia coli</td>
<td align="center">S</td>
<td align="center">I</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">M</td>
<td align="center">S</td>
<td align="center">F</td>
</tr>
<tr class="odd">
<td align="center">2013-08-03</td>
<td align="center">V8</td>
<td align="center">Hospital C</td>
<td align="center">Streptococcus pneumoniae</td>
<td align="center">S</td>
<td align="center">2015-09-01</td>
<td align="center">L1</td>
<td align="center">Hospital D</td>
<td align="center">Escherichia coli</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">F</td>
<td align="center">M</td>
</tr>
<tr class="even">
<td align="center">2017-10-01</td>
<td align="center">V9</td>
<td align="center">Hospital C</td>
<td align="center">Staphylococcus aureus</td>
<td align="center">2013-12-14</td>
<td align="center">Z3</td>
<td align="center">Hospital A</td>
<td align="center">Klebsiella pneumoniae</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">I</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">F</td>
@ -404,8 +411,8 @@
#&gt;
#&gt; Item Count Percent Cum. Count Cum. Percent
#&gt; --- ----- ------- -------- ----------- -------------
#&gt; 1 M 10,444 52.2% 10,444 52.2%
#&gt; 2 F 9,556 47.8% 20,000 100.0%</code></pre>
#&gt; 1 M 10,432 52.2% 10,432 52.2%
#&gt; 2 F 9,568 47.8% 20,000 100.0%</code></pre>
<p>So, we can draw at least two conclusions immediately. From a data scientist perspective, the data looks clean: only values <code>M</code> and <code>F</code>. From a researcher perspective: there are slightly more men. Nothing we didnt already know.</p>
<p>The data is already quite clean, but we still need to transform some variables. The <code>bacteria</code> column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code> function of the <code>dplyr</code> package makes this really easy:</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb12-1" title="1">data &lt;-<span class="st"> </span>data <span class="op">%&gt;%</span></a>
@ -436,10 +443,10 @@
<a class="sourceLine" id="cb14-19" title="19"><span class="co">#&gt; Kingella kingae (no changes)</span></a>
<a class="sourceLine" id="cb14-20" title="20"><span class="co">#&gt; </span></a>
<a class="sourceLine" id="cb14-21" title="21"><span class="co">#&gt; EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)</span></a>
<a class="sourceLine" id="cb14-22" title="22"><span class="co">#&gt; Table 1: Intrinsic resistance in Enterobacteriaceae (1293 changes)</span></a>
<a class="sourceLine" id="cb14-22" title="22"><span class="co">#&gt; Table 1: Intrinsic resistance in Enterobacteriaceae (1241 changes)</span></a>
<a class="sourceLine" id="cb14-23" title="23"><span class="co">#&gt; Table 2: Intrinsic resistance in non-fermentative Gram-negative bacteria (no changes)</span></a>
<a class="sourceLine" id="cb14-24" title="24"><span class="co">#&gt; Table 3: Intrinsic resistance in other Gram-negative bacteria (no changes)</span></a>
<a class="sourceLine" id="cb14-25" title="25"><span class="co">#&gt; Table 4: Intrinsic resistance in Gram-positive bacteria (2812 changes)</span></a>
<a class="sourceLine" id="cb14-25" title="25"><span class="co">#&gt; Table 4: Intrinsic resistance in Gram-positive bacteria (2713 changes)</span></a>
<a class="sourceLine" id="cb14-26" title="26"><span class="co">#&gt; Table 8: Interpretive rules for B-lactam agents and Gram-positive cocci (no changes)</span></a>
<a class="sourceLine" id="cb14-27" title="27"><span class="co">#&gt; Table 9: Interpretive rules for B-lactam agents and Gram-negative rods (no changes)</span></a>
<a class="sourceLine" id="cb14-28" title="28"><span class="co">#&gt; Table 10: Interpretive rules for B-lactam agents and other Gram-negative bacteria (no changes)</span></a>
@ -455,9 +462,9 @@
<a class="sourceLine" id="cb14-38" title="38"><span class="co">#&gt; Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no changes)</span></a>
<a class="sourceLine" id="cb14-39" title="39"><span class="co">#&gt; Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no changes)</span></a>
<a class="sourceLine" id="cb14-40" title="40"><span class="co">#&gt; </span></a>
<a class="sourceLine" id="cb14-41" title="41"><span class="co">#&gt; =&gt; EUCAST rules affected 7,447 out of 20,000 rows</span></a>
<a class="sourceLine" id="cb14-41" title="41"><span class="co">#&gt; =&gt; EUCAST rules affected 7,301 out of 20,000 rows</span></a>
<a class="sourceLine" id="cb14-42" title="42"><span class="co">#&gt; -&gt; added 0 test results</span></a>
<a class="sourceLine" id="cb14-43" title="43"><span class="co">#&gt; -&gt; changed 4,105 test results (0 to S; 0 to I; 4,105 to R)</span></a></code></pre></div>
<a class="sourceLine" id="cb14-43" title="43"><span class="co">#&gt; -&gt; changed 3,954 test results (0 to S; 0 to I; 3,954 to R)</span></a></code></pre></div>
</div>
<div id="adding-new-variables" class="section level1">
<h1 class="hasAnchor">
@ -482,8 +489,8 @@
<a class="sourceLine" id="cb16-3" title="3"><span class="co">#&gt; </span><span class="al">NOTE</span><span class="co">: Using column `bacteria` as input for `col_mo`.</span></a>
<a class="sourceLine" id="cb16-4" title="4"><span class="co">#&gt; </span><span class="al">NOTE</span><span class="co">: Using column `date` as input for `col_date`.</span></a>
<a class="sourceLine" id="cb16-5" title="5"><span class="co">#&gt; </span><span class="al">NOTE</span><span class="co">: Using column `patient_id` as input for `col_patient_id`.</span></a>
<a class="sourceLine" id="cb16-6" title="6"><span class="co">#&gt; =&gt; Found 5,692 first isolates (28.5% of total)</span></a></code></pre></div>
<p>So only 28.5% is suitable for resistance analysis! We can now filter on it with the <code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code> function, also from the <code>dplyr</code> package:</p>
<a class="sourceLine" id="cb16-6" title="6"><span class="co">#&gt; =&gt; Found 5,669 first isolates (28.3% of total)</span></a></code></pre></div>
<p>So only 28.3% is suitable for resistance analysis! We can now filter on it with the <code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code> function, also from the <code>dplyr</code> package:</p>
<div class="sourceCode" id="cb17"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb17-1" title="1">data_1st &lt;-<span class="st"> </span>data <span class="op">%&gt;%</span><span class="st"> </span></a>
<a class="sourceLine" id="cb17-2" title="2"><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/filter.html">filter</a></span>(first <span class="op">==</span><span class="st"> </span><span class="ot">TRUE</span>)</a></code></pre></div>
<p>For future use, the above two syntaxes can be shortened with the <code><a href="../reference/first_isolate.html">filter_first_isolate()</a></code> function:</p>
@ -509,10 +516,10 @@
<tbody>
<tr class="odd">
<td align="center">1</td>
<td align="center">2010-06-20</td>
<td align="center">Y4</td>
<td align="center">2010-01-04</td>
<td align="center">M1</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">S</td>
@ -520,10 +527,10 @@
</tr>
<tr class="even">
<td align="center">2</td>
<td align="center">2010-07-31</td>
<td align="center">Y4</td>
<td align="center">2010-01-27</td>
<td align="center">M1</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">S</td>
@ -531,8 +538,8 @@
</tr>
<tr class="odd">
<td align="center">3</td>
<td align="center">2010-08-26</td>
<td align="center">Y4</td>
<td align="center">2010-05-07</td>
<td align="center">M1</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">S</td>
<td align="center">S</td>
@ -542,8 +549,8 @@
</tr>
<tr class="even">
<td align="center">4</td>
<td align="center">2010-12-11</td>
<td align="center">Y4</td>
<td align="center">2010-06-09</td>
<td align="center">M1</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">R</td>
<td align="center">S</td>
@ -553,8 +560,8 @@
</tr>
<tr class="odd">
<td align="center">5</td>
<td align="center">2010-12-30</td>
<td align="center">Y4</td>
<td align="center">2010-07-23</td>
<td align="center">M1</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">R</td>
<td align="center">S</td>
@ -564,30 +571,30 @@
</tr>
<tr class="even">
<td align="center">6</td>
<td align="center">2011-04-02</td>
<td align="center">Y4</td>
<td align="center">2010-09-29</td>
<td align="center">M1</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">R</td>
<td align="center">I</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">FALSE</td>
</tr>
<tr class="odd">
<td align="center">7</td>
<td align="center">2011-04-06</td>
<td align="center">Y4</td>
<td align="center">2010-10-14</td>
<td align="center">M1</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">FALSE</td>
</tr>
<tr class="even">
<td align="center">8</td>
<td align="center">2011-04-07</td>
<td align="center">Y4</td>
<td align="center">2010-10-15</td>
<td align="center">M1</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">S</td>
<td align="center">S</td>
@ -597,29 +604,29 @@
</tr>
<tr class="odd">
<td align="center">9</td>
<td align="center">2011-05-28</td>
<td align="center">Y4</td>
<td align="center">2010-10-16</td>
<td align="center">M1</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">FALSE</td>
</tr>
<tr class="even">
<td align="center">10</td>
<td align="center">2011-09-09</td>
<td align="center">Y4</td>
<td align="center">2010-11-16</td>
<td align="center">M1</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">TRUE</td>
<td align="center">S</td>
<td align="center">FALSE</td>
</tr>
</tbody>
</table>
<p>Only 2 isolates are marked as first according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The <code><a href="../reference/key_antibiotics.html">key_antibiotics()</a></code> function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.</p>
<p>Only 1 isolates are marked as first according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The <code><a href="../reference/key_antibiotics.html">key_antibiotics()</a></code> function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.</p>
<p>If a column exists with a name like key(…)ab the <code><a href="../reference/first_isolate.html">first_isolate()</a></code> function will automatically use it and determine the first weighted isolates. Mind the NOTEs in below output:</p>
<div class="sourceCode" id="cb19"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb19-1" title="1">data &lt;-<span class="st"> </span>data <span class="op">%&gt;%</span><span class="st"> </span></a>
<a class="sourceLine" id="cb19-2" title="2"><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate</a></span>(<span class="dt">keyab =</span> <span class="kw"><a href="../reference/key_antibiotics.html">key_antibiotics</a></span>(.)) <span class="op">%&gt;%</span><span class="st"> </span></a>
@ -630,7 +637,7 @@
<a class="sourceLine" id="cb19-7" title="7"><span class="co">#&gt; </span><span class="al">NOTE</span><span class="co">: Using column `patient_id` as input for `col_patient_id`.</span></a>
<a class="sourceLine" id="cb19-8" title="8"><span class="co">#&gt; </span><span class="al">NOTE</span><span class="co">: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics = FALSE to prevent this.</span></a>
<a class="sourceLine" id="cb19-9" title="9"><span class="co">#&gt; [Criterion] Inclusion based on key antibiotics, ignoring I.</span></a>
<a class="sourceLine" id="cb19-10" title="10"><span class="co">#&gt; =&gt; Found 15,866 first weighted isolates (79.3% of total)</span></a></code></pre></div>
<a class="sourceLine" id="cb19-10" title="10"><span class="co">#&gt; =&gt; Found 15,859 first weighted isolates (79.3% of total)</span></a></code></pre></div>
<table class="table">
<thead><tr class="header">
<th align="center">isolate</th>
@ -647,10 +654,10 @@
<tbody>
<tr class="odd">
<td align="center">1</td>
<td align="center">2010-06-20</td>
<td align="center">Y4</td>
<td align="center">2010-01-04</td>
<td align="center">M1</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">S</td>
@ -659,10 +666,10 @@
</tr>
<tr class="even">
<td align="center">2</td>
<td align="center">2010-07-31</td>
<td align="center">Y4</td>
<td align="center">2010-01-27</td>
<td align="center">M1</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">S</td>
@ -671,20 +678,20 @@
</tr>
<tr class="odd">
<td align="center">3</td>
<td align="center">2010-08-26</td>
<td align="center">Y4</td>
<td align="center">2010-05-07</td>
<td align="center">M1</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">FALSE</td>
<td align="center">FALSE</td>
<td align="center">TRUE</td>
</tr>
<tr class="even">
<td align="center">4</td>
<td align="center">2010-12-11</td>
<td align="center">Y4</td>
<td align="center">2010-06-09</td>
<td align="center">M1</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">R</td>
<td align="center">S</td>
@ -695,8 +702,8 @@
</tr>
<tr class="odd">
<td align="center">5</td>
<td align="center">2010-12-30</td>
<td align="center">Y4</td>
<td align="center">2010-07-23</td>
<td align="center">M1</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">R</td>
<td align="center">S</td>
@ -707,32 +714,32 @@
</tr>
<tr class="even">
<td align="center">6</td>
<td align="center">2011-04-02</td>
<td align="center">Y4</td>
<td align="center">2010-09-29</td>
<td align="center">M1</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">R</td>
<td align="center">I</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">FALSE</td>
<td align="center">TRUE</td>
</tr>
<tr class="odd">
<td align="center">7</td>
<td align="center">2011-04-06</td>
<td align="center">Y4</td>
<td align="center">2010-10-14</td>
<td align="center">M1</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">FALSE</td>
<td align="center">TRUE</td>
</tr>
<tr class="even">
<td align="center">8</td>
<td align="center">2011-04-07</td>
<td align="center">Y4</td>
<td align="center">2010-10-15</td>
<td align="center">M1</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">S</td>
<td align="center">S</td>
@ -743,11 +750,11 @@
</tr>
<tr class="odd">
<td align="center">9</td>
<td align="center">2011-05-28</td>
<td align="center">Y4</td>
<td align="center">2010-10-16</td>
<td align="center">M1</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">FALSE</td>
@ -755,23 +762,23 @@
</tr>
<tr class="even">
<td align="center">10</td>
<td align="center">2011-09-09</td>
<td align="center">Y4</td>
<td align="center">2010-11-16</td>
<td align="center">M1</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">TRUE</td>
<td align="center">S</td>
<td align="center">FALSE</td>
<td align="center">TRUE</td>
</tr>
</tbody>
</table>
<p>Instead of 2, now 8 isolates are flagged. In total, 79.3% of all isolates are marked first weighted - 50.9% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.</p>
<p>Instead of 1, now 9 isolates are flagged. In total, 79.3% of all isolates are marked first weighted - 51% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.</p>
<p>As with <code><a href="../reference/first_isolate.html">filter_first_isolate()</a></code>, theres a shortcut for this new algorithm too:</p>
<div class="sourceCode" id="cb20"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb20-1" title="1">data_1st &lt;-<span class="st"> </span>data <span class="op">%&gt;%</span><span class="st"> </span></a>
<a class="sourceLine" id="cb20-2" title="2"><span class="st"> </span><span class="kw"><a href="../reference/first_isolate.html">filter_first_weighted_isolate</a></span>()</a></code></pre></div>
<p>So we end up with 15,866 isolates for analysis.</p>
<p>So we end up with 15,859 isolates for analysis.</p>
<p>We can remove unneeded columns:</p>
<div class="sourceCode" id="cb21"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb21-1" title="1">data_1st &lt;-<span class="st"> </span>data_1st <span class="op">%&gt;%</span><span class="st"> </span></a>
<a class="sourceLine" id="cb21-2" title="2"><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/select.html">select</a></span>(<span class="op">-</span><span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/c">c</a></span>(first, keyab))</a></code></pre></div>
@ -796,44 +803,28 @@
</tr></thead>
<tbody>
<tr class="odd">
<td>1</td>
<td align="center">2016-08-15</td>
<td align="center">E9</td>
<td align="center">Hospital C</td>
<td align="center">B_STPHY_AUR</td>
<td>4</td>
<td align="center">2016-10-16</td>
<td align="center">N10</td>
<td align="center">Hospital A</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">M</td>
<td align="center">Gram positive</td>
<td align="center">Staphylococcus</td>
<td align="center">aureus</td>
<td align="center">S</td>
<td align="center">F</td>
<td align="center">Gram negative</td>
<td align="center">Escherichia</td>
<td align="center">coli</td>
<td align="center">TRUE</td>
</tr>
<tr class="even">
<td>2</td>
<td align="center">2013-02-21</td>
<td align="center">I8</td>
<td>5</td>
<td align="center">2015-09-01</td>
<td align="center">L1</td>
<td align="center">Hospital D</td>
<td align="center">B_STRPTC_PNE</td>
<td align="center">S</td>
<td align="center">I</td>
<td align="center">R</td>
<td align="center">R</td>
<td align="center">M</td>
<td align="center">Gram positive</td>
<td align="center">Streptococcus</td>
<td align="center">pneumoniae</td>
<td align="center">TRUE</td>
</tr>
<tr class="odd">
<td>3</td>
<td align="center">2015-05-12</td>
<td align="center">F4</td>
<td align="center">Hospital B</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">S</td>
@ -843,30 +834,30 @@
<td align="center">coli</td>
<td align="center">TRUE</td>
</tr>
<tr class="even">
<td>5</td>
<td align="center">2013-08-03</td>
<td align="center">V8</td>
<td align="center">Hospital C</td>
<td align="center">B_STRPTC_PNE</td>
<tr class="odd">
<td>6</td>
<td align="center">2013-12-14</td>
<td align="center">Z3</td>
<td align="center">Hospital A</td>
<td align="center">B_KLBSL_PNE</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">F</td>
<td align="center">Gram positive</td>
<td align="center">Streptococcus</td>
<td align="center">Gram negative</td>
<td align="center">Klebsiella</td>
<td align="center">pneumoniae</td>
<td align="center">TRUE</td>
</tr>
<tr class="odd">
<td>6</td>
<td align="center">2017-10-01</td>
<td align="center">V9</td>
<td align="center">Hospital C</td>
<tr class="even">
<td>7</td>
<td align="center">2011-09-19</td>
<td align="center">U9</td>
<td align="center">Hospital A</td>
<td align="center">B_STPHY_AUR</td>
<td align="center">S</td>
<td align="center">I</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">F</td>
@ -875,15 +866,31 @@
<td align="center">aureus</td>
<td align="center">TRUE</td>
</tr>
<tr class="odd">
<td>8</td>
<td align="center">2016-02-23</td>
<td align="center">H1</td>
<td align="center">Hospital A</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">M</td>
<td align="center">Gram negative</td>
<td align="center">Escherichia</td>
<td align="center">coli</td>
<td align="center">TRUE</td>
</tr>
<tr class="even">
<td>7</td>
<td align="center">2014-06-20</td>
<td align="center">E9</td>
<td>9</td>
<td align="center">2016-08-02</td>
<td align="center">I4</td>
<td align="center">Hospital B</td>
<td align="center">B_ESCHR_COL</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">S</td>
<td align="center">R</td>
<td align="center">S</td>
<td align="center">M</td>
<td align="center">Gram negative</td>
@ -908,9 +915,9 @@
<div class="sourceCode" id="cb23"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb23-1" title="1"><span class="kw"><a href="../reference/freq.html">freq</a></span>(<span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/paste">paste</a></span>(data_1st<span class="op">$</span>genus, data_1st<span class="op">$</span>species))</a></code></pre></div>
<p>Or can be used like the <code>dplyr</code> way, which is easier readable:</p>
<div class="sourceCode" id="cb24"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb24-1" title="1">data_1st <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="../reference/freq.html">freq</a></span>(genus, species)</a></code></pre></div>
<p><strong>Frequency table of <code>genus</code> and <code>species</code> from a <code>data.frame</code> (15,866 x 13)</strong></p>
<p><strong>Frequency table of <code>genus</code> and <code>species</code> from a <code>data.frame</code> (15,859 x 13)</strong></p>
<p>Columns: 2<br>
Length: 15,866 (of which NA: 0 = 0.00%)<br>
Length: 15,859 (of which NA: 0 = 0.00%)<br>
Unique: 4</p>
<p>Shortest: 16<br>
Longest: 24</p>
@ -927,33 +934,33 @@ Longest: 24</p>
<tr class="odd">
<td align="left">1</td>
<td align="left">Escherichia coli</td>
<td align="right">7,860</td>
<td align="right">49.5%</td>
<td align="right">7,860</td>
<td align="right">49.5%</td>
<td align="right">7,917</td>
<td align="right">49.9%</td>
<td align="right">7,917</td>
<td align="right">49.9%</td>
</tr>
<tr class="even">
<td align="left">2</td>
<td align="left">Staphylococcus aureus</td>
<td align="right">3,892</td>
<td align="right">24.5%</td>
<td align="right">11,752</td>
<td align="right">74.1%</td>
<td align="right">3,901</td>
<td align="right">24.6%</td>
<td align="right">11,818</td>
<td align="right">74.5%</td>
</tr>
<tr class="odd">
<td align="left">3</td>
<td align="left">Streptococcus pneumoniae</td>
<td align="right">2,556</td>
<td align="right">16.1%</td>
<td align="right">14,308</td>
<td align="right">90.2%</td>
<td align="right">2,478</td>
<td align="right">15.6%</td>
<td align="right">14,296</td>
<td align="right">90.1%</td>
</tr>
<tr class="even">
<td align="left">4</td>
<td align="left">Klebsiella pneumoniae</td>
<td align="right">1,558</td>
<td align="right">9.8%</td>
<td align="right">15,866</td>
<td align="right">1,563</td>
<td align="right">9.9%</td>
<td align="right">15,859</td>
<td align="right">100.0%</td>
</tr>
</tbody>
@ -964,7 +971,7 @@ Longest: 24</p>
<a href="#resistance-percentages" class="anchor"></a>Resistance percentages</h2>
<p>The functions <code>portion_R</code>, <code>portion_RI</code>, <code>portion_I</code>, <code>portion_IS</code> and <code>portion_S</code> can be used to determine the portion of a specific antimicrobial outcome. They can be used on their own:</p>
<div class="sourceCode" id="cb25"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb25-1" title="1">data_1st <span class="op">%&gt;%</span><span class="st"> </span><span class="kw"><a href="../reference/portion.html">portion_IR</a></span>(amox)</a>
<a class="sourceLine" id="cb25-2" title="2"><span class="co">#&gt; [1] 0.4766797</span></a></code></pre></div>
<a class="sourceLine" id="cb25-2" title="2"><span class="co">#&gt; [1] 0.4776468</span></a></code></pre></div>
<p>Or can be used in conjuction with <code><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by()</a></code> and <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarise()</a></code>, both from the <code>dplyr</code> package:</p>
<div class="sourceCode" id="cb26"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb26-1" title="1">data_1st <span class="op">%&gt;%</span><span class="st"> </span></a>
<a class="sourceLine" id="cb26-2" title="2"><span class="st"> </span><span class="kw"><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by</a></span>(hospital) <span class="op">%&gt;%</span><span class="st"> </span></a>
@ -977,19 +984,19 @@ Longest: 24</p>
<tbody>
<tr class="odd">
<td align="center">Hospital A</td>
<td align="center">0.4749633</td>
<td align="center">0.4658398</td>
</tr>
<tr class="even">
<td align="center">Hospital B</td>
<td align="center">0.4879713</td>
<td align="center">0.4821041</td>
</tr>
<tr class="odd">
<td align="center">Hospital C</td>
<td align="center">0.4761905</td>
<td align="center">0.4874459</td>
</tr>
<tr class="even">
<td align="center">Hospital D</td>
<td align="center">0.4600062</td>
<td align="center">0.4803681</td>
</tr>
</tbody>
</table>
@ -1007,23 +1014,23 @@ Longest: 24</p>
<tbody>
<tr class="odd">
<td align="center">Hospital A</td>
<td align="center">0.4749633</td>
<td align="center">4773</td>
<td align="center">0.4658398</td>
<td align="center">4757</td>
</tr>
<tr class="even">
<td align="center">Hospital B</td>
<td align="center">0.4879713</td>
<td align="center">5570</td>
<td align="center">0.4821041</td>
<td align="center">5532</td>
</tr>
<tr class="odd">
<td align="center">Hospital C</td>
<td align="center">0.4761905</td>
<td align="center">0.4874459</td>
<td align="center">2310</td>
</tr>
<tr class="even">
<td align="center">Hospital D</td>
<td align="center">0.4600062</td>
<td align="center">3213</td>
<td align="center">0.4803681</td>
<td align="center">3260</td>
</tr>
</tbody>
</table>
@ -1043,27 +1050,27 @@ Longest: 24</p>
<tbody>
<tr class="odd">
<td align="center">Escherichia</td>
<td align="center">0.7249364</td>
<td align="center">0.8996183</td>
<td align="center">0.9725191</td>
<td align="center">0.7305798</td>
<td align="center">0.8974359</td>
<td align="center">0.9760010</td>
</tr>
<tr class="even">
<td align="center">Klebsiella</td>
<td align="center">0.7291399</td>
<td align="center">0.9037227</td>
<td align="center">0.9762516</td>
<td align="center">0.7370441</td>
<td align="center">0.9040307</td>
<td align="center">0.9776072</td>
</tr>
<tr class="odd">
<td align="center">Staphylococcus</td>
<td align="center">0.7410072</td>
<td align="center">0.9229188</td>
<td align="center">0.9786742</td>
<td align="center">0.7303256</td>
<td align="center">0.9200205</td>
<td align="center">0.9771853</td>
</tr>
<tr class="even">
<td align="center">Streptococcus</td>
<td align="center">0.7292645</td>
<td align="center">0.7332526</td>
<td align="center">0.0000000</td>
<td align="center">0.7292645</td>
<td align="center">0.7332526</td>
</tr>
</tbody>
</table>

Binary file not shown.

Before

Width:  |  Height:  |  Size: 33 KiB

After

Width:  |  Height:  |  Size: 33 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 21 KiB

After

Width:  |  Height:  |  Size: 21 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 68 KiB

After

Width:  |  Height:  |  Size: 68 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 50 KiB

After

Width:  |  Height:  |  Size: 50 KiB

@ -83,6 +83,13 @@
Work with WHONET data
</a>
</li>
<li>
<a href="../articles/SPSS.html">
<span class="fa fa-file-upload"></span>
Import data from SPSS/SAS/Stata
</a>
</li>
<li>
<a href="../articles/EUCAST.html">
<span class="fa fa-exchange-alt"></span>

@ -83,6 +83,13 @@
Work with WHONET data
</a>
</li>
<li>
<a href="../articles/SPSS.html">
<span class="fa fa-file-upload"></span>
Import data from SPSS/SAS/Stata
</a>
</li>
<li>
<a href="../articles/EUCAST.html">
<span class="fa fa-exchange-alt"></span>

@ -0,0 +1,397 @@
<!DOCTYPE html>
<!-- Generated by pkgdown: do not edit by hand --><html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>How to import data from SPSS / SAS / Stata • AMR (for R)</title>
<!-- favicons --><link rel="icon" type="image/png" sizes="16x16" href="../favicon-16x16.png">
<link rel="icon" type="image/png" sizes="32x32" href="../favicon-32x32.png">
<link rel="apple-touch-icon" type="image/png" sizes="180x180" href="../apple-touch-icon.png">
<link rel="apple-touch-icon" type="image/png" sizes="120x120" href="../apple-touch-icon-120x120.png">
<link rel="apple-touch-icon" type="image/png" sizes="76x76" href="../apple-touch-icon-76x76.png">
<link rel="apple-touch-icon" type="image/png" sizes="60x60" href="../apple-touch-icon-60x60.png">
<!-- jquery --><script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script><!-- Bootstrap --><link href="https://cdnjs.cloudflare.com/ajax/libs/bootswatch/3.3.7/flatly/bootstrap.min.css" rel="stylesheet" crossorigin="anonymous">
<script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha256-U5ZEeKfGNOja007MMD3YBI0A3OSZOQbeG6z2f2Y0hu8=" crossorigin="anonymous"></script><!-- Font Awesome icons --><link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css" integrity="sha256-eZrrJcwDc/3uDhsdt61sL2oOBY362qM3lon1gyExkL0=" crossorigin="anonymous">
<!-- clipboard.js --><script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.4/clipboard.min.js" integrity="sha256-FiZwavyI2V6+EXO1U+xzLG3IKldpiTFf3153ea9zikQ=" crossorigin="anonymous"></script><!-- sticky kit --><script src="https://cdnjs.cloudflare.com/ajax/libs/sticky-kit/1.1.3/sticky-kit.min.js" integrity="sha256-c4Rlo1ZozqTPE2RLuvbusY3+SU1pQaJC0TjuhygMipw=" crossorigin="anonymous"></script><!-- pkgdown --><link href="../pkgdown.css" rel="stylesheet">
<script src="../pkgdown.js"></script><!-- docsearch --><script src="../docsearch.js"></script><link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/docsearch.js/2.6.1/docsearch.min.css" integrity="sha256-QOSRU/ra9ActyXkIBbiIB144aDBdtvXBcNc3OTNuX/Q=" crossorigin="anonymous">
<link href="../docsearch.css" rel="stylesheet">
<script src="https://cdnjs.cloudflare.com/ajax/libs/mark.js/8.11.1/jquery.mark.min.js" integrity="sha256-4HLtjeVgH0eIB3aZ9mLYF6E8oU5chNdjU6p6rrXpl9U=" crossorigin="anonymous"></script><link href="../extra.css" rel="stylesheet">
<script src="../extra.js"></script><meta property="og:title" content="How to import data from SPSS / SAS / Stata">
<meta property="og:description" content="">
<meta property="og:image" content="https://msberends.gitlab.io/AMR/logo.png">
<meta name="twitter:card" content="summary">
<!-- mathjax --><script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js" integrity="sha256-nvJJv9wWKEm88qvoQl9ekL2J+k/RWIsaSScxxlsrv8k=" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/config/TeX-AMS-MML_HTMLorMML.js" integrity="sha256-84DKXVJXs0/F8OTMzX4UR909+jtl4G7SPypPavF+GfA=" crossorigin="anonymous"></script><!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<div class="container template-article">
<header><div class="navbar navbar-default navbar-fixed-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<span class="navbar-brand">
<a class="navbar-link" href="../index.html">AMR (for R)</a>
<span class="version label label-default" data-toggle="tooltip" data-placement="bottom" title="Released version">0.5.0.9017</span>
</span>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav">
<li>
<a href="../index.html">
<span class="fa fa-home"></span>
Home
</a>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">
<span class="fa fa-question-circle"></span>
How to
<span class="caret"></span>
</a>
<ul class="dropdown-menu" role="menu">
<li>
<a href="../articles/AMR.html">
<span class="fa fa-directions"></span>
Conduct AMR analysis
</a>
</li>
<li>
<a href="../articles/resistance_predict.html">
<span class="fa fa-dice"></span>
Predict antimicrobial resistance
</a>
</li>
<li>
<a href="../articles/WHONET.html">
<span class="fa fa-globe-americas"></span>
Work with WHONET data
</a>
</li>
<li>
<a href="../articles/SPSS.html">
<span class="fa fa-file-upload"></span>
Import data from SPSS/SAS/Stata
</a>
</li>
<li>
<a href="../articles/EUCAST.html">
<span class="fa fa-exchange-alt"></span>
Apply EUCAST rules
</a>
</li>
<li>
<a href="../reference/mo_property.html">
<span class="fa fa-bug"></span>
Get properties of a microorganism
</a>
</li>
<li>
<a href="../reference/atc_property.html">
<span class="fa fa-capsules"></span>
Get properties of an antibiotic
</a>
</li>
<li>
<a href="../articles/freq.html">
<span class="fa fa-sort-amount-down"></span>
Create frequency tables
</a>
</li>
<li>
<a href="../articles/G_test.html">
<span class="fa fa-clipboard-check"></span>
Use the G-test
</a>
</li>
<li>
<a href="../articles/benchmarks.html">
<span class="fa fa-shipping-fast"></span>
Other: benchmarks
</a>
</li>
</ul>
</li>
<li>
<a href="../reference/">
<span class="fa fa-book-open"></span>
Manual
</a>
</li>
<li>
<a href="../authors.html">
<span class="fa fa-users"></span>
Authors
</a>
</li>
<li>
<a href="../news/">
<span class="far fa far fa-newspaper"></span>
Changelog
</a>
</li>
</ul>
<ul class="nav navbar-nav navbar-right">
<li>
<a href="https://gitlab.com/msberends/AMR">
<span class="fab fa fab fa-gitlab"></span>
Source Code
</a>
</li>
<li>
<a href="../LICENSE-text.html">
<span class="fa fa-book"></span>
Licence
</a>
</li>
</ul>
<form class="navbar-form navbar-right" role="search">
<div class="form-group">
<input type="search" class="form-control" name="search-input" id="search-input" placeholder="Search..." aria-label="Search for..." autocomplete="off">
</div>
</form>
</div>
<!--/.nav-collapse -->
</div>
<!--/.container -->
</div>
<!--/.navbar -->
</header><div class="row">
<div class="col-md-9 contents">
<div class="page-header toc-ignore">
<h1>How to import data from SPSS / SAS / Stata</h1>
<h4 class="author">Matthijs S. Berends</h4>
<h4 class="date">14 February 2019</h4>
<div class="hidden name"><code>SPSS.Rmd</code></div>
</div>
<div id="spss-sas-stata" class="section level2">
<h2 class="hasAnchor">
<a href="#spss-sas-stata" class="anchor"></a>SPSS / SAS / Stata</h2>
<p>SPSS (Statistical Package for the Social Sciences) is probably the most well-known software package for statistical analysis. SPSS is easier to learn than R, because in SPSS you only have to click a menu to run parts of your analysis. Because of its user-friendlyness, it is taught at universities and particularly useful for students who are new to statistics. From my experience, I would guess that pretty much all (bio)medical students know it at the time they graduate. SAS and Stata are statistical packages popular in big industries.</p>
</div>
<div id="compared-to-r" class="section level2">
<h2 class="hasAnchor">
<a href="#compared-to-r" class="anchor"></a>Compared to R</h2>
<p>As said, SPSS is easier to learn than R. But SPSS, SAS and Stata come with major downsides when comparing it with R:</p>
<ul>
<li>
<p><strong>R is highly modular.</strong></p>
<p>The <a href="https://cran.r-project.org/web/packages/">official R network (CRAN)</a> features almost 14,000 packages at the time of writing, our <code>AMR</code> package being one of them. All these packages were peer-reviewed before publication. Aside from this official channel, there are also developers who choose not to submit to CRAN, but rather keep it on their own public repository, like GitLab or GitHub. So there may even be a lot more than 14,000 packages out there.</p>
<p>Bottomline is, you can really extend it yourself or ask somebody to do this for you. Take for example our <code>AMR</code> package. SPSS, SAS and Stata will never know what a valid MIC value is (so data might not be clean) or what the Gram stain of <em>E. coli</em> is. Or the fact that all species of <em>Klebiella</em> are resistant to amoxicillin.</p>
</li>
<li>
<p><strong>R is extremely flexible.</strong></p>
<p>Because you write the syntax yourself, you can do anything you want. The flexibility in transforming, gathering, grouping, summarising and drawing plots is endless - with SPSS, SAS or Stata you are bound to their algorithms and styles. It may be a bit flexible, but you can never create that very specific publication-ready plot without using other (paid) software.</p>
</li>
<li>
<p><strong>R can be easily automated.</strong></p>
<p>Over the last years, <a href="https://rmarkdown.rstudio.com/">R Markdown</a> has really made an interesting development. With R Markdown, you can very easily reproduce your reports, whether its to Word, Powerpoint, a website, a PDF document or just the raw data to Excel. I use this a lot to generate monthly reports automatically. Just write the code once and enjoy the automatically updated reports at any interval you like.</p>
<p>For an even more professional environment, you could create <a href="https://shiny.rstudio.com/">Shiny apps</a>: live manipulation of data using a custom made website. The webdesign knowledge needed (Javascript, CSS, HTML) is almost <em>zero</em>.</p>
</li>
<li>
<p><strong>R has a huge community.</strong></p>
<p>Many R users just ask questions on website like <a href="https://stackoverflow.com">stackoverflow.com</a>, the largest online community for programmers. At the time of writing, around <a href="https://stackoverflow.com/questions/tagged/r?sort=votes">275,000 R questions</a> have been asked on this platform (which covers questions and answer for any programming language). In my own experience, most questions are answered within a couple of minutes.</p>
</li>
<li>
<p><strong>R understands any data type, including SPSS/SAS/Stata.</strong></p>
<p>And thats not vice versa Im afraid. You can import data from any source into R. As said, from SPSS/SAS/Stata (<a href="https://haven.tidyverse.org/">link</a>), but also from Excel (<a href="https://readxl.tidyverse.org/">link</a>), from flat files like CSV, TXT or TSV (<a href="https://readr.tidyverse.org/">link</a>), or directly from databases or datawarehouses from anywhere on the world (<a href="https://dbplyr.tidyverse.org/">link</a>). You can even scrape websites to download tables that are live on the internet (<a href="https://github.com/hadley/rvest">link</a>).</p>
<p>And the best part - you can export from R to all data formats as well. So you can import an SPSS file, do your analysis neatly in R and export back to SPSS. Although you might omit that very last step.</p>
</li>
<li>
<p><strong>R is completely free and open-source.</strong></p>
<p>No strings attached. It was created and is being maintained by volunteers who believe that (data) science should be open and publicly available to everybody. SPSS, SAS and Stata are quite expensive. IBM SPSS Staticstics only comes with subscriptions nowadays, varying <a href="https://www.ibm.com/products/spss-statistics/pricing">between USD 1,300 and USD 8,500</a> per computer <em>per year</em>. SAS Analytics Pro costs <a href="https://www.sas.com/store/products-solutions/sas-analytics-pro/prodPERSANL.html">around USD 10,000</a> per computer. Stata also has a business model with subscription fees, varying <a href="https://www.stata.com/order/new/bus/single-user-licenses/dl/">between USD 600 and USD 1,200</a> per computer per year, but lower prices come with a limitation of the number of variables you can work with.</p>
<p>If you are working at a midsized or small company, you can save it tens of thousands of dollars by using R instead of SPSS - gaining even more functions and flexibility. And all R enthousiasts can do as much PR as they want (like I do here), because nobody is officially associated with or affiliated by R. It is really free.</p>
</li>
</ul>
<p>If you sometimes write syntaxes in SPSS to run a complete analysis or to automate some of your work, you should perhaps do this in R. You will notice that writing syntaxes in R is a lot more nifty and clever than in SPSS.</p>
</div>
<div id="import-data-from-spsssasstata" class="section level2">
<h2 class="hasAnchor">
<a href="#import-data-from-spsssasstata" class="anchor"></a>Import data from SPSS/SAS/Stata</h2>
<div id="rstudio" class="section level3">
<h3 class="hasAnchor">
<a href="#rstudio" class="anchor"></a>RStudio</h3>
<p>To work with R, probably the best option is to use <a href="https://www.rstudio.com/products/rstudio/">RStudio</a>. It is an open-source and free desktop environment which not only allows you to run R code, but also supports project management, version management, package management and convenient import menu to work with other data sources. You can also run <a href="https://www.rstudio.com/products/rstudio/">RStudio Server</a>, which is nothing less than the complete RStudio software available as a website (e.g. in your corporate network or at home).</p>
<p>To import a data file, just click <em>Import Dataset</em> in the Environment tab:</p>
<p><img src="../import1.png"></p>
<p>If additional packages are needed, RStudio will ask you if they should be installed on beforehand.</p>
<p>In the the window that opens, you can define all options (parameters) that should be used for import and youre ready to go:</p>
<p><img src="../import2.png"></p>
<p>If you want named variables to be imported as factors so it resembles SPSS more, use <code><a href="https://haven.tidyverse.org/reference/as_factor.html">as_factor()</a></code>.</p>
<p>The difference is this:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb1-1" title="1">SPSS_data</a>
<a class="sourceLine" id="cb1-2" title="2"><span class="co"># # A tibble: 4,203 x 4</span></a>
<a class="sourceLine" id="cb1-3" title="3"><span class="co"># v001 sex status statusage</span></a>
<a class="sourceLine" id="cb1-4" title="4"><span class="co"># &lt;dbl&gt; &lt;dbl+lbl&gt; &lt;dbl+lbl&gt; &lt;dbl&gt;</span></a>
<a class="sourceLine" id="cb1-5" title="5"><span class="co"># 1 10002 1 1 76.6</span></a>
<a class="sourceLine" id="cb1-6" title="6"><span class="co"># 2 10004 0 1 59.1</span></a>
<a class="sourceLine" id="cb1-7" title="7"><span class="co"># 3 10005 1 1 54.5</span></a>
<a class="sourceLine" id="cb1-8" title="8"><span class="co"># 4 10006 1 1 54.1</span></a>
<a class="sourceLine" id="cb1-9" title="9"><span class="co"># 5 10007 1 1 57.7</span></a>
<a class="sourceLine" id="cb1-10" title="10"><span class