* Function `freq()` has moved to a new package, [`clean`](https://github.com/msberends/clean) ([CRAN link](https://cran.r-project.org/package=clean)). Creating frequency tables is not within the scope of this package (and never was), and this function has matured a lot over the last two years. We therefore created a new package for data cleaning and checking, which is a perfect fit for the `freq()` function. The [`clean`](https://github.com/msberends/clean) package is available on CRAN and will be installed automatically when updating the `AMR` package, which now imports it. At a later stage, the `skewness()` and `kurtosis()` functions will be moved to the `clean` package too.
### New
* Additional way to calculate co-resistance, i.e. when using multiple antibiotics as input for `portion_*` functions or `count_*` functions. This can be used to determine the empiric susceptibility of a combination therapy. A new parameter `only_all_tested` (**which defaults to `FALSE`**) replaces the old `also_single_tested` and can be used to select one of the two methods to count isolates and calculate portions. The difference can be seen in this example table (which is also on the `portion` and `count` help pages), where the %SI is being determined:
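As a rough base-R sketch of how the two counting methods can differ for %SI: this is not the package's implementation. The `isolates` data and the `combined_si()` helper are hypothetical, and the counting rules encoded here are an assumption about how `only_all_tested` behaves.

```r
# Hypothetical toy data: one row per isolate, NA means "not tested"
isolates <- data.frame(
  drug_a = c("S", "R", NA,  "I", "R", NA,  "S", "R", NA),
  drug_b = c("S", "I", "S", "R", "R", "R", NA,  NA,  NA),
  stringsAsFactors = FALSE
)

# Illustrative helper (NOT part of the AMR package).
# Assumed rules:
#  - only_all_tested = TRUE: only isolates tested for both drugs are counted
#  - only_all_tested = FALSE: any isolate with at least one S/I result counts
#    as covered; isolates that are R for both drugs count as not covered
combined_si <- function(a, b, only_all_tested = FALSE) {
  any_si     <- (a %in% c("S", "I")) | (b %in% c("S", "I"))
  all_tested <- !is.na(a) & !is.na(b)
  all_r      <- all_tested & a == "R" & b == "R"
  if (only_all_tested) {
    numerator   <- sum(any_si & all_tested)
    denominator <- sum(all_tested)
  } else {
    numerator   <- sum(any_si)
    denominator <- sum(any_si | all_r)
  }
  c(numerator = numerator,
    denominator = denominator,
    portion = numerator / denominator)
}

combined_si(isolates$drug_a, isolates$drug_b, only_all_tested = FALSE)
combined_si(isolates$drug_a, isolates$drug_b, only_all_tested = TRUE)
```

With this toy data, the first call counts 5 of 6 isolates as covered by the combination, while the second counts 3 of 4, because isolates with a missing result for either drug are dropped.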
<a class="sourceLine" id="cb4-8" data-line-number="8"><span class="co"># </span><span class="al">NOTE</span><span class="co">: Reliability might be improved if these antimicrobial results would be available too: CAP (capreomycin), RIB (rifabutin), RFP (rifapentine)</span></a></code></pre></div>
<p>And review the result with a frequency table:</p>
<p><strong>Frequency table of <code>mdr</code> from <code>my_TB_data</code> (5,000 x 8)</strong></p>
<p>We also created a package dedicated to data cleaning and checking, called the <code>clean</code> package. It gets automatically installed with the <code>AMR</code> package, so we only have to load it:</p>
<p>It contains the <code><a href="https://www.rdocumentation.org/packages/clean/topics/freq">freq()</a></code> function, to create a frequency table:</p>
<a class="sourceLine" id="cb3-5" data-line-number="5"><span class="st"></span><span class="co"># transform everything from "AMP_ND10" to "CIP_EE" to the new `rsi` class</span></a>
<p><strong>Frequency table of <code>mo</code> from <code>data</code> (500 x 54)</strong></p>
<p>No errors or warnings, so all values are transformed successfully.</p>
<p>We created a package dedicated to data cleaning and checking, called the <code>clean</code> package. It gets automatically installed with the <code>AMR</code> package, so we only have to load it:</p>
<p>It contains the <code><a href="https://www.rdocumentation.org/packages/clean/topics/freq">freq()</a></code> function, to create frequency tables.</p>
<p>So let’s check our data, with a couple of frequency tables:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb5-1" data-line-number="1"><span class="co"># our newly created `mo` variable</span></a>
<p>In the table above, all measurements are shown in milliseconds (thousandths of a second). A value of 5 milliseconds means it can determine 200 input values per second. In case of 100 milliseconds, this is only 10 input values per second. The second input is the only one that has to be looked up thoroughly. All the others are known codes (the first one is a WHONET code) or common laboratory codes, or common full organism names like the last one. Full organism names are always preferred.</p>
<p>To achieve this speed, the <code>as.mo</code> function also takes into account the prevalence of human pathogenic microorganisms. The downside is of course that less prevalent microorganisms will be determined more slowly. See this example for the ID of <em>Thermus islandicus</em> (<code>B_THERMS_ISL</code>), a bug probably never found before in humans:</p>
<p>That takes 10.2 times as much time on average. A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance. Full names (like <em>Thermus islandicus</em>) are almost as fast - these are the most probable input from most data sets.</p>
<p>That takes 8.8 times as much time on average. A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance. Full names (like <em>Thermus islandicus</em>) are almost as fast - these are the most probable input from most data sets.</p>
<p>In the figure below, we compare <em>Escherichia coli</em> (which is very common) with <em>Prevotella brevis</em> (which is moderately common) and with <em>Thermus islandicus</em> (which is very uncommon):</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb4-1" data-line-number="1"><span class="kw"><a href="https://www.rdocumentation.org/packages/graphics/topics/par">par</a></span>(<span class="dt">mar =</span> <span class="kw"><a href="https://www.rdocumentation.org/packages/base/topics/c">c</a></span>(<span class="dv">5</span>, <span class="dv">16</span>, <span class="dv">4</span>, <span class="dv">2</span>)) <span class="co"># set more space for left margin text (16)</span></a>
<a class="sourceLine" id="cb6-7" data-line-number="7"><span class="co"># expr min lq mean median uq max neval</span></a>
<a class="sourceLine" id="cb6-8" data-line-number="8"><span class="co"># A 6.730 7.030 8.030 7.750 8.72 9.73 10</span></a>
<a class="sourceLine" id="cb6-9" data-line-number="9"><span class="co"># B 22.400 23.000 27.100 23.600 27.10 46.00 10</span></a>
<a class="sourceLine" id="cb6-10" data-line-number="10"><span class="co"># C 0.835 0.877 0.978 0.925 1.12 1.18 10</span></a></code></pre></div>
<p>So going from <code><a href="../reference/mo_property.html">mo_fullname("Staphylococcus aureus")</a></code> to <code>"Staphylococcus aureus"</code> takes 0.0009 seconds - it doesn’t even start calculating <em>if the result would be the same as the expected resulting value</em>. That goes for all helper functions:</p>
<a class="sourceLine" id="cb6-8" data-line-number="8"><span class="co"># A 6.350 6.600 7.050 6.870 7.35 8.37 10</span></a>
<a class="sourceLine" id="cb6-9" data-line-number="9"><span class="co"># B 21.300 21.500 25.300 22.200 22.70 48.20 10</span></a>
<a class="sourceLine" id="cb6-10" data-line-number="10"><span class="co"># C 0.624 0.753 0.804 0.783 0.87 1.01 10</span></a></code></pre></div>
<p>So going from <code><a href="../reference/mo_property.html">mo_fullname("Staphylococcus aureus")</a></code> to <code>"Staphylococcus aureus"</code> takes 0.0008 seconds - it doesn’t even start calculating <em>if the result would be the same as the expected resulting value</em>. That goes for all helper functions:</p>
<a class="sourceLine" id="cb7-12" data-line-number="12"><span class="co"># expr min lq mean median uq max neval</span></a>
<a class="sourceLine" id="cb7-13" data-line-number="13"><span class="co"># A 0.468 0.470 0.533 0.489 0.595 0.690 10</span></a>
<a class="sourceLine" id="cb7-14" data-line-number="14"><span class="co"># B 0.504 0.513 0.555 0.520 0.571 0.711 10</span></a>
<a class="sourceLine" id="cb7-15" data-line-number="15"><span class="co"># C 0.629 0.687 0.864 0.855 1.050 1.130 10</span></a>
<a class="sourceLine" id="cb7-16" data-line-number="16"><span class="co"># D 0.505 0.515 0.575 0.530 0.649 0.767 10</span></a>
<a class="sourceLine" id="cb7-17" data-line-number="17"><span class="co"># E 0.442 0.457 0.529 0.481 0.531 0.774 10</span></a>
<a class="sourceLine" id="cb7-18" data-line-number="18"><span class="co"># F 0.447 0.510 0.554 0.568 0.609 0.618 10</span></a>
<a class="sourceLine" id="cb7-19" data-line-number="19"><span class="co"># G 0.443 0.470 0.492 0.477 0.506 0.601 10</span></a>
<a class="sourceLine" id="cb7-20" data-line-number="20"><span class="co"># H 0.448 0.459 0.491 0.466 0.515 0.633 10</span></a></code></pre></div>
<a class="sourceLine" id="cb7-13" data-line-number="13"><span class="co"># A 0.436 0.454 0.460 0.460 0.462 0.491 10</span></a>
<a class="sourceLine" id="cb7-14" data-line-number="14"><span class="co"># B 0.472 0.480 0.496 0.488 0.513 0.542 10</span></a>
<a class="sourceLine" id="cb7-15" data-line-number="15"><span class="co"># C 0.657 0.672 0.757 0.750 0.797 0.952 10</span></a>
<a class="sourceLine" id="cb7-16" data-line-number="16"><span class="co"># D 0.478 0.495 0.500 0.499 0.503 0.540 10</span></a>
<a class="sourceLine" id="cb7-17" data-line-number="17"><span class="co"># E 0.436 0.446 0.456 0.448 0.455 0.507 10</span></a>
<a class="sourceLine" id="cb7-18" data-line-number="18"><span class="co"># F 0.437 0.447 0.455 0.454 0.460 0.478 10</span></a>
<a class="sourceLine" id="cb7-19" data-line-number="19"><span class="co"># G 0.428 0.441 0.449 0.447 0.455 0.477 10</span></a>
<a class="sourceLine" id="cb7-20" data-line-number="20"><span class="co"># H 0.438 0.445 0.456 0.451 0.472 0.477 10</span></a></code></pre></div>
<p>Of course, when running <code><a href="../reference/mo_property.html">mo_phylum("Firmicutes")</a></code> the function has zero knowledge about the actual microorganism, namely <em>S. aureus</em>. But since the result would be <code>"Firmicutes"</code> too, there is no point in calculating the result. And because this package “knows” all phyla of all known bacteria (according to the Catalogue of Life), it can just return the initial value immediately.</p>
<li>Descriptive statistics: frequency tables, kurtosis and skewness (<a href="./articles/freq.html">tutorial</a>)</li>
</ul>
<p>This package is ready-to-use for a professional environment by specialists in the following fields:</p>
<p>Medical Microbiology</p>
@@ -340,8 +332,6 @@
<li>Calculate the resistance (and even co-resistance) of microbial isolates with the <code><a href="reference/portion.html">portion_R()</a></code>, <code><a href="reference/portion.html">portion_IR()</a></code>, <code><a href="reference/portion.html">portion_I()</a></code>, <code><a href="reference/portion.html">portion_SI()</a></code> and <code><a href="reference/portion.html">portion_S()</a></code> functions. Similarly, the <em>number</em> of isolates can be determined with the <code><a href="reference/count.html">count_R()</a></code>, <code><a href="reference/count.html">count_IR()</a></code>, <code><a href="reference/count.html">count_I()</a></code>, <code><a href="reference/count.html">count_SI()</a></code> and <code><a href="reference/count.html">count_S()</a></code> functions. All these functions can be used with the <code>dplyr</code> package (e.g. in conjunction with <code>summarise()</code>)</li>
<li>Plot AMR results with <code><a href="reference/ggplot_rsi.html">geom_rsi()</a></code>, a function made for the <code>ggplot2</code> package</li>
<li>Predict antimicrobial resistance for the upcoming years using logistic regression models with the <code><a href="reference/resistance_predict.html">resistance_predict()</a></code> function</li>
<li>Conduct descriptive statistics to enhance base R: calculate <code><a href="reference/kurtosis.html">kurtosis()</a></code>, <code><a href="reference/skewness.html">skewness()</a></code> and create frequency tables with <code><a href="reference/freq.html">freq()</a></code>
<li>Function <code>freq()</code> has moved to a new package, <a href="https://github.com/msberends/clean"><code>clean</code></a> (<a href="https://cran.r-project.org/package=clean">CRAN link</a>). Creating frequency tables is not within the scope of this package (and never was), and this function has matured a lot over the last two years. We therefore created a new package for data cleaning and checking, which is a perfect fit for the <code>freq()</code> function. The <a href="https://github.com/msberends/clean"><code>clean</code></a> package is available on CRAN and will be installed automatically when updating the <code>AMR</code> package, which now imports it. At a later stage, the <code><a href="../reference/skewness.html">skewness()</a></code> and <code><a href="../reference/kurtosis.html">kurtosis()</a></code> functions will be moved to the <code>clean</code> package too.</li>