Note: some changes in this version were suggested by anonymous reviewers from the journal we submitted our manuscript to. We are very grateful to those reviewers for going through our code so thoroughly!
@ -39,6 +39,7 @@ Note: some changes in this version were suggested by anonymous reviewers from th
#> [1] 24 24
```
* Improvements for `as.mo()`:
* Any user input value that could mean more than one taxonomic entry is now considered 'uncertain'. Instead of a warning, a message will be thrown and the accompanying `mo_uncertainties()` has been changed completely; it now prints all possible candidates with their score.
* Big speed improvement for already valid microorganism IDs. This also means a significant speed improvement for using `mo_*` functions like `mo_name()` on microorganism IDs.
* Added parameter `ignore_pattern` to `as.mo()`, which can also be given to `mo_*` functions like `mo_name()`, to exclude known irrelevant input from analysis. This can also be set with the option `AMR_ignore_pattern`.
* `get_locale()` now uses `Sys.getlocale()` instead of `Sys.getlocale("LC_COLLATE")`
@ -48,6 +49,7 @@ Note: some changes in this version were suggested by anonymous reviewers from th
* BORSA is now recognised as an abbreviation for *Staphylococcus aureus*, meaning that e.g. `mo_genus("BORSA")` will return "Staphylococcus"
* Re-added a feature from AMR 1.1.0 and earlier, but now without other package dependencies: `tibble` printing support for classes `<rsi>`, `<mic>`, `<disk>`, `<ab>` and `<mo>`. When using `tibble`s containing antimicrobial columns (class `<rsi>`), "S" will print in green, "I" will print in yellow and "R" will print in red. Microbial IDs (class `<mo>`) will emphasise the genus and species, not the kingdom.
* Names of antiviral agents in data set `antivirals` now have a starting capital letter, as is the case in the `antibiotics` data set
* Updated the documentation of the `WHONET` data set to clarify that all patient names are fictitious
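
To illustrate the idea behind the `ignore_pattern` parameter mentioned above, here is a minimal base R sketch. It is not the package's implementation: `drop_ignored()` is a hypothetical helper, and the case-insensitive matching is an assumption made for this example only.

```r
# Hypothetical helper (not part of the AMR package): set input values that
# match a regular expression to NA, so that a later lookup step skips them.
drop_ignored <- function(x, ignore_pattern = NULL) {
  if (is.null(ignore_pattern)) {
    # nothing to ignore, return the input unchanged
    return(x)
  }
  # case-insensitive matching is an assumption of this sketch
  x[grepl(ignore_pattern, x, ignore.case = TRUE)] <- NA_character_
  x
}

input <- c("E. coli", "not reported", "S. aureus", "Unknown")
cleaned <- drop_ignored(input, ignore_pattern = "^(not reported|unknown)$")
print(cleaned)
```

With the package itself, the pattern would be passed as `as.mo(input, ignore_pattern = "...")` or set once with `options(AMR_ignore_pattern = "...")`, as described in the entry above.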
### Other
* Removed unnecessary references to the `base` package
#' Data set with `r format(nrow(WHONET), big.mark = ",")` isolates - WHONET example
#'
#' This example data set has the exact same structure as an export file from WHONET. Such files can be used with this package, as this example data set shows. The antibiotic results are based on our [example_isolates] data set. All patient names are created using online surname generators and are only in place for practice purposes.
#' This example data set has the exact same structure as an export file from WHONET. Such files can be used with this package, as this example data set shows. The antibiotic results are from our [example_isolates] data set. All patient names are created using online surname generators and are only in place for practice purposes.
#' @format A [`data.frame`] with `r format(nrow(WHONET), big.mark = ",")` observations and `r ncol(WHONET)` variables:
#' - `"Fluoroquinolone-resistant Neisseria gonorrhoeae"`. The first word will be stripped, after which the function will try to find a match. A warning will be thrown that the result *Neisseria gonorrhoeae* (``r as.mo("Neisseria gonorrhoeae")``) needs review.
#'
#' There are three helper functions that can be run after using the [as.mo()] function:
#' - Use [mo_uncertainties()] to get a [`data.frame`] with all values that were coerced to a valid value, but with uncertainty. The output contains a score, that is calculated as \eqn{(n - 0.5 * L) / n}, where *n* is the number of characters of the full taxonomic name of the microorganism, and *L* is the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) between that full name and the user input.
#' - Use [mo_uncertainties()] to get a [`data.frame`] that prints in a pretty format with all taxonomic names that were guessed. The output contains a score that is based on the human pathogenic prevalence and the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) between the full taxonomic name and the user input.
#' - Use [mo_failures()] to get a [`character`] [`vector`] with all values that could not be coerced to a valid value.
#' - Use [mo_renamed()] to get a [`data.frame`] with all values that could be coerced based on old, previously accepted taxonomic names.
#'
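#' As a worked example of the formula-based score \eqn{(n - 0.5 * L) / n}, assume the user input `"S. aureus"` is compared with the full name *Staphylococcus aureus*. The full name has \eqn{n = 21} characters and the Levenshtein distance is \eqn{L = 13} (twelve insertions plus one substitution for the full stop), so the score would be:
#' \deqn{(21 - 0.5 * 13) / 21 = 14.5 / 21 \approx 0.69}
#'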
@ -178,6 +178,14 @@ as.mo <- function(x,
...){
check_dataset_integrity()
if (tryCatch(all(x %in% MO_lookup$mo, na.rm = TRUE)
             & isFALSE(Becker)
             & isFALSE(Lancefield), error = function(e) FALSE)) {
# don't look into valid MO codes, just return them
# is.mo() won't work - codes might change between package versions
return(to_class_mo(x))
}
if (tryCatch(all(tolower(x) %in% MO_lookup$fullname_lower, na.rm = TRUE)
<p>So getting official taxonomic names of 2,000,000 (!!) items consisting of 90 unique values only takes 0.102 seconds. You only lose time on your unique input values.</p>
<p>So getting official taxonomic names of 2,000,000 (!!) items consisting of 90 unique values only takes 0.133 seconds. You only lose time on your unique input values.</p>
B = <span class="fu"><a href="../reference/mo_property.html">mo_name</a></span>(<span class="st">"S. aureus"</span>),
C = <span class="fu"><a href="../reference/mo_property.html">mo_name</a></span>(<span class="st">"Staphylococcus aureus"</span>),
times = <span class="fl">10</span>)
<span class="co"># Result of one value was guessed with uncertainty. Use mo_uncertainties() to review it.</span>
<span class="co"># Result of one value was guessed with uncertainty. Use mo_uncertainties() to review it.</span>
<span class="co"># Result of one value was guessed with uncertainty. Use mo_uncertainties() to review it.</span>
<span class="co"># Result of one value was guessed with uncertainty. Use mo_uncertainties() to review it.</span>
<span class="co"># Result of one value was guessed with uncertainty. Use mo_uncertainties() to review it.</span>
<span class="co"># Result of one value was guessed with uncertainty. Use mo_uncertainties() to review it.</span>
<span class="co"># Result of one value was guessed with uncertainty. Use mo_uncertainties() to review it.</span>
<span class="co"># Result of one value was guessed with uncertainty. Use mo_uncertainties() to review it.</span>
<span class="co"># Result of one value was guessed with uncertainty. Use mo_uncertainties() to review it.</span>
<span class="co"># Result of one value was guessed with uncertainty. Use mo_uncertainties() to review it.</span>
<span class="fu"><a href="https://rdrr.io/r/base/print.html">print</a></span>(<span class="kw">run_it</span>, unit = <span class="st">"ms"</span>, signif = <span class="fl">3</span>)
<span class="co"># Unit: milliseconds</span>
<span class="co"># expr min lq mean median uq max neval</span>
<span class="co"># A 7.08 7.29 8.00 8.25 8.49 9.22 10</span>
<span class="co"># B 12.30 13.50 14.20 14.50 14.70 14.80 10</span>
<span class="co"># C 2.14 2.26 7.35 2.38 2.51 52.30 10</span>
<span class="co"># A 7.83 7.96 8.19 8.22 8.33 8.84 10</span>
<span class="co"># B 18.10 19.50 27.80 20.20 20.70 65.90 10</span>
<span class="co"># C 1.77 2.11 2.34 2.27 2.33 3.22 10</span>
</pre></div>
<p>So going from <code><a href="../reference/mo_property.html">mo_name("Staphylococcus aureus")</a></code> to <code>"Staphylococcus aureus"</code> takes 0.0024 seconds - it doesn’t even start calculating <em>if the result would be the same as the expected resulting value</em>. That goes for all helper functions:</p>
<p>So going from <code><a href="../reference/mo_property.html">mo_name("Staphylococcus aureus")</a></code> to <code>"Staphylococcus aureus"</code> takes 0.0023 seconds - it doesn’t even start calculating <em>if the result would be the same as the expected resulting value</em>. That goes for all helper functions:</p>
B = <span class="fu"><a href="../reference/mo_property.html">mo_genus</a></span>(<span class="st">"Staphylococcus"</span>),
@ -320,14 +390,14 @@
<span class="fu"><a href="https://rdrr.io/r/base/print.html">print</a></span>(<span class="kw">run_it</span>, unit = <span class="st">"ms"</span>, signif = <span class="fl">3</span>)
<span class="co"># Unit: milliseconds</span>
<span class="co"># expr min lq mean median uq max neval</span>
<span class="co"># A 1.29 1.38 1.64 1.47 1.84 2.28 10</span>
<span class="co"># B 1.27 1.62 1.76 1.69 1.82 2.71 10</span>
<span class="co"># C 1.28 1.32 1.56 1.48 1.77 2.09 10</span>
<span class="co"># D 1.29 1.46 1.68 1.66 1.77 2.24 10</span>
<span class="co"># E 1.26 1.39 5.34 1.64 1.77 39.00 10</span>
<span class="co"># F 1.26 1.33 1.58 1.44 1.80 2.14 10</span>
<span class="co"># G 1.32 1.51 1.65 1.68 1.75 2.05 10</span>
<span class="co"># H 1.31 1.43 1.71 1.68 1.86 2.49 10</span>
<span class="co"># A 1.56 1.62 5.61 1.93 2.26 38.90 10</span>
<span class="co"># B 1.50 1.72 1.88 1.90 2.01 2.34 10</span>
<span class="co"># C 1.52 1.76 1.88 1.89 1.96 2.27 10</span>
<span class="co"># D 1.47 1.62 1.85 1.86 1.89 2.80 10</span>
<span class="co"># E 1.51 1.84 1.98 1.88 2.07 2.56 10</span>
<span class="co"># F 1.44 1.50 1.68 1.57 1.89 2.19 10</span>
<span class="co"># G 1.47 1.48 1.65 1.59 1.84 2.00 10</span>
<span class="co"># H 1.55 1.60 1.75 1.69 1.81 2.34 10</span>
</pre></div>
<p>Of course, when running <code><a href="../reference/mo_property.html">mo_phylum("Firmicutes")</a></code> the function has zero knowledge about the actual microorganism, namely <em>S. aureus</em>. But since the result would be <code>"Firmicutes"</code> anyway, there is no point in calculating the result. And because this package ‘knows’ all phyla of all known bacteria (according to the Catalogue of Life), it can just return the initial value immediately.</p>
</div>
@ -356,13 +426,13 @@
<span class="fu"><a href="https://rdrr.io/r/base/print.html">print</a></span>(<span class="kw">run_it</span>, unit = <span class="st">"ms"</span>, signif = <span class="fl">4</span>)
<span class="co"># Unit: milliseconds</span>
<span class="co"># expr min lq mean median uq max neval</span>
<span class="co"># en 13.29 13.54 17.53 13.70 14.93 58.25 100</span>
<span class="co"># de 14.25 14.46 19.09 14.69 16.23 58.96 100</span>
<p>A data set with 67,151 rows and 16 columns, containing the following column names:<br><em>‘mo’, ‘fullname’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’, ‘subspecies’, ‘rank’, ‘ref’, ‘species_id’, ‘source’, ‘prevalence’, ‘snomed’</em>.</p>
<p>This data set is available in R as <code>microorganisms</code>, after you load the <code>AMR</code> package.</p>
<p>It was last updated on 1 September 2020 11:07:11 CEST. Find more info about the structure of this data set <a href="https://msberends.github.io/AMR/reference/microorganisms.html">here</a>.</p>
<p>It was last updated on 3 September 2020 20:59:45 CEST. Find more info about the structure of this data set <a href="https://msberends.github.io/AMR/reference/microorganisms.html">here</a>.</p>
<a href="#last-updated-3-september-2020" class="anchor"></a><small>Last updated: 3 September 2020</small>
<a href="#last-updated-12-september-2020" class="anchor"></a><small>Last updated: 12 September 2020</small>
</h2>
<p>Note: some changes in this version were suggested by anonymous reviewers from the journal we submitted our manuscript to. We are very grateful to those reviewers for going through our code so thoroughly!</p>
<div id="new" class="section level3">
@ -299,6 +299,7 @@
<li>
<p>Improvements for <code><a href="../reference/as.mo.html">as.mo()</a></code>:</p>
<ul>
<li>Any user input value that could mean more than one taxonomic entry is now considered ‘uncertain’. Instead of a warning, a message will be thrown and the accompanying <code><a href="../reference/as.mo.html">mo_uncertainties()</a></code> has been changed completely; it now prints all possible candidates with their score.</li>
<li>Big speed improvement for already valid microorganism IDs. This also means a significant speed improvement for using <code>mo_*</code> functions like <code><a href="../reference/mo_property.html">mo_name()</a></code> on microorganism IDs.</li>
<li>Added parameter <code>ignore_pattern</code> to <code><a href="../reference/as.mo.html">as.mo()</a></code>, which can also be given to <code>mo_*</code> functions like <code><a href="../reference/mo_property.html">mo_name()</a></code>, to exclude known irrelevant input from analysis. This can also be set with the option <code>AMR_ignore_pattern</code>.</li>
</ul>
@ -310,6 +311,7 @@
<li><p>BORSA is now recognised as an abbreviation for <em>Staphylococcus aureus</em>, meaning that e.g. <code><a href="../reference/mo_property.html">mo_genus("BORSA")</a></code> will return “Staphylococcus”</p></li>
<li><p>Re-added a feature from AMR 1.1.0 and earlier, but now without other package dependencies: <code>tibble</code> printing support for classes <code>&lt;rsi&gt;</code>, <code>&lt;mic&gt;</code>, <code>&lt;disk&gt;</code>, <code>&lt;ab&gt;</code> and <code>&lt;mo&gt;</code>. When using <code>tibble</code>s containing antimicrobial columns (class <code>&lt;rsi&gt;</code>), “S” will print in green, “I” will print in yellow and “R” will print in red. Microbial IDs (class <code>&lt;mo&gt;</code>) will emphasise the genus and species, not the kingdom.</p></li>
<li><p>Names of antiviral agents in data set <code>antivirals</code> now have a starting capital letter, as is the case in the <code>antibiotics</code> data set</p></li>
<li><p>Updated the documentation of the <code>WHONET</code> data set to clarify that all patient names are fictitious</p></li>
</ul>
</div>
<div id="other" class="section level3">
@ -961,7 +963,7 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/
<li>Fixed bug where not all old taxonomic names would be printed, when using a vector as input for <code><a href="../reference/as.mo.html">as.mo()</a></code>
</li>
<li>Manually added <em>Trichomonas vaginalis</em> from the kingdom of Protozoa, which is missing from the Catalogue of Life</li>
<li>Small improvements to <code><a href="https://rdrr.io/r/graphics/plot.default.html">plot()</a></code> and <code><a href="https://rdrr.io/r/graphics/barplot.html">barplot()</a></code> for MIC and RSI classes</li>
<li>Small improvements to <code><a href="../reference/plot.html">plot()</a></code> and <code><a href="https://rdrr.io/r/graphics/barplot.html">barplot()</a></code> for MIC and RSI classes</li>
<li>Allow Catalogue of Life IDs to be coerced by <code><a href="../reference/as.mo.html">as.mo()</a></code>
</li>
</ul>
@ -1169,10 +1171,10 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/
<li><p>New function <code><a href="../reference/age.html">age()</a></code> to calculate the (patient's) age in years</p></li>
<li><p>New function <code><a href="../reference/age_groups.html">age_groups()</a></code> to split ages into custom or predefined groups (like children or elderly). This allows for easier demographic antimicrobial resistance analysis per age group.</p></li>
<li>
<p>New function <code><a href="../reference/resistance_predict.html">ggplot_rsi_predict()</a></code> as well as the base R <code><a href="https://rdrr.io/r/graphics/plot.default.html">plot()</a></code> function can now be used for resistance prediction calculated with <code><a href="../reference/resistance_predict.html">resistance_predict()</a></code>:</p>
<p>New function <code><a href="../reference/resistance_predict.html">ggplot_rsi_predict()</a></code> as well as the base R <code><a href="../reference/plot.html">plot()</a></code> function can now be used for resistance prediction calculated with <code><a href="../reference/resistance_predict.html">resistance_predict()</a></code>:</p>
<meta property="og:title" content="Data set with 500 isolates - WHONET example — WHONET" />
<meta property="og:description" content="This example data set has the exact same structure as an export file from WHONET. Such files can be used with this package, as this example data set shows. The antibiotic results are based on our example_isolates data set. All patient names are created using online surname generators and are only in place for practice purposes." />
<meta property="og:description" content="This example data set has the exact same structure as an export file from WHONET. Such files can be used with this package, as this example data set shows. The antibiotic results are from our example_isolates data set. All patient names are created using online surname generators and are only in place for practice purposes." />
<span class="version label label-default" data-toggle="tooltip" data-placement="bottom" title="Latest development version">1.3.0.9015</span>
<span class="version label label-default" data-toggle="tooltip" data-placement="bottom" title="Latest development version">1.3.0.9016</span>
</span>
</div>
@ -239,7 +239,7 @@
</div>
<div class="ref-description">
<p>This example data set has the exact same structure as an export file from WHONET. Such files can be used with this package, as this example data set shows. The antibiotic results are based on our <a href='example_isolates.html'>example_isolates</a> data set. All patient names are created using online surname generators and are only in place for practice purposes.</p>
<p>This example data set has the exact same structure as an export file from WHONET. Such files can be used with this package, as this example data set shows. The antibiotic results are from our <a href='example_isolates.html'>example_isolates</a> data set. All patient names are created using online surname generators and are only in place for practice purposes.</p>
<span class="version label label-default" data-toggle="tooltip" data-placement="bottom" title="Latest development version">1.3.0.9015</span>
<span class="version label label-default" data-toggle="tooltip" data-placement="bottom" title="Latest development version">1.3.0.9016</span>
</span>
</div>
@ -347,7 +347,7 @@
</ul>
<p>There are three helper functions that can be run after using the <code>as.mo()</code> function:</p><ul>
<li><p>Use <code>mo_uncertainties()</code> to get a <code><a href='https://rdrr.io/r/base/data.frame.html'>data.frame</a></code> with all values that were coerced to a valid value, but with uncertainty. The output contains a score that is calculated as \((n - 0.5 * L) / n\), where <em>n</em> is the number of characters of the full taxonomic name of the microorganism, and <em>L</em> is the <a href='https://en.wikipedia.org/wiki/Levenshtein_distance'>Levenshtein distance</a> between that full name and the user input.</p></li>
<li><p>Use <code>mo_uncertainties()</code> to get a <code><a href='https://rdrr.io/r/base/data.frame.html'>data.frame</a></code> that prints in a pretty format with all taxonomic names that were guessed. The output contains a score that is based on the human pathogenic prevalence and the <a href='https://en.wikipedia.org/wiki/Levenshtein_distance'>Levenshtein distance</a> between the full taxonomic name and the user input.</p></li>
<li><p>Use <code>mo_failures()</code> to get a <code><a href='https://rdrr.io/r/base/character.html'>character</a></code> <code><a href='https://rdrr.io/r/base/vector.html'>vector</a></code> with all values that could not be coerced to a valid value.</p></li>
<li><p>Use <code>mo_renamed()</code> to get a <code><a href='https://rdrr.io/r/base/data.frame.html'>data.frame</a></code> with all values that could be coerced based on old, previously accepted taxonomic names.</p></li>
</ul>
@ -456,7 +456,8 @@ This package contains the complete taxonomic tree of almost all microorganisms (
<span class='co'># although this works easier and does the same:</span>
<span class="version label label-default" data-toggle="tooltip" data-placement="bottom" title="Latest development version">1.3.0.9015</span>
<span class="version label label-default" data-toggle="tooltip" data-placement="bottom" title="Latest development version">1.3.0.9016</span>
</span>
</div>
@ -340,12 +340,14 @@
<li><p>For <strong>cleaning raw / untransformed data</strong>. The data will be cleaned to contain only the values S, I and R, and the function will try its best to determine these with some intelligence. For example, mixed values with R/SI interpretations and MIC values such as <code>"&lt;0.25; S"</code> will be coerced to <code>"S"</code>. Combined interpretations for multiple test methods (as seen in laboratory records) such as <code>"S; S"</code> will be coerced to <code>"S"</code>, but a value like <code>"S; I"</code> will return <code>NA</code> with a warning that the input is unclear.</p></li>
<li><p>For <strong>interpreting minimum inhibitory concentration (MIC) values</strong> according to EUCAST or CLSI. You must clean your MIC values first using <code><a href='as.mic.html'>as.mic()</a></code>, which also gives your columns the new data class <code><a href='as.mic.html'>mic</a></code>. Also, be sure to have a column with microorganism names or codes. It will be found automatically, but can be set manually using the <code>mo</code> parameter.</p><ul>
<li><p>Using <code>dplyr</code>, R/SI interpretation can be done very easily with either:</p><pre><span class='kw'>your_data</span> <span class='op'>%&gt;%</span> <span class='fu'><a href='https://dplyr.tidyverse.org/reference/mutate_all.html'>mutate_if</a></span>(<span class='kw'>is.mic</span>, <span class='kw'>as.rsi</span>) <span class='co'># until dplyr 1.0.0</span>
<span class='kw'>your_data</span> <span class='op'>%&gt;%</span> <span class='fu'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span>(<span class='fu'><a href='https://dplyr.tidyverse.org/reference/across.html'>across</a></span>(<span class='fu'>where</span>(<span class='kw'>is.mic</span>), <span class='kw'>as.rsi</span>)) <span class='co'># since dplyr 1.0.0</span></pre></li>
<span class='kw'>your_data</span> <span class='op'>%&gt;%</span> <span class='fu'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span>(<span class='fu'><a href='https://dplyr.tidyverse.org/reference/across.html'>across</a></span>(<span class='fu'>where</span>(<span class='kw'>is.mic</span>), <span class='kw'>as.rsi</span>)) <span class='co'># since dplyr 1.0.0</span>
</pre></li>
<li><p>Operators like "&lt;=" will be stripped before interpretation. When using <code>conserve_capped_values = TRUE</code>, an MIC value of e.g. "&gt;2" will always return "R", even if the breakpoint according to the chosen guideline is "&gt;=4". This ensures that capped values from raw laboratory data are treated conservatively. The default behaviour (<code>conserve_capped_values = FALSE</code>) considers "&gt;2" to be lower than "&gt;=4" and might in this case return "S" or "I".</p></li>
</ul></li>
<li><p>For <strong>interpreting disk diffusion diameters</strong> according to EUCAST or CLSI. You must clean your disk zones first using <code><a href='as.disk.html'>as.disk()</a></code>, which also gives your columns the new data class <code><a href='as.disk.html'>disk</a></code>. Also, be sure to have a column with microorganism names or codes. It will be found automatically, but can be set manually using the <code>mo</code> parameter.</p><ul>
<li><p>Using <code>dplyr</code>, R/SI interpretation can be done very easily with either:</p><pre><span class='kw'>your_data</span> <span class='op'>%&gt;%</span> <span class='fu'><a href='https://dplyr.tidyverse.org/reference/mutate_all.html'>mutate_if</a></span>(<span class='kw'>is.disk</span>, <span class='kw'>as.rsi</span>) <span class='co'># until dplyr 1.0.0</span>
<span class='kw'>your_data</span> <span class='op'>%&gt;%</span> <span class='fu'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span>(<span class='fu'><a href='https://dplyr.tidyverse.org/reference/across.html'>across</a></span>(<span class='fu'>where</span>(<span class='kw'>is.disk</span>), <span class='kw'>as.rsi</span>)) <span class='co'># since dplyr 1.0.0</span></pre></li>
<span class='kw'>your_data</span> <span class='op'>%&gt;%</span> <span class='fu'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span>(<span class='fu'><a href='https://dplyr.tidyverse.org/reference/across.html'>across</a></span>(<span class='fu'>where</span>(<span class='kw'>is.disk</span>), <span class='kw'>as.rsi</span>)) <span class='co'># since dplyr 1.0.0</span>
</pre></li>
</ul></li>
<li><p>For <strong>interpreting a complete data set</strong>, with automatic determination of MIC values, disk diffusion diameters, microorganism names or codes, and antimicrobial test results. This is done very simply by running <code>as.rsi(data)</code>.</p></li>