#' - Getting SNOMED codes of a microorganism, or getting the name associated with a SNOMED code
#' - Getting LOINC codes of an antibiotic, or getting the name associated with a LOINC code
#' - Machine reading the EUCAST and CLSI guidelines from 2011-2020 to translate MIC values and disk diffusion diameters to R/SI
#' - Principal component analysis for AMR
#' @section Read more on our website!:
#' On our website <https://msberends.gitlab.io/AMR> you can find [a comprehensive tutorial](https://msberends.gitlab.io/AMR/articles/AMR.html) about how to conduct AMR analysis, the [complete documentation of all functions](https://msberends.gitlab.io/AMR/reference) (which reads a lot easier than here in R) and [an example analysis using WHONET data](https://msberends.gitlab.io/AMR/articles/WHONET.html).
#' - Uncertainty level 3: allow all of level 1 and 2, strip off text elements from the end, allow any part of a taxonomic name.
#'
#' This leads to e.g.:
#' - `"Streptococcus group B (known as S. agalactiae)"`. The text between brackets will be removed and a warning will be thrown that the result *Streptococcus group B* (`B_STRPT_GRPB`) needs review.
#' - `"S. aureus - please mind: MRSA"`. The last word will be stripped, after which the function will try to find a match. If it does not, the second last word will be stripped, etc. Again, a warning will be thrown that the result *Staphylococcus aureus* (`B_STPHY_AURS`) needs review.
#' - `"Fluoroquinolone-resistant Neisseria gonorrhoeae"`. The first word will be stripped, after which the function will try to find a match. A warning will be thrown that the result *Neisseria gonorrhoeae* (`B_NESSR_GNRR`) needs review.
#' - `"Streptococcus group B (known as S. agalactiae)"`. The text between brackets will be removed and a warning will be thrown that the result *Streptococcus group B* (``r as.mo("Streptococcus group B")``) needs review.
#' - `"S. aureus - please mind: MRSA"`. The last word will be stripped, after which the function will try to find a match. If it does not, the second last word will be stripped, etc. Again, a warning will be thrown that the result *Staphylococcus aureus* (``r as.mo("Staphylococcus aureus")``) needs review.
#' - `"Fluoroquinolone-resistant Neisseria gonorrhoeae"`. The first word will be stripped, after which the function will try to find a match. A warning will be thrown that the result *Neisseria gonorrhoeae* (``r as.mo("Neisseria gonorrhoeae")``) needs review.
#'
#' The level of uncertainty can be set using the argument `allow_uncertain`. The default is `allow_uncertain = TRUE`, which is equal to uncertainty level 2. Using `allow_uncertain = FALSE` is equal to uncertainty level 0 and will skip all rules. You can also use e.g. `as.mo(..., allow_uncertain = 1)` to only allow up to level 1 uncertainty.
#' Performs a principal component analysis (PCA) on a data set, automatically determining the groups and labels for later plotting and automatically keeping only suitable (i.e. non-empty and numeric) variables.
<p>In the table above, all measurements are shown in milliseconds (thousandths of a second). A value of 5 milliseconds means it can determine 200 input values per second. In the case of 100 milliseconds, this is only 10 input values per second.</p>
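<p>The conversion from run time to throughput used above can be sketched in a few lines of base R. The helper name <code>values_per_second</code> is hypothetical and only serves this illustration; it is not part of the package.</p>

```r
# Convert a run time per input value (in milliseconds) to throughput
# (input values per second).
# NOTE: `values_per_second` is a hypothetical helper for illustration only.
values_per_second <- function(ms_per_value) {
  1000 / ms_per_value  # 1 second = 1000 milliseconds
}

# 5 ms per value gives 200 values per second; 100 ms gives 10
throughput <- values_per_second(c(5, 100))
print(throughput)
```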
<p>To achieve this speed, the <code>as.mo</code> function also takes into account the prevalence of human pathogenic microorganisms. The downside is, of course, that less prevalent microorganisms are determined more slowly. See this example for the ID of <em>Methanosarcina semesiae</em> (<code>B_MTHNSR_SEMS</code>), a bug probably never found before in humans:</p>
<p>That takes 6.1 times as much time on average. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance. Full names (like <em>Methanosarcina semesiae</em>) are always very fast and only take a few thousandths of a second to coerce - they are the most probable input in most data sets.</p>
<span id="cb3-9"><a href="#cb3-9"></a><span class="co"># expr min lq mean median uq</span></span>
<p>That takes 5.5 times as much time on average. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance. Full names (like <em>Methanosarcina semesiae</em>) are always very fast and only take a few thousandths of a second to coerce - they are the most probable input in most data sets.</p>
<p>In the figure below, we compare <em>Escherichia coli</em> (which is very common) with <em>Prevotella brevis</em> (which is moderately common) and with <em>Methanosarcina semesiae</em> (which is uncommon):</p>
<p>Uncommon microorganisms take a lot more time than common microorganisms. To mitigate this drawback and further improve performance, two important calculations take almost no time at all: <strong>repetitive results</strong> and <strong>already precalculated results</strong>.</p>
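<p>As a rough illustration of why repetitive results can be nearly free, the sketch below evaluates an expensive function once per distinct input and maps the results back with <code>match()</code>. This is an assumption about one common approach, not a description of how <code>as.mo</code> is implemented; <code>slow_lookup</code> and <code>lookup_unique_first</code> are hypothetical names.</p>

```r
# Hypothetical stand-in for an expensive lookup.
# ASSUMPTION: this is NOT the as.mo() implementation, only a placeholder.
slow_lookup <- function(x) {
  toupper(x)
}

# Evaluate the expensive function once per distinct input value,
# then map the results back onto the original (repetitive) vector.
lookup_unique_first <- function(x, fun = slow_lookup) {
  uniques <- unique(x)          # distinct input values only
  results <- fun(uniques)       # expensive step, run once per distinct value
  results[match(x, uniques)]    # restore original order and length
}

input <- rep(c("e. coli", "s. aureus", "e. coli"), times = 1000)
out <- lookup_unique_first(input)

# The result is the same as applying the function to every element
stopifnot(identical(out, slow_lookup(input)))
```

<p>With 3,000 inputs but only two distinct values, the expensive function runs on two values instead of 3,000.</p>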
@@ -287,11 +272,11 @@
<span id="cb4-4"><a href="#cb4-4"></a><span class="st"></span><span class="co"># keep only the unique ones</span></span>
<span id="cb5-7"><a href="#cb5-7"></a><span class="co"># expr min lq mean median uq max neval</span></span>
<span id="cb5-8"><a href="#cb5-8"></a><span class="co"># A 6.58 6.590 7.340 6.630 6.780 13.00 10</span></span>
<span id="cb5-9"><a href="#cb5-9"></a><span class="co"># B 13.50 13.700 18.700 13.900 14.600 60.80 10</span></span>
<span id="cb5-10"><a href="#cb5-10"></a><span class="co"># C 0.72 0.863 0.917 0.898 0.935 1.26 10</span></span></code></pre></div>
<p>So going from <code><a href="../reference/mo_property.html">mo_name("Staphylococcus aureus")</a></code> to <code>"Staphylococcus aureus"</code> takes 0.0009 seconds - it doesn’t even start calculating <em>if the result would be the same as the expected resulting value</em>. That goes for all helper functions:</p>
<span id="cb5-7"><a href="#cb5-7"></a><span class="co"># expr min lq mean median uq max neval</span></span>
<span id="cb5-8"><a href="#cb5-8"></a><span class="co"># A 6.760 6.900 7.43 7.070 7.540 9.290 10</span></span>
<span id="cb5-9"><a href="#cb5-9"></a><span class="co"># B 14.200 14.400 18.80 14.900 16.000 51.500 10</span></span>
<span id="cb5-10"><a href="#cb5-10"></a><span class="co"># C 0.586 0.726 0.74 0.757 0.763 0.804 10</span></span></code></pre></div>
<p>So going from <code><a href="../reference/mo_property.html">mo_name("Staphylococcus aureus")</a></code> to <code>"Staphylococcus aureus"</code> takes 0.0008 seconds - it doesn’t even start calculating <em>if the result would be the same as the expected resulting value</em>. That goes for all helper functions:</p>
<span id="cb6-12"><a href="#cb6-12"></a><span class="co"># expr min lq mean median uq max neval</span></span>
<span id="cb6-13"><a href="#cb6-13"></a><span class="co"># A 0.499 0.511 0.516 0.517 0.522 0.544 10</span></span>
<span id="cb6-14"><a href="#cb6-14"></a><span class="co"># B 0.532 0.539 0.550 0.542 0.563 0.592 10</span></span>
<span id="cb6-15"><a href="#cb6-15"></a><span class="co"># C 0.718 0.787 0.832 0.843 0.889 0.904 10</span></span>
<span id="cb6-16"><a href="#cb6-16"></a><span class="co"># D 0.538 0.548 0.566 0.567 0.571 0.607 10</span></span>
<span id="cb6-17"><a href="#cb6-17"></a><span class="co"># E 0.503 0.509 0.515 0.513 0.516 0.549 10</span></span>
<span id="cb6-18"><a href="#cb6-18"></a><span class="co"># F 0.502 0.504 0.514 0.511 0.519 0.539 10</span></span>
<span id="cb6-19"><a href="#cb6-19"></a><span class="co"># G 0.493 0.513 0.538 0.514 0.536 0.684 10</span></span>
<span id="cb6-20"><a href="#cb6-20"></a><span class="co"># H 0.499 0.501 0.509 0.505 0.516 0.531 10</span></span></code></pre></div>
<span id="cb6-13"><a href="#cb6-13"></a><span class="co"># A 0.374 0.381 0.389 0.389 0.395 0.416 10</span></span>
<span id="cb6-14"><a href="#cb6-14"></a><span class="co"># B 0.404 0.411 0.422 0.421 0.425 0.452 10</span></span>
<span id="cb6-15"><a href="#cb6-15"></a><span class="co"># C 0.615 0.711 0.726 0.730 0.751 0.861 10</span></span>
<span id="cb6-16"><a href="#cb6-16"></a><span class="co"># D 0.405 0.409 0.429 0.428 0.435 0.485 10</span></span>
<span id="cb6-17"><a href="#cb6-17"></a><span class="co"># E 0.381 0.384 0.392 0.390 0.394 0.429 10</span></span>
<span id="cb6-18"><a href="#cb6-18"></a><span class="co"># F 0.365 0.366 0.379 0.375 0.383 0.419 10</span></span>
<span id="cb6-19"><a href="#cb6-19"></a><span class="co"># G 0.362 0.372 0.378 0.380 0.388 0.391 10</span></span>
<span id="cb6-20"><a href="#cb6-20"></a><span class="co"># H 0.378 0.381 0.403 0.387 0.393 0.556 10</span></span></code></pre></div>
<p>Of course, when running <code><a href="../reference/mo_property.html">mo_phylum("Firmicutes")</a></code> the function has zero knowledge about the actual microorganism, namely <em>S. aureus</em>. But since the result would be <code>"Firmicutes"</code> anyway, there is no point in calculating the result. And because this package ‘knows’ all phyla of all known bacteria (according to the Catalogue of Life), it can just return the initial value immediately.</p>