+ Used in more than 100 countries
Since its first public release in early 2018, this package has been downloaded from more than 100 countries (source: CRAN logs). Click the map to enlarge, to see the names of the countries.
+
+
diff --git a/DESCRIPTION b/DESCRIPTION
index f7afae4b..afbd4558 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: AMR
-Version: 1.2.0.9029
-Date: 2020-07-08
+Version: 1.2.0.9030
+Date: 2020-07-09
 Title: Antimicrobial Resistance Analysis
 Authors@R: c(
     person(role = c("aut", "cre"),
diff --git a/NEWS.md b/NEWS.md
index 0eb541f0..114669f7 100755
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,5 +1,5 @@
-# AMR 1.2.0.9029
-## Last updated: 08-Jul-2020
+# AMR 1.2.0.9030
+## Last updated: 09-Jul-2020
 
 ### New
 * Function `ab_from_text()` to retrieve antimicrobial drug names, doses and forms of administration from clinical texts in e.g. health care records, which also corrects for misspelling since it uses `as.ab()` internally
diff --git a/docs/404.html b/docs/404.html
index 03c2b382..ceefe84f 100644
--- a/docs/404.html
+++ b/docs/404.html
@@ -81,7 +81,7 @@
 AMR (for R)
- 1.2.0.9029
+ 1.2.0.9030
diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html
index 95c14eb1..2dfb5724 100644
--- a/docs/LICENSE-text.html
+++ b/docs/LICENSE-text.html
@@ -81,7 +81,7 @@
 AMR (for R)
- 1.2.0.9029
+ 1.2.0.9030
diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html
index 7406b5fa..3943c64d 100644
--- a/docs/articles/AMR.html
+++ b/docs/articles/AMR.html
@@ -20,7 +20,7 @@
-
+
+// v0.0.1
+// Written by JooYoung Seo (jooyoung@psu.edu) and Atsushi Yasumoto on June 1st, 2020.
+
+document.addEventListener('DOMContentLoaded', function() {
+  const codeList = document.getElementsByClassName("sourceCode");
+  for (var i = 0; i < codeList.length; i++) {
+    var linkList = codeList[i].getElementsByTagName('a');
+    for (var j = 0; j < linkList.length; j++) {
+      if (linkList[j].innerHTML === "") {
+        linkList[j].setAttribute('aria-hidden', 'true');
+      }
+    }
+  }
+});
diff --git a/docs/articles/AMR_files/figure-html/plot 1-1.png b/docs/articles/AMR_files/figure-html/plot 1-1.png
index a71b113d..13c83362 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 1-1.png and b/docs/articles/AMR_files/figure-html/plot 1-1.png differ
diff --git a/docs/articles/AMR_files/figure-html/plot 3-1.png b/docs/articles/AMR_files/figure-html/plot 3-1.png
index 0d88e289..b5299cb9 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 3-1.png and b/docs/articles/AMR_files/figure-html/plot 3-1.png differ
diff --git a/docs/articles/AMR_files/figure-html/plot 4-1.png b/docs/articles/AMR_files/figure-html/plot 4-1.png
index caff3eb9..406c1e5c 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 4-1.png and b/docs/articles/AMR_files/figure-html/plot 4-1.png differ
diff --git a/docs/articles/AMR_files/figure-html/plot 5-1.png b/docs/articles/AMR_files/figure-html/plot 5-1.png
index fb3520ab..49b54965 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 5-1.png and b/docs/articles/AMR_files/figure-html/plot 5-1.png differ
diff --git a/docs/articles/EUCAST.html b/docs/articles/EUCAST.html
index 42fb770f..774397ab 100644
--- a/docs/articles/EUCAST.html
+++ b/docs/articles/EUCAST.html
@@ -20,7 +20,7 @@
-
+
+// v0.0.1
+// Written by JooYoung Seo (jooyoung@psu.edu) and Atsushi Yasumoto on June 1st, 2020.
+
+document.addEventListener('DOMContentLoaded', function() {
+  const codeList = document.getElementsByClassName("sourceCode");
+  for (var i = 0; i < codeList.length; i++) {
+    var linkList = codeList[i].getElementsByTagName('a');
+    for (var j = 0; j < linkList.length; j++) {
+      if (linkList[j].innerHTML === "") {
+        linkList[j].setAttribute('aria-hidden', 'true');
+      }
+    }
+  }
+});
diff --git a/docs/articles/MDR.html b/docs/articles/MDR.html
index f5b31017..5cdee42d 100644
--- a/docs/articles/MDR.html
+++ b/docs/articles/MDR.html
@@ -20,7 +20,7 @@
-
+
+// v0.0.1
+// Written by JooYoung Seo (jooyoung@psu.edu) and Atsushi Yasumoto on June 1st, 2020.
+
+document.addEventListener('DOMContentLoaded', function() {
+  const codeList = document.getElementsByClassName("sourceCode");
+  for (var i = 0; i < codeList.length; i++) {
+    var linkList = codeList[i].getElementsByTagName('a');
+    for (var j = 0; j < linkList.length; j++) {
+      if (linkList[j].innerHTML === "") {
+        linkList[j].setAttribute('aria-hidden', 'true');
+      }
+    }
+  }
+});
diff --git a/docs/articles/PCA.html b/docs/articles/PCA.html
index 6694bba8..41c90855 100644
--- a/docs/articles/PCA.html
+++ b/docs/articles/PCA.html
@@ -20,7 +20,7 @@
-
+
+// v0.0.1
+// Written by JooYoung Seo (jooyoung@psu.edu) and Atsushi Yasumoto on June 1st, 2020.
+
+document.addEventListener('DOMContentLoaded', function() {
+  const codeList = document.getElementsByClassName("sourceCode");
+  for (var i = 0; i < codeList.length; i++) {
+    var linkList = codeList[i].getElementsByTagName('a');
+    for (var j = 0; j < linkList.length; j++) {
+      if (linkList[j].innerHTML === "") {
+        linkList[j].setAttribute('aria-hidden', 'true');
+      }
+    }
+  }
+});
diff --git a/docs/articles/SPSS.html b/docs/articles/SPSS.html
index fd8a38d6..148cf2f9 100644
--- a/docs/articles/SPSS.html
+++ b/docs/articles/SPSS.html
@@ -39,7 +39,7 @@
 AMR (for R)
- 1.2.0.9029
+ 1.2.0.9030
@@ -186,7 +186,7 @@
vignettes/SPSS.Rmd
SPSS.Rmd
AMR
(for R). Developed at the University of Groningen in collaboration with non-profit organisations Certe Medical Diagnostics and Advice and University Medical Center Groningen.
NEWS.md
- R/ab_class_selectors.R
+ Source: R/ab_class_selectors.R
antibiotic_class_selectors.Rd
GNU GENERAL PUBLIC LICENSE +Version 2, June 1991 + +Copyright (C) 1989, 1991 Free Software Foundation, Inc., <http://fsf.org/> + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA +Everyone is permitted to copy and distribute verbatim copies +of this license document, but changing it is not allowed. + +A SUMMARY OF THIS LICENSE BY THE ORIGINAL AUTHORS OF THE AMR R PACKAGE + +This R package, with package name 'AMR': +- May be used for commercial purposes +- May be used for private purposes +- May NOT be used for patent purposes +- May be modified, although: + - Modifications MUST be released under the same license when distributing the package + - Changes made to the code MUST be documented +- May be distributed, although: + - Source code MUST be made available when the package is distributed + - A copy of the license and copyright notice MUST be included with the package. +- Comes with a LIMITATION of liability +- Comes with NO warranty + +END OF THE SUMMARY + + +GNU GENERAL PUBLIC LICENSE +TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + +0. This License applies to any program or other work which contains +a notice placed by the copyright holder saying it may be distributed +under the terms of this General Public License. The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. 
The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + +1. You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. + +You may charge a fee for the physical act of transferring a copy, and +you may at your option offer warranty protection in exchange for a fee. + +2. You may modify your copy or copies of the Program or any portion +of it, thus forming a work based on the Program, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. + + b) You must cause any work that you distribute or publish, that in + whole or in part contains or is derived from the Program or any + part thereof, to be licensed as a whole at no charge to all third + parties under the terms of this License. + + c) If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display an + announcement including an appropriate copyright notice and a + notice that there is no warranty (or else, saying that you provide + a warranty) and that users may redistribute the program under + these conditions, and telling the user how to view a copy of this + License. 
(Exception: if the Program itself is interactive but + does not normally print such an announcement, your work based on + the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. If +identifiable sections of that work are not derived from the Program, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. But when you +distribute the same sections as part of a whole which is a work based +on the Program, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote it. + +Thus, it is not the intent of this section to claim rights or contest +your rights to work written entirely by you; rather, the intent is to +exercise the right to control the distribution of derivative or +collective works based on the Program. + +In addition, mere aggregation of another work not based on the Program +with the Program (or with a work based on the Program) on a volume of +a storage or distribution medium does not bring the other work under +the scope of this License. + +3. 
You may copy and distribute the Program (or a work based on it, +under Section 2) in object code or executable form under the terms of +Sections 1 and 2 above provided that you also do one of the following: + + a) Accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of Sections + 1 and 2 above on a medium customarily used for software interchange; or, + + b) Accompany it with a written offer, valid for at least three + years, to give any third party, for a charge no more than your + cost of physically performing source distribution, a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Sections 1 and 2 above on a medium + customarily used for software interchange; or, + + c) Accompany it with the information you received as to the offer + to distribute corresponding source code. (This alternative is + allowed only for noncommercial distribution and only if you + received the program in object code or executable form with such + an offer, in accord with Subsection b above.) + +The source code for a work means the preferred form of the work for +making modifications to it. For an executable work, complete source +code means all the source code for all modules it contains, plus any +associated interface definition files, plus the scripts used to +control compilation and installation of the executable. However, as a +special exception, the source code distributed need not include +anything that is normally distributed (in either source or binary +form) with the major components (compiler, kernel, and so on) of the +operating system on which the executable runs, unless that component +itself accompanies the executable. 
+ +If distribution of executable or object code is made by offering +access to copy from a designated place, then offering equivalent +access to copy the source code from the same place counts as +distribution of the source code, even though third parties are not +compelled to copy the source along with the object code. + +4. You may not copy, modify, sublicense, or distribute the Program +except as expressly provided under this License. Any attempt +otherwise to copy, modify, sublicense or distribute the Program is +void, and will automatically terminate your rights under this License. +However, parties who have received copies, or rights, from you under +this License will not have their licenses terminated so long as such +parties remain in full compliance. + +5. You are not required to accept this License, since you have not +signed it. However, nothing else grants you permission to modify or +distribute the Program or its derivative works. These actions are +prohibited by law if you do not accept this License. Therefore, by +modifying or distributing the Program (or any work based on the +Program), you indicate your acceptance of this License to do so, and +all its terms and conditions for copying, distributing or modifying +the Program or works based on it. + +6. Each time you redistribute the Program (or any work based on the +Program), the recipient automatically receives a license from the +original licensor to copy, distribute or modify the Program subject to +these terms and conditions. You may not impose any further +restrictions on the recipients' exercise of the rights granted herein. +You are not responsible for enforcing compliance by third parties to +this License. + +7. 
If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), +conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot +distribute so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you +may not distribute the Program at all. For example, if a patent +license would not permit royalty-free redistribution of the Program by +all those who receive copies directly or indirectly through you, then +the only way you could satisfy both it and this License would be to +refrain entirely from distribution of the Program. + +If any portion of this section is held invalid or unenforceable under +any particular circumstance, the balance of the section is intended to +apply and the section as a whole is intended to apply in other +circumstances. + +It is not the purpose of this section to induce you to infringe any +patents or other property right claims or to contest validity of any +such claims; this section has the sole purpose of protecting the +integrity of the free software distribution system, which is +implemented by public license practices. Many people have made +generous contributions to the wide range of software distributed +through that system in reliance on consistent application of that +system; it is up to the author/donor to decide if he or she is willing +to distribute software through any other system and a licensee cannot +impose that choice. + +This section is intended to make thoroughly clear what is believed to +be a consequence of the rest of this License. + +8. 
If the distribution and/or use of the Program is restricted in +certain countries either by patents or by copyrighted interfaces, the +original copyright holder who places the Program under this License +may add an explicit geographical distribution limitation excluding +those countries, so that distribution is permitted only in or among +countries not thus excluded. In such case, this License incorporates +the limitation as if written in the body of this License. + +9. The Free Software Foundation may publish revised and/or new versions +of the General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + +Each version is given a distinguishing version number. If the Program +specifies a version number of this License which applies to it and "any +later version", you have the option of following the terms and conditions +either of that version or of any later version published by the Free +Software Foundation. If the Program does not specify a version number of +this License, you may choose any version ever published by the Free Software +Foundation. + +10. If you wish to incorporate parts of the Program into other free +programs whose distribution conditions are different, write to the author +to ask for permission. For software which is copyrighted by the Free +Software Foundation, write to the Free Software Foundation; we sometimes +make exceptions for this. Our decision will be guided by the two goals +of preserving the free status of all derivatives of our free software and +of promoting the sharing and reuse of software generally. + +NO WARRANTY + +11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY +FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN +OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES +PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED +OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS +TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE +PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, +REPAIR OR CORRECTION. + +12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR +REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, +INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING +OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED +TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY +YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER +PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE +POSSIBILITY OF SUCH DAMAGES. + +END OF TERMS AND CONDITIONS ++ +
vignettes/AMR.Rmd
+ AMR.Rmd
Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 22 June 2020.
+Conducting antimicrobial resistance analysis unfortunately requires in-depth knowledge from different scientific fields, which makes it hard to do right. At least, it requires:
+Of course, we cannot instantly provide you with knowledge and experience. But with this AMR
package, we aimed at providing (1) tools to simplify antimicrobial resistance data cleaning, transformation and analysis, (2) methods to easily incorporate international guidelines and (3) scientifically reliable reference data, including the requirements mentioned above.
The AMR
package enables standardised and reproducible antimicrobial resistance analysis, with the application of evidence-based rules, determination of first isolates, translation of various codes for microorganisms and antimicrobial agents, determination of (multi-drug) resistant microorganisms, and calculation of antimicrobial resistance, prevalence and future trends.
For this tutorial, we will create fake demonstration data to work with.
+You can skip to Cleaning the data if you already have your own data ready. When you start your analysis, try to make the structure of your data generally look like this:
date | patient_id | mo | AMX | CIP
---|---|---|---|---
2020-06-22 | abcd | Escherichia coli | S | S
2020-06-22 | abcd | Escherichia coli | S | R
2020-06-22 | efgh | Escherichia coli | R | S
As with many uses in R, we need some additional packages for AMR analysis. Our package works closely together with the tidyverse packages dplyr
and ggplot2
by RStudio. The tidyverse tremendously improves the way we conduct data science - it allows for a very natural way of writing syntaxes and creating beautiful plots in R.
We will also use the cleaner
package, which can be used for cleaning data and creating frequency tables.
We will create some fake example data to use for analysis. For antimicrobial resistance analysis, we need at least: a patient ID, name or code of a microorganism, a date and antimicrobial results (an antibiogram). It could also include a specimen type (e.g. to filter on blood or urine) or the ward type (e.g. to filter on ICUs).
+With additional columns (like a hospital name, the patient’s gender or even [well-defined] clinical properties) you can do a comparative analysis, as this tutorial will demonstrate too.
+To start with patients, we need a unique list of patients.
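The code chunk for this step does not appear in this rendering. A minimal sketch that matches the description that follows (260 IDs running from A1 to Z10) could look like this; the exact expression is an assumption:

```r
# Combine every capital letter with the numbers 1 to 10: "A1", "A2", ..., "Z10"
patients <- unlist(lapply(LETTERS, paste0, 1:10))

length(patients)    # number of patient IDs
head(patients, 3)   # first few IDs
tail(patients, 1)   # last ID
```

Because each letter is paired with the same ten numbers, every ID is unique, which is what the join on patient ID later in the tutorial relies on.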
+ +The LETTERS
object is available in R - it’s a vector with 26 characters: A
to Z
. The patients
object we just created is now a vector of length 260, with values (patient IDs) varying from A1
to Z10
. Now we also set the gender of our patients, by putting the ID and the gender in a table:
patients_table <- data.frame(patient_id = patients,
                             gender = c(rep("M", 135),
                                        rep("F", 125)))
The first 135 patient IDs are now male, the other 125 are female.
+Let’s pretend that our data consists of blood culture isolates from between 1 January 2010 and 1 January 2018.
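The chunk that creates the date range is missing from this rendering. A minimal base R sketch under the assumption that both end dates are included:

```r
# All calendar days from 1 January 2010 up to and including 1 January 2018
dates <- seq(as.Date("2010-01-01"), as.Date("2018-01-01"), by = "day")

length(dates)   # number of days in the range
range(dates)    # first and last day
```

Sampling from a vector of days (rather than from two end points) keeps the later `sample()` call simple, at the cost of holding a few thousand dates in memory.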
+ +This dates
object now contains all days in our date range.
For this tutorial, we will use four different microorganisms: Escherichia coli, Staphylococcus aureus, Streptococcus pneumoniae, and Klebsiella pneumoniae:
bacteria <- c("Escherichia coli", "Staphylococcus aureus",
              "Streptococcus pneumoniae", "Klebsiella pneumoniae")
For completeness, we can also add the hospital where the patient was admitted, and we need to define valid antimicrobial results for our randomisation:
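The chunk defining these two objects is not present in this rendering. The sketch below reconstructs them from what the rest of the page shows: the hospital names appear in the data preview, and the order S, I, R is an assumption inferred from the probability vectors used with `sample()` (the middle value is 0 for the columns that never show an I).

```r
# Hospital names as they appear in the data preview on this page
hospitals <- c("Hospital A", "Hospital B", "Hospital C", "Hospital D")

# Valid interpretations: Susceptible, Intermediate, Resistant
# (the order is an assumption; it must match the `prob` vectors passed to sample())
ab_interpretations <- c("S", "I", "R")
```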
+ +Using the sample()
function, we can randomly select items from all objects we defined earlier. To let our fake data reflect reality a bit, we will also approximately define the probabilities of bacteria and the antibiotic results with the prob
parameter.
sample_size <- 20000
data <- data.frame(date = sample(dates, size = sample_size, replace = TRUE),
                   patient_id = sample(patients, size = sample_size, replace = TRUE),
                   hospital = sample(hospitals, size = sample_size, replace = TRUE,
                                     prob = c(0.30, 0.35, 0.15, 0.20)),
                   bacteria = sample(bacteria, size = sample_size, replace = TRUE,
                                     prob = c(0.50, 0.25, 0.15, 0.10)),
                   AMX = sample(ab_interpretations, size = sample_size, replace = TRUE,
                                prob = c(0.60, 0.05, 0.35)),
                   AMC = sample(ab_interpretations, size = sample_size, replace = TRUE,
                                prob = c(0.75, 0.10, 0.15)),
                   CIP = sample(ab_interpretations, size = sample_size, replace = TRUE,
                                prob = c(0.80, 0.00, 0.20)),
                   GEN = sample(ab_interpretations, size = sample_size, replace = TRUE,
                                prob = c(0.92, 0.00, 0.08)))
Using the left_join()
function from the dplyr
package, we can ‘map’ the gender to the patient ID using the patients_table
object we created earlier:
data <- data %>% left_join(patients_table)
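The call above lets `dplyr` work out the shared column by itself. As a small illustration of what the join does, here is a sketch on hypothetical toy data (the three patients and four isolates are made up for this example) with the key column named explicitly:

```r
library(dplyr)

# Hypothetical lookup table: one row per patient
patients_toy <- data.frame(patient_id = c("A1", "A2", "B1"),
                           gender = c("M", "M", "F"),
                           stringsAsFactors = FALSE)

# Hypothetical isolates: a patient can occur more than once
isolates_toy <- data.frame(patient_id = c("B1", "A1", "B1", "A2"),
                           AMX = c("S", "R", "S", "S"),
                           stringsAsFactors = FALSE)

# Every isolate keeps its row and gets the gender of its patient
joined <- isolates_toy %>%
  left_join(patients_toy, by = "patient_id")
joined
```

Naming the key with `by` avoids the message about the guessed join column and guards against accidentally joining on another column that happens to share a name.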
The resulting data set contains 20,000 blood culture isolates. With the head()
function we can preview the first 6 rows of this data set:
head(data)
date | patient_id | hospital | bacteria | AMX | AMC | CIP | GEN | gender
---|---|---|---|---|---|---|---|---
2014-04-15 | I4 | Hospital D | Escherichia coli | S | R | S | S | M
2011-02-09 | D1 | Hospital A | Escherichia coli | S | S | S | S | M
2013-12-16 | K4 | Hospital C | Staphylococcus aureus | S | S | R | S | M
2017-08-23 | Z9 | Hospital B | Escherichia coli | S | S | S | S | F
2010-01-14 | N4 | Hospital A | Staphylococcus aureus | R | S | S | S | M
2016-01-31 | N1 | Hospital D | Staphylococcus aureus | R | S | R | S | M
Now, let’s start the cleaning and the analysis!
+We also created a package dedicated to data cleaning and checking, called the cleaner
package. Its freq()
function can be used to create frequency tables.
For example, for the gender
variable:
data %>% freq(gender)
Frequency table
+Class: character
+Length: 20,000
+Available: 20,000 (100%, NA: 0 = 0%)
+Unique: 2
Shortest: 1
+Longest: 1
| | Item | Count | Percent | Cum. Count | Cum. Percent |
|---|---|---|---|---|---|
| 1 | M | 10,328 | 51.64% | 10,328 | 51.64% |
| 2 | F | 9,672 | 48.36% | 20,000 | 100.00% |
So, we can draw at least two conclusions immediately. From a data scientist’s perspective, the data looks clean: only values M
and F
. From a researcher’s perspective: there are slightly more men. Nothing we didn’t already know.
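The counts and percentages in such a frequency table can be cross-checked with base R alone. The sketch below is not the `freq()` implementation; it simply rebuilds the same columns from a vector that holds the counts reported above:

```r
# Recreate a gender vector with the counts shown in the frequency table
gender <- rep(c("M", "F"), times = c(10328, 9672))

# Count each value and sort from most to least frequent
counts <- sort(table(gender), decreasing = TRUE)

freq_tbl <- data.frame(item = names(counts),
                       count = as.vector(counts),
                       percent = round(100 * as.vector(counts) / length(gender), 2),
                       stringsAsFactors = FALSE)

# Cumulative columns follow directly from the sorted counts
freq_tbl$cum_count <- cumsum(freq_tbl$count)
freq_tbl$cum_percent <- round(100 * freq_tbl$cum_count / length(gender), 2)
freq_tbl
```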
The data is already quite clean, but we still need to transform some variables. The bacteria
column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate()
function of the dplyr
package makes this really easy:
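The chunk for this step is missing from this rendering. A sketch of what it presumably looks like, assuming the `as.mo()` function of this package is used inside `mutate()`; the two-row data frame is a hypothetical stand-in for the tutorial data so that the example runs on its own:

```r
library(dplyr)
library(AMR)

# Hypothetical stand-in for the tutorial data
data_toy <- data.frame(bacteria = c("Escherichia coli", "Staphylococcus aureus"),
                       stringsAsFactors = FALSE)

# Replace the free-text names by microbial IDs
data_toy <- data_toy %>%
  mutate(bacteria = as.mo(bacteria))
data_toy
```

The resulting IDs are the codes that appear in the `bacteria` column of the later tables on this page, such as `B_ESCHR_COLI`.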
We also want to transform the antibiotics, because in real life data we don’t know if they are really clean. The as.rsi()
function ensures reliability and reproducibility in these kinds of variables. The mutate_at()
function will run the as.rsi()
function on defined variables:
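The chunk for this step is also missing here. To show the idea behind this kind of cleaning without depending on a particular package version, the sketch below uses a hypothetical base R helper, `clean_rsi()`. It is not `as.rsi()`, which does considerably more; it only normalises case and whitespace and turns anything that is not S, I or R into `NA`.

```r
# Hypothetical helper (NOT the package's as.rsi()): keep only valid S/I/R values
clean_rsi <- function(x) {
  x <- toupper(trimws(as.character(x)))          # normalise case and whitespace
  x[!x %in% c("S", "I", "R")] <- NA              # anything else becomes missing
  factor(x, levels = c("S", "I", "R"), ordered = TRUE)
}

raw <- c("S", "r", " I ", "resistant?", NA)
cleaned <- clean_rsi(raw)
cleaned
```

Storing the result as an ordered factor means that invalid values cannot slip back in unnoticed and that comparisons such as `cleaned >= "I"` behave as expected.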
Finally, we will apply EUCAST rules on our antimicrobial results. In Europe, most medical microbiological laboratories already apply these rules. Our package features their latest insights on intrinsic resistance and exceptional phenotypes. Moreover, the eucast_rules()
function can also apply additional rules, like forcing
Because the amoxicillin (column AMX
) and amoxicillin/clavulanic acid (column AMC
) in our data were generated randomly, some rows will undoubtedly contain AMX = S and AMC = R, which is technically impossible. The eucast_rules()
fixes this:
data <- eucast_rules(data, col_mo = "bacteria", rules = "all")
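To make the AMX/AMC correction concrete, here is a base R sketch of that single rule on hypothetical toy data. It is only an illustration, not how `eucast_rules()` is implemented, and the direction of the fix (setting AMX to R) is inferred from the data previews on this page, where an isolate with AMX = S and AMC = R ends up as AMX = R:

```r
# Hypothetical toy antibiogram
isolates_toy <- data.frame(AMX = c("S", "S", "R", "S"),
                           AMC = c("R", "S", "S", "I"),
                           stringsAsFactors = FALSE)

# Susceptible to amoxicillin but resistant to amoxicillin/clavulanic acid
# is treated as impossible
impossible <- isolates_toy$AMX == "S" & isolates_toy$AMC == "R"

# Correct amoxicillin to resistant in those rows only
isolates_toy$AMX[impossible] <- "R"
isolates_toy
```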
Now that we have the microbial ID, we can add some taxonomic properties:
data <- data %>%
  mutate(gramstain = mo_gramstain(bacteria),
         genus = mo_genus(bacteria),
         species = mo_species(bacteria))
We also need to know which isolates we can actually use for analysis.
+To conduct an analysis of antimicrobial resistance, you must only include the first isolate of every patient per episode (Hindler et al., Clin Infect Dis. 2007). If you do not, you could easily over- or underestimate the resistance to an antibiotic. Imagine that a patient was admitted with an MRSA and that it was found in 5 different blood cultures in the following weeks (yes, some countries like the Netherlands have these blood drawing policies). The resistance percentage of oxacillin across all isolates would be overestimated, because you included this MRSA more than once. That would clearly be selection bias.
+The Clinical and Laboratory Standards Institute (CLSI) describes this as follows:
+++(…) When preparing a cumulative antibiogram to guide clinical decisions about empirical antimicrobial therapy of initial infections, only the first isolate of a given species per patient, per analysis period (eg, one year) should be included, irrespective of body site, antimicrobial susceptibility profile, or other phenotypical characteristics (eg, biotype). The first isolate is easily identified, and cumulative antimicrobial susceptibility test data prepared using the first isolate are generally comparable to cumulative antimicrobial susceptibility test data calculated by other methods, providing duplicate isolates are excluded.
+
M39-A4 Analysis and Presentation of Cumulative Antimicrobial Susceptibility Test Data, 4th Edition. CLSI, 2014. Chapter 6.4
This AMR
package includes this methodology with the first_isolate()
function. It uses an episode length of one year (which the user can change) and starts counting days after every selected isolate. This new variable can easily be added to our data:
data <- data %>%
  mutate(first = first_isolate(.))
# NOTE: Using column `bacteria` as input for `col_mo`.
# NOTE: Using column `date` as input for `col_date`.
# NOTE: Using column `patient_id` as input for `col_patient_id`.
So only 28.3% is suitable for resistance analysis! We can now filter on it with the filter()
function, also from the dplyr
package:
data_1st <- data %>%
  filter(first == TRUE)
For future use, the above two syntaxes can be shortened with the filter_first_isolate()
function:
data_1st <- data %>%
  filter_first_isolate()
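The episode logic described above (count days from the last selected isolate and select a new one once the episode has passed) can be sketched in base R. `flag_first_isolate()` is a hypothetical helper for one patient and one species, not the package's `first_isolate()`; whether the boundary is `>=` or `>` is an assumption. The dates are those of patient N3 as listed on this page, and the extra date in the second call is made up.

```r
# Hypothetical helper: flag first isolates for ONE patient and ONE species.
# `dates` must be sorted ascending; `episode_days` is the episode length.
flag_first_isolate <- function(dates, episode_days = 365) {
  is_first <- logical(length(dates))
  last_selected <- as.Date(NA)
  for (i in seq_along(dates)) {
    days_since <- as.numeric(dates[i] - last_selected)
    if (is.na(days_since) || days_since >= episode_days) {
      is_first[i] <- TRUE
      last_selected <- dates[i]   # restart counting from this isolate
    }
  }
  is_first
}

# Isolation dates of patient N3 as listed on this page
n3_dates <- as.Date(c("2010-03-10", "2010-05-11", "2010-05-17", "2010-05-18",
                      "2010-07-30", "2010-09-15", "2010-10-06", "2010-11-30",
                      "2011-01-27", "2011-01-30"))

first_n3 <- flag_first_isolate(n3_dates)

# A made-up later isolate, more than a year after the first one, starts a new episode
first_n3_extra <- flag_first_isolate(c(n3_dates, as.Date("2011-03-15")))
```

All ten isolates of this patient fall within one year of the first, so only the first one is selected.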
We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient N3, sorted on date:
isolate | date | patient_id | bacteria | AMX | AMC | CIP | GEN | first
---|---|---|---|---|---|---|---|---
1 | 2010-03-10 | N3 | B_ESCHR_COLI | I | S | S | S | TRUE
2 | 2010-05-11 | N3 | B_ESCHR_COLI | S | S | S | S | FALSE
3 | 2010-05-17 | N3 | B_ESCHR_COLI | R | S | S | S | FALSE
4 | 2010-05-18 | N3 | B_ESCHR_COLI | R | S | S | S | FALSE
5 | 2010-07-30 | N3 | B_ESCHR_COLI | S | S | S | S | FALSE
6 | 2010-09-15 | N3 | B_ESCHR_COLI | S | S | S | R | FALSE
7 | 2010-10-06 | N3 | B_ESCHR_COLI | S | S | R | S | FALSE
8 | 2010-11-30 | N3 | B_ESCHR_COLI | S | S | S | S | FALSE
9 | 2011-01-27 | N3 | B_ESCHR_COLI | R | I | S | S | FALSE
10 | 2011-01-30 | N3 | B_ESCHR_COLI | R | S | S | S | FALSE
Only 1 isolate is marked as ‘first’ according to the CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics()
function adds a vector with 18 key antibiotics: 6 broad-spectrum ones, 6 narrow-spectrum ones for Gram-negatives and 6 narrow-spectrum ones for Gram-positives. These can be defined by the user.
If a column exists with a name like ‘key(…)ab’ the first_isolate()
function will automatically use it and determine the first weighted isolates. Mind the NOTEs in the output below:
data <- data %>%
  mutate(keyab = key_antibiotics(.)) %>%
  mutate(first_weighted = first_isolate(.))
# NOTE: Using column `bacteria` as input for `col_mo`.
# NOTE: Using column `bacteria` as input for `col_mo`.
# NOTE: Using column `date` as input for `col_date`.
# NOTE: Using column `patient_id` as input for `col_patient_id`.
# NOTE: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics = FALSE to prevent this.
isolate | date | patient_id | bacteria | AMX | AMC | CIP | GEN | first | first_weighted
---|---|---|---|---|---|---|---|---|---
1 | 2010-03-10 | N3 | B_ESCHR_COLI | I | S | S | S | TRUE | TRUE
2 | 2010-05-11 | N3 | B_ESCHR_COLI | S | S | S | S | FALSE | FALSE
3 | 2010-05-17 | N3 | B_ESCHR_COLI | R | S | S | S | FALSE | TRUE
4 | 2010-05-18 | N3 | B_ESCHR_COLI | R | S | S | S | FALSE | FALSE
5 | 2010-07-30 | N3 | B_ESCHR_COLI | S | S | S | S | FALSE | TRUE
6 | 2010-09-15 | N3 | B_ESCHR_COLI | S | S | S | R | FALSE | TRUE
7 | 2010-10-06 | N3 | B_ESCHR_COLI | S | S | R | S | FALSE | TRUE
8 | 2010-11-30 | N3 | B_ESCHR_COLI | S | S | S | S | FALSE | TRUE
9 | 2011-01-27 | N3 | B_ESCHR_COLI | R | I | S | S | FALSE | TRUE
10 | 2011-01-30 | N3 | B_ESCHR_COLI | R | S | S | S | FALSE | FALSE
Instead of 1, now 7 isolates are flagged. In total, 78.7% of all isolates are marked ‘first weighted’ - 50.4% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
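The weighting idea can be illustrated with a few lines of base R. This is only a simplified sketch and not the package's implementation: it assumes that an isolate counts as 'first weighted' when its antibiogram shows at least one S/R switch compared with the previous isolate of the same patient, and that differences involving I are ignored. The names `differs()` and `first_weighted_sketch` are made up for this illustration.

```r
# Antibiograms of the ten isolates of patient N3 shown above
ab <- data.frame(
  AMX = c("I", "S", "R", "R", "S", "S", "S", "S", "R", "R"),
  AMC = c("S", "S", "S", "S", "S", "S", "S", "S", "I", "S"),
  CIP = c("S", "S", "S", "S", "S", "S", "R", "S", "S", "S"),
  GEN = c("S", "S", "S", "S", "S", "R", "S", "S", "S", "S"),
  stringsAsFactors = FALSE
)

# TRUE when two antibiograms differ by at least one S <-> R change;
# differences that involve "I" are ignored (an assumption of this sketch)
differs <- function(a, b) {
  any((a == "S" & b == "R") | (a == "R" & b == "S"))
}

first_weighted_sketch <- logical(nrow(ab))
first_weighted_sketch[1] <- TRUE  # the first isolate always counts
for (i in seq_len(nrow(ab))[-1]) {
  # compare each isolate with the one directly before it
  first_weighted_sketch[i] <- differs(unlist(ab[i, ]), unlist(ab[i - 1, ]))
}

which(first_weighted_sketch)  # isolates 1, 3, 5, 6, 7, 8 and 9
sum(first_weighted_sketch)    # 7
```

Under these assumptions the sketch flags the same 7 isolates as the first_weighted column in the table above. The real function also takes the time between isolates and the user-defined key antibiotics into account.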
+As with filter_first_isolate()
, there’s a shortcut for this new algorithm too:
data_1st <- data %>% + filter_first_weighted_isolate()
So we end up with 15,740 isolates for analysis.
+We can remove unneeded columns:
+ +Now our data looks like:
+head(data_1st)
date | +patient_id | +hospital | +bacteria | +AMX | +AMC | +CIP | +GEN | +gender | +gramstain | +genus | +species | +first_weighted | +
---|---|---|---|---|---|---|---|---|---|---|---|---|
2014-04-15 | +I4 | +Hospital D | +B_ESCHR_COLI | +R | +R | +S | +S | +M | +Gram-negative | +Escherichia | +coli | +TRUE | +
2011-02-09 | +D1 | +Hospital A | +B_ESCHR_COLI | +S | +S | +S | +S | +M | +Gram-negative | +Escherichia | +coli | +TRUE | +
2013-12-16 | +K4 | +Hospital C | +B_STPHY_AURS | +S | +S | +R | +S | +M | +Gram-positive | +Staphylococcus | +aureus | +TRUE | +
2017-08-23 | +Z9 | +Hospital B | +B_ESCHR_COLI | +S | +S | +S | +S | +F | +Gram-negative | +Escherichia | +coli | +TRUE | +
2010-01-14 | +N4 | +Hospital A | +B_STPHY_AURS | +R | +S | +S | +S | +M | +Gram-positive | +Staphylococcus | +aureus | +TRUE | +
2016-01-31 | +N1 | +Hospital D | +B_STPHY_AURS | +R | +S | +R | +S | +M | +Gram-positive | +Staphylococcus | +aureus | +TRUE | +
Time for the analysis!
+You might want to start by getting an idea of how the data is distributed. It’s an important start, because it also decides how you will continue your analysis. Although this package contains a convenient function to make frequency tables, exploratory data analysis (EDA) is not the primary scope of this package. Use a package like DataExplorer
for that, or read the free online book Exploratory Data Analysis with R by Roger D. Peng.
To just get an idea how the species are distributed, create a frequency table with our freq()
function. We created the genus
and species
column earlier based on the microbial ID. With paste()
, we can concatenate them together.
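As a minimal base R sketch of this step, assuming a small made-up data set (the object `isolates` and its contents are hypothetical, only the column names mirror the ones used here):

```r
# Hypothetical mini data set; the column names mirror the ones used above
isolates <- data.frame(
  genus   = c("Escherichia", "Staphylococcus", "Escherichia", "Klebsiella"),
  species = c("coli", "aureus", "coli", "pneumoniae"),
  stringsAsFactors = FALSE
)

# paste() joins the two columns element-wise, separated by a space
full_name <- paste(isolates$genus, isolates$species)

# base R frequency count, sorted from most to least common
counts <- sort(table(full_name), decreasing = TRUE)
counts
```

The base R `table()` only gives counts. The frequency tables shown in this text add percentages and cumulative values on top of that.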
The freq()
function can be used like the base R language was intended:
Or it can be used the dplyr
way, which is more readable:
data_1st %>% freq(genus, species)
Frequency table
+Class: character
+Length: 15,740
+Available: 15,740 (100%, NA: 0 = 0%)
+Unique: 4
Shortest: 16
+Longest: 24
+ | Item | +Count | +Percent | +Cum. Count | +Cum. Percent | +
---|---|---|---|---|---|
1 | +Escherichia coli | +7,938 | +50.43% | +7,938 | +50.43% | +
2 | +Staphylococcus aureus | +3,883 | +24.67% | +11,821 | +75.10% | +
3 | +Streptococcus pneumoniae | +2,317 | +14.72% | +14,138 | +89.82% | +
4 | +Klebsiella pneumoniae | +1,602 | +10.18% | +15,740 | +100.00% | +
If you want to get a quick glance of the number of isolates in different bug/drug combinations, you can use the bug_drug_combinations()
function:
data_1st %>% + bug_drug_combinations() %>% + head() # show first 6 rows
# NOTE: Using column `bacteria` as input for `col_mo`.
+mo | +ab | +S | +I | +R | +total | +
---|---|---|---|---|---|
E. coli | +AMX | +3808 | +236 | +3894 | +7938 | +
E. coli | +AMC | +6223 | +317 | +1398 | +7938 | +
E. coli | +CIP | +6050 | +0 | +1888 | +7938 | +
E. coli | +GEN | +7130 | +0 | +808 | +7938 | +
K. pneumoniae | +AMX | +0 | +0 | +1602 | +1602 | +
K. pneumoniae | +AMC | +1241 | +61 | +300 | +1602 | +
Using Tidyverse selections, you can also select columns based on the antibiotic class they are in:
+data_1st %>% + select(bacteria, fluoroquinolones()) %>% + bug_drug_combinations()
# Selecting fluoroquinolones: `CIP` (ciprofloxacin)
+# NOTE: Using column `bacteria` as input for `col_mo`.
+mo | +ab | +S | +I | +R | +total | +
---|---|---|---|---|---|
E. coli | +CIP | +6050 | +0 | +1888 | +7938 | +
K. pneumoniae | +CIP | +1218 | +0 | +384 | +1602 | +
S. aureus | +CIP | +2967 | +0 | +916 | +3883 | +
S. pneumoniae | +CIP | +1756 | +0 | +561 | +2317 | +
This will only give you the crude numbers in the data. To calculate antimicrobial resistance, we use the resistance()
and susceptibility()
functions.
The functions resistance()
and susceptibility()
can be used to calculate antimicrobial resistance or susceptibility. For more specific analyses, the functions proportion_S()
, proportion_SI()
, proportion_I()
, proportion_IR()
and proportion_R()
can be used to determine the proportion of a specific antimicrobial outcome.
As per the EUCAST guideline of 2019, we calculate resistance as the proportion of R (proportion_R()
, equal to resistance()
) and susceptibility as the proportion of S and I (proportion_SI()
, equal to susceptibility()
). These functions can be used on their own:
data_1st %>% resistance(AMX) +# [1] 0.535324
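To make the two definitions concrete, here is a base R sketch that does not use the package at all. It rebuilds a vector of results from the E. coli / AMX counts in the bug/drug table shown earlier; the object names `amx`, `resistance_sketch` and `susceptibility_sketch` are made up for this illustration.

```r
# Counts for E. coli and amoxicillin (AMX), taken from the
# bug/drug combination table earlier in this text
amx <- rep(c("S", "I", "R"), times = c(3808, 236, 3894))

# resistance: proportion of R among all available results
resistance_sketch <- mean(amx == "R")

# susceptibility: proportion of S and I together
susceptibility_sketch <- mean(amx %in% c("S", "I"))

round(c(resistance = resistance_sketch,
        susceptibility = susceptibility_sketch), 3)
```

By construction the two proportions add up to 1, because every available result is either R or one of S and I.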
Or they can be used in conjunction with group_by()
and summarise()
, both from the dplyr
package:
data_1st %>% + group_by(hospital) %>% + summarise(amoxicillin = resistance(AMX))
# `summarise()` ungrouping output (override with `.groups` argument)
+hospital | +amoxicillin | +
---|---|
Hospital A | +0.5322921 | +
Hospital B | +0.5393839 | +
Hospital C | +0.5327529 | +
Hospital D | +0.5348690 | +
Of course it would be very convenient to know the number of isolates responsible for the percentages. For that purpose the n_rsi()
function can be used, which works exactly like n_distinct()
from the dplyr
package. It counts all isolates available for every group (i.e. values S, I or R):
data_1st %>% + group_by(hospital) %>% + summarise(amoxicillin = resistance(AMX), + available = n_rsi(AMX))
# `summarise()` ungrouping output (override with `.groups` argument)
+hospital | +amoxicillin | +available | +
---|---|---|
Hospital A | +0.5322921 | +4738 | +
Hospital B | +0.5393839 | +5421 | +
Hospital C | +0.5327529 | +2412 | +
Hospital D | +0.5348690 | +3169 | +
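What 'available' means can be sketched in base R, assuming a tiny hypothetical data set in which NA stands for 'not tested'. The object names below are made up for this illustration and the sketch does not reproduce the package's internals.

```r
# Hypothetical results for two hospitals; NA means "not tested"
results <- data.frame(
  hospital = c("A", "A", "A", "B", "B", "B", "B"),
  AMX      = c("R", "S", NA, "R", "R", "I", NA),
  stringsAsFactors = FALSE
)

# number of available results (S, I or R) per hospital
available <- tapply(results$AMX %in% c("S", "I", "R"), results$hospital, sum)

# number of R results per hospital, skipping the untested isolates
resistant <- tapply(results$AMX == "R", results$hospital, sum, na.rm = TRUE)

# proportion of R among the available results per hospital
resistant / available
```

Reporting the denominator next to the proportion shows how much data each percentage rests on: here hospital A has 2 available results and hospital B has 3.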
These functions can also be used to get the proportion of multiple antibiotics, to calculate empiric susceptibility of combination therapies very easily:
+data_1st %>% + group_by(genus) %>% + summarise(amoxiclav = susceptibility(AMC), + gentamicin = susceptibility(GEN), + amoxiclav_genta = susceptibility(AMC, GEN))
# `summarise()` ungrouping output (override with `.groups` argument)
+genus | +amoxiclav | +gentamicin | +amoxiclav_genta | +
---|---|---|---|
Escherichia | +0.8238851 | +0.8982111 | +0.9840010 | +
Klebsiella | +0.8127341 | +0.8951311 | +0.9818976 | +
Staphylococcus | +0.8246201 | +0.9260881 | +0.9863508 | +
Streptococcus | +0.5463962 | +0.0000000 | +0.5463962 | +
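The combined column appears to count an isolate as covered when at least one of the two drugs is S or I. That reading is consistent with the Streptococcus row above (0.546 for amoxiclav, 0 for gentamicin, 0.546 combined). A base R sketch of that logic, with made-up results and a hypothetical helper `is_susceptible()`:

```r
# Hypothetical results of five isolates for two drugs
amc <- c("S", "R", "R", "I", "R")
gen <- c("R", "S", "R", "S", "R")

is_susceptible <- function(x) x %in% c("S", "I")

# susceptibility per drug
mean(is_susceptible(amc))   # 2 of 5 isolates
mean(is_susceptible(gen))   # 2 of 5 isolates

# susceptibility of the combination: covered by at least one drug
combined <- mean(is_susceptible(amc) | is_susceptible(gen))
combined                    # 3 of 5 isolates
```

How isolates with a missing result for one of the drugs are handled is not covered by this sketch.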
To make a transition to the next part, let’s see how this difference could be plotted:
+data_1st %>% + group_by(genus) %>% + summarise("1. Amoxi/clav" = susceptibility(AMC), + "2. Gentamicin" = susceptibility(GEN), + "3. Amoxi/clav + genta" = susceptibility(AMC, GEN)) %>% + # pivot_longer() from the tidyr package "lengthens" data: + tidyr::pivot_longer(-genus, names_to = "antibiotic") %>% + ggplot(aes(x = genus, + y = value, + fill = antibiotic)) + + geom_col(position = "dodge2") +# `summarise()` ungrouping output (override with `.groups` argument)
To show results in plots, most R users would nowadays use the ggplot2
package. This package lets you create plots in layers. You can read more about it on their website. A quick example would look like these syntaxes:
ggplot(data = a_data_set, + mapping = aes(x = year, + y = value)) + + geom_col() + + labs(title = "A title", + subtitle = "A subtitle", + x = "My X axis", + y = "My Y axis") + +# or as short as: +ggplot(a_data_set) + + geom_bar(aes(year))
The AMR
package contains functions to extend this ggplot2
package, for example geom_rsi()
. It automatically transforms data with count_df()
or proportion_df()
and shows the results in stacked bars. Its simplest and shortest example:
Omit the translate_ab = FALSE
to have the antibiotic codes (AMX, AMC, CIP, GEN) translated to official WHO names (amoxicillin, amoxicillin/clavulanic acid, ciprofloxacin, gentamicin).
If we group on e.g. the genus
column and add some additional functions from our package, we can create this:
# group the data on `genus` +ggplot(data_1st %>% group_by(genus)) + + # create bars with genus on x axis + # it looks for variables with class `rsi`, + # of which we have 4 (earlier created with `as.rsi`) + geom_rsi(x = "genus") + + # split plots on antibiotic + facet_rsi(facet = "antibiotic") + + # set colours to the R/SI interpretations + scale_rsi_colours() + + # show percentages on y axis + scale_y_percent(breaks = 0:4 * 25) + + # turn 90 degrees, to make it bars instead of columns + coord_flip() + + # add labels + labs(title = "Resistance per genus and antibiotic", + subtitle = "(this is fake data)") + + # and print genus in italic to follow our convention + # (is now y axis because we turned the plot) + theme(axis.text.y = element_text(face = "italic"))
To simplify this, we also created the ggplot_rsi()
function, which combines almost all above functions:
data_1st %>% + group_by(genus) %>% + ggplot_rsi(x = "genus", + facet = "antibiotic", + breaks = 0:4 * 25, + datalabels = FALSE) + + coord_flip()
The next example uses the example_isolates
data set. This is a data set included with this package and contains 2,000 microbial isolates with their full antibiograms. It reflects reality and can be used to practice AMR analysis.
We will compare the resistance to fosfomycin (column FOS
) in hospital A and D. The input for the fisher.test()
can be retrieved with a transformation like this:
# use package 'tidyr' to pivot data: +library(tidyr) + +check_FOS <- example_isolates %>% + filter(hospital_id %in% c("A", "D")) %>% # filter on only hospitals A and D + select(hospital_id, FOS) %>% # select the hospitals and fosfomycin + group_by(hospital_id) %>% # group on the hospitals + count_df(combine_SI = TRUE) %>% # count all isolates per group (hospital_id) + pivot_wider(names_from = hospital_id, # transform output so A and D are columns + values_from = value) %>% + select(A, D) %>% # and only select these columns + as.matrix() # transform to a good old matrix for fisher.test() + +check_FOS +# A D +# [1,] 25 77 +# [2,] 24 33
We can apply the test now with:
+# do Fisher's Exact Test +fisher.test(check_FOS) +# +# Fisher's Exact Test for Count Data +# +# data: check_FOS +# p-value = 0.03104 +# alternative hypothesis: true odds ratio is not equal to 1 +# 95 percent confidence interval: +# 0.2111489 0.9485124 +# sample estimates: +# odds ratio +# 0.4488318
As can be seen, the p value is 0.031, which means that fosfomycin resistance differs significantly (at the 0.05 level) between isolates from patients in hospitals A and D.
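The test itself only needs the 2 x 2 table, so it can be repeated in base R without the pipeline. The sketch below types the printed counts in by hand; the names `check_FOS_manual`, `sample_or` and `result` are made up for this illustration.

```r
# the 2 x 2 table printed above, filled column by column
check_FOS_manual <- matrix(c(25, 24, 77, 33), nrow = 2,
                           dimnames = list(NULL, c("A", "D")))

# cross-product (sample) odds ratio: (25 * 33) / (77 * 24)
sample_or <- (check_FOS_manual[1, 1] * check_FOS_manual[2, 2]) /
  (check_FOS_manual[1, 2] * check_FOS_manual[2, 1])

result <- fisher.test(check_FOS_manual)
result$p.value    # two-sided p value
result$estimate   # conditional maximum likelihood estimate of the odds ratio
sample_or
```

fisher.test() reports the conditional maximum likelihood estimate of the odds ratio, which is why the 0.449 in the output above is close to, but not identical to, the cross-product ratio 825/1848 (about 0.446).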
+vignettes/EUCAST.Rmd
+ EUCAST.Rmd
What are EUCAST rules? The European Committee on Antimicrobial Susceptibility Testing (EUCAST) states on their website:
+++EUCAST expert rules are a tabulated collection of expert knowledge on intrinsic resistances, exceptional resistance phenotypes and interpretive rules that may be applied to antimicrobial susceptibility testing in order to reduce errors and make appropriate recommendations for reporting particular resistances.
+
In Europe, a lot of medical microbiological laboratories already apply these rules (Brown et al., 2015). Our package features their latest insights on intrinsic resistance and exceptional phenotypes (version 10.0, 2020). Moreover, the eucast_rules()
function we use for this purpose can also apply additional rules, like forcing
These rules can be used to discard impossible bug-drug combinations in your data. For example, Klebsiella produces beta-lactamase that prevents ampicillin (or amoxicillin) from working against it. In other words, practically every strain of Klebsiella is resistant to ampicillin.
+Sometimes, laboratory data can still contain such strains reported as susceptible to ampicillin. This could be because an antibiogram is available before an identification is available, and the antibiogram is then not re-interpreted based on the identification (namely, Klebsiella). EUCAST expert rules solve this and can be applied using eucast_rules()
:
oops <- data.frame(mo = c("Klebsiella", + "Escherichia"), + ampicillin = "S") +oops +# mo ampicillin +# 1 Klebsiella S +# 2 Escherichia S + +eucast_rules(oops, info = FALSE) +# mo ampicillin +# 1 Klebsiella R +# 2 Escherichia S
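Conceptually, such a correction is a lookup: for every organism with a known intrinsic resistance, the reported result is overwritten with R. The base R sketch below illustrates that idea only. The rule table `intrinsic_R` holds a single hand-written rule and the function `apply_intrinsic_sketch()` is hypothetical; neither reflects how the package stores or applies the EUCAST rules.

```r
# Hypothetical, hand-written rule table; the real rules are far more extensive
intrinsic_R <- data.frame(
  genus = "Klebsiella",
  drug  = "ampicillin",
  stringsAsFactors = FALSE
)

apply_intrinsic_sketch <- function(x, rules) {
  for (i in seq_len(nrow(rules))) {
    hit <- x$mo == rules$genus[i]
    # overwrite whatever was reported with "R" for matching organisms
    x[hit, rules$drug[i]] <- "R"
  }
  x
}

lab_data <- data.frame(mo = c("Klebsiella", "Escherichia"),
                       ampicillin = "S",
                       stringsAsFactors = FALSE)
fixed <- apply_intrinsic_sketch(lab_data, intrinsic_R)
fixed$ampicillin
```

The function returns a corrected copy and leaves the input untouched, so the original laboratory values remain available for comparison.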
EUCAST rules can not only be used for correction, they can also be used for filling in known resistance and susceptibility based on results of other antimicrobial drugs. This process is called interpretive reading and is part of the eucast_rules()
function as well:
data <- data.frame(mo = c("Staphylococcus aureus", + "Enterococcus faecalis", + "Escherichia coli", + "Klebsiella pneumoniae", + "Pseudomonas aeruginosa"), + VAN = "-", # Vancomycin + AMX = "-", # Amoxicillin + COL = "-", # Colistin + CAZ = "-", # Ceftazidime + CXM = "-", # Cefuroxime + PEN = "S", # Penicillin G + FOX = "S", # Cefoxitin + stringsAsFactors = FALSE)
data
mo | +VAN | +AMX | +COL | +CAZ | +CXM | +PEN | +FOX | +
---|---|---|---|---|---|---|---|
Staphylococcus aureus | +- | +- | +- | +- | +- | +S | +S | +
Enterococcus faecalis | +- | +- | +- | +- | +- | +S | +S | +
Escherichia coli | +- | +- | +- | +- | +- | +S | +S | +
Klebsiella pneumoniae | +- | +- | +- | +- | +- | +S | +S | +
Pseudomonas aeruginosa | +- | +- | +- | +- | +- | +S | +S | +
eucast_rules(data)
# Warning: Not all columns with antimicrobial results are of class <rsi>.
+# Transform eligible columns to class <rsi> on beforehand: your_data %>% mutate_if(is.rsi.eligible, as.rsi)
+mo | +VAN | +AMX | +COL | +CAZ | +CXM | +PEN | +FOX | +
---|---|---|---|---|---|---|---|
Staphylococcus aureus | +- | +S | +R | +R | +S | +S | +S | +
Enterococcus faecalis | +- | +- | +R | +R | +R | +S | +R | +
Escherichia coli | +R | +- | +- | +- | +- | +R | +S | +
Klebsiella pneumoniae | +R | +R | +- | +- | +- | +R | +S | +
Pseudomonas aeruginosa | +R | +R | +- | +- | +R | +R | +R | +
vignettes/MDR.Rmd
+ MDR.Rmd
With the function mdro()
, you can determine which micro-organisms are multi-drug resistant organisms (MDRO).
The mdro()
function takes a data set as input, such as a regular data.frame
. It tries to automatically determine the right columns for info about your isolates, like the name of the species and all columns with results of antimicrobial agents. See the help page for more info about how to set the right settings for your data with the command ?mdro
.
For WHONET data (and most other data), all settings are automatically set correctly.
+The function supports multiple guidelines. You can select a guideline with the guideline
parameter. Currently supported guidelines are (case-insensitive):
guideline = "CMI2012"
(default)
guideline = "EUCAST"
guideline = "TB"
guideline = "MRGN"
guideline = "BRMO"
The Dutch national guideline - Rijksinstituut voor Volksgezondheid en Milieu “WIP-richtlijn BRMO (Bijzonder Resistente Micro-Organismen) [ZKH]” (link)
+The mdro()
function always returns an ordered factor
. For example, the output of the default guideline by Magiorakos et al. returns a factor
with levels ‘Negative’, ‘MDR’, ‘XDR’ or ‘PDR’ in that order.
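An ordered factor is convenient because comparisons and summaries follow the level order instead of the alphabet. A short base R sketch with made-up values, using the level names mentioned above (the objects `levels_mdro` and `x` are hypothetical):

```r
# Hypothetical results, using the level names mentioned above
levels_mdro <- c("Negative", "MDR", "XDR", "PDR")
x <- factor(c("MDR", "Negative", "PDR", "Negative", "XDR"),
            levels = levels_mdro, ordered = TRUE)

# comparisons respect the level order
x >= "MDR"

# the highest category present
max(x)

# counts are shown in level order, not alphabetically
table(x)
```

This makes filters such as 'MDR or worse' a single comparison.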
The next example uses the example_isolates
data set. This is a data set included with this package and contains 2,000 microbial isolates with their full antibiograms. It reflects reality and can be used to practice AMR analysis. If we test the MDR/XDR/PDR guideline on this data set, we get:
example_isolates %>% + mdro() %>% + freq() # show frequency table of the result +# NOTE: Using column `mo` as input for `col_mo`. +# NOTE: Auto-guessing columns suitable for analysis...OK. +# NOTE: Reliability would be improved if these antimicrobial results would be available too: ceftaroline (CPT), fusidic acid (FUS), telavancin (TLV), daptomycin (DAP), quinupristin/dalfopristin (QDA), minocycline (MNO), gentamicin-high (GEH), streptomycin-high (STH), doripenem (DOR), levofloxacin (LVX), netilmicin (NET), ticarcillin/clavulanic acid (TCC), ertapenem (ETP), cefotetan (CTT), aztreonam (ATM), ampicillin/sulbactam (SAM), polymyxin B (PLB) +# Warning in mdro(.): NA introduced for isolates where the available percentage of +# antimicrobial classes was below 50% (set with `pct_required_classes`)
Frequency table
+Class: factor > ordered (numeric)
+Length: 2,000
+Levels: 4: Negative < Multi-drug-resistant (MDR) < Extensively drug-resistant …
+Available: 1,711 (85.55%, NA: 289 = 14.45%)
+Unique: 2
+ | Item | +Count | +Percent | +Cum. Count | +Cum. Percent | +
---|---|---|---|---|---|
1 | +Negative | +1595 | +93.22% | +1595 | +93.22% | +
2 | +Multi-drug-resistant (MDR) | +116 | +6.78% | +1711 | +100.00% | +
For another example, I will create a data set to determine multi-drug resistant TB:
+# a helper function to get a random vector with values S, I and R +# with the probabilities 50% - 10% - 40% +sample_rsi <- function() { + sample(c("S", "I", "R"), + size = 5000, + prob = c(0.5, 0.1, 0.4), + replace = TRUE) +} + +my_TB_data <- data.frame(rifampicin = sample_rsi(), + isoniazid = sample_rsi(), + gatifloxacin = sample_rsi(), + ethambutol = sample_rsi(), + pyrazinamide = sample_rsi(), + moxifloxacin = sample_rsi(), + kanamycin = sample_rsi())
Because all column names are automatically verified for valid drug names or codes, this would have worked exactly the same:
+my_TB_data <- data.frame(RIF = sample_rsi(), + INH = sample_rsi(), + GAT = sample_rsi(), + ETH = sample_rsi(), + PZA = sample_rsi(), + MFX = sample_rsi(), + KAN = sample_rsi())
The data set now looks like this:
+head(my_TB_data) +# rifampicin isoniazid gatifloxacin ethambutol pyrazinamide moxifloxacin +# 1 S R R S R R +# 2 R S R S R S +# 3 R R S S R S +# 4 S S S S R S +# 5 S R S S R S +# 6 R S R S S S +# kanamycin +# 1 R +# 2 I +# 3 R +# 4 S +# 5 R +# 6 S
We can now add the interpretation of MDR-TB to our data set. You can use:
+mdro(my_TB_data, guideline = "TB")
or its shortcut mdr_tb()
:
my_TB_data$mdr <- mdr_tb(my_TB_data) +# NOTE: No column found as input for `col_mo`, assuming all records contain Mycobacterium tuberculosis. +# NOTE: Auto-guessing columns suitable for analysis...OK. +# NOTE: Reliability would be improved if these antimicrobial results would be available too: capreomycin (CAP), rifabutin (RIB), rifapentine (RFP)
Create a frequency table of the results:
+freq(my_TB_data$mdr)
Frequency table
+Class: factor > ordered (numeric)
+Length: 5,000
+Levels: 5: Negative < Mono-resistant < Poly-resistant < Multi-drug-resistant <…
+Available: 5,000 (100%, NA: 0 = 0%)
+Unique: 5
+ | Item | +Count | +Percent | +Cum. Count | +Cum. Percent | +
---|---|---|---|---|---|
1 | +Mono-resistant | +3245 | +64.90% | +3245 | +64.90% | +
2 | +Negative | +678 | +13.56% | +3923 | +78.46% | +
3 | +Multi-drug-resistant | +607 | +12.14% | +4530 | +90.60% | +
4 | +Poly-resistant | +262 | +5.24% | +4792 | +95.84% | +
5 | +Extensively drug-resistant | +208 | +4.16% | +5000 | +100.00% | +
vignettes/PCA.Rmd
+ PCA.Rmd
NOTE: This page will be updated soon, as the pca() function is currently being developed.
+ +For PCA, we need to transform our AMR data first. This is what the example_isolates
data set in this package looks like:
library(AMR) +library(dplyr) +glimpse(example_isolates) +# Rows: 2,000 +# Columns: 49 +# $ date <date> 2002-01-02, 2002-01-03, 2002-01-07, 2002-01-07, 2002… +# $ hospital_id <fct> D, D, B, B, B, B, D, D, B, B, D, D, D, D, D, B, B, B,… +# $ ward_icu <lgl> FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, T… +# $ ward_clinical <lgl> TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, F… +# $ ward_outpatient <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS… +# $ age <dbl> 65, 65, 45, 45, 45, 45, 78, 78, 45, 79, 67, 67, 71, 7… +# $ gender <chr> "F", "F", "F", "F", "F", "F", "M", "M", "F", "F", "M"… +# $ patient_id <chr> "A77334", "A77334", "067927", "067927", "067927", "06… +# $ mo <mo> "B_ESCHR_COLI", "B_ESCHR_COLI", "B_STPHY_EPDR", "B_STP… +# $ PEN <ord> R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R,… +# $ OXA <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… +# $ FLC <ord> NA, NA, R, R, R, R, S, S, R, S, S, S, NA, NA, NA, NA,… +# $ AMX <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… +# $ AMC <ord> I, I, NA, NA, NA, NA, S, S, NA, NA, S, S, I, I, R, I,… +# $ AMP <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… +# $ TZP <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… +# $ CZO <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… +# $ FEP <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… +# $ CXM <ord> I, I, R, R, R, R, S, S, R, S, S, S, S, S, NA, S, S, R… +# $ FOX <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… +# $ CTX <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, S,… +# $ CAZ <ord> NA, NA, R, R, R, R, R, R, R, R, R, R, NA, NA, NA, S, … +# $ CRO <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, S,… +# $ GEN <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… +# $ TOB <ord> NA, NA, NA, NA, NA, NA, S, S, NA, NA, NA, NA, S, S, N… +# $ AMK <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… +# $ KAN <ord> NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, N… +# $ TMP <ord> R, R, S, S, R, R, R, R, S, S, NA, NA, S, S, S, S, S, … +# $ SXT <ord> R, R, S, S, NA, NA, NA, NA, S, S, NA, NA, S, S, S, S,… +# $ NIT <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… +# $ FOS <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… +# $ LNZ <ord> R, R, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R, R, R… +# $ CIP <ord> NA, NA, NA, NA, NA, NA, NA, NA, S, S, NA, NA, NA, NA,… +# $ MFX <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… +# $ VAN <ord> R, R, S, S, S, S, S, S, S, S, NA, NA, R, R, R, R, R, … +# $ TEC <ord> R, R, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R, R, R… +# $ TCY <ord> R, R, S, S, S, S, S, S, S, I, S, S, NA, NA, I, R, R, … +# $ TGC <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… +# $ DOX <ord> NA, NA, S, S, S, S, S, S, S, NA, S, S, NA, NA, NA, R,… +# $ ERY <ord> R, R, R, R, R, R, S, S, R, S, S, S, R, R, R, R, R, R,… +# $ CLI <ord> NA, NA, NA, NA, NA, R, NA, NA, NA, NA, NA, NA, NA, NA… +# $ AZM <ord> R, R, R, R, R, R, S, S, R, S, S, S, R, R, R, R, R, R,… +# $ IPM <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, S,… +# $ MEM <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… +# $ MTR <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… +# $ CHL <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… +# $ COL <ord> NA, NA, R, R, R, R, R, R, R, R, R, R, NA, NA, NA, R, … +# $ MUP <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… +# $ RIF <ord> R, R, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R, R, R…
Now to transform this to a data set with only resistance percentages per taxonomic order and genus:
+resistance_data <- example_isolates %>% + group_by(order = mo_order(mo), # group on anything, like order + genus = mo_genus(mo)) %>% # and genus as we do here + summarise_if(is.rsi, resistance) %>% # then get resistance of all drugs + select(order, genus, AMC, CXM, CTX, + CAZ, GEN, TOB, TMP, SXT) # and select only relevant columns + +head(resistance_data) +# # A tibble: 6 x 10 +# # Groups: order [2] +# order genus AMC CXM CTX CAZ GEN TOB TMP SXT +# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> +# 1 (unknown order) (unknown genu… NA NA NA NA NA NA NA NA +# 2 Actinomycetales Corynebacteri… NA NA NA NA NA NA NA NA +# 3 Actinomycetales Cutibacterium NA NA NA NA NA NA NA NA +# 4 Actinomycetales Dermabacter NA NA NA NA NA NA NA NA +# 5 Actinomycetales Micrococcus NA NA NA NA NA NA NA NA +# 6 Actinomycetales Rothia NA NA NA NA NA NA NA NA
The new pca()
function will automatically filter on rows that contain numeric values in all selected variables, so we now only need to do:
pca_result <- pca(resistance_data) +# NOTE: Columns selected for PCA: AMC CXM CTX CAZ GEN TOB TMP SXT. +# Total observations available: 7.
The result can be reviewed with the good old summary()
function:
summary(pca_result) +# Importance of components: +# PC1 PC2 PC3 PC4 PC5 PC6 PC7 +# Standard deviation 2.154 1.6809 0.61305 0.33882 0.20755 0.03137 1.602e-16 +# Proportion of Variance 0.580 0.3532 0.04698 0.01435 0.00538 0.00012 0.000e+00 +# Cumulative Proportion 0.580 0.9332 0.98014 0.99449 0.99988 1.00000 1.000e+00
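The 'Proportion of Variance' row follows directly from the standard deviations: each component's variance is its squared standard deviation, divided by the total variance. A base R sketch that recomputes the row from the rounded values printed above (the object names are made up, and small rounding differences are to be expected):

```r
# standard deviations of the seven components, copied from the summary above
sdev <- c(2.154, 1.6809, 0.61305, 0.33882, 0.20755, 0.03137, 1.602e-16)

# each component's variance is its squared standard deviation
prop_var <- sdev^2 / sum(sdev^2)
cum_prop <- cumsum(prop_var)

round(prop_var[1:2], 3)  # share of PC1 and PC2
round(cum_prop[2], 3)    # share explained by PC1 and PC2 together
```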
Good news. The first two components explain a total of 93.3% of the variance (see the PC1 and PC2 values of the Proportion of Variance). We can create a so-called biplot with the base R biplot()
function, to see which antimicrobial resistance per drug explains the difference per microorganism.
biplot(pca_result)
But we can’t see the explanation of the points. Perhaps this works better with our new ggplot_pca()
function, that automatically adds the right labels and even groups:
ggplot_pca(pca_result)
You can also print an ellipse per group, and edit the appearance:
+ggplot_pca(pca_result, ellipse = TRUE) + + ggplot2::labs(title = "An AMR/PCA biplot!")
vignettes/SPSS.Rmd
+ SPSS.Rmd
SPSS (Statistical Package for the Social Sciences) is probably the most well-known software package for statistical analysis. SPSS is easier to learn than R, because in SPSS you only have to click a menu to run parts of your analysis. Because of its user-friendliness, it is taught at universities and particularly useful for students who are new to statistics. From my experience, I would guess that pretty much all (bio)medical students know it at the time they graduate. SAS and Stata are comparable statistical packages popular in big industries.
+As said, SPSS is easier to learn than R. But SPSS, SAS and Stata come with major downsides when compared with R:
+R is highly modular.
+The official R network (CRAN) features almost 14,000 packages at the time of writing, our AMR
package being one of them. All these packages were peer-reviewed before publication. Aside from this official channel, there are also developers who choose not to submit to CRAN, but rather keep it on their own public repository, like GitHub. So there may even be a lot more than 14,000 packages out there.
Bottom line is, you can really extend it yourself or ask somebody to do this for you. Take for example our AMR
package. Among other things, it adds reliable reference data to R to help you with the data cleaning and analysis. SPSS, SAS and Stata will never know what a valid MIC value is or what the Gram stain of E. coli is. Or that all species of Klebsiella are resistant to amoxicillin and that Floxapen® is a trade name of flucloxacillin. These facts and properties are often needed to clean existing data, which would be very inconvenient in a software package without reliable reference data. See below for a demonstration.
R is extremely flexible.
+Because you write the syntax yourself, you can do anything you want. The flexibility in transforming, arranging, grouping and summarising data, or drawing plots, is endless - with SPSS, SAS or Stata you are bound to their algorithms and format styles. They may be a bit flexible, but you can probably never create that very specific publication-ready plot without using other (paid) software. If you sometimes write syntaxes in SPSS to run a complete analysis or to ‘automate’ some of your work, you could do this in a lot less time in R. You will notice that writing syntaxes in R is a lot more nifty and clever than in SPSS. Still, as with any statistical package, you will need to know what you are doing (statistically) and what you want to accomplish.
+R can be easily automated.
+Over the last years, R Markdown has really made an interesting development. With R Markdown, you can very easily produce reports, whether the format has to be Word, PowerPoint, a website, a PDF document or just the raw data to Excel. It even allows the use of a reference file containing the layout style (e.g. fonts and colours) of your organisation. I use this a lot to generate weekly and monthly reports automatically. Just write the code once and enjoy the automatically updated reports at any interval you like.
+For an even more professional environment, you could create Shiny apps: live manipulation of data using a custom made website. The webdesign knowledge needed (JavaScript, CSS, HTML) is almost zero.
+R has a huge community.
+Many R users just ask questions on websites like StackOverflow.com, the largest online community for programmers. At the time of writing, more than 300,000 R-related questions have already been asked on this platform (which covers questions and answers for any programming language). In my own experience, most questions are answered within a couple of minutes.
+R understands any data type, including SPSS/SAS/Stata.
+And that’s not vice versa I’m afraid. You can import data from any source into R. For example from SPSS, SAS and Stata (link), from Minitab, Epi Info and EpiData (link), from Excel (link), from flat files like CSV, TXT or TSV (link), or directly from databases and data warehouses anywhere in the world (link). You can even scrape websites to download tables that are live on the internet (link) or get the results of an API call and transform it into data in only one command (link).
+And the best part - you can export from R to most data formats as well. So you can import an SPSS file, do your analysis neatly in R and export the resulting tables to Excel files for sharing.
+R is completely free and open-source.
+No strings attached. It was created and is being maintained by volunteers who believe that (data) science should be open and publicly available to everybody. SPSS, SAS and Stata are quite expensive. IBM SPSS Statistics only comes with subscriptions nowadays, varying between USD 1,300 and USD 8,500 per user per year. SAS Analytics Pro costs around USD 10,000 per computer. Stata also has a business model with subscription fees, varying between USD 600 and USD 2,800 per computer per year, but lower prices come with a limitation of the number of variables you can work with. And still they do not offer the above benefits of R.
+If you are working at a midsized or small company, you can save it tens of thousands of dollars by using R instead of e.g. SPSS - gaining even more functions and flexibility. And all R enthusiasts can do as much PR as they want (like I do here), because nobody is officially associated or affiliated with R. It is really free.
+R is (nowadays) the preferred analysis software in academic papers.
+At present, R is among the world’s most powerful statistical languages, and it is generally very popular in science (Bollmann et al., 2017). For all the above reasons, the number of references to R as an analysis method in academic papers is rising continuously and has even surpassed SPSS for academic use (Muenchen, 2014).
+I believe that the thing with SPSS is that it has always had a great user interface which is very easy to learn and use. Back when they developed it, they had very little competition, let alone from R. R didn’t even have a professional user interface until the last decade (called RStudio, see below). How people used R between the nineties and 2010 is almost completely incomparable to how R is being used now. The language itself has been restyled completely by volunteers who are dedicated professionals in the field of data science. SPSS was great when there was nothing else that could compete. But now in 2020, I don’t see any reason why SPSS would be of any better use than R.
+To demonstrate the first point:
+# not all values are valid MIC values: +as.mic(0.125) +# Class <mic> +# [1] 0.125 +as.mic("testvalue") +# Class <mic> +# [1] <NA> + +# the Gram stain is available for all bacteria: +mo_gramstain("E. coli") +# [1] "Gram-negative" + +# Klebsiella is intrinsically resistant to amoxicillin, according to EUCAST: +klebsiella_test <- data.frame(mo = "klebsiella", + amox = "S", + stringsAsFactors = FALSE) +klebsiella_test # (our original data) +# mo amox +# 1 klebsiella S +eucast_rules(klebsiella_test, info = FALSE) # (the data as edited by EUCAST rules) +# mo amox +# 1 klebsiella R + +# hundreds of trade names can be translated to a name, trade name or an ATC code: +ab_name("floxapen") +# [1] "Flucloxacillin" +ab_tradenames("floxapen") +# [1] "floxacillin" "floxapen" "floxapen sodium salt" +# [4] "fluclox" "flucloxacilina" "flucloxacillin" +# [7] "flucloxacilline" "flucloxacillinum" "fluorochloroxacillin" +ab_atc("floxapen") +# [1] "J01CF05"
To work with R, probably the best option is to use RStudio. It is an open-source and free desktop environment which not only allows you to run R code, but also supports project management, version management, package management and convenient import menus to work with other data sources. You can also install RStudio Server on a private or corporate server, which brings nothing less than the complete RStudio software to you as a website (at home or at work).
+To import a data file, just click Import Dataset in the Environment tab:
+If additional packages are needed, RStudio will ask you if they should be installed beforehand.
+In the window that opens, you can define all options (parameters) that should be used for import and you’re ready to go:
+If you want named variables to be imported as factors so it resembles SPSS more, use as_factor()
.
The difference is this:
+SPSS_data +# # A tibble: 4,203 x 4 +# v001 sex status statusage +# <dbl> <dbl+lbl> <dbl+lbl> <dbl> +# 1 10002 1 1 76.6 +# 2 10004 0 1 59.1 +# 3 10005 1 1 54.5 +# 4 10006 1 1 54.1 +# 5 10007 1 1 57.7 +# 6 10008 1 1 62.8 +# 7 10010 0 1 63.7 +# 8 10011 1 1 73.1 +# 9 10017 1 1 56.7 +# 10 10018 0 1 66.6 +# # … with 4,193 more rows + +as_factor(SPSS_data) +# # A tibble: 4,203 x 4 +# v001 sex status statusage +# <dbl> <fct> <fct> <dbl> +# 1 10002 Male alive 76.6 +# 2 10004 Female alive 59.1 +# 3 10005 Male alive 54.5 +# 4 10006 Male alive 54.1 +# 5 10007 Male alive 57.7 +# 6 10008 Male alive 62.8 +# 7 10010 Female alive 63.7 +# 8 10011 Male alive 73.1 +# 9 10017 Male alive 56.7 +# 10 10018 Female alive 66.6 +# # … with 4,193 more rows
To import data from SPSS, SAS or Stata, you can use the great haven
package yourself:
# download and install the latest version: +install.packages("haven") +# load the package you just installed: +library(haven)
You can now import files as follows:
+To read files from SPSS into R:
+# read any SPSS file based on file extension (best way): +read_spss(file = "path/to/file") + +# read .sav or .zsav file: +read_sav(file = "path/to/file") + +# read .por file: +read_por(file = "path/to/file")
Do not forget about as_factor()
, as mentioned above.
To export your R objects to the SPSS file format:
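The export snippet for this step is not preserved in this copy of the page. As a hedged sketch, the `write_sav()` function of the haven package writes a data frame to a `.sav` file. The object `yourdata` and the temporary file path below are placeholders for illustration only, not values from the tutorial.

```r
library(haven)

# placeholder data; replace with your own data.frame or tibble
yourdata <- data.frame(id = c(1, 2, 3), age = c(76.6, 59.1, 54.5))

# write to a temporary .sav file (use your own path in practice)
sav_path <- tempfile(fileext = ".sav")
write_sav(data = yourdata, path = sav_path)

# read the file back in to check the round trip
roundtrip <- read_sav(file = sav_path)
```

Reading the file back with `read_sav()` is a quick way to confirm that the export kept all rows and columns.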
+ +To read files from SAS into R:
+# read .sas7bdat + .sas7bcat files: +read_sas(data_file = "path/to/file", catalog_file = NULL) + +# read SAS transport files (version 5 and version 8): +read_xpt(file = "path/to/file")
To export your R objects to the SAS file format:
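The export snippet for this step is also missing from this copy of the page, so what the tutorial originally showed is unknown. One option that haven provides is `write_xpt()`, which writes the SAS transport format. The sketch below assumes that format is acceptable for your purpose; `yourdata` and the temporary path are placeholders.

```r
library(haven)

# placeholder data; replace with your own data.frame or tibble
yourdata <- data.frame(id = c(1, 2, 3), age = c(76.6, 59.1, 54.5))

# write a SAS transport file (version 8) to a temporary location
xpt_path <- tempfile(fileext = ".xpt")
write_xpt(data = yourdata, path = xpt_path, version = 8)

# read the file back in to check the round trip
roundtrip_xpt <- read_xpt(file = xpt_path)
```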
+ +To read files from Stata into R:
+# read .dta file: +read_stata(file = "/path/to/file") + +# works exactly the same: +read_dta(file = "/path/to/file")
To export your R objects to the Stata file format:
+# save as .dta file, Stata version 14: +# (supports Stata v8 until v15 at the time of writing) +write_dta(data = yourdata, path = "/path/to/file", version = 14)
vignettes/WHONET.Rmd
+ WHONET.Rmd
This tutorial assumes you already imported the WHONET data with e.g. the readxl
package. In RStudio, this can be done using the menu button ‘Import Dataset’ in the tab ‘Environment’. Choose the option ‘From Excel’ and select your exported file. Make sure date fields are imported correctly.
An example syntax could look like this:
+library(readxl) +data <- read_excel(path = "path/to/your/file.xlsx")
This package comes with an example data set WHONET
. We will use it for this analysis.
First, load the relevant packages if you have not done so yet. I use the tidyverse for all of my analyses. All of them. If you don’t know it yet, I suggest you read about it on their website: https://www.tidyverse.org/.
+library(dplyr) # part of tidyverse +library(ggplot2) # part of tidyverse +library(AMR) # this package +library(cleaner) # to create frequency tables
We will have to transform some variables to simplify and automate the analysis:
+mo
) using our Catalogue of Life reference data set, which contains all ~70,000 microorganisms from the taxonomic kingdoms Bacteria, Fungi and Protozoa. We do the transformation with as.mo()
. This function also recognises almost all WHONET abbreviations of microorganisms."S"
, "I"
or "R"
. That is exactly what the as.rsi()
function is for.
# transform variables +data <- WHONET %>% + # get microbial ID based on given organism + mutate(mo = as.mo(Organism)) %>% + # transform everything from "AMP_ND10" to "CIP_EE" to the new `rsi` class + mutate_at(vars(AMP_ND10:CIP_EE), as.rsi)
No errors or warnings, so all values are transformed successfully.
+We also created a package dedicated to data cleaning and checking, called the cleaner
package. Its freq()
function can be used to create frequency tables.
So let’s check our data, with a couple of frequency tables:
+# our newly created `mo` variable, put in the mo_name() function +data %>% freq(mo_name(mo), nmax = 10)
Frequency table
+Class: character
+Length: 500
+Available: 500 (100%, NA: 0 = 0%)
+Unique: 37
Shortest: 11
+Longest: 40
+ | Item | +Count | +Percent | +Cum. Count | +Cum. Percent | +
---|---|---|---|---|---|
1 | +Escherichia coli | +245 | +49.0% | +245 | +49.0% | +
2 | +Coagulase-negative Staphylococcus (CoNS) | +74 | +14.8% | +319 | +63.8% | +
3 | +Staphylococcus epidermidis | +38 | +7.6% | +357 | +71.4% | +
4 | +Streptococcus pneumoniae | +31 | +6.2% | +388 | +77.6% | +
5 | +Staphylococcus hominis | +21 | +4.2% | +409 | +81.8% | +
6 | +Proteus mirabilis | +9 | +1.8% | +418 | +83.6% | +
7 | +Enterococcus faecium | +8 | +1.6% | +426 | +85.2% | +
8 | +Staphylococcus capitis | +8 | +1.6% | +434 | +86.8% | +
9 | +Enterobacter cloacae | +5 | +1.0% | +439 | +87.8% | +
10 | +Streptococcus anginosus | +5 | +1.0% | +444 | +88.8% | +
(omitted 27 entries, n = 56 [11.20%])
+# our transformed antibiotic columns +# amoxicillin/clavulanic acid (J01CR02) as an example +data %>% freq(AMC_ND2)
Frequency table
+Class: factor > ordered > rsi (numeric)
+Length: 500
+Levels: 3: S < I < R
+Available: 481 (96.2%, NA: 19 = 3.8%)
+Unique: 3
+ | Item | +Count | +Percent | +Cum. Count | +Cum. Percent | +
---|---|---|---|---|---|
1 | +S | +356 | +74.01% | +356 | +74.01% | +
2 | +R | +103 | +21.41% | +459 | +95.43% | +
3 | +I | +22 | +4.57% | +481 | +100.00% | +
An easy ggplot
will already give a lot of information, using the included ggplot_rsi()
function:
data %>% + group_by(Country) %>% + select(Country, AMP_ND2, AMC_ED20, CAZ_ED10, CIP_ED5) %>% + ggplot_rsi(translate_ab = 'ab', facet = "Country", datalabels = FALSE)
vignettes/benchmarks.Rmd
+ benchmarks.Rmd
One of the most important features of this package is the complete microbial taxonomic database, supplied by the Catalogue of Life. We created a function as.mo()
that transforms any user input value to a valid microbial ID by using intelligent rules combined with the taxonomic tree of Catalogue of Life.
Using the microbenchmark
package, we can review the calculation performance of this function. Its function microbenchmark()
runs different input expressions independently of each other and measures their time-to-result.
microbenchmark <- microbenchmark::microbenchmark +library(AMR) +library(dplyr)
In the next test, we try to ‘coerce’ different input values into the microbial code of Staphylococcus aureus. Coercion is a computational process of forcing output based on an input. For microorganism names, coercing user input to taxonomically valid microorganism names is crucial to ensure correct interpretation and to enable grouping based on taxonomic properties.
+The actual result is the same every time: it returns its microorganism code B_STPHY_AURS
(B stands for Bacteria, the taxonomic kingdom).
But the calculation time differs a lot:
+S.aureus <- microbenchmark( + as.mo("sau"), # WHONET code + as.mo("stau"), + as.mo("STAU"), + as.mo("staaur"), + as.mo("STAAUR"), + as.mo("S. aureus"), + as.mo("S aureus"), + as.mo("Staphylococcus aureus"), # official taxonomic name + as.mo("Staphylococcus aureus (MRSA)"), # additional text + as.mo("Sthafilokkockus aaureuz"), # incorrect spelling + as.mo("MRSA"), # Methicillin Resistant S. aureus + as.mo("VISA"), # Vancomycin Intermediate S. aureus + as.mo("VRSA"), # Vancomycin Resistant S. aureus + as.mo(22242419), # Catalogue of Life ID + times = 10) +print(S.aureus, unit = "ms", signif = 2) +# Unit: milliseconds +# expr min lq mean median uq max +# as.mo("sau") 8.5 11.0 17.0 12.0 12.0 43.0 +# as.mo("stau") 120.0 130.0 150.0 140.0 160.0 180.0 +# as.mo("STAU") 130.0 140.0 150.0 150.0 160.0 170.0 +# as.mo("staaur") 7.7 9.1 13.0 11.0 12.0 38.0 +# as.mo("STAAUR") 8.3 9.3 15.0 10.0 11.0 37.0 +# as.mo("S. aureus") 11.0 12.0 18.0 13.0 14.0 41.0 +# as.mo("S aureus") 8.8 11.0 17.0 12.0 13.0 41.0 +# as.mo("Staphylococcus aureus") 6.4 6.6 7.4 7.6 7.8 9.1 +# as.mo("Staphylococcus aureus (MRSA)") 810.0 870.0 890.0 890.0 900.0 1000.0 +# as.mo("Sthafilokkockus aaureuz") 320.0 340.0 370.0 350.0 400.0 490.0 +# as.mo("MRSA") 9.2 10.0 13.0 11.0 12.0 37.0 +# as.mo("VISA") 12.0 12.0 22.0 13.0 43.0 44.0 +# as.mo("VRSA") 11.0 13.0 21.0 14.0 38.0 41.0 +# as.mo(22242419) 130.0 140.0 150.0 140.0 170.0 200.0 +# neval +# 10 +# 10 +# 10 +# 10 +# 10 +# 10 +# 10 +# 10 +# 10 +# 10 +# 10 +# 10 +# 10 +# 10
In the table above, all measurements are shown in milliseconds (thousandths of a second). A value of 5 milliseconds means it can determine 200 input values per second. In the case of 100 milliseconds, this is only 10 input values per second.
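The conversion from milliseconds per lookup to lookups per second is a simple division. The helper name `lookups_per_second()` is made up for this illustration and is not part of the package.

```r
# convert the time needed for one lookup (in milliseconds)
# into the number of lookups that fit in one second
lookups_per_second <- function(ms_per_lookup) {
  1000 / ms_per_lookup
}

lookups_per_second(5)    # 200
lookups_per_second(100)  # 10
```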
+To achieve this speed, the as.mo
function also takes into account the prevalence of human pathogenic microorganisms. The downside of this is of course that less prevalent microorganisms will be determined more slowly. See this example for the ID of Methanosarcina semesiae (B_MTHNSR_SEMS
), a bug probably never found before in humans:
M.semesiae <- microbenchmark(as.mo("metsem"), + as.mo("METSEM"), + as.mo("M. semesiae"), + as.mo("M. semesiae"), + as.mo("Methanosarcina semesiae"), + times = 10) +print(M.semesiae, unit = "ms", signif = 4) +# Unit: milliseconds +# expr min lq mean median uq max +# as.mo("metsem") 143.400 146.300 156.10 155.400 164.900 176.40 +# as.mo("METSEM") 141.600 146.900 167.00 170.700 185.000 188.00 +# as.mo("M. semesiae") 9.665 9.879 16.50 10.090 11.960 44.29 +# as.mo("M. semesiae") 10.000 10.080 14.46 11.660 13.140 42.01 +# as.mo("Methanosarcina semesiae") 7.161 7.389 10.40 7.542 9.294 33.00 +# neval +# 10 +# 10 +# 10 +# 10 +# 10
Looking up arbitrary codes of less prevalent microorganisms costs the most time. Full names (like Methanosarcina semesiae) are always very fast and only take a few thousandths of a second to coerce - they are the most probable input from most data sets.
+In the figure below, we compare Escherichia coli (which is very common) with Prevotella brevis (which is moderately common) and with Methanosarcina semesiae (which is uncommon):
+Uncommon microorganisms take some more time than common microorganisms. To further improve performance, two important calculations take almost no time at all: repetitive results and already precalculated results.
+Repetitive results are unique values that are present more than once. Unique values will only be calculated once by as.mo()
. We will use mo_name()
for this test - a helper function that returns the full microbial name (genus, species and possibly subspecies) which uses as.mo()
internally.
# take all MO codes from the example_isolates data set +x <- example_isolates$mo %>% + # keep only the unique ones + unique() %>% + # pick 50 of them at random + sample(50) %>% + # paste that 10,000 times + rep(10000) %>% + # scramble it + sample() + +# got indeed 50 times 10,000 = half a million? +length(x) +# [1] 500000 + +# and how many unique values do we have? +n_distinct(x) +# [1] 50 + +# now let's see: +run_it <- microbenchmark(mo_name(x), + times = 10) +print(run_it, unit = "ms", signif = 3) +# Unit: milliseconds +# expr min lq mean median uq max neval +# mo_name(x) 1650 1730 1790 1790 1840 1900 10
So transforming 500,000 values (!!) of 50 unique values only takes 1.79 seconds. You only lose time on your unique input values.
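The general idea of only calculating unique values can be sketched in base R. This is an illustration of the technique and not necessarily how `as.mo()` implements it. The names `apply_on_unique()` and `slow_toupper()` are hypothetical, and `slow_toupper()` is only a stand-in for an expensive lookup.

```r
# apply a (possibly slow) function only to the unique values of x,
# then map the results back to the original positions
apply_on_unique <- function(x, slow_fun) {
  unique_values <- unique(x)
  unique_results <- slow_fun(unique_values)
  unique_results[match(x, unique_values)]
}

# toy stand-in for an expensive lookup (not an AMR function);
# it counts how many values it actually had to process
calls <- 0
slow_toupper <- function(values) {
  calls <<- calls + length(values)
  toupper(values)
}

x <- rep(c("esco", "stau", "klpn"), times = 1000)
result <- apply_on_unique(x, slow_toupper)
```

Here `result` has the same length as `x`, while the stand-in function only had to process the three unique values.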
+What about precalculated results? If the input is an already precalculated result of a helper function like mo_name()
, it almost doesn’t take any time at all (see ‘C’ below):
run_it <- microbenchmark(A = mo_name("B_STPHY_AURS"), + B = mo_name("S. aureus"), + C = mo_name("Staphylococcus aureus"), + times = 10) +print(run_it, unit = "ms", signif = 3) +# Unit: milliseconds +# expr min lq mean median uq max neval +# A 5.680 5.820 9.61 6.36 6.850 39.500 10 +# B 9.790 10.000 10.60 10.40 10.900 11.900 10 +# C 0.229 0.259 0.27 0.27 0.286 0.311 10
So going from mo_name("Staphylococcus aureus")
to "Staphylococcus aureus"
takes 0.0003 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:
run_it <- microbenchmark(A = mo_species("aureus"), + B = mo_genus("Staphylococcus"), + C = mo_name("Staphylococcus aureus"), + D = mo_family("Staphylococcaceae"), + E = mo_order("Bacillales"), + F = mo_class("Bacilli"), + G = mo_phylum("Firmicutes"), + H = mo_kingdom("Bacteria"), + times = 10) +print(run_it, unit = "ms", signif = 3) +# Unit: milliseconds +# expr min lq mean median uq max neval +# A 0.209 0.221 0.236 0.225 0.244 0.311 10 +# B 0.197 0.201 0.215 0.212 0.222 0.266 10 +# C 0.205 0.224 0.243 0.229 0.242 0.383 10 +# D 0.199 0.207 0.216 0.211 0.214 0.270 10 +# E 0.196 0.206 0.218 0.215 0.221 0.270 10 +# F 0.188 0.197 0.212 0.210 0.216 0.269 10 +# G 0.195 0.198 0.213 0.203 0.215 0.299 10 +# H 0.184 0.193 0.205 0.201 0.207 0.252 10
Of course, when running mo_phylum("Firmicutes")
the function has zero knowledge about the actual microorganism, namely S. aureus. But since the result would be "Firmicutes"
anyway, there is no point in calculating the result. And because this package ‘knows’ all phyla of all known bacteria (according to the Catalogue of Life), it can just return the initial value immediately.
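A minimal sketch of such a shortcut, assuming a lookup vector of valid end results, could look as follows. The names `known_phyla`, `phylum_of()` and `expensive_lookup()` are hypothetical and only illustrate the principle; the package's own implementation may differ.

```r
# hypothetical reference vector; the real package ships a full taxonomy
known_phyla <- c("Firmicutes", "Proteobacteria", "Actinobacteria")

# hypothetical helper: skip the expensive path when every input
# is already a valid end result
phylum_of <- function(x, full_lookup) {
  if (all(x %in% known_phyla)) {
    return(x)
  }
  full_lookup(x)
}

# stand-in for the expensive path; it only signals that it was reached
expensive_lookup <- function(x) {
  stop("full lookup needed for: ", paste(x, collapse = ", "))
}

# returned immediately, the expensive path is never reached
phylum_of("Firmicutes", expensive_lookup)
```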
When the system language is non-English and supported by this AMR
package, some functions will have a translated result. This takes almost no extra time:
mo_name("CoNS", language = "en") # or just mo_name("CoNS") on an English system +# [1] "Coagulase-negative Staphylococcus (CoNS)" + +mo_name("CoNS", language = "es") # or just mo_name("CoNS") on a Spanish system +# [1] "Staphylococcus coagulasa negativo (SCN)" + +mo_name("CoNS", language = "nl") # or just mo_name("CoNS") on a Dutch system +# [1] "Coagulase-negatieve Staphylococcus (CNS)" + +run_it <- microbenchmark(en = mo_name("CoNS", language = "en"), + de = mo_name("CoNS", language = "de"), + nl = mo_name("CoNS", language = "nl"), + es = mo_name("CoNS", language = "es"), + it = mo_name("CoNS", language = "it"), + fr = mo_name("CoNS", language = "fr"), + pt = mo_name("CoNS", language = "pt"), + times = 100) +print(run_it, unit = "ms", signif = 4) +# Unit: milliseconds +# expr min lq mean median uq max neval +# en 9.303 11.59 14.90 12.40 13.63 45.92 100 +# de 10.080 12.39 15.77 13.11 14.45 46.27 100 +# nl 13.200 16.26 20.88 17.80 19.52 49.93 100 +# es 9.957 12.23 15.57 13.12 14.59 51.99 100 +# it 10.210 12.44 19.02 13.34 14.74 52.96 100 +# fr 10.040 12.40 18.90 13.26 15.07 54.40 100 +# pt 10.450 12.67 16.91 13.46 14.68 51.47 100
Currently supported are German, Dutch, Spanish, Italian, French and Portuguese.
+vignettes/resistance_predict.Rmd
+ resistance_predict.Rmd
As with many uses in R, we need some additional packages for AMR analysis. Our package works closely together with the tidyverse packages dplyr
and ggplot2
by Dr Hadley Wickham. The tidyverse tremendously improves the way we conduct data science - it allows for a very natural way of writing syntaxes and creating beautiful plots in R.
Our AMR
package depends on these packages and even extends their use and functions.
Our package contains a function resistance_predict()
, which takes the same input as functions for other AMR analysis. Based on a date column, it calculates cases per year and uses a regression model to predict antimicrobial resistance.
It is basically as easy as:
+# resistance prediction of piperacillin/tazobactam (TZP): +resistance_predict(tbl = example_isolates, col_date = "date", col_ab = "TZP", model = "binomial") + +# or: +example_isolates %>% + resistance_predict(col_ab = "TZP", + model = "binomial") + +# to bind it to object 'predict_TZP' for example: +predict_TZP <- example_isolates %>% + resistance_predict(col_ab = "TZP", + model = "binomial")
The function will look for a date column itself if col_date
is not set.
When running any of these commands, a summary of the regression model will be printed unless using resistance_predict(..., info = FALSE)
.
# NOTE: Using column `date` as input for `col_date`.
+This text is only a printed summary - the actual result (output) of the function is a data.frame
containing for each year: the number of observations, the actual observed resistance, the estimated resistance and the standard error below and above the estimation:
predict_TZP +# year value se_min se_max observations observed estimated +# 1 2002 0.20000000 NA NA 15 0.20000000 0.05616378 +# 2 2003 0.06250000 NA NA 32 0.06250000 0.06163839 +# 3 2004 0.08536585 NA NA 82 0.08536585 0.06760841 +# 4 2005 0.05000000 NA NA 60 0.05000000 0.07411100 +# 5 2006 0.05084746 NA NA 59 0.05084746 0.08118454 +# 6 2007 0.12121212 NA NA 66 0.12121212 0.08886843 +# 7 2008 0.04166667 NA NA 72 0.04166667 0.09720264 +# 8 2009 0.01639344 NA NA 61 0.01639344 0.10622731 +# 9 2010 0.05660377 NA NA 53 0.05660377 0.11598223 +# 10 2011 0.18279570 NA NA 93 0.18279570 0.12650615 +# 11 2012 0.30769231 NA NA 65 0.30769231 0.13783610 +# 12 2013 0.06896552 NA NA 58 0.06896552 0.15000651 +# 13 2014 0.10000000 NA NA 60 0.10000000 0.16304829 +# 14 2015 0.23636364 NA NA 55 0.23636364 0.17698785 +# 15 2016 0.22619048 NA NA 84 0.22619048 0.19184597 +# 16 2017 0.16279070 NA NA 86 0.16279070 0.20763675 +# 17 2018 0.22436641 0.1938710 0.2548618 NA NA 0.22436641 +# 18 2019 0.24203228 0.2062911 0.2777735 NA NA 0.24203228 +# 19 2020 0.26062172 0.2191758 0.3020676 NA NA 0.26062172 +# 20 2021 0.28011130 0.2325557 0.3276669 NA NA 0.28011130 +# 21 2022 0.30046606 0.2464567 0.3544755 NA NA 0.30046606 +# 22 2023 0.32163907 0.2609011 0.3823771 NA NA 0.32163907 +# 23 2024 0.34357130 0.2759081 0.4112345 NA NA 0.34357130 +# 24 2025 0.36619175 0.2914934 0.4408901 NA NA 0.36619175 +# 25 2026 0.38941799 0.3076686 0.4711674 NA NA 0.38941799 +# 26 2027 0.41315710 0.3244399 0.5018743 NA NA 0.41315710 +# 27 2028 0.43730688 0.3418075 0.5328063 NA NA 0.43730688 +# 28 2029 0.46175755 0.3597639 0.5637512 NA NA 0.46175755 +# 29 2030 0.48639359 0.3782932 0.5944939 NA NA 0.48639359
The function plot
is available in base R, and can be extended by other packages to depend the output based on the type of input. We extended its function to cope with resistance predictions:
plot(predict_TZP)
This is the fastest way to plot the result. It automatically adds the right axes, error bars, titles, number of available observations and type of model.
+We also support the ggplot2
package with our custom function ggplot_rsi_predict()
to create more appealing plots:
ggplot_rsi_predict(predict_TZP)
+# choose error bars instead of a ribbon +ggplot_rsi_predict(predict_TZP, ribbon = FALSE)
Resistance is not easily predicted; if we look at vancomycin resistance in Gram-positive bacteria, the spread (i.e. standard error) is enormous:
+example_isolates %>% + filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>% + resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "binomial") %>% + ggplot_rsi_predict() +# NOTE: Using column `date` as input for `col_date`.
Vancomycin resistance could be 100% in ten years, but might also stay around 0%.
+You can define the model with the model
parameter. The model chosen above is a generalised linear regression model using a binomial distribution, assuming that a period of zero resistance was followed by a period of increasing resistance leading slowly to more and more resistance.
Valid values are:
+Input values | +Function used by R | +Type of model | +
---|---|---|
+"binomial" or "binom" or "logit" | +glm(..., family = binomial) | +Generalised linear model with binomial distribution | +
+"loglin" or "poisson" | +glm(..., family = poisson) | +Generalised linear model with Poisson distribution | +
+"lin" or "linear" | +lm() | +Linear model | +
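To make the three model types more concrete, the sketch below fits them with base R on made-up yearly counts. The data, the object names and the exact model formulas are assumptions for illustration; the formulas that `resistance_predict()` uses internally may differ.

```r
# made-up yearly counts of resistant (R) and susceptible (S) isolates,
# for illustration only
yearly <- data.frame(
  year = 2010:2017,
  R = c(3, 4, 6, 7, 10, 12, 15, 18),
  S = c(97, 96, 94, 93, 90, 88, 85, 82)
)

# binomial: models the proportion of resistant isolates per year
binomial_model <- glm(cbind(R, S) ~ year, family = binomial, data = yearly)

# Poisson: models the count of resistant isolates per year
poisson_model <- glm(R ~ year, family = poisson, data = yearly)

# linear: models the observed proportion with ordinary least squares
linear_model <- lm(R / (R + S) ~ year, data = yearly)

# predict the proportion of resistance for three future years
new_years <- data.frame(year = 2018:2020)
predicted_binomial <- predict(binomial_model, newdata = new_years, type = "response")
predicted_linear <- predict(linear_model, newdata = new_years)
```

A binomial model keeps predictions between 0 and 1, whereas a linear model can in principle predict values outside that range.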
For the vancomycin resistance in Gram-positive bacteria, a linear model might be more appropriate since no binomial distribution is to be expected based on the observed years:
+example_isolates %>% + filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>% + resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>% + ggplot_rsi_predict() +# NOTE: Using column `date` as input for `col_date`.
This seems more likely, doesn’t it?
+The model itself is also available from the object, as an attribute
:
model <- attributes(predict_TZP)$model + +summary(model)$family +# +# Family: binomial +# Link function: logit + +summary(model)$coefficients +# Estimate Std. Error z value Pr(>|z|) +# (Intercept) -200.67944891 46.17315349 -4.346237 1.384932e-05 +# year 0.09883005 0.02295317 4.305725 1.664395e-05
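The printed coefficients can be read as follows. Since the model uses a logit link, the estimated proportion for a given year is the inverse logit of the linear predictor. The sketch below copies the two coefficients from the output above; the variable names are chosen for this example only.

```r
# coefficients as printed in the model summary above
intercept <- -200.67944891
year_coef <- 0.09883005

# each extra year multiplies the odds of resistance by exp(year_coef)
odds_ratio_per_year <- exp(year_coef)

# the inverse logit of the linear predictor gives the estimated proportion
estimated_2018 <- plogis(intercept + year_coef * 2018)
estimated_2018  # about 0.224, in line with the 2018 row of predict_TZP
```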
inst/CITATION
+ Berends MS, Luz CF et al. (2019). AMR - An R Package for Working with Antimicrobial Resistance Data. bioRxiv, https://doi.org/10.1101/810622
+@Article{, + title = {AMR - An R Package for Working with Antimicrobial Resistance Data}, + author = {M S Berends and C F Luz and A W Friedrich and B N M Sinha and C J Albers and C Glasner}, + journal = {bioRxiv}, + publisher = {Cold Spring Harbor Laboratory}, + year = {2019}, + url = {https://doi.org/10.1101/810622}, +}+ +
Matthijs S. Berends. Author, maintainer. +
+Christian F. Luz. Author, contributor. +
+Alexander W. Friedrich. Author, thesis advisor. +
+Bhanu N. M. Sinha. Author, thesis advisor. +
+Casper J. Albers. Author, thesis advisor. +
+Corinna Glasner. Author, thesis advisor. +
+Judith M. Fonville. Contributor. +
+Erwin E. A. Hassing. Contributor. +
+Eric H. L. C. M. Hazenberg. Contributor. +
+Annick Lenglet. Contributor. +
+Bart C. Meijer. Contributor. +
+Sofia Ny. Contributor. +
+Dennis Souverein. Contributor. +
+AMR
(for R). Developed at the University of Groningen in collaboration with non-profit organisations Certe Medical Diagnostics and Advice and University Medical Center Groningen.
++METHODS PAPER PREPRINTED
+
+A methods paper about this package has been preprinted at bioRxiv (DOI: 10.1101/810622). Please click here for the paper on bioRxiv’s publishers page.
AMR
(for R)?(To find out how to conduct AMR analysis, please continue reading here to get started.)
+AMR
is a free, open-source and independent R package to simplify the analysis and prediction of Antimicrobial Resistance (AMR) and to work with microbial and antimicrobial data and properties, by using evidence-based methods. Our aim is to provide a standard for clean and reproducible antimicrobial resistance data analysis, that can therefore empower epidemiological analyses to continuously enable surveillance and treatment evaluation in any setting.
After installing this package, R knows ~70,000 distinct microbial species and all ~550 antibiotic, antimycotic and antiviral drugs by name and code (including ATC, EARS-NET, LOINC and SNOMED CT), and knows all about valid R/SI and MIC values. It supports any data format, including WHONET/EARS-Net data.
+This package is fully independent of any other R package and works on Windows, macOS and Linux with all versions of R since R-3.0.0 (April 2013). It was designed to work in any setting, including those with very limited resources. It was created for both routine data analysis and academic research at the Faculty of Medical Sciences of the University of Groningen, in collaboration with non-profit organisations Certe Medical Diagnostics and Advice and University Medical Center Groningen. This R package is actively maintained and is free software (see Copyright).
+
+ Used in more than 100 countries
Since its first public release in early 2018, this package has been downloaded from more than 100 countries (source: CRAN logs).
+
This package can be used for:
+This package is available here on the official R network (CRAN), which has a peer-reviewed submission process. Install this package in R from CRAN by using the command:
+install.packages("AMR")
It will be downloaded and installed automatically. For RStudio, click on the menu Tools > Install Packages… and then type in “AMR” and press Install.
+Note: Not all functions on this website may be available in this latest release. To use all functions and data sets mentioned on this website, install the latest development version.
+The latest and unpublished development version can be installed from GitHub using:
+install.packages("remotes") +remotes::install_github("msberends/AMR")
To find out how to conduct AMR analysis, please continue reading here to get started or click the links in the ‘How to’ menu.
+This package contains the complete taxonomic tree of almost all ~70,000 microorganisms from the authoritative and comprehensive Catalogue of Life (CoL, www.catalogueoflife.org), supplemented by data from the List of Prokaryotic names with Standing in Nomenclature (LPSN, lpsn.dsmz.de). This supplementation is needed until the CoL+ project is finished, which we await. Use catalogue_of_life_version()
to check which version of the CoL is included in this package.
Read more about the data from the Catalogue of Life in our manual.
+This package contains all ~550 antibiotic, antimycotic and antiviral drugs and their Anatomical Therapeutic Chemical (ATC) codes, ATC groups and Defined Daily Dose (DDD, oral and IV) from the World Health Organization Collaborating Centre for Drug Statistics Methodology (WHOCC, https://www.whocc.no) and the Pharmaceuticals Community Register of the European Commission.
+NOTE: The WHOCC copyright does not allow use for commercial purposes, unlike any other info from this package. See https://www.whocc.no/copyright_disclaimer/.
+Read more about the data from WHOCC in our manual.
+We support WHONET and EARS-Net data. Exported files from WHONET can be imported into R and can be analysed easily using this package. For education purposes, we created an example data set WHONET
with the exact same structure as a WHONET export file. Furthermore, this package also contains a data set antibiotics with all EARS-Net antibiotic abbreviations, and knows almost all WHONET abbreviations for microorganisms. When using WHONET data as input for analysis, all input parameters will be set automatically.
Read our tutorial about how to work with WHONET data here.
+The AMR
package basically does four important things:
It cleanses existing data by providing new classes for microorganisms, antibiotics and antimicrobial results (both S/I/R and MIC). By installing this package, you teach R everything about microbiology that is needed for analysis. These functions all use intelligent rules to guess results that you would expect:
+as.mo()
to get a microbial ID. The IDs are human readable for the trained eye - the ID of Klebsiella pneumoniae is “B_KLBSL_PNMN” (B stands for Bacteria) and the ID of S. aureus is “B_STPHY_AURS”. The function takes almost any text as input that looks like the name or code of a microorganism like “E. coli”, “esco” or “esccol” and tries to find expected results using intelligent rules combined with the included Catalogue of Life data set. It only takes milliseconds to find results, please see our benchmarks. Moreover, it can group Staphylococci into coagulase negative and positive (CoNS and CoPS, see source) and can categorise Streptococci into Lancefield groups (like beta-haemolytic Streptococcus Group B, source).as.ab()
to get an antibiotic ID. Like microbial IDs, these IDs are also human readable based on those used by EARS-Net. For example, the ID of amoxicillin is AMX
and the ID of gentamicin is GEN
. The as.ab()
function also uses intelligent rules to find results like accepting misspelling, trade names and abbreviations used in many laboratory systems. For instance, the values “Furabid”, “Furadantin”, “nitro” all return the ID of nitrofurantoin. To accomplish this, the package contains a database with most LIS codes, official names, trade names, ATC codes, defined daily doses (DDD) and drug categories of antibiotics.
as.rsi()
to get antibiotic interpretations based on raw MIC values (in mg/L) or disk diffusion values (in mm), or transform existing values to valid antimicrobial results. It produces just S, I or R based on your input and warns about invalid values. Even values like “<=0.002; S” (combined MIC/RSI) will result in “S”.as.mic()
to cleanse your MIC values. It produces a so-called factor (called ordinal in SPSS) with valid MIC values as levels. A value like “<=0.002; S” (combined MIC/RSI) will result in “<=0.002”.It enhances existing data and adds new data from data sets included in this package.
+eucast_rules()
to apply EUCAST expert rules to isolates (not the translation from MIC to R/SI values, use as.rsi()
for that).first_isolate()
to identify the first isolates of every patient using guidelines from the CLSI (Clinical and Laboratory Standards Institute).
+mdro()
to determine which micro-organisms are multi-drug resistant organisms (MDRO). It supports a variety of international guidelines, such as the MDR-paper by Magiorakos et al. (2012, PMID 21793988), the exceptional phenotype definitions of EUCAST and the WHO guideline on multi-drug resistant TB. It also supports the national guidelines of the Netherlands and Germany.mo_genus()
, mo_family()
, mo_gramstain()
or even mo_phylum()
. Use mo_snomed()
to look up any SNOMED CT code associated with a microorganism. As all these functions use as.mo()
internally, they also use the same intelligent rules for determination. For example, mo_genus("MRSA")
and mo_genus("S. aureus")
will both return "Staphylococcus"
. They also come with support for German, Dutch, Spanish, Italian, French and Portuguese. These functions can be used to add new variables to your data.ab_name()
, ab_group()
, ab_atc()
, ab_loinc()
and ab_tradenames()
to look up values. The ab_*
functions use as.ab()
internally so they support the same intelligent rules to guess the most probable result. For example, ab_name("Fluclox")
, ab_name("Floxapen")
and ab_name("J01CF05")
will all return "Flucloxacillin"
. These functions can again be used to add new variables to your data.It analyses the data with convenient functions that use well-known methods.
+susceptibility()
and resistance()
functions, or be even more specific with the proportion_R()
, proportion_IR()
, proportion_I()
, proportion_SI()
and proportion_S()
functions. Similarly, the number of isolates can be determined with the count_resistant()
, count_susceptible()
and count_all()
functions. All these functions can be used with the dplyr
package (e.g. in conjunction with summarise()
)geom_rsi()
, a function made for the ggplot2
packageresistance_predict()
functionIt teaches the user how to use all the above actions.
+example_isolates
data set. This data set contains 2,000 microbial isolates with their full antibiograms. It reflects reality and can be used to practice AMR analysis.WHONET
data set. This data set only contains fake data, but with the exact same structure as files exported by WHONET. Read more about WHONET on its tutorial page.This R package is free, open-source software and licensed under the GNU General Public License v2.0 (GPL-2). In a nutshell, this means that this package:
+May be used for commercial purposes
May be used for private purposes
May not be used for patent purposes
May be modified, although:
+May be distributed, although:
+Comes with a LIMITATION of liability
Comes with NO warranty
NEWS.md
+ Function ab_from_text()
to retrieve antimicrobial drug names, doses and forms of administration from clinical texts in e.g. health care records, which also corrects for misspelling since it uses as.ab()
internally
Tidyverse selections for antibiotic classes, that help to select the columns of antibiotics that are of a specific antibiotic class, without the need to define the columns or antibiotic abbreviations. They can be used in any function that allows Tidyverse selections, like dplyr::select()
and tidyr::pivot_longer()
:
library(dplyr) + +# Columns 'IPM' and 'MEM' are in the example_isolates data set +example_isolates %>% + select(carbapenems()) +#> Selecting carbapenems: `IPM` (imipenem), `MEM` (meropenem)
Added mo_domain()
as an alias to mo_kingdom()
Added function filter_penicillins()
to filter isolates on a specific result in any column with a name in the antimicrobial ‘penicillins’ class (more specific: ATC subgroup Beta-lactam antibacterials, penicillins)
Added official antimicrobial names to all filter_ab_class()
functions, such as filter_aminoglycosides()
Added antibiotics code “FOX1” for cefoxitin screening (abbreviation “cfsc”) to the antibiotics
data set
Added Monuril as trade name for fosfomycin
susceptibility()
and resistance()
and all count_*()
, proportion_*()
functions:
+dplyr::all_of()
) now works again
as.ab()
:
+as.ab()
, making many more input errors translatable, such as digitalised health care records, using too few or too many vowels or consonants and many more
as.ab()
would return an error on invalid input values
as.ab()
function will now throw a note if more than 1 antimicrobial drug could be retrieved from a single input value.
eucast_rules()
would not work on a tibble when the tibble
or dplyr
package was loaded
*_join_microorganisms()
functions and bug_drug_combinations()
now return the original data class (e.g. tibble
s and data.table
s)
rsi_df()
, proportion_df()
and count_df()
, and fixed a bug where not all different antimicrobial results were added as rows
<mo>
and <Date>
+bug_drug_combinations()
for when only one antibiotic was in the input data
<mo>
, to highlight the %SI vs. %R
Removed code dependency on all other R packages, making this package fully independent of the development process of others. This is a major code change, but will probably not be noticeable by most users.
+Making this package independent of especially the tidyverse (e.g. packages dplyr
and tidyr
) tremendously increases sustainability in the long term, since tidyverse functions change quite often. That is good for users, but hard for package maintainers. Most of our functions have been replaced with versions that rely only on base R, which keeps this package fully functional for many years to come without requiring a lot of maintenance to keep up with other packages. Another upside is that this package can now be used with all versions of R since R-3.0.0 (April 2013). Our package is being used in settings where resources are very limited, and fewer dependencies on newer software are helpful for such settings.
Negative effects of this change are:
+freq()
that was borrowed from the cleaner
package was removed. Use cleaner::freq()
, or run library("cleaner")
before you use freq()
.
mo
or rsi
in a tibble will no longer be in colour and printing rsi
in a tibble will show the class <ord>
, not <rsi>
anymore. This is purely a visual effect.
mo_*
family (like mo_name()
and mo_gramstain()
) are noticeably slower when running on hundreds of thousands of rows.
mo
and ab
now both also inherit class character
, to support any data transformation. This change invalidates code that checks for class length == 1.
first_isolate()
), since some bacterial names might be renamed to other genera or other (sub)species. This is expected behaviour.
eucast_rules()
function no longer applies “other” rules at default that are made available by this package (like setting ampicillin = R when ampicillin + enzyme inhibitor = R). The default input value for rules
is now c("breakpoints", "expert")
instead of "all"
, but this can be changed by the user. To return to the old behaviour, set options(AMR.eucast_rules = "all")
.
antibiotics
data set these two rules:
+eucast_rules()
+ab_url()
to return the direct URL of an antimicrobial agent from the official WHO website
as.ab()
, so that e.g. as.ab("ampi sul")
and ab_name("ampi sul")
work
ab_atc()
and ab_group()
now return NA
if no antimicrobial agent could be found
set_mo_source()
to make sure that column mo
will always be the second column
p.symbol()
- it was replaced with p_symbol()
+read.4d()
, that was only useful for reading data from an old test database.
pca()
function
ggplot_pca()
function
as.mo()
(and consequently all mo_*
functions, that use as.mo()
internally):
+SPE
for species, like "ESCSPE"
for Escherichia coli
+antibiotics
data set
as.rsi()
for years 2010-2019 (thanks to Anthony Underwood)
Fixed important floating point error for some MIC comparisons in EUCAST 2020 guideline
Interpretation from MIC values (and disk zones) to R/SI can now be used with mutate_at()
of the dplyr
package:
Added antibiotic abbreviations for a laboratory manufacturer (GLIMS) for cefuroxime, cefotaxime, ceftazidime, cefepime, cefoxitin and trimethoprim/sulfamethoxazole
Added uti
(as abbreviation of urinary tract infections) as parameter to as.rsi()
, so interpretation of MIC values and disk zones can be made dependent on isolates specifically from UTIs
Info printing in functions eucast_rules()
, first_isolate()
, mdro()
and resistance_predict()
will now at default only print when R is in an interactive mode (i.e. not in RMarkdown)
This software is now out of beta and considered stable. Nonetheless, this package will be developed continually.
+as.rsi()
and inferred resistance and susceptibility using eucast_rules()
.
Support for LOINC codes in the antibiotics
data set. Use ab_loinc()
to retrieve LOINC codes, or use a LOINC code for input in any ab_*
function:
Support for SNOMED CT codes in the microorganisms
data set. Use mo_snomed()
to retrieve SNOMED codes, or use a SNOMED code for input in any mo_*
function:
mo_snomed("S. aureus") +#> [1] 115329001 3092008 113961008 +mo_name(115329001) +#> [1] "Staphylococcus aureus" +mo_gramstain(115329001) +#> [1] "Gram-positive"
as.mo()
function previously wrote to the package folder to improve calculation speed for previously calculated results. This is no longer the case, to comply with CRAN policies. Consequently, the function clear_mo_history()
was removed.
as.rsi()
+as.mo()
(and consequently all mo_*
functions, that use as.mo()
internally):
+as.mo("Methicillin-resistant S.aureus")
+as.disk()
limited to a maximum of 50 millimeters
tidyverse
+as.ab()
: support for drugs starting with “co-” like co-amoxiclav, co-trimoxazole, co-trimazine and co-trimazole (thanks to Peter Dutey)
antibiotics
data set (thanks to Peter Dutey):
+RIF
) to rifampicin/isoniazid (RFI
). Please note that the combination rifampicin/isoniazid has no DDDs defined, so e.g. ab_ddd("Rimactazid")
will now return NA
.
SMX
) to trimethoprim/sulfamethoxazole (SXT
)
microorganisms
data set, which means that the new order Enterobacterales now consists of a part of the existing family Enterobacteriaceae, but that this family has been split into other families as well (like Morganellaceae and Yersiniaceae). Although published in 2016, this information is not yet in the Catalogue of Life version of 2019. All MDRO determinations with mdro()
will now use the Enterobacterales order for all guidelines before 2016 that were dependent on the Enterobacteriaceae family.
+
+Functions susceptibility()
and resistance()
as aliases of proportion_SI()
and proportion_R()
, respectively. These functions were added to make it more clear that “I” should be considered susceptible and not resistant.
library(dplyr) +example_isolates %>% + group_by(bug = mo_name(mo)) %>% + summarise(amoxicillin = resistance(AMX), + amox_clav = resistance(AMC)) %>% + filter(!is.na(amoxicillin) | !is.na(amox_clav))
Support for a new MDRO guideline: Magiorakos AP, Srinivasan A et al. “Multidrug-resistant, extensively drug-resistant and pandrug-resistant bacteria: an international expert proposal for interim standard definitions for acquired resistance.” Clinical Microbiology and Infection (2012).
+mdro()
function
mdro(...., verbose = TRUE)
) returns an informative data set where the reason for MDRO determination is given for every isolate, and a list of the resistant antimicrobial agents
Data set antivirals
, containing all entries from the ATC J05 group with their DDDs for oral and parenteral treatment
as.mo()
:
+Now allows “ou” where “au” should have been used and vice versa
More intelligent way of coping with some consonants like “l” and “r”
Added a score (a certainty percentage) to mo_uncertainties()
, that is calculated using the Levenshtein distance:
as.mo(c("Stafylococcus aureus", + "staphylokok aureuz")) +#> Warning: +#> Results of two values were guessed with uncertainty. Use mo_uncertainties() to review them. +#> Class 'mo' +#> [1] B_STPHY_AURS B_STPHY_AURS + +mo_uncertainties() +#> "Stafylococcus aureus" -> Staphylococcus aureus (B_STPHY_AURS, score: 95.2%) +#> "staphylokok aureuz" -> Staphylococcus aureus (B_STPHY_AURS, score: 85.7%)
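The source does not say how the certainty percentage is derived from the Levenshtein distance. As a rough sketch only, the base R function `utils::adist()` computes that distance, and dividing it by twice the length of the matched name happens to reproduce the two scores shown above. The helper `similarity_score()` and its normalisation are assumptions for illustration, not the package's documented method.

```r
# Hypothetical helper, not part of the AMR package.
# Assumption: score = 1 - distance / (2 * length of the matched name).
similarity_score <- function(input, name) {
  # Generalised Levenshtein distance, ignoring case
  d <- utils::adist(input, name, ignore.case = TRUE)[1, 1]
  1 - d / (2 * nchar(name))
}

round(100 * similarity_score("Stafylococcus aureus", "Staphylococcus aureus"), 1)
#> [1] 95.2
round(100 * similarity_score("staphylokok aureuz", "Staphylococcus aureus"), 1)
#> [1] 85.7
```

A distance-based score like this only ranks how far the input is from a candidate name; it says nothing about whether a closer candidate exists elsewhere in the taxonomy.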
as.atc()
- this function was replaced by ab_atc()
+portion_*
functions to proportion_*
. All portion_*
functions are still available as deprecated functions, and will return a warning when used.
as.rsi()
over a data set, it will now print the guideline that will be used if it is not specified by the user
eucast_rules()
:
+eucast_rules()
are now applied first instead of last. This is to improve the dependency on certain antibiotics for the official EUCAST rules. Please see ?eucast_rules
.
as.rsi()
where the input is NA
+mdro()
and eucast_rules()
+antibiotics
data set
example_isolates
data set to better reflect reality
mo_info()
+clean
to cleaner
, as this package was renamed accordingly upon CRAN request
Determination of first isolates now excludes all ‘unknown’ microorganisms at default, i.e. microbial code "UNKNOWN"
. They can be included with the new parameter include_unknown
:
first_isolate(..., include_unknown = TRUE)
For WHONET users, this means that all records/isolates with organism code "con"
(contamination) will be excluded at default, since as.mo("con") = "UNKNOWN"
. The function always shows a note with the number of ‘unknown’ microorganisms that were included or excluded.
For code consistency, classes ab
and mo
will now be preserved in any subsetting or assignment. For the sake of data integrity, this means that invalid assignments will now result in NA
:
# how it works in base R: +x <- factor("A") +x[1] <- "B" +#> Warning message: +#> invalid factor level, NA generated + +# how it now works similarly for classes 'mo' and 'ab': +x <- as.mo("E. coli") +x[1] <- "testvalue" +#> Warning message: +#> invalid microorganism code, NA generated
This is important, because a value like "testvalue"
could never be understood by e.g. mo_name()
, although the class would suggest a valid microbial code.
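How such class-preserving assignment can be built in base R is sketched below with an S3 replacement method. The class name `code`, the constructor `as_code()` and the two-element lookup vector are hypothetical and only stand in for the package's own classes; the package's actual implementation is not shown in the source.

```r
# Toy S3 class (hypothetical): a character vector that only accepts
# values from a fixed set of valid codes.
valid_codes <- c("B_ESCHR_COL", "B_STPHY_AURS")

as_code <- function(x) {
  x <- as.character(x)
  x[!x %in% valid_codes] <- NA_character_
  structure(x, class = c("code", "character"))
}

# Replacement method: invalid new values become NA with a warning,
# and the class is kept after the assignment.
`[<-.code` <- function(x, i, value) {
  value <- as.character(value)
  invalid <- !is.na(value) & !value %in% valid_codes
  if (any(invalid)) {
    warning("invalid code, NA generated", call. = FALSE)
    value[invalid] <- NA_character_
  }
  y <- unclass(x)
  y[i] <- value
  structure(y, class = class(x))
}

x <- as_code("B_ESCHR_COL")
x[1] <- "testvalue"  # warns, and x[1] becomes NA
inherits(x, "code")  # the class survives the assignment
```

Doing the validation inside the replacement method means every `x[i] <- value` goes through the same check, so an object of this class can never hold a value outside the lookup vector.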
Function freq()
has moved to a new package, clean
(CRAN link), since creating frequency tables actually does not fit the scope of this package. The freq()
function still works, since it is re-exported from the clean
package (which will be installed automatically upon updating this AMR
package).
Renamed data set septic_patients
to example_isolates
Function bug_drug_combinations()
to quickly get a data.frame
with the results of all bug-drug combinations in a data set. The column containing microorganism codes is guessed automatically and its input is transformed with mo_shortname()
at default:
x <- bug_drug_combinations(example_isolates) +#> NOTE: Using column `mo` as input for `col_mo`. +x[1:4, ] +#> mo ab S I R total +#> 1 A. baumannii AMC 0 0 3 3 +#> 2 A. baumannii AMK 0 0 0 0 +#> 3 A. baumannii AMP 0 0 3 3 +#> 4 A. baumannii AMX 0 0 3 3 +#> NOTE: Use 'format()' on this result to get a publicable/printable format. + +# change the transformation with the FUN argument to anything you like: +x <- bug_drug_combinations(example_isolates, FUN = mo_gramstain) +#> NOTE: Using column `mo` as input for `col_mo`. +x[1:4, ] +#> mo ab S I R total +#> 1 Gram-negative AMC 469 89 174 732 +#> 2 Gram-negative AMK 251 0 2 253 +#> 3 Gram-negative AMP 227 0 405 632 +#> 4 Gram-negative AMX 227 0 405 632 +#> NOTE: Use 'format()' on this result to get a publicable/printable format.
You can format this to a printable format, ready for reporting or exporting to e.g. Excel with the base R format()
function:
format(x, combine_IR = FALSE)
Additional way to calculate co-resistance, i.e. when using multiple antimicrobials as input for portion_*
functions or count_*
functions. This can be used to determine the empiric susceptibility of a combination therapy. A new parameter only_all_tested
(which defaults to FALSE
) replaces the old also_single_tested
and can be used to select one of the two methods to count isolates and calculate portions. The difference can be seen in this example table (which is also on the portion
and count
help pages), where the %SI is being determined:
# -------------------------------------------------------------------- +# only_all_tested = FALSE only_all_tested = TRUE +# ----------------------- ----------------------- +# Drug A Drug B include as include as include as include as +# numerator denominator numerator denominator +# -------- -------- ---------- ----------- ---------- ----------- +# S or I S or I X X X X +# R S or I X X X X +# <NA> S or I X X - - +# S or I R X X X X +# R R - X - X +# <NA> R - - - - +# S or I <NA> X X - - +# R <NA> - - - - +# <NA> <NA> - - - - +# --------------------------------------------------------------------
Since this is a major change, usage of the old also_single_tested
will throw an informative error that it has been replaced by only_all_tested
.
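The two counting methods in the table can be reproduced with a few lines of base R. This is a sketch under assumptions: the function `count_si_combination()` and the nine example isolates (one per table row) are made up for illustration and are not the package's implementation.

```r
# One hypothetical isolate per row of the table above (NA = not tested)
drug_a <- c("S", "R", NA,  "S", "R", NA,  "S", "R", NA)
drug_b <- c("S", "S", "S", "R", "R", "R", NA,  NA,  NA)

count_si_combination <- function(a, b, only_all_tested = FALSE) {
  si_a <- a %in% c("S", "I")  # NA %in% ... is FALSE
  si_b <- b %in% c("S", "I")
  both_tested <- !is.na(a) & !is.na(b)
  if (only_all_tested) {
    keep <- both_tested                # both drugs must have a result
  } else {
    keep <- si_a | si_b | both_tested  # one S/I result is enough
  }
  c(numerator = sum((si_a | si_b) & keep), denominator = sum(keep))
}

count_si_combination(drug_a, drug_b, only_all_tested = FALSE)
#>   numerator denominator
#>           5           6
count_si_combination(drug_a, drug_b, only_all_tested = TRUE)
#>   numerator denominator
#>           3           4
```

With these nine isolates the %SI is 5/6 under `only_all_tested = FALSE` and 3/4 under `only_all_tested = TRUE`, which shows how much the choice can move the result when many isolates were tested for only one of the drugs.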
tibble
printing support for classes rsi
, mic
, disk
, ab
and mo
. When using tibble
s containing antimicrobial columns, values S
will print in green, values I
will print in yellow and values R
will print in red. Microbial IDs (class mo
) will emphasise the genus and species, not the kingdom.
as.mo()
(of which some led to additions to the microorganisms
data set). Many thanks to all contributors who helped improve the algorithms.
+B_ENTRC_FAE
could have been both E. faecalis and E. faecium. Its new code is B_ENTRC_FCLS
and E. faecium has become B_ENTRC_FACM
. Also, the Latin character æ (ae) is now preserved at the start of each genus and species abbreviation. For example, the old code for Aerococcus urinae was B_ARCCC_NAE
. This is now B_AERCC_URIN
. IMPORTANT: Old microorganism IDs are still supported, but support will be dropped in a future version. Use as.mo()
on your old codes to transform them to the new format. Using functions from the mo_*
family (like mo_name()
and mo_gramstain()
) on old codes will throw a warning.
as.ab()
, including bidirectional language support
mdro()
function, to determine multi-drug resistant organisms
eucast_rules()
:
+eucast_rules(..., verbose = TRUE)
) returns more informative and readable output
AMR:::get_column_abx()
)
atc
- using as.atc()
is now deprecated in favour of ab_atc()
and this will return a character, not the atc
class anymore
abname()
, ab_official()
, atc_name()
, atc_official()
, atc_property()
, atc_tradenames()
, atc_trivial_nl()
+mo_shortname()
+mo_*
functions where the coercion uncertainties and failures would not be available through mo_uncertainties()
and mo_failures()
anymore
country
parameter of mdro()
in favour of the already existing guideline
parameter to support multiple guidelines within one country
name
of RIF
is now Rifampicin instead of Rifampin
antibiotics
data set is now sorted by name and all cephalosporins now have their generation between brackets
guess_ab_col()
which is now 30 times faster for antibiotic abbreviations
filter_ab_class()
to be more reliable and to support 5th generation cephalosporins
availability()
now uses portion_R()
instead of portion_IR()
, to comply with EUCAST insights
age()
and age_groups()
now have a na.rm
parameter to remove empty values
p.symbol()
to p_symbol()
(the former is now deprecated and will be removed in a future version)
x
in age_groups()
will now introduce NA
s and not return an error anymore
key_antibiotics()
on foreign systems
mdr_tb()
+as.mic()
)
Function rsi_df()
to transform a data.frame
to a data set containing only the microbial interpretation (S, I, R), the antibiotic, the percentage of S/I/R and the number of available isolates. This is a convenient combination of the existing functions count_df()
and portion_df()
to immediately show resistance percentages and number of available isolates:
Support for all scientifically published pathotypes of E. coli to date (that we could find). Supported are:
+All these lead to the microbial ID of E. coli:
+as.mo("UPEC") +# B_ESCHR_COL +mo_name("UPEC") +# "Escherichia coli" +mo_gramstain("EHEC") +# "Gram-negative"
Function mo_info()
as an analogy to ab_info()
. The mo_info()
prints a list with the full taxonomy, authors, and the URL to the online database of a microorganism
Function mo_synonyms()
to get all previously accepted taxonomic names of a microorganism
count_df()
and portion_df()
are now lowercase
as.ab()
and as.mo()
to understand even more severely misspelled input
as.ab()
now allows spaces for coercing antibiotics names
ggplot2
methods for automatically determining the scale type of classes mo
and ab
+"bacteria"
from getting coerced by as.ab()
because Bacterial is a brand name of trimethoprim (TMP)
eucast_rules()
and mdro()
+latest_annual_release
from the catalogue_of_life_version()
function
PVM1
from the antibiotics
data set as this was a duplicate of PME
+as.mo()
+plot()
and barplot()
for MIC and RSI classes
as.mo()
+as.rsi()
on an MIC value (created with as.mic()
), a disk diffusion value (created with the new as.disk()
) or on a complete data set containing columns with MIC or disk diffusion values.
mo_name()
as alias of mo_fullname()
+mdr_tb()
) and added a new vignette about MDR. Read this tutorial here on our website.
Fixed a critical bug in first_isolate()
where missing species would lead to incorrect FALSEs. This bug was not present in AMR v0.5.0, but was in v0.6.0 and v0.6.1.
Fixed a bug in eucast_rules()
where antibiotics from WHONET software would not be recognised
Completely reworked the antibiotics
data set:
All entries now have 3 different identifiers:
+ab
contains a human readable EARS-Net code, used by ECDC and WHO/WHONET - this is the primary identifier used in this packageatc
contains the ATC code, used by WHO/WHOCCcid
contains the CID code (Compound ID), used by PubChem
Based on the Compound ID, almost 5,000 official brand names have been added from many different countries
All references to antibiotics in our package now use EARS-Net codes, like AMX
for amoxicillin
Functions atc_certe
, ab_umcg
and atc_trivial_nl
have been removed
All atc_*
functions are superseded by ab_*
functions
All output will be translated by using an included translation file which can be viewed here.
+Please create an issue in one of our repositories if you want additions in this file.
+Improvements to plotting AMR results with ggplot_rsi()
:
colours
to set the bar colourstitle
, subtitle
, caption
, x.title
and y.title
to set titles and axis descriptions
Improved intelligence of looking up antibiotic columns in a data set using guess_ab_col()
Added ~5,000 more old taxonomic names to the microorganisms.old
data set, which leads to finding better results when using the as.mo()
function
This package now honours the new EUCAST insight (2019) that S and I are both classified as susceptible, where I is defined as ‘increased exposure’ and not ‘intermediate’ anymore. For functions like portion_df()
and count_df()
this means that their new parameter combine_SI
is TRUE at default. Our plotting function ggplot_rsi()
also reflects this change since it uses count_df()
internally.
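The effect of combining S and I can be illustrated with plain base R on a made-up vector of results. The helper `count_results()` is hypothetical and only mimics the idea behind the combine_SI parameter; it is not the package's count_df().

```r
# Hypothetical interpretations for eight isolates (NA = not tested)
results <- c("S", "S", "I", "R", "S", "I", "R", NA)

count_results <- function(x, combine_SI = TRUE) {
  x <- x[!is.na(x)]                                   # drop untested isolates
  if (combine_SI) x[x %in% c("S", "I")] <- "SI"       # treat I as susceptible
  table(x)
}

count_results(results, combine_SI = TRUE)   # R: 2, SI: 5
count_results(results, combine_SI = FALSE)  # I: 2, R: 2, S: 3
```

In this toy data the susceptible share rises from 3/7 to 5/7 once I is counted with S, so reports produced before and after this change are not directly comparable.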
The age()
function gained a new parameter exact
to determine ages with decimals
Removed deprecated functions guess_mo()
, guess_atc()
, EUCAST_rules()
, interpretive_reading()
, rsi()
Frequency tables (freq()
):
speed improvement for microbial IDs
fixed factor level names for R Markdown
when all values are unique it now shows a message instead of a warning
support for boxplots:
+ +Removed all hardcoded EUCAST rules and replaced them with a new reference file which can be viewed here.
+Please create an issue in one of our repositories if you want changes in this file.
+Added ceftazidime intrinsic resistance to Streptococci
Changed default settings for age_groups()
, to let groups of fives and tens end with 100+ instead of 120+
Fix for freq()
for when all values are NA
Fix for first_isolate()
for when dates are missing
Improved speed of guess_ab_col()
Function as.mo()
now gently interprets any number of whitespace characters (like tabs) as one space
Function as.mo()
now returns UNKNOWN
for "con"
(WHONET ID of ‘contamination’) and returns NA
for "xxx"
(WHONET ID of ‘no growth’)
Small algorithm fix for as.mo()
Removed viruses from data set microorganisms.codes
and cleaned it up
Fix for mo_shortname()
where species would not be determined correctly
eucast_rules()
with verbose = TRUE
+New website!
+We’ve got a new website: https://msberends.gitlab.io/AMR (built with the great pkgdown
)
BREAKING: removed deprecated functions, parameters and references to ‘bactid’. Use as.mo()
to identify an MO code.
Catalogue of Life as a new taxonomic source for data about microorganisms, which also contains all ITIS data we used previously. The microorganisms
data set now contains:
All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria and Protozoa
All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales (covering at least all species of Aspergillus, Candida, Pneumocystis, Saccharomyces and Trichophyton)
All ~2,000 (sub)species from ~100 other relevant genera, from the kingdoms of Animalia and Plantae (like Strongyloides and Taenia)
All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed
The responsible author(s) and year of scientific publication
+This data is updated annually - check the included version with the new function catalogue_of_life_version()
.
Due to this change, some mo
codes changed (e.g. Streptococcus changed from B_STRPTC
to B_STRPT
). A translation table is used internally to support older microorganism IDs, so users will not notice this difference.
New function mo_rank()
for the taxonomic rank (genus, species, infraspecies, etc.)
New function mo_url()
to get the direct URL of a species from the Catalogue of Life
Support for data from WHONET and EARS-Net (European Antimicrobial Resistance Surveillance Network):
+first_isolate()
and eucast_rules()
, all parameters will be filled in automatically.
antibiotics
data set now contains a column ears_net
.
as.mo()
now knows all WHONET species abbreviations too, because almost 2,000 microbial abbreviations were added to the microorganisms.codes
data set.
New filters for antimicrobial classes. Use these functions to filter isolates on results in one or more antibiotics from a specific class:
+filter_aminoglycosides() +filter_carbapenems() +filter_cephalosporins() +filter_1st_cephalosporins() +filter_2nd_cephalosporins() +filter_3rd_cephalosporins() +filter_4th_cephalosporins() +filter_fluoroquinolones() +filter_glycopeptides() +filter_macrolides() +filter_tetracyclines()
The antibiotics
data set will be searched, after which the input data will be checked for column names with a value in any abbreviations, codes or official names found in the antibiotics
data set. For example:
septic_patients %>% filter_glycopeptides(result = "R") +# Filtering on glycopeptide antibacterials: any of `vanc` or `teic` is R +septic_patients %>% filter_glycopeptides(result = "R", scope = "all") +# Filtering on glycopeptide antibacterials: all of `vanc` and `teic` is R
All ab_*
functions are deprecated and replaced by atc_*
functions:
ab_property -> atc_property() +ab_name -> atc_name() +ab_official -> atc_official() +ab_trivial_nl -> atc_trivial_nl() +ab_certe -> atc_certe() +ab_umcg -> atc_umcg() +ab_tradenames -> atc_tradenames()
These functions use as.atc()
internally. The old atc_property
has been renamed atc_online_property()
. This is done for two reasons: firstly, not all ATC codes are of antibiotics (ab) but can also be of antivirals or antifungals. Secondly, the input must have class atc
or must be coercible to this class. Properties of these classes should start with the same class name, analogous to as.mo()
and e.g. mo_genus
.
New functions set_mo_source()
and get_mo_source()
to use your own predefined MO codes as input for as.mo()
and consequently all mo_*
functions
Support for the upcoming dplyr
version 0.8.0
New function guess_ab_col()
to find an antibiotic column in a table
New function mo_failures()
to review values that could not be coerced to a valid MO code, using as.mo()
. This latter function will now only show a maximum of 10 uncoerced values and will refer to mo_failures()
.
New function mo_uncertainties()
to review values that could be coerced to a valid MO code using as.mo()
, but with uncertainty.
New function mo_renamed()
to get a list of all returned values from as.mo()
that have had taxonomic renaming
New function age()
to calculate the (patient's) age in years
New function age_groups()
to split ages into custom or predefined groups (like children or elderly). This allows for easier demographic antimicrobial resistance analysis per age group.
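Splitting ages into groups can be sketched with base R's `cut()`. The split points and labels below are arbitrary examples, not the defaults or the implementation of age_groups().

```r
ages <- c(3, 17, 25, 54, 55, 80, NA)

# Hypothetical split points; intervals are closed on the left: [0,12), [12,25), ...
breaks <- c(0, 12, 25, 55, 75, Inf)
labels <- c("0-11", "12-24", "25-54", "55-74", "75+")

groups <- cut(ages, breaks = breaks, labels = labels, right = FALSE)
table(groups, useNA = "ifany")
```

Because `right = FALSE` makes each interval include its lower bound, an age of exactly 25 falls into "25-54" and an age of 55 into "55-74"; missing ages stay `NA` rather than raising an error.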
New function ggplot_rsi_predict()
as well as the base R plot()
function can now be used for resistance prediction calculated with resistance_predict()
:
x <- resistance_predict(septic_patients, col_ab = "amox") +plot(x) +ggplot_rsi_predict(x)
Functions filter_first_isolate()
and filter_first_weighted_isolate()
to shorten and speed up filtering on data sets with antimicrobial results, e.g.:
septic_patients %>% filter_first_isolate(...) +# or +filter_first_isolate(septic_patients, ...)
is equal to:
+septic_patients %>% + mutate(only_firsts = first_isolate(septic_patients, ...)) %>% + filter(only_firsts == TRUE) %>% + select(-only_firsts)
New function availability()
to check the number of available (non-empty) results in a data.frame
New vignettes about how to conduct AMR analysis, predict antimicrobial resistance, use the G-test and more. These are also available (and even easier to read) on our website: https://msberends.gitlab.io/AMR.
eucast_rules()
:
+septic_patients
now reflects these changes
eucast_rules(..., verbose = TRUE)
to get a data set with all changes per bug and drug combination.
microorganisms.oldDT
, microorganisms.prevDT
, microorganisms.unprevDT
and microorganismsDT
since they were no longer needed and only contained info already available in the microorganisms
data set
antibiotics
data set, from the Pharmaceuticals Community Register of the European Commission
atc_group1_nl
and atc_group2_nl
from the antibiotics
data set
atc_ddd()
and atc_groups()
have been renamed atc_online_ddd()
and atc_online_groups()
. The old functions are deprecated and will be removed in a future version.
guess_mo()
is now deprecated in favour of as.mo()
and will be removed in future versions
guess_atc()
is now deprecated in favour of as.atc()
and will be removed in future versions
as.mo()
:
+Now handles incorrect spelling, like i
instead of y
and f
instead of ph
:
# mo_fullname() uses as.mo() internally + +mo_fullname("Sthafilokockus aaureuz") +#> [1] "Staphylococcus aureus" + +mo_fullname("S. klossi") +#> [1] "Staphylococcus kloosii"
Uncertainty of the algorithm is now divided into four levels, 0 to 3, where the default allow_uncertain = TRUE
is equal to uncertainty level 2. Run ?as.mo
for more info about these levels.
# equal: +as.mo(..., allow_uncertain = TRUE) +as.mo(..., allow_uncertain = 2) + +# also equal: +as.mo(..., allow_uncertain = FALSE) +as.mo(..., allow_uncertain = 0)
Using as.mo(..., allow_uncertain = 3)
could lead to very unreliable results.
Implemented the latest publication of Becker et al. (2019), for categorising coagulase-negative Staphylococci
All microbial IDs that were found are now saved to a local file ~/.Rhistory_mo
. Use the new function clean_mo_history()
to delete this file, which resets the algorithms.
Incoercible results will now be considered ‘unknown’, MO code UNKNOWN
. On foreign systems, properties of these will be translated to all languages already previously supported: German, Dutch, French, Italian, Spanish and Portuguese:
mo_genus("qwerty", language = "es") +# Warning: +# one unique value (^= 100.0%) could not be coerced and is considered 'unknown': "qwerty". Use mo_failures() to review it. +#> [1] "(género desconocido)"
Fix for vector containing only empty values
Finds better results when input is in other languages
Better handling for subspecies
Better handling for Salmonellae, especially the ‘city like’ serovars like Salmonella London
Understanding of highly virulent E. coli strains like EIEC, EPEC and STEC
Uncertain results are now looked for by default - these results will be returned with an informative warning
Manual (help page) now contains more info about the algorithms
Progress bar will be shown when it takes more than 3 seconds to get results
Support for formatted console text
Console will return the percentage of uncoercible input
first_isolate()
:
+septic_patients
data set this yielded a difference of 0.15% more isolates
col_patientid
), when this parameter was left blank
col_keyantibiotics()
), when this parameter was left blank
output_logical
, the function will now always return a logical value
filter_specimen
to specimen_group
, although using filter_specimen
will still work
portion
functions, that low counts can influence the outcome and that the portion
functions may camouflage this, since they only return the portion (albeit being dependent on the minimum
parameter)
microorganisms.certe
and microorganisms.umcg
into microorganisms.codes
+mo_taxonomy()
now contains the kingdom too
is.rsi.eligible()
using the new threshold
parameter
scale_rsi_colours()
+mo
will now return the top 3 and the unique count, e.g. using summary(mo)
+rsi
and mic
+as.rsi()
:
+"HIGH S"
will return S
+freq()
function):
+Support for tidyverse quasiquotation! Now you can create frequency tables of function outcomes:
+# Determine genus of microorganisms (mo) in `septic_patients` data set: +# OLD WAY +septic_patients %>% + mutate(genus = mo_genus(mo)) %>% + freq(genus) +# NEW WAY +septic_patients %>% + freq(mo_genus(mo)) + +# Even supports grouping variables: +septic_patients %>% + group_by(gender) %>% + freq(mo_genus(mo))
Header info is now available as a list, with the header
function
The parameter header
is now set to TRUE
at default, even for markdown
Added header info for class mo
to show unique count of families, genera and species
Now honours the decimal.mark
setting, which just like format
defaults to getOption("OutDec")
The new big.mark
parameter will at default be ","
when decimal.mark = "."
and "."
otherwise
Fix for header text where all observations are NA
New parameter droplevels
to exclude empty factor levels when input is a factor
Factor levels will be in header when present in input data (maximum of 5)
Fix for using select()
on frequency tables
scale_y_percent()
now contains the limits
parameter
mdro()
, key_antibiotics()
and eucast_rules()
+resistance_predict()
function)
as.mic()
to support more values ending in (several) zeroes
%like%
, it will now return the call
count_all
to get all available isolates (that like all portion_*
and count_*
functions also supports summarise
and group_by
), the old n_rsi
is now an alias of count_all
+get_locale
to determine language for language-dependent output for some mo_*
functions. This is now the default value for their language
parameter, by which the system language will be used at default.
microorganismsDT
, microorganisms.prevDT
, microorganisms.unprevDT
and microorganisms.oldDT
to improve the speed of as.mo
. They are for reference only, since they are primarily for internal use of as.mo
.
read.4D
to read from the 4D database of the MMB department of the UMCG
mo_authors
and mo_year
to get specific values about the scientific reference of a taxonomic entry
Functions MDRO
, BRMO
, MRGN
and EUCAST_exceptional_phenotypes
were renamed to mdro
, brmo
, mrgn
and eucast_exceptional_phenotypes
EUCAST_rules
was renamed to eucast_rules
, the old function still exists as a deprecated function
Big changes to the eucast_rules
function:
rules
to specify which rules should be applied (expert rules, breakpoints, others or all)
verbose
which can be set to TRUE
to get very specific messages about which columns and rows were affected
septic_patients
now reflects these changes
pipe
for piperacillin (J01CA12), also to the mdro
function
Added column kingdom
to the microorganisms data set, and function mo_kingdom
to look up values
Tremendous speed improvement for as.mo
(and subsequently all mo_*
functions), as empty values will be ignored a priori
Fewer than 3 characters as input for as.mo
will return NA
Function as.mo
(and all mo_*
wrappers) now supports genus abbreviations with “species” attached
as.mo("E. species") # B_ESCHR +mo_fullname("E. spp.") # "Escherichia species" +as.mo("S. spp") # B_STPHY +mo_fullname("S. species") # "Staphylococcus species"
Added parameter combine_IR
(TRUE/FALSE) to functions portion_df
and count_df
, to indicate that all values of I and R must be merged into one, so the output only consists of S vs. IR (susceptible vs. non-susceptible)
Fix for portion_*(..., as_percent = TRUE)
when minimal number of isolates would not be met
Added parameter also_single_tested
for portion_*
and count_*
functions to also include cases where not all antibiotics were tested but at least one of the tested antibiotics includes the target antimicrobial interpretation, see ?portion
Using portion_*
functions now throws a warning when the total number of available isolates is below parameter minimum
Functions as.mo
, as.rsi
, as.mic
, as.atc
and freq
will not set package name as attribute anymore
Frequency tables - freq()
:
Support for grouping variables, test with:
+septic_patients %>% + group_by(hospital_id) %>% + freq(gender)
Support for (un)selecting columns:
+septic_patients %>% + freq(hospital_id) %>% + select(-count, -cum_count) # only get item, percent, cum_percent
Check for hms::is.hms
Now prints in markdown at default in non-interactive sessions
No longer adds the factor level column and sorts factors on count again
Support for class difftime
New parameter na
, to choose which character to print for empty values
New parameter header
to turn the header info off (default when markdown = TRUE
)
New parameter title
to manually set the title of the frequency table
first_isolate
now tries to find columns to use as input when parameters are left blank
Improvements for MDRO algorithm (function mdro
)
Data set septic_patients
is now a data.frame
, not a tibble anymore
Removed diacritics from all authors (columns microorganisms$ref
and microorganisms.old$ref
) to comply with CRAN policy to only allow ASCII characters
Fix for mo_property
not working properly
Fix for eucast_rules
where some Streptococci would become ceftazidime R in EUCAST rule 4.5
Support for named vectors of class mo
, useful for top_freq()
ggplot_rsi
and scale_y_percent
have breaks
parameter
AI improvements for as.mo
:
"CRS"
-> Stenotrophomonas maltophilia
+"CRSM"
-> Stenotrophomonas maltophilia
+"MSSA"
-> Staphylococcus aureus
+"MSSE"
-> Staphylococcus epidermidis
+Fix for join
functions
Speed improvement for is.rsi.eligible
, now 15-20 times faster
In g.test
, when sum(x)
is below 1000 or any of the expected values is below 5, Fisher’s Exact Test will be suggested
ab_name
will try to fall back on as.atc
when no results are found
Removed the addin to view data sets
Percentages will now be rounded more logically (e.g. in freq
function)
The data set microorganisms
now contains all microbial taxonomic data from ITIS (kingdoms Bacteria, Fungi and Protozoa), the Integrated Taxonomy Information System, available via https://itis.gov. The data set now contains more than 18,000 microorganisms with all known bacteria, fungi and protozoa according to ITIS with genus, species, subspecies, family, order, class, phylum and subkingdom. The new data set microorganisms.old
contains all previously known taxonomic names from those kingdoms.
New functions based on the existing function mo_property
:
mo_phylum
, mo_class
, mo_order
, mo_family
, mo_genus
, mo_species
, mo_subspecies
+mo_fullname
, mo_shortname
+mo_type
, mo_gramstain
+mo_ref
+They also come with support for German, Dutch, French, Italian, Spanish and Portuguese:
+mo_gramstain("E. coli") +# [1] "Gram negative" +mo_gramstain("E. coli", language = "de") # German +# [1] "Gramnegativ" +mo_gramstain("E. coli", language = "es") # Spanish +# [1] "Gram negativo" +mo_fullname("S. group A", language = "pt") # Portuguese +# [1] "Streptococcus grupo A"
Furthermore, former taxonomic names will give a note about the current taxonomic name:
+mo_gramstain("Esc blattae") +# Note: 'Escherichia blattae' (Burgess et al., 1973) was renamed 'Shimwellia blattae' (Priest and Barker, 2010) +# [1] "Gram negative"
Functions count_R
, count_IR
, count_I
, count_SI
and count_S
to selectively count resistant or susceptible isolates
count_df
(which works like portion_df
) to get all counts of S, I and R of a data set with antibiotic columns, with support for grouped variables
Function is.rsi.eligible
to check for columns that have valid antimicrobial results, but do not have the rsi
class yet. Transform the columns of your raw data with: data %>% mutate_if(is.rsi.eligible, as.rsi)
Functions as.mo
and is.mo
as replacements for as.bactid
and is.bactid
(since the microorganisms
data set not only contains bacteria). These last two functions are deprecated and will be removed in a future release. The as.mo
function determines microbial IDs using intelligent rules:
as.mo("E. coli") +# [1] B_ESCHR_COL +as.mo("MRSA") +# [1] B_STPHY_AUR +as.mo("S group A") +# [1] B_STRPTC_GRA
And with great speed too - on a quite regular Linux server from 2007 it takes us less than 0.02 seconds to transform 25,000 items:
+thousands_of_E_colis <- rep("E. coli", 25000) +microbenchmark::microbenchmark(as.mo(thousands_of_E_colis), unit = "s") +# Unit: seconds +# min median max neval +# 0.01817717 0.01843957 0.03878077 100
Added parameter reference_df
for as.mo
, so users can supply their own microbial IDs, name or codes as a reference table
Renamed all previous references to bactid
to mo
, like:
EUCAST_rules
, first_isolate
and key_antibiotics
+microorganisms
and septic_patients
+Function labels_rsi_count
to print data labels on an RSI ggplot2
model
Functions as.atc
and is.atc
to transform/look up antibiotic ATC codes as defined by the WHO. The existing function guess_atc
is now an alias of as.atc
.
Function ab_property
and its aliases: ab_name
, ab_tradenames
, ab_certe
, ab_umcg
and ab_trivial_nl
Introduction to AMR as a vignette
Removed clipboard functions as they violated the CRAN policy
Renamed septic_patients$sex
to septic_patients$gender
Added three antimicrobial agents to the antibiotics
data set: Terbinafine (D01BA02), Rifaximin (A07AA11) and Isoconazole (D01AC05)
Added 163 trade names to the antibiotics
data set, it now contains 298 different trade names in total, e.g.:
For first_isolate
, rows will be ignored when there’s no species available
Function ratio
is now deprecated and will be removed in a future release, as it is not really the scope of this package
Fix for as.mic
for values ending in zeroes after a real number
Small fix where B. fragilis would not be found in the microorganisms.umcg
data set
Added prevalence
column to the microorganisms
data set
Added parameters minimum
and as_percent
to portion_df
Support for quasiquotation in the functions series count_*
and portion_*
, and n_rsi
. This allows checking more than 2 vectors or columns.
Edited ggplot_rsi
and geom_rsi
so they can cope with count_df
. The new fun
parameter has value portion_df
at default, but can be set to count_df
.
Fix for ggplot_rsi
when the ggplot2
package was not loaded
Added datalabels function labels_rsi_count
to ggplot_rsi
Added possibility to set any parameter to geom_rsi
(and ggplot_rsi
) so you can set your own preferences
Fix for joins, where predefined suffixes would not be honoured
Added parameter quote
to the freq
function
Added generic function diff
for frequency tables
Added longest and shortest character length in the frequency table (freq
) header of class character
Support for types (classes) list and matrix for freq
For lists, subsetting is possible:
+my_list = list(age = septic_patients$age, gender = septic_patients$gender) +my_list %>% freq(age) +my_list %>% freq(gender)
rsi_df
was removed in favour of new functions portion_R
, portion_IR
, portion_I
, portion_SI
and portion_S
to selectively calculate resistance or susceptibility. These functions are 20 to 30 times faster than the old rsi
function. The old function still works, but is deprecated.
+portion_df
to get all portions of S, I and R of a data set with antibiotic columns, with support for grouped variables
ggplot2
+geom_rsi
, facet_rsi
, scale_y_percent
, scale_rsi_colours
and theme_rsi
+ggplot_rsi
to apply all above functions on a data set:
+septic_patients %>% select(tobr, gent) %>% ggplot_rsi
will show portions of S, I and R immediately in a pretty plot
?ggplot_rsi
+as.bactid
and is.bactid
to transform/look up microbial IDs.
guess_bactid
is now an alias of as.bactid
+kurtosis
and skewness
that are lacking in base R - they are generic functions and have support for vectors, data.frames and matrices
g.test
to perform the χ²-distributed G-test, which is used in the same way as chisq.test
+ratio
to transform a vector of values to a preset ratio
ratio(c(10, 500, 10), ratio = "1:2:1")
would return 130, 260, 130
%in%
or %like%
(and give them keyboard shortcuts), or to view the datasets that come with this package
p.symbol
to transform p values to their related symbols: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+clipboard_import
and clipboard_export
as helper functions to quickly copy and paste from/to software like Excel and SPSS. These functions use the clipr
package, but are a little altered to also support headless Linux servers (so you can use it in RStudio Server)freq
):
+rsi
(antimicrobial resistance) to use as inputtable
to use as input: freq(table(x, y))
+hist
and plot
to use a frequency table as input: hist(freq(df$age))
+as.vector
, as.data.frame
, as_tibble
and format
+freq(mydata, mycolumn)
is the same as mydata %>% freq(mycolumn)
+top_freq
function to return the top/below n items as vector
options(max.print.freq = n)
where n is your preset value
resistance_predict
and added more examples
septic_patients
data set to better reflect the reality
mic
and rsi
classes now returns all values - use freq
to check distributions
key_antibiotics
function are now generic: 6 for broad-spectrum ABs, 6 for Gram-positive specific and 6 for Gram-negative specific ABs
abname
function
%like%
now supports multiple patterns
data.frames with altered console printing to make it look like a frequency table. Because of this, the parameter toConsole
is no longer needed.
freq
where the class of an item would be lost
septic_patients
dataset and the column bactid
now has the new class "bactid"
+microorganisms
dataset (especially for Salmonella) and the column bactid
now has the new class "bactid"
+rsi
and mic
functions:
+as.rsi("<=0.002; S")
will return S
+as.mic("<=0.002; S")
will return <=0.002
+as.mic("<= 0.002")
now works
rsi
and mic
do not add the attribute package.version
anymore
"groups"
option for atc_property(..., property)
. It will return a vector of the ATC hierarchy as defined by the WHO. The new function atc_groups
is a convenient wrapper around this.
atc_property
as it requires the host set by url
to be responsive
first_isolate
algorithm to exclude isolates where bacteria ID or genus is unavailable
924b62
) from the dplyr
package v0.7.5 and above
guess_bactid
(now called as.bactid
)
+yourdata %>% select(genus, species) %>% as.bactid()
now also works
n_rsi
to count cases where antibiotic test results were available, to be used in conjunction with dplyr::summarise
, see ?rsi
guess_bactid
to determine the ID of a microorganism based on genus/species or known abbreviations like MRSA
guess_atc
to determine the ATC of an antibiotic based on name, trade name, or known abbreviations
freq
to create frequency tables, with additional info in a header
MDRO
to determine Multi Drug Resistant Organisms (MDRO) with support for country-specific guidelines.
+BRMO
and MRGN
are wrappers for Dutch and German guidelines, respectively
"points"
or "keyantibiotics"
, see ?first_isolate
+tibbles and data.tables
rsi
class for vectors that contain only invalid antimicrobial interpretations
ablist
to antibiotics
+bactlist
to microorganisms
+antibiotics
dataset
microorganisms
dataset
septic_patients
+join
functions
%like%
to make it case insensitive
first_isolate
and EUCAST_rules
column names are now case-insensitive
as.rsi
and as.mic
now add the package name and version as attributes
README.md
with more examples
testthat
package
EUCAST_rules
applies for amoxicillin even if ampicillin is missing
rsi
and mic
classes
These functions are so-called 'Deprecated'. They will be removed in a future release. Using the functions will give a warning with the name of the function it has been replaced by (if there is one).
+portion_R(...) + +portion_IR(...) + +portion_I(...) + +portion_SI(...) + +portion_S(...) + +portion_df(...)+ + +
+The lifecycle of this function is retired. A retired function is no longer under active development, and (if appropriate) a better alternative is available. No new arguments will be added, and only the most critical bugs will be fixed. In a future version, this function will be removed.
On our website https://msberends.github.io/AMR you can find a comprehensive tutorial about how to conduct AMR analysis, the complete documentation of all functions (which reads a lot easier than here in R) and an example analysis using WHONET data.
+ +AMR
Package — AMR • AMR (for R)Welcome to the AMR
package.
AMR
is a free and open-source R package to simplify the analysis and prediction of Antimicrobial Resistance (AMR) and to work with microbial and antimicrobial properties by using evidence-based methods. It supports any table format, including WHONET/EARS-Net data.
We created this package for both academic research and routine analysis at the Faculty of Medical Sciences of the University of Groningen and the Medical Microbiology & Infection Prevention (MMBI) department of the University Medical Center Groningen (UMCG). This R package is actively maintained and free software; you can freely use and distribute it for both personal and commercial (but not patent) purposes under the terms of the GNU General Public License version 2.0 (GPL-2), as published by the Free Software Foundation.
+This package can be used for:
Reference for the taxonomy of microorganisms, since the package contains all microbial (sub)species from the Catalogue of Life
Interpreting raw MIC and disk diffusion values, based on the latest CLSI or EUCAST guidelines
Retrieving antimicrobial drug names, doses and forms of administration from clinical health care records
Determining first isolates to be used for AMR analysis
Calculating antimicrobial resistance
Determining multi-drug resistance (MDR) / multi-drug resistant organisms (MDRO)
Calculating (empirical) susceptibility of both mono therapy and combination therapies
Predicting future antimicrobial resistance using regression models
Getting properties for any microorganism (like Gram stain, species, genus or family)
Getting properties for any antibiotic (like name, EARS-Net code, ATC code, PubChem code, defined daily dose or trade name)
Plotting antimicrobial resistance
Getting SNOMED codes of a microorganism, or getting the name associated with a SNOMED code
Getting LOINC codes of an antibiotic, or getting the name associated with a LOINC code
Machine reading the EUCAST and CLSI guidelines from 2011-2020 to translate MIC values and disk diffusion diameters to R/SI
Principal component analysis for AMR
+For suggestions, comments or questions, please contact us at:
+Matthijs S. Berends
+m.s.berends [at] umcg [dot] nl
+University of Groningen
+Department of Medical Microbiology
+University Medical Center Groningen
+Post Office Box 30001
+9700 RB Groningen
+The Netherlands
If you have found a bug, please file a new issue at:
+https://github.com/msberends/AMR/issues
All antimicrobial drugs and their official names, ATC codes, ATC groups and defined daily dose (DDD) are included in this package, using the WHO Collaborating Centre for Drug Statistics Methodology.
+
+This package contains all ~550 antibiotic, antimycotic and antiviral drugs and their Anatomical Therapeutic Chemical (ATC) codes, ATC groups and Defined Daily Dose (DDD) from the World Health Organization Collaborating Centre for Drug Statistics Methodology (WHOCC, https://www.whocc.no) and the Pharmaceuticals Community Register of the European Commission (http://ec.europa.eu/health/documents/community-register/html/atc.htm).
These have become the gold standard for international drug utilisation monitoring and research.
+The WHOCC is located in Oslo at the Norwegian Institute of Public Health and funded by the Norwegian government. The European Commission is the executive of the European Union and promotes its general interest.
+NOTE: The WHOCC copyright does not allow use for commercial purposes, unlike any other info from this package. See https://www.whocc.no/copyright_disclaimer/.
+ +as.ab("meropenem") +ab_name("J01DH02") + +ab_tradenames("flucloxacillin")+
This example data set has the exact same structure as an export file from WHONET. Such files can be used with this package, as this example data set shows. The data itself was based on our example_isolates data set.
+WHONET
+
+
+ A data.frame
with 500 observations and 53 variables:
Identification number
ID of the sample
Specimen number
ID of the specimen
Organism
Name of the microorganism. Before analysis, you should transform this to a valid microbial class, using as.mo()
.
Country
Country of origin
Laboratory
Name of laboratory
Last name
Last name of patient
First name
Initial of patient
Sex
Gender of patient
Age
Age of patient
Age category
Age group, can also be looked up using age_groups()
Date of admission
Date of hospital admission
Specimen date
Date when specimen was received at laboratory
Specimen type
Specimen type or group
Specimen type (Numeric)
Translation of "Specimen type"
Reason
Reason of request with Differential Diagnosis
Isolate number
ID of isolate
Organism type
Type of microorganism, can also be looked up using mo_type()
Serotype
Serotype of microorganism
Beta-lactamase
Microorganism produces beta-lactamase?
ESBL
Microorganism produces extended spectrum beta-lactamase?
Carbapenemase
Microorganism produces carbapenemase?
MRSA screening test
Microorganism is possible MRSA?
Inducible clindamycin resistance
Clindamycin can be induced?
Comment
Other comments
Date of data entry
Date this data was entered in WHONET
AMP_ND10:CIP_EE
28 different antibiotics. You can look up the abbreviations in the antibiotics data set, or use e.g. ab_name("AMP")
to get the official name immediately. Before analysis, you should transform this to a valid antibiotic class, using as.rsi()
.
+ +R/ab_from_text.R
+ ab_from_text.Rd
Use this function on e.g. clinical texts from health care records. It returns a list with all antimicrobial drugs, doses and forms of administration found in the texts.
+ab_from_text( + text, + type = c("drug", "dose", "administration"), + collapse = NULL, + translate_ab = FALSE, + thorough_search = NULL, + ... +)+ +
text | +text to analyse |
+
---|---|
type | +type of property to search for, either |
+
collapse | +character to pass on to |
+
translate_ab | +if |
+
thorough_search | +logical to indicate whether the input must be extensively searched for misspelling and other faulty input values. Setting this to |
+
... | +parameters passed on to |
+
A list, or a character if collapse
is not NULL
This function is also internally used by as.ab()
, although it then only searches for the first drug name and will throw a note if more drug names could have been returned.
type
At default, the function will search for antimicrobial drug names. All text elements will be searched for official names, ATC codes and brand names. As it uses as.ab()
internally, it will correct for misspelling.
With type = "dose"
(or similar, like "dosing", "doses"), all text elements will be searched for numeric values that are higher than 100 and do not resemble years. The output will be numeric. It supports any unit (g, mg, IE, etc.) and multiple values in one clinical text, see Examples.
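The dose rule described here (numeric values above 100 that do not look like years) can be pictured with a small base R sketch. This is an illustration only, not the package's implementation: the helper name `extract_doses` is hypothetical, and "resembling a year" is assumed to mean a four-digit value from 1900 to 2099.

```r
# Hypothetical helper, base R only; takes a single text string
extract_doses <- function(text) {
  # pull out every run of digits, e.g. "500" from "500mg"
  hits <- regmatches(text, gregexpr("[0-9]+", text))[[1]]
  values <- as.numeric(hits)
  # assumption: a "year" is any four-digit value from 1900 to 2099
  looks_like_year <- grepl("^(19|20)[0-9]{2}$", hits)
  # keep values strictly above 100 that are not year-like
  values[values > 100 & !looks_like_year]
}

print(extract_doses("500 mg amoxi po and 400mg cipro iv"))
print(extract_doses("28/03/2020 regular amoxicilliin 500mg po tds"))
```

In the second text the date parts (28, 03, 2020) are dropped, so only the dose remains.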
With type = "administration"
(or abbreviations, like "admin", "adm"), all text elements will be searched for a form of drug administration. It supports the following forms (including common abbreviations): buccal, implant, inhalation, instillation, intravenous, nasal, oral, parenteral, rectal, sublingual, transdermal and vaginal. Abbreviations for oral (such as 'po', 'per os') will become "oral", all values for intravenous (such as 'iv', 'intraven') will become "iv". It supports multiple values in one clinical text, see Examples.
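The mapping of abbreviations to a standard form can be sketched as a lookup table in base R. The helper `normalise_admin` and its deliberately tiny lookup are hypothetical and cover only single-word abbreviations; multi-word forms such as 'per os' would need extra handling that is not shown.

```r
# Hypothetical helper, base R only; takes a single text string
normalise_admin <- function(text) {
  # split the lower-cased text into words made of letters only
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  # assumption: a very small lookup, far from the full list of forms
  lookup <- c(po = "oral", oral = "oral",
              iv = "iv", intraven = "iv", intravenous = "iv")
  # translate every word that is a known abbreviation, in order of appearance
  unname(lookup[words[words %in% names(lookup)]])
}

print(normalise_admin("500 mg amoxi po and 400mg cipro iv"))
```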
collapse
Without using collapse
, this function will return a list. This can be convenient to use e.g. inside a mutate()
:
+df %>% mutate(abx = ab_from_text(clinical_text))
The returned AB codes can be transformed to official names, groups, etc. with all ab_property()
functions like ab_name()
and ab_group()
, or by using the translate_ab
parameter.
When using collapse
, this function will return a character:
+df %>% mutate(abx = ab_from_text(clinical_text, collapse = "|"))
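The difference between the list output and the collapsed character output can be illustrated in base R. The values in `found` are made up to stand in for per-text results; nothing here calls the package itself.

```r
# Hypothetical per-text results: the first text mentions two drugs, the second one
found <- list(c("CIP", "AMX"), "DOX")

# list form: one element per text, each of variable length
print(lengths(found))

# collapsed form: exactly one string per text, joined with the separator
collapsed <- vapply(found, paste, character(1), collapse = "|")
print(collapsed)
```

The collapsed form always has one value per input text, which makes it easy to store as an ordinary character column.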
+The lifecycle of this function is maturing. The underlying code of a maturing function has been roughed out, but finer details might still change. Since this function needs wider usage and more extensive testing, you are very welcome to suggest changes at our repository or write us an email (see section 'Contact Us').
+ +# mind the bad spelling of amoxicillin in this line, +# straight from a true health care record: +ab_from_text("28/03/2020 regular amoxicilliin 500mg po tds") + +ab_from_text("500 mg amoxi po and 400mg cipro iv") +ab_from_text("500 mg amoxi po and 400mg cipro iv", type = "dose") +ab_from_text("500 mg amoxi po and 400mg cipro iv", type = "admin") + +ab_from_text("500 mg amoxi po and 400mg cipro iv", collapse = ", ") + +# if you want to know which antibiotic groups were administered, do e.g.: +abx <- ab_from_text("500 mg amoxi po and 400mg cipro iv") +ab_group(abx[[1]]) + +if (require(dplyr)) { + tibble(clinical_text = c("given 400mg cipro and 500 mg amox", + "started on doxy iv today")) %>% + mutate(abx_codes = ab_from_text(clinical_text), + abx_doses = ab_from_text(clinical_text, type = "doses"), + abx_admin = ab_from_text(clinical_text, type = "admin"), + abx_coll = ab_from_text(clinical_text, collapse = "|"), + abx_coll_names = ab_from_text(clinical_text, + collapse = "|", + translate_ab = "name"), + abx_coll_doses = ab_from_text(clinical_text, + type = "doses", + collapse = "|"), + abx_coll_admin = ab_from_text(clinical_text, + type = "admin", + collapse = "|")) + +}+
Use these functions to return a specific property of an antibiotic from the antibiotics data set. All input values will be evaluated internally with as.ab()
.
ab_name(x, language = get_locale(), tolower = FALSE, ...) + +ab_atc(x, ...) + +ab_cid(x, ...) + +ab_synonyms(x, ...) + +ab_tradenames(x, ...) + +ab_group(x, language = get_locale(), ...) + +ab_atc_group1(x, language = get_locale(), ...) + +ab_atc_group2(x, language = get_locale(), ...) + +ab_loinc(x, ...) + +ab_ddd(x, administration = "oral", units = FALSE, ...) + +ab_info(x, language = get_locale(), ...) + +ab_url(x, open = FALSE, ...) + +ab_property(x, property = "name", language = get_locale(), ...)+ +
x | +any (vector of) text that can be coerced to a valid antibiotic code with |
+
---|---|
language | +language of the returned text, defaults to system language (see |
+
tolower | +logical to indicate whether the first character of every output should be transformed to a lower case character. This will lead to e.g. "polymyxin B" and not "polymyxin b". |
+
... | +other parameters passed on to |
+
administration | +way of administration, either |
+
units | +a logical to indicate whether the units, instead of the DDDs themselves, must be returned, see Examples |
+
open | +browse the URL using |
+
property | +one of the column names of the antibiotics data set |
+
An integer
in case of ab_cid()
A named list
in case of ab_info()
and multiple ab_synonyms()
/ab_tradenames()
A double
in case of ab_ddd()
A character
in all other cases
All output will be translated where possible.
+The function ab_url()
will return the direct URL to the official WHO website. A warning will be returned if the required ATC code is not available.
+The lifecycle of this function is stable. In a stable function, major changes are unlikely. This means that the underlying code will generally evolve by adding new arguments; removing arguments or changing the meaning of existing arguments will be avoided.
If the underlying code needs breaking changes, they will occur gradually. For example, a parameter will be deprecated and first continue to work, but will emit a message informing you of the change. Next, typically after at least one newly released version on CRAN, the message will be transformed to an error.
+World Health Organization (WHO) Collaborating Centre for Drug Statistics Methodology: https://www.whocc.no/atc_ddd_index/
+WHONET 2019 software: http://www.whonet.org/software.html
+European Commission Public Health PHARMACEUTICALS - COMMUNITY REGISTER: http://ec.europa.eu/health/documents/community-register/html/atc.htm
+# all properties:
+ab_name("AMX") # "Amoxicillin"
+ab_atc("AMX") # J01CA04 (ATC code from the WHO)
+ab_cid("AMX") # 33613 (Compound ID from PubChem)
+ab_synonyms("AMX") # a list with brand names of amoxicillin
+ab_tradenames("AMX") # same
+ab_group("AMX") # "Beta-lactams/penicillins"
+ab_atc_group1("AMX") # "Beta-lactam antibacterials, penicillins"
+ab_atc_group2("AMX") # "Penicillins with extended spectrum"
+ab_url("AMX") # link to the official WHO page
+
+# smart lowercase transformation
+ab_name(x = c("AMC", "PLB")) # "Amoxicillin/clavulanic acid" "Polymyxin B"
+ab_name(x = c("AMC", "PLB"),
+  tolower = TRUE) # "amoxicillin/clavulanic acid" "polymyxin B"
+
+# defined daily doses (DDD)
+ab_ddd("AMX", "oral") # 1
+ab_ddd("AMX", "oral", units = TRUE) # "g"
+ab_ddd("AMX", "iv") # 1
+ab_ddd("AMX", "iv", units = TRUE) # "g"
+
+ab_info("AMX") # all properties as a list
+
+# all ab_* functions use as.ab() internally, so you can go from 'any' to 'any':
+ab_atc("AMP") # ATC code of AMP (ampicillin)
+ab_group("J01CA01") # Drug group of ampicillin's ATC code
+ab_loinc("ampicillin") # LOINC codes of ampicillin
+ab_name("21066-6") # "Ampicillin" (using LOINC)
+ab_name(6249) # "Ampicillin" (using CID)
+ab_name("J01CA01") # "Ampicillin" (using ATC)
+
+# spelling from different languages and dyslexia are no problem
+ab_atc("ceftriaxon")
+ab_atc("cephtriaxone")
+ab_atc("cephthriaxone")
+ab_atc("seephthriaaksone")
+
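The "smart lowercase" behaviour of tolower = TRUE, where only the first character is lowered so that "Polymyxin B" keeps its capital B, can be sketched in base R. The helper `lower_first` is hypothetical and only mirrors the behaviour described for the tolower parameter.

```r
# Hypothetical helper: lower only the first character of each string
lower_first <- function(x) {
  paste0(tolower(substr(x, 1, 1)), substr(x, 2, nchar(x)))
}

print(lower_first(c("Amoxicillin/clavulanic acid", "Polymyxin B")))
```

A plain `tolower()` would instead produce "polymyxin b", which is what the parameter is meant to avoid.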
Calculates age in years based on a reference date, which is the system date by default.
+age(x, reference = Sys.Date(), exact = FALSE, na.rm = FALSE)+ +
x | +date(s), will be coerced with |
+
---|---|
reference | +reference date(s) (defaults to today), will be coerced with |
+
exact | +a logical to indicate whether age calculation should be exact, i.e. with decimals. It divides the number of days of year-to-date (YTD) of |
+
na.rm | +a logical to indicate whether missing values should be removed |
+
An integer (no decimals) if exact = FALSE
, a double (with decimals) otherwise
+The lifecycle of this function is stable. In a stable function, major changes are unlikely. This means that the underlying code will generally evolve by adding new arguments; removing arguments or changing the meaning of existing arguments will be avoided.
If the underlying code needs breaking changes, they will occur gradually. For example, a parameter will be deprecated and first continue to work, but will emit a message informing you of the change. Next, typically after at least one newly released version on CRAN, the message will be transformed to an error.
+To split ages into groups, use the age_groups()
function.
# 10 random birth dates +df <- data.frame(birth_date = Sys.Date() - runif(10) * 25000) +# add ages +df$age <- age(df$birth_date) +# add exact ages +df$age_exact <- age(df$birth_date, exact = TRUE) + +df+
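For readers who want to see what a whole-year age calculation involves, here is a minimal base R sketch. It is not the package's code: `age_years` is a hypothetical helper that only covers the non-exact case (exact = FALSE), and it counts a year as completed once the birthday has been reached in the reference year.

```r
# Hypothetical helper, base R only
age_years <- function(birth, reference = Sys.Date()) {
  b <- as.POSIXlt(birth)
  r <- as.POSIXlt(reference)
  years <- r$year - b$year
  # subtract one year when the birthday has not yet occurred in the reference year
  before_birthday <- (r$mon < b$mon) | (r$mon == b$mon & r$mday < b$mday)
  as.integer(years - before_birthday)
}

ref <- as.Date("2020-07-09")
print(age_years(as.Date("1990-06-15"), ref))  # birthday already passed in 2020
print(age_years(as.Date("1990-08-01"), ref))  # birthday still to come in 2020
```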
Split ages into age groups defined by the split_at
parameter. This allows for easier demographic (antimicrobial resistance) analysis.
age_groups(x, split_at = c(12, 25, 55, 75), na.rm = FALSE)+ +
x | +age, e.g. calculated with |
+
---|---|
split_at | +values to split |
+
na.rm | +a logical to indicate whether missing values should be removed |
+
Ordered factor
+To split ages, the input for the split_at
parameter can be:
A numeric vector. A vector of e.g. c(10, 20)
will split on 0-9, 10-19 and 20+. A value of only 50
will split on 0-49 and 50+.
+The default is to split on young children (0-11), youth (12-24), young adults (25-54), middle-aged adults (55-74) and elderly (75+).
A character:
"children"
or "kids"
, equivalent of: c(0, 1, 2, 4, 6, 13, 18)
. This will split on 0, 1, 2-3, 4-5, 6-12, 13-17 and 18+.
"elderly"
or "seniors"
, equivalent of: c(65, 75, 85)
. This will split on 0-64, 65-74, 75-84, 85+.
"fives"
, equivalent of: 1:20 * 5
. This will split on 0-4, 5-9, 10-14, ..., 90-94, 95-99, 100+.
"tens"
, equivalent of: 1:10 * 10
. This will split on 0-9, 10-19, 20-29, ..., 80-89, 90-99, 100+.
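The splitting rules above can be reproduced with base R's `cut()`. The sketch below is an illustration under assumptions, not the package's implementation: `split_ages` is a hypothetical helper that accepts numeric split points only and ignores a split point of 0, since the first group always starts at 0.

```r
# Hypothetical helper, base R only
split_ages <- function(x, split_at = c(12, 25, 55, 75)) {
  split_at <- sort(unique(split_at[split_at > 0]))
  lower <- c(0, split_at)          # first age in each group
  upper <- c(split_at - 1, Inf)    # last age in each group
  labels <- ifelse(is.infinite(upper),
                   paste0(lower, "+"),
                   ifelse(lower == upper,
                          as.character(lower),
                          paste0(lower, "-", upper)))
  # left-closed intervals: a split at 50 puts age 50 in the "50+" group
  cut(x, breaks = c(lower, Inf), labels = labels,
      right = FALSE, ordered_result = TRUE)
}

ages <- c(3, 8, 16, 54, 31, 76, 101, 43, 21)
print(split_ages(ages, 50))
print(levels(split_ages(ages)))
print(levels(split_ages(ages, c(0, 1, 2, 4, 6, 13, 18))))
```

The last call mirrors the "children" preset and yields the groups 0, 1, 2-3, 4-5, 6-12, 13-17 and 18+.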
+The lifecycle of this function is stable. In a stable function, major changes are unlikely. This means that the underlying code will generally evolve by adding new arguments; removing arguments or changing the meaning of existing arguments will be avoided.
If the underlying code needs breaking changes, they will occur gradually. For example, a parameter will be deprecated and first continue to work, but will emit a message informing you of the change. Next, typically after at least one newly released version on CRAN, the message will be transformed to an error.
+On our website https://msberends.github.io/AMR you can find a comprehensive tutorial about how to conduct AMR analysis, the complete documentation of all functions (which reads a lot easier than here in R) and an example analysis using WHONET data.
+To determine ages, based on one or more reference dates, use the age()
function.
ages <- c(3, 8, 16, 54, 31, 76, 101, 43, 21) + +# split into 0-49 and 50+ +age_groups(ages, 50) + +# split into 0-19, 20-49 and 50+ +age_groups(ages, c(20, 50)) + +# split into groups of ten years +age_groups(ages, 1:10 * 10) +age_groups(ages, split_at = "tens") + +# split into groups of five years +age_groups(ages, 1:20 * 5) +age_groups(ages, split_at = "fives") + +# split specifically for children +age_groups(ages, "children") +# same: +age_groups(ages, c(1, 2, 4, 6, 13, 18)) + +if (FALSE) { +# resistance of ciprofloxacin per age group +library(dplyr) +example_isolates %>% + filter_first_isolate() %>% + filter(mo == as.mo("E. coli")) %>% + group_by(age_group = age_groups(age)) %>% + select(age_group, CIP) %>% + ggplot_rsi(x = "age_group") +}+
Use these selection helpers inside any function that allows Tidyverse selections, like dplyr::select()
or tidyr::pivot_longer()
. They help to select the columns of antibiotics that are of a specific antibiotic class, without the need to define the columns or antibiotic abbreviations.
ab_class(ab_class) + +aminoglycosides() + +carbapenems() + +cephalosporins() + +cephalosporins_1st() + +cephalosporins_2nd() + +cephalosporins_3rd() + +cephalosporins_4th() + +cephalosporins_5th() + +fluoroquinolones() + +glycopeptides() + +macrolides() + +penicillins() + +tetracyclines()+ +
ab_class | +an antimicrobial class, like |
+
---|
All columns will be searched for known antibiotic names, abbreviations, brand names and codes (ATC, EARS-Net, WHO, etc.). This means that a selector like e.g. aminoglycosides()
will pick up column names like 'gen', 'genta', 'J01GB03', 'tobra', 'Tobracin', etc.
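The idea of matching column names against known names and codes can be illustrated with a toy lookup in base R. This is a sketch under strong assumptions: `select_known()` is a hypothetical helper, the lookup contains only the names mentioned above, and the match is a simple case-insensitive comparison. The package itself searches its full antibiotics data set.

```r
# Toy lookup, limited to the column names mentioned above; the real package
# searches its full antibiotics data set instead.
known_aminoglycosides <- c("gen", "genta", "J01GB03", "tobra", "Tobracin")

# Hypothetical helper: keep the columns of `df` whose names occur in `known`
select_known <- function(df, known) {
  hits <- tolower(names(df)) %in% tolower(known)
  df[, hits, drop = FALSE]
}

df <- data.frame(mo = "E. coli", genta = "S", TOBRA = "R", AMX = "R")
names(select_known(df, known_aminoglycosides))
```

Here the columns `genta` and `TOBRA` are kept, while `mo` and `AMX` are dropped because they are not in the toy lookup.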
These functions only work if the tidyselect
package is installed, which comes with the dplyr
package. An error will be thrown if the tidyselect
package is not installed, or if the functions are used outside a function that allows Tidyverse selections, like select()
or pivot_longer()
.
filter_ab_class()
for the filter()
equivalent.
if (require("dplyr")) { + + # this will select columns 'IPM' (imipenem) and 'MEM' (meropenem): + example_isolates %>% + select(carbapenems()) + + # this will select columns 'mo', 'AMK', 'GEN', 'KAN' and 'TOB': + example_isolates %>% + select(mo, aminoglycosides()) + + # this will select columns 'mo' and all antimycobacterial drugs ('RIF'): + example_isolates %>% + select(mo, ab_class("mycobact")) + + + # get bug/drug combinations for only macrolides in Gram-positives: + example_isolates %>% + filter(mo_gramstain(mo) %like% "pos") %>% + select(mo, macrolides()) %>% + bug_drug_combinations() %>% + format() + + + data.frame(irrelevant = "value", + J01CA01 = "S") %>% # ATC code of ampicillin + select(penicillins()) # so the 'J01CA01' column is selected + +}+
Two data sets containing all antibiotics/antimycotics and antivirals. Use as.ab()
or one of the ab_property()
functions to retrieve values from the antibiotics data set. Three identifiers are included in this data set: an antibiotic ID (ab
, primarily used in this package) as defined by WHONET/EARS-Net, an ATC code (atc
) as defined by the WHO, and a Compound ID (cid
) as found in PubChem. Other properties in this data set are derived from one or more of these codes.
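As a minimal sketch of such a lookup, the block below retrieves the three identifiers for one drug. It assumes the AMR package is installed and is skipped otherwise; the chosen drug name is only an example, and no output is shown because it may differ between package versions.

```r
# Assumes the AMR package is installed; the block is skipped otherwise
amr_available <- requireNamespace("AMR", quietly = TRUE)

if (amr_available) {
  library(AMR)
  ab <- as.ab("amoxicillin/clavulanic acid")  # antibiotic ID ('ab' column)
  print(ab)
  print(ab_name(ab))  # official name
  print(ab_atc(ab))   # ATC code
  print(ab_cid(ab))   # PubChem Compound ID
}
```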
antibiotics + +antivirals+ + +
data.frame
with 456 observations and 14 variables:
ab
Antibiotic ID as used in this package (like AMC
), using the official EARS-Net (European Antimicrobial Resistance Surveillance Network) codes where available
atc
ATC code (Anatomical Therapeutic Chemical) as defined by the WHOCC, like J01CR02
cid
Compound ID as found in PubChem
name
Official name as used by WHONET/EARS-Net or the WHO
group
A short and concise group name, based on WHONET and WHOCC definitions
atc_group1
Official pharmacological subgroup (3rd level ATC code) as defined by the WHOCC, like "Macrolides, lincosamides and streptogramins"
atc_group2
Official chemical subgroup (4th level ATC code) as defined by the WHOCC, like "Macrolides"
abbr
List of abbreviations as used in many countries, also for antibiotic susceptibility testing (AST)
synonyms
Synonyms (often trade names) of a drug, as found in PubChem based on their compound ID
oral_ddd
Defined Daily Dose (DDD), oral treatment
oral_units
Units of oral_ddd
iv_ddd<