Note: some changes in this version were suggested by anonymous reviewers from the journal we submitted our manuscript to. We are very grateful to those reviewers for going through our code so thoroughly!
@ -39,10 +39,10 @@ Note: some changes in this version were suggested by anonymous reviewers from th
#> [1] 24 24
```
* Improvements for `as.mo()`:
* Any user input value that could mean more than one taxonomic entry is now considered 'uncertain'. Instead of a warning, a message will be thrown and the accompanying `mo_uncertainties()` has been changed completely; it now prints all possible candidates with their score.
* Any user input value that could mean more than one taxonomic entry is now considered 'uncertain'. Instead of a warning, a message will be thrown and the accompanying `mo_uncertainties()` has been changed completely; it now prints all possible candidates with their matching score.
* Big speed improvement for already valid microorganism IDs. This also means a significant speed improvement when using `mo_*` functions like `mo_name()` on microorganism IDs.
* Added parameter `ignore_pattern` to `as.mo()`, which can also be given to `mo_*` functions like `mo_name()`, to exclude known non-relevant input from the analysis. This can also be set with the option `AMR_ignore_pattern`.
* `get_locale()` now uses `Sys.getlocale()` instead of `Sys.getlocale("LC_COLLATE")`
* `get_locale()` now uses `Sys.getenv("LANG")` by default or, if `LANG` is not set, `Sys.getlocale()`. This can be overridden by setting the option `AMR_locale`.
* Speed improvement for `eucast_rules()`
* Overall speed improvement by tweaking joining functions
* Function `mo_shortname()` now returns the genus for input where the species is unknown
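
The effect of the new `ignore_pattern` parameter can be illustrated with a minimal base R sketch. It assumes only what is documented for the parameter: every input value matching the case-insensitive regular expression becomes `NA`. The helper `blank_ignored()` is hypothetical and not part of this package; the real matching happens inside `as.mo()`.

```r
# Hypothetical helper, for illustration only (not part of the AMR package):
# set every value of `x` that matches `ignore_pattern` to NA.
blank_ignored <- function(x, ignore_pattern = getOption("AMR_ignore_pattern")) {
  if (is.null(ignore_pattern)) {
    # no pattern given and option not set: leave the input untouched
    return(x)
  }
  # case-insensitive match, as documented for `ignore_pattern`
  x[grepl(ignore_pattern, x, ignore.case = TRUE)] <- NA_character_
  x
}

input <- c("Escherichia coli", "Not reported", "contaminated flora")
result <- blank_ignored(input, "(not reported|contaminated flora)")
print(result)
```

Only the first element is kept in this example; the other two match the pattern and become `NA`.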
#' @param allow_uncertain a number between `0` (or `"none"`) and `3` (or `"all"`), or `TRUE` (= `2`) or `FALSE` (= `0`) to indicate whether the input should be checked for less probable results; see *Details*
#' @param reference_df a [`data.frame`] to be used for extra reference when translating `x` to a valid [`mo`]. See [set_mo_source()] and [get_mo_source()] to automate the usage of your own codes (e.g. used in your analysis or organisation).
#' @param ignore_pattern a regular expression (case-insensitive) of which all matches in `x` must return `NA`. This can be convenient to exclude known non-relevant input and can also be set with the option `AMR_ignore_pattern`, e.g. `options(AMR_ignore_pattern = "(not reported|contaminated flora)")`.
#' @param language language to translate text like "no growth", which defaults to the system language (see [get_locale()])
#' @param ... other parameters passed on to functions
#' @rdname as.mo
#' @aliases mo
@ -86,7 +87,7 @@
#' - `"Fluoroquinolone-resistant Neisseria gonorrhoeae"`. The first word will be stripped, after which the function will try to find a match. A warning will be thrown that the result *Neisseria gonorrhoeae* (``r as.mo("Neisseria gonorrhoeae")``) needs review.
#'
#' There are three helper functions that can be run after using the [as.mo()] function:
#' - Use [mo_uncertainties()] to get a [`data.frame`] that prints in a pretty format with all taxonomic names that were guessed. The output contains a score that is based on the human pathogenic prevalence and the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) between the full taxonomic name and the user input.
#' - Use [mo_uncertainties()] to get a [`data.frame`] that prints in a pretty format with all taxonomic names that were guessed. The output contains a score that is based on the human pathogenic prevalence and the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) between the user input and the full taxonomic name.
#' - Use [mo_failures()] to get a [`character`] [`vector`] with all values that could not be coerced to a valid value.
#' - Use [mo_renamed()] to get a [`data.frame`] with all values that could be coerced based on old, previously accepted taxonomic names.
#'
@ -175,6 +176,7 @@ as.mo <- function(x,
allow_uncertain = TRUE,
reference_df = get_mo_source(),
ignore_pattern = getOption("AMR_ignore_pattern"),
language = get_locale(),
...) {
check_dataset_integrity()
@ -186,7 +188,7 @@ as.mo <- function(x,
# is.mo() won't work - codes might change between package versions
return(to_class_mo(x))
}
if (tryCatch(all(tolower(x) %in% MO_lookup$fullname_lower, na.rm = TRUE)
# param dyslexia_mode logical - also check for characters that resemble others
# param debug logical - show different lookup texts while searching
# param reference_data_to_use data.frame - the data set to check for
# param actual_uncertainty - (only for initial_search = FALSE) the actual uncertainty level used in the function for score calculation (sometimes passed as 2 or 3 by uncertain_fn())
# param actual_input - (only for initial_search = FALSE) the actual, original input
# param language - used for translating "no growth", etc.
cat("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (8) check for unknown yeasts/fungi\n")
cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (8) check for unknown yeasts/fungi\n"))
}
if (b.x_trimmed %like_case% "yeast") {
  found <- "F_YEAST"
@ -1202,7 +1229,7 @@ exec_as.mo <- function(x,
}
# (9) try to strip off one element from start and check the remains (only allow >= 2-part name outcome) ----
if (isTRUE(debug)) {
  cat("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (9) try to strip off one element from start and check the remains (only allow >= 2-part name outcome)\n")
  cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (9) try to strip off one element from start and check the remains (only allow >= 2-part name outcome)\n"))
}
x_strip <- a.x_backup %>% strsplit("[ .]") %>% unlist()
if (length(x_strip) > 1 & nchar(g.x_backup_without_spp) >= 6) {
# (10) try to strip off one element from start and check the remains (any text size) ----
if (isTRUE(debug)) {
  cat("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (10) try to strip off one element from start and check the remains (any text size)\n")
  cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (10) try to strip off one element from start and check the remains (any text size)\n"))
}
x_strip <- a.x_backup %>% strsplit("[ .]") %>% unlist()
if (length(x_strip) > 1 & nchar(g.x_backup_without_spp) >= 6) {
# (11) try to strip off one element from end and check the remains (any text size) ----
# (this is in fact 7 but without nchar limit of >=6)
if (isTRUE(debug)) {
  cat("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (11) try to strip off one element from end and check the remains (any text size)\n")
  cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (11) try to strip off one element from end and check the remains (any text size)\n"))
msg <- paste0("Result", plural[1], " of ", nr2char(NROW(uncertainties)), " value", plural[1],
msg <- paste0("Result", plural[1], " of ", nr2char(length(uncertainties$input)), " value", plural[1],
              " ", plural[3], " guessed with uncertainty. Use mo_uncertainties() to review ", plural[2], ".")
message(font_blue(msg))
}
@ -1501,6 +1533,11 @@ exec_as.mo <- function(x,
print(mo_renamed())
}
if (NROW(uncertainties) > 0 & initial_search == FALSE) {
  # this will save the uncertain items as attribute, so they can be bound to `uncertainties` in the uncertain_fn() function
  x <- structure(x, uncertainties = uncertainties)
}
if (old_mo_warning == TRUE & property != "mo") {
  warning("The input contained old microorganism IDs from previous versions of this package.\nPlease use `as.mo()` on these old IDs to transform them to the new format.\nSUPPORT FOR THIS WILL BE DROPPED IN A FUTURE VERSION.", call. = FALSE)
#' For language-dependent output of AMR functions, like [mo_name()], [mo_gramstain()], [mo_type()] and [ab_name()].
#' @inheritSection lifecycle Stable lifecycle
#' @details Strings will be translated to foreign languages if they are defined in a local translation file. Additions to this file can be suggested at our repository. The file can be found here: <https://github.com/msberends/AMR/blob/master/data-raw/translations.tsv>. This file will be read by all functions where a translated output can be desired, like all [mo_property()] functions ([mo_name()], [mo_gramstain()], [mo_type()], etc.).
#' @details Strings will be translated to foreign languages if they are defined in a local translation file. Additions to this file can be suggested at our repository. The file can be found here: <https://github.com/msberends/AMR/blob/master/data-raw/translations.tsv>. This file is read by all functions that can return translated output, like all [mo_property()] functions ([mo_name()], [mo_gramstain()], [mo_type()], etc.) and [ab_property()] functions ([ab_name()], [ab_group()], etc.).
#'
#' Currently supported languages are: `r paste(sort(gsub(";.*", "", ISOcodes::ISO_639_2[which(ISOcodes::ISO_639_2$Alpha_2 %in% LANGUAGES_SUPPORTED), "Name"])), collapse = ", ")`. Please note that currently not all these languages have translations available for all antimicrobial agents and colloquial microorganism names.
#'
#' Please suggest your own translations [by creating a new issue on our repository](https://github.com/msberends/AMR/issues/new?title=Translations).
#'
#' The system language will be used at default (as returned by [Sys.getlocale()]), if that language is supported. The language to be used can be overwritten by setting the option `AMR_locale`, e.g. `options(AMR_locale = "de")`.
#' ## Changing the default language
#' The system language will be used by default (as returned by `Sys.getenv("LANG")` or, if `LANG` is not set, [Sys.getlocale()]), if that language is supported. The language to be used can be overridden in two ways, which are checked in this order:
#'
#' 1. Setting the R option `AMR_locale`, e.g. by running `options(AMR_locale = "de")`
#' 2. Setting the environment variable `LANGUAGE` or `LANG`, e.g. by adding `LANGUAGE="de_DE.utf8"` to your `.Renviron` file in your home directory
#'
#' So if the R option `AMR_locale` is set, the environment variables `LANGUAGE` and `LANG` will be ignored.
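#'
#' The lookup order described above can be sketched in base R. This is only an illustration under stated assumptions: `resolve_language_source()` is a hypothetical helper, not a function of this package, and it returns the raw setting without mapping it to a supported language code.
#'
#' ```r
#' # Hypothetical helper, for illustration only (not part of the AMR package)
#' resolve_language_source <- function() {
#'   # 1. the R option takes precedence over everything else
#'   opt <- getOption("AMR_locale", default = NULL)
#'   if (!is.null(opt)) {
#'     return(opt)
#'   }
#'   # 2. then the variables LANGUAGE and LANG, in that order
#'   for (v in c("LANGUAGE", "LANG")) {
#'     val <- Sys.getenv(v, unset = "")
#'     if (nzchar(val)) {
#'       return(val)
#'     }
#'   }
#'   # 3. otherwise fall back to the system locale
#'   Sys.getlocale()
#' }
#'
#' options(AMR_locale = "de")
#' resolve_language_source() # the option wins, whatever LANGUAGE or LANG contain
#' ```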
#' @inheritSection AMR Read more on our website!
#' @rdname translate
#' @name translate
@ -73,17 +79,24 @@ get_locale <- function() {
if (lang %in% LANGUAGES_SUPPORTED) {
  return(lang)
} else {
  stop_("unsupported language: '", lang, "' - use one of: ",
  stop_("unsupported language set as option 'AMR_locale': '", lang, "' - use one of: ",