Title case in R with an exception list

* Apologies, I should have been more clear (I really appreciate all the help though!)

I am extracting a .csv file from a database. The file contains a list of place names. I use INITCAP when I retrieve them, so they all come back in proper mixed case. However, some of these place names should stay fully capitalized, because they are known by their abbreviations (universities, for example). In the end, I will write the adjusted names back to the database.

I am new to R and a bit stuck on a problem. I retrieve the data, which is all in caps, but I need it in proper case, that is, change "THIS IS ALL CAPS" to "This Is All Caps", while being able to exclude certain words. Things like "FYI" and other abbreviations should remain capitalized. I was able to solve part of my problem with the lettercase library, in particular str_ucfirst. My only remaining problem is the exceptions part. Any suggestions are appreciated. Thanks.

4 answers

Building on @akrun's solution (now deleted), you can create a vector of exceptions, which is then collapsed with paste0 into the regular expression using (*SKIP)(*FAIL):

string <- "THIS IS ALL CAPS"
exceptions <- c("FYI", "THIS")
pattern <- sprintf("(?:%s)(*SKIP)(*FAIL)|\\b([A-Z])(\\w+)", paste0(exceptions, collapse = "|"))
gsub(pattern, "\\1\\L\\2", string, perl = TRUE)

Which gives:

[1] "THIS Is All Caps"

THIS is left untouched because it is in the exception list.


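The general idea is an alternation in which only the last branch is allowed to produce a match:

unimportant|not_important|(very important)

In this case it takes the form:

...(*SKIP)(*FAIL)|what_i_want_to_match

The branch that does the actual matching breaks down as:

\b      # a word boundary
([A-Z]) # a single uppercase letter
(\w+)   # one or more word characters, [a-zA-Z0-9_]+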

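As a minimal sketch, the same idea can be wrapped in a function and applied to a whole vector of names. The function name `title_case_except` is hypothetical and not part of the original answer. This version also puts `\b` around the exception group so that only whole words are protected, and it assumes that `exceptions` is non-empty and contains no regex metacharacters.

```r
# Hypothetical helper (not from the original answer): title-case upper-case
# words in `x`, leaving whole words listed in `exceptions` untouched.
# Assumes `exceptions` is non-empty and free of regex metacharacters.
title_case_except <- function(x, exceptions) {
  # \\b on both sides: "THIS" is skipped, but "THISTLE" is still converted
  skip <- sprintf("\\b(?:%s)\\b(*SKIP)(*FAIL)", paste0(exceptions, collapse = "|"))
  pattern <- paste0(skip, "|\\b([A-Z])(\\w+)")
  # \\1 keeps the first letter as is, \\L lower-cases the rest of the word
  gsub(pattern, "\\1\\L\\2", x, perl = TRUE)
}

title_case_except(c("THIS IS ALL CAPS", "FYI THISTLE HILL"), c("FYI", "THIS"))
# [1] "THIS Is All Caps" "FYI Thistle Hill"
```

Because gsub is vectorized, the function can be applied directly to a column read from the .csv file.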


Using gsub with a Perl-style regular expression:

str1 <- "THIS IS ALL CAPS"
gsub("\\b([A-Z])(\\w+)", "\\1\\L\\2", str1, perl = TRUE)
#[1] "This Is All Caps"

Or use stri_trans_totitle from stringi:

library(stringi)
stri_trans_totitle(str1)
#[1] "This Is All Caps"

Using the stringr package to convert to camel case (strictly speaking, title case) without regular expressions:

library(stringr)

string <- "CONVERT THIS TO CAMELCASE, YO"
exceptions <- c("YO", "THIS")

paste(sapply(unlist(str_split(string, " ")), 
             function(word){ ifelse(word %in% exceptions, 
                                    word, 
                                    str_to_title(word))}),
      collapse = " ")

Output:

[1] "Convert THIS To Camelcase, YO"
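For a whole column of place names, the same split-and-check approach can be sketched in base R, without stringr. The function name `title_case_words` is hypothetical. It assumes that words are separated by single spaces and that an exception must match a word exactly, so "YO," with a trailing comma would not count as the exception "YO".

```r
# Hypothetical helper: title-case each space-separated word unless it
# appears verbatim in `exceptions`. Works on a character vector.
title_case_words <- function(x, exceptions) {
  vapply(strsplit(x, " ", fixed = TRUE), function(words) {
    keep <- words %in% exceptions
    # First character upper-case, remaining characters lower-case
    titled <- paste0(toupper(substring(words, 1, 1)), tolower(substring(words, 2)))
    paste(ifelse(keep, words, titled), collapse = " ")
  }, character(1))
}

title_case_words(c("CONVERT THIS TO CAMELCASE, YO", "FYI ONLY"), c("YO", "THIS", "FYI"))
# [1] "Convert THIS To Camelcase, YO" "FYI Only"
```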

A somewhat slow and primitive but very acceptable base-R solution:

string <- c("THIS IS ALL CAPS", "FYI only some words should be not all CAPS")
except <- c("fyi", "all")

# Title-case every word of two or more letters
string2 <- gsub("([A-Za-z])([A-Za-z]+)", "\\U\\1\\L\\2", string, perl = TRUE)
string2
# [1] "This Is All Caps"                           "Fyi Only Some Words Should Be Not All Caps"

# Upper-case each exception again; \\b restricts the match to whole words,
# so "all" does not also capitalize part of a word such as "Small"
string3 <- string2
for (word in except) {
  string3 <- gsub(
    paste0("\\b(", word, ")\\b"),
    "\\U\\1",
    string3,
    perl = TRUE,
    ignore.case = TRUE
  )
}
string3
# [1] "This Is ALL Caps"                           "FYI Only Some Words Should Be Not ALL Caps"
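The loop over the exceptions can also be collapsed into a single gsub call by joining the words into one alternation. This is only a sketch of that variant; the names `titled`, `pattern` and `result` are illustrative, and it assumes the exception words contain no regex metacharacters.

```r
string <- c("THIS IS ALL CAPS", "FYI only some words should be not all CAPS")
except <- c("fyi", "all")

# Step 1: title-case every word of two or more letters
titled <- gsub("([A-Za-z])([A-Za-z]+)", "\\U\\1\\L\\2", string, perl = TRUE)

# Step 2: upper-case all exceptions in one pass; \\b limits matches to whole words
pattern <- sprintf("\\b(%s)\\b", paste(except, collapse = "|"))
result <- gsub(pattern, "\\U\\1", titled, perl = TRUE, ignore.case = TRUE)
result
# [1] "This Is ALL Caps"                           "FYI Only Some Words Should Be Not ALL Caps"
```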

Source: https://habr.com/ru/post/1696320/

