Remove hexadecimal values from a data.table in R

I have a large data.table called Site (300,000 rows, 100 columns). Hexadecimal values appear throughout the table, for example "\x96" or "\xc9". I want to delete all of them. They follow the format "\x" followed by two characters (digits or letters).

The code below removes one such value. I can handle each value separately like this, but I need a general command that gets rid of all the hexadecimal values in the table.

Site <- as.data.table(apply(Site, 2, function(x) gsub("\x8e", "", x)))

I tried using the regex "\x.." but got this error:

Error: '\x' used without hex digits in character string starting ""\x"

How can I remove these hexadecimal values? Any help is much appreciated!

Here is a reproducible example:

dt <- data.table(A = c("Th\xa1is","is","the","first\x12"), B = c("This","\x45is","the","second"))

I would like to remove the "\xa1", "\x12" and "\x45" values, so that the final data.table is:

       A      B
1:  This   This
2:    is     is
3:   the    the
4: first second

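The "\x.." sequences are not stored in your strings as written.

See ?Quotes. When you type "\x" in a string literal, the R parser reads "\x", followed by 1 or 2 hexadecimal digits (0 to 9 and a to f), as a single character with that code.

"\x01" to "\x7f" are ordinary ASCII characters. For example, identical("\x30", "0"), identical("\x39", "9"), identical("\x41", "A"), identical("\x5A", "Z"), and so on, all return TRUE.

There are 128 more values, from "\x80" to "\xff", whose interpretation depends on the encoding, for example "Latin 1".

On top of that there are Unicode and UTF-8.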
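One way to check this is to look at the bytes R actually stores. The following is a minimal sketch using only base R (charToRaw, identical, nchar) on two strings taken from the example in the question:

```r
# "\x45" is parsed as the single character with hex code 45, which is "E".
x <- "\x45is"
charToRaw(x)               # the three bytes 45 69 73
identical(x, "Eis")        # TRUE: no "hexadecimal text" is stored in the string

# "\xa1" is also parsed as a single byte, but it lies outside the ASCII range.
y <- "Th\xa1is"
charToRaw(y)               # the five bytes 54 68 a1 69 73
nchar(y, type = "bytes")   # 5
```

So "\x45is" is simply "Eis", and only the byte a1 in "Th\xa1is" is something a cleaning step could meaningfully target.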

So by "removing hexadecimal values" you presumably mean removing the characters "\x80" to "\xff". If so, you could try:

dt[, lapply(.SD, gsub, pattern = "[\x80-\xff]", replacement = "")]

Or, to keep only ASCII characters: dt[, lapply(.SD, gsub, pattern = "[^\x01-\x7f]", replacement = "")].
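An alternative that keeps raw bytes out of the regex is iconv(), which can drop bytes that have no ASCII equivalent. This is only a sketch under assumptions: strip_non_ascii is a hypothetical helper name, the stray bytes are assumed to be Latin 1, and removing control characters such as "\x12" is an extra step inferred from the expected output in the question.

```r
# Hypothetical helper: drop non-ASCII bytes, then ASCII control characters.
strip_non_ascii <- function(x) {
  # Treat the input as Latin 1 (an assumption) and convert it to ASCII;
  # sub = "" replaces every non-convertible byte with nothing.
  x <- iconv(x, from = "latin1", to = "ASCII", sub = "")
  # "\x12" is a valid ASCII control character, so iconv() keeps it.
  # Remove control characters separately (this also removes tabs and newlines).
  gsub("[[:cntrl:]]", "", x)
}

cleaned <- strip_non_ascii(c("Th\xa1is", "first\x12", "\x45is"))
cleaned
```

On a data.table of character columns this could be applied as dt[, lapply(.SD, strip_non_ascii)]. Note that "\x45is" comes back as "Eis", because "\x45" is just the letter E.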

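Note: in R (as in Python), a backslash inside an ordinary string literal starts an escape sequence, so a literal backslash has to be escaped. In Python you can write "\\" or use a raw string such as r"\x"; in R the usual way is "\\" (R 4.0.0 and later also accept raw strings such as r"(\x)"). If you test your pattern on regex101 against Th\xa1is, the input there is the literal characters backslash, x, a, 1, which is not what the R string "Th\xa1is" contains.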
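If the data really does contain the literal text \xa1 (a backslash, an x and two hex digits, as you would type it into regex101), the backslash has to be escaped twice: once for the R string and once for the regex. A minimal sketch with a made-up input value:

```r
# Eight characters: T, h, backslash, x, a, 1, i, s.
literal <- "Th\\xa1is"
nchar(literal)   # 8

# "\\\\" in R source is two backslashes, which the regex engine reads as one
# literal backslash; [0-9A-Fa-f]{2} then matches the two hex digits.
stripped <- gsub("\\\\x[0-9A-Fa-f]{2}", "", literal)
stripped         # "This"
```

This pattern does nothing to a string such as "Th\xa1is", which contains no backslash at all.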


Source: https://habr.com/ru/post/1690148/

