JavaScript regular expression to leave only words (international version)

I am trying to strip a string down so that only the words remain. For anything using the Latin alphabet, I can easily do it with

str = str.replace(/\W/g, '').replace(/[0-9]/g, ''); 

(I guess I probably don't need both replace calls, but I'm very new to regular expressions and not sure what I'm doing.)

However, this also strips non-Latin characters such as Chinese or Arabic.

How do I write a function for this?

 strOne = "test!(£)98* string";
 strTwo = "你好,325!# 世界";
 cleanUp(strOne); // Output: "test string"
 cleanUp(strTwo); // Output: "你好 世界"

(In case anyone wonders, the Chinese is "hello world" run through an online translator.)

On the library side, I don't know if it matters, but I use Dojo and would like to avoid jQuery if possible.

2 answers

You will need a regular expression pattern that uses Unicode character properties, namely \P{Letter}.

Unfortunately, older built-in JS regex engines do not support these constructs (see the MDN docs); support for \p{...} under the u flag only arrived with ES2018. However, there is at least one third-party library, XRegExp, whose Unicode plugin adds this support.

Code example:

 var regex, str;
 str = "whatever";
 // \P{Letter} matches any non-letter; the 'g' flag replaces every match, not only the first
 regex = XRegExp('\\P{Letter}', 'g');
 str = XRegExp.replace(str, regex, '');
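On engines that implement ES2018 Unicode property escapes (recent browsers, Node.js 10 and later), the same idea can probably be written without a library. The following is a minimal sketch, not a definitive implementation: the name cleanUp comes from the question, while keeping whitespace and collapsing runs of it are assumptions made to match the expected output shown there.

```javascript
// Sketch only: needs an engine that supports \p{...} with the `u` flag (ES2018).
function cleanUp(str) {
  return str
    .replace(/[^\p{L}\s]/gu, '') // drop everything that is neither a letter nor whitespace
    .replace(/\s+/g, ' ')        // assumption: collapse runs of whitespace into one space
    .trim();                     // assumption: no leading or trailing whitespace wanted
}

console.log(cleanUp("test!(£)98* string")); // "test string"
console.log(cleanUp("你好,325!# 世界"));     // "你好 世界"
```

Note that \p{L} covers letters only. Combining marks (category \p{M}), such as Arabic vowel signs, would be stripped as well, so the class may need to become [^\p{L}\p{M}\s] for scripts that rely on them.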

\W is equivalent to [^a-zA-Z_0-9], so it also matches every non-Latin letter.

Instead, you need to list all the characters you want to delete.

str = str.replace(/[ enter the characters you want to remove here ]+/g, '');
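As a rough sketch of this approach, the class below lists ASCII digits and ASCII punctuation explicitly, plus the two non-ASCII symbols that occur in the question's samples. The helper name stripListed and the exact character list are illustrative assumptions; anything not listed, including letters from other scripts and spaces, is left untouched.

```javascript
// Hypothetical helper: removes only the characters listed in the class.
// 0-9 covers ASCII digits; the ranges !-/ , :-@ , [-\x60 and {-~ cover
// ASCII punctuation (underscore included). "£" and the fullwidth comma
// are added by hand because they appear in the sample strings.
function stripListed(str) {
  return str.replace(/[0-9!-\/:-@\[-\x60{-~£,]+/g, '');
}

console.log(stripListed("test!(£)98* string")); // "test string"
console.log(stripListed("你好,325!# 世界"));     // "你好 世界"
```

The drawback is that every unwanted symbol from every script has to be enumerated by hand, so unlisted punctuation (for example other fullwidth characters) would pass through unchanged.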


Source: https://habr.com/ru/post/1500894/

