JavaScript regular expression to leave only words (international version)

Question


I am trying to strip a string so that only the words remain. For anything using the Latin alphabet, I can easily do this with

str = str.replace(/\W/g, '').replace(/[0-9]/g, '');

(I guess I probably don't need both replace calls, but I'm very new to regular expressions and not sure what I'm doing)

However, it also strips non-Latin characters such as Chinese or Arabic.

How do I write a function for this?

 strOne = "test!(£)98* string";
 strTwo = "你好，325!# 世界";
 cleanUp(strOne); // Output: "test string"
 cleanUp(strTwo); // Output: "你好 世界"

(In case anyone wonders, the Chinese is "hello world" via an online translator)

On the library side, I don't know if it matters, but I use Dojo and would like to avoid jQuery if possible.

+4

javascript regex

Emma Sep 06 '13 at 10:18


2 answers

\W is equivalent to [^a-zA-Z_0-9]

Instead, you need to list all the characters you want to remove.

str = str.replace(/[ enter the characters you want to remove here ]*/g, '');
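As a rough illustration of the point above, the sketch below first shows \W removing non-Latin letters, then a deny-list that removes only the listed characters. The contents of denyList are just an assumed example built from the question's sample strings, not a complete list.

```javascript
// \W is the complement of the ASCII word set [A-Za-z0-9_],
// so non-Latin letters count as "non-word" characters and get removed.
const asciiOnly = "你好 world_42".replace(/\W/g, '');
console.log(asciiOnly); // "world_42"

// A deny-list (these characters are only an illustrative choice)
// removes just what is listed and leaves other scripts alone.
const denyList = /[0-9!#()*£，]/g;
console.log("你好，325!# 世界".replace(denyList, ''));   // "你好 世界"
console.log("test!(£)98* string".replace(denyList, '')); // "test string"
```

The obvious drawback is that every unwanted character has to be anticipated and listed by hand.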

0

dark_ruby Sep 06 '13 at 10:24


collapsar · Accepted Answer · 2013-09-06T10:33:58+0000

You will need a regular expression pattern using Unicode character properties, namely \P{Letter}.

Unfortunately, the built-in JavaScript regex engine does not support these constructs (see the MDN docs). However, there is (at least) one third-party library, XRegExp, that includes a plugin adding support.

Code example:

 var regex, str;
 str = "whatever";
 // The 'g' flag makes XRegExp.replace remove every non-letter, not just the first match.
 regex = XRegExp('\\P{Letter}', 'g');
 str = XRegExp.replace(str, regex, '');
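Engines that implement ES2018 Unicode property escapes accept \p{L} natively when the regex carries the u flag, so on such engines the question's cleanUp function can be sketched without a library. This is a minimal sketch under that assumption, not part of the original answer; keeping whitespace and collapsing runs of it is a choice made here to match the expected output in the question.

```javascript
// Sketch: keep letters from any script plus whitespace, drop everything else.
// Assumes an engine with Unicode property escapes (the "u" flag and \p{...}).
function cleanUp(str) {
  return str
    .replace(/[^\p{L}\s]/gu, '') // remove anything that is not a letter or whitespace
    .replace(/\s+/g, ' ')        // collapse runs of whitespace into one space
    .trim();                     // drop leading and trailing whitespace
}

const strOne = "test!(£)98* string";
const strTwo = "你好，325!# 世界";
console.log(cleanUp(strOne)); // "test string"
console.log(cleanUp(strTwo)); // "你好 世界"
```

Scripts that rely on combining marks (Arabic vowel marks, for example) would probably also need \p{M} inside the character class, since those marks are not in the Letter category.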
