Filter bad words with special characters

I use https://www.npmjs.com/package/bad-words, and I created a regex so that the filter also handles words with special characters.

const Filter = require('bad-words');
const badWordsFilter = new Filter({replaceRegex:  /[A-Za-z0-9öÖÇçŞşĞğİıÜü_]/g});
badWordsFilter.addWords(['badword', 'şğ'])

If the word does not contain a Turkish character, it works. But if I write a Turkish character, for example, ş or ğ, it is not filtered.

Is my regex wrong?

I found this code in the documentation:

var filter = new Filter({ regex: /\*|\.|$/gi });
var filter = new Filter({ replaceRegex:  /[A-Za-z0-9가-힣_]/g }); 
//multilingual support for word filtering
4 answers

You evidently have an encoding problem, because your regular expression works outside your application; see here: https://regex101.com/r/VpItfH/3/ .

See also: https://regex101.com/r/VpItfH/4/


PCRE (https://regex101.com/r/VpItfH/5):

/[A-Za-z0-9\x{f6}\x{d6}\x{c7}\x{e7}\x{15e}\x{15f}\x{11e}\x{11f}\x{130}\x{131}\x{dc}\x{fc}_]/g

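In JavaScript, the \x{...} form with braces is not available in a regex without the u flag, so replace each \x{...} with a four-digit \u escape. For example, \x{15e} becomes \u015e.

The result is an escaped equivalent of /[A-Za-z0-9öÖÇçŞşĞğİıÜü_]/g.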

To find the hex code of a character, run, for example, "Ğ".charCodeAt(0).toString(16); and then put the result into a \x{...} or \u escape.
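The conversion described above can be sketched as follows. The helper name toUnicodeEscape is hypothetical and not part of bad-words; the sketch only assumes that the character class should contain the Turkish letters from the question. Because the pattern is built purely from ASCII escapes, it no longer depends on the encoding of the source file.

```javascript
// Hypothetical helper (not part of bad-words): turn one character into a \uXXXX escape.
function toUnicodeEscape(ch) {
  // charCodeAt returns the UTF-16 code unit; pad its hex value to four digits.
  return '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0');
}

const turkishChars = 'öÖÇçŞşĞğİıÜü';
const escaped = Array.from(turkishChars, toUnicodeEscape).join('');

// Build the character class from escapes only.
const escapedRegex = new RegExp('[A-Za-z0-9' + escaped + '_]', 'g');

console.log(toUnicodeEscape('Ş'));            // \u015e
console.log('şğ'.replace(escapedRegex, '*')); // **
```

The resulting regex could then be passed as replaceRegex in place of the literal-character version.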


Try this:

var filter = new Filter({ replaceRegex: /(\w+)/gi });

That is, use the replaceRegex option.

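Explanation of /(\w+)/gi (from regex101):

  • 1st capturing group (\w+).
    • \w+ matches any word character (equal to [a-zA-Z0-9_]).
    • The + quantifier matches between one and unlimited times, as many times as possible, giving back as needed (greedy).
    • i modifier: insensitive. Case-insensitive match (ignores the case of [a-zA-Z]).
    • g modifier: global. All matches (does not return after the first match).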
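A small sketch of how this pattern behaves in JavaScript, using made-up input strings. Note that without the u flag \w covers only [A-Za-z0-9_], so the second call finds nothing in a string made of Turkish letters.

```javascript
// g returns every match; + makes each match as long as possible.
const asciiMatches = 'Bad word_1'.match(/(\w+)/gi);
console.log(asciiMatches); // [ 'Bad', 'word_1' ]

// \w is ASCII-only here, so neither ş nor ğ is a word character.
const turkishMatches = 'şğ'.match(/(\w+)/gi);
console.log(turkishMatches); // null
```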

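To make the regex Unicode-aware, add the u flag. That is, change /[A-Za-z0-9öÖÇçŞşĞğİıÜü_]/g to /[A-Za-z0-9öÖÇçŞşĞğİıÜü_]/gu (note the trailing u). This flag is not supported in some older browsers, such as Internet Explorer.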
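As a minimal sketch of the u flag in use: with u, JavaScript accepts \u{...} code point escapes, so the PCRE-style list of codes can be carried over almost unchanged. The \p{L} and \p{N} property escapes in the second pattern are an additional assumption (they need a reasonably modern engine) and are not something the original answer mentions.

```javascript
// Same character class as in the question, written with \u{...} escapes (requires the u flag).
const unicodeAware = /[A-Za-z0-9\u{f6}\u{d6}\u{c7}\u{e7}\u{15e}\u{15f}\u{11e}\u{11f}\u{130}\u{131}\u{dc}\u{fc}_]/gu;
const maskedTurkish = 'şğ'.replace(unicodeAware, '*');
console.log(maskedTurkish); // **

// Assumption: a broader class matching any letter or digit in any script.
const anyLetterOrDigit = /[\p{L}\p{N}_]/gu;
const maskedMixed = 'şğ가'.replace(anyLetterOrDigit, '*');
console.log(maskedMixed); // ***
```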


Make sure the page that loads your JavaScript is declared as UTF-8:

<meta http-equiv="content-type" content="text/html;charset=utf-8" />


Source: https://habr.com/ru/post/1672042/

