We are trying to find a regular expression that splits sentences into words. The obvious answer is \w, except that it does not split on _, which we need. We then tried [a-zA-Z0-9] (we want to allow digits inside words), but the problem is that it splits on accented characters, which are quite common in many languages ...
So, ideally, which regular expression should be used to split the following sentence into the following words:
"Je ne déguste pas d'asperges, car je n'aime pas ça"
into
["Je", "ne", "déguste", "pas", "d", "asperges", "car", "je", "n", "aime", "pas", "ça"]
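One possible approach, sketched here in Python 3 as an assumption (the question does not name a language): the character class [^\W_] means "any word character except the underscore". For str patterns in Python 3, \w is Unicode-aware by default, so accented letters count as word characters. The function name split_words and the NFC normalization step are illustrative choices, not something from the question; normalization is there because accents stored as separate combining marks would otherwise not match \w.

```python
import re
import unicodedata

# [^\W_] = "not a non-word character and not an underscore",
# i.e. Unicode letters and digits only.
WORD_RE = re.compile(r"[^\W_]+")


def split_words(sentence: str) -> list[str]:
    """Return the runs of letters/digits found in `sentence` (hypothetical helper)."""
    # Compose base letter + combining accent into a single code point,
    # so that a decomposed "é" is still treated as one letter.
    normalized = unicodedata.normalize("NFC", sentence)
    return WORD_RE.findall(normalized)


if __name__ == "__main__":
    print(split_words("Je ne déguste pas d'asperges, car je n'aime pas ça"))
    # ['Je', 'ne', 'déguste', 'pas', 'd', 'asperges', 'car', 'je', 'n', 'aime', 'pas', 'ça']
```

Other regex engines may need an explicit Unicode flag or a property class such as \p{L} to get the same behavior, so this should be checked against the engine actually in use.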