Replacing multiple occurrences of a character with a single occurrence in a string

I have data like "11223311" and I want every character that occurs more than once to appear only once, i.e. the string above should become "123". I am working in SAP HANA.

However, with the query below I get "1231" from "11223311".

SELECT REPLACE_REGEXPR('(.)\1+' IN '11223331' WITH '\1' OCCURRENCE ALL) FROM DUMMY; 
2 answers

Your regular expression only collapses runs of consecutive identical characters: \1+ matches repeats of the character captured by (.) only when they follow it immediately. The final "1" is not adjacent to the leading "11", so it survives.
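The behaviour can be reproduced outside HANA with Python's `re` module, assuming HANA's regex engine treats `(.)\1+` the same way (both support backreferences). `collapse_runs` is a hypothetical name used only for illustration.

```python
import re


def collapse_runs(s: str) -> str:
    r"""Replace each run of identical adjacent characters with one copy.

    Mirrors REPLACE_REGEXPR('(.)\1+' IN s WITH '\1' OCCURRENCE ALL).
    """
    # (.) captures one character; \1+ matches one or more immediate repeats of it.
    return re.sub(r"(.)\1+", r"\1", s)


print(collapse_runs("11223331"))  # 1231
```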

You can use a look-ahead to remove every character that also occurs somewhere later in the string. Note that this keeps the last occurrence, not the first:

 SELECT REPLACE_REGEXPR('(.)(?=.*\1)' IN '11223331' WITH '' OCCURRENCE ALL) FROM DUMMY 

This returns: 231
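A minimal Python sketch of the same look-ahead pattern, assuming HANA evaluates it like Python's `re` (the function name is hypothetical). It shows why the last copy of each character is the one that survives:

```python
import re


def keep_last_occurrence(s: str) -> str:
    # (.) captures a character; (?=.*\1) succeeds only if the same character
    # appears again later, without consuming anything. Such characters are
    # deleted, so only the last occurrence of each character remains.
    return re.sub(r"(.)(?=.*\1)", "", s)


print(keep_last_occurrence("11223331"))  # 231
```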

If you want to keep the first occurrence instead, I don't see a way to do it with a single regex (I could be wrong, though). Look-behind doesn't work the same way, because it would have to be variable-length, which HANA and most other regex implementations do not support. \K is often suggested as an alternative, but a pattern like (.).*\K\1 will not work with OCCURRENCE ALL: everything matched before \K is still consumed by the match, so those characters cannot take part in the next replacement. Running the same regex repeatedly in a loop could work, but then you might as well write a non-regex loop, for example in a HANA user-defined function.
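Both loop ideas from the paragraph above can be sketched in Python (not HANA SQLScript; the function names are hypothetical). The first applies one regex repeatedly until the string stops changing. Instead of \K, which Python's `re` does not support, it captures the gap between a character and its next copy and writes the gap back, so only the later copy is dropped. The second is the plain first-seen loop that a user-defined function would implement.

```python
import re


def keep_first_regex_loop(s: str) -> str:
    # (.)(.*?)\1 matches a character, the shortest gap, and a later copy of it.
    # Replacing with \1\2 keeps the character and the gap but drops the copy.
    pattern = re.compile(r"(.)(.*?)\1", re.DOTALL)
    while True:
        reduced = pattern.sub(r"\1\2", s)
        if reduced == s:  # every change shortens the string, so this terminates
            return s
        s = reduced


def keep_first_plain_loop(s: str) -> str:
    # Non-regex alternative: append each character the first time it is seen.
    seen = set()
    out = []
    for ch in s:
        if ch not in seen:
            seen.add(ch)
            out.append(ch)
    return "".join(out)


print(keep_first_regex_loop("11223331"))  # 123
print(keep_first_plain_loop("11223331"))  # 123
```

The plain loop makes a single pass, while the regex version may need several passes (three for "11223331") before it reaches a fixed point.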


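Try building a negated character class from the data and deleting every digit that does not occur in it. This only works when the data consists of digits, and the result lists the digits in ascending order rather than in order of first appearance: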

 SELECT REPLACE_REGEXPR(concat(concat('[^','11223331'),']') IN '0123456789' WITH '' OCCURRENCE ALL) FROM DUMMY; 
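The same trick in Python, as a sketch assuming HANA handles the character class like Python's `re` (`distinct_digits` is a hypothetical name). The second call shows the ordering caveat:

```python
import re


def distinct_digits(data: str) -> str:
    # Mirrors REPLACE_REGEXPR(concat(concat('[^', data), ']') IN '0123456789'
    # WITH '' OCCURRENCE ALL): [^...] matches every digit that does NOT occur
    # in data, and those are deleted, leaving the digits present in data.
    if not data:
        return ""  # "[^]" would be an invalid pattern in Python
    pattern = "[^" + re.escape(data) + "]"
    return re.sub(pattern, "", "0123456789")


print(distinct_digits("11223331"))  # 123
print(distinct_digits("332211"))    # 123, not 321: order comes from "0123456789"
```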

Source: https://habr.com/ru/post/1264149/
