Escape function for regular expressions or LIKE patterns

To spare you reading the whole problem, my main question is:
Is there a function in PostgreSQL to escape regular expression characters in a string?

I researched the documentation, but could not find such a function.

Here is the complete problem:

In a PostgreSQL database, I have a column with unique names. I also have a process that periodically inserts names into this column. To prevent duplicates, if it needs to insert a name that already exists, it appends a space and a counter in parentheses to the end.

For example: name, name (1), name (2), name (3), etc.

Currently, I use the following code to find the next number to add to the series (written in plpgsql):

    var_name_id := 1;

    SELECT CAST(substring(a.name from E'\\((\\d+)\\)$') AS int)
    INTO   var_last_name_id
    FROM   my_table.names a
    WHERE  a.name LIKE var_name || ' (%)'
    ORDER  BY CAST(substring(a.name from E'\\((\\d+)\\)$') AS int) DESC
    LIMIT  1;

    IF var_last_name_id IS NOT NULL THEN
        var_name_id = var_last_name_id + 1;
    END IF;

    var_new_name := var_name || ' (' || var_name_id || ')';

(var_name contains the name I'm trying to insert.)

So far this works, but the problem is the WHERE clause:

 WHERE a.name LIKE var_name || ' (%)' 

This check does not verify that the part matched by % is a number, and it does not account for multiple parentheses, as in something like "Name ((1))". If either case existed, a cast exception would be thrown.

The WHERE clause should really be something like:

 WHERE a.r1_name ~* var_name || E' \\(\\d+\\)' 

But var_name may contain regular expression characters, which leads to the question above: is there a function in PostgreSQL that escapes regex characters in a string, so I could do something like:

 WHERE a.r1_name ~* regex_escape(var_name) || E' \\(\\d+\\)' 

Any suggestions are welcome, including a possible rework of my solution with duplicate names.

3 answers

How about trying something like this, substituting var_name for my hard-coded 'John Bernard':

    create table my_table(name text primary key);

    insert into my_table(name) values
      ('John Bernard'), ('John Bernard (1)'), ('John Bernard (2)'), ('John Bernard (3)');

    select max(regexp_replace(substring(name, 13), ' |\(|\)', '', 'g')::integer+1)
    from my_table
    where substring(name, 1, 12)='John Bernard'
      and substring(name, 13)~'^ \([1-9][0-9]*\)$';

     max
    -----
       4
    (1 row)

One caveat: I am assuming single-user access to the database while this process is running (and so does your approach). If that is not the case, the max(n)+1 approach is not safe.
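If concurrent writers are possible, one way to keep max(n)+1 safe is to serialize writers with a table lock for the duration of the transaction. The following is only a sketch under that assumption, reusing my_table and the hard-coded 'John Bernard' from above. The coalesce() fallback to 1 for the first duplicate is an addition, not part of the query above.

```sql
BEGIN;

-- SHARE ROW EXCLUSIVE conflicts with itself and with the ROW EXCLUSIVE lock
-- that INSERT/UPDATE/DELETE take, so other writers wait until COMMIT.
-- Plain SELECTs are not blocked.
LOCK TABLE my_table IN SHARE ROW EXCLUSIVE MODE;

-- Compute the next suffix and insert the new name in one statement.
INSERT INTO my_table(name)
SELECT 'John Bernard ('
       || coalesce(max(regexp_replace(substring(name, 13), ' |\(|\)', '', 'g')::integer) + 1, 1)
       || ')'
FROM   my_table
WHERE  substring(name, 1, 12) = 'John Bernard'
AND    substring(name, 13) ~ '^ \([1-9][0-9]*\)$';

COMMIT;
```

With the four sample rows above, this would insert 'John Bernard (4)'. The lock trades throughput for simplicity; relying on the primary key and retrying on a unique violation is another option.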


To answer the question above:

Regular expression escape function

Let's start with a complete list of characters with a special meaning in regular expression patterns:

 !$()*+.:<=>?[\]^{|}- 

Wrapped in a bracket expression, most of them lose their special meaning, with a few exceptions:

  • - must be first or last, or it denotes a range of characters.
  • ] and \ must be escaped with \ (in the replacement, too).

After adding capturing parentheses for the back reference below, we get this regex pattern:

 ([!$()*+.:<=>?[\\\]^{|}-]) 

Using it, this function escapes all special characters with a backslash (\), thereby removing their special meaning:

    CREATE OR REPLACE FUNCTION f_regexp_escape(text)
      RETURNS text AS
    $func$
    SELECT regexp_replace($1, '([!$()*+.:<=>?[\\\]^{|}-])', '\\\1', 'g')
    $func$ LANGUAGE sql IMMUTABLE;

Demo

 SELECT f_regexp_escape('test(1) > Foo*'); 

Returns:

 test\(1\) \> Foo\* 

And while:

 SELECT 'test(1) > Foo*' ~ 'test(1) > Foo*'; 

returns FALSE, which may come as a surprise to naive users,

 SELECT 'test(1) > Foo*' ~ f_regexp_escape('test(1) > Foo*'); 

returns TRUE, as it should now.
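Applied to the WHERE clause from the question, the function could be used along these lines. This is only a sketch: it assumes f_regexp_escape() from above has been created, that standard_conforming_strings is on (the default in current versions), and that 'John (Jr.)' is a hypothetical stand-in for var_name. Note the parentheses around the concatenated pattern: ~ and || have the same operator precedence, so without them the match would be evaluated before the concatenation. The ^ and $ anchors are an addition so that only the exact base name followed by a numeric suffix matches.

```sql
-- 'John (Jr.)' is a hypothetical value standing in for var_name;
-- my_table.names is the table from the question.
SELECT max(substring(a.name FROM '\((\d+)\)$')::int) AS last_name_id
FROM   my_table.names a
WHERE  a.name ~ ('^' || f_regexp_escape('John (Jr.)') || ' \(\d+\)$');
```

Because the pattern only admits digits inside a single pair of parentheses, the cast to int is never applied to a row like "Name ((1))" or "Name (abc)".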

LIKE escape function

For completeness, the counterpart for LIKE patterns, where only three characters are special:

 \%_ 

The manual:

The default escape character is the backslash, but you can select another using the ESCAPE clause.

This function assumes the default:

    CREATE OR REPLACE FUNCTION f_like_escape(text)
      RETURNS text AS
    $func$
    SELECT replace(replace(replace($1
             , '\', '\\')  -- must come 1st
             , '%', '\%')
             , '_', '\_');
    $func$ LANGUAGE sql IMMUTABLE;

We could use the more elegant regexp_replace() here, too, but for just a few characters a cascade of replace() calls is faster.

Demo

 SELECT f_like_escape('20% \ 50% low_prices'); 

Returns:

 20\% \\ 50\% low\_prices 
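The LIKE-based lookup from the question could then be written roughly as follows. This is a sketch, assuming f_like_escape() from above has been created and using the hypothetical name '50%_off' in place of var_name. LIKE binds more loosely than ||, so no extra parentheses are needed here.

```sql
-- '50%_off' is a hypothetical value standing in for var_name.
-- The escaped pattern is: 50\%\_off (%)
SELECT a.name
FROM   my_table.names a
WHERE  a.name LIKE f_like_escape('50%_off') || ' (%)';
```

Unlike the regular expression variant, this still does not check that the text inside the parentheses is a number.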

Can you change the schema? I think the problem would go away if you could use a composite primary key:

    name text not null,
    number integer not null,
    primary key (name, number)

Then Fred #0 is rendered as "Fred", Fred #1 as "Fred (1)", and so on, in the display layer.

If you like, you can create a view for this duty. Here is the data:

    => select * from foo;
      name  | number
    --------+--------
     Fred   |      0
     Fred   |      1
     Barney |      0
     Betty  |      0
     Betty  |      1
     Betty  |      2
    (6 rows)

The view:

    create or replace view foo_view as
    select *,
           case when number = 0
                then name
                else name || ' (' || number || ')'
           end as name_and_number
    from foo;

And the result:

    => select * from foo_view;
      name  | number | name_and_number
    --------+--------+-----------------
     Fred   |      0 | Fred
     Fred   |      1 | Fred (1)
     Barney |      0 | Barney
     Betty  |      0 | Betty
     Betty  |      1 | Betty (1)
     Betty  |      2 | Betty (2)
    (6 rows)
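With this schema, finding the next number needs no pattern matching at all. A minimal sketch, assuming the table foo shown above and using 'Fred' as a stand-in for the name being inserted:

```sql
-- Insert the next free number for 'Fred'; a name that does not exist yet gets 0.
INSERT INTO foo(name, number)
SELECT 'Fred', coalesce(max(number) + 1, 0)
FROM   foo
WHERE  name = 'Fred';
```

With the sample data this would add the row ('Fred', 2). If two sessions run this at the same time, the primary key rejects the second insert with a unique violation, so the caller would need to retry.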

Source: https://habr.com/ru/post/1341665/