Escape function for regular expressions or LIKE patterns

Question

To save you from reading the whole problem, my main question is:
Is there a function in PostgreSQL to escape regular expression characters in a string?

I searched the documentation but could not find such a function.

Here is the complete problem:

In a PostgreSQL database, I have a column with unique names. I also have a process that periodically inserts names into this column. To prevent duplicates, if it needs to insert a name that already exists, it appends a space and a counter in parentheses.

e.g. name, name (1), name (2), name (3), etc.

Currently, I use the following code to find the next number in the series (written in plpgsql):

 var_name_id := 1;

 SELECT CAST(substring(a.name from E'\\((\\d+)\\)$') AS int)
 INTO   var_last_name_id
 FROM   my_table.names a
 WHERE  a.name LIKE var_name || ' (%)'
 ORDER  BY CAST(substring(a.name from E'\\((\\d+)\\)$') AS int) DESC
 LIMIT  1;

 IF var_last_name_id IS NOT NULL THEN
    var_name_id = var_last_name_id + 1;
 END IF;

 var_new_name := var_name || ' (' || var_name_id || ')';

(var_name contains the name I'm trying to insert.)

So far this works, but the problem is the WHERE clause:

 WHERE a.name LIKE var_name || ' (%)'

This check does not verify that the part matched by % is a number, and it does not account for multiple parentheses, as in something like "Name ((1))". In either case, a cast exception could be thrown.

The WHERE clause should really be something like:

 WHERE a.r1_name ~* var_name || E' \\(\\d+\\)'

But var_name may contain regular expression characters, which leads to the question above: is there a function in PostgreSQL that escapes regex characters in a string, so I could do something like:

 WHERE a.r1_name ~* regex_escape(var_name) || E' \\(\\d+\\)'

Any suggestions are welcome, including a possible rework of my solution with duplicate names.

+7

regex pattern-matching escaping plpgsql postgresql

Benny Feb 28 '11 at 15:36

3 answers

To answer the question above:

Regular expression escape function

Let's start with a complete list of characters with a special meaning in regular expression patterns:

 !$()*+.:<=>?[\]^{|}-

Wrapped in a bracket expression, most of them lose their special meaning - with a few exceptions:

The hyphen (-) must come first or last, or it denotes a range of characters.
] and \ must be escaped with \ (in the replacement, too).

After adding capturing parentheses for the back reference below, we get this regexp pattern:

 ([!$()*+.:<=>?[\\\]^{|}-])

Using it, this function escapes all special characters with a backslash (\), thereby removing their special meaning:

 CREATE OR REPLACE FUNCTION f_regexp_escape(text)
   RETURNS text AS
 $func$
 SELECT regexp_replace($1, '([!$()*+.:<=>?[\\\]^{|}-])', '\\\1', 'g')
 $func$ LANGUAGE sql IMMUTABLE;

Demo

 SELECT f_regexp_escape('test(1) > Foo*');

Returns:

 test\(1\) \> Foo\*

And while:

 SELECT 'test(1) > Foo*' ~ 'test(1) > Foo*';

returns FALSE, which may come as a surprise to naive users,

 SELECT 'test(1) > Foo*' ~ f_regexp_escape('test(1) > Foo*');

returns TRUE, as it should now.
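
Applied to the original question, the escaped name can be embedded in a larger pattern. The following is only a sketch: it assumes f_regexp_escape() from above is installed and standard_conforming_strings is on, it uses a made-up list of names in place of the asker's table, and it adds ^ and $ anchors that were not part of the question. The concatenation is parenthesized because ~ and || both fall into PostgreSQL's generic operator precedence class, so without the parentheses the expression would, as far as I can tell, be parsed as (name ~ ...) || ... instead.

```sql
-- Hypothetical sample data standing in for my_table.names.
WITH names(name) AS (
   VALUES ('Foo*'), ('Foo* (1)'), ('Foo* (2)'), ('Foo* ((3))'), ('Fooo (7)')
)
SELECT max(substring(name FROM '\((\d+)\)$')::int) AS last_id
FROM   names
-- Escape the name, then require " (digits)" right after it, up to the end.
WHERE  name ~ ('^' || f_regexp_escape('Foo*') || ' \(\d+\)$');
```

With this sample data, only 'Foo* (1)' and 'Foo* (2)' should qualify, so last_id comes out as 2. Without the escaping, the pattern Foo* would mean "Fo followed by any number of o", which matches 'Fooo (7)' but none of the intended rows.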

`LIKE` escape function

For completeness, the counterpart for LIKE patterns, where only three characters are special:

\%_

The manual:

The default escape character is the backslash, but you can select another using the ESCAPE clause.

This function assumes the default:

 CREATE OR REPLACE FUNCTION f_like_escape(text)
   RETURNS text AS
 $func$
 SELECT replace(replace(replace($1
          , '\', '\\')  -- must come 1st
          , '%', '\%')
          , '_', '\_');
 $func$ LANGUAGE sql IMMUTABLE;

We could use the more elegant regexp_replace() here, too, but for just a few characters, a cascade of replace() calls is faster.

Demo

 SELECT f_like_escape('20% \ 50% low_prices');

Returns:

 20\% \\ 50\% low\_prices
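
To tie this back to the LIKE filter in the question, here is a small sketch of how the escaped name could be combined with a wildcard suffix. It assumes f_like_escape() from above is installed, the default escape character is in effect, and standard_conforming_strings is on; the sample names are invented for illustration.

```sql
-- Hypothetical sample data; only one row is a numbered copy of '50%_off'.
WITH names(name) AS (
   VALUES ('50%_off'), ('50%_off (1)'), ('50 percent off (2)'), ('50%xoff (3)')
)
SELECT name
FROM   names
-- The name is matched literally; only the trailing % acts as a wildcard.
WHERE  name LIKE f_like_escape('50%_off') || ' (%)';
```

With the escaped name, only '50%_off (1)' should be returned. Passing the raw name would let its own % and _ act as wildcards, so the other two numbered rows would match as well.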

+8

Erwin Brandstetter Aug 17 '17 at 17:33

Can you change the schema? I think the problem will go away if you can use a composite primary key:

 name text not null,
 number integer not null,
 primary key (name, number)

Then displaying Fred #0 as "Fred", Fred #1 as "Fred (1)", etc. is handled in the display layer.

If you like, you can create a view to take on this responsibility. Here is the data:

 => select * from foo;
   name  | number
 --------+--------
  Fred   |      0
  Fred   |      1
  Barney |      0
  Betty  |      0
  Betty  |      1
  Betty  |      2
 (6 rows)

The view:

 create or replace view foo_view as
 select *,
   case
     when number = 0 then name
     else name || ' (' || number || ')'
   end as name_and_number
 from foo;

And the result:

 => select * from foo_view;
   name  | number | name_and_number
 --------+--------+-----------------
  Fred   |      0 | Fred
  Fred   |      1 | Fred (1)
  Barney |      0 | Barney
  Betty  |      0 | Betty
  Betty  |      1 | Betty (1)
  Betty  |      2 | Betty (2)
 (6 rows)
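
With this schema, finding the next number for a name no longer needs any pattern matching. A minimal sketch, assuming the table foo with the sample rows shown above (the alias next_number is just an illustrative name):

```sql
-- Next free number for a given name; 0 if the name does not exist yet.
SELECT coalesce(max(number) + 1, 0) AS next_number
FROM   foo
WHERE  name = 'Betty';
```

For the sample data this should give 3 for 'Betty' and 0 for a name that is not in the table yet. Like any max(n)+1 scheme, it is only safe as long as concurrent inserts of the same name are ruled out.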

0

Wayne Conrad Feb 28 '11 at 20:06

user533832 · Accepted Answer · 2011-02-28T16:44:52+0000

How about trying something like this, substituting var_name for my hard-coded 'John Bernard':

 create table my_table(name text primary key);

 insert into my_table(name)
 values ('John Bernard'),
        ('John Bernard (1)'),
        ('John Bernard (2)'),
        ('John Bernard (3)');

 select max(regexp_replace(substring(name, 13), ' |\(|\)', '', 'g')::integer+1)
 from my_table
 where substring(name, 1, 12)='John Bernard'
   and substring(name, 13)~'^ \([1-9][0-9]*\)$';

  max
 -----
    4
 (1 row)

One caveat: I am assuming single-user access to the database while this process is running (and so does your approach). If that is not the case, the max(n)+1 approach will not be a good one.
