Regular expression to retrieve SQL query

Is there a regular expression that extracts SQL queries from a string? I am NOT interested in validating any SQL syntax, only in selecting the SQL commands. This would allow flexible analysis of a given SQL file/string.

The following is an example SQL file/string:

 SELECT * FROM test_table WHERE test_row = 'Testing ; semicolon';
 SELECT * FROM another_test_table;
 INSERT INTO table_name VALUES (value1,'value which contains semicolon ;;;;',value3,...);

A pseudo-code example is ^(UPDATE|SELECT|INSERT INTO)(.*)(;)$ and later on I want to extend it to all the (possible) commands.

  • Look for an initial match with: (UPDATE | SELECT | INSERT INTO)
  • Zero or more of any character (including spaces and newlines)
  • Stop at the ; that terminates the SQL query.
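
Translated literally, the three steps give a pattern like the one below. This is a sketch, not a solution: the ^/$ anchors from the pseudo-code are dropped, DOTALL lets . cross newlines, and .*? is lazy so each match stops at the first ; it meets. The class and method names are hypothetical. Running it on the sample shows why the question is hard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaiveSqlRegex {
    // Step 1: a leading keyword; step 2: any characters, including newlines
    // (DOTALL), matched lazily; step 3: stop at the first ';' found.
    static final Pattern NAIVE =
            Pattern.compile("(UPDATE|SELECT|INSERT INTO).*?;", Pattern.DOTALL);

    static List<String> extract(String content) {
        List<String> result = new ArrayList<>();
        Matcher m = NAIVE.matcher(content);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String sql = "SELECT * FROM test_table WHERE test_row = 'Testing ; semicolon'; "
                + "SELECT * FROM another_test_table;";
        for (String s : extract(sql)) {
            System.out.println(s);
        }
    }
}
```

The first statement it prints is SELECT * FROM test_table WHERE test_row = 'Testing ; because the lazy .*? stops at the semicolon inside the quoted string. A greedy .* would go the other way and run on to the last ; it can reach.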

If this is possible with a regular expression, the following Java code could then retrieve all the SQL commands:

 final String regex = "LOOKING_FOR_THIS_ONE";
 final Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
 final Matcher matcher = p.matcher(content);
 while (matcher.find()) {
     // matcher.group() now contains the full SQL command
 }
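
For reference, the loop above can be wrapped in a complete, runnable class that collects every match into a list. The names SqlExtractor and extractAll are hypothetical, and the regex used in main is just the question's own pseudo-code pattern standing in for LOOKING_FOR_THIS_ONE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SqlExtractor {
    // Collects every match of `regex` in `content`, using the same
    // find()/group() loop as the question. `flags` are Pattern flags
    // such as Pattern.MULTILINE.
    static List<String> extractAll(String regex, int flags, String content) {
        final Pattern p = Pattern.compile(regex, flags);
        final Matcher matcher = p.matcher(content);
        final List<String> statements = new ArrayList<>();
        while (matcher.find()) {
            // matcher.group() holds the full text of the current match
            statements.add(matcher.group());
        }
        return statements;
    }

    public static void main(String[] args) {
        // Stand-in regex (the question's pseudo-code): one statement per line.
        String content = "SELECT * FROM another_test_table;\nUPDATE t SET a = 1;\n";
        List<String> found = extractAll(
                "^(UPDATE|SELECT|INSERT INTO)(.*)(;)$", Pattern.MULTILINE, content);
        found.forEach(System.out::println);
    }
}
```

Keeping the regex and flags as parameters makes it easy to try each candidate pattern from the answers against the same input.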

Thanks in advance!

5 answers

I'll start by saying that this is not a good approach, and I urge you to find another way, preferably by marking the statement boundaries where the statements are produced, so that you do not end up in this situation.

That said, let's assume each statement you are after starts with one of the following: DELETE, SELECT, WITH, UPDATE or INSERT INTO, and ends with ;.

We can use this to capture every sequence that looks like such a statement:

 final String regex = "^(INSERT INTO|UPDATE|SELECT|WITH|DELETE)(?:[^;']|(?:'[^']+'))+;\\s*$";
 final Pattern p = Pattern.compile(regex, Pattern.MULTILINE | Pattern.DOTALL);

Group 1 now holds the leading keyword, in case you want to filter the matched statements by UPDATE or SELECT.
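
A minimal runnable sketch of this answer's regex, keeping only statements whose group 1 is SELECT. It assumes the sample statements sit one per line, since the ^ and $ anchors require each statement's ; to end its line. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordFilter {
    // Group 1 captures the leading keyword. The repeated non-capturing group
    // accepts either one character that is neither ';' nor a quote, or a whole
    // '...' literal, so semicolons inside quotes do not end the statement.
    static final Pattern SQL = Pattern.compile(
            "^(INSERT INTO|UPDATE|SELECT|WITH|DELETE)(?:[^;']|(?:'[^']+'))+;\\s*$",
            Pattern.MULTILINE | Pattern.DOTALL);

    // Returns the statements whose leading keyword (group 1) equals `keyword`.
    static List<String> statementsStartingWith(String keyword, String content) {
        List<String> result = new ArrayList<>();
        Matcher m = SQL.matcher(content);
        while (m.find()) {
            if (m.group(1).equals(keyword)) {
                result.add(m.group());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String content = String.join("\n",
                "SELECT * FROM test_table WHERE test_row = 'Testing ; semicolon';",
                "SELECT * FROM another_test_table;",
                "INSERT INTO table_name VALUES (value1,'value which contains semicolon ;;;;',value3);");
        statementsStartingWith("SELECT", content).forEach(System.out::println);
    }
}
```

One detail worth knowing: the literal branch is '[^']+', so an empty string literal '' makes the whole statement fail to match; '[^']*' would accept it.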

See the regex in action, along with its caveat, here:

https://regex101.com/r/dt9XTK/2

You can match it "correctly" as long as each statement's semicolon is the last non-whitespace character on its line:

 final String regex = "^(SELECT|UPDATE|INSERT)[\\s\\S]+?;\\s*?$";
 final Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
 final Matcher matcher = p.matcher(content);
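
A runnable sketch of this answer's pattern, showing that a statement spread over several lines is captured whole while a ; in the middle of a line (here inside a quoted string) is skipped. The class name and the multi-line sample input are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineEndSemicolon {
    // [\s\S] matches any character including newlines; the lazy +? stops at
    // the first ';' followed only by optional whitespace up to a line end
    // ($ matches at every line end in MULTILINE mode).
    static final Pattern SQL = Pattern.compile(
            "^(SELECT|UPDATE|INSERT)[\\s\\S]+?;\\s*?$", Pattern.MULTILINE);

    static List<String> extract(String content) {
        List<String> result = new ArrayList<>();
        Matcher matcher = SQL.matcher(content);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // A statement over three lines, followed by a one-line statement.
        String content = "SELECT *\n"
                + "FROM test_table\n"
                + "WHERE test_row = 'Testing ; semicolon';\n"
                + "SELECT * FROM another_test_table;\n";
        extract(content).forEach(s -> System.out.println("[" + s + "]"));
    }
}
```

The weak spot is the same rule in reverse: a ; that happens to end a line inside a multi-line string literal would still split the statement there.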

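(?m)^(UPDATE|SELECT|INSERT INTO).*;$ should work. The (?m) flag turns on multiline mode, so ^ and $ match at the start and end of every line, which lets the pattern iterate over the input and find all your SQL. Note that (?m) does not make . match newlines (that is what (?s) does), so each statement has to fit on a single line.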

On your example, it matches each command up to and including its terminating ;. You can see the example used for testing here.
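
A small runnable check of this pattern's behavior: it finds one match per line, misses a statement that is split across lines, and, because .* is greedy, treats two statements sharing a line as a single match. The class name and inputs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleLineOnly {
    // (?m): ^ and $ match at line boundaries. Without (?s), '.' stops at a
    // newline, so every match lies within one line.
    static final Pattern SQL = Pattern.compile("(?m)^(UPDATE|SELECT|INSERT INTO).*;$");

    static List<String> extract(String content) {
        List<String> result = new ArrayList<>();
        Matcher m = SQL.matcher(content);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String content = "SELECT * FROM another_test_table;\n"
                + "SELECT *\n"            // split statement: not matched
                + "FROM test_table;\n"
                + "SELECT 1; SELECT 2;";  // greedy .*: one match for both
        extract(content).forEach(System.out::println);
    }
}
```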

If you are dealing with a language, create a lexer that tokenizes your string. Use JFlex, a lexical analyzer generator: it generates a Java class that breaks the string into tokens according to the rules given in a specification file. Take the appropriate grammar rules from this file.

Parsing is a separate step from tokenization (lexical analysis). If lexical analysis alone is not enough, you may want to run a parser generator on the resulting tokens.
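
This is not JFlex output but a hand-written scanner in the same spirit: it walks the input character by character, tracks whether it is inside a single-quoted literal (treating '' as an escaped quote, as standard SQL does), and splits only on ; outside literals. The class and method names are hypothetical, and comments, double-quoted identifiers and vendor-specific quoting are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;

class StatementSplitter {
    // Splits SQL text at ';' characters outside single-quoted literals.
    // Assumption: no comments, double-quoted identifiers or dialect quoting.
    static List<String> split(String sql) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inString = false;
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            current.append(c);
            if (c == '\'') {
                // '' inside a literal is an escaped quote: consume both chars
                if (inString && i + 1 < sql.length() && sql.charAt(i + 1) == '\'') {
                    current.append('\'');
                    i++;
                } else {
                    inString = !inString;
                }
            } else if (c == ';' && !inString) {
                statements.add(current.toString().trim());
                current.setLength(0);
            }
        }
        String rest = current.toString().trim();
        if (!rest.isEmpty()) {
            statements.add(rest); // trailing statement without ';'
        }
        return statements;
    }

    public static void main(String[] args) {
        String sql = "SELECT * FROM test_table WHERE test_row = 'Testing ; semicolon'; "
                + "SELECT * FROM another_test_table; "
                + "INSERT INTO table_name VALUES (value1,'it''s ;;;;',value3);";
        split(sql).forEach(System.out::println);
    }
}
```

Since the question explicitly does not need syntax checking, a scanner at this level may already be enough; a generated lexer becomes worthwhile once comments and dialect-specific quoting matter.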

SQL is complex enough that you need context to find all statements, which means you cannot do this with a regular expression.

For instance:

 SELECT Model FROM Product WHERE ManufacturerID IN (SELECT ManufacturerID FROM Manufacturer WHERE Manufacturer = 'Dell') 

(Example taken from http://www.sql-tutorial.com/sql-nested-queries-sql-tutorial/.) Subqueries can appear several times and be nested inside each other, start with different keywords, and so on. Even if you could write a regular expression for the subset you are interested in, it would be unreadable.
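
To make the "context" point concrete: a regex search for SELECT finds two hits in the single statement above, and telling the outer query from the subquery requires tracking parenthesis depth, a counter a plain regular expression does not keep. The sketch below uses a regex only to locate keywords and carries the depth in ordinary code. Class and method names are hypothetical, and parentheses inside string literals or comments are not accounted for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestingDepth {
    // For each SELECT keyword, reports how many open parentheses enclose it.
    // Depth 0 marks a top-level statement; deeper hits are subqueries.
    static List<Integer> selectDepths(String sql) {
        List<Integer> depths = new ArrayList<>();
        Matcher m = Pattern.compile("\\bSELECT\\b").matcher(sql);
        int depth = 0;
        int pos = 0;
        while (m.find()) {
            // Update the depth over the text up to this keyword
            for (; pos < m.start(); pos++) {
                char c = sql.charAt(pos);
                if (c == '(') depth++;
                else if (c == ')') depth--;
            }
            depths.add(depth);
        }
        return depths;
    }

    public static void main(String[] args) {
        String sql = "SELECT Model FROM Product WHERE ManufacturerID IN "
                + "(SELECT ManufacturerID FROM Manufacturer WHERE Manufacturer = 'Dell')";
        System.out.println(selectDepths(sql)); // [0, 1]
    }
}
```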

ANTLR has a SQL 2003 grammar (I have not tried it).

Source: https://habr.com/ru/post/1482958/

