Regular expression to select semicolons that are not enclosed in double quotes

I have a line like

a;b;"aaa;;;bccc";deef 

I want to split the string on the separator ; but only where the ; is not inside double quotes. After the split, the result should be

  a b "aaa;;;bccc" deef 

I tried using look-behind, but I cannot find the correct regular expression to separate.

4 answers

Regular expressions are probably not the right tool for this. If possible, use a CSV library, specifying ; as the delimiter and " as the quote character; that should give you exactly the fields you are looking for.

That said, here is one approach that works by making sure there is an even number of quotes between the ; being considered for the split and the end of the string.

 ;(?=(([^"]*"){2})*[^"]*$) 

Example: http://www.rubular.com/r/RyLQyR8F19

This will break if quotes can be escaped inside a quoted string, for example a;"foo\"bar";c .
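As a minimal sketch of how the lookahead pattern above behaves, here it is with Python's re module. The names SEMICOLON_OUTSIDE_QUOTES and split_outside_quotes are made up for this illustration, and the groups are written as non-capturing (?:...) because re.split would otherwise return the captured groups as extra items in the result.

```python
import re

# Hypothetical names, chosen for this illustration only.
# Same pattern as above, but with non-capturing groups so that
# re.split does not add the captured text to the result list.
SEMICOLON_OUTSIDE_QUOTES = re.compile(r';(?=(?:(?:[^"]*"){2})*[^"]*$)')


def split_outside_quotes(line):
    """Split on ';' only where an even number of '"' follows up to the end."""
    return SEMICOLON_OUTSIDE_QUOTES.split(line)


print(split_outside_quotes('a;b;"aaa;;;bccc";deef'))
# ['a', 'b', '"aaa;;;bccc"', 'deef']

# The escaped quote makes the quote count odd, so the first ';' is missed.
print(split_outside_quotes(r'a;"foo\"bar";c'))
# ['a;"foo\\"bar"', 'c']
```

The second call shows the failure mode described above: the escaped quote is counted like any other quote, so the split after a is lost.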

Here is a cleaner example using the Python csv module :

    import csv
    import io

    # Treat the line as one CSV row with ';' as delimiter and '"' as quote character.
    reader = csv.reader(io.StringIO('a;b;"aaa;;;bccc";deef'), delimiter=';', quotechar='"')
    for row in reader:
        print('\n'.join(row))

This is ugly, but if you don't have \" inside your quoted strings (meaning you don't have strings that look like "foo bar \"badoo\" goo"), you can split on " first. With zero-based indexing, the odd-numbered elements of the resulting array are the quoted strings, and the even-numbered elements can then be split into their component parts on the ; token.

If you do have \" in your strings, first convert them to some temporary token, and convert that token back once the operation is done.

Here's a fiddle:

http://jsfiddle.net/VW9an/

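    var str = 'abc;def;ghi"some other dogs say \\"bow; wow; wow\\". yes they do!"and another; and a fifth';

    // Hide escaped quotes behind a temporary token so they don't affect the split.
    var strCp = str.replace(/\\"/g, '--##--');
    var parts = strCp.split('"');
    var allPieces = [];

    parts.forEach(function (part, i) {
        if (i % 2 === 0) {
            // Even index: text outside quotes, so split it on the separator.
            part.split(';').forEach(function (piece) {
                allPieces.push(piece);
            });
        } else {
            // Odd index: text inside quotes, kept whole with its quotes restored.
            allPieces.push('"' + part + '"');
        }
    });

    // Put the escaped quotes back.
    allPieces = allPieces.map(function (piece) {
        return piece.replace(/--##--/g, '\\"');
    });

    console.log(allPieces);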

A regular expression will only get messier and will break on even minor changes. You are better off using a CSV parser from any scripting language. Perl has a built-in module (so you don't need to download it from CPAN if there are any restrictions) called Text::ParseWords that lets you specify the delimiter, so you are not limited to a particular separator. Here is an example snippet:

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    use Text::ParseWords;

    my $string = 'a;b;"aaa;;;bccc";deef';
    my @ary = parse_line(q{;}, 0, $string);
    print "$_\n" for @ary;

Output

    a
    b
    aaa;;;bccc
    deef

Match all instead of splitting

Responding long after the battle because no one used the method that seems easiest to me.

Once you understand that Match All and Split are two sides of the same coin, you can use this simple regular expression:

 "[^"]*"|[^";]+ 

See the matches in the regex demo.

  • The left side of the alternation | matches complete "quoted strings"
  • The right side matches any run of characters that are neither ; nor "
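A short sketch of the match-all idea with Python's re.findall, using the pattern above unchanged. The names FIELD and match_fields are made up for this illustration. One assumption worth checking against your data: because the right-hand side needs at least one character, empty fields (as in a;;b) produce no match and are silently dropped.

```python
import re

# Hypothetical names, chosen for this illustration only.
# Left alternative: a complete quoted string. Right alternative: a run of
# characters containing neither ';' nor '"'.
FIELD = re.compile(r'"[^"]*"|[^";]+')


def match_fields(line):
    """Return every field matched in the line, from left to right."""
    return FIELD.findall(line)


print(match_fields('a;b;"aaa;;;bccc";deef'))
# ['a', 'b', '"aaa;;;bccc"', 'deef']

# Empty fields are not matched, so they disappear from the result.
print(match_fields('a;;b'))
# ['a', 'b']
```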

Source: https://habr.com/ru/post/1488812/

