Parsing and regex for conditional logical string

I need to tokenize a conditional string expression:

Arithmetic operators: +, -, *, /, %

Boolean operators: &&, ||

Conditional operators: ==, >=, >, <, <=, !=

Example expression: (x + 3 > 5 * y) && (z >= 3 || k != x)

What I want is to split this string into tokens: operators and operands.

Because ">" and ">=", as well as "=" and "!=", share characters, I have problems with tokenization.

PS1: I do not want to do complex lexical analysis, just simple parsing, if possible with regular expressions.

PS2: In other words, I'm looking for a regular expression that, given a sample expression without whitespace between the tokens, such as

(x+3>5*y)&&(z>=3 || k!=x) 

and will produce each token, separated by a space, for example:

 ( x + 3 > 5 * y ) && ( z >= 3 || k != x ) 
2 answers

Not a regular expression, but a basic tokenizer that might just work (note that you do not need the string.Join call; you can iterate the IEnumerable<string> directly with foreach):

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;

    static class Program
    {
        static void Main()
        {
            // produce each token separated by a single space
            string recombined = string.Join(" ", Tokenize("(x+3>5*y)&&(z>=3 || k!=x)"));
            // output: ( x + 3 > 5 * y ) && ( z >= 3 || k != x )
        }

        public static IEnumerable<string> Tokenize(string input)
        {
            var buffer = new StringBuilder();
            foreach (char c in input)
            {
                if (char.IsWhiteSpace(c))
                {
                    if (buffer.Length > 0)
                    {
                        yield return Flush(buffer);
                    }
                    continue; // just skip whitespace
                }

                if (IsOperatorChar(c))
                {
                    if (buffer.Length > 0)
                    {
                        // we have back-buffer; could be a>b, but could be >=
                        // need to check if there is a combined operator candidate
                        if (!CanCombine(buffer, c))
                        {
                            yield return Flush(buffer);
                        }
                    }
                    buffer.Append(c);
                    continue;
                }

                // so here, the new character is *not* an operator; if we have
                // a back-buffer that *is* operators, yield that
                if (buffer.Length > 0 && IsOperatorChar(buffer[0]))
                {
                    yield return Flush(buffer);
                }

                // append
                buffer.Append(c);
            }

            // out of chars... anything left?
            if (buffer.Length != 0)
                yield return Flush(buffer);
        }

        static string Flush(StringBuilder buffer)
        {
            string s = buffer.ToString();
            buffer.Clear();
            return s;
        }

        static readonly string[] operators =
        {
            "+", "-", "*", "/", "%", "=", "&&", "||", "==",
            ">=", ">", "<", "<=", "!=", "(", ")"
        };

        static readonly char[] opChars =
            operators.SelectMany(x => x.ToCharArray()).Distinct().ToArray();

        static bool IsOperatorChar(char newChar)
        {
            return Array.IndexOf(opChars, newChar) >= 0;
        }

        static bool CanCombine(StringBuilder buffer, char c)
        {
            foreach (var op in operators)
            {
                if (op.Length <= buffer.Length) continue;

                // check starts with same plus this one
                bool startsWith = true;
                for (int i = 0; i < buffer.Length; i++)
                {
                    if (op[i] != buffer[i])
                    {
                        startsWith = false;
                        break;
                    }
                }
                if (startsWith && op[buffer.Length] == c) return true;
            }
            return false;
        }
    }

If you can predefine all the operators you are going to use, something like this might work for you.

Be sure to put the two-character operators first in the regular expression, so that the engine tries to match '<=' before it tries '<'.
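The effect of alternation order can be seen in a small sketch. The class name AlternationOrderDemo, the helper Split, and the two tiny patterns are made up for illustration and are not part of the answer's code; the sketch only assumes that .NET tries alternatives from left to right.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AlternationOrderDemo
{
    // Hypothetical helper: returns every match of `pattern` in `input`,
    // joined by single spaces.
    public static string Split(string pattern, string input) =>
        string.Join(" ", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));

    public static void Main()
    {
        // One-character operator listed first: "<" wins at that position,
        // and the "=" that follows matches nothing, so it is dropped.
        Console.WriteLine(Split(@"<|<=|[a-z]|\d+", "a<=3")); // a < 3

        // Two-character operator listed first: "<=" is kept whole.
        Console.WriteLine(Split(@"<=|<|[a-z]|\d+", "a<=3")); // a <= 3
    }
}
```

Note that in the first case the "=" is lost without any error, because Regex.Matches simply skips characters that no alternative matches.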

    using System;
    using System.Text.RegularExpressions;

    public class Example
    {
        public static void Main()
        {
            // two-character operators come first, then numbers,
            // then single letters and one-character operators
            string pattern = "!=|<=|>=|\\|\\||\\&\\&|\\d+|[a-z()+\\-*/<>]";
            string sentence = "(x+35>5*y)&&(z>=3 || k!=x)";

            foreach (Match match in Regex.Matches(sentence, pattern))
                Console.WriteLine("Found '{0}' at position {1}", match.Value, match.Index);
        }
    }

Output:

    Found '(' at position 0
    Found 'x' at position 1
    Found '+' at position 2
    Found '35' at position 3
    Found '>' at position 5
    Found '5' at position 6
    Found '*' at position 7
    Found 'y' at position 8
    Found ')' at position 9
    Found '&&' at position 10
    Found '(' at position 12
    Found 'z' at position 13
    Found '>=' at position 14
    Found '3' at position 16
    Found '||' at position 18
    Found 'k' at position 21
    Found '!=' at position 22
    Found 'x' at position 24
    Found ')' at position 25
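To get the space-separated string the question asks for, the matches can be joined instead of printed one by one. The following is only a sketch under assumptions: the class SpaceSeparatedTokens, the method Normalize, and the widened pattern (which adds "==", "=", "%" and multi-character identifiers to cover the operator list in the question) are hypothetical and not taken from the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SpaceSeparatedTokens
{
    // Assumed pattern: two-character operators first, then numbers,
    // identifiers, and finally the one-character operators and parentheses.
    const string Pattern = @"==|!=|<=|>=|\|\||&&|\d+|[A-Za-z_]\w*|[()+\-*/%<>=]";

    // Hypothetical helper: re-emits the tokens separated by single spaces.
    public static string Normalize(string expression) =>
        string.Join(" ", Regex.Matches(expression, Pattern).Cast<Match>().Select(m => m.Value));

    public static void Main()
    {
        Console.WriteLine(Normalize("(x+3>5*y)&&(z>=3 || k!=x)"));
        // prints: ( x + 3 > 5 * y ) && ( z >= 3 || k != x )
    }
}
```

As with any Regex.Matches-based approach, characters that match none of the alternatives are skipped silently, so input validation would have to be added separately if unknown characters should be reported.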

Source: https://habr.com/ru/post/1490442/

