Toxialization of algebraic expression in string format

I am trying to take a string that represents a complete algebraic pullout, such as x = 15 * 6/3, which is a string, and label it in its individual components. Thus, the first will be x, then =, then 15, then *, 6, / and finally 3.

The problem I ran into is actually parsing the string and looking at individual characters. I can't figure out how to do this without a lot of if statements. Of course, there should be a better way, defining each individual case and testing for it.

+4
source share
4 answers

For each type of token, you need to figure out how to determine:

  • when you start reading a certain token
  • if you continue to read the same token, or if you run another

Take your example: x=15*6/3 . Suppose you cannot rely on the fact that there are gaps between each token. In this case, this is trivial: your new token begins when you reach the gap.

You can break character types into letters, numbers, and symbols. Let us call the types of tokens Variable, Operator and Number.

The letter indicates that the Variable token is running. It continues until you read the nebukta.

The symbol indicates the beginning of the operator token. I see only individual characters, but you can have groups of characters corresponding to different Operator tokens.

The number indicates the beginning of the token. (Suppose integers are now.) The number token continues until you read a number.

Basically, this is how a simple symbolic parser works. Now, if you add negative numbers (where the β€œ-" symbol can have several meanings) or parentheses or function names (for example, sin(x) ), then everything becomes more complicated, but it matches the same set of rules, now just with a lot of choice.

+3
source

This is from my early expression evaluator, which takes an infix expression like yours and turns it into postfix for evaluation. There are methods that help the parser, but I think they are pretty self-documenting. Mine uses character tables to check markers. It also allows you to use user-defined symbols and nested assignments and other things that you may not need / need. But it shows how I dealt with your problem without using subtleties like regex, which would simplify this task. In addition, everything shown is my own implementation - the stack and the queue - that's it. So if something looks abnormal (unlike Java imps), it is because it is.

This section of code is important in order not to answer your immediate question, but to show the necessary work, to determine the type of marker you are dealing with. In my case, I had three different types of operators and two different types of operands. Based on the well-known rules or rules that I chose to enforce (when necessary), it was easy to find out when something was a number (starts with a number), a variable / user symbol / mathematical function (starts with a letter), or a mathematical operator (is: /, *, -, +). Note that only the first char is required to get the correct extraction rules. From your example, if all your cases are simple, you will only need to process two types: operator or operand. However, the same logic will apply.

 protected Queue<Token> inToPostParse(String exp) { // local vars inputExp = exp; offset = 0; strLength = exp.length(); String tempHolder = ""; char c; // the program runs in a loop so make sure you're dealing // with an empty queue q1.reset(); for (int i = offset; tempHolder != null && i < strLength; ++i) { c = exp.charAt(i); // Spaces are useless so skip them if (c == ' ') { continue; } // If c is a letter if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) { // Here we know it must be a user symbol possibly undefined // at this point or an function like SIN, ABS, etc // We extract, based on obvious rules, the op tempHolder = extractPhrase(i); // Used to be append sequence if (ut.isTrigOp(tempHolder) || ut.isAdditionalOp(tempHolder)) { s1.push(new Operator(tempHolder, "Function")); } else { // If not some math function it is a user defined symbol q1.insert(new Token(tempHolder, "User")); } i += tempHolder.length() - 1; tempHolder = ""; // if c begins with a number } else if (c >= '0' && c <= '9') { try { // Here we know that it must be a number // so we extract until we reach a non number tempHolder = extractNumber(i); q1.insert(new Token(tempHolder, "Number")); i += tempHolder.length() - 1; tempHolder = ""; } catch (NumberFormatException nfe) { return null; } // if c is in the math symbol table } else if (ut.isMathOp(String.valueOf(c))) { String C = String.valueOf(c); try { // This is where the magic happens // Here we determine the "intersection" of the // current C and the top of the stack // Based on the intersection we take action // ie, in math do you want to * or + first? // Depending on the state you may have to move // some tokens to the queue before pushing onto the stack takeParseAction(C, ut.findIntersection (C, s1.showTop().getSymbol())); } catch (NullPointerException npe) { s1(C); } // it must be an invalid expression } else { return null; } } u2(); s1.reset(); return q1; } 

Basically I have a stack (s1) and a queue (q1). All variables or numbers are queued. Any trig, math, parens, etc. operators Go to the stack. If the current token needs to be pushed onto the stack, you should check the status (at the top) to determine what parsing action to take (i.e., what to do based on mathematical priority). Sorry if this seems like useless information. I assume that if you parse a mathematical expression because at some point you plan to evaluate it. IMHO, the postfix is ​​the easiest, so I, regardless of the input format, change it to a post and evaluate it using one method. If your O is different, do what you like.

Edit: Implementations
The statement formula and numerical methods that interest you most are as follows:

 protected String extractPhrase(int it) { String phrase = new String(); char c; for ( ; it < inputExp.length(); ++it) { c = inputExp.charAt(it); if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')) { phrase += String.valueOf(c); } else { break; } } return phrase; } protected String extractNumber(int it) throws NumberFormatException { String number = new String(); int decimals = 0; char c; for ( ; it < strLength; ++it) { c = inputExp.charAt(it); if (c >= '0' && c <= '9') { number += String.valueOf(c); } else if (c == '.') { ++decimals; if (decimals < 2) { number += "."; } else { throw new NumberFormatException(); } } else { break; } } return number; } 

Remember - by the time of input of these methods I was already able to deduce what type. This avoids the endless chain of while-if-else.

+2
source
  • create a regular expression for each possible element: integer, variable, operator, parentheses.
  • combine them using the regex operator | into one large regular expression with capture groups to determine which one matches.
  • in a loop, coincide with the header of the remaining line and interrupt the agreed part as a token. The type of token depends on which sub-expression matches, as described in 2.

or

use lexer library e.g. in antlr or javacc

+2
source

Are components always separated by a spatial symbol, as in your question? If so, use algebricExpression.split(" ") to get String[] components.

If such restrictions cannot be accepted, a possible solution would be to iterate through the input, and switch Character.getType () of the current index, something like this:

 ArrayList<String> getExpressionComponents(String exp) { ArrayList<String> components = new ArrayList<String>(); String current = ""; int currentSequenceType = Character.UNASSIGNED; for (int i = 0 ; i < exp.length() ; i++) { if (currentSequenceType != Character.getType(exp.charAt(i))) { if (current.length() > 0) components.add(current); current = ""; currentSequenceType = Character.getType(exp.charAt(i)); } switch (Character.getType(exp.charAt(i))) { case Character.DECIMAL_DIGIT_NUMBER: case Character.MATH_SYMBOL: case Character.START_PUNCTUATION: case Character.END_PUNCTUATION: case Character.LOWERCASE_LETTER: case Character.UPPERCASE_LETTER: // add other required types current = current.concat(new String(new char[] {exp.charAt(i)})); currentSequenceType = Character.getType(exp.charAt(i)); break; default: current = ""; currentSequenceType = Character.UNASSIGNED; break; } } return components; } 

You can easily change cases to fit other requirements, such as dividing non-digital characters into separate components, etc.

+1
source

Source: https://habr.com/ru/post/1486717/


All Articles