C++: removing all asterisks from a string where the asterisks are not multiplication symbols

Basically, I can have a line that looks like this: "Hey, this is line *, this line is awesome 97 * 3 = 27 * this line is cool."

However, this line can be huge. I am trying to remove all asterisks from the string, unless an asterisk is a multiplication sign. Efficiency is somewhat important here, and I am having trouble coming up with a good algorithm for removing all the non-multiplication asterisks.

To determine if an asterisk is for multiplication, I can obviously just check if it is sandwiched between two numbers.

So I thought I could do something like (pseudocode):

    wasNumber = false
    Loop through string
        if asterisk
            if wasNumber
                if the next word is a number
                    do nothing
                else
                    remove asterisk
            else
                remove asterisk
        else if number
            set wasNumber = true
        else
            set wasNumber = false

However, the above is ugly and inefficient on a huge line. Can you come up with a better way to do this in C++?

Also, how can I check whether a word is a number? Numbers are allowed to be decimals. I know there is a function to check whether a character is a digit...

+6
4 answers

Complete, working code:

    #include <iostream>
    #include <string>
    using namespace std;

    string RemoveAllAstericks(string);
    void RemoveSingleAsterick(string&, int);
    bool IsDigit(char);

    int main()
    {
        string myString = "hey this is a string * this string is awesome 97 * 3 = 27 * this string is cool";
        string newString = RemoveAllAstericks(myString);

        cout << "Original: " << myString << "\n";
        cout << "Modified: " << newString << endl;
        return 0;
    }

    string RemoveAllAstericks(string s)
    {
        int len = s.size();
        for (int i = 0; i < len; i++)
        {
            if (s[i] != '*')
                continue;

            // First non-space character before the asterisk ('\0' if there is none).
            int pos = i - 1;
            while (pos >= 0 && s[pos] == ' ')
                pos--;
            char cBefore = (pos >= 0) ? s[pos] : '\0';

            // First non-space character after the asterisk ('\0' if there is none).
            pos = i + 1;
            while (pos < len && s[pos] == ' ')
                pos++;
            char cAfter = (pos < len) ? s[pos] : '\0';

            // Keep the asterisk only when it sits between two digits.
            if (!(IsDigit(cBefore) && IsDigit(cAfter)))
                RemoveSingleAsterick(s, i);
        }
        return s;
    }

    void RemoveSingleAsterick(string& s, int i)
    {
        s[i] = ' '; // Replaces * with a space, but you can do whatever you want
    }

    bool IsDigit(char c)
    {
        return (c >= '0' && c <= '9');
    }

Top Level Overview:

The code scans the string until it encounters a * . It then looks at the first non-space character before AND after the * . If both characters are digits, the code decides that this is a multiplication and keeps the asterisk. Otherwise, the asterisk is removed (here, replaced with a space).

See the edit history of this answer if you would like more information.

Important notes:

  • Make sure boundary checks are in place for the string (i.e. never try to access an index that is less than 0 or not less than len), for example when the asterisk is the first or last character.
  • If you are worried about parentheses, change the condition that checks for spaces to also check for parentheses.
  • Checking whether each individual character is a number is a bad idea. At the very least, it requires two logical checks per character (see my IsDigit() function). (My code checks for '*' first, which is one logical operation.) Some of the other suggestions posted here were poorly thought out. Do not use regular expressions to check whether a character is numeric.

Since you mentioned efficiency in your question, and I don't have enough reputation to comment on the other answers:

A switch statement that checks for '0', '1', '2', ... means that every character that is NOT a digit may have to go through 10 comparisons. With all due respect, since a char maps to an int, please just check the bounds: (c <= '9' && c >= '0')
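The bounds check described above might look like the following minimal sketch. The helper names isDigitRange and isDigitStd are made up for this illustration; the second one uses std::isdigit from <cctype>, which does the same job as long as the argument is first cast to unsigned char.

```cpp
#include <cctype>

// Hypothetical helper: two comparisons, relying on '0'..'9' being contiguous.
bool isDigitRange(char c)
{
    return c >= '0' && c <= '9';
}

// Hypothetical helper: the same test through the standard library.
// The cast avoids undefined behavior for negative char values.
bool isDigitStd(char c)
{
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}
```

Either version costs a small, constant amount of work per character, so the choice is mostly a matter of taste.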

+4

You can start by implementing the slow version; it may turn out to be much faster than you think. But let's say it is too slow. Then this is an optimization problem. Where does the inefficiency lie?

  • "if number" is easy: you can use a regular expression or anything else that stops as soon as it finds something that is not part of a number.
  • "if the next word is a number" is just as easy to implement efficiently.
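As a rough sketch of such a check, the function below accepts a word made of digits with at most one decimal point and stops at the first character that cannot belong to a number. The name isNumberWord is hypothetical, and the rules it applies (no signs, no exponents, ".5" and "2." accepted) are assumptions to adjust as needed.

```cpp
#include <string>

// Hypothetical helper: true if word consists of digits with at most one '.',
// e.g. "97", "3.14", ".5" or "2."; signs and exponents are not handled.
bool isNumberWord(const std::string& word)
{
    bool seenDigit = false;
    bool seenDot = false;
    for (char c : word)
    {
        if (c >= '0' && c <= '9')
            seenDigit = true;
        else if (c == '.' && !seenDot)
            seenDot = true;
        else
            return false; // first character that cannot be part of a number
    }
    return seenDigit; // rejects "" and "."
}
```

A word such as "3day" is rejected here, because the whole word is examined rather than only its first character.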

So it is the “remove asterisk” part that is your problem. The key point here is that you do not need to duplicate the string: you can modify it in place, since you only delete elements.

Try to visualize this before you implement it.

Keep two integers or iterators: the first says where you are currently reading in the string, and the second says where you are currently writing. Since you only delete material, the read position is never behind the write position.

If you decide to keep the current character, copy it from the read position to the write position and advance both integers / iterators by one. If you do not want to keep it, just advance the read position! At the end, you only need to truncate the string by the number of asterisks you have deleted. The complexity is just O(n), without using any extra buffer.
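The read/write technique above can be sketched as follows. The function name removeNonMultiplicationAsterisks and the helper isDigitChar are hypothetical, and the rule for what counts as multiplication (a digit as the nearest non-space character on each side) is an assumption borrowed from the other answers, not the only possible one.

```cpp
#include <cstddef>
#include <string>

static bool isDigitChar(char c) { return c >= '0' && c <= '9'; }

// Hypothetical function: removes every '*' that is not between two digits
// (spaces ignored), compacting the string in place.
void removeNonMultiplicationAsterisks(std::string& s)
{
    std::size_t write = 0;      // next position to write to
    char lastNonSpace = '\0';   // last kept non-space character

    for (std::size_t read = 0; read < s.size(); ++read)
    {
        char c = s[read];
        if (c == '*')
        {
            // Find the first non-space character after the asterisk.
            std::size_t next = read + 1;
            while (next < s.size() && s[next] == ' ')
                ++next;
            bool digitAfter = next < s.size() && isDigitChar(s[next]);
            if (!(isDigitChar(lastNonSpace) && digitAfter))
                continue;       // drop it: only the read position advances
        }
        if (c != ' ')
            lastNonSpace = c;
        s[write++] = c;         // keep it: copy and advance both positions
    }
    s.resize(write);            // cut off the leftover tail
}
```

The look-ahead only ever reads characters at or beyond the read position, which have not been overwritten yet, so modifying the string in place is safe. Each run of spaces is scanned by at most one look-ahead, so the whole pass stays linear.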

Also note that your algorithm will be simpler (but equivalent) if it is written as follows:

    wasNumber = false
    Loop through string
        if asterisk
            if wasNumber and next word is a number
                do nothing  // using my algorithm, "do nothing" actually copies what you intend to keep
            else
                remove asterisk
        else if number
            set wasNumber = true
        else
            set wasNumber = false

+3

I found your little problem interesting, so I wrote (and tested) a small, simple function that does just that on a std::string. Here you go:

    #include <iostream>
    #include <string>
    using namespace std;

    string& ClearAsterisk(string& iString)
    {
        bool bLastCharNumeric = false;
        string::iterator it = iString.begin();
        while (it != iString.end())
        {
            bool bErase = false;
            switch (*it)
            {
            case ' ':
                break; // ignore whitespace characters
            case '*':
            {
                // Keep the asterisk only if it is preceded by a numeric
                // character and the following non-space character is numeric too.
                string::iterator it2 = it + 1;
                while (it2 != iString.end() && *it2 == ' ')
                    ++it2;
                bool bNextCharNumeric =
                    it2 != iString.end() && *it2 <= '9' && *it2 >= '0';
                bErase = !(bLastCharNumeric && bNextCharNumeric);
                break;
            }
            default:
                bLastCharNumeric = (*it <= '9' && *it >= '0');
            }

            // erase() invalidates 'it', so continue from the iterator it returns.
            if (bErase)
                it = iString.erase(it);
            else
                ++it;
        }
        return iString;
    }

    int main()
    {
        string testString = "hey this is a string * this string is awesome 97 * 3 = 27 * this string is cool";
        cout << ClearAsterisk(testString) << endl;
        return 0;
    }

It works fine with your sample line, but it will fail on text like "this is a happy 5 * 3day menu", because it only checks the first non-space character after the "*". But honestly, I can’t imagine many cases where you would have such a construct in a sentence.

HTH,
JP.

+3

A regular expression will not necessarily be more efficient, but it lets you rely on someone else's code to do the parsing and string manipulation.
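For illustration, here is one way the regular-expression route could look with std::regex. This is a sketch rather than a tuned solution: the function name removeAsterisksWithRegex is hypothetical, and note that \s also matches tabs and newlines, not just spaces. Since std::regex has no lookbehind, the pattern captures the "digit, spaces, asterisk" prefix of a multiplication and puts it back.

```cpp
#include <regex>
#include <string>

// Hypothetical function: a sketch, not tuned for speed.
std::string removeAsterisksWithRegex(const std::string& input)
{
    // Alternative 1 (captured): digit, optional whitespace, '*', with a
    // lookahead requiring optional whitespace and then a digit.
    // Alternative 2: any other '*'.
    static const std::regex pattern(R"((\d\s*\*(?=\s*\d))|\*)");

    // "$1" re-emits the captured multiplication; for a lone '*' the group
    // did not take part in the match, so it expands to nothing.
    return std::regex_replace(input, pattern, "$1");
}
```

Unlike the in-place approaches, this builds a new string, so it trades memory and likely some speed for brevity.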

Personally, if I were concerned about efficiency, I would implement your pseudocode version while avoiding unnecessary memory allocations. I might even mmap the input file. I doubt very much that you will get much faster than that.

0

Source: https://habr.com/ru/post/893796/

