Efficient way to remove specified characters from a string

For example, if str is "Stackoverflow is for every one" and the characters to delete are "aeiou", the function should turn str into "Stckvrflw s fr vry n".

I have one char array, str[], and another char array holding the characters to delete, remove[].

My solution: loop over str[] looking for each character in remove[], and whenever one is found, shift the rest of str[] one place left. I'm sure a better approach is possible.

+3
6 answers

Shifting the entire string left one place each time makes this an O(n^2) algorithm. You can do it in place in linear time:

#include <string.h>

void Remove (char * src, const char * match) {
   char * dest = src;
   for (;;) {
      char ch = *src++;
      // strchr() also "finds" match's own terminator, so test for '\0' first
      if (ch == '\0' || !strchr (match, ch)) *dest++ = ch;  // Copy chars that don't match
      if (!ch) break;                                        // Stop after copying the null
   }
}

I assume both strings are null-terminated. If they are not, you also need to pass their lengths and change the algorithm accordingly; in particular, you cannot use strchr. Just for completeness, here is a version that works with char arrays that are not null-terminated.

// Removes from str[] (of length srclen) all chars that are found
// in match[] (of length matchlen). Modifies str in place, and returns
// the updated (shortened) length of str.
int Remove (char str[], int srclen, const char match[], int matchlen) {
   int dst = 0;
   for (int src = 0; src < srclen; src++) {
      char ch = str[src];
      int found = 0;           // Search if this char is found in match
      for (int i = 0; i < matchlen && !found; i++)
         if (match[i] == ch) found = 1;
      if (!found) str[dst++] = ch;   // Keep chars that don't match
   }
   return dst;
}

And finally, this is about as close to O(n) as we are going to get, I think. I assume 8-bit characters here and build a lookup table, so that it runs in O(n) + O(m), where m is the length of the match string.

#include <stdbool.h>

int Remove (char* str, int srclen, const char* match, int matchlen) {
   bool found[256] = { false };   // One flag per possible 8-bit char value
   for (int i = 0; i < matchlen; i++)
      found[(unsigned char)match[i]] = true;   // Cast avoids negative indices when char is signed

   int dst = 0;
   for (int src = 0; src < srclen; src++) {
      char ch = str[src];
      if (!found[(unsigned char)ch]) str[dst++] = ch;
   }
   return dst;
}
+5

I think this is one of those "classic" puzzles.

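Build a bit set that marks every character in "match", then walk through 'src' once, testing each char against the set. This is O(n).

Rough code: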

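#include <string.h>

// Removes from src (in place) every char that appears in remove.
void Remove (char * src, const char * remove) {
   unsigned char bits[32];   // 256 bits, one per 8-bit char value; 32 bytes of stack is cheap
   char * dest = src;
   memset(bits, 0, sizeof(bits));
   for (; *remove; remove++)
   {
      unsigned char c = (unsigned char)*remove;
      bits[c >> 3] |= (unsigned char)(1u << (c & 7));   // Set bit (c & 7) of byte (c >> 3)
   }

   for (; *src; src++)
   {
      unsigned char c = (unsigned char)*src;
      if (!(bits[c >> 3] & (1u << (c & 7))))   // Bit not set: keep this char
      {
        *dest++ = *src;
      }
   }
   *dest = '\0';   // Terminate the shortened string
}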

This works much like the lookup tables commonly behind ischr(), isdigit(), isspace(), etc.

+2

A compact table-driven version:

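#include <stdio.h>
#include <string.h>

int main( void ){
  unsigned char str[]    = "Stackoverflow is for every one";
  unsigned char remove[] = "aeiou";

  // table[c] == 1: keep c, table[c] == 0: drop c.
  // memset is the portable form of the GCC-only initializer { [ 0 ... 255 ] = 1 }.
  unsigned char table[256];
  memset( table, 1, sizeof table );
  for( unsigned char *r=remove; *r; r++ ){ table[*r]=0; }

  // Copy every char (the final '\0' too); advance dest only when the char is kept.
  unsigned char *source=str, *dest=str;
  while( (*dest = *source++) ) dest += table[*dest];

  printf( "str: '%s'\n", (char *)str );
  return 0;
}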
+2

Loop over str[] and check each character against remove[].

0

Walk through str[] and copy every character that is not in remove[] into a second array (say, new_str[]). Then copy new_str[] back into str[].

-1

Using regular expressions to find and replace is a more compact solution. Use the GNU C library or find another library that supports regular-expression search and replace. Of course, if the set of characters changes every time, you will have to build the regular expression at runtime. If you stick with your current approach, split it into functions.

I also like Taridon's approach. It's less work!

-1

Source: https://habr.com/ru/post/1730744/

