C ++ substring compatibility

Question

C ++ substring compatibility

I have two lines, for example, “hello” “eo”, and I want to find duplicate characters between the two lines, namely: “e” and “o” in this example.

My algorithm will go this way

void find_duplicate(char* str_1, char* str_2, int len1, int len2) { char c ; if(len1 < len2) { int* idx_1 = new int[len1]; // record elements in little string // that are matched in big string for(int k = 0 ; k < len1 ; k++) idx_1[k] = 0; int* idx_2 = new int[len2]; // record if element in str_2 has been // matched already or not for(int k = 0 ; k < len2 ; k++) idx_2[k] = 0; for(int i = 0 ; i < len2 ; i++) { c = str_1[i]; for(int j = 0 ; j < len1 ; j++) { if(str_2[j] == c) { if(idx_2[j] == 0) // this element in str_2 has not been matched yet { idx_1[i] = j + 1; // mark ith element in idx as matched in string 2 at pos j idx_2[j] = 1; } } } } // now idx_1 and idx_2 contain matches info, let remove matches. char* str_1_new = new char[len1]; char* str_2_new = new char[len2]; int kn = 0; for(int k = 0 ; k < len1 ; k++) { if(idx_1[k] > 0) { str_1_new[kn] = str_1[k]; kn++; } } kn = 0; for(int k = 0 ; k < len2 ; k++) { if(idx_2[k] > 0) { str_2_new[kn] = str_2[k]; kn++; } } } else { // same here, switching roles (do it yourself) } }

I feel that my decision is inconvenient: - the symmetry of both cases in the first case, if / else and code duplication - time complexity: 2 * len1 * len2 operations to find duplicates, then len1 + len2 operations to delete - space complexity: two len1 and two len2 char *.

What to do if len1 and len2 not specified (with and without the use of the STL vector)?

can you provide you with an implementation of this algorithm?

thanks

+4

c ++ string substring string-matching algorithm

octoback Oct 6 '12 at 9:25

source share

2 answers

 std::vector<char> duplicates; for (auto c1: std::string(str_1)) for (auto c2: std::string(str_2)) if (c1 == c2) duplicates.push_back(c1);

or if you do not have a C ++ 11 compiler.

 std::vector<char> duplicates; std::string s1(str_1); std::string s2(str_2); for (std::size_t i = 0; i < s1.size(); i++) for (std::size_t j = 0; j < s2.size(); j++) if (s1[i] == s2[j]) duplicates.push_back(s1[i]);

- based on the answer Zarota

 std::vector<int> count(256,0); for (auto c : std::string(str_1)) count[c] += 1; for (auto c : std::string(str_2)) if (count[c] > 0) duplicates.push_back(c);

0

log0 Oct 6 '12 at 9:37

source share

Zaroth · Accepted Answer · 2012-10-06T10:10:35+0000

First of all, it is not a substring matching problem - it is a problem of finding common characters between two strings.

Your solution works in O (n * m) , where n = len1 and m = len2 in your code. You can easily solve the same problem in O (n + m + c) time by counting the characters in each of the lines (where c is equal to the size of the character set). This algorithm is called sorting counting .

Example code that implements this in your case:

 #include <iostream> #include <cstring> // for strlen and memset const int CHARLEN = 256; //number of possible chars using namespace std; // returns table of char duplicates char* find_duplicates(const char* str_1, const char* str_2, const int len1, const int len2) { int *count_1 = new int[CHARLEN]; int *count_2 = new int[CHARLEN]; char *duplicates = new char[CHARLEN+1]; // we hold duplicate chars here int dupl_len = 0; // length of duplicates table, we insert '\0' at the end memset(count_1,0,sizeof(int)*CHARLEN); memset(count_2,0,sizeof(int)*CHARLEN); for (int i=0; i<len1; ++i) { ++count_1[str_1[i]]; } for (int i=0; i<len2; ++i) { ++count_2[str_2[i]]; } for (int i=0; i<CHARLEN; ++i) { if (count_1[i] > 0 && count_2[i] > 0) { duplicates[dupl_len] = i; ++dupl_len; } } duplicates[dupl_len]='\0'; delete count_1; delete count_2; return duplicates; } int main() { const char* str_1 = "foobar"; const char* str_2 = "xro"; char* dup = find_duplicates(str_1, str_2, strlen(str_1), strlen(str_2)); cout << "str_1: \"" << str_1 << "\" str_2: \"" << str_2 << "\"\n"; cout << "duplicates: \"" << dup << "\"\n"; delete dup; return 0; }

Note that I also sort the output here. If you do not want to do this, you can skip counting characters in the second line and just start comparing duplicates on the go.

If, however, you intend to find several duplicates of the same letter (for example, if "banana" and "arena" should output "aan" instead of "an"), you can simply subtract the amount calculated in the current solution and adjust the output accordingly.

C ++ substring compatibility

More articles: