Two lines such as XYZ and XZY

Question

Two lines such as XYZ and XZY

I have two lines, the same lengths. And I have to check if they can be expressed as XYZ and XZY, where Y and Z are not empty.

My solution is to "eat" the very first letters of both lines, and then find the longest common substring for the rest. Then check if the leftovers of the first and second rows are equal (without LCS). The problem is that I heard about the O (N) memory complexity algorithm for this, but all I found is O (MN). I have limited memory, so this is important to me.

The second solution is to use "(*) (. +) (. +) \ 1 \ 3 \ 2" regex, but this is a very bad solution.

Does anyone have any other ideas?

+6

c ++ lcs

nonono Jun 22 '15 at 18:51

source share

3 answers

First, stop copying material:

 template<class T> struct array_view { T* b = 0; T* e = 0; T* begin()const{return b;} T* end()const{return e;} // defaults: array_view()=default; array_view(array_view const&)=default; array_view& operator=(array_view const&)=default; ~array_view()=default; array_view( T* s, size_t n ):array_view(s, s+n){} array_view( T* s, T* f ):b(s),e(f){} using mutable_T = typename std::remove_const<T>::type; template<size_t N> array_view( T(&arr)[N] ):array_view(arr, N){} template<size_t N> array_view( std::array<T,N>&arr ):array_view(arr.data(), N){} template<size_t N> array_view( std::array<mutable_T,N>const&arr ):array_view(arr.data(), N){} // similar for std::vector: template<class...Ts> array_view( std::basic_string<mutable_T, Ts...> const& src): array_view(src.data(), src.size()) {} template<class...Ts> array_view( std::basic_string<T, Ts...>& src): array_view( src.empty()? array_view(): array_view(std::addressof(src[0]),src.size()) ) {} T& back() const { return *std::prev(end()); } T& front() const { return *begin(); } size_t size() const { return end()-begin(); } bool empty() const { return begin()==end(); } // slicing functions: array_view front( size_t n ) const { if (size() <= n) return *this; return {begin(), n}; } array_view back( size_t n ) const { if (size() <= n) return *this; return {end()-n, n}; } array_view trim_front( size_t n ) const { return back( size()-n ); } array_view trim_back( size_t n ) const { return front( size()-n ); } array_view sub( size_t start, size_t len ) const { if (start >= size()) return {}; len = (std::min)( size()-start, len ); return {begin()+start, len}; } // comparisons: friend bool operator==( array_view lhs, array_view rhs ) { if (lhs.size()!=rhs.size()) return false; return std::equal( lhs.begin(), lhs.end(), rhs.begin() ); } friend bool operator!=( array_view lhs, array_view rhs ) { return !(lhs==rhs); } friend bool operator<( array_view lhs, array_view rhs ) { return std::lexicographical_compare( lhs.begin(), lhs.end(), rhs.begin(), rhs.end() ); } friend bool operator>( array_view lhs, array_view rhs ) { return rhs<lhs; } friend bool operator<=( array_view lhs, array_view rhs ) { return !(lhs>rhs); } friend bool operator>=( array_view lhs, array_view rhs ) { return !(lhs<rhs); } };

an array_view is a range without ownership. It does not support char features, but I don't care.

 using string_view = array_view<const char>; size_t common_prefix( string_view lhs, string_view rhs ) { auto itl = lhs.begin(); auto itr = rhs.begin(); while (itl != lhs.end() && itr != rhs.end()) { if (*itl != *itr) break; ++itl; ++itr; } return itl-lhs.begin(); }

gives us the longest common prefix lhs and rhs .

Now all we need to do is quickly and efficiently recognize YZ vs ZY .

 bool is_yz_zy( string_view lhs, string_view rhs ) { if (lhs.size() < 2) return false; if (lhs.size() != rhs.size()) return false; for (size_t i = 1; i < lhs.size(); ++i) { if (lhs.front(i)==rhs.back(i)) { if (lhs.trim_front(i) == rhs.trim_back(i)) { return true; } } } return false; }

and line:

 bool is_xyz_xzy( string_view lhs, string_view rhs ) { size_t max_x = common_prefix(lhs, rhs); for (size_t i = 0; i <= max_x; ++i) { if (is_yz_zy( lhs.trim_front(i), rhs.trim_front(i) )) return true; } return false; }

uses memory O (1).

living example

optimization time.

Do an xor check. Now only lengths x possible, such that xor scans are equal in this index, and xor-scans of all rows are equal.

Similarly, to detect yz zy, the left xor scan in the index i should be equal to the correct xor scan at the length of the index-i xor with the correct xor scan for length i for length y.

A stronger hash with even friendlier properties will make pathological cases less obvious, but xor scan should help a lot.

The xor scan is the xor of all previous characters. This can be done in place within the strings, replacing each char with xor of itself and all previous characters. The operation is easily inverted and takes linear time.

Comparing strings requires a little attention, but you can simply view each scan entry with a previous scan to get the original char.

Here is the implementation of the xor-optimized version. Experimentally, it performs symbol operations ~ 5 n^2 , but it will have n ^ 3 cases.

+1

Yakk Jun 22 '15 at 9:33

source share

Not verified, food for thought.

Turn the second line over and attach it to the first, for example. XYZY'Z'X. (or vice versa)
Find the maximum possible X == X '(some discussion may be needed here), where size (X) <= len (XYZY'Z'X') / 2
Iterate from size (X) to 0. At each iteration, remove X from both ends so that the line is YZY'Z ', and check that YZ == Y'Z' divide the line in the middle and return True if checked.
After iteration, return false.

0

cpp initiator Jun 22 '15 at 20:17

source share

4386427 · Accepted Answer · 2015-06-22T20:21:31+0000

Something like this is possible:

bool test(string& s1, string& s2) { if (s1.size() != s2.size()) { return false; } for (size_t i = 1; i < s1.size()-2; i++) { if (s1.substr(0,i) != s2.substr(0,i)) return false; for (size_t j = 1; j < s1.size()-i; j++) { string x1 =s1.substr(i,j); string x2 =s1.substr(i+j); string y1 = s2.substr(i, s1.size()-i - j); string y2 = s2.substr(s1.size()-j); if ((x1==y2) && (x2==y1)) { cout << "X: " << s1.substr(0,i) << endl; cout << "YZ: " << x1 << " " << x2 << endl; cout << "ZY: "<< y1 << " " << y2 << endl << endl; return true; } } } return false; } int main() { string s1 {"abcbd"}; string s2 {"abdbc"}; if (test(s1, s2)) { cout << "OK" << endl; } else { cout << "NOT OK" << endl; } return 0; }

If you have a big problem, you can avoid substrings by comparing char with char. For instance:

  if (s1.substr(0,i) != s2.substr(0,i)) return false;

can be replaced by

  for (size_t z = 0; z < i; z++) { if (s1[z] != s2[z]) return false; }

Similarly, you can escape the lines x1, x2, y1 and y2 by entering two similar ones for loops, where you compare char -by-char.

The code will be harder to read and understand, but it will use less memory.

 bool test(string& s1, string& s2) { size_t N = s1.size(); if (N != s2.size()) { // s1 and s2 must have same length return false; } // i is length of X for (size_t i = 1; i < s1.size()-2; i++) { // Check that X of length i exists in both s1 and s2 for (size_t k = 0; k < i; k++) { if (s1[k] != s2[k]) return false; } // Now check for YZ in s1[i..N-1] and ZY in s2[i..N-1] // // j is length of Y // Nij is length of Z for (size_t j = 1; j < Ni; j++) { // Is s1[i..i+j-1] == s2[Nj..N-1]? bool b1 = true; for (size_t k = 0; k < j; k++) { if (s1[i+k] != s2[N-j+k]) b1=false; } // Is s1[i+j..N-1] == s2[i..Nj-1]? bool b2 = true; for (size_t k = 0; k < (Nij); k++) { if (s1[i+j+k] != s2[i+k]) b2=false; } if (b1 && b2) { // Success! // For debug the value of i and j // can be used to output the substrings // using for-loops cout << "X="; for (size_t k = 0; k < i; k++) { cout << s1[k]; } cout << endl; cout << "Y="; for (size_t k = i; k < (i+j); k++) { cout << s1[k]; } cout << endl; cout << "Z="; for (size_t k = (i+j); k < N; k++) { cout << s1[k]; } cout << endl; return true; } } } return false; } int main() { string s1 {"abcbd"}; string s2 {"abdbc"}; if (test(s1, s2)) { cout << "OK" << endl; } else { cout << "NOT OK" << endl; } return 0; }

Two lines such as XYZ and XZY

More articles: