Find the number of duplicate subarrays in an array

There is an array indexed from 0-n (i.e. size = n) containing elements from 0-m , where m < n (suppose that m is 100 or 1000 times smaller than n, i.e. m is much less than n), so many elements or an auxiliary array must be repeated, and we need to find the number of such repeated auxiliary arrays of size 1 or more than 1. For example, consider the array 1 2 4 1 2 4 1 5 6 3 5 , here
1 repeated 2 times
2 repeated 1 time
4 repeated 1 time
5 repeated 1 time
1 2 repeated 1 time
2 4 is repeated 1 time
4 1 repeated 1 time
1 2 4 repeated 1 time
2 4 1 repeated 1 time
1 2 4 1 repeated 1 time
Thus, the final answer is the sum of these, i.e. eleven
Can anyone suggest some data structure or efficient algorithm to solve this problem. I thought about hashing m and noted the value of the index where it occurs, but did not formalize it.
Assume that n,m < 10^5

+6
source share
3 answers
  • Create a hash that has integers in the form of keys and extensible lists (linked lists, for example) of integers as values.

  • Read the input list. Consider each input element that is scanned as a key; add the index of the next item to the appropriate list of values.

  • Now your repeated 1-element counter - n - | keys |

  • Then go through your keys. For each key, treat the indexed items in the source list as a new input list. Create a new hash and continue, as in step 1. Note that when we save indexes, we continue to use them in the original list.

Example: 1 2 4 1 2 4 1 5 6 3 5 (suppose we are zero-indexed). Here n = 11.

  • 1 β†’ [1, 4, 7]
  • 2 β†’ [2, 5]
  • 3 β†’ [10]
  • 4 β†’ [3, 6]
  • 5 β†’ [8, nil]
  • 6 β†’ [9]

The number of repeated 1-elts is 11-6 = 5.

Now we go through the keys. Let's start with 1. The list of indices is [1, 4, 7], which corresponds to the elements [2, 2, 5].

  • 2 β†’ [2, 5]
  • 5 β†’ [8]

This displays a 3 - 2 = 1 duplicate 2-element list.

etc...

Runtime: linearly at input + output

+3
source

I used the suffix tree approach from Schenna's book on algorithm development . Suffix trees - a special kind of trie tree , where you insert each suffix of this line in the trie tree. Three trees is a way of storing several words in one tree: the root is an empty string, and each edge adds a character. I used a very simple implementation as a proof of concept below.

 #include <iostream> #include <string> using std::string; using std::cout; class Node { Node* m_descendants[10]; int m_count; public: Node() { for(int i=0; i<10; ++i) { m_descendants[i] = nullptr; } m_count = 0; } void Add(const char* str) { // Increment number of times we've reached this node m_count++; const int firstDigit = str[0] - '0'; if (!(0 <= firstDigit && firstDigit <= 9)) return; // recursion base case // Access descendant node correponding to current digit Node*& descendant = m_descendants[firstDigit]; if (descendant == 0) { // create descendant if necessary descendant = new Node; } // Recurse on rest of string descendant->Add(str +1); } void Print(const string& str, int countAdj=0) const // debug function { for(int nextDigit=0; nextDigit<10; ++nextDigit) { const Node* node = m_descendants[nextDigit]; if (node) { const int adjustedCount = node->Count() - countAdj; if (adjustedCount > 0) { char c = '0' + nextDigit; string strWithC = str; strWithC += c; cout << strWithC << ": " << adjustedCount << "\n"; node->Print(strWithC, countAdj); } } } } int SumRepeated() const { int sumRepeated = 0; for(int nextDigit=0; nextDigit<10; ++nextDigit) { const Node* node = m_descendants[nextDigit]; if (node) { const int repeatCount = node->Count() -1; if (repeatCount > 0) { sumRepeated += repeatCount; sumRepeated += node->SumRepeated(); } } } return sumRepeated; } int Count() const { return m_count; } }; int main(int argc, const char* argv[]) { // Construct Node* const root = new Node; for(const char* str = "12412415635"; *str; ++str) { root->Add(str); } // Print root->Print(string(), 1); cout << "Sum repeated is " << root->SumRepeated() << "\n"; return 0; // (nodes are leaked) } 

Output

 1: 2 12: 1 124: 1 1241: 1 2: 1 24: 1 241: 1 4: 1 41: 1 5: 1 Sum repeated is 11 

Note that there is one additional repeating substring, i.e. 1241.

As I said, this is a proof of concept, therefore, for example, you would like to use a dictionary instead of an array to store descendants with large m . In addition, this implementation is not optimal even with respect to the general algorithm: it is O (n ^ 2) in time and space. You can optimize by folding paths without branching to get O (n) space, and use smart building algorithms to get O (n) time. Also, as indicated in one of the other answers, you don't need to process substrings that are up to half the length of the original.

+1
source

There is something in JavaScript: (the output is more or less immediate, and JavaScript is one of the slowest, I think):

 <script> function f (s, m) { var i = 0, n = s.length, c = 0, H = {}, Hi = {} while (i < n-1) { for (var j=i+1; j<=Math.min (i + 1 + m,n - 1); j++) { if (s[i] == s[j]) { var i_= i, j_= j, tmp = '' while (s[i_] == s[j_] && i_ < j) { tmp += String (s[i_]) i_++ j_++ } var k = tmp.length - 1 while (k >= 0) { if (!H[tmp]) { H[tmp] = 1 Hi[tmp] = i c++ } else if (i == Hi[tmp]) { H[tmp]++ c++ } tmp = tmp.substr(0,k--) } } } i++ } var t1 = '' for (var i in H) t1 += i + ", " return [t1,c] } var a1 = [1,2,4,1,2,4,1,5,6,3,5], a2 = [] for (var i=0;i<100000;i++) a2.push(Math.floor(Math.random()*10)) console.log(a1 +"\n"+ f(a1,10)) console.log(f(a2,10)) </script> 

Output:

 1,2,4,1,2,4,1,5,6,3,5 1, 2, 4, 5, 12, 24, 41, 124, 241, ,10 ["0...6358, 3582, 038, 4304, 2068, 095, 9252, 37866, 3786, 7866, 7893, 8701, 0669, ", 771] 
0
source

Source: https://habr.com/ru/post/949401/


All Articles