Suffix Array Implementation Errors

I encoded an implementation of Suffix Array and found a problem in my implementation. Specifically, I deduced the first few ranks of the suffix array of RA[0..7]this string (length = 10 ^ 5) and had the following output:

80994
84360
87854
91517
95320
99277
83068

But the correct one should have been (everything shifted by 23):

81017
84383
87877
91540
95343
99300
83091

I know two ways to fix this, but I don’t know why it worked.

The first way was to add S[N++] = '$';a function to the beginning buildSA()(then the output was 1 less than the correct one, but that doesn’t matter)

I also found another solution, decreasing the MAX_N constant to 1e5 + 10!

This is so much magic for me, and I really need to know why this error occurred, because I do not want this error again.

#include <cstdio>
#include <cstring>
#include <algorithm>
using std::max;
const int MAX_N = 2e5 + 10;
int SA[MAX_N];   // The ith element is the index of the suffix
int RA[MAX_N];   // The rank of the suffix at i
int tmp[MAX_N];  // A temporary array
int B[MAX_N];    // An array for the buckets
int N;
char S[MAX_N];

void bucketSort(int k){
  int i, m = max(256, N);
  for(i = 0; i < m; i++)
    B[i] = 0;
  for(i = 0; i < N; i++)
    B[i + k < N ? RA[i + k] : 0] ++;
  for(i = 1; i < m; i++)
    B[i] += B[i - 1];
  for(i = N - 1; i >= 0; i--)
    tmp[--B[SA[i] + k < N ? RA[SA[i] + k] : 0]] = SA[i];
  for(i = 0; i < N; i++)
    SA[i] = tmp[i];
}

void buildSA(){
  for(int i = 0; i < N; i++){
    SA[i] = i;
    RA[i] = S[i];
  }
  for(int k = 1; k < N; k <<= 1){
    bucketSort(k);
    bucketSort(0);
    int norder = 0;
    tmp[SA[0]] = 0;
    for(int i = 1; i < N; i++){
      if(RA[SA[i]] == RA[SA[i - 1]] && RA[SA[i] + k] == RA[SA[i - 1] + k])
      {} else norder++;
      tmp[SA[i]] = norder;
    }
    for(int i = 0; i < N; i++)
      RA[i] = tmp[i];
    if(norder == N)
      break;
  }
}

void printSA(){
  for(int i = 0; i < N; i++){
    printf("%d: %s\n", SA[i], S + SA[i]);
  }
}

int main(){
  scanf("%s", S);
  N = strlen(S);
  buildSA();
  for(int i = 0; i < 7; i++){
    printf("%d\n",RA[i]);
  }
  return 0;
}
+4
2

:
if(RA[SA[i]] == RA[SA[i - 1]] && RA[SA[i] + k] == RA[SA[i - 1] + k])
SA[i] + k >= N ( SA[i - 1] + k).
(SA[i] + k) % N.

+1

, . .

"" :

if(RA[SA[i]] == RA[SA[i - 1]] && RA[SA[i] + k] == RA[SA[i - 1] + k])
      {} else norder++;

, ( ...), :

abab

0: abab
2: ab
3: b
1: bab

.

k = 2, ab abab, , , k = 2 . ab # 2, k = 2, .

, (, '$') . ( ), SA [i] + k >= N, .

0

Source: https://habr.com/ru/post/1545095/


All Articles