Perform two identifier sequences

Given two identifier sequences, how to find the smallest sequence of operations that converts the first identifier sequence to the second.

The operation may be:

  • Paste the identifier at the given position
  • Remove id from given position
  • Move ID from position to another

Note: identifiers are unique and cannot be displayed twice in sequence

Example:

Sequence1 [1, 2, 3, 4, 5] Sequence2 [5, 1, 2, 9, 3, 7] Result (index are 0 based) : - Remove at 3 - Move from 3 to 0 - Insert '9' at 3 - Insert '7' at 5 

Thanks!

+4
source share
2 answers

Let's start by looking for the longest common subsequence . This identifies the elements that will not move:

 [(1), (2), (3), 4, 5] 

LCS elements are enclosed in parentheses.

Go through both sequences from index 0, writing down the operations necessary for sequence identity. If the current element of the first sequence is not part of LCS, delete it and mark the place where it was before, if you need to insert it later. If the current item is part of LCS, insert the item from the second sequence in front of it. This can be either a simple insert or a move. If the item you are inserting is in the original list, move it; otherwise insert it.

The following is an example of your example. Curly brackets show the current item.

 [{(1)}, (2), (3), 4, 5] vs [{5}, 1, 2, 9, 3, 7] 

1 is a member of LCS, so we must insert 5 . 5 is in the original sequence, so we record the move: MOVE 4 to 0

 [5, {(1)}, (2), (3), 4] vs [5, {1}, 2, 9, 3, 7] 

The elements are the same, so go to the following:

 [5, (1), {(2)}, (3), 4] vs [5, 1, {2}, 9, 3, 7] 

The numbers are the same again - go to the following:

 [5, (1), (2), {(3)}, 4] vs [5, 1, 2, {9}, 3, 7] 

3 is a member of LCS, so we must insert 9 . The source element does not have 9 , so this is a simple insert: INSERT 9 at 3

 [5, (1), (2), 9, {(3)}, 4] vs [5, 1, 2, 9, {3}, 7] 

And yet the numbers are the same - go to the following:

 [5, (1), (2), 9, (3), {4}] vs [5, 1, 2, 9, 3, {7}] 

'4' is not a member of LCS, so it is deleted: DEL at 5

 [5, (1), (2), 9, (3)] vs [5, 1, 2, 9, 3, {7}] 

We have reached the end of the first sequence - we simply add the remaining elements of the second sequence to the first, paying attention to the list of previous deletions. For example, if 7 had been deleted earlier, we would have turned this removal into a move at this time. But since there was no 7 in the original list, we record our final operation: INS 7 at 5 .

+1
source

This metric is called the Levenshtein distance, or rather the Damerau-Levenshtein distance .

There are implementations for almost all possible programming languages ​​that you can use to solve the problem you described.

+1
source

Source: https://habr.com/ru/post/1395237/


All Articles