Combining multiple, randomly sorted lists into one list

3 lists are provided that are randomly sorted by the same, but unknown sort order. Is there an algorithm that combines these lists into one, which is then sorted in the same order?

Example:

List1: bse h

List2: bse h

List3: c d e e

Suppose these lists are sorted, but the sort order used is unknown. I want to combine these lists with a result that does not contain duplicates, but keeps the sort order: b c d e e h

As stated above: It is known that these lists are sorted, but it is not known in which order, but the requirement is that the combined list is still sorted in the same (unknown) order.

In the above example, I know that the element "f" is located between "e" and "h", because from List1 I know that

"c" "f" "h",

from List2 I know that

"c" "e" "h"

and from List3, I know that

"e" "f" and "c" <"E"

which is combined with:

"c" "e" "f" "H"

If the sort order cannot be determined by any of the specified lists, it is permissible to simply add an item to the end of the result list. In addition, if the sort order of the sequence of elements cannot be determined, it is permissible to insert them in any order in the list if they are in the right place (for example, if I know that "b" and "c" should be inserted between "a" and "d", but I don't know if it should be abcd or acbd, then both are valid.)

Of course, this is just an example. Real lists are longer (but contain less than 100 elements), contain not single, but several characters, and the sort order is not alphabetical. In addition, I have up to 5 listings.

I need to implement this algorithm in Delphi (and not: this is not homework, but a real life problem), but I take the algorithm in the language if it does not contain too much compiler magic or complex library functions.

Performance is not a problem because it is done once.

+6
source share
5 answers

Your input lists define a partial order of your positions. According to the answer to Math.SE , what you want is topological. The algorithms are described on Wikipedia.

+8
source

Good question. Although topological sorting may be the most recommended method, you will have to first analyze the input to create a list of dependencies. I thought of a more directional approach based on finding elements that are found in several lists to customize the definition of order.

I cannot predict anything by time complexity, but since you do not care about performance and especially considering that the total number of elements does not exceed 500, I believe that this algorithm should work well.

Algorithm

  • All lists are combined into a temporary list, which is then sorted naturally to identify and filter out all duplicate items. These duplicates, called Keys, make up the only definition of the final sort order.
  • The list of keys is sorted in the sort order of the input data, comparing every two elements: if two Keys are found in the same list of input data, then the first one in this list is also before the second in the output list. If two keys do not occur simultaneously in any input list, then they are considered equal.
  • Then the cycle cycles through the keys.
  • In each cycle in each input list, each element between the previous and current keys is added to the output list. The loop ends by adding the current key.

Implementation

type TSorterStringList = class(TStringList) protected Id: Integer; KeyId: Integer; function Current: String; public constructor Create; end; TSorterStringLists = class(TObjectList) private function GetItem(Index: Integer): TSorterStringList; public property Items[Index: Integer]: TSorterStringList read GetItem; default; end; TSorter = class(TObject) private FInput: TSorterStringLists; FKeys: TStringList; procedure GenerateKeys; function IsKey(const S: String): Boolean; public constructor Create; destructor Destroy; override; procedure Sort(Output: TStrings); property Input: TSorterStringLists read FInput; end; { TSorterStringList } constructor TSorterStringList.Create; begin inherited Create; KeyId := -1; end; function TSorterStringList.Current: String; begin Result := Strings[Id]; end; { TSorterStringLists } function TSorterStringLists.GetItem(Index: Integer): TSorterStringList; begin if Index >= Count then Count := Index + 1; if inherited Items[Index] = nil then inherited Items[Index] := TSorterStringList.Create; Result := TSorterStringList(inherited Items[Index]); end; { TSorter } constructor TSorter.Create; begin inherited Create; FInput := TSorterStringLists.Create(True); FKeys := TStringList.Create; end; destructor TSorter.Destroy; begin FKeys.Free; FInput.Free; inherited Destroy; end; threadvar CurrentSorter: TSorter; function CompareKeys(List: TStringList; Index1, Index2: Integer): Integer; var Input: TSorterStringLists; I: Integer; J: Integer; K: Integer; begin Result := 0; Input := CurrentSorter.Input; for I := 0 to Input.Count - 1 do begin J := Input[I].IndexOf(List[Index1]); K := Input[I].IndexOf(List[Index2]); if (J > - 1) and (K > -1) then begin Result := J - K; Break; end; end; end; procedure TSorter.GenerateKeys; var All: TStringList; I: Integer; begin All := TStringList.Create; try All.Sorted := True; All.Duplicates := dupAccept; for I := 0 to FInput.Count - 1 do All.AddStrings(FInput[I]); for I := 0 to All.Count - 2 do if (All[I] = All[I + 1]) then if (FKeys.Count = 0) or (FKeys[FKeys.Count - 1] <> All[I]) then FKeys.Add(All[I]); finally All.Free; end; CurrentSorter := Self; FKeys.CustomSort(CompareKeys); end; function TSorter.IsKey(const S: String): Boolean; begin Result := FKeys.IndexOf(S) > -1; end; procedure TSorter.Sort(Output: TStrings); var KeyId: Integer; I: Integer; List: TSorterStringList; begin if FInput.Count = 0 then Exit; Output.BeginUpdate; try GenerateKeys; for KeyId := -1 to FKeys.Count - 1 do begin for I := 0 to FInput.Count - 1 do begin List := FInput[I]; if List.KeyId <= KeyId then while (List.Id < List.Count) and not IsKey(List.Current) do begin Output.Add(List.Current); Inc(List.Id); end; while (List.Id < List.Count) and IsKey(List.Current) do begin List.KeyId := FKeys.IndexOf(List.Current); Inc(List.Id); end; end; if KeyId + 1 < FKeys.Count then Output.Add(FKeys[KeyId + 1]); end; finally Output.EndUpdate; end; end; 

Usage example

 procedure TForm1.Button1Click(Sender: TObject); var Sorter: TSorter; begin Sorter := TSorter.Create; try Sorter.Input[0].CommaText := '1, 2, 4, 9, 10, 11, 22, 46, 48, 51, 70, 72'; Sorter.Input[1].CommaText := '3, 9, 23, 43, 44, 45, 47, 48, 51, 71, 90, 91'; Sorter.Input[2].CommaText := '0, 3, 4, 7, 8, 11, 23, 50, 51, 52, 55, 70'; Sorter.Input[3].CommaText := '2, 6, 14, 15, 36, 37, 38, 39, 51, 65, 66, 77'; Sorter.Input[4].CommaText := '5, 27, 120, 130'; ListBox1.Items.Assign(Sorter.Input[0]); ListBox2.Items.Assign(Sorter.Input[1]); ListBox3.Items.Assign(Sorter.Input[2]); ListBox4.Items.Assign(Sorter.Input[3]); ListBox5.Items.Assign(Sorter.Input[4]); Sorter.Sort(ListBox6.Items); // Results in: // 1, 0, 5, 27, 120, 130, 3, 2, 6, 14, 15, 36, 37, 38, 39, 4, 7, 8, 9, 10, // 11, 22, 46, 23, 43, 44, 45, 47, 50, 48, 51, 71, 90, 91, 52, 55, 65, 66, // 77, 70, 72 finally Sorter.Free; end; end; 
+4
source

So you have

 List1: abcfh List2: bceh List3: cdef 

Scroll through the list and enter the graph. So, after the first list, you have:

 A -> B -> C -> F -> H 

Then you start on list 2. B is already there. Then you see that B connects to C, which you already know. Then you know that C connects to E, which is not there yet, so now you have:

 A -> B -> C -> F -> H | E 

Then you know that E connects to H like this:

 A -> B -> C -> F -> H | ^ E --------| 

Then go to list 3. You know that C is there, and it points to D:

  D ^ | A -> B -> C -> F -> H | ^ E --------| 

Then you know that D points to E. Since C-> E has the same ancestor as C → D → E, you can break the link from C-> E, and now you have:

  D -> E ---| ^ | | | A -> B -> C -> F -> H 

And finally, you know that E comes to F. Since you knew that E led directly to H before, and now there is another way to H from E (E-> F-> H), you know that F should be between E and H, and you can remove the link from E → H. So you now have:

  D -> E ^ | | | A -> B -> C -> F -> H 

What you know can be reduced to

 A -> B -> C -> D -> E -> F -> H 

Now suppose you get something like:

  E -> T | | A -> Z | ^ R -> W 

You would not have enough information to find out if E / T goes to R / W, but you know that both of them go to Z and after A. So you would just take one of the paths at random, and then etc. so that you can finish either AETRWZ or ARWETZ. You could even take one randomly from each path, which would guarantee that these legs would still be sorted, and maybe you're lucky that your merge was also sorted. So you can have AREWTZ, which still has E / T relatively sorted and R / W is still relatively sorted, or if you have to start with an E-foot, you're in luck and have AERTWZ

+2
source

Graph theory seems like a good first intuition.

You can build a directed graph where your list items are vertices, and you insert a directed front from each list item into its successor. Then a node A is smaller than another node B if and only if B can be reached from A by moving the graph.

The cycle on the graph (A less than B and B less than A) indicates either a distortion of the input data, or the presence of two equivalent elements with different names.

In the absence of cycles, merging under this smaller ratio should be simple: repeatedly remove nodes that cannot be reached by any other node from the graph, and add them to your list of results.

+1
source

Can a hash table be used? Here is an algorithm for merging two such lists.

 T = new HashMap for(i = 1 to length B) T.put(B[i],i) N = array[length A] for(i = 1 to length A){ if(T containsKey A[i]) N[i] = T.get(A[i]) else N[i] = -1 } R = array[length A + length B] j = 1 k = 1 for(i = 1 to length A){ if(N[i] = -1) R[j++] = N[i] else{ while(k <= N[i]) R[j++] = B[k++] } } while(k <= length B) R[j++] = B[k++] return R[1 ... j-1] 

Elements A [i], where N [i]> 0 correspond to elements B, the rest will be placed in a valid order. There may be a mistake, but this is a general idea.

To merge three arrays, you can combine the first two, and then combine the third into a combined array. When editing this last sentence is false, as @RobKennedy points out. You can probably change the algorithm for processing the three lists, but this is not so simple, and so you may need to use topological sorting.

0
source

Source: https://habr.com/ru/post/955678/


All Articles