A difficult algorithm for ordering characters in an array while preserving their relationships

Problem

I have several groups that define relationships between characters, for example:

[ABC] [ADE] [XYZ] 

These groups mean that the characters A, B, and C (first group) are related to each other, the characters A, D, and E (second group) are related to each other, and so on.

Given this data, I need to put all the unique characters into a one-dimensional array in which characters that are related to each other are placed close together. For the example above, the result should look something like this:

 [BCADEXYZ] 

or

 [XYZDEABC] 

In this resulting array, because the character A has several relationships (with B and C in one group and with D and E in another), it is placed between those characters, which roughly preserves both relationships.

Note that the order itself is not important. XYZ can be placed first or last, since these characters are not related to any others. What matters is the proximity of related characters.

What I need help with

I need help defining an algorithm that takes a set of relationship groups and outputs a one-dimensional array following the logic above. I am tearing my hair out over this, because with real data the number of characters in a group can vary, the number of groups is unbounded, and a character can be related to any other character.

Further example

To illustrate the tricky part of my dilemma, suppose you add another relationship group to the example above, say:

 [CZ] 

The result should now look something like this:

 [XYZCBADE] 

Note that the characters Z and C are now closer together, because the additional data establishes a relationship between them. All previous relationships are still preserved in the result.

+6
4 answers

The first thing you need to do is define the desired result precisely.

You do this by defining how good a result is, so that you can tell which one is best. Mathematically, this is done with a cost function. In this case, typical choices are the sum of the distances between related elements, the sum of the squares of those distances, or the maximum distance. A list with a small cost is then the desired result.
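As a rough illustration, the three cost functions mentioned above could be computed as in the sketch below. The function names are made up for this example, and counting a pair once per group in which it appears is an assumption (it makes repeated relationships weigh more), not something the question specifies.

```python
from itertools import combinations

def pair_distances(order, groups):
    """Distances in `order` between every pair of characters sharing a group.

    Assumption: a pair that appears in several groups is counted once per group.
    """
    pos = {ch: i for i, ch in enumerate(order)}
    return [
        abs(pos[a] - pos[b])
        for group in groups
        for a, b in combinations(sorted(set(group)), 2)
    ]

def costs(order, groups):
    """Return (sum, sum of squares, maximum) of the pair distances."""
    dists = pair_distances(order, groups)
    return sum(dists), sum(d * d for d in dists), max(dists, default=0)

groups = ["ABC", "ADE", "XYZ"]
print(costs("BCADEXYZ", groups))  # (12, 18, 2)
print(costs("ABCDEXYZ", groups))  # (16, 38, 4)
```

The arrangement from the question scores lower on all three measures than plain alphabetical order, which matches the intuition that A belongs between its two groups.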

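It is not clear whether the best solution can be computed by some special method in this case. It is probably not possible if you choose the maximum distance or the sum of the distances as the cost function: minimizing them corresponds to the graph bandwidth and minimum linear arrangement problems, both of which are NP-hard in general.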

In any case, it should be easy to find a good approximation using standard methods.

A simple greedy approach would be to insert each element at the position where the resulting cost for the whole list is minimal.
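A minimal sketch of such a greedy insertion, assuming the sum of distances as the cost and processing characters in the order they first appear (both are choices made for this example, not requirements):

```python
from itertools import combinations

def total_distance(order, groups):
    """Sum of distances between related characters that are already placed."""
    pos = {ch: i for i, ch in enumerate(order)}
    return sum(
        abs(pos[a] - pos[b])
        for group in groups
        for a, b in combinations(sorted(set(group)), 2)
        if a in pos and b in pos
    )

def greedy_order(groups):
    """Insert each character at the position that minimizes the current cost."""
    order = []
    # dict.fromkeys keeps the unique characters in first-seen order.
    for ch in dict.fromkeys("".join(groups)):
        candidates = (order[:i] + [ch] + order[i:] for i in range(len(order) + 1))
        order = min(candidates, key=lambda cand: total_distance(cand, groups))
    return "".join(order)

print(greedy_order(["ABC", "ADE", "XYZ"]))
```

Each insertion re-evaluates the whole list, so this is only practical for small inputs. The result depends on the insertion order and is not guaranteed to be optimal.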

Once you have a good starting point, you can try to improve it further by modifying the list toward better solutions, for example by swapping elements or rotating parts of the list (local search, hill climbing, simulated annealing, etc.).
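One possible form of this improvement step is plain hill climbing with random swaps, sketched below. The step count, the fixed seed, and the use of the sum of distances as the cost are all assumptions made for illustration.

```python
import random
from itertools import combinations

def total_distance(order, groups):
    """Sum of distances between every pair of characters sharing a group."""
    pos = {ch: i for i, ch in enumerate(order)}
    return sum(
        abs(pos[a] - pos[b])
        for group in groups
        for a, b in combinations(sorted(set(group)), 2)
    )

def hill_climb(start, groups, steps=2000, seed=0):
    """Swap two random positions and keep the swap only if the cost drops."""
    rng = random.Random(seed)
    order = list(start)
    best = total_distance(order, groups)
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        cost = total_distance(order, groups)
        if cost < best:
            best = cost
        else:
            order[i], order[j] = order[j], order[i]  # undo the swap
    return "".join(order), best

groups = ["ABC", "ADE", "XYZ", "CZ"]
print(hill_climb("ABCDEXYZ", groups))
```

Hill climbing can get stuck in a local minimum. Simulated annealing differs mainly in that it sometimes accepts a worse swap, with a probability that shrinks over time.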

+5

I think that with large amounts of data and no additional criteria, it will be very difficult to find the best possible arrangement. Have you considered a greedy algorithm (building your solution step by step so that it ends up close to an ideal solution)? Here is my idea:

Sort the sets of related characters by size and start with the largest. Keep its characters together: without any other criteria, we may as well say that its proximity is the most important, since it is the largest set. Treat each character in this first set as an "endpoint", meaning a character that you can rearrange and place at either end of your array without violating your proximity rule (initially everything in the first set is an endpoint, because its characters can be arranged in any order). Then go down your list, and as soon as a set shares one or more characters with the first set, join them accordingly. The characters through which you joined the sets are no longer endpoints, but everything else still is. Even if a larger set has only one character in common, I would guess that it is a better choice than smaller sets with more characters in common, because this way the large set at least stays together instead of being split up, as it might be if it were placed in the array after the smaller ones.

I would continue this way, updating the list of endpoints so that you can keep making matches as you go through your sets. I would also track whether I had stopped finding matches, in which case I would go back to the top of the list and simply append the next-largest unmatched set (if there are no more matches, it does not matter which set comes next, so go with the largest, most valuable association). Drop the old endpoints, since they have no matches, and treat all the characters in the set you just appended as the new endpoints.

This may not run fast enough, I'm not sure. But hopefully it gives you some ideas.

Edit: Obviously, duplicate characters must be removed as part of the algorithm (trivial).

+2

The problem described is essentially the problem of drawing a graph in one dimension.

Use the relationships to build a graph. Treat the unique characters as the vertices. Put an edge between any two vertices that occur together in a relationship group; a more sophisticated approach would give each edge a weight based on the number of groups in which the pair of characters occurs together.
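A small sketch of this construction, assuming the weighted variant in which an edge's weight is the number of groups containing both characters. The function name and the dictionary-of-pairs representation are choices made for this example.

```python
from collections import Counter
from itertools import combinations

def build_graph(groups):
    """Map each unordered character pair (a, b), a < b, to its edge weight.

    The weight is the number of groups in which both characters occur.
    """
    weights = Counter()
    for group in groups:
        for a, b in combinations(sorted(set(group)), 2):
            weights[(a, b)] += 1
    return weights

graph = build_graph(["ABC", "ADE", "XYZ", "CZ"])
for (a, b), weight in sorted(graph.items()):
    print(a, b, weight)
```

For the unweighted variant, the keys of the returned mapping are simply the edge set.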

Graph-drawing algorithms place well-connected vertices close to each other, which is equivalent to placing the corresponding characters next to each other. Since only an ordering is required, the characters can simply be ranked by their position in the drawing.

There are many graph-drawing algorithms. In this case, I would go with Fiedler ordering, which orders the vertices using a particular eigenvector (the Fiedler vector) of the graph Laplacian. Fiedler ordering is simple, efficient, and optimal in a well-defined mathematical sense.
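The sketch below approximates a Fiedler ordering in pure Python. It uses a shifted power iteration in place of a library eigensolver, which is a substitution made here to keep the example dependency-free. It also assumes the graph is connected (as it is once the [CZ] group is added) and that the iteration count is large enough to converge; for a disconnected graph the Fiedler vector is not unique and the within-component order is not meaningful.

```python
from itertools import combinations

def fiedler_order(groups, iterations=2000):
    """Order characters by an approximate Fiedler vector of the group graph."""
    chars = sorted(set("".join(groups)))
    n = len(chars)
    if n < 2:
        return "".join(chars)
    index = {ch: i for i, ch in enumerate(chars)}

    # Weighted adjacency: one unit of weight per group containing the pair.
    adj = [[0.0] * n for _ in range(n)]
    for group in groups:
        for a, b in combinations(sorted(set(group)), 2):
            adj[index[a]][index[b]] += 1.0
            adj[index[b]][index[a]] += 1.0
    degree = [sum(row) for row in adj]

    # Power iteration on M = shift*I - L, where L = D - A is the Laplacian.
    # The shift exceeds the largest eigenvalue of L, so M is positive definite.
    # With the constant vector projected out, the dominant eigenvector of M is
    # the eigenvector of L for its second-smallest eigenvalue.
    shift = 2.0 * max(degree) + 1.0
    v = [float(i) for i in range(n)]  # arbitrary deterministic start vector
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        v = [
            (shift - degree[i]) * v[i] + sum(adj[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]

    return "".join(sorted(chars, key=lambda ch: v[index[ch]]))

print(fiedler_order(["ABC", "ADE", "XYZ", "CZ"]))
```

On the four-group example this should give an arrangement of the same shape as the [XYZCBADE] result in the question, possibly reversed and with X/Y or D/E swapped, since those pairs are symmetric in the graph.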

+2

It looks like you want to do a topological sort: http://en.wikipedia.org/wiki/Topological_sorting

As for the initial order, it looks like you are trying to impose some kind of stability condition, but it is not clear to me from your question what that condition should be. Could you try to be more precise in your description?

0

Source: https://habr.com/ru/post/892546/

