The text contains keywords, and I have the start/end positions of each of their occurrences. Keywords may partially overlap, for example "something" → "something" / "some" / "thing":
keywords_occurences = { "key_1": [(11, 59)], "key_2": [(24, 46), (301, 323), (1208, 1230), (1673, 1695)], "key_3": [(24, 56), (1208, 1240)], ... }
I need to choose one occurrence (position) for each keyword so that the chosen occurrences do not overlap. For this case the answer would be:
key_1: 11-59
key_2: 301-323 (or 1673-1695, it does not matter)
key_3: 1208-1240
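A small helper can confirm that a proposed choice like the one above is conflict-free. This is only a sketch: it assumes each (start, end) pair is a half-open span [start, end), which the question does not state; if `end` is inclusive, change both `<` to `<=` in `overlaps`. The function names are illustrative.

```python
from itertools import combinations
from typing import Dict, Tuple

Span = Tuple[int, int]

def overlaps(a: Span, b: Span) -> bool:
    # Assumption: (start, end) is a half-open span [start, end), so two spans
    # that only touch (one ends exactly where the other starts) do not overlap.
    return a[0] < b[1] and b[0] < a[1]

def is_valid_choice(choice: Dict[str, Span]) -> bool:
    """Return True if no two of the chosen spans overlap."""
    return all(not overlaps(a, b) for a, b in combinations(choice.values(), 2))

print(is_valid_choice({"key_1": (11, 59), "key_2": (301, 323), "key_3": (1208, 1240)}))  # True
print(is_valid_choice({"key_1": (11, 59), "key_3": (24, 56)}))  # False
```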
If that is impossible, select the maximum number of distinct keys whose chosen occurrences do not overlap.
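One exact way to get the maximum is a backtracking search: for each key, either place it on one of its occurrences that does not clash with what is already placed, or leave it out, and keep the best result. The sketch below is exponential in the worst case and assumes half-open spans; `best_assignment` is a hypothetical name, and the test data uses only the three keys shown in the question (the `...` is left out).

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]

def overlaps(a: Span, b: Span) -> bool:
    # Assumption: half-open spans [start, end); touching spans do not overlap.
    return a[0] < b[1] and b[0] < a[1]

def best_assignment(occurrences: Dict[str, List[Span]]) -> Dict[str, Span]:
    """Exhaustively search for the largest set of keys whose chosen spans
    are pairwise non-overlapping (exponential time in the worst case)."""
    # Try keys with fewer occurrences first: they have fewer options, so
    # conflicts show up earlier and the search tree stays smaller.
    keys = sorted(occurrences, key=lambda k: len(occurrences[k]))
    best: Dict[str, Span] = {}
    chosen: Dict[str, Span] = {}

    def search(i: int) -> None:
        nonlocal best
        # Bound: even if every remaining key were placed, this branch could
        # not beat the best assignment found so far, so abandon it.
        if len(chosen) + (len(keys) - i) <= len(best):
            return
        if i == len(keys):
            best = dict(chosen)  # strictly better than the previous best
            return
        key = keys[i]
        for span in occurrences[key]:
            if all(not overlaps(span, other) for other in chosen.values()):
                chosen[key] = span
                search(i + 1)
                del chosen[key]
        search(i + 1)  # alternative: leave this key out entirely

    search(0)
    return best

occurrences = {
    "key_1": [(11, 59)],
    "key_2": [(24, 46), (301, 323), (1208, 1230), (1673, 1695)],
    "key_3": [(24, 56), (1208, 1240)],
}
print(best_assignment(occurrences))
# {'key_1': (11, 59), 'key_3': (1208, 1240), 'key_2': (301, 323)}
```

The pruning bound is what keeps this usable for a moderate number of keys; when all keys fit, the first complete assignment found makes every other branch fail the bound immediately.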
It looks like a "hit"-type problem, but I can't find a description of an algorithm for it.
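The problem as stated appears to match what the scheduling literature calls the Job Interval Selection Problem (each key is a "job", each occurrence one of its allowed intervals), which is NP-hard in general; that would explain why no simple exact algorithm turns up. A common fast heuristic adapts the classic earliest-end-first interval scheduling greedy: scan all occurrences sorted by end position and keep one whenever its key is still unplaced and it starts at or after the end of the last kept span. This is a sketch assuming half-open spans; `greedy_assignment` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]

def greedy_assignment(occurrences: Dict[str, List[Span]]) -> Dict[str, Span]:
    """Heuristic: scan every occurrence by end position and keep it if its key
    is not yet placed and it does not overlap the last kept span."""
    candidates = sorted(
        ((span, key) for key, spans in occurrences.items() for span in spans),
        key=lambda item: (item[0][1], item[0][0]),  # by end, then by start
    )
    chosen: Dict[str, Span] = {}
    last_end = float("-inf")
    for span, key in candidates:
        # Assumption: half-open spans, so a span starting exactly at last_end is allowed.
        if key not in chosen and span[0] >= last_end:
            chosen[key] = span
            last_end = span[1]
    return chosen

occurrences = {
    "key_1": [(11, 59)],
    "key_2": [(24, 46), (301, 323), (1208, 1230), (1673, 1695)],
    "key_3": [(24, 56), (1208, 1240)],
}
print(greedy_assignment(occurrences))
# {'key_2': (24, 46), 'key_3': (1208, 1240)}
```

On this very example the greedy places only two keys: it grabs key_2 at 24-46 because that span ends first, which blocks key_1's only occurrence. It is fast (a single sort plus one pass) but not optimal, so it fits best as a fallback when an exact search is too slow.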