I don't know of any tools that will help with this, but I can offer an algorithm that gives pretty decent results. Edit: the OP requested example code for the index. I use Trove's TIntObjectHashMap to store this information, but you can do the same with a Java HashMap.
Step 1: Search the text for each search word and build an index of the offsets at which each one appears.
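// text is the document being searched and searchStrings the words to highlight (both assumed to exist)
TIntObjectHashMap<String> matchIndex = new TIntObjectHashMap<String>();
// for each word or other string to highlight
for (String searchString : searchStrings) {
    if (searchString.isEmpty()) continue; // an empty string would match at every offset and never terminate
    // find each instance of the word in the text and record its offset
    for (int x = text.indexOf(searchString); x >= 0; x = text.indexOf(searchString, x + 1)) {
        matchIndex.put(x, searchString);
    }
}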
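If you would rather avoid the Trove dependency, step 1 can be sketched with the standard library alone. This is only an illustration: the class name OffsetIndex and the method build are made up here, and it uses a TreeMap instead of a HashMap so the offsets come back already sorted, which the pairing in step 2 relies on.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

final class OffsetIndex {
    // Maps each offset in text to the search string found there.
    // If two search strings start at the same offset, the later one wins.
    static TreeMap<Integer, String> build(String text, List<String> searchStrings) {
        TreeMap<Integer, String> matchIndex = new TreeMap<>();
        for (String searchString : searchStrings) {
            if (searchString.isEmpty()) continue; // indexOf("") matches everywhere
            int x = text.indexOf(searchString);   // case-sensitive search
            while (x >= 0) {
                matchIndex.put(x, searchString);
                x = text.indexOf(searchString, x + 1);
            }
        }
        return matchIndex;
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        System.out.println(build(text, Arrays.asList("the", "fox")));
    }
}
```

Running main prints {0=the, 16=fox, 31=the}.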
Step 2: Go through every pair of offsets from step 1 and record the number of characters the pair spans and the number of hits it contains.
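// class to hold a match
private class Match implements Comparable<Match> {
    private int x1, x2;
    private int hitCount;
    public Match(int x1, int x2, int hitCount) {
        this.x1 = x1;
        this.x2 = x2;
        this.hitCount = hitCount;
    }
    // hits per character spanned: higher is better
    private double sortValue() {
        return (double) hitCount / Math.abs(x2 - x1);
    }
    @Override
    public int compareTo(Match m) {
        double diff = this.sortValue() - m.sortValue();
        if (diff != 0.0) return (diff < 0.0) ? -1 : 1;
        // break ties on the offsets, otherwise a TreeSet silently drops distinct matches of equal density
        if (x1 != m.x1) return Integer.compare(x1, m.x1);
        return Integer.compare(x2, m.x2);
    }
}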
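// go through every combination of keys (string offsets) and record them
// the TreeSet sorts the results automatically; compareTo orders ascending,
// so iterate matches.descendingSet() to visit the best matches first
// (needs java.util.Arrays and java.util.TreeSet)
TreeSet<Match> matches = new TreeSet<Match>();
int[] keys = matchIndex.keys();
Arrays.sort(keys); // a hash map returns keys in no particular order; the hit count below assumes ascending offsets
for (int x1 = 0; x1 < keys.length; x1++)
    for (int x2 = x1 + 1; x2 < keys.length; x2++)
        matches.add(new Match(keys[x1],
                keys[x2] + matchIndex.get(keys[x2]).length(),
                1 + x2 - x1));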
Step 3: Take the list generated in step 2 and sort it by the number of hits per character.
// nicely done by the TreeSet
Step 4: Start at the top of the list from step 3 and mark each item as selected. Remember to combine overlapping results into one larger result. Stop when the next item would push the total string length past 255 (or so) characters.
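A rough sketch of step 4, assuming the candidates arrive ordered best first. The names SnippetSelector, Span, select and maxChars are invented for this example and are not part of the code above; spans are half-open, so a span covers the characters from start up to but not including end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SnippetSelector {
    // Hypothetical half-open span [start, end) in the source text.
    static final class Span {
        final int start, end;
        Span(int start, int end) { this.start = start; this.end = end; }
        int length() { return end - start; }
    }

    // rankedBestFirst: candidate spans, highest hits-per-character first.
    // Returns the selected spans, merged where they overlap, ordered by start offset.
    static List<Span> select(List<Span> rankedBestFirst, int maxChars) {
        List<Span> chosen = new ArrayList<>();
        for (Span candidate : rankedBestFirst) {
            List<Span> next = merge(chosen, candidate);
            int total = 0;
            for (Span s : next) total += s.length();
            if (total > maxChars) break; // the next item would exceed the budget: stop
            chosen = next;
        }
        return chosen;
    }

    // Inserts candidate into a sorted, non-overlapping list, absorbing every
    // span it overlaps or touches into one larger span.
    private static List<Span> merge(List<Span> sorted, Span candidate) {
        List<Span> out = new ArrayList<>();
        int start = candidate.start, end = candidate.end;
        boolean placed = false;
        for (Span s : sorted) {
            if (s.end < start) {
                out.add(s);                       // entirely before the candidate
            } else if (s.start > end) {
                if (!placed) { out.add(new Span(start, end)); placed = true; }
                out.add(s);                       // entirely after the candidate
            } else {
                start = Math.min(start, s.start); // overlapping or touching: absorb
                end = Math.max(end, s.end);
            }
        }
        if (!placed) out.add(new Span(start, end));
        return out;
    }

    public static void main(String[] args) {
        List<Span> ranked = Arrays.asList(new Span(10, 20), new Span(15, 30), new Span(100, 400));
        for (Span s : select(ranked, 255)) System.out.println(s.start + ".." + s.end);
    }
}
```

With a budget of 255, the first two spans merge into 10..30 and the third is rejected because it would bring the total to 320 characters. Measuring the budget after merging means overlapping text is only counted once.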
Step 5: Display each of the items selected in step 4 with a separator such as "..." in between. Be sure to include whatever markup is needed to highlight the search words themselves in each item.
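Step 5 could look something like the sketch below. Everything here is an assumption made for illustration: the class SnippetRenderer, the render method, the int[][] layout for the selected spans (each row is {start, end}, half-open), the " ... " separator and the <b> tags. It also does no HTML escaping, which real output would need.

```java
import java.util.Map;
import java.util.TreeMap;

final class SnippetRenderer {
    // spans: selected {start, end} pairs, half-open and ordered by start offset.
    // matchIndex: offset -> search string found at that offset (as in step 1).
    static String render(String text, int[][] spans, TreeMap<Integer, String> matchIndex) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < spans.length; i++) {
            int start = spans[i][0], end = spans[i][1];
            if (i > 0) out.append(" ... ");       // separator between snippets
            int pos = start;                       // next character still to be emitted
            // Visit every indexed hit that begins inside this span, in offset order.
            for (Map.Entry<Integer, String> hit : matchIndex.subMap(start, end).entrySet()) {
                int from = Math.max(hit.getKey(), pos); // skip text already emitted
                int to = Math.min(hit.getKey() + hit.getValue().length(), end);
                if (to <= from) continue;          // hit fully covered by an earlier one
                out.append(text, pos, from);       // plain text before the hit
                out.append("<b>").append(text, from, to).append("</b>");
                pos = to;
            }
            out.append(text, pos, end);            // plain text after the last hit
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        TreeMap<Integer, String> matchIndex = new TreeMap<>();
        matchIndex.put(0, "the");
        matchIndex.put(16, "fox");
        matchIndex.put(31, "the");
        System.out.println(render(text, new int[][] {{0, 9}, {16, 19}}, matchIndex));
    }
}
```

Running main prints <b>the</b> quick ... <b>fox</b>.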