Question
I am using the output of myisam_ftdump to build a search suggestion table. The process went smoothly, but many words appear in the index more than once. Obviously, I could just run SELECT DISTINCT term FROM suggestions ORDER BY weight, but doesn't that penalize words for appearing more than once?
If it does, is there a concise formula for merging the duplicate rows into one?
If it doesn't, which rows should I keep (for example, the one with the highest weight or the one with the lowest)?
Sample data
+-----+------------+----------+
| id  | word       | weight   |
+-----+------------+----------+
| 670 | young      | 0.416022 |
| 669 | york       | 0.54944  |
| 668 | years      | 0.281683 |
| 667 | years      | 0.416022 |
| 666 | wrote      | 0.416022 |
| 665 | written    | 0.35841  |
| 664 | writing    | 0.29518  |
| 663 | wright     | 0.281683 |
| 662 | witness    | 0.281683 |
| 661 | wiesenthal | 0.452452 |
| 660 | white      | 0.35841  |
| 659 | white      | 0.281683 |
| 658 | wgbh       | 0.369332 |
| 657 | weighs     | 0.35841  |
+-----+------------+----------+
Note especially the duplicate rows for "white" and "years".
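
To make the options concrete, here is a minimal sketch that loads a few of the rows above into an in-memory SQLite database and computes some candidate merged weights per word. SQLite is only a stand-in for MySQL here. The table name suggestions comes from the query above and the column names word and weight follow the sample table. Treating MIN, MAX, and SUM as the candidates is an assumption about what "merging" could mean, not a known-correct formula.

```python
import sqlite3

# A few rows copied from the sample data above (id, word, weight).
rows = [
    (669, "york", 0.54944),
    (668, "years", 0.281683),
    (667, "years", 0.416022),
    (660, "white", 0.35841),
    (659, "white", 0.281683),
]

# In-memory SQLite database as a stand-in for the real MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE suggestions (id INTEGER PRIMARY KEY, word TEXT, weight REAL)"
)
conn.executemany("INSERT INTO suggestions VALUES (?, ?, ?)", rows)

# One output row per word, with several candidate ways to merge duplicates.
merged = conn.execute(
    """
    SELECT word,
           COUNT(*)    AS occurrences,
           MIN(weight) AS min_weight,
           MAX(weight) AS max_weight,
           SUM(weight) AS sum_weight
    FROM suggestions
    GROUP BY word
    ORDER BY max_weight DESC
    """
).fetchall()

for row in merged:
    print(row)
```

The GROUP BY query itself should carry over to MySQL unchanged. What I don't know is which of these columns is the right one to rank by.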