Suppose I have this table:
ID | description
-------------------
5 | The bird flew over the tree.
2 | The birds, flew over the tree
These two rows have "similar" content. How can I remove the row with ID 2?
- What algorithm should be used for "similar" text?
- How do I do this with Python?
Thanks!
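For reference, here is a minimal sketch of one possible approach, using only the standard library's `difflib.SequenceMatcher`. It is not the only way to define "similar". The helper names (`normalize`, `is_similar`, `drop_near_duplicates`), the `(ID, description)` tuple layout, and the 0.9 threshold are all assumptions made for illustration and would need tuning on real data.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    # Lowercase, strip punctuation, and collapse whitespace so that
    # cosmetic differences do not affect the comparison.
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def is_similar(a, b, threshold=0.9):
    # ratio() returns a float in [0, 1]; 1.0 means identical strings.
    # The 0.9 cutoff is an arbitrary assumption, not a recommended value.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def drop_near_duplicates(rows, threshold=0.9):
    # Keep the first row of each group of similar descriptions and
    # drop any later row that is similar to one already kept.
    kept = []
    for row_id, description in rows:
        if not any(is_similar(description, k[1], threshold) for k in kept):
            kept.append((row_id, description))
    return kept


rows = [
    (5, "The bird flew over the tree."),
    (2, "The birds, flew over the tree"),
]
print(drop_near_duplicates(rows))
```

This compares every row against every row already kept, so the cost grows quadratically with the table size. For a large table, a cheaper pre-filter (for example, grouping rows by a few shared words) would probably be needed before the pairwise comparison.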