These things are not trivial, and you should provide more examples.
As already mentioned, Levenshtein distance is the way to go, but for your example you can pre-process the strings if you know you can safely drop certain words. For example, from your example it is clear that the word "gen." can be dropped.
Levenshtein distance only counts character edits, so it will rate an unrelated four-letter word such as "gene" as a near match for "gen.", which may not be what you want.
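To make the pre-processing idea concrete, here is a minimal sketch in Python. The `DROPPABLE` set, the helper names and the sample strings are my own assumptions for illustration, not something taken from your data, and the distance function is the textbook dynamic-programming version rather than any particular library's.

```python
# Hypothetical set of tokens that are safe to drop; "gen." is the one
# from the example, anything else would have to come from your own data.
DROPPABLE = {"gen."}


def preprocess(text, droppable=DROPPABLE):
    """Lowercase, split on whitespace, and drop tokens listed in `droppable`."""
    tokens = text.lower().split()
    return " ".join(t for t in tokens if t not in droppable)


def levenshtein(a, b):
    """Edit distance: minimum number of single-character inserts,
    deletes and substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]
```

With these definitions, `levenshtein("gen.", "gene")` is 1, which shows how close an unrelated word can look, while `levenshtein(preprocess("ipad 3rd gen."), preprocess("iPad 3rd"))` is 0 once the droppable token has been removed.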
In addition, if your data set comes from different data sources, you might consider creating a synonym dictionary and looking into existing standard taxonomies for your domain. Perhaps this one, for example?
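A synonym dictionary can be as simple as a lookup table applied to each token before you compare anything. The sketch below assumes a hypothetical `SYNONYMS` mapping and a hypothetical `normalize` helper; the entries are made up and would have to come from your own sources or a domain taxonomy.

```python
# Hypothetical synonym table: every known variant maps to one canonical token.
SYNONYMS = {
    "gen.": "generation",
    "gen": "generation",
}


def normalize(text, synonyms=SYNONYMS):
    """Lowercase, split on whitespace, and replace each token by its
    canonical form when one is known; unknown tokens are kept as they are."""
    tokens = text.lower().split()
    return " ".join(synonyms.get(t, t) for t in tokens)
```

Strings from different sources can then be passed through `normalize` first, so that spelling variants of the same term no longer add to the edit distance.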