You do not use Lucene to obtain similarities between texts. There are several measures available, and which one fits depends on the length of the text, the type of strings, and so on, so you will need to experiment to see which gives you the best results.
A fairly good and comprehensive set of algorithms is available in SimMetrics, an F/OSS library that offers an extensive collection of similarity algorithms and their associated cost functions.
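To give a feel for how two such measures differ, here is a minimal sketch in plain Java of a character-based measure (normalized Levenshtein) and a token-based one (Jaccard over whitespace-separated words). It does not use the SimMetrics API; the class and method names (`SimilaritySketch`, `levenshteinSimilarity`, `jaccardSimilarity`) are made up for illustration, and the normalization and tokenization choices are assumptions you would likely tune.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class for illustration; not part of SimMetrics or Lucene.
class SimilaritySketch {

    // Edit distance: minimum number of single-character insertions,
    // deletions, and substitutions needed to turn a into b.
    static int levenshteinDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1 - distance / length of the longer string.
    static double levenshteinSimilarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) levenshteinDistance(a, b) / maxLen;
    }

    // Jaccard similarity over lowercased, whitespace-separated tokens:
    // |intersection| / |union|.
    static double jaccardSimilarity(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(ta);
        intersection.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return (double) intersection.size() / union.size();
    }

    private static Set<String> tokens(String s) {
        String trimmed = s.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return new HashSet<>();
        }
        return new HashSet<>(Arrays.asList(trimmed.split("\\s+")));
    }

    public static void main(String[] args) {
        // Short strings with typos: a character-based measure is usually a better fit.
        System.out.println(levenshteinSimilarity("kitten", "sitting"));
        // Longer texts that share words: a token-based measure is usually a better fit.
        System.out.println(jaccardSimilarity("the quick brown fox", "the quick red fox"));
    }
}
```

Running both measures over a sample of your own data is a cheap way to do the experimenting mentioned above before committing to a library implementation.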