For the Lucene Highlighter to work, you need to add two fields to the document you are indexing: one with term vectors enabled and one without. For simplicity, I will show you a piece of code:
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.FieldInfo;

FieldType type = new FieldType();
type.setIndexed(true);
type.setIndexOptions(FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
type.setStored(true);
type.setStoreTermVectors(true);
type.setStoreTermVectorOffsets(true);
type.setTokenized(true);

Document doc = new Document();

// This field has term vectors enabled.
Field field = new Field("content", "This is fragment. Highlighters", type);
doc.add(field);

// The same text again, without term vectors enabled.
doc.add(new StringField("ncontent", "This is fragment. Highlighters", Field.Store.YES));
After enabling these two fields, add the document to your index.
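For completeness, indexing might look roughly like the sketch below. The IndexWriter setup here is my assumption, not part of the original answer; only the "INDEXDIRECTORY" path is reused from the search method that follows:

import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Write the document built above into the directory the searcher will read.
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_42);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_42, analyzer);
IndexWriter writer = new IndexWriter(FSDirectory.open(new File("INDEXDIRECTORY")), config);
writer.addDocument(doc); // "doc" is the document from the previous snippet
writer.close();

Now, to run the Highlighter, use the method below (it uses Lucene 4.2; I have not tested it with Lucene 4.3.1):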
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.InvalidTokenOffsetsException;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.search.highlight.TextFragment;
import org.apache.lucene.search.highlight.TokenSources;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public void highLighter() throws IOException, ParseException, InvalidTokenOffsetsException {
    IndexReader reader = DirectoryReader.open(FSDirectory.open(new File("INDEXDIRECTORY")));
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_42);
    IndexSearcher searcher = new IndexSearcher(reader);

    QueryParser parser = new QueryParser(Version.LUCENE_42, "content", analyzer);
    Query query = parser.parse("Highlighters"); // your search keyword
    TopDocs hits = searcher.search(query, reader.maxDoc());
    System.out.println(hits.totalHits);

    SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter();
    Highlighter highlighter = new Highlighter(htmlFormatter, new QueryScorer(query));

    // Iterate over the actual hits, not every document in the index.
    for (int i = 0; i < hits.scoreDocs.length; i++) {
        int id = hits.scoreDocs[i].doc;
        Document doc = searcher.doc(id);

        // Field without term vectors: the stored text is re-analyzed.
        String text = doc.get("ncontent");
        TokenStream tokenStream = TokenSources.getAnyTokenStream(searcher.getIndexReader(), id, "ncontent", analyzer);
        TextFragment[] frag = highlighter.getBestTextFragments(tokenStream, text, false, 4);
        for (int j = 0; j < frag.length; j++) {
            if ((frag[j] != null) && (frag[j].getScore() > 0)) {
                System.out.println(frag[j].toString());
            }
        }

        // Field with term vectors: the stored token stream can be reused.
        text = doc.get("content");
        tokenStream = TokenSources.getAnyTokenStream(searcher.getIndexReader(), id, "content", analyzer);
        frag = highlighter.getBestTextFragments(tokenStream, text, false, 10);
        for (int j = 0; j < frag.length; j++) {
            if ((frag[j] != null) && (frag[j].getScore() > 0)) {
                System.out.println(frag[j].toString());
            }
        }
        System.out.println("-------------");
    }
}
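To try it out, you could call the method from a small driver like the one below. The class name is hypothetical, and the commented output assumes SimpleHTMLFormatter's default <B>...</B> tags:

public class HighlighterDemo {
    public static void main(String[] args) throws Exception {
        // Assumes highLighter() is a member of this class and the
        // sample document was indexed as shown above.
        new HighlighterDemo().highLighter();
        // A matching fragment should print in roughly this shape:
        // This is fragment. <B>Highlighters</B>
    }
}

Note that TokenSources.getAnyTokenStream reuses the stored term vector when the field has one and otherwise falls back to re-analyzing the stored text, which is why both fields can be highlighted here; the term-vector field just skips the re-analysis step.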