Lucene: how to index file names

I am new to Lucene and am trying to learn the basics.

I have three files:

  • apache_empty.txt (an empty file),
  • apache.txt (contains many occurrences of the token 'apache'),
  • other.txt (contains only one token, 'apache')

When I search for 'apache', I only get apache.txt and other.txt as results, but I also want to get apache_empty.txt, which has the search word in its name.

This is how I add documents to the index:

    protected Document getDocument(File f) throws Exception {
        Document doc = new Document();
        // File body: tokenized and indexed, but not stored (Reader-based field)
        Field contents = new Field("contents", new FileReader(f));
        // Parent directory, indexed as one untokenized term
        Field parent = new Field("parent", f.getParent(), Field.Store.YES, Field.Index.NOT_ANALYZED);
        // File name, run through the analyzer
        Field filename = new Field("filename", f.getName(), Field.Store.YES, Field.Index.ANALYZED);
        // Full canonical path, indexed as one untokenized term
        Field fullpath = new Field("fullpath", f.getCanonicalPath(), Field.Store.YES, Field.Index.NOT_ANALYZED);
        // Give matches on the file name twice the weight
        filename.setBoost(2.0F);
        doc.add(contents);
        doc.add(parent);
        doc.add(filename);
        doc.add(fullpath);
        return doc;
    }

How can I get Lucene to index the file names too?

1 answer

You need a wildcard query: search for apache*, which will also match your apache_empty file. For the full syntax, see the Apache Lucene Query Parser Syntax documentation.
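The wildcard idea can be modeled without Lucene. The sketch below is a toy, not Lucene's API (the class and method names are hypothetical): it assumes each file name was indexed as a single lowercased term in a sorted term dictionary, and shows that an exact lookup of apache finds nothing while a prefix lookup finds every name that starts with apache. In Lucene itself the corresponding query would be written filename:apache* in query-parser syntax (assuming the name lives in the filename field, as in the question's code), or built programmatically as new PrefixQuery(new Term("filename", "apache")).

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model (not Lucene's API): a sorted term dictionary for a "filename" field,
// mapping each indexed term to the file names that produced it.
public class PrefixQueryDemo {

    // Assumption: each file name is indexed as one lowercased term.
    public static TreeMap<String, Set<String>> buildFilenameIndex(List<String> fileNames) {
        TreeMap<String, Set<String>> index = new TreeMap<>();
        for (String name : fileNames) {
            index.computeIfAbsent(name.toLowerCase(), k -> new TreeSet<>()).add(name);
        }
        return index;
    }

    // Exact term lookup, roughly what a plain query for "apache" does.
    public static Set<String> termQuery(TreeMap<String, Set<String>> index, String term) {
        return index.getOrDefault(term, new TreeSet<>());
    }

    // Prefix lookup, roughly what "apache*" expands to: every indexed term
    // starting with the prefix contributes its documents.
    public static Set<String> prefixQuery(TreeMap<String, Set<String>> index, String prefix) {
        Set<String> hits = new TreeSet<>();
        // Walk the sorted terms from the prefix onward; stop at the first term without it.
        for (Map.Entry<String, Set<String>> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            hits.addAll(e.getValue());
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeMap<String, Set<String>> index =
            buildFilenameIndex(List.of("apache_empty.txt", "apache.txt", "other.txt"));
        System.out.println(termQuery(index, "apache"));   // []
        System.out.println(prefixQuery(index, "apache")); // [apache.txt, apache_empty.txt]
    }
}
```

Because the toy keeps its terms sorted, the prefix lookup only scans the contiguous run of terms that share the prefix; a leading wildcard such as *apache would instead have to examine every term.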

An alternative is to configure your analyzer to treat underscores as word separators.
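The analyzer alternative can be sketched the same way. This toy tokenizer (again not Lucene's API; the class name is hypothetical) splits a name on every character that is not a letter or digit, so underscores and dots become separators and apache_empty.txt yields the terms apache, empty and txt. With terms like these in the filename field, a plain query for apache can match without a wildcard. In Lucene, SimpleAnalyzer behaves similarly, except that it splits on every non-letter, so digits also act as separators; whichever analyzer you pick should be used both when indexing and when parsing queries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model (not Lucene's API) of an analyzer that treats '_' and '.' as word separators.
public class FilenameTokenizerDemo {

    // Splits on every character that is not a letter or digit, then lowercases each piece.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // A separator ends the current token.
                tokens.add(current.toString().toLowerCase(Locale.ROOT));
                current.setLength(0);
            }
        }
        // Flush the last token if the text did not end with a separator.
        if (current.length() > 0) {
            tokens.add(current.toString().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("apache_empty.txt"));                    // [apache, empty, txt]
        System.out.println(tokenize("apache_empty.txt").contains("apache")); // true
    }
}
```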


Source: https://habr.com/ru/post/1436351/

