Lucene - Creating an index using FSDirectory

First post; long-time reader. I apologize in advance if this has already been asked here (I am also new to Lucene!). I did a lot of research and could not find a good explanation or example for my question.

First of all, I used IKVM.NET to convert the Lucene 4.9 Java libraries for use in my .NET application. I chose this route so that I can use the latest version of Lucene. No problems there.

I am trying to create a basic example to start learning Lucene and applying it to my application. I have done countless Google searches and read many articles, the Apache website, etc. My code mostly follows the example here: http://www.lucenetutorial.com/lucene-in-5-minutes.html

My question is: I do not believe that I want to use RAMDirectory, right? Since I will be indexing a database and letting users search it through the website, I decided to use FSDirectory because I did not think the index should be stored in memory.

When the IndexWriter is created, it creates new files every time (.cfe, .cfs, .si, segments.gen, write.lock, etc.). It seems to me that you would create these files once and then reuse them until the index is rebuilt?

So how do I create an IndexWriter without recreating the index files?

The code:

    StandardAnalyzer analyzer;
    Directory directory;

    protected void Page_Load(object sender, EventArgs e)
    {
        var version = org.apache.lucene.util.Version.LUCENE_CURRENT;
        analyzer = new StandardAnalyzer(version);

        if (directory == null)
        {
            directory = FSDirectory.open(new java.io.File(HttpContext.Current.Request.PhysicalApplicationPath + "/indexes"));
        }

        IndexWriterConfig config = new IndexWriterConfig(version, analyzer);

        // i found setting the open mode will overwrite the files but still creates new each time
        config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

        IndexWriter w = new IndexWriter(directory, config);
        addDoc(w, "test", "1234");
        addDoc(w, "test1", "1234");
        addDoc(w, "test2", "1234");
        addDoc(w, "test3", "1234");
        w.close();
    }

    private static void addDoc(IndexWriter w, String _keyword, String _keywordid)
    {
        Document doc = new Document();
        doc.add(new TextField("Keyword", _keyword, Field.Store.YES));
        doc.add(new StringField("KeywordID", _keywordid, Field.Store.YES));
        w.addDocument(doc);
    }

    protected void searchButton_Click(object sender, EventArgs e)
    {
        String querystr = "";
        String results = "";
        querystr = searchTextBox.Text.ToString();

        Query q = new QueryParser(org.apache.lucene.util.Version.LUCENE_4_0, "Keyword", analyzer).parse(querystr);

        int hitsPerPage = 100;
        DirectoryReader reader = DirectoryReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(reader);
        TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
        searcher.search(q, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;

        if (hits.Length == 0)
        {
            label.Text = "Nothing was found.";
        }
        else
        {
            for (int i = 0; i < hits.Length; ++i)
            {
                int docID = hits[i].doc;
                Document d = searcher.doc(docID);
                results += "<br />" + (i + 1) + ". " + d.get("KeywordID") + "\t" + d.get("Keyword") + " Hit Score: " + hits[i].score.ToString() + "<br />";
            }
            label.Text = results;
            reader.close();
        }
    }
2 answers

Yes, RAMDirectory is great for quick on-the-fly tests and tutorials, but in production you will usually want to keep your index on the file system using FSDirectory.

The reason it overwrites the index each time the writer is opened is that you set the OpenMode to IndexWriterConfig.OpenMode.CREATE. CREATE means that you want to delete any existing index at that location and start from scratch. You probably want IndexWriterConfig.OpenMode.CREATE_OR_APPEND, which opens an existing index if one is found and creates a new one otherwise.
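A minimal sketch of that change, written against the same IKVM-converted Lucene 4.9 API the question uses. The class IndexHelper, the method OpenWriter, and the indexPath parameter are hypothetical names for illustration, and Version.LUCENE_4_9 is assumed to be the constant that matches the converted build.

```csharp
using org.apache.lucene.analysis.standard;
using org.apache.lucene.index;
using org.apache.lucene.store;
using Version = org.apache.lucene.util.Version;

static class IndexHelper
{
    // Hypothetical helper: opens a writer that reuses an existing index.
    public static IndexWriter OpenWriter(string indexPath)
    {
        // Assumption: LUCENE_4_9 matches the converted Lucene build.
        var version = Version.LUCENE_4_9;
        var analyzer = new StandardAnalyzer(version);
        Directory directory = FSDirectory.open(new java.io.File(indexPath));

        var config = new IndexWriterConfig(version, analyzer);
        // CREATE_OR_APPEND: open the index if it exists, otherwise create it.
        config.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);

        return new IndexWriter(directory, config);
    }
}
```

Note that in append mode, calling addDocument with the same four test documents on every Page_Load will add duplicates on each request. It may be better to move indexing out of Page_Load, or to look at IndexWriter.updateDocument, which replaces documents matching a given term.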


One note:

You should not use LUCENE_CURRENT (it is deprecated); use a real version constant instead. You also use LUCENE_4_0 in your QueryParser. Neither is likely to cause any serious problems, but it is good to be consistent anyway.


When we use RAMDirectory, it loads the whole index, or large parts of it, into "memory", which is really virtual memory. Since physical memory is limited, the operating system may, of course, decide to swap out our large RAMDirectory. Therefore, RAMDirectory is not a good way to optimize index load times.

On the other hand, if we do not use RAMDirectory to buffer our index and use NIOFSDirectory or SimpleFSDirectory instead, we pay a different price: our code must make many system calls to the O/S kernel to copy blocks of data between the disk or file system cache and our buffers located in the Java heap. This has to be done for every search request, over and over again.

To solve all of the above problems, MMapDirectory uses virtual memory and a kernel function called "mmap" to access files on disk.
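As a rough sketch, opening the index through MMapDirectory for searching could look like the following with the IKVM-converted Lucene 4.9 API. The class SearcherHelper, the method OpenSearcher, and the indexPath parameter are hypothetical, and whether memory mapping behaves well under IKVM is an assumption that should be verified.

```csharp
using org.apache.lucene.index;
using org.apache.lucene.search;
using org.apache.lucene.store;

static class SearcherHelper
{
    // Hypothetical helper: opens a searcher over a memory-mapped index.
    public static IndexSearcher OpenSearcher(string indexPath)
    {
        // MMapDirectory reads index files through the kernel's mmap facility
        // instead of copying blocks into buffers on the Java heap.
        Directory directory = new MMapDirectory(new java.io.File(indexPath));
        DirectoryReader reader = DirectoryReader.open(directory);
        return new IndexSearcher(reader);
    }
}
```

FSDirectory.open, as used in the question, already picks the implementation it considers best for the platform, so constructing MMapDirectory explicitly is only needed when you want to force that choice.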



Source: https://habr.com/ru/post/1200277/

