I am working on a system that will need to store many documents (PDF files, Word files, etc.). I use Solr/Lucene to search for relevant information extracted from these documents, but I also need a place to store the source files so that users can open or download them.
I was thinking of several possibilities:
- file system - probably not a good idea for storing 1M documents.
- SQL database - I don't need most of its relational features, since I only need to store the binary document and its identifier, so this might not be the fastest solution.
- NoSQL database - I have no experience with them, so I'm not sure they are a good fit either. There are also a lot of them, and I don't know which one to choose.
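For reference, the SQL option as described would only need a single key-value style table: an identifier plus the raw file bytes. Below is a minimal sketch using Python's built-in `sqlite3` module. The table name `documents` and the helpers `open_store`, `put_document` and `get_document` are hypothetical, and SQLite is just a stand-in for whichever SQL database would actually be used.

```python
import sqlite3


def open_store(path=":memory:"):
    # Hypothetical minimal schema: one row per document, holding only an
    # identifier and the raw file bytes (no other relational features).
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " id TEXT PRIMARY KEY,"
        " content BLOB NOT NULL)"
    )
    return conn


def put_document(conn, doc_id, data):
    # INSERT OR REPLACE overwrites any existing document with the same id;
    # "with conn" commits the transaction on success.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO documents (id, content) VALUES (?, ?)",
            (doc_id, sqlite3.Binary(data)),
        )


def get_document(conn, doc_id):
    # Returns the stored bytes, or None if no document has this id.
    row = conn.execute(
        "SELECT content FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    return None if row is None else bytes(row[0])


if __name__ == "__main__":
    conn = open_store()
    put_document(conn, "doc-1", b"%PDF-1.4 example bytes")
    print(get_document(conn, "doc-1"))
```

Everything beyond the id lookup (joins, constraints, other columns) goes unused, which is exactly the overhead the question is worried about.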
The storage I'm looking for should be:
- fast
- scalable
- open-source (not essential, but nice to have)
In your opinion, what is the best way to store these files?