How to efficiently store hundreds of thousands of documents?

I am working on a system that will need to store many documents (PDF files, Word files, etc.). I use Solr/Lucene to search for relevant information extracted from these documents, but I also need a place to store the source files so that users can open or download them.

I was thinking of several possibilities:

  • file system - perhaps not such a good idea for storing a million documents.
  • SQL database - but I don't need most of its relational features, since I only need to store the binary document and its identifier, so this might not be the fastest solution.
  • NoSQL database - I have no experience with these, so I'm not sure they are a good fit either; there are also a lot of them, so I don't know which one to choose.

The storage I'm looking for should be:

  • fast
  • scalable
  • open source (not essential, but nice to have)

What, in your opinion, is the best way to store these files?

+3
4 answers

The file system - as the name implies - is designed and optimized to store a large number of files in an efficient and scalable way.

+5

You could follow Facebook's approach, since it stores a huge number of files (15 billion photos):

  • They initially started with NFS shares backed by commercial storage appliances.
  • They then moved to their own HTTP-based file server, called Haystack.

Facebook engineering note: http://www.facebook.com/note.php?note_id=76191543919

Be careful with NFS. NFS copes poorly with huge numbers of files in a single directory (directory lookups do not scale the way a b-tree would). Even commercial NFS appliances (NetApp) can struggle once a directory grows into the millions of entries.

Instead, shard the files across a directory hierarchy derived from their identifier. Using the ASCII digits of the id, file 1234567891 would be stored at /0012/3456/7891.

That keeps every directory small and lookups fast.
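The directory-sharding scheme above, splitting a numeric id like 1234567891 into /0012/3456/7891, can be sketched as follows (the function name and the fixed 4-digit/3-level layout are assumptions taken from the example):

```python
def shard_path(doc_id: int, width: int = 4, depth: int = 3) -> str:
    """Split a numeric id into fixed-width directory segments.

    The id is zero-padded to width * depth digits, then cut into
    `depth` chunks of `width` digits each, joined as a relative path.
    """
    digits = str(doc_id).zfill(width * depth)
    parts = [digits[i:i + width] for i in range(0, len(digits), width)]
    return "/".join(parts)

# shard_path(1234567891) -> "0012/3456/7891"
```

Each level holds at most 10**width subdirectories, so no single directory ever becomes large enough to slow lookups down.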

+1

A database is also an option.

Since you don't need relational features (and it sounds like you don't), a single table mapping the identifier to the binary document is enough.

SQLite, for example, can handle that.
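A hedged sketch of the SQLite option mentioned here: one table mapping the document identifier to its binary content. The table and column names are illustrative, not prescribed by the answer.

```python
import sqlite3

def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) a SQLite file with a single id -> blob table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents (id TEXT PRIMARY KEY, data BLOB)"
    )
    return conn

def put_document(conn: sqlite3.Connection, doc_id: str, data: bytes) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO documents (id, data) VALUES (?, ?)",
        (doc_id, data),
    )
    conn.commit()

def get_document(conn: sqlite3.Connection, doc_id: str):
    row = conn.execute(
        "SELECT data FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    return row[0] if row else None
```

Note that SQLite is single-file and embedded, so this trades scalability for simplicity; it fits the "fast and simple" criteria better than the "scalable" one.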

0

Agreed: just use the file system (as LukeH said).

0

Source: https://habr.com/ru/post/1770287/

