How to efficiently store hundreds of thousands of documents?

I am working on a system that will need to store many documents (PDF files, Word files, etc.). I use Solr/Lucene to search for relevant information extracted from these documents, but I also need a place to store the source files so that users can open or download them.

I was thinking of several possibilities:

  • file system - perhaps not such a good idea for storing a million documents.
  • SQL database - but I don't need most of its relational features, since I only need to store the binary document and its identifier, so this might not be the fastest solution.
  • NoSQL database - I have no experience with these, so I'm not sure they are a good fit either; there are also a lot of them, so I don't know which one to choose.

The storage I'm looking for should be:

  • fast
  • scalable
  • open source (not essential, but nice to have)

What, in your opinion, is the best way to store these files?

+3
4 answers

The file system - as the name implies - is designed and optimized to store a large number of files in an efficient and scalable way.

+5

You could follow Facebook's approach, since it stores a huge number of files (15 billion photos):

  • They initially started with NFS shares backed by commercial storage appliances.
  • They then moved to their own HTTP-based file server, called Haystack.

Facebook engineering note: http://www.facebook.com/note.php?note_id=76191543919

Be careful with NFS. NFS copes poorly with huge numbers of files in a single directory (directory lookups do not scale the way a b-tree would). Even commercial NFS appliances (NetApp) can struggle once a directory grows into the millions of entries.

Instead, shard the files across a directory hierarchy derived from their identifier. Using the ASCII digits of the id, file 1234567891 would be stored at /0012/3456/7891.

That keeps every directory small and lookups fast.
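The directory-sharding scheme above, splitting a numeric id like 1234567891 into /0012/3456/7891, can be sketched as follows (the function name and the fixed 4-digit/3-level layout are assumptions taken from the example):

```python
def shard_path(doc_id: int, width: int = 4, depth: int = 3) -> str:
    """Split a numeric id into fixed-width directory segments.

    The id is zero-padded to width * depth digits, then cut into
    `depth` chunks of `width` digits each, joined as a relative path.
    """
    digits = str(doc_id).zfill(width * depth)
    parts = [digits[i:i + width] for i in range(0, len(digits), width)]
    return "/".join(parts)

# shard_path(1234567891) -> "0012/3456/7891"
```

Each level holds at most 10**width subdirectories, so no single directory ever becomes large enough to slow lookups down.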

+1

A database is also an option.

Since you don't need relational features (and it sounds like you don't), a single table mapping the identifier to the binary document is enough.

SQLite, for example, can handle that.
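A hedged sketch of the SQLite option mentioned here: one table mapping the document identifier to its binary content. The table and column names are illustrative, not prescribed by the answer.

```python
import sqlite3

def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) a SQLite file with a single id -> blob table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents (id TEXT PRIMARY KEY, data BLOB)"
    )
    return conn

def put_document(conn: sqlite3.Connection, doc_id: str, data: bytes) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO documents (id, data) VALUES (?, ?)",
        (doc_id, data),
    )
    conn.commit()

def get_document(conn: sqlite3.Connection, doc_id: str):
    row = conn.execute(
        "SELECT data FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    return row[0] if row else None
```

Note that SQLite is single-file and embedded, so this trades scalability for simplicity; it fits the "fast and simple" criteria better than the "scalable" one.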

0

Agreed: just use the file system (as LukeH said).

0

Source: https://habr.com/ru/post/1770287/

