Storing documents as BLOBs in a database - any flaws?

Requirements for my document management system:

  • It should be protected from theft by someone simply copying directories or files.
  • It must be protected from traditional virus infection (infection of a physical file).
  • Quick retrieval.
  • The repository should not be directly visible to users, etc.

I decided to store all the documents (and scanned images) as BLOBs in the database, and so far my experience is excellent: document search is blazingly fast, it meets all the criteria above, and there are even a couple of extra advantages, such as documents being stored automatically together with the object they relate to, easy and fast updates to content, no user interaction needed for opening and naming documents, and so on.

My question is: are there any serious risks or things that I have overlooked with this design and implementation?

EDIT: The DB is PostgreSQL; it does a great job with BLOBs and scales well. The environment is multi-user.
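
For concreteness, a minimal sketch of what this kind of setup can look like over plain JDBC (PostgreSQL driver on the classpath). The table, column names, and connection details are illustrative assumptions, not my actual schema.

    // Minimal sketch, assuming a table like:
    //   documents(id bigserial PRIMARY KEY, owner_id bigint, filename text, content bytea)
    // Connection details, table, and column names are invented for illustration.
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class BlobStoreSketch {
        public static void main(String[] args) throws Exception {
            try (Connection con = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/docstore", "app", "secret")) {

                // Store: stream a scanned file into the bytea column, tied to its business object.
                try (InputStream in = new FileInputStream("scan-0001.pdf");
                     PreparedStatement ins = con.prepareStatement(
                         "INSERT INTO documents (owner_id, filename, content) VALUES (?, ?, ?)")) {
                    ins.setLong(1, 42L);
                    ins.setString(2, "scan-0001.pdf");
                    ins.setBinaryStream(3, in);
                    ins.executeUpdate();
                }

                // Retrieve by primary key: the fast lookup path described above.
                try (PreparedStatement sel = con.prepareStatement(
                         "SELECT filename, content FROM documents WHERE id = ?")) {
                    sel.setLong(1, 1L);
                    try (ResultSet rs = sel.executeQuery()) {
                        if (rs.next()) {
                            byte[] bytes = rs.getBytes("content");
                            System.out.println(rs.getString("filename") + ": " + bytes.length + " bytes");
                        }
                    }
                }
            }
        }
    }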

+43
performance security document blob document-management
Oct 17 '08 at 12:09
8 answers

As your database grows, backups become more and more difficult. Restoring a backup of a table with more than 100 GB of data is not a pleasant experience.

Another thing is that table maintenance operations become slower and slower as the data set grows.
This can be mitigated if the blob table contains only two columns: ID and BLOB.

Retrieving data by primary key is likely to become a problem only long after you have hit the wall with backing up the data set.
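
As a sketch of that two-column layout (all names invented, not from this answer): keep the searchable metadata in its own narrow table and give the heavy table nothing but the key and the BLOB, so routine queries, indexing, and maintenance rarely touch the large rows.

    // Hypothetical split of metadata and content (all names are illustrative).
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class SplitSchemaSketch {
        public static void main(String[] args) throws Exception {
            try (Connection con = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/docstore", "app", "secret");
                 Statement st = con.createStatement()) {

                // Searchable metadata: small rows, indexable, cheap to scan and back up.
                st.execute(
                    "CREATE TABLE IF NOT EXISTS document_meta (" +
                    "  id        bigserial PRIMARY KEY," +
                    "  owner_id  bigint NOT NULL," +
                    "  filename  text   NOT NULL," +
                    "  mime_type text   NOT NULL," +
                    "  created   timestamptz NOT NULL DEFAULT now())");

                // Content: only the key and the BLOB, accessed strictly by primary key.
                st.execute(
                    "CREATE TABLE IF NOT EXISTS document_content (" +
                    "  id      bigint PRIMARY KEY REFERENCES document_meta(id)," +
                    "  content bytea  NOT NULL)");
            }
        }
    }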

+31
Oct 18 '08 at 15:03

The main disadvantage I often hear about using blobs is that, above a certain file size, the file system is much more efficient at storing and retrieving large files than the database. It sounds like you have already taken this into account in your requirements.

There's a nice link (PDF) here that covers the pros and cons of blobs.

+28
Oct 17 '08 at 12:16

In my experience, some problems were:

  • Speed, compared with having the files directly on the file system.

  • Caching. IMO, the web server will do a better job of caching static content. The database will do a decent job too, but if it is also handling all kinds of other queries, don't expect those large documents to stay in its cache for long. You also end up transferring each file twice: once from the DB to the web server, and again from the web server to the client.

  • Memory limitations. At my last job we had a 40 MB PDF in the database and kept getting Java OutOfMemoryErrors in the log file. Eventually we realized the PDF was being read into the heap not once but TWICE (80 MB in total), thanks to a setting in Hibernate (our ORM) that keeps a copy of each loaded object in memory for change detection. Once the PDF had been streamed back to the user the heap recovered, but pulling 80 MB onto the heap at once just to push one document out was a big hit. Know your code and how memory is used! (See the streaming sketch below.)

Your web server should be able to handle most of the security concerns, and if the documents are small and the database isn't heavily loaded, I really don't see a big problem with keeping them in the database.
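
Here is a minimal sketch of the "stream, don't materialize" idea, reusing the invented documents table from the earlier sketch. Note that with a PostgreSQL bytea column the JDBC driver may still buffer the whole value on the client, so for very large files PostgreSQL's Large Object facility (or the file system) is the genuinely streaming option.

    // Copies one document to an OutputStream (e.g. the HTTP response) in small chunks
    // instead of pulling it into a single byte[] on the heap. Table and column names
    // are assumptions carried over from the earlier sketch.
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class DocumentStreamer {
        public static void streamDocument(Connection con, long docId, OutputStream out) throws Exception {
            try (PreparedStatement ps = con.prepareStatement(
                    "SELECT content FROM documents WHERE id = ?")) {
                ps.setLong(1, docId);
                try (ResultSet rs = ps.executeQuery()) {
                    if (!rs.next()) {
                        throw new IllegalArgumentException("No document with id " + docId);
                    }
                    try (InputStream in = rs.getBinaryStream("content")) {
                        byte[] buf = new byte[64 * 1024];          // 64 KB chunks
                        for (int n; (n = in.read(buf)) != -1; ) {
                            out.write(buf, 0, n);
                        }
                    }
                }
            }
        }
    }

Bypassing the ORM for downloads (or evicting the entity from the session) also avoids the extra in-memory copy that dirty checking keeps around.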

+12
Oct 18 '08 at 15:46

I was just starting to look at SQL Server 2008 FILESTREAM for BLOBs and ran into a huge limitation (IMO): it only works with integrated security. If you are not using Windows Authentication to connect to the DB server, you cannot read/write the blobs. Many application environments cannot use Windows Authentication, certainly not heterogeneous ones.

A better BLOB storage solution must exist. What are the best practices?

+4
Nov 18 '09 at 19:57

This article covers most of the issues. If you are using SQL Server 2008, check out the new FILESTREAM type, as described by Paul Randal here.

+2
Oct 17 '08 at 12:12

It depends on the database: Oracle or SQL Server? And keep one drawback in mind: restoring a single document from a backup.

+2
Oct 17 '08 at 12:22

Sorry, the answer I suggested was based on SQL Server, so the maintenance part does not apply. But plain file I/O is handled at the hardware level, and any database adds extra processing steps.

When retrieving a document, the database imposes additional overhead. With the file on disk, you are only as slow or as fast as the server's I/O. You certainly need to manage your metadata in the database, but in the end you want a UNC path that points the user at the file and then gets out of the way.

In terms of maintenance and administration, you limit yourself to a SAN when working with MS SQL Server. Solutions such as Documentum take a different approach with plain disk storage, letting you implement the storage solution of your choice.

EDIT

Let me clarify my claim: with SQL Server you have limited options once you exceed its physical storage capacity. In fact, this is one of SharePoint's biggest drawbacks - you cannot simply plug in whatever storage you like.
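
A sketch of the hybrid approach this answer argues for, with invented table, column, and path names: only metadata and a path (UNC or local) live in the database, and the bytes are served straight from disk.

    // Metadata in the DB, content on disk; table, column, and path names are illustrative.
    import java.io.OutputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class FileOnDiskSketch {
        /** Looks up the document's location in the DB, then streams it straight from disk. */
        public static void serveDocument(Connection con, long docId, OutputStream out) throws Exception {
            try (PreparedStatement ps = con.prepareStatement(
                    "SELECT storage_path FROM document_meta WHERE id = ?")) {
                ps.setLong(1, docId);
                try (ResultSet rs = ps.executeQuery()) {
                    if (!rs.next()) {
                        throw new IllegalArgumentException("No document with id " + docId);
                    }
                    // e.g. \\fileserver\docstore\2008\10\scan-0001.pdf or /srv/docstore/2008/10/scan-0001.pdf
                    Path file = Paths.get(rs.getString("storage_path"));
                    Files.copy(file, out);   // plain file I/O; the DB only supplied the pointer
                }
            }
        }
    }

The trade-off against the question's own requirements is that the files are then visible and copyable on the file system.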

0
Oct 17 '08 at 12:17

In my experience, storing content files as BLOBs, in both SQL Server and Oracle, works fine with a small database and a low number of users. ECM systems keep the content separate and use dedicated services for streaming it. Depending on file size, server performance can suffer when large files are retrieved concurrently. Backing up a database with a large set of files becomes problematic because of the restore time and the inability to pull individual documents out of the archive.

If these files are corporate records, and this is the authoritative copy of those records, you may have records-management and retention problems, especially if you archive files. In addition, search and version control can become a huge challenge going forward.

You might want to look into an ECM system with some kind of API rather than reinvent the wheel.

0
Nov 12 '15 at 18:50


