Recommendations for the mixed use of RDBMS and files in the file system

In one of the tables in the schema I'm working on, I need to store a couple of thousand “data sheets”, which are mostly PDF documents and sometimes image files such as PNG, JPG, etc. The schema models an electronics distributor portal where new products are frequently added to the portfolio.

These data sheets are added when a new product is introduced, but they need updating from time to time (because of a new version of the document, not of the product itself), so I expect updates to be an asynchronous procedure.

Given this, should I store only the file name / path to the data sheets (and similar documents) in my table, with the actual file located in the file system, or should I use the BLOB approach? I am almost sure it should be the former, but I still wanted to ask the community's advice and see whether there are any pitfalls to watch out for.

1 answer

For completeness, let me mention that some databases allow you to have a “hybrid” of these two approaches, for example Oracle BFILE or MS SQL Server FILESTREAM.

There is also an interesting discussion at Ask Tom about storing files in Oracle BLOBs (in a nutshell: "BLOBs are better than files").


By the way, you don’t have to choose one or the other. If you can afford the storage overhead and you work in a read-mostly environment, you can keep the "master" data in the BLOB for integrity, but "cache" the same data in a file for quick read-only access. Some considerations:

  • You need to make sure the file is updated / deleted if the BLOB is updated / deleted.
  • Consider creating / updating a file on request.
  • Consider evicting old files from the cache, even if the corresponding BLOBs still exist.
  • Consider using multiple "caches" (for example, if you have a middle tier spread across several physical machines, each machine may have its own file cache).
  • And finally, you need to make sure that all this works in a concurrent environment.
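
The considerations above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: it uses SQLite (via Python's standard `sqlite3` module) as a stand-in for whatever RDBMS you actually run, and the `datasheet` table, its `version` column, and the functions `open_db`, `put_datasheet`, `cached_path` and `evict_stale` are all hypothetical names made up for this example. The assumed design is that every BLOB update bumps a version number, the cache file name includes that version, and files are created on request and swapped into place atomically so concurrent readers never see a half-written file.

```python
import os
import sqlite3
import tempfile


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) master table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS datasheet ("
        " id INTEGER PRIMARY KEY,"
        " version INTEGER NOT NULL,"
        " content BLOB NOT NULL)"
    )
    return conn


def put_datasheet(conn, sheet_id, content):
    """Store the master copy in the BLOB; every update bumps the version."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "UPDATE datasheet SET version = version + 1, content = ? WHERE id = ?",
            (content, sheet_id),
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO datasheet (id, version, content) VALUES (?, 1, ?)",
                (sheet_id, content),
            )


def evict_stale(cache_dir, sheet_id, current_version):
    """Delete cached files of this data sheet that belong to older versions."""
    prefix = f"{sheet_id}-v"
    keep = f"{sheet_id}-v{current_version}.bin"
    for name in os.listdir(cache_dir):
        if name.startswith(prefix) and name != keep:
            os.remove(os.path.join(cache_dir, name))


def cached_path(conn, cache_dir, sheet_id):
    """Return a file path holding the current BLOB, creating it on request."""
    row = conn.execute(
        "SELECT version FROM datasheet WHERE id = ?", (sheet_id,)
    ).fetchone()
    if row is None:
        raise KeyError(sheet_id)
    target = os.path.join(cache_dir, f"{sheet_id}-v{row[0]}.bin")
    if os.path.exists(target):
        return target  # cache hit: the file matches the current version

    # Cache miss: read version and content together so they are consistent.
    row = conn.execute(
        "SELECT version, content FROM datasheet WHERE id = ?", (sheet_id,)
    ).fetchone()
    if row is None:
        raise KeyError(sheet_id)
    version, content = row
    target = os.path.join(cache_dir, f"{sheet_id}-v{version}.bin")

    # Write to a temporary file, then rename it into place atomically.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(content)
    os.replace(tmp, target)
    evict_stale(cache_dir, sheet_id, version)
    return target
```

Because the version is part of the file name, a stale file is never served as current; deleting a BLOB would still need a matching cleanup step, and each middle-tier machine could simply point `cache_dir` at its own local directory.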

So this is not the easiest approach, but depending on your needs, it can be a good compromise between integrity, performance, and implementation effort.


Source: https://habr.com/ru/post/1388490/