SQL text field vs. flat file vs. NoSQL document store

I plan to have a SQL fact table with a text field that I don't expect to index (I will only read the data and very rarely update it). I think this table could become quite large, primarily because of this text field. The rest of the data in my database makes sense as relational data, but I believe I could scale much more easily and cheaply if, instead of using a text field, I stored pointers to flat files (each pointer referring to a separate text file stored in something like S3).

An alternative that seems to be gaining popularity is a fully document-based NoSQL solution (e.g. CouchDB, MongoDB, etc.). What are the tradeoffs (scalability / reliability / security / performance / ease of implementation / ease of maintenance / cost) between simply using a SQL text field, pointing to flat files, and completely rethinking the entire system in terms of NoSQL document storage?

1 answer

The best approach is to use a relational database for the normal (non-text) data and to store the large (text) data "somewhere else" that handles large data better than a relational database does.

First, let's discuss why it's a bad idea to store large data in a relational database:

  • row sizes become much larger, so more disk I/O is required to read the pages that contain the requested rows
  • backup size and, more importantly, backup time grow to the point where they can disrupt DBA tasks and even take systems offline (then backups get turned off, then the disk fails, oops)
  • you usually don't need to search the text, so there is no need to keep it in the database
  • relational databases and their libraries / drivers are usually not good at handling unusually large data, and the way they handle it is often vendor-specific, which makes any solution non-portable

Your options for “somewhere else” are broad and include:

  • Large-data storage software such as Cassandra, MongoDB, etc.
  • NoSQL-style stores such as Lucene (strictly a full-text indexing library rather than a database)
  • File system

Do whatever is easiest to get working. They are all valid as long as you do your requirements calculations:

  • peak write performance
  • peak read performance
  • long-term storage
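
The three calculations above can be sketched as a back-of-the-envelope estimate. This is only an illustration: the `Workload` fields, the single `peak_factor` multiplier, and the `estimate` function are hypothetical names introduced here, and the real inputs have to come from your own measurements.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    """Hypothetical workload description; every field is an assumption."""
    writes_per_day: int    # new text blobs stored per day
    reads_per_day: int     # text blobs fetched per day
    avg_text_bytes: int    # average size of one text blob
    peak_factor: float     # how much busier the peak is than the daily average
    retention_days: int    # how long each blob is kept


def estimate(w: Workload) -> dict:
    """Return rough peak write/read rates and the long-term storage volume."""
    peak_writes = w.writes_per_day / SECONDS_PER_DAY * w.peak_factor
    peak_reads = w.reads_per_day / SECONDS_PER_DAY * w.peak_factor
    return {
        "peak_writes_per_s": peak_writes,
        "peak_write_bytes_per_s": peak_writes * w.avg_text_bytes,
        "peak_reads_per_s": peak_reads,
        "peak_read_bytes_per_s": peak_reads * w.avg_text_bytes,
        # Steady-state volume once the retention window is full.
        "stored_bytes": w.writes_per_day * w.retention_days * w.avg_text_bytes,
    }
```

Comparing these numbers against what each candidate store can sustain (and what it costs per stored gigabyte) is usually enough to rule options in or out.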

Another tip: don't store anything about the text in the relational database. Instead, name / index the text using the id of the relational database row. That way, if you change your implementation, you won't have to rework your data model.
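
A minimal sketch of that tip, assuming SQLite for the relational side and the local file system as the "somewhere else". The `fact` table, its columns, and the `TextStore` class are hypothetical names made up for illustration. The point is only that the table holds no text column and no pointer column: the text location is derived from the row id.

```python
import sqlite3
from pathlib import Path


class TextStore:
    """Hypothetical text store: one file per relational row id."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, row_id: int) -> Path:
        # The location is computed from the row id, so nothing about
        # the text needs to be recorded in the database.
        return self.root / f"{row_id}.txt"

    def put(self, row_id: int, text: str) -> None:
        self._path(row_id).write_text(text, encoding="utf-8")

    def get(self, row_id: int) -> str:
        return self._path(row_id).read_text(encoding="utf-8")


def create_schema(conn: sqlite3.Connection) -> None:
    # Only the relational (non-text) data lives here.
    conn.execute(
        "CREATE TABLE fact ("
        "id INTEGER PRIMARY KEY, category TEXT NOT NULL, amount REAL NOT NULL)"
    )


def insert_fact(conn: sqlite3.Connection, store: TextStore,
                category: str, amount: float, text: str) -> int:
    """Insert the relational row, then store the text under the new row id."""
    cur = conn.execute(
        "INSERT INTO fact (category, amount) VALUES (?, ?)", (category, amount)
    )
    row_id = cur.lastrowid
    try:
        store.put(row_id, text)
    except OSError:
        conn.rollback()  # do not keep a row whose text could not be stored
        raise
    conn.commit()
    return row_id
```

Swapping the file system for S3, Cassandra, or MongoDB would then mean replacing `TextStore` with another class that has the same `put` / `get` methods, while the `fact` table stays unchanged.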


Source: https://habr.com/ru/post/1384650/

