Column-oriented and row-oriented databases

I have used row-oriented database designs for a long time. Apart from data projects and large data samples, I have never used a column-oriented database design for an OLTP application.

A row-oriented table looks like

ID  Make  Model  Month  Miles  Cost
1   BMW   Z3     12     12000  100

Some people on our team advocate a column-oriented database design. Their idea is that every column name becomes a property name in a Property table. A second table, Quote, then has two columns: PropertyName and PropertyValue.

In the .NET code, we read each key, compare it, and convert the value into a strongly typed object. The code is getting really messy.

    if (qwi.DomainCode == typeof(CoreBO.Base.iQQConstants.MBPCollateralInfo).Name) {
        if (qwi.RefCode == iQQConstants.MBPCollateralInfo.ENGINETYPE) {
            Aspiration = qwi.Value;
        } else if (qwi.RefCode == iQQConstants.MBPCollateralInfo.FUELTYPE) {
            FuelType = qwi.Value;
        } else if (qwi.RefCode == iQQConstants.MBPCollateralInfo.MAKE) {
            Make = qwi.Value;
        } else if (qwi.RefCode == iQQConstants.MBPCollateralInfo.MILEAGE) {
            int reading = 0;
            bool success = int.TryParse(qwi.Value, out reading);
            if (success) {
                OdometerReading = reading;
            }
        }
    }

The argument for this column-oriented design is that we never have to change the table schema or the stored procedures (we still use stored procedures rather than Entity Framework).

It looks like we are heading for real trouble. Is the column-oriented design well accepted in the industry?

5 answers

I have a problem with your terminology. What you describe is an EAV (Entity-Attribute-Value) structure.

As an aside: a column-oriented database usually refers to a database in which each column is stored separately from the others (when I learned about databases this was called "vertical partitioning", but I don't think the term caught on). Examples include ParAccel and Vertica.

An entity-attribute-value database stores each attribute of an entity as a separate row.
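To make the "one row per attribute" idea concrete, here is a minimal sketch that reshapes the single car row from the question into EAV rows. The `to_eav` helper and the tuple layout are my own illustration, not anything from the asker's schema.

```python
# Hypothetical illustration: the car row from the question as a plain dict.
row = {"ID": 1, "Make": "BMW", "Model": "Z3", "Month": 12, "Miles": 12000, "Cost": 100}


def to_eav(record, key="ID"):
    """Return one (entity_id, attribute, value) tuple per non-key column."""
    entity_id = record[key]
    # Every value becomes a string, as it would in a single PropertyValue column.
    return [
        (entity_id, name, str(value))
        for name, value in record.items()
        if name != key
    ]


eav_rows = to_eav(row)
for eav_row in eav_rows:
    print(eav_row)
```

One logical record turns into five physical rows, and the numeric columns have already lost their types.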

The first problem with your particular structure is typing. Some attributes are strings and some are numbers. This becomes a management nightmare in the EAV world. Either you store everything as strings (losing type checking on the values and any guarantee that arithmetic works), or you add several value columns for the different types plus a type column (making queries much more complicated).

Similarly, constraints and foreign key references are much harder to implement. In addition, because the entity identifier and attribute identifier are repeated on every row, the data often takes up more space. By comparison, NULL values in a regular table are usually quite space efficient.

On the OLTP side, another problem arises. When you insert an entity, you usually insert a whole bunch of attributes. One insert has now turned into many inserts, and you will want to wrap them in a transaction, which affects performance.

Given all these drawbacks, you might think that you should never use EAV models. There is a place for them, though. They are especially useful when the set of attributes changes over time, for example in an application where users can attach their own information as tags. In such cases, the best solution is a hybrid approach: use a regular relational table with many columns for the common information, and an EAV table for the additional information on each entity.
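Here is a small sketch of the typing problem using Python's built-in `sqlite3` module. The table name `quote_eav` and the second car are made up for the example; only the general behaviour (text comparison versus numeric comparison) is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical EAV table: every value lives in one TEXT column.
conn.execute("CREATE TABLE quote_eav (entity_id INTEGER, attr TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO quote_eav VALUES (?, ?, ?)",
    [(1, "Miles", "12000"), (2, "Miles", "9000")],
)

# Text comparison: '9000' sorts after '10000' because '9' > '1'.
as_text = conn.execute(
    "SELECT entity_id FROM quote_eav "
    "WHERE attr = 'Miles' AND value > '10000' ORDER BY entity_id"
).fetchall()

# Casting restores numeric semantics, but every query must remember to do it.
as_number = conn.execute(
    "SELECT entity_id FROM quote_eav "
    "WHERE attr = 'Miles' AND CAST(value AS INTEGER) > 10000 ORDER BY entity_id"
).fetchall()

print(as_text)
print(as_number)
```

The text comparison wrongly returns the 9,000-mile car as well; only the cast version returns the single car that is actually over 10,000 miles.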
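A rough sketch of that difference, again with `sqlite3` and invented table names (`car`, `car_eav`). It only counts statements; it does not measure performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas: a regular table and an EAV table for the same data.
conn.execute(
    "CREATE TABLE car (id INTEGER PRIMARY KEY, make TEXT, model TEXT, "
    "month INTEGER, miles INTEGER, cost INTEGER)"
)
conn.execute("CREATE TABLE car_eav (entity_id INTEGER, attr TEXT, value TEXT)")

# Regular table: one statement stores the whole object.
with conn:
    conn.execute("INSERT INTO car VALUES (1, 'BMW', 'Z3', 12, 12000, 100)")

# EAV: one row per attribute. The "with conn" block commits them together
# (or rolls back on error) so a failure cannot leave half an object behind.
attributes = {
    "Make": "BMW",
    "Model": "Z3",
    "Month": "12",
    "Miles": "12000",
    "Cost": "100",
}
with conn:
    conn.executemany("INSERT INTO car_eav VALUES (1, ?, ?)", attributes.items())

car_count = conn.execute("SELECT COUNT(*) FROM car").fetchone()[0]
eav_count = conn.execute("SELECT COUNT(*) FROM car_eav").fetchone()[0]
print(car_count, eav_count)
```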
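A minimal sketch of such a hybrid, assuming a hypothetical `car` table for the common columns and a `car_tag` table for free-form extras. The tag names and the `load_car` helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Common, well-known attributes get real typed columns.
conn.execute(
    "CREATE TABLE car (id INTEGER PRIMARY KEY, make TEXT NOT NULL, miles INTEGER NOT NULL)"
)
# Anything user-defined goes into a small EAV side table.
conn.execute(
    "CREATE TABLE car_tag (car_id INTEGER REFERENCES car(id), "
    "name TEXT, value TEXT, PRIMARY KEY (car_id, name))"
)

with conn:
    conn.execute("INSERT INTO car VALUES (1, 'BMW', 12000)")
    conn.executemany(
        "INSERT INTO car_tag VALUES (1, ?, ?)",
        [("colour", "blue"), ("nickname", "weekend car")],
    )


def load_car(car_id):
    """Read the typed columns directly and the extras as a plain dict."""
    make, miles = conn.execute(
        "SELECT make, miles FROM car WHERE id = ?", (car_id,)
    ).fetchone()
    tags = dict(
        conn.execute("SELECT name, value FROM car_tag WHERE car_id = ?", (car_id,))
    )
    return {"make": make, "miles": miles, "tags": tags}


car = load_car(1)
print(car)
```

The typed columns keep their constraints and stay easy to query, while the side table absorbs attributes nobody knew about at design time.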


Source: Wikipedia

  • Column-oriented organizations are more efficient when an aggregate needs to be computed over many rows but only over a notably smaller subset of the columns, because reading that smaller subset of data can be faster than reading all the data.
  • Column-oriented organizations are more efficient when new values of a column are supplied for all rows at once, because that column's data can be written efficiently and replace the old column data without touching any other columns of the rows.
  • Row-oriented organizations are more efficient when many columns of a single row are needed at the same time and the row size is relatively small, because the entire row can be retrieved with a single disk seek.
  • Row-oriented organizations are more efficient when writing a new row if all of the column data is supplied at the same time, because the entire row can be written with a single disk seek.

In practice, row-oriented storage layouts are well suited to OLTP-like workloads, which are dominated by interactive transactions. Column-oriented storage layouts are well suited to OLAP-like workloads (such as data warehouses), which typically involve a smaller number of highly complex queries over all of the data (possibly terabytes).
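As a toy illustration of the two layouts, here is the same small table held once as a list of rows and once as a dict of columns. The second and third cars are invented, and in-memory Python lists only mimic the access pattern; they say nothing about real disk performance.

```python
COLUMN_NAMES = ("ID", "Make", "Model", "Month", "Miles", "Cost")

# Row layout: each record is stored together.
# Only the first row comes from the question; the others are made up.
rows = [
    (1, "BMW", "Z3", 12, 12000, 100),
    (2, "Audi", "TT", 24, 30000, 80),
    (3, "BMW", "M3", 18, 9000, 150),
]

# Column layout: each column is stored together.
columns = {name: [r[i] for r in rows] for i, name in enumerate(COLUMN_NAMES)}

# Aggregate over one column: the row layout walks every full record,
# the column layout reads a single list.
miles_index = COLUMN_NAMES.index("Miles")
avg_from_rows = sum(r[miles_index] for r in rows) / len(rows)
avg_from_columns = sum(columns["Miles"]) / len(columns["Miles"])

# Fetch one whole record: the row layout needs a single lookup,
# the column layout needs one lookup per column.
record_from_rows = rows[1]
record_from_columns = tuple(columns[name][1] for name in COLUMN_NAMES)

print(avg_from_rows, avg_from_columns)
print(record_from_rows, record_from_columns)
```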


In addition to the problems that Gordon Linoff mentions, EAV data models are also terribly difficult to query. Finding all the cars where Make is BMW, Month is between 12 and 24, and Miles meets a limit of 10,000 becomes a huge mess of nasty SQL, especially if you are doing string comparisons on numbers...
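To give a feel for it, here is a sketch of that kind of query against a hypothetical `car_eav` table in SQLite. I am assuming the mileage condition means "Miles under 10,000", and the three cars are invented test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car_eav (entity_id INTEGER, attr TEXT, value TEXT)")

# Invented sample data, stored as strings the way an EAV table would hold it.
cars = {
    1: {"Make": "BMW", "Month": "12", "Miles": "12000"},
    2: {"Make": "BMW", "Month": "18", "Miles": "9000"},
    3: {"Make": "Audi", "Month": "18", "Miles": "9000"},
}
conn.executemany(
    "INSERT INTO car_eav VALUES (?, ?, ?)",
    [(car_id, attr, value) for car_id, attrs in cars.items() for attr, value in attrs.items()],
)

# On a regular table this would be a single WHERE clause:
#   SELECT id FROM car WHERE make = 'BMW' AND month BETWEEN 12 AND 24 AND miles < 10000
# With EAV, every attribute in the filter needs its own self-join and a cast.
eav_query = """
SELECT mk.entity_id
FROM car_eav AS mk
JOIN car_eav AS mo ON mo.entity_id = mk.entity_id AND mo.attr = 'Month'
JOIN car_eav AS mi ON mi.entity_id = mk.entity_id AND mi.attr = 'Miles'
WHERE mk.attr = 'Make' AND mk.value = 'BMW'
  AND CAST(mo.value AS INTEGER) BETWEEN 12 AND 24
  AND CAST(mi.value AS INTEGER) < 10000
ORDER BY mk.entity_id
"""
matches = [r[0] for r in conn.execute(eav_query)]
print(matches)
```

Each extra filter column adds another join, which is where the mess comes from.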


In general, row-oriented and column-oriented describe the low-level (on-disk) storage layout. Which one is viable depends on your requirements. In some scenarios column-oriented storage will be better, and in others row-oriented storage.

HBase uses the concept of a column family, which is a group of columns.

The difference is that a row-oriented store keeps a logical table on disk one row per block, while a column-oriented store keeps one column per block.

Row-oriented storage performs poorly on analytic queries (e.g. an aggregate such as the average salary), but works fine when we need to access all the details of a row or insert a new record. Column-oriented storage works great for analytic queries, but performs poorly when inserting a single record or accessing all the details of a row.

This post describes the pros and cons of each in various scenarios, with an example: http://geekrandomstuff.blogspot.tw/2014/04/row-oriented-database-vs-column.html


In my experience, EAV is great for storing application settings, i.e. relatively static data that needs no joins or data conversion, and nothing more.


Source: https://habr.com/ru/post/953931/

