SQL vs NoSQL for data that will be presented to the user after applying several filters

I am about to start a project for work that is well outside my usual responsibilities. As a SQL DBA, my initial bias was to approach the project with a SQL database, but the more I learn about NoSQL, the more I think it may be the better option. I was hoping I could use this question to describe the project at a high level and get some feedback on the pros and cons of each option.

The project is relatively simple. I have a set of objects that have different attributes. Some of these attributes are common to all objects, while others are common only to a subset of objects. What I am tasked with building is a service in which the user selects a series of filters based on the objects' attributes, and a list of objects matching all the filters is returned. When the user selects a filter, he or she may be filtering on a common attribute or on a subset-specific attribute, but this distinction is abstracted away on the front end.

Depending on user feedback, there is a possibility that the list will also include objects that match only some of the filters, with the quality of each match shown to the user as a score indicating how many criteria were matched.
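A minimal sketch of the two matching modes described above (match every filter, or rank by how many filters match), independent of any particular database. The sample objects, attribute names, and function names are hypothetical and only there to illustrate the idea.

```python
from typing import Any

# Hypothetical sample data: "name" and "price" are common attributes,
# the remaining keys exist only on some objects.
OBJECTS = [
    {"name": "obj1", "price": 10, "color": "red", "weight": 5},
    {"name": "obj2", "price": 10, "color": "blue"},
    {"name": "obj3", "price": 25, "color": "red", "voltage": 220},
]


def count_matches(obj: dict[str, Any], filters: dict[str, Any]) -> int:
    """Count the filters whose attribute exists on obj with the wanted value."""
    return sum(1 for key, wanted in filters.items()
               if key in obj and obj[key] == wanted)


def strict_search(objects, filters):
    """Return only the objects that satisfy every filter."""
    return [o for o in objects if count_matches(o, filters) == len(filters)]


def ranked_search(objects, filters):
    """Return (name, matched, total) for objects matching at least one filter,
    best matches first."""
    scored = [(count_matches(o, filters), o) for o in objects]
    scored = [(s, o) for s, o in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort
    return [(o["name"], s, len(filters)) for s, o in scored]


filters = {"price": 10, "color": "red"}
print([o["name"] for o in strict_search(OBJECTS, filters)])
print(ranked_search(OBJECTS, filters))
```

With about 5000 objects, even this in-memory approach would be workable; the storage question is mostly about how easily new attributes can be added later.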

After watching this talk by Martin Fowler ( http://www.youtube.com/watch?v=qI_g07C_Q5I ), it looks like a document-style NoSQL database should fit my needs, but given that I have no experience with this approach, it is also possible that I am missing something obvious.

Additional information: initially, the database will contain about 5000 objects, each with 10 to 50 attributes, but the number of objects will grow over time and the number of attributes may increase depending on user feedback. In addition, I hope to be able to make changes to the product quickly as I receive user feedback, so flexibility is very important.

Any feedback would be greatly appreciated, and I would be happy to provide additional information if I left anything critical out of my description. Thank you.

+6
3 answers

This question can be answered either way. I should say up front that I'm not strong in NoSQL, so I lean towards SQL.

I would do it as a set of three tables. You will see this called the entity-attribute-value (EAV) model on the Internet; it is a way of handling multiple dynamic attributes for items. Let's say you have a bunch of products and each has a few attributes.

Prd 1 - a,b,c
Prd 2 - a,d,e,f
Prd 3 - a,b,d,g
Prd 4 - a,c,d,e,f

So, here are 4 products and 7 attributes (a through g); the same theory works for hundreds of products and thousands of attributes. The standard way of keeping this in one table requires the product information along with 7 columns for storing the data (in this setup, every row has at least two of them NULL). Adding a new attribute means altering the table to add another column, and then either writing a script to populate the existing rows or leaving the column NULL for all of them. Not much fun, and potentially a headache.

An alternative is to use name-value pairs. You want a header table that holds the values common to all your products (for example name or price, anything every product always has). In the example above, you will notice that attribute "a" is used by every record, which means attribute a can also be part of the header table. We will call the key column here 'header_id'.

The second table is a lookup table that simply stores the attributes that can be assigned to a product and gives each one an identifier. We will call this table attribute, with attr_id as the key. It is rather straightforward: each attribute above will be one row.

Quick example:

 attr_id, attribute_name, notes
 1, b, the length of time the product takes to install
 2, c, spare part required
 etc...

This is just a list of all your attributes and what each attribute means. In the future, you add a row to this table to make a new attribute available to every header record.

The final table is a mapping table that actually holds the data. It has your product id, the attribute id, and then the value. It is commonly called a detail table:

 prd1, b, 5 mins
 prd1, c, needs spare jack
 prd2, d, 'misc text'
 prd3, b, 15 mins

See how the data is stored as product key, value label, value? Any future product can have any combination of attributes stored in this table. To add a new attribute, add a new row to the attribute table, and then populate the detail table as needed.
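The three tables described above can be sketched as follows, using Python's built-in sqlite3 module so the example is self-contained (the same SQL should carry over to Postgres with minor changes). The table and column names follow the answer; the `search` helper, the "free-form notes" description, and the `require_all` flag are assumptions added for illustration. The query counts how many attribute=value filters each product matches, which covers both "match all filters" and the partial-match score mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE header (
    header_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL            -- common attributes live here
);
CREATE TABLE attribute (
    attr_id        INTEGER PRIMARY KEY,
    attribute_name TEXT NOT NULL UNIQUE,
    notes          TEXT
);
CREATE TABLE detail (
    header_id INTEGER NOT NULL REFERENCES header(header_id),
    attr_id   INTEGER NOT NULL REFERENCES attribute(attr_id),
    value     TEXT NOT NULL,
    PRIMARY KEY (header_id, attr_id)   -- one value per product and attribute
);
""")
conn.executemany("INSERT INTO header VALUES (?, ?)",
                 [(1, "prd1"), (2, "prd2"), (3, "prd3")])
conn.executemany("INSERT INTO attribute VALUES (?, ?, ?)", [
    (1, "b", "the length of time the product takes to install"),
    (2, "c", "spare part required"),
    (3, "d", "free-form notes"),       # hypothetical description
])
conn.executemany("INSERT INTO detail VALUES (?, ?, ?)", [
    (1, 1, "5 mins"), (1, 2, "needs spare jack"),
    (2, 3, "misc text"),
    (3, 1, "15 mins"),
])


def search(conn, filters, require_all=True):
    """Return (product name, number of matched filters) rows.

    filters maps attribute_name -> wanted value. With require_all=False,
    products matching at least one filter are returned, best matches first.
    """
    if not filters:
        return []
    clause = " OR ".join("(a.attribute_name = ? AND d.value = ?)"
                         for _ in filters)
    params = [x for pair in filters.items() for x in pair]
    params.append(len(filters) if require_all else 1)
    sql = f"""
        SELECT h.name, COUNT(*) AS matched
        FROM header h
        JOIN detail d    ON d.header_id = h.header_id
        JOIN attribute a ON a.attr_id   = d.attr_id
        WHERE {clause}
        GROUP BY h.header_id, h.name
        HAVING COUNT(*) >= ?
        ORDER BY matched DESC, h.name
    """
    return conn.execute(sql, params).fetchall()


print(search(conn, {"b": "5 mins", "c": "needs spare jack"}))
print(search(conn, {"b": "5 mins", "d": "misc text"}, require_all=False))
```

Note that every value is stored as text here, which is a common trade-off of this model: numeric or range filters need casting or separate typed value columns.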

I believe there is also a Wikipedia article on it: http://en.wikipedia.org/wiki/Entity-attribute-value_model

After that, it is just a matter of working out the best way to query your data (I would recommend Postgres as an open-source database option).

+1

This problem can be solved using two separate pieces of technology. First, use a well-designed database schema with a modern DBMS. By modeling the application using standard normalization principles, you will get really good response times from the data store for individual CRUD operations.

Searching this schema, you guessed it, will be a nightmare at scale. Do not do it. Instead, look at Solr / Lucene as a full-text search engine. Solr's support for dynamic fields means that you can add new properties to your documents / objects on the fly and immediately be able to search on them, provided you have designed your Solr schema correctly.
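As a rough sketch of how dynamic fields could be used for this: each attribute name gets a type suffix, and each user filter becomes one `fq` (filter query) parameter. This assumes the Solr schema declares dynamic field patterns such as `*_s` (string) and `*_i` (integer); the suffix mapping, helper names, and sample attributes are hypothetical, and the code only builds the document and query string without contacting a Solr server.

```python
from urllib.parse import urlencode

# Hypothetical mapping from Python type to dynamic-field suffix; it assumes
# the Solr schema defines matching patterns (*_s for strings, *_i for ints).
SUFFIXES = {str: "_s", int: "_i"}


def field_name(name, value):
    """Attribute name plus the suffix for the value's type."""
    return name + SUFFIXES[type(value)]


def to_solr_doc(obj_id, attributes):
    """Turn an attribute dict into a flat document using suffixed field names."""
    doc = {"id": obj_id}
    for name, value in attributes.items():
        doc[field_name(name, value)] = value
    return doc


def filter_query_string(filters):
    """Build a query string with q=*:* and one fq parameter per filter."""
    params = [("q", "*:*")]
    for name, value in filters.items():
        if isinstance(value, str):
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            term = f'"{escaped}"'      # quote string values
        else:
            term = str(value)
        params.append(("fq", f"{field_name(name, value)}:{term}"))
    return urlencode(params)


print(to_solr_doc("obj1", {"color": "red", "price": 10}))
print(filter_query_string({"color": "red", "price": 10}))
```

A new attribute then needs no schema change as long as its type already has a suffix pattern, which is the flexibility the answer refers to.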

+3

I am not an expert in NoSQL, so I will not argue for it. However, I have a few points that may help you with your questions about structuring this in a relational database.

The first thing I notice right away is that you are describing inheritance (at least conceptually). Your objects inherit from each other, so derived objects have additional attributes. Suppose you add a new type of object: the first thing you need to do (conceptually) is find a base / super (parent) object type for it that has a subset of its attributes, and then add the rest on top (extending the base object type).

Once you get used to the idea stated above, the next topic is inheritance mapping patterns for relational databases. I will steal the terms from Martin Fowler to describe them here.

You can store the inheritance chain in the database using one of three approaches:

1 - Single Table Inheritance. The whole inheritance chain lives in one table, so all new object types go into the same table.

Advantages: your search query has only one table to search, which should be faster than a join, for example.

Disadvantages: the table grows faster than, for example, with option 2; you need to add a type column that says what type of object the row is; some rows have empty columns because those columns belong to other object types.

2 - Concrete Table Inheritance. A separate table for each new type of object.

Advantages: if the search involves only one type, you search only one table; each table grows more slowly than, for example, in option 1.

Disadvantages: you need to combine queries (for example with UNION) when searching across multiple types.

3 - Class Table Inheritance. One table for the base object type with its attributes, plus additional tables holding the extra attributes for each child object type. The child tables reference the base table through PK / FK relationships.

Advantages: all types are present in the base table, so they are easy to search together using common attributes.

Disadvantages: the base table grows rapidly, since it holds a row for every object of every child type; you need to use joins to search all object types with all their attributes.
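To make the three options concrete, here is a small sketch using Python's built-in sqlite3 module. The two object types (book and lamp), their attributes, and all table names are hypothetical; the point is only to show what each layout looks like and what a filter query has to do in each case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 1. Single Table Inheritance: one wide table plus a type column.
CREATE TABLE st_object (
    id    INTEGER PRIMARY KEY,
    type  TEXT NOT NULL,
    name  TEXT NOT NULL,
    price INTEGER NOT NULL,
    pages INTEGER,            -- only set for books
    watts INTEGER             -- only set for lamps
);
INSERT INTO st_object VALUES (1, 'book', 'Atlas', 30, 200, NULL),
                             (2, 'lamp', 'Desk lamp', 30, NULL, 40);

-- 2. Concrete Table Inheritance: one complete table per type.
CREATE TABLE ct_book (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                      price INTEGER NOT NULL, pages INTEGER NOT NULL);
CREATE TABLE ct_lamp (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                      price INTEGER NOT NULL, watts INTEGER NOT NULL);
INSERT INTO ct_book VALUES (1, 'Atlas', 30, 200);
INSERT INTO ct_lamp VALUES (2, 'Desk lamp', 30, 40);

-- 3. Class Table Inheritance: a base table plus one child table per type.
CREATE TABLE base_object (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                          price INTEGER NOT NULL);
CREATE TABLE book (id INTEGER PRIMARY KEY REFERENCES base_object(id),
                   pages INTEGER NOT NULL);
CREATE TABLE lamp (id INTEGER PRIMARY KEY REFERENCES base_object(id),
                   watts INTEGER NOT NULL);
INSERT INTO base_object VALUES (1, 'Atlas', 30), (2, 'Desk lamp', 30);
INSERT INTO book VALUES (1, 200);
INSERT INTO lamp VALUES (2, 40);
""")

# Filter on a common attribute (price = 30) across all types.
single = conn.execute(
    "SELECT name FROM st_object WHERE price = 30 ORDER BY id").fetchall()
concrete = conn.execute("""
    SELECT id, name FROM ct_book WHERE price = 30
    UNION ALL
    SELECT id, name FROM ct_lamp WHERE price = 30
    ORDER BY id
""").fetchall()  # one SELECT per type has to be combined
class_table = conn.execute(
    "SELECT name FROM base_object WHERE price = 30 ORDER BY id").fetchall()

# Filter on a type-specific attribute (pages >= 100): option 3 needs a join.
books = conn.execute("""
    SELECT b.name FROM base_object b
    JOIN book k ON k.id = b.id
    WHERE k.pages >= 100
""").fetchall()

print(single, concrete, class_table, books)
```

In this sketch, adding a new object type means a new nullable column in option 1, a new standalone table plus another UNION branch in option 2, and a new child table in option 3.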

Which one to choose?

It is a trade-off, obviously. If you expect to add many object types, I would go with Concrete Table Inheritance, which gives reasonable query performance and scaling. Class Table Inheritance doesn't seem very friendly to fast queries and scalability. Single Table Inheritance seems to work well with a small number of types.

Your call, my friend!

+2

Source: https://habr.com/ru/post/956282/

