So ... this thing is NoSQL

Question

I have been looking at MongoDB and I am fascinated. It seems (although I should be suspicious) that in exchange for organizing my database a little differently, I get as much performance as I have processors and RAM for, for free? It seems elegant and flexible, and I do not appear to give up the rapid development I get with Rails. So what is the catch? What does a relational database give me that I cannot do as well, or at all, with Mongo? In other words, why (apart from the immaturity of existing NoSQL systems and resistance to change) doesn't the entire industry jump ship from MySQL?

As I understand it, as you scale you end up having MySQL feed Memcache. Now it seems that I can start with something equally performant from the very beginning.

I know that I can't do transactions across relationships ... when would this be a big problem?

I read http://teddziuba.com/2010/03/i-cant-wait-for-nosql-to-die.html, but as far as I understand, his argument is basically that real enterprises using real tools have no need to avoid SQL, so people who feel the need to ditch it are doing it wrong. But no “enterprise” deals with anywhere near as many simultaneous users as Facebook or Google, so I don't see his point. (Walmart has 1.8 million employees; Facebook has 300 million users.)

I am sincerely interested in this ... I promise that I am not trolling.

+46

mongodb nosql

jacobbaer Jul 06

8 answers

I am writing this not as an answer, but as a rebuttal to Rex's answer.

I dispute the idea that NoSQL is inherently loose and fuzzy.

I worked with CODASYL many years ago, with C and COBOL. Entity relationships are very strict in CODASYL.

In contrast, relational database systems have a very liberal relationship policy. As long as you can identify a foreign key, you can create a relationship.

It is often taken for granted that SQL is synonymous with RDBMS, but people write SQL drivers for CODASYL, XML, inverted sets, etc.

RDBMS/SQL does not equate to precision of data or relationships. In fact, RDBMSs are a constant cause of inaccuracies and misperceptions. I do not see how an RDBMS offers better data and relationship integrity than, for example, Hadoop. Put a JDO layer on top, and we can build a network of good, clean relationships between entities in Hadoop.

However, I like working with SQL because it lets me script ad hoc relationships, although I understand that ad hoc relationships are a constant cause of corrupted relationships and other problems.

Having worked on statistical analysis of business and manufacturing processes, I found that SQL let me explore relationships where none had previously been identified. That statistical work also gave me insights that SQL programmers do not usually come across.

For example, you design and normalize your schema to reflect a set of processes. What you may not realize is that relationships change over time. Statistical analysis would show that the schema can no longer be "properly normalized" the way it once was, because the principal components of the processes have mutated over time. But non-statistical programmers do not understand this and continue to tout RDBMSs as the ideal solution for data integrity and relationship accuracy.

In a relationship-linking database, however, you can link objects in relationships as they appear. When the relationships mutate, the links naturally mutate with the data. The relationships and their mutations are documented in the database system itself, without the expensive need to renormalize a schema. At that point, the RDBMS is only good as a temporary database.

But then you may counter that an RDBMS also lets you mutate your relationships flexibly, since that is what SQL does. True, very true, as long as you stay in BCNF or even 4NF. Otherwise you will start to see your queries and data loaders performing redundant operations. But your long years in the RDBMS business should at least have taught you that BCNF is very expensive and operationally inefficient, and that we are constantly guilty of 2.5NF-ing our schemas.

To say that RDBMS and SQL promote data and relationship integrity is a gross misstatement. Either you work for a very small company or you have never stayed in one position for more than two years; otherwise you would have seen the amount of data and information mutation, and the problems an RDBMS causes. Misuse of RDBMSs is the reason managers' views are constrained by computer applications, and the reason for the financial failure of companies that do not see changes in market behavior, because their view was limited by programmers whose own view was limited by reverence for their favorite RDBMS schemas.

That is why SQL programmers do not understand why the company's statisticians refuse to use the application you developed so carefully and instead have a college intern write SQL to load data onto their personal servers, and why the company's managers trust the accountants and statisticians rather than your elegant multi-tier applications: your applications cannot mutate along with the processes.

This may not be possible, but I still urge you to get some statistical insight to understand how processes mutate over time so that you can make the right technological decision.

The reason people don't switch to SQL-less technology is the lack of a good scripting environment, like SQL, for ad hoc relationship analysis. It is not that SQL-less technology lacks accuracy or integrity. Ad hoc relationship analysis is very important these days, given the rapid and agile application development strategies we now have.

+14

Blessed Geek Jul 6 '10 at 4:15

Let me take your questions one at a time:

I know that I can't do transactions across relationships ... when would this be a big problem?

Cascading deletes. Or even just basic referential integrity. The concept of "foreign keys" cannot really be applied across "collections" (the Mongo term for tables). Writes are atomic for a single "document" (AKA record) only. So if the database hits a problem partway through a multi-document change, you can end up with orphaned data.
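To make the orphaning problem concrete, here is a minimal sketch. It does not talk to MongoDB at all: the two "collections" are plain Python dicts, and the names (`users`, `orders`, `delete_user`) are made up for illustration. The only assumption carried over from the answer is that each single write is atomic and nothing ties two writes together.

```python
# Toy stand-ins for two collections; plain dicts, not a real database driver.
users = {1: {"name": "alice"}, 2: {"name": "bob"}}
orders = {
    101: {"user_id": 1, "total": 30},
    102: {"user_id": 1, "total": 12},
    103: {"user_id": 2, "total": 99},
}


def delete_user(user_id, fail_midway=False):
    """Delete a user, then their orders, as independent single-document writes."""
    del users[user_id]  # step 1: atomic on its own
    if fail_midway:
        # Simulated crash or lost connection between the two steps.
        raise RuntimeError("connection lost before orders were deleted")
    for order_id in [k for k, o in orders.items() if o["user_id"] == user_id]:
        del orders[order_id]  # step 2: one atomic write per order


def find_orphaned_orders():
    """Return the ids of orders whose user no longer exists."""
    return sorted(k for k, o in orders.items() if o["user_id"] not in users)


try:
    delete_user(1, fail_midway=True)
except RuntimeError:
    pass  # the application sees an error, but step 1 has already happened

orphans = find_orphaned_orders()
print(orphans)
```

With a foreign key and a cascading delete inside one transaction, the relational database would either remove the user and the orders together or leave both in place. Here the cleanup has to be done by the application, for example with a periodic job that runs something like `find_orphaned_orders`.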

Am I getting as much performance as I have processors and RAM for free?

Not for free, but definitely with a different set of trade-offs. For example, Mongo is great at single-item, key/value-style queries. However, Mongo is poor at relational queries; for many of those you will need to use map-reduce. Mongo is a RAM hog. Mongo basically requires 64-bit for any significant data set. Mongo will eat disk space: load a 140 GB database and you can end up using 200+ GB as the page file grows during use.

And you still need a fast drive.

In fact, I think it's safe to say that MongoDB is really a database system that caters to modern hardware (64-bit, lots of RAM, SSDs). The whole database is built around looking up index data in RAM (hello 64-bit) and then doing targeted random seeks on disk (hello SSD).

why ... doesn't the entire industry jump ship from MySQL?

It is not ACID compliant. That is probably pretty bad for a banking system (of course, most of them still process flat files, but that is another problem). Note, however, that you can force "safe" writes with Mongo and ensure that data gets to disk, but only one "document" at a time.
It is still very young. Many large companies are still running older versions of Crystal Reports against SQL Server 2000 with an application written in VB6. Or they are building enterprise buses to manage the crazy heterogeneous environments they have built up over the years.
It is a very different paradigm. Perhaps 30% of the questions I regularly see on the Mongo mailing lists (and here) are basically "how do I do query X?" or "how do I structure this data?". Using MongoDB usually requires denormalizing up front. That is not only a little difficult, it is also not taught. Most people only learn to "normalize" at school; nobody teaches us how to denormalize.
It is not the right tool for everything. Honestly, I think MongoDB is a great tool for reading and writing transactional data: the simple, one-item-at-a-time CRUD that makes up much of many modern applications. However, MongoDB is actually not very good at reporting. I honestly believe the next step is not Mongo for everything, but Mongo for transactional data and MySQL for reporting. Once your data gets so big that you give up on "real-time reports", using map-reduce to populate a reporting database does not look so bad.
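The last point, using map-reduce to populate a separate reporting store, can be sketched in a few lines. This is plain Python rather than Mongo's own map-reduce facility, and the order documents, field names and function names are hypothetical; it only shows the shape of the technique: emit a key and a value per document, group by key, then fold each group into one report row.

```python
from collections import defaultdict

# Hypothetical order documents, as they might sit in the transactional store.
order_docs = [
    {"customer": "alice", "day": "2010-07-05", "total": 30},
    {"customer": "bob", "day": "2010-07-05", "total": 99},
    {"customer": "alice", "day": "2010-07-06", "total": 12},
]


def map_order(doc):
    """Map step: emit one (key, value) pair per document."""
    yield doc["day"], doc["total"]


def reduce_totals(key, values):
    """Reduce step: fold all values emitted for one key into a single row."""
    return {"day": key, "orders": len(values), "revenue": sum(values)}


def map_reduce(docs, mapper, reducer):
    """Group mapped values by key, then reduce each group, in key order."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return [reducer(key, values) for key, values in sorted(grouped.items())]


# These rows are what a batch job would insert into the reporting database.
report_rows = map_reduce(order_docs, map_order, reduce_totals)
for row in report_rows:
    print(row)
```

The rows come out pre-aggregated, so the reporting side (MySQL in the answer's suggestion) only needs simple queries over a small table instead of scanning the full transactional data.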

As I understand it, as you scale you end up having MySQL feed Memcache. Now it seems that I can start with something equally performant from the very beginning.

Honestly, I am working on this in several of my projects. Again, I think MongoDB really does work as a decent caching layer. In effect, it is a file-backed cache. So if you are able to feed your MySQL data into Mongo, you get Memcached without the cache misses. It also makes it easier to "warm the cache" on a new server: just copy the files over and start Mongo pointing at the right folder. It really is that simple.

+10

Gates VP Jul 07 '10 at 5:11

How often do you think Facebook runs arbitrary queries against its data store? Not everything is a web application and, conversely, not all data needs to be analyzed in depth.

NoSQL, in my opinion, is largely a reaction to people using RDBMSs for tasks they were not really suited to, because people did not make an active decision based on their needs and simply went with the default. For the entire industry to "jump ship from MySQL" (or from RDBMSs in general) would be to repeat the same mistake, and the pendulum would eventually swing back the other way.

If MongoDB works for your use case, by all means use it. Just don't assume that your use case is every use case. No technology fits all scenarios. The invention of supersonic jets did not eliminate the use of freight trains.

+7

Logan Capaldo Jul 06 '10 at 2:40

Much of the backlash against NoSQL is rooted in the mentality of many NoSQL supporters, in particular the attitude best summed up as "SQL is too complicated, I shouldn't have to deal with it." I don't like NoSQL because in many cases it seems to celebrate ignorance.

I know that I can't do transactions across relationships ... when would this be a big problem?

More often than you might expect. There are many things that can go wrong when you cannot assume a consistent dataset.
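As a small illustration of what a multi-row transaction buys you, here is a sketch using Python's built-in `sqlite3` module. The `accounts` table and the `transfer` helper are hypothetical; the point is only that two updates either both take effect or neither does, so readers never see a half-applied change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between two rows; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first on purpose, so a failed debit has something to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

Without the transaction, the failed second call would leave bob credited with money that was never debited from alice, and the application would have to detect and repair that state itself.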

+2

Kalium Jul 06 '10 at 2:08

I have used MongoDB, Redis (more than key-value pairs: it supports lists, sets and sorted sets), Tokyo Tyrant, Memcached, MySQL and PostgreSQL.

The arguments between NoSQL databases and SQL databases are completely baseless. You need to choose the right model for your use case. If you need ACID compliance, stay with a SQL database such as PostgreSQL, Oracle, etc. If you need high performance and care less about the data, you can consider a NoSQL database. These are fundamentally different technologies. You can even use a combination of models. With NoSQL you have no relationships, no constraints, and sometimes no transactions. In fact, that is the reason NoSQL is faster.

I once lost two months of aggregated data with MongoDB, and I still don't know how. But I had a backup, so I only lost a few minutes of data. I restored MongoDB from the backup. If you are using NoSQL, take backups, or schedule cron jobs to back up your database. The same applies to SQL databases.

Compared to SQL RDBMSs, NoSQL databases are younger and still under active development, but they have matured in their own niche: they are designed for high performance and easy replication.

On my website (stacked.in) I use only Redis, and it is much faster than MySQL.

+2

user90150 Jul 6 '10 at 2:23

Remember that NoSQL is not entirely new. After all, people had to use something before SQL and relational databases, right? In fact, systems like MUMPS and CODASYL have been working the same way for decades. What relational databases give you is the ability to query your data arbitrarily.

Say you have a database of customers, their purchases, and the items in each purchase. A NoSQL database might have customers containing purchases and purchases containing items. That makes it easy to find which items a given customer purchased, but hard to find which customers purchased a given item. A relational database would have tables for customers, purchases and items, plus a table linking items to purchases. In SQL, both queries are trivial to formulate, and the database engine does all the hard work for you.
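Here is a rough sketch of that customers/purchases/items example, with the same made-up data stored both ways. The nested version uses plain Python dicts as a stand-in for a document store, and the relational version uses the built-in `sqlite3` module; all table, field and function names are assumptions for illustration.

```python
import sqlite3

# Document-style layout: purchases nested in customers, items nested in purchases.
customers = [
    {"name": "alice", "purchases": [{"items": ["lamp", "desk"]}, {"items": ["chair"]}]},
    {"name": "bob", "purchases": [{"items": ["desk"]}]},
]


def items_for_customer(name):
    """Easy direction: find the customer and follow the nesting."""
    for customer in customers:
        if customer["name"] == name:
            return sorted({i for p in customer["purchases"] for i in p["items"]})
    return []


def customers_for_item(item):
    """Hard direction: nothing to follow, so scan every purchase of every customer."""
    return sorted(
        c["name"] for c in customers if any(item in p["items"] for p in c["purchases"])
    )


# Relational layout: the same data in four tables.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchases (id INTEGER PRIMARY KEY,
                        customer_id INTEGER REFERENCES customers(id));
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase_items (purchase_id INTEGER REFERENCES purchases(id),
                             item_id INTEGER REFERENCES items(id));
INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
INSERT INTO purchases VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO items VALUES (1, 'lamp'), (2, 'desk'), (3, 'chair');
INSERT INTO purchase_items VALUES (1, 1), (1, 2), (2, 3), (3, 2);
""")

# Both questions use the same joins; only the SELECT and WHERE columns change.
JOINS = """
FROM customers c
JOIN purchases p ON p.customer_id = c.id
JOIN purchase_items pi ON pi.purchase_id = p.id
JOIN items i ON i.id = pi.item_id
"""


def sql_items_for_customer(name):
    query = "SELECT DISTINCT i.name " + JOINS + " WHERE c.name = ? ORDER BY i.name"
    return [row[0] for row in db.execute(query, (name,))]


def sql_customers_for_item(item):
    query = "SELECT DISTINCT c.name " + JOINS + " WHERE i.name = ? ORDER BY c.name"
    return [row[0] for row in db.execute(query, (item,))]


print(items_for_customer("alice"), sql_items_for_customer("alice"))
print(customers_for_item("desk"), sql_customers_for_item("desk"))
```

In the nested layout the second question costs a full scan unless you maintain a second, denormalized copy keyed by item. In the relational layout the two queries are symmetric, and an index on `purchase_items` lets the engine answer either one efficiently.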

Also, keep in mind that part of the NoSQL trend is to sacrifice consistency or reliability for speed, scalability, and cost. Relational databases can scale, but it's not cheap. If you go to http://tpc.org , you can find RDBMS that run on hundreds of cores at the same time to deliver millions of transactions per minute, but they cost millions of dollars.

+2

Gabe Jul 06

If your data does not call for relational algebra and you do not need ACID guarantees, you gain nothing from using tools that were designed specifically for those purposes.

0

Arafangion Jul 06

Rex M · Accepted Answer · 2010-07-06 02:03

I am also a big fan of MongoDB. That said, it is absolutely not a wholesale replacement for an RDBMS. Facebook has 300 million users, but if some of your friends do not show up in a list one time, or one of the albums is missing from a random request, would you notice? Probably not. If your status update does not propagate to all your friends for a few minutes, does it matter? Hardly. If Wal-Mart's balance sheet is out of sync, will heads roll? Definitely.

NoSQL databases are excellent in "fuzzy" environments where relationships are not strict and the data can afford to be out of sync. RDBMSs are still important when datasets are extremely complex and relational (hence the name) and must be kept clean.

The big push for NoSQL comes from the fact that for the past 30 years we have used RDBMS systems for both scenarios. We now have a more suitable tool for many situations. Some would even argue most. But no one would argue all.
