Does MySQL Cluster catch up with Cassandra?

Question


I recently looked at NoSQL solutions for our fairly large future database and found that Cassandra looks good, but there are very few resources on the newer Cassandra releases on the Internet. Most blogs and articles cover version 0.6, even though it has since added support for Hadoop and Hive. On the other hand, MySQL Cluster is also specifically designed to scale horizontally on commodity servers.

Since we have been accustomed to the relational model for many years, moving to Cassandra will require rewiring our brains, while the product is still not very mature and the community is not large enough to respond quickly to a specific problem. I checked DataStax (one of the professional support providers), and their forums are pretty much dead.

So, how do MySQL Cluster and Cassandra compare, setting the relational vs. non-relational debate aside?

Although Cassandra is schema-less, it still offers quite a few table-like features, such as supercolumns and secondary indexes, so a record can be found by the values of several columns.

I also tried hard to find out how Cassandra physically stores updates. For example, when a column in a row is edited and a fairly large piece of data is added, how is that record physically stored, and how quickly can it be reached afterwards? In MySQL, columns are allocated a fixed length, so this is not a big problem.


mysql cassandra cluster-computing

Gary lindahl Aug 22 '11 at 10:36


4 answers

Theodore hong · Answer 1 · 2011-08-23T10:04:19+0000

To answer the question about physical storage: the key feature that makes Cassandra's writes fast is that they are append-only. That is, Cassandra only writes sequential blocks to disk; it does not need to perform slow seeks to random disk locations while writing.

When a column is updated, two things happen: the write is appended to the commit log (for crash recovery) and the in-memory Memtable is updated. When the Memtable is full, it is flushed to disk as a new SSTable. Thus, the length of the data does not matter, since you are not trying to fit it into a fixed-length on-disk structure.

SSTables are read-only: you never go back and overwrite the old value when updating, you just write new ones. On a read, Cassandra first looks in the Memtable for the key. If it does not find it there, Cassandra scans the SSTables in order from newest to oldest and stops when it finds the key. This gives you the latest value.
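The write and read paths described above can be sketched as a toy in-memory model. This is only an illustration of the idea, not Cassandra's actual code: the class name `ToyLsmStore`, the entry-count flush threshold, and the use of plain maps in place of on-disk files are all assumptions made for the example. It also follows this answer's simplified "stop at the first hit" read.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the write and read paths; names are hypothetical, not Cassandra's.
class ToyLsmStore {
    private final int memtableLimit;  // assumed flush threshold, counted in entries
    private final List<String> commitLog = new ArrayList<>();  // append-only log
    private TreeMap<String, String> memtable = new TreeMap<>();
    // Immutable "SSTables", oldest first. Real SSTables are files on disk.
    private final List<Map<String, String>> sstables = new ArrayList<>();

    ToyLsmStore(int memtableLimit) {
        this.memtableLimit = memtableLimit;
    }

    void put(String key, String value) {
        commitLog.add(key + "=" + value);  // 1. append to the commit log
        memtable.put(key, value);          // 2. update the in-memory Memtable
        if (memtable.size() >= memtableLimit) {
            flush();
        }
    }

    // A full Memtable becomes a new read-only SSTable; nothing is overwritten.
    private void flush() {
        sstables.add(Collections.unmodifiableMap(memtable));
        memtable = new TreeMap<>();
    }

    String get(String key) {
        String value = memtable.get(key);  // check the Memtable first
        if (value != null) {
            return value;
        }
        // Then scan SSTables from newest to oldest and stop at the first hit.
        for (int i = sstables.size() - 1; i >= 0; i--) {
            value = sstables.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;  // key not found anywhere
    }

    int sstableCount() {
        return sstables.size();
    }
}
```

Because a value's length never has to fit a preallocated slot, a large update is just another appended entry; the cost shows up on reads, which may have to touch several SSTables.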

There are several optimizations. Each SSTable has an associated Bloom filter for its keys, which is a compact probabilistic index that can produce false positives but never false negatives. If the key is not in the Bloom filter, you can safely skip that SSTable because it does not contain the key, although occasionally you will read an SSTable that you did not need to.
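A minimal Bloom filter sketch shows why false negatives are impossible: a lookup returns "absent" only if at least one bit that an insert would have set is still clear. The class name `ToyBloomFilter`, the sizes, and the simple double-hashing scheme built on `String.hashCode()` are assumptions for illustration; Cassandra's own filter implementation differs.

```java
import java.util.BitSet;

// Toy Bloom filter; hypothetical names and a deliberately simple hash scheme.
class ToyBloomFilter {
    private final BitSet bits;
    private final int size;       // number of bits in the filter
    private final int hashCount;  // number of bit positions set per key

    ToyBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1;  // forced odd so successive positions differ
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(key, i));
        }
    }

    // false: the key was definitely never added, so the SSTable can be skipped.
    // true: the key may be present, or this is a false positive.
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }
}
```

In the read path, a `false` answer lets the reader skip an SSTable without touching disk, while a `true` answer still requires checking the SSTable itself.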

When you get too many SSTables, they are merged into a larger one in a process called compaction. Essentially, this is a big merge sort over the SSTables. This allows Cassandra to reclaim space for values that have been overwritten or deleted, and to defragment rows that have been spread across multiple SSTables.
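Compaction can be sketched as merging several sorted maps so that the newest value for each key wins and deleted entries are dropped. This is a simplified illustration under stated assumptions: `ToyCompaction` and the `TOMBSTONE` marker are hypothetical names, the inputs are in-memory maps rather than files, and deletion markers are discarded immediately, whereas Cassandra keeps them for a configurable grace period first.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy compaction over in-memory "SSTables"; names are hypothetical.
class ToyCompaction {
    // Assumed marker for a deleted value in this sketch.
    static final String TOMBSTONE = "<deleted>";

    // Input must be ordered oldest first so that later tables overwrite earlier ones.
    static TreeMap<String, String> compact(List<Map<String, String>> sstablesOldestFirst) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (Map<String, String> table : sstablesOldestFirst) {
            merged.putAll(table);  // newer values replace overwritten ones
        }
        // Reclaim space held by deleted entries.
        merged.values().removeIf(TOMBSTONE::equals);
        return merged;  // one sorted table replaces the many inputs
    }
}
```

After compaction, a read for any key touches a single table instead of several, which is the "defragmenting" effect mentioned above.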

See http://www.mikeperham.com/2010/03/13/cassandra-internals-writing/ and http://wiki.apache.org/cassandra/MemtableSSTable for more information.

jbellis · Answer 2 · 2011-08-22T16:33:04+0000

Here are some areas where I suspect Cassandra has an advantage:

Great support for large datasets
Replication: Cassandra supports an arbitrary number of fully distributed replicas instead of just split replicas (so you don't need the number of nodes to be divisible by your replica count in Cassandra, and there are no corner cases to handle around primary failover), best-in-class multi-datacenter support, support for synchronous as well as asynchronous replication (important if you are concerned about full durability), and robust self-healing (hinted handoff, read repair, anti-entropy) to make sure that you never have to blow away a replica and rebuild it from scratch.
No lock during ALTER TABLE, index creation, etc.
Significantly simpler and less error-prone administration (compare http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-online-add-node.html and http://wiki.apache.org/cassandra/Operations#Bootstrap). In particular, I would like to draw your attention to how many clients or other nodes need to be restarted in the Cassandra scenario: none.

To elaborate on the last point, most people who have not actually run Cassandra in a multi-node cluster do not realize how well Cassandra was designed for this. For a two-minute taste, see Jake Luciani's demo.

Mat keep · Answer 3 · 2011-08-22T12:13:21+0000

Disclaimer first: I work as part of the MySQL Cluster product group.

If you are looking at Cluster, you should start with the latest Development Milestone Release, 7.2, which includes new features that significantly improve JOIN performance, as well as a new memcached interface that bypasses the SQL layer: http://dev.mysql.com/tech-resources/articles/mysql-cluster-labs-dev-milestone-release.html

If you are already familiar with MySQL, the following documentation describes the differences between InnoDB and the current GA release of Cluster, 7.1: http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-ndb-innodb-workloads.html

Although these do not provide direct comparisons with Cassandra, they at least give you the latest Cluster information on which to base any comparison.

Dean hiller · Answer 4 · 2012-09-05T12:18:12+0000

Another option these days is a relational model on Cassandra with playORM. As long as you partition your really large tables, you can do joins and everything else you are familiar with, using Scalable SQL, like so:

@NoSqlQuery(name="findJoinOnNullPartition", query="PARTITIONS p(:partId) select p FROM TABLE as p INNER JOIN p.security as s where s.securityType = :type and p.numShares = :shares"),

NOTE: TABLE is a Trades table, and p.security refers to a Security table. Trades is partitioned, so it can have unlimited partitions; the Security table is smaller, so it is not partitioned. You can still do all the Scalable SQL joins you want.
