Cassandra Client Java API

I recently started working with the Cassandra Database. Now I am in the process of evaluating which Cassandra client we should go forward with.

I have seen various posts on Stack Overflow about which client to use for Cassandra, but none of them has a definitive answer.

My team asked me to do some research on this and come up with specific pros and cons for each of the Cassandra client APIs in Java.

As I mentioned, I only recently started working with Cassandra, so I don't have much idea why some people choose the Pelops client, why some go with Astyanax, and why others pick other clients.

I know a little about each of the Cassandra clients, by which I mean I can get them working and start reading from and writing to a Cassandra database.

Below is the information that I have so far.

CASSANDRA APIS

  • Hector (Production-Ready)
    The most stable of the Java APIs, ready for prime time.

  • Astyanax (The Up and Comer)
    The new Java API from Netflix. It is not as widely used as Hector, but it is solid.

  • Kundera (NoSQL ORM)
    JPA-compliant, which is convenient when you want to interact with Cassandra through objects.
    This limits you somewhat in that you cannot have a dynamic number of columns / names, etc. But it does allow you to port ORM-based code over, or centralize storage onto Cassandra for more traditional uses.

  • Pelops
    I have only used Pelops briefly. It was a straightforward API, but it did not seem to have much momentum behind it.

  • PlayORM (ORM without limits?)
    I have only just heard about this one. It looks like it is trying to solve the impedance mismatch between traditional JPA-based ORMs and NoSQL by introducing JQL. It looks promising.

  • Thrift (Avoid me!)
    This is the "low level" API.

Below are our priorities when choosing a Cassandra client -

  • The first priorities are: low overhead, async APIs, and reliability / stability for the production environment.
    (A more user-friendly API, for example, can be provided by the DAL that wraps the client.)
  • Connection pooling and partition awareness are other good features to have.
  • The ability to detect any new nodes that have been added.
  • Good support (as below).
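
To make the DAL point above concrete, here is a minimal sketch of what such a wrapper could look like. Everything in it is an assumption for illustration: `UserDao`, `InMemoryUserDao` and their methods are hypothetical names, and the in-memory map only stands in for whichever Cassandra client ends up being chosen.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical DAL contract: callers see only this interface, never the client library.
interface UserDao {
    CompletableFuture<Void> saveName(long userId, String name);
    CompletableFuture<Optional<String>> findName(long userId);
}

// In-memory stand-in for a Cassandra-backed implementation (assumption for illustration).
final class InMemoryUserDao implements UserDao {
    private final Map<Long, String> rows = new ConcurrentHashMap<>();

    @Override
    public CompletableFuture<Void> saveName(long userId, String name) {
        // A real implementation would issue an asynchronous write through the chosen client here.
        return CompletableFuture.runAsync(() -> rows.put(userId, name));
    }

    @Override
    public CompletableFuture<Optional<String>> findName(long userId) {
        // A real implementation would issue an asynchronous read here.
        return CompletableFuture.supplyAsync(() -> Optional.ofNullable(rows.get(userId)));
    }
}

final class UserDaoDemo {
    public static void main(String[] args) {
        UserDao dao = new InMemoryUserDao();
        // Chain the write and the read without blocking in between.
        dao.saveName(7L, "ada")
           .thenCompose(ignored -> dao.findName(7L))
           .thenAccept(name -> System.out.println(name.orElse("<missing>")))
           .join();
    }
}
```

With this shape, swapping one client for another only means writing a new `UserDao` implementation, which is why a raw but fast client can still be acceptable underneath.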

Can anyone share some thoughts on this? Pros and cons for each Cassandra client, and which one would meet my requirements, would be very helpful.

Based on my research so far, I think I will mainly be choosing between the Astyanax client and the new DataStax client, which uses the binary protocol. But I do not have concrete information to back this up and present to my team.

Any comparison between the Astyanax client and the new DataStax client (which uses the new binary protocol) would be very useful.

It would help my research a lot, and I would learn a great deal from people who have used the different clients.

+45
java cassandra hector astyanax pelops
Apr 13 '13 at 1:25
5 answers

Thrift is becoming an increasingly obsolete API:

First, you should be aware that the Thrift API will not receive new features; it is there for backward compatibility and is not recommended for new projects.
- the paul

Therefore, I would avoid Thrift-based APIs (Thrift is only supported for backward compatibility).

That said, if you do need to use a Thrift API, I would go for Astyanax. Astyanax is very easy to use compared to other Thrift APIs, although in my personal experience the DataStax driver is even easier.

So you should take a look at the DataStax API (and the GitHub repo). I'm not sure whether there are any compiled versions of the API to download, but you can easily build it with Maven. Also, the GitHub commit log shows that it is updated very frequently.

The driver works exclusively with CQL3 and is asynchronous, but be warned that Cassandra 1.2 is the earliest supported version.

Performance
Astyanax is Thrift-based and the DataStax driver uses the binary protocol. Here are the latest benchmarks I could find comparing Thrift and CQL (note that they are definitely out of date). In all likelihood, though, the small performance difference shown in these benchmarks will rarely matter.

Async Support
The DataStax driver is asynchronous, whereas Astyanax is not (its developers tried to implement it, but decided not to).

Documentation
I can't really fault the Netflix wiki. The documentation is excellent and updated quite often. Their wiki includes code examples, and you can find tests in the source code if you need to see the code at work. I struggled to find documentation for the DataStax driver, however tests are provided in the GitHub repository, so that is a starting point.

Also see this answer (well... not mine, anyway). It addresses some of the advantages and disadvantages of Thrift and CQL.

+23
Apr 13 '13 at 19:00

I would recommend the DataStax Java driver for Cassandra: http://www.datastax.com

For JPA support, try my mapping tool. http://valchkou.com/cassandra-driver-mapping.html

No mapping files, no scripts, no configuration files. No need for DDL scripts. The schema is automatically synchronized with the entity definition.

Usage example:

  Entity entity = new Entity();
  mappingSession.save(entity);
  entity = mappingSession.get(Entity.class, id);
  mappingSession.delete(entity);

Available on Maven Central:

  <dependency>
    <groupId>com.valchkou.datastax</groupId>
    <artifactId>cassandra-driver-mapping</artifactId>
  </dependency>
+8
Jan 19 '14 at 6:13

I would also add good support as a criterion. We answer PlayORM questions on Stack Overflow all the time ;). PlayORM is also about to start supporting MongoDB (the work is almost done), so any client code can run on MongoDB or Cassandra. It has its own query language, so that port works fine. You always have access to the raw Astyanax interface when you really need speed.

Also, regarding your note on async: Thrift did not previously support async, so no client did either, because they all build on the generated Thrift code. That has since changed, but I do not know of any client that has added async support.

I know HBase has an asynchronous client. In any case, I just thought I would add my 2 cents in case it helps a bit.

EDIT: I was recently in the cassandra-thrift source code, and it is not a good API for asynchronous development: it has send and recv() methods, but you don't know when to call recv. Aaron Morton posted a blog entry on the Cassandra user list about how you can actually do it, but it is not clean at all. You need to grab the selector from deep inside Thrift and do some work so that you know when to call the recv method. Pretty nasty stuff.

later, Dean

+3
Apr 13 '13 at 17:42

I used Hector, Astyanax and Thrift directly. I also used the Python client PyCassa.

The features that I found important and differentiating were:

  • API ease of use
  • Composite column support
  • Connection pooling
  • Latency
  • Documentation

One of the main issues is type safety. You want to be able to pass in longs, strings, byte[], etc. Both Hector and Astyanax solve this with Serializer objects. In Astyanax, you specify them higher up the chain, so you have to specify them less often. In Hector, the syntax is often very awkward and hard to adapt if you change the schema.
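
To illustrate the idea of serializer objects specified once "up the chain", here is a minimal plain-JDK sketch. It is not the actual Hector or Astyanax API: `Serializer`, `LongSerializer`, `StringSerializer` and `ColumnFamilySketch` are hypothetical names chosen for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical interface for illustration; not the real Hector or Astyanax type.
interface Serializer<T> {
    ByteBuffer toBytes(T value);
    T fromBytes(ByteBuffer bytes);
}

final class LongSerializer implements Serializer<Long> {
    @Override
    public ByteBuffer toBytes(Long value) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        buf.putLong(value);
        buf.flip(); // make the written bytes readable from position 0
        return buf;
    }

    @Override
    public Long fromBytes(ByteBuffer bytes) {
        return bytes.duplicate().getLong(); // duplicate() leaves the caller's position untouched
    }
}

final class StringSerializer implements Serializer<String> {
    @Override
    public ByteBuffer toBytes(String value) {
        return ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public String fromBytes(ByteBuffer bytes) {
        return StandardCharsets.UTF_8.decode(bytes.duplicate()).toString();
    }
}

// The serializers are fixed once on the column family handle,
// so individual reads and writes never have to name them again.
final class ColumnFamilySketch<K, C> {
    final String name;
    private final Serializer<K> keySerializer;
    private final Serializer<C> columnSerializer;

    ColumnFamilySketch(String name, Serializer<K> keySerializer, Serializer<C> columnSerializer) {
        this.name = name;
        this.keySerializer = keySerializer;
        this.columnSerializer = columnSerializer;
    }

    ByteBuffer encodeKey(K key) { return keySerializer.toBytes(key); }
    ByteBuffer encodeColumn(C column) { return columnSerializer.toBytes(column); }
    K decodeKey(ByteBuffer raw) { return keySerializer.fromBytes(raw); }
    C decodeColumn(ByteBuffer raw) { return columnSerializer.fromBytes(raw); }
}
```

With a handle such as `new ColumnFamilySketch<>("users", new LongSerializer(), new StringSerializer())`, the compiler rejects a `String` row key at the call site, and a schema change only touches the one place where the handle is declared.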

Since Python has dynamic types, PyCassa is much easier to deal with. Since it is not an option for you, I will not say much about it, other than that it was easier to use (of course), but also quite slow.

Support for composite columns in Hector is very confusing. Astyanax has annotations that greatly simplify this.

As far as I know, connection pooling is about the same in Hector and Astyanax. Both will avoid downed hosts and discover new ones added to the ring. Both of these features are critical to reliability and maintainability. Pelops seems to have these features too, but I never tried it.
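
As a rough sketch of what "avoid downed hosts and pick up new ones" means at the pool level, here is a simplified round-robin host selector in plain Java. It is an illustration only, not how Hector or Astyanax implement their pools, and `HostPoolSketch` with its methods is a hypothetical name.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified illustration: round-robin over known hosts, skipping any marked as down.
final class HostPoolSketch {
    private final CopyOnWriteArrayList<String> hosts = new CopyOnWriteArrayList<>();
    private final Set<String> down = ConcurrentHashMap.newKeySet();
    private final AtomicInteger cursor = new AtomicInteger();

    // Called when ring discovery reports a node; duplicates are ignored.
    void addHost(String host) { hosts.addIfAbsent(host); }

    // Called when a request to the host fails or a health check times out.
    void markDown(String host) { down.add(host); }

    // Called when a later health check succeeds again.
    void markUp(String host) { down.remove(host); }

    // Returns the next live host, trying each known host at most once.
    String nextHost() {
        int n = hosts.size();
        for (int i = 0; i < n; i++) {
            String candidate = hosts.get(Math.floorMod(cursor.getAndIncrement(), n));
            if (!down.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no live hosts available");
    }
}
```

A real pool would also keep several open connections per host and retry failed requests on another host; the sketch leaves both out.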

The key difference between Astyanax and Hector is latency optimization. Astyanax has the ability to route read and write requests to a replica node, potentially avoiding an additional network hop. This can reduce latency by a few milliseconds.

Finally, Astyanax used to have poor documentation, but it has improved a lot.
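
The routing idea can be sketched with a sorted map standing in for the token ring. This is a simplified illustration under stated assumptions, not Astyanax's implementation: `TokenRingSketch` is a hypothetical name, `tokenFor` uses `String.hashCode()` as a stand-in for the cluster's real partitioner, and only the primary owner of a token is considered, not the full replica set.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified token ring: each node owns the range (previous token, its own token].
final class TokenRingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(long token, String host) { ring.put(token, host); }

    // Stand-in hash for illustration; a real client must use the cluster's partitioner.
    static long tokenFor(String partitionKey) { return partitionKey.hashCode(); }

    // The owner is the first node whose token is >= the key's token, wrapping around the ring.
    String ownerOf(long token) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in ring");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(token);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Sending the request straight to this host saves the coordinator-to-replica hop.
    String coordinatorFor(String partitionKey) { return ownerOf(tokenFor(partitionKey)); }
}

final class TokenRingDemo {
    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        ring.addNode(0L, "node-a");
        ring.addNode(100L, "node-b");
        ring.addNode(200L, "node-c");
        System.out.println(ring.ownerOf(50L));  // token 50 falls in (0, 100]
        System.out.println(ring.ownerOf(201L)); // past the last token, wraps to the first node
    }
}
```

A client without this knowledge sends the request to any node, which then forwards it to a replica; that forwarding is the extra hop being avoided.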

Hector's only advantage that I see today is that it is much more widely used, so it’s probably less buggy. But Astyanax has a better feature set.

+2
Apr 16 '13 at 11:47

I have a similar recommendation to Valchkou's: the DataStax Java CQL driver is very good. I tried Astyanax, Kundera and PlayORM. Astyanax is very low level and somewhat complex. Kundera and PlayORM are generic ORMs for NoSQL databases and are difficult to configure and get started with.

The DataStax APIs are pretty much like a JDBC driver: you need to embed CQL statements in your DAO and write a few lines of code to load and save your objects. To solve this problem, I wrote a Java object mapper, cassandra-jom, built around the DataStax CQL driver. The cassandra-jom annotations are very similar to JPA / Hibernate annotations, and it can even create / update the column family schema from your object model. It is easy to use and reliable, and I use it in my other web applications. Check it out on the GitHub page.

https://github.com/w3cloud/cassandra-jom

+1
Oct 02 '14 at 18:14


