Efficient way to store a JSON string in a Cassandra column?

Cassandra newbie question. I am collecting data from a social networking site using REST calls, so the data comes back in JSON format. The JSON is just one of the columns in my table. I am trying to figure out what the "best practice" is for storing a JSON string. At first I thought about using the map type, but the JSON contains a mix of strings, numeric types, etc., and it doesn't look like I can declare wildcard types for a map key/value. The JSON string can be quite large, possibly more than 10 KB. I could store it as a string, but that seems inefficient. I would assume this is a common task, so I am sure there are some general recommendations on how to do it. I know that Cassandra has built-in JSON support, but from what I understand it is mostly used when the entire JSON map matches the database schema 1-1. That is not my case: there are a bunch of columns in the schema, and the JSON string is just a kind of "payload." Is it better to store the JSON string as a blob or as text? BTW, the Cassandra version is 2.1.5. Any hints appreciated. Thank you in advance.

+5
2 answers

In the Cassandra storage engine there really isn't much difference between blob and text, since Cassandra essentially stores text as blobs. And yes, the "native" JSON support you are talking about only applies when your data model matches your JSON model, and it is only available in Cassandra 2.2+.

I would store it as a text type. You should not need to implement anything to compress your JSON data when sending it (or to handle decompression), because the Cassandra binary protocol supports transport compression. Also, make sure your table stores its data compressed with the same compression algorithm (I suggest LZ4, since it is the fastest algorithm implemented) to save on doing compression for each read request. Thus, if you configure the table to store compressed data and enable transport compression, you do not need to implement either yourself.

You did not specify which client driver you are using, but here is the documentation on how to configure transport compression for the DataStax Java driver.
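
As a rough illustration of the setup described above, here is a minimal sketch that uses the DataStax Python driver (`cassandra-driver`) instead of the Java one. The keyspace `social`, the table `posts`, its columns, and the helper function names are all hypothetical, and the keyspace is assumed to exist already. The `sstable_compression` option name is the Cassandra 2.x form; check the syntax for your version.

```python
# Hypothetical table: a few regular columns plus a JSON payload stored as text.
# 'sstable_compression' is the Cassandra 2.x option name for on-disk compression.
CREATE_TABLE_CQL = """
CREATE TABLE IF NOT EXISTS social.posts (
    post_id uuid PRIMARY KEY,
    author  text,
    payload text
) WITH compression = {'sstable_compression': 'LZ4Compressor'}
"""

INSERT_CQL = "INSERT INTO social.posts (post_id, author, payload) VALUES (%s, %s, %s)"


def connect_with_lz4(contact_points=("127.0.0.1",)):
    """Open a session that requests LZ4 transport compression.

    Assumes the third-party packages cassandra-driver and lz4 are installed.
    """
    from cassandra.cluster import Cluster  # imported lazily: external dependency

    cluster = Cluster(list(contact_points), compression="lz4")
    return cluster.connect()


def store_payload(session, post_id, author, payload_json):
    """Insert the JSON string as-is; no client-side compression is done here."""
    session.execute(INSERT_CQL, (post_id, author, payload_json))
```

With this arrangement the application code only ever handles plain JSON strings; compression on the wire and on disk is left to the driver and the server.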

+8

It depends on how you want to query your JSON. There are 3 possible strategies:

  • Save as string
  • Save as compressed blob
  • Save as blob

Option 1 has the advantage of being human-readable when you query the data on the command line with cqlsh, or when you want to debug the data directly on a live system. The downside is the size of the JSON column (10 KB).

Option 2 has the advantage that the JSON payload is small, because text compresses quite well. The disadvantages are: a. you need to take care of compression/decompression on the client side, and b. the data is not directly human-readable.

Option 3 has the disadvantages of both option 1 (size) and option 2 (not human-readable).
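
To give an idea of what the client-side work in option 2 looks like, here is a small sketch using only the Python standard library. The helper names `pack_json` and `unpack_json` and the sample payload are made up for illustration; zlib is just one possible codec, and the bytes returned by `pack_json` are what would be written to a blob column.

```python
import json
import zlib


def pack_json(obj):
    """Serialize to compact JSON and compress; the result goes in a blob column."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw)


def unpack_json(blob):
    """Reverse of pack_json: decompress, then parse the JSON."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Made-up payload with a repetitive structure, loosely like an API response.
sample = {
    "items": [
        {"id": i, "user": "user%d" % i, "text": "hello world"}
        for i in range(200)
    ]
}

raw_size = len(json.dumps(sample, separators=(",", ":")).encode("utf-8"))
blob = pack_json(sample)
print("raw bytes:", raw_size, "compressed bytes:", len(blob))
```

How much is saved depends heavily on the actual payload, so it is worth measuring with real data before committing to this option.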

+2

Source: https://habr.com/ru/post/1246185/

