Cassandra: difference between TEXT (VARCHAR) and ASCII

Question

I understand that text and varchar are aliases, and that both store UTF-8 strings. What about ascii, which the documentation describes as a "US-ASCII character string"? What is the difference besides the encoding?

Is there any difference in size? Is there a preferred choice between the two when I store large strings (~500 KB)?

+4

string cassandra utf-8 ascii cql

Teddy ding Jul 10 '17 at 16:54


1 answer

ruhul · Accepted Answer · 2017-07-11T04:22:25+0000

According to this answer:

If the data is a piece of text, such as a String in Java, it is encoded as UTF-16 at runtime, but when it is serialized in Cassandra with the text type, UTF-8 is used. UTF-16 uses 2 bytes per character, and sometimes 4 bytes, while UTF-8 is space-efficient and uses 1, 2, 3 or 4 bytes depending on the character.
This means there is CPU work to encode and decode such data when it is serialized. Also, depending on the text, for example 158786464563, the data will be stored with 12 bytes, which means more space and more I/O are used.
Note: Cassandra offers an ascii type that follows the US-ASCII character set and always uses 1 byte per character.
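
The byte counts in the quoted passage can be checked with a small Python sketch. It relies only on the standard library (`str.encode` and `struct.pack`). The helper name `encoded_sizes` and all sample strings except the number are made up for illustration. It measures raw encoded payloads only, not Cassandra's on-disk format, which adds its own overhead. The final line assumes the quoted passage is contrasting text storage with a 64-bit integer; that reading is a guess.

```python
import struct

def encoded_sizes(s):
    """Return the encoded length in bytes of s under each encoding.

    The "ascii" entry is None when s contains non-ASCII characters,
    i.e. when the value cannot be represented in US-ASCII at all.
    """
    sizes = {
        "utf-8": len(s.encode("utf-8")),
        # utf-16-le avoids counting the 2-byte byte-order mark
        "utf-16": len(s.encode("utf-16-le")),
    }
    try:
        sizes["ascii"] = len(s.encode("ascii"))
    except UnicodeEncodeError:
        sizes["ascii"] = None
    return sizes

# Illustrative samples (only the number comes from the quoted passage)
for sample in ["158786464563", "h\u00e9llo", "\u65e5\u672c\u8a9e", "\U0001F600"]:
    print(repr(sample), encoded_sizes(sample))

# Assumption: the same number stored as a signed 64-bit integer takes 8 bytes
print(len(struct.pack(">q", 158786464563)))
```

For ASCII-only input the ascii and UTF-8 sizes are identical; the two only diverge once non-ASCII characters appear, at which point ASCII encoding is no longer possible.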

Is there any difference in size?

Yes

Is there a preferred choice between the two when storing large strings (~500 KB)?

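ascii is at least as space-efficient as UTF-8, and UTF-8 is generally more space-efficient than UTF-16. Beyond that, it depends on the serialization/encoding/decoding involved. See "what-is-the-advantage-of-choosing-ascii-encoding-over-utf-8".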
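
For the ~500 KB case raised in the question, a quick check along these lines may help. This is a sketch only: `fits_ascii` and `payload` are hypothetical names, the payload is synthetic, and the sizes are raw encoded lengths rather than what Cassandra writes to disk.

```python
def fits_ascii(s):
    """True if every character is in the US-ASCII range (code points 0-127)."""
    return s.isascii()  # str.isascii() requires Python 3.7+

# Synthetic 500,000-character payload of ASCII-only text (illustrative data)
payload = "abcdefghij" * 50_000

print(fits_ascii(payload))           # can this value be stored as ASCII?
print(len(payload.encode("ascii")))  # encoded size as US-ASCII
print(len(payload.encode("utf-8")))  # encoded size as UTF-8

# A single non-ASCII character means the value is no longer valid US-ASCII
print(fits_ascii(payload + "\u00e9"))
```

If the data is guaranteed to be ASCII-only, both encodings produce the same number of bytes, so size alone does not favor either type; the practical difference is that the ascii type is limited to US-ASCII characters.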
