Cassandra: difference between TEXT (VARCHAR) and ASCII

I understand that text and varchar are aliases that both store UTF-8 strings. What about ascii, which the documentation describes as a "US-ASCII character string"? What is the difference besides the encoding?

Is there any difference in size? Is there a preferred choice between the two when I store large strings (~500 KB)?

1 answer

Regarding this answer:

If the data is a piece of text, such as a String in Java, it is encoded in UTF-16 at runtime, but when it is serialized in Cassandra with the text type, UTF-8 is used. UTF-16 uses 2 bytes for most characters and 4 bytes for characters outside the Basic Multilingual Plane, whereas UTF-8 is more space-efficient for mostly-Latin text and, depending on the character, uses 1, 2, 3 or 4 bytes.

This means that CPU time is spent converting between the two encodings whenever such data is serialized or deserialized. Also, depending on the content, text can be a wasteful representation: a value such as 158786464563 stored as text takes 12 bytes (one per digit), compared with 8 bytes for a bigint. This means that more space and more I/O are used.

Note: Cassandra offers an ascii type that follows the US-ASCII character set and always uses 1 byte per character.
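
The byte counts quoted above can be checked with a small sketch that uses only the JDK. It does not call Cassandra or its driver; it assumes that the text type stores the plain UTF-8 bytes of the string and that the ascii type stores one byte per US-ASCII character, as described above. The class name EncodedSizes and its helper methods are hypothetical names chosen for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class EncodedSizes {
    // Bytes needed when the string is encoded as UTF-8
    // (assumed to match what a text/varchar column stores).
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed when the string is encoded as UTF-16 (big-endian, no BOM),
    // i.e. the size of the Java in-memory char sequence.
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // True if every character is in the US-ASCII range, so the value
    // could also be stored in an ascii column at one byte per character.
    static boolean isPureAscii(String s) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file itself pure ASCII.
        String[] samples = {
            "158786464563",        // digits only
            "h\u00e9llo",          // one accented Latin letter
            "\u65e5\u672c\u8a9e"   // three CJK characters
        };
        for (String s : samples) {
            System.out.printf("chars=%d pureAscii=%b utf8=%d utf16=%d%n",
                    s.length(), isPureAscii(s), utf8Length(s), utf16Length(s));
        }
    }
}
```

For pure-ASCII input the UTF-8 length equals the character count, so under these assumptions ascii and text would occupy the same space; the sizes differ only once non-ASCII characters appear, and such values cannot be stored in an ascii column at all.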


Is there any difference in size?

Yes

Is there a preferred choice between the two when I store large strings (~500 KB)?

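ascii is never larger than UTF-8, and for mostly-Latin text UTF-8 is more space-efficient than UTF-16. For strings that contain only US-ASCII characters, ascii and text take the same number of bytes. See also the related question "what-is-the-advantage-of-choosing-ascii-encoding-over-utf-8".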


Source: https://habr.com/ru/post/1681134/

