I'm afraid I have a question about the details of a rather oversaturated topic. I searched a lot but could not find a clear answer to this specific, obvious, yet very important problem:
When converting a byte[] to a String using UTF-8, each byte (8 bits) becomes an 8-bit character encoded in UTF-8, but each UTF-8 character is stored as a 16-bit char in Java. Is that right? If so, does this mean that every such Java char uses only its first 8 bits and consumes twice as much memory? Is that right too? I wonder how this wasteful behavior can be acceptable.
Are there any tricks to get a pseudo-String that uses only 8 bits per character? Would that lead to less memory consumption? Or is there maybe a way to store *two* 8-bit values in a single 16-bit Java char to avoid the memory loss?
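For what it's worth, the "two bytes per char" idea from the question can be sketched like this (the helper names `pack`/`hi`/`lo` are mine, purely for illustration):

```java
public class PackBytes {
    // Pack two 8-bit values into one 16-bit char.
    static char pack(byte hi, byte lo) {
        return (char) (((hi & 0xFF) << 8) | (lo & 0xFF));
    }

    // Unpack the high and low byte again.
    static byte hi(char c) { return (byte) (c >> 8); }
    static byte lo(char c) { return (byte) c; }

    public static void main(String[] args) {
        char c = pack((byte) 0xAB, (byte) 0xCD);
        System.out.println(Integer.toHexString(c));                      // abcd
        System.out.println((hi(c) & 0xFF) == 0xAB && (lo(c) & 0xFF) == 0xCD); // true
    }
}
```

One caveat: a String built from such packed chars is only safe as an in-memory container. Arbitrary byte pairs can land in the unpaired-surrogate range (U+D800 to U+DFFF), which charset encoders will mangle, so such a "string" must never be run through a text encoding.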
Thanks for any de-confusing answers...
EDIT: Hi, thanks everyone for the answers. I knew about UTF-8 being a variable-length encoding. However, since my source is a byte[], i.e. 8-bit values, I assumed (apparently wrongly) that it would only ever need 8-bit UTF-8 code words. Does UTF-8 conversion actually preserve the weird characters you see when you do `cat somebinary` on the CLI? I thought UTF-8 just somehow mapped each possible 8-bit byte value to one specific 8-bit UTF-8 code word. Wrong? I was thinking about using Base64, but that is bad because it uses only 7 bits.
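A small sketch of why UTF-8 does *not* map each 8-bit byte value to one 8-bit code word: bytes above 0x7F are not valid UTF-8 on their own, so decoding replaces them and the round trip is lossy, whereas ISO-8859-1 maps every byte value to exactly one char:

```java
import java.nio.charset.StandardCharsets;

public class RoundTrip {
    public static void main(String[] args) {
        byte[] raw = { (byte) 0xFF }; // not valid UTF-8 on its own

        // The UTF-8 decoder replaces the invalid byte with U+FFFD,
        // which re-encodes as THREE bytes: the round trip is lossy.
        String viaUtf8 = new String(raw, StandardCharsets.UTF_8);
        System.out.println(viaUtf8.getBytes(StandardCharsets.UTF_8).length); // 3

        // ISO-8859-1 maps every byte 0x00..0xFF to exactly one char,
        // so arbitrary binary data survives the round trip.
        String viaLatin1 = new String(raw, StandardCharsets.ISO_8859_1);
        byte[] back = viaLatin1.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(back.length == 1 && back[0] == (byte) 0xFF); // true
    }
}
```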
To reformulate the question: is there a smarter way to convert a byte[] to some kind of String? Maybe my favorite would be to simply cast the byte[] to a char[], but then I still have 16-bit words.
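On the "just cast byte[] to char[]" idea: Java has no direct cast between array types, so the widening has to be done element-wise, and negative bytes must be masked so they land in 0..255. A minimal sketch:

```java
public class ByteToChar {
    public static void main(String[] args) {
        byte[] data = { 0x00, 0x41, (byte) 0xFF };

        // No direct cast from byte[] to char[]; widen element-wise.
        // Mask with 0xFF so negative bytes map to 0..255 instead of
        // sign-extending to 0xFFxx.
        char[] chars = new char[data.length];
        for (int i = 0; i < data.length; i++) {
            chars[i] = (char) (data[i] & 0xFF);
        }

        String s = new String(chars); // still 16 bits per char internally
        System.out.println((int) s.charAt(2)); // 255
    }
}
```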
Additional usage information:
I am adapting Jedis (a Java client for the NoSQL store Redis) as a "primitive storage layer" for HyperGraphDB. So Jedis is the database for another "database". My problem is that I constantly have to feed Jedis with byte[] data, but internally *Redis* (the actual server) deals only with binary-safe strings. Since Redis is written in C, a char there is 8 bits long, which AFAIK is not the same as ASCII, which is 7 bits. In Jedis, however, in the Java world, each character is 16 bits internally. I don't understand this code (yet), but I suppose that Jedis then converts these 16-bit Java strings to the corresponding 8-bit Redis strings (here [3]). It says it extends FilterOutputStream. Is the whole byte[] <-> String conversion done with this FilterOutputStream...?
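I can only guess at what such a FilterOutputStream does, but one plausible narrowing scheme would be to write only the low 8 bits of each char. A hypothetical sketch (class and method names are mine, not Jedis's):

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical: narrow a 16-bit Java String to an 8-bit binary-safe
// wire format by dropping the high byte of each char. This is lossless
// only if every char is in the 0x00..0xFF range.
public class NarrowingOutputStream extends FilterOutputStream {
    public NarrowingOutputStream(OutputStream out) { super(out); }

    public void writeChars(String s) throws IOException {
        for (int i = 0; i < s.length(); i++) {
            out.write(s.charAt(i) & 0xFF); // keep only the low 8 bits
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        NarrowingOutputStream nos = new NarrowingOutputStream(sink);
        nos.writeChars("AB\u00FF");
        byte[] wire = sink.toByteArray();
        System.out.println(wire.length);     // 3
        System.out.println(wire[2] & 0xFF);  // 255
    }
}
```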
Now I'm wondering: if I have to convert between byte[] and String all the time, with data sizes from very small to potentially very large, isn't there a huge waste of memory when every 8-bit character takes up 16 bits in Java?
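A rough back-of-envelope sketch of that 2x overhead (counting only the char data itself, ignoring object headers and any JVM-internal optimizations):

```java
public class MemoryEstimate {
    public static void main(String[] args) {
        byte[] data = new byte[1_000_000]; // ~1 MB of raw bytes

        // One char per byte, widened as with ISO-8859-1:
        char[] asChars = new char[data.length];
        for (int i = 0; i < data.length; i++) {
            asChars[i] = (char) (data[i] & 0xFF);
        }

        // Each char occupies Character.BYTES == 2 bytes, so the char[]
        // alone takes roughly twice the space of the original byte[].
        long charDataBytes = (long) asChars.length * Character.BYTES;
        System.out.println(charDataBytes); // 2000000
    }
}
```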