Reading Chinese characters in a string from a byte buffer

So, I have a char[] array containing text and other data.

How do I extract Chinese text from the char[] array? Right now I can get English text using:

    public String getString(int index, int length) {
        String str = "";
        // Copy chars until `length` is reached or a 0 terminator is found
        for (int i = 0; i < length && this.data[index + i] != 0; i++)
            str = str + this.data[index + i];
        return str;
    }

Then I tried this:

    try {
        String charset = "GB18030";
        String str = new String(m.target.getBytes("UTF-16"), "GB18030");
        System.out.println(str);
        System.out.println(str.equals("大家"));
    } catch (UnsupportedEncodingException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

m.target is the string I got from the buffer using getString() above. I tried various encodings and combinations of them; none of them displays the text correctly (大家), and none returns true for str.equals("大家").

EDIT

Using this method, I can successfully get Chinese characters.

    public String test(int index, int length) {
        // Narrow every char to a byte (keeps the low 8 bits)
        byte[] t = new byte[this.data.length];
        for (int i = 0; i < this.data.length; i++)
            t[i] = (byte) this.data[i];
        try {
            // Decode only the requested range as GB18030
            return new String(t, index, length, "GB18030");
        } catch (UnsupportedEncodingException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return null;
    }

But now my question is: I thought a byte could be at most 127. How can a byte array hold high-byte characters? Can I safely change the buffer to byte[] instead of char[]?

Answer

Both char and String in Java are Unicode. You do not need to worry about encodings while you work with them inside Java code; you specify the encoding only when converting to or from a byte[] array, or when reading from or writing to an input or output stream.
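The question's remaining doubt (how a byte can hold values above 127) comes down to Java's byte being signed: it ranges from -128 to 127, but all 8 bits are stored, so GB18030 bytes in the range 0x80 to 0xFF simply show up as negative numbers and still decode correctly. The following is a minimal sketch of keeping the buffer as byte[] and decoding only at the boundary, assuming the runtime provides the GB18030 charset (standard JDK builds do). The class GbBytes and its byte[]-based getString and toBytes helpers are hypothetical names, not the asker's code.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class GbBytes {
    // Assumption: GB18030 is available; Charset.forName throws
    // UnsupportedCharsetException if the runtime lacks it.
    static final Charset GB18030 = Charset.forName("GB18030");

    // Decodes up to `length` bytes starting at `index`, stopping early at a 0 byte
    // (the same terminator convention the question's getString uses).
    static String getString(byte[] data, int index, int length) {
        int end = index;
        while (end < index + length && data[end] != 0) {
            end++;
        }
        return new String(data, index, end - index, GB18030);
    }

    // Converts a char[] holding one raw byte per char into a byte[].
    // The (byte) cast keeps the low 8 bits, so 0x80..0xFF survive as negative bytes.
    static byte[] toBytes(char[] chars) {
        byte[] out = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            out[i] = (byte) chars[i];
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] encoded = "\u5927\u5bb6".getBytes(GB18030); // the text 大家
        for (byte b : encoded) {
            // Signed view (-128..127) next to the unsigned view (0..255) of the same bits
            System.out.println(b + " = 0x" + Integer.toHexString(b & 0xFF));
        }
        byte[] buffer = Arrays.copyOf(encoded, encoded.length + 1); // adds a trailing 0
        System.out.println(getString(buffer, 0, buffer.length).equals("\u5927\u5bb6"));
    }
}
```

Under these assumptions, switching the buffer to byte[] should be safe. Scanning for a 0 terminator also works for GB18030, because none of its multi-byte sequences contain a 0x00 byte; that would not hold for UTF-16, where ASCII characters are encoded with a zero byte.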

To declare a string containing Chinese characters, you can use Unicode escape sequences or simply write the characters directly in the source code, but then you have to take care of the source file's encoding. UTF-8 is now the de facto standard; it is supported by IDEs (for example, Eclipse) and by build tools (Maven, Ant).

So you just write

    char ch = '大';
    char[] chrs = new char[]{'大', '家'};
    String str = "大家";
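The escape-sequence alternative can be sketched as follows. Java translates Unicode escapes before parsing, so the escaped form compiles to the same characters regardless of the file's encoding, while the literal form only matches when javac reads the file in the encoding it was saved in (for example with javac -encoding UTF-8). The names EscapedLiterals, DA and DAJIA are illustrative only.

```java
class EscapedLiterals {
    // U+5927 and U+5BB6 are the code points of the two characters in the question.
    static final char DA = '\u5927';
    static final String DAJIA = "\u5927\u5bb6";

    public static void main(String[] args) {
        // Same text written directly; correct only if the source encoding is read correctly.
        String literal = "大家";
        System.out.println(DAJIA.equals(literal));
        System.out.println(DAJIA.length());        // prints 2 (two UTF-16 chars)
        System.out.printf("U+%04X%n", (int) DA);   // prints U+5927
    }
}
```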

To read Chinese characters from a file encoded in, for example, UTF-16, use an InputStreamReader with the correct encoding; to read the file line by line, wrap it in a BufferedReader:

    BufferedReader reader = new BufferedReader(new InputStreamReader(
            new FileInputStream("myfile.txt"), "UTF-16"));
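A slightly fuller sketch of the same approach, assuming Java 7 or later (for try-with-resources and StandardCharsets). Utf16Reader and readLines are hypothetical names, and "myfile.txt" is the placeholder file name from the answer.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class Utf16Reader {
    // Reads every line of a text file, decoding it with the given charset.
    // try-with-resources closes the reader (and the underlying stream) even on error.
    static List<String> readLines(String fileName, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        for (String line : readLines("myfile.txt", StandardCharsets.UTF_16)) {
            System.out.println(line + " -> " + line.equals("\u5927\u5bb6"));
        }
    }
}
```

Passing a Charset object instead of the name "UTF-16" also avoids the checked UnsupportedEncodingException that the question's catch blocks have to handle.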

Source: https://habr.com/ru/post/1391154/
