Java: converting encoded characters to a "readable" string

I have a string of characters that looks something like this:

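&#1050;&#1086;&#1084;&#1091;&#1085;&#1080;&#1082;&#1072;&#1094;&#1080;&#1086;&#1085;&#1085;&#1072; &#1082;&#1072;&#1073;&#1077;&#1083;&#1085;&#1072; &#1089;&#1080;&#1089;&#1090;&#1077;&#1084;&#1072;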

and sometimes I have this mix:

G&#233;n&#233;ralit&#233;s

The first means:

Communication cable system

and the second:

Généralités

When I put this text in the body of an HTML page, the browser displays it correctly.

But how can I get Java to output the "real" characters? And what is the encoding above called?
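For reference, the format is HTML numeric character references: `&#`, then the decimal Unicode code point, then `;`. The minimal sketch below (class and method names are hypothetical) shows how such text can be produced by escaping only non-ASCII code points. That would explain why plain letters in the second sample stay as they are while é (U+00E9, decimal 233) becomes `&#233;`.

```java
public class NumericRefs {
    // Hypothetical helper: writes every non-ASCII code point as &#decimal;
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);              // ASCII is kept as-is
            } else {
                out.append("&#").append(cp).append(';'); // e.g. 233 -> "&#233;"
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // "Généralités" written with Unicode escapes so the source file encoding does not matter
        System.out.println(escapeNonAscii("G\u00e9n\u00e9ralit\u00e9s"));
        // prints: G&#233;n&#233;ralit&#233;s
    }
}
```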

I tried a couple of things, and finally this (which did not work):

    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.Charset;
    import java.util.ArrayList;
    import java.util.List;

    public class Attempt {
        public static void main(String[] args) {
            List<String> lst = new ArrayList<String>();
            lst.add("&#1050;");
            lst.add("&#1086;");
            for (String s : lst) {
                Charset utf8charset = Charset.forName("UTF-8");
                Charset iso88591charset = Charset.forName("ISO-8859-1");
                ByteBuffer inputBuffer = ByteBuffer.wrap(s.getBytes());
                // decode UTF-8
                CharBuffer data = utf8charset.decode(inputBuffer);
                // encode ISO-8859-1
                ByteBuffer outputBuffer = iso88591charset.encode(data);
                byte[] outputData = outputBuffer.array();
                // "&#1050;" is plain ASCII text, so this round trip prints it unchanged
                System.out.println(new String(outputData));
            }
        }
    }
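The charset round trip above cannot help, because no byte-level encoding is involved: `"&#1050;"` is seven ordinary ASCII characters, and the digits between `&#` and `;` are the Unicode code point itself. A minimal sketch of the same loop, assuming each list entry is exactly one decimal reference (the `FixedAttempt` class and `decodeOne` helper are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;

public class FixedAttempt {
    // Assumes s is exactly one decimal reference such as "&#1050;"
    static String decodeOne(String s) {
        // Strip the leading "&#" and trailing ";" and parse the decimal code point
        int codePoint = Integer.parseInt(s.substring(2, s.length() - 1));
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        List<String> lst = new ArrayList<String>();
        lst.add("&#1050;");
        lst.add("&#1086;");
        StringBuilder sb = new StringBuilder();
        for (String s : lst) {
            sb.append(decodeOne(s));
        }
        // The decoded string is "Ко"; whether it displays correctly depends on the console encoding
        System.out.println(sb);
    }
}
```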
Answer:

You can use commons-lang (Apache Commons Lang) to unescape this kind of text. In Groovy:

    @Grab('commons-lang:commons-lang:2.6')
    import org.apache.commons.lang.StringEscapeUtils as SEU

    def str = 'G&#233;n&#233;ralit&#233;s'
    println SEU.unescapeHtml(str)  // prints: Généralités
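If adding commons-lang is not an option, a JDK-only sketch can decode numeric references with a regular expression. The class and method names are hypothetical; it assumes well-formed references and, unlike `unescapeHtml`, leaves named entities such as `&eacute;` or `&amp;` untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnescapeRefs {
    // Matches decimal (&#233;) or hexadecimal (&#xE9;) numeric character references
    private static final Pattern REF =
            Pattern.compile("&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));");

    static String unescapeNumeric(String text) {
        Matcher m = REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1))      // decimal form
                    : Integer.parseInt(m.group(2), 16); // hexadecimal form
            String decoded = new String(Character.toChars(codePoint));
            // quoteReplacement keeps '$' or '\' in the decoded text literal
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescapeNumeric("G&#233;n&#233;ralit&#233;s"));
    }
}
```

`Character.toChars` also covers code points above U+FFFF, which need two Java `char`s; an out-of-range number would throw rather than be skipped.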

Source: https://habr.com/ru/post/910716/

