Java, how can I add an accented "e" to a string?

With the help of tucuxi's answer in the existing post "Java remove HTML from String without regular expressions", I built a method that strips basic HTML tags from a string. Sometimes, however, the original string contains HTML hexadecimal character references such as &#x00E9 (which is an accented e). I have started adding functionality that translates these escaped characters into real characters.

You are probably asking: why not use regular expressions? Or a third-party library? Unfortunately, I cannot, because I am developing for the BlackBerry platform, which does not support regular expressions, and I have never managed to add a third-party library to my project.

So, I have got to the point where every &#x00E9 is replaced by a plain "e". My question now is: how do I add the actual accented "e" to the string?

Here is my code:

    public static String removeHTML(String synopsis) {
        char[] cs = synopsis.toCharArray();
        String sb = new String();
        boolean tag = false;
        for (int i = 0; i < cs.length; i++) {
            switch (cs[i]) {
            case '<':
                if (!tag) {
                    tag = true;
                    break;
                }
            case '>':
                if (tag) {
                    tag = false;
                    break;
                }
            case '&':
                char[] copyTo = new char[7];
                System.arraycopy(cs, i, copyTo, 0, 7);
                String result = new String(copyTo);
                if (result.equals("&#x00E9")) {
                    sb += "e";
                }
                i += 7;
                break;
            default:
                if (!tag)
                    sb += cs[i];
            }
        }
        return sb.toString();
    }

Thanks!

3 answers

Java strings are Unicode.

    sb += '\u00E9';    // lower case e with acute accent
    sb += '\u00C9';    // upper case E with acute accent
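Rather than hard-coding one escape per character, the hex digits of a reference such as &#x00E9 can be parsed into a char. The following is only a minimal sketch: the class name HexToChar and the helper fromHex are hypothetical, and it assumes the digits have already been cut out of the input and are at most four hex digits long.

```java
public class HexToChar {
    // Hypothetical helper: turns up to four hex digits such as "00E9" into the matching char.
    static char fromHex(String hexDigits) {
        return (char) Integer.parseInt(hexDigits, 16);
    }

    public static void main(String[] args) {
        String sb = "caf";
        sb += fromHex("00E9");                       // appends the accented e
        System.out.println(sb.equals("caf\u00E9")); // true
        System.out.println((int) sb.charAt(3));      // 233
    }
}
```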

You can print almost any character you like in Java, since it uses the Unicode character set.

To find the character you need, take a look at the charts here:

http://www.unicode.org/charts/

In the Latin-1 Supplement chart you will see the Unicode numbers for the accented characters. You should see the hexadecimal number 00E9 listed for é, for example. The numbers for the Latin accented characters are given in this chart, so you should find it pretty useful.

To use the character in a String, simply write the Unicode escape sequence \u followed by the character code:

    System.out.print("Let's go to the caf\u00E9");

Produces: "Let's go to the café"

Depending on which version of Java you are using, you may find a StringBuilder (or a StringBuffer if you are multithreading) more efficient than using the + operator to concatenate strings.
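Putting these pieces together, here is a rough sketch of the question's method rewritten around a StringBuffer, with hex references decoded generically instead of one at a time. The names HtmlStripper and removeHtml are hypothetical. StringBuffer is used on the assumption that StringBuilder may not be available on the target platform. The sketch only handles references of the form &#x followed by one to four hex digits and an optional semicolon, so a reference with no semicolon that runs straight into further hex letters is not decoded.

```java
public class HtmlStripper {
    // Removes <...> tags and decodes hexadecimal references such as &#x00E9 or &#x00E9;
    static String removeHtml(String input) {
        StringBuffer sb = new StringBuffer(input.length());
        boolean inTag = false;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (inTag) {
                // Inside a tag: drop everything up to and including '>'.
                if (c == '>') {
                    inTag = false;
                }
                i++;
            } else if (c == '<') {
                inTag = true;
                i++;
            } else if (input.startsWith("&#x", i)) {
                // Collect the hex digits that follow "&#x".
                int start = i + 3;
                int end = start;
                while (end < input.length()
                        && Character.digit(input.charAt(end), 16) >= 0) {
                    end++;
                }
                if (end > start && end - start <= 4) {
                    sb.append((char) Integer.parseInt(input.substring(start, end), 16));
                    // Skip the optional terminating semicolon.
                    i = (end < input.length() && input.charAt(end) == ';') ? end + 1 : end;
                } else {
                    // Not a reference this sketch understands: keep the '&' as-is.
                    sb.append(c);
                    i++;
                }
            } else {
                sb.append(c);
                i++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints true: the tags are removed and the reference becomes an accented e.
        System.out.println(removeHtml("<p>caf&#x00E9;</p>").equals("caf\u00E9"));
    }
}
```

Appending to one buffer avoids creating a new String on every iteration, which is what the += approach in the question does.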


Try the following:

    if (result.equals("&#x00E9")) { sb += (char) 233; } // 233 (0xE9) is the Unicode code for é

instead of

    if (result.equals("&#x00E9")) { sb += "e"; }

The point is that you are not adding an accent on top of the "e" character; the accented e is a separate character in its own right. This site contains ASCII lists for characters.
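To illustrate that the accented e is a character of its own, the sketch below compares it with a plain "e" followed by a combining acute accent (U+0301). It assumes a standard Java SE runtime (6 or later) for java.text.Normalizer, which may not exist on the BlackBerry platform. The class AccentDemo and the helper compose are hypothetical names.

```java
import java.text.Normalizer;

public class AccentDemo {
    // Hypothetical helper: folds base letter + combining accent into the single precomposed character.
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9"; // one char: e with acute accent
        String combined = "e\u0301";   // two chars: plain e + COMBINING ACUTE ACCENT
        System.out.println(precomposed.length());                  // 1
        System.out.println(combined.length());                     // 2
        System.out.println(precomposed.equals(combined));          // false
        System.out.println(compose(combined).equals(precomposed)); // true
    }
}
```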


Source: https://habr.com/ru/post/1307250/

