How to read a French character using BufferedInputStream

I am trying to read some French characters from a file, but strange characters appear wherever the text contains letters such as à, é, or è. Can anyone tell me how to read the actual content of the file? Here is my main method:

public static void main(String args[]) throws IOException {
    char current, org;

    //String strPath = "C:/Documents and Settings/tidh/Desktop/BB/hhItem01_2.txt";
    String strPath = "C:/Documents and Settings/tidh/Desktop/hhItem01_1.txt";
    InputStream fis;

    fis = new BufferedInputStream(new FileInputStream(strPath));

    while (fis.available() > 0) {
        current = (char) fis.read(); // to read character from file
        int ascii = (int) current;   // to get ascii for the character
        org = (char) (ascii);
        System.out.println(org);
    }
}
4 answers

You are reading UTF-8 encoded text as if every byte were a single ASCII character. Here is an example that reads the file with an explicit charset instead:

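import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;

public class Test {
    private static final String FILE_PATH = "c:\\temp\\test.txt";

    public static void main(String[] args) {
        try {
            File fileDir = new File(FILE_PATH);

            // Decode the bytes as UTF-8 instead of casting each byte to a char
            BufferedReader in = new BufferedReader(
                new InputStreamReader(
                    new FileInputStream(fileDir), "UTF-8"));

            String str;

            // Read and print the file one line at a time
            while ((str = in.readLine()) != null) {
                System.out.println(str);
            }

            in.close();
        } catch (UnsupportedEncodingException e) {
            System.out.println(e.getMessage());
        } catch (IOException e) {
            System.out.println(e.getMessage());
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}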

Link: How to read UTF-8 encoded data from a file
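
To see why the byte-by-byte cast in the question garbles accented letters, here is a minimal sketch. The class and method names (ByteCastDemo, castEachByte, decodeUtf8) are made up for illustration, and it assumes the file is UTF-8, where é is stored as the two bytes 0xC3 0xA9.

```java
import java.nio.charset.StandardCharsets;

class ByteCastDemo {
    // Mimics the loop from the question: every byte becomes one char.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // b & 0xFF is the same 0..255 value that InputStream.read() returns
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes the whole byte sequence using an explicit charset.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u00e9" is the letter é; UTF-8 encodes it as two bytes
        byte[] bytes = "\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println("byte count: " + bytes.length);
        System.out.println("cast gives two chars: " + (castEachByte(bytes).length() == 2));
        System.out.println("decode gives one char: " + (decodeUtf8(bytes).length() == 1));
    }
}
```

Casting produces the two characters U+00C3 and U+00A9 where a single é was intended, which is the kind of garbage the question describes.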


You can download the single Apache Commons IO jar and read the file line by line instead of character by character.

List<String> lines = IOUtils.readLines(fis, "UTF8");

for (String line : lines) {
    dbhelper.addDataRecord(line + ",'" + strCompCode + "'");
}
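
If adding a dependency is not an option, something similar can probably be done with the JDK alone. This is only a sketch: Utf8Lines and readLines are hypothetical names, and it assumes the stream really contains UTF-8.

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

class Utf8Lines {
    // Reads every line from the stream, decoding the bytes as UTF-8.
    // The caller remains responsible for closing the stream.
    static List<String> readLines(InputStream in) {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8));
        return reader.lines().collect(Collectors.toList());
    }
}
```

It would be called as Utf8Lines.readLines(fis), in the same place as the IOUtils call.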

First determine the encoding of the file: it may be ISO-8859-1, Windows Latin-1 (Windows-1252), or UTF-8.

private static final String FILE_PATH = "c:\\temp\\test.txt";

Path path = Paths.get(FILE_PATH);
//Charset charset = StandardCharsets.ISO_8859_1;
//Charset charset = StandardCharsets.UTF_8;
Charset charset = Charset.forName("Windows-1252");
try (BufferedReader in = Files.newBufferedReader(path, charset)) {
    String line;
    while ((line = in.readLine()) != null) {
        System.out.println(line);
    }
}
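
If it is unclear which charset the file uses, one common heuristic is to try a strict UTF-8 decode first and fall back to Windows-1252 when the bytes are not valid UTF-8. The sketch below is an assumption about how that could look, not a reliable detector: CharsetGuess and guess are invented names, and the Windows-1252 fallback is simply the choice that fits this question.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class CharsetGuess {
    // Returns UTF-8 if the bytes decode cleanly as UTF-8,
    // otherwise falls back to Windows-1252 (an assumed default).
    static Charset guess(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            // Invalid UTF-8 byte sequence, so the file is in some other encoding
            return Charset.forName("windows-1252");
        }
    }
}
```

The bytes could come from Files.readAllBytes(path), and the result can be passed to Files.newBufferedReader(path, charset). Pure ASCII files decode identically in all three encodings, so returning UTF-8 for them is harmless.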

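The String line now holds Unicode text. System.out has to convert that Unicode to the platform encoding, which may not be able to represent every Unicode character. You can check which encoding is in use: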

System.out.println("My encoding is: " + System.getProperty("file.encoding"));
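
If the default encoding cannot represent the accented letters, one possible workaround is to wrap System.out in a PrintStream with an explicit charset. This is a sketch under assumptions: Utf8Console and utf8 are hypothetical names, and whether the letters display correctly still depends on the terminal being set to UTF-8.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Console {
    // Wraps any output stream in a PrintStream that encodes text as UTF-8.
    static PrintStream utf8(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        out.println("\u00e0 \u00e9 \u00e8"); // the letters à é è
    }
}
```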

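Also, why read bytes and cast each one to char? A byte is not a char, and in UTF-8 a single character can take several bytes.

In Java, String and char hold Unicode text.

To check whether the file contains é: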

String e = "\u00e9";
String s = new String(Files.readAllBytes(path), charset);
System.out.println("Contains é : " + s.contains(e));

Addendum:

Files.newBufferedReader(path, charset) is roughly shorthand for the following:

try (BufferedReader in = new BufferedReader(
         new InputStreamReader(
             new FileInputStream(file), charset))) {
    // read lines with in.readLine() as above
}

BufferedReader adds a buffer for faster reading, and InputStreamReader takes the binary data from an InputStream plus an encoding and decodes it into the Unicode text that a Reader delivers.
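
If reading one character at a time is really wanted, as in the question, the same loop can run on a Reader instead of on the raw stream. A minimal sketch, with CharByChar and readAll as made-up names:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

class CharByChar {
    // Reads decoded characters one by one; Reader.read() returns a char
    // value (or -1 at end of input), not a raw byte.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

Unlike fis.read() on the stream, reader.read() consumes as many bytes as the charset needs for each character, so é arrives as a single char whether the file stores it as one byte (Windows-1252) or two (UTF-8).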


A specific encoding for French is IBM CP1252 (preferable because it is available on all operating systems).

Hello,

Fake guy


Source: https://habr.com/ru/post/1609010/

