Check string for non-printable characters when reading a text file

My program needs to read text files line by line. The files are in UTF-8. I'm not sure the files are correct; they may contain non-printable characters. Is it possible to check for that without going down to the byte level? Thanks.

+48
java file file-io
Sep 14 '11
8 answers

If you want to check whether a string contains non-printable characters, you can use a regular expression:

[^\p{Print}] 
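A minimal sketch of how this pattern could be applied in Java. One thing to be aware of: by default, Java's \p{Print} covers printable ASCII only, so accented letters would be reported as non-printable. The Pattern.UNICODE_CHARACTER_CLASS flag used below is an addition that is not part of the answer above, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class NonPrintableCheck {
    // Default behaviour: \p{Print} means printable US-ASCII only.
    private static final Pattern ASCII_ONLY = Pattern.compile("[^\\p{Print}]");

    // Assumption: Unicode-aware matching is wanted for UTF-8 text.
    private static final Pattern UNICODE_AWARE =
            Pattern.compile("[^\\p{Print}]", Pattern.UNICODE_CHARACTER_CLASS);

    /** Returns true if the line contains at least one non-printable character. */
    static boolean hasNonPrintable(String line) {
        return UNICODE_AWARE.matcher(line).find();
    }

    /** Same check with the ASCII-only definition, shown for contrast. */
    static boolean hasNonPrintableAscii(String line) {
        return ASCII_ONLY.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(hasNonPrintable("plain text"));      // false
        System.out.println(hasNonPrintable("bell\u0007here"));  // true (U+0007 is a control character)
        System.out.println(hasNonPrintable("caf\u00e9"));       // false
        System.out.println(hasNonPrintableAscii("caf\u00e9"));  // true (U+00E9 is outside ASCII)
    }
}
```

Which definition fits depends on whether non-ASCII letters are expected in the files.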
+15
Sep 14 '11 at 9:19

Open the file with a FileInputStream, wrap it in an InputStreamReader with the UTF-8 Charset to read characters from the stream, and wrap that in a BufferedReader to read lines, e.g. via BufferedReader#readLine, which gives you a String. Once you have the string, you can check it for characters that you don't consider printable.

E.g. (without error checking), using try-with-resources (which is available in any reasonably modern version of Java):

    String line;
    try (
        InputStream fis = new FileInputStream("the_file_name");
        InputStreamReader isr = new InputStreamReader(fis, Charset.forName("UTF-8"));
        BufferedReader br = new BufferedReader(isr);
    ) {
        while ((line = br.readLine()) != null) {
            // Deal with the line
        }
    }
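As a rough sketch of what "deal with the line" might look like for this question, the loop below records the line number and code point of every character it considers suspicious. The rule in isSuspicious is only one possible definition of "non-printable" (it flags tabs too, since they are control characters), and the class and method names are hypothetical. A StringReader stands in for the file so the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LineScanner {
    /** Assumed rule: control, format, surrogate, private-use and unassigned code points are suspicious. */
    static boolean isSuspicious(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.CONTROL:
            case Character.FORMAT:
            case Character.SURROGATE:
            case Character.PRIVATE_USE:
            case Character.UNASSIGNED:
                return true;
            default:
                return false;
        }
    }

    /** Reads line by line and returns one finding per suspicious code point. */
    static List<String> scan(BufferedReader br) throws IOException {
        List<String> findings = new ArrayList<>();
        String line;
        int lineNumber = 0;
        while ((line = br.readLine()) != null) {
            lineNumber++;
            final int n = lineNumber; // lambdas need an effectively final copy
            line.codePoints()
                .filter(LineScanner::isSuspicious)
                .forEach(cp -> findings.add(String.format("line %d: U+%04X", n, cp)));
        }
        return findings;
    }

    public static void main(String[] args) throws IOException {
        String sample = "ok\nbad\u0000line\ntab\there\n";
        try (BufferedReader br = new BufferedReader(new StringReader(sample))) {
            // prints "line 2: U+0000" and then "line 3: U+0009"
            scan(br).forEach(System.out::println);
        }
    }
}
```

To scan a real file, pass in a BufferedReader built on the file with a UTF-8 reader instead of the StringReader.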
+121
Sep 14 '11 at 9:12

Although this is not difficult to do manually with BufferedReader and InputStreamReader, I would use Guava:

 List<String> lines = Files.readLines(file, Charsets.UTF_8); 

Then you can do whatever you want with these lines.

EDIT: Note that this reads the entire file into memory in one go. In most cases that is absolutely fine, and it is certainly simpler than reading line by line and processing each line as you read it. If the file is huge, you may need to do it that way instead, as in T.J. Crowder's answer.

+49
Sep 14 '11

I just found out that with Java NIO (java.nio.file.*) you can simply write:

    List<String> lines = Files.readAllLines(Paths.get("/tmp/test.csv"), StandardCharsets.UTF_8);
    for (String line : lines) {
        System.out.println(line);
    }

instead of dealing with FileInputStream and BufferedReader ...

+42
Oct 11 '12 at 11:17

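How about the following:

    // Note: FileReader(File) decodes with the platform default charset,
    // which is not guaranteed to be UTF-8 on every JVM.
    FileReader fileReader = new FileReader(new File("test.txt"));
    BufferedReader br = new BufferedReader(fileReader);
    String line = null;
    // readLine() returns null when there are no more lines
    while ((line = br.readLine()) != null) {
        // process each line until the end of the file
    }
    br.close();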

Source: http://devmain.blogspot.co.uk/2013/10/java-quick-way-to-read-or-write-to-file.html

+11
Oct 21 '13 at 10:37

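I found the following ways:

    private static final String fileName = "C:/Input.txt";

    public static void main(String[] args) throws IOException {
        // 1. Files.lines: a lazy stream; close it to release the file handle
        try (Stream<String> lines = Files.lines(Paths.get(fileName))) {
            String[] asArray = lines.toArray(String[]::new);
        }

        // 2. Files.readAllLines: reads the whole file into a List
        List<String> readAllLines = Files.readAllLines(Paths.get(fileName));
        readAllLines.forEach(s -> System.out.println(s));

        // 3. Scanner: hasNextLine()/nextLine() read whole lines;
        //    hasNext()/next() would return whitespace-separated tokens instead
        File file = new File(fileName);
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            while (scanner.hasNextLine()) {
                System.out.println(scanner.nextLine());
            }
        }
    }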
+5
Apr 15 '16 at 7:49

The answer from @TJCrowder is for Java 6. In Java 7 the right answer is the one from @McIntosh, although its use of Charset.forName for UTF-8 is discouraged:

    List<String> lines = Files.readAllLines(Paths.get("/tmp/test.csv"), StandardCharsets.UTF_8);
    for (String line : lines) { /* DO */ }

This is very similar to the Guava approach posted by Skeet above and, of course, the same caveats apply. That is, for large files (Java 7):

    try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
        for (String line = reader.readLine(); line != null; line = reader.readLine()) {
            // process the line
        }
    }
+2
Jun 17 '14 at 17:41

If every char in the file is correctly encoded in UTF-8, you will have no problem reading it with a reader that uses the UTF-8 encoding. You can then check each char of the file and decide whether you consider it printable or not.
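Whether the file really is correctly encoded can itself be checked at the reader level. An InputStreamReader created with just a Charset silently replaces malformed bytes with U+FFFD, whereas one created with a CharsetDecoder set to CodingErrorAction.REPORT throws a CharacterCodingException instead. The sketch below relies on that; the class name, method name and temporary files are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StrictUtf8Check {
    /** Returns true if the whole file decodes as UTF-8 without errors. */
    static boolean isValidUtf8(Path path) throws IOException {
        // A decoder that reports bad input instead of substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(Files.newInputStream(path), decoder))) {
            while (br.readLine() != null) {
                // Nothing to do: reading forces the bytes through the decoder.
            }
            return true;
        } catch (CharacterCodingException e) {
            return false; // at least one byte sequence is not valid UTF-8
        }
    }

    public static void main(String[] args) throws IOException {
        Path good = Files.createTempFile("good", ".txt");
        Path bad = Files.createTempFile("bad", ".txt");
        try {
            Files.write(good, "caf\u00e9\n".getBytes(StandardCharsets.UTF_8));
            Files.write(bad, new byte[] {'a', (byte) 0xFF, 'b', '\n'}); // 0xFF never occurs in UTF-8
            System.out.println(isValidUtf8(good)); // true
            System.out.println(isValidUtf8(bad));  // false
        } finally {
            Files.delete(good);
            Files.delete(bad);
        }
    }
}
```

This only validates the encoding; deciding which correctly decoded characters count as printable is still a separate check.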

0
Sep 14 '11 at 9:13


