Checking the format of a .txt file in Java

What is the best way to check that a .txt file is:

  • Actually a .txt file, not another type of file that merely has the .txt extension.

  • In the specified format (so it can be parsed correctly, contains all the necessary information, etc.)

All of this is done in Java: the file is retrieved and must then be checked to make sure it is what it should be. So far I have only found JHOVE (and now JHOVE2) as tools for this task, but I could not make much sense of the documentation on how to use them from Java code rather than from the command line. Thank you for your help.

1 answer

It sounds like you are looking for format validation in general, so may I recommend regular expressions? You can perform many kinds of matching with them. I wrote a simple example below [for all those regex experts, have mercy on me if I don't use the perfect expression ;)]. You could put the REGEX and MAX_LINES_TO_READ constants in a properties file and modify the class to make it more general.
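For the properties-file idea, a minimal sketch of loading the pattern and the line limit might look like the following. The class name ValidatorSettings and the keys line.regex and max.lines.to.read are made up for illustration, and a short date pattern stands in for the full line regex. In real use the Reader would come from a file rather than a string.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.regex.Pattern;

public class ValidatorSettings {
    // Hypothetical property keys, chosen only for this sketch.
    public static final String REGEX_KEY = "line.regex";
    public static final String MAX_LINES_KEY = "max.lines.to.read";

    public final Pattern linePattern;
    public final int maxLinesToRead;

    private ValidatorSettings(Pattern linePattern, int maxLinesToRead) {
        this.linePattern = linePattern;
        this.maxLinesToRead = maxLinesToRead;
    }

    /** Reads both settings from properties text; the line limit defaults to 5. */
    public static ValidatorSettings load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String regex = props.getProperty(REGEX_KEY);
        if (regex == null) {
            throw new IllegalArgumentException("Missing property: " + REGEX_KEY);
        }
        int maxLines = Integer.parseInt(props.getProperty(MAX_LINES_KEY, "5").trim());
        return new ValidatorSettings(Pattern.compile(regex), maxLines);
    }

    public static void main(String[] args) {
        // In a .properties file the backslash is an escape character, so the
        // regex \d{2}/\d{2}/\d{4} has to be stored as \\d{2}/\\d{2}/\\d{4}.
        String fileContents =
                "max.lines.to.read=5\n"
              + "line.regex=\\\\d{2}/\\\\d{2}/\\\\d{4}\n";
        ValidatorSettings settings = load(new StringReader(fileContents));
        System.out.println(settings.maxLinesToRead);                               // 5
        System.out.println(settings.linePattern.matcher("12/28/2011").matches()); // true
    }
}
```

The doubled backslashes are the main thing to watch for when moving a regex out of Java source and into a properties file.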

Basically, you check the first lines of your ".txt" file, up to a maximum number (however many lines you need to be confident the formatting is right). You can also use a separate regular expression for a header line or, if necessary, several different regular expressions. If all of the checked lines match, the file is considered "valid".

This is just an example of how you might go about it. For one thing, you should implement proper exception handling rather than just catching the generic "Exception".
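As a rough idea of what narrower exception handling could look like, the sketch below reads the file with try-with-resources and reports a separate result for each failure. The class name SafeFileCheck, the Result enum and the choice of UTF-8 are assumptions for illustration. A decoding error is only a rough hint that the file is not text: a binary file whose bytes happen to be valid UTF-8 would still get through.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SafeFileCheck {
    public enum Result { VALID, INVALID_FORMAT, NOT_TEXT, MISSING, IO_ERROR }

    /** Checks at most maxLines lines of the file against the given regex. */
    public static Result check(Path file, String regex, int maxLines) {
        // try-with-resources closes the reader on every path out of the block
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            int lineNumber = 0;
            while (lineNumber < maxLines && (line = reader.readLine()) != null) {
                lineNumber++;
                if (!line.matches(regex)) {
                    return Result.INVALID_FORMAT;
                }
            }
            return Result.VALID; // note: an empty file also ends up here
        } catch (NoSuchFileException e) {
            return Result.MISSING;
        } catch (CharacterCodingException e) {
            // the bytes could not be decoded as UTF-8 (assumed encoding)
            return Result.NOT_TEXT;
        } catch (IOException e) {
            return Result.IO_ERROR;
        }
    }

    public static void main(String[] args) {
        String file = args.length > 0 ? args[0] : "transactions.txt";
        String regex = ".{15}[ ]{5}.{15}[ ]{5}[-]\\d{2}\\.\\d{2}[ ]{9}\\d{2}/\\d{2}/\\d{4}";
        System.out.println(check(Paths.get(file), regex, 5));
    }
}
```

Returning an enum instead of printing lets the caller decide how to react to a missing file versus a badly formatted one.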

For testing your regular expressions in Java, http://www.regexplanet.com/simple/index.html works very well.

Here is the source of "ValidateTxtFile" ...

    import java.io.*;

    public class ValidateTxtFile {

        private final int MAX_LINES_TO_READ = 5;
        private final String REGEX = ".{15}[ ]{5}.{15}[ ]{5}[-]\\d{2}\\.\\d{2}[ ]{9}\\d{2}/\\d{2}/\\d{4}";

        public void testFile(String fileName) {
            int lineCounter = 1;
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
                String line = br.readLine();
                while ((line != null) && (lineCounter <= MAX_LINES_TO_READ)) {
                    // Validate the line is formatted correctly based on regular expressions
                    if (line.matches(REGEX)) {
                        System.out.println("Line " + lineCounter + " formatted correctly");
                    } else {
                        System.out.println("Invalid format on line " + lineCounter + " (" + line + ")");
                    }
                    line = br.readLine();
                    lineCounter++;
                }
            } catch (Exception ex) {
                System.out.println("Exception occurred: " + ex.toString());
            }
        }

        public static void main(String[] args) {
            ValidateTxtFile vtf = new ValidateTxtFile();
            vtf.testFile("transactions.txt");
        }
    }

Here is what is in "transactions.txt" ...

    Electric            Electric Co.        -50.99         12/28/2011
    Food                Food Store          -80.31         12/28/2011
    Clothes             Clothing Store      -99.36         12/28/2011
    Entertainment       Bowling             -30.4393       12/28/2011
    Restaurant          Mcdonalds           -10.35         12/28/11

The output when I ran the application was ...

    Line 1 formatted correctly
    Line 2 formatted correctly
    Line 3 formatted correctly
    Invalid format on line 4 (Entertainment       Bowling             -30.4393       12/28/2011)
    Invalid format on line 5 (Restaurant          Mcdonalds           -10.35         12/28/11)
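The idea of using a separate regular expression for a header line could be sketched like this. The sample file above has no header, so the header text, the class name HeaderAndBodyCheck and the row helper are all hypothetical. The body pattern is the same REGEX as in ValidateTxtFile.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class HeaderAndBodyCheck {

    /**
     * Returns 0 if every checked line matches, otherwise the 1-based number
     * of the first line that does not. An empty file fails on line 1.
     */
    public static int firstInvalidLine(List<String> lines, Pattern header,
                                       Pattern body, int maxLines) {
        if (lines.isEmpty()) {
            return 1; // the header line is missing
        }
        int limit = Math.min(lines.size(), maxLines);
        for (int i = 0; i < limit; i++) {
            Pattern expected = (i == 0) ? header : body;
            if (!expected.matcher(lines.get(i)).matches()) {
                return i + 1;
            }
        }
        return 0;
    }

    // Hypothetical helper: builds one fixed-width row (two 15-character
    // columns, each followed by 5 spaces, then amount, 9 spaces, date).
    static String row(String category, String payee, String amount, String date) {
        return String.format("%-20s%-20s%s%9s%s", category, payee, amount, "", date);
    }

    public static void main(String[] args) {
        // Assumed header layout, for illustration only.
        Pattern header = Pattern.compile("Category[ ]+Payee[ ]+Amount[ ]+Date");
        Pattern body = Pattern.compile(
                ".{15}[ ]{5}.{15}[ ]{5}[-]\\d{2}\\.\\d{2}[ ]{9}\\d{2}/\\d{2}/\\d{4}");
        List<String> lines = Arrays.asList(
                String.format("%-20s%-20s%-15s%s", "Category", "Payee", "Amount", "Date"),
                row("Electric", "Electric Co.", "-50.99", "12/28/2011"),
                row("Restaurant", "Mcdonalds", "-10.35", "12/28/11"));
        System.out.println(firstInvalidLine(lines, header, body, 5)); // prints 3
    }
}
```

Compiling each Pattern once and passing it in also makes it easy to swap in different expressions for different kinds of files.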


EDIT 12/29/2011 around 10:00
I am not sure whether performance is a concern, but as an FYI: I duplicated the entries in "transactions.txt" several times to create a text file of approximately 1.3 million lines, and I was able to go through the entire file in 7 seconds on my PC. I modified the System.out calls to print only the totals at the end: 524,288 invalid and 786,432 valid entries. The file was about 85 MB in size.
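If the whole file is checked like this, it may help to compile the pattern once, because String.matches compiles its regex again on every call. The sketch below counts valid and invalid lines with a precompiled Pattern and a reused Matcher. The class name FormatCounter is made up, UTF-8 is an assumed encoding, and no timing is claimed for it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class FormatCounter {
    // Compiled once instead of on every call to String.matches
    private static final Pattern LINE = Pattern.compile(
            ".{15}[ ]{5}.{15}[ ]{5}[-]\\d{2}\\.\\d{2}[ ]{9}\\d{2}/\\d{2}/\\d{4}");

    /** Returns a two-element array: {valid count, invalid count}. */
    public static long[] count(Iterable<String> lines) {
        Matcher matcher = LINE.matcher(""); // reused for every line
        long valid = 0;
        long invalid = 0;
        for (String line : lines) {
            if (matcher.reset(line).matches()) {
                valid++;
            } else {
                invalid++;
            }
        }
        return new long[] {valid, invalid};
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "transactions.txt");
        // Files.lines reads lazily, so the whole file is never held in memory
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            long[] counts = count(lines::iterator);
            System.out.println("valid=" + counts[0] + ", invalid=" + counts[1]);
        }
    }
}
```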


Source: https://habr.com/ru/post/1388350/

