Pattern Analysis in Java

I want to parse the lines of a file using parsingMethod.

test.csv

Frank George,Henry,Mary / New York,123456 ,Beta Charli,"Delta,Delta Echo ", 25/11/1964, 15/12/1964,"40,000,000.00",0.0975,2,"King, Lincoln ",Alpha 

So I read the file line by line:

    public static void main(String[] args) throws Exception {
        File file = new File("C:\\Users\\test.csv");
        BufferedReader reader = new BufferedReader(new FileReader(file));
        String line2;
        while ((line2 = reader.readLine()) != null) {
            String[] tab = parsingMethod(line2, ",");
            for (String i : tab) {
                System.out.println(i);
            }
        }
    }

    public static String[] parsingMethod(String line, String parser) {
        List<String> liste = new LinkedList<String>();
        String patternString = "(([^\"][^" + parser + "]*)|\"([^\"]*)\")" + parser + "?";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(line);
        while (matcher.find()) {
            if (matcher.group(2) != null) {
                liste.add(matcher.group(2).replace("\n", "").trim());
            } else if (matcher.group(3) != null) {
                liste.add(matcher.group(3).replace("\n", "").trim());
            }
        }
        String[] result = new String[liste.size()];
        return liste.toArray(result);
    }
    }

Output:

 Frank George Henry Mary / New York 123456 Beta Charli Delta Delta Echo " 25/11/1964 15/12/1964 40,000,000.00 0.0975 2 King Lincoln " Alpha Delta Delta Echo 

I want to remove the stray " characters. Can someone help me improve my pattern?


Expected Result

 Frank George Henry Mary / New York 123456 Beta Charli Delta Delta Echo 25/11/1964 15/12/1964 40,000,000.00 0.0975 2 King Lincoln Alpha Delta Delta Echo 

Output for line 3

 25/11/1964 15/12/1964 40 000 000.00 0.0975 2 King Lincoln 
+4
3 answers

Your code did not compile as posted, but only because some of the " characters were not escaped.

But this should do the trick:

    String patternString = "(?:^.,|)([^\"]*?|\".*?\")(?:,|$)";
    Pattern pattern = Pattern.compile(patternString, Pattern.MULTILINE);

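(?:^.,|) is a non-capturing group that matches either a single character followed by a comma at the beginning of a line, or nothing at all.

([^\"]*?|\".*?\") is a capturing group that matches either a run of characters other than " (as short as possible), or a quoted section from one " to the next, quotes included.

(?:,|$) is a non-capturing group that matches a comma or the end of a line.

Note: ^ and $ match at line boundaries only when the pattern is compiled with the Pattern.MULTILINE flag; by default they anchor to the whole input rather than to each line.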
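To make the note about Pattern.MULTILINE concrete, here is a minimal sketch that counts matches of the same anchored pattern with and without the flag. The class name MultilineDemo and the helper countMatches are made up for this illustration; only Pattern, Matcher and the MULTILINE flag come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the question's code.
class MultilineDemo {
    // Counts how many times the regex matches in the input with the given flags.
    static int countMatches(String regex, int flags, String input) {
        Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "Alpha\nBeta\nGamma";
        // Default mode: ^ and $ anchor to the whole input, so no single line matches.
        System.out.println(countMatches("^\\w+$", 0, input));
        // MULTILINE: ^ and $ also match just after and just before each line terminator.
        System.out.println(countMatches("^\\w+$", Pattern.MULTILINE, input));
    }
}
```

With the three-line input above, the first call should find no match and the second should find one match per line. If the file is read with readLine(), each string holds a single line, so the flag only matters when several lines are joined into one string.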

+2

I cannot reproduce your result, but I think you want to keep the quotes out of the captured group, for example:

 "(([^\"][^"+parser+ "]*)|\"([^\"]*))\"" +parser+"?" 

Edit: Sorry, this will not work. Perhaps you want the first group to accept any number of characters that are neither a comma nor a \", for example: (([^,\"]*)|\"([^\"]*)\"),?
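One way to take the idea of excluding quotes from the unquoted alternative a step further is sketched below. It is not the asker's parsingMethod: the class CsvLineSketch, the method parseLine and the constant FIELD are hypothetical names, and the pattern assumes a comma separator, no escaped quotes inside quoted fields, and no line breaks inside a field. It tolerates spaces before and after a quoted field, which is where a pattern starting with [^\"] tends to split a quoted value at its inner comma.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, assuming "," as the separator and no escaped quotes.
class CsvLineSketch {
    // \G anchors each match where the previous one ended, so nothing is skipped.
    // Group 1: content of a quoted field; group 2: an unquoted field;
    // group 3: what ended the field ("," or "" at the end of the line).
    private static final Pattern FIELD =
            Pattern.compile("\\G\\s*(?:\"([^\"]*)\"|([^,\"]*))\\s*(,|$)");

    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        Matcher matcher = FIELD.matcher(line);
        boolean moreFields = true;
        while (moreFields && matcher.find()) {
            String value = matcher.group(1) != null ? matcher.group(1) : matcher.group(2);
            fields.add(value.trim());
            // Another field follows only if this one was ended by a comma.
            moreFields = ",".equals(matcher.group(3));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "Beta Charli, \"Delta,Delta Echo \", 25/11/1964,\"King, Lincoln \",Alpha ";
        for (String field : parseLine(line)) {
            System.out.println(field);
        }
    }
}
```

Parsing stops at the first field that does not fit the pattern (for example a quote in the middle of an unquoted value), so malformed lines yield only the fields read up to that point. For files with escaped quotes or multi-line fields, a dedicated CSV parser is likely a safer choice than a single regex.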

+1

As far as I can see, the lines are meant to be joined together, so try the following:

    public static void main(String[] args) throws Exception {
        File file = new File("C:\\Users\\test.csv");
        BufferedReader reader = new BufferedReader(new FileReader(file));
        StringBuilder line = new StringBuilder();
        String lineRead;
        while ((lineRead = reader.readLine()) != null) {
            line.append(lineRead);
        }
        String[] tab = parsingMethod(line.toString());
        for (String i : tab) {
            System.out.println(i);
        }
    }

    public static String[] parsingMethod(String line) {
        List<String> liste = new LinkedList<String>();
        String patternString = "(([^\"][^,]*)|\"([^\"]*)\"),?";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(line);
        while (matcher.find()) {
            if (matcher.group(2) != null) {
                liste.add(matcher.group(2).replace("\n", "").trim());
            } else if (matcher.group(3) != null) {
                liste.add(matcher.group(3).replace("\n", "").trim());
            }
        }
        String[] result = new String[liste.size()];
        return liste.toArray(result);
    }

Output:

    Frank George
    Henry
    Mary / New York
    123456
    Beta Charli
    Delta,Delta Echo
    25/11/1964
    15/12/1964
    40,000,000.00
    0.0975
    2
    King, Lincoln
    Alpha

Since Delta,Delta Echo is inside quotes, it should appear on a single line, just like King, Lincoln.

+1

Source: https://habr.com/ru/post/1480883/

