Parse CSV without splitting inside single or double quotes

I am trying to parse a CSV file in Java and have the following problem: the second column is a String (which may also contain a comma) enclosed in double quotes, except when the string itself contains a double quote, in which case the whole string is enclosed in single quotes.

The lines may look like this:

someStuff,"hello", someStuff
someStuff,"hello, SO", someStuff
someStuff,'say "hello, world"', someStuff
someStuff,'say "hello, world', someStuff

someStuff is a placeholder for other elements, which may also include quotes in the same style.

I am looking for a general way to split the lines at commas UNLESS the comma is enclosed in single OR double quotes, so that I can get the second column as a String. By the second column, I mean these fields:

  • hello
  • hello, SO
  • say "hello, world"
  • say "hello, world

I tried OpenCSV, but it fails on this input:

public class CSVDemo {

public static void main(String[] args) throws IOException {
    CSVDemo demo = new CSVDemo();
    demo.process("input.csv");
}

public void process(String fileName) throws IOException {
    String file = this.getClass().getClassLoader().getResource(fileName)
            .getFile();
    CSVReader reader = new CSVReader(new FileReader(file));
    String[] nextLine;
    while ((nextLine = reader.readNext()) != null) {
        System.out.println(nextLine[0] + " | " + nextLine[1] + " | "
                + nextLine[2]);
    }
}

}

With opencsv, the output looks like this:

someStuff | hello |  someStuff
someStuff | hello, SO |  someStuff
someStuff | 'say "hello, world"' |  someStuff
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1

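If a real CSV parser cannot be used, a regular expression is an option. It will not cover every edge case, but for input formatted as strictly as described, something like this may work: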

public void test() {
    String[] tests = {"numeStuff,\"hello\", someStuff, someStuff",
        "numeStuff,\"hello, SO\", someStuff, someStuff",
        "numeStuff,'say \"hello, world\"', someStuff, someStuff"
    };
    /* Matches a field followed by a separator.
     *
     *  ( - Field group
     *     \"  - Start with a double quote
     *     [^\"]*? - Non-greedy match on anything that is not a double quote
     *     \" - End with a double quote
     *   | - Or
     *     '  - Start with a single quote
     *     [^']*? - Non-greedy match on anything that is not a single quote
     *     ' - End with a single quote
     *   | - Or
     *    [^\"'] - Not starting with a double or single quote
     *    [^,\r\n]*? - Non-greedy match on anything that is not a comma or line break
     *  ) - End field group
     *  ( - Separator group
     *   [,\r\n]|$ - Comma, line break, or end of input
     *  ) - End separator group
     */
    Pattern p = Pattern.compile("(\"[^\"]*?\"|'[^\']*?\'|[^\"'][^,\r\n]*?)([,\r\n]|$)");
    for (String t : tests) {
        System.out.println("Matching: " + t);
        Matcher m = p.matcher(t);
        while (m.find()) {
            System.out.println(m.group(1));
        }
    }
}
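
Note that group(1) of the pattern above still contains the enclosing quotes and any leading space, because both are part of the matched field. A minimal sketch of a clean-up step follows; QuoteStripper and stripQuotes are hypothetical names used only for illustration, and it assumes a field is wrapped in at most one matching pair of quotes.

```java
final class QuoteStripper {

    /**
     * Trims the field and removes one pair of enclosing quotes,
     * but only if the first and last characters are the same quote type.
     */
    static String stripQuotes(String field) {
        String t = field.trim();
        if (t.length() >= 2) {
            char first = t.charAt(0);
            char last = t.charAt(t.length() - 1);
            if ((first == '"' || first == '\'') && first == last) {
                return t.substring(1, t.length() - 1);
            }
        }
        // Unquoted field, or a lone quote character: return it trimmed.
        return t;
    }
}
```

Inside the matching loop, this could be applied as QuoteStripper.stripQuotes(m.group(1)) before printing or storing the field.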

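It does not seem that opencsv supports this out of the box. You can extend com.opencsv.CSVParser and implement your own parsing algorithm, for example: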

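class MyCSVParser extends CSVParser {
    // Assumes an opencsv version in which parseLine(String, boolean) is
    // protected; an overriding method cannot be private.
    @Override
    protected String[] parseLine(String nextLine, boolean multi) throws IOException {
        // Your algorithm here
        throw new UnsupportedOperationException("Not implemented yet");
    }
}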

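Basically, you only need to watch for ," and ,' (ignoring any whitespace in between).

When you meet one of them, set the matching flag (for example singleQuoteOpen or doubleQuoteOpen) to true, which means a quote is open and you are in "ignore commas" mode.

When you meet the matching closing quote, reset the flag and go on splitting the elements.

To do the check, stop at every comma (unless you are in "ignore commas" mode) and look at the next char (if there is one, skipping whitespace).

Note: the regex solution is fine and shorter, but harder to adapt to edge cases (at least without major headaches).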
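
The flag-based idea (singleQuoteOpen, doubleQuoteOpen) could look roughly like the sketch below. This is one possible reading, not a definitive implementation: the class name QuoteAwareSplitter and the extra fieldStart flag are assumptions for illustration, a quote only opens at the start of a field, and there is no handling of escaped or doubled quotes.

```java
import java.util.ArrayList;
import java.util.List;

final class QuoteAwareSplitter {

    /** Splits one line at commas that lie outside single- or double-quoted fields. */
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean singleQuoteOpen = false;
        boolean doubleQuoteOpen = false;
        boolean fieldStart = true; // true until the first non-space char of a field

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (singleQuoteOpen) {
                // Inside '...': only a single quote closes; commas are kept.
                if (c == '\'') singleQuoteOpen = false; else current.append(c);
            } else if (doubleQuoteOpen) {
                // Inside "...": only a double quote closes; commas are kept.
                if (c == '"') doubleQuoteOpen = false; else current.append(c);
            } else if (c == ',') {
                // Separator outside quotes: finish the current field.
                fields.add(current.toString().trim());
                current.setLength(0);
                fieldStart = true;
            } else if (fieldStart && c == '\'') {
                singleQuoteOpen = true;
                fieldStart = false;
            } else if (fieldStart && c == '"') {
                doubleQuoteOpen = true;
                fieldStart = false;
            } else {
                if (!Character.isWhitespace(c)) fieldStart = false;
                current.append(c);
            }
        }
        fields.add(current.toString().trim()); // last field has no trailing comma
        return fields;
    }
}
```

For a line such as someStuff,'say "hello, world', someStuff, calling QuoteAwareSplitter.split(line).get(1) yields the second column without the enclosing single quotes. Because every field is trimmed at the end, whitespace at the edges of a quoted value is also dropped, which may or may not be what you want.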


It seems that opencsv does not support this. However, take a look at this earlier question, which I answered: fooobar.com/questions/139915/...

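In the following snippet, the flag notInsideComma actually means "not inside quotes". As written, it tracks only double quotes.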

// Requires: import java.util.ArrayList;
public static ArrayList<String> customSplitSpecific(String s)
{
    ArrayList<String> words = new ArrayList<String>();
    boolean notInsideComma = true; // true while we are outside double quotes
    int start = 0;
    // Scan every character, including the last one.
    for (int i = 0; i < s.length(); i++)
    {
        if (s.charAt(i) == ',' && notInsideComma)
        {
            words.add(s.substring(start, i));
            start = i + 1;
        }
        else if (s.charAt(i) == '"')
        {
            notInsideComma = !notInsideComma;
        }
    }
    words.add(s.substring(start));
    return words;
}

If the use of single and double quotes is consistent on each line, you can select the appropriate type of quote per line:

public class CSVDemo {
    public static void main(String[] args) throws IOException {
        CSVDemo demo = new CSVDemo();
        demo.process("input.csv");
    }

    public void process(String fileName) throws IOException {
        String file = this.getClass().getClassLoader().getResource(fileName)
                .getFile();

        CSVParser doubleParser = new CSVParser(',', '"');
        CSVParser singleParser = new CSVParser(',', '\'');

        String[] nextLine;

        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.contains(",'") && line.contains("',")) {
                    nextLine = singleParser.parseLine(line);
                } else {
                    nextLine = doubleParser.parseLine(line);
                }

                System.out.println(nextLine[0] + " | " + nextLine[1] + " | "
                        + nextLine[2]);
            }
        }
    }
}

Source: https://habr.com/ru/post/1620152/

