What is the fastest way to get the dimensions (rows and columns) of a CSV file in Java?

My usual routine for getting the dimensions of a CSV file is as follows:

  • Get the number of lines it has:

I use a while loop to read every line and count each successful read. The disadvantage is that the entire file has to be read just to find out how many lines it has.

  • Then get the number of columns: I use String[] temp = lineOfText.split(","); and then take the length of temp.
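The routine described in the question can be sketched as follows. This is a minimal illustration, not the asker's actual code: the class name NaiveCsvSize and the sample data are made up, and it assumes unquoted CSV. Note the -1 limit passed to split, which keeps trailing empty fields that a plain split(",") would silently drop ("a,b," has 3 fields, but split(",") returns only 2).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class NaiveCsvSize {

    // Count rows by reading every line and counting each successful read
    static long countLines(Reader source) {
        long lines = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            while (reader.readLine() != null) {
                lines++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Count columns by splitting one line on commas.
    // The -1 limit keeps trailing empty fields ("a,b," has 3 fields, not 2).
    // Like the original routine, this ignores quoting.
    static int countColumns(String line) {
        return line.split(",", -1).length;
    }

    public static void main(String[] args) {
        String csv = "id,name,city\n1,Ann,Oslo\n2,Bob,Rome\n";
        System.out.println("Rows: " + countLines(new StringReader(csv)));   // Rows: 3
        System.out.println("Columns: " + countColumns("id,name,city"));     // Columns: 3
    }
}
```

Both steps still touch every character of the file, which is exactly the cost the question is trying to avoid.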

Is there a smarter method? For example:
file1 = read.csv;
xDimension = file1.xDimension;
yDimension = file1.yDimension;



To process CSV, you can use univocity-parsers.

Here is how to get the dimensions with the uniVocity CSV parser. It processed a 150 MB file in about 1.2 seconds:

import com.univocity.parsers.common.ParsingContext;
import com.univocity.parsers.common.processor.AbstractRowProcessor;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;

public class CsvDimensionExample {

    // Let's create our own RowProcessor to analyze the rows
    static class CsvDimension extends AbstractRowProcessor {

        int lastColumn = -1; // largest number of columns seen in any row
        long rowCount = 0;

        @Override
        public void rowProcessed(String[] row, ParsingContext context) {
            rowCount++;
            if (lastColumn < row.length) {
                lastColumn = row.length;
            }
        }
    }

    public static void main(String... args) throws FileNotFoundException {
        // Let's measure the time roughly
        long start = System.currentTimeMillis();

        // Creates an instance of our own custom RowProcessor, defined above.
        CsvDimension myDimensionProcessor = new CsvDimension();

        CsvParserSettings settings = new CsvParserSettings();

        // This tells the parser that no row should have more than 2,000,000 columns
        settings.setMaxColumns(2000000);

        // Here you can select the column indexes you are interested in reading.
        // The parser will return values for the columns you selected, in the order you defined.
        // By selecting no indexes here, no String objects will be created.
        settings.selectIndexes(/*nothing here*/);

        // When you select indexes, the columns are reordered so they come in the order you defined.
        // By disabling column reordering, you get the original row, with nulls in the columns you didn't select.
        settings.setColumnReorderingEnabled(false);

        // We instruct the parser to send all parsed rows to our custom RowProcessor.
        settings.setRowProcessor(myDimensionProcessor);

        // Finally, we create a parser
        CsvParser parser = new CsvParser(settings);

        // And parse! All rows are sent to our custom RowProcessor (CsvDimension).
        // I'm using a 150MB CSV file with about 3.17 million rows.
        parser.parse(new FileReader(new File("c:/tmp/worldcitiespop.txt")));

        // Nothing else to do. The parser closes the input and does everything for you safely. Let's just get the results:
        System.out.println("Columns: " + myDimensionProcessor.lastColumn);
        System.out.println("Rows: " + myDimensionProcessor.rowCount);
        System.out.println("Time taken: " + (System.currentTimeMillis() - start) + " ms");
    }
}

Output:

Columns: 7
Rows: 3173959
Time taken: 1279 ms

Note: the library is open source (Apache V2.0 license).



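Also note that splitting on every comma is wrong for quoted fields: the line "hello, world", 4, 5 has 3 columns, but a plain split reports 4.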
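To make the quoting problem concrete, here is a small sketch comparing a plain split with a quote-aware count on that same line. QuotedColumns is a hypothetical name, and the quote-aware version assumes the whole record sits on a single line and that escaped quotes are written as doubled quotes (""), which toggle the in-quotes flag twice and therefore cancel out.

```java
public class QuotedColumns {

    // Naive approach: every comma is treated as a separator, even inside quotes
    static int naiveColumns(String line) {
        return line.split(",", -1).length;
    }

    // Quote-aware approach: commas inside double quotes are ignored.
    // Doubled quotes ("") used as escapes toggle the flag twice, so they cancel out.
    // Assumes the record has no embedded newlines.
    static int quotedColumns(String line) {
        int columns = 1;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                columns++;
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        String line = "\"hello, world\", 4, 5";
        System.out.println("split: " + naiveColumns(line));        // split: 4
        System.out.println("quote-aware: " + quotedColumns(line)); // quote-aware: 3
    }
}
```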


For the number of columns (cols), you can count the occurrences of "," instead of using split, as @Vlad noted.

String.split creates a new String for every field, which is wasted work when you only need the count.


A few suggestions:

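  • To count columns, you don't need to create a String object per field; use String.indexOf instead.
  • Looping with indexOf avoids the array and substrings that line.split allocates.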
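A sketch of the indexOf approach from the bullets above. It assumes every row has the same number of columns (so only the first line needs to be read) and that no field contains a quoted comma; IndexOfColumns is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class IndexOfColumns {

    // Count separators with indexOf instead of split: no array or substrings are allocated.
    // Like split, this ignores quoting, so it is only correct for unquoted data.
    static int countColumns(String line) {
        int columns = 1;
        int pos = line.indexOf(',');
        while (pos != -1) {
            columns++;
            pos = line.indexOf(',', pos + 1);
        }
        return columns;
    }

    // Assuming all rows are the same width, only the first line is needed.
    // Returns 0 for empty input.
    static int columnsOfFirstLine(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String header = reader.readLine();
            return header == null ? 0 : countColumns(header);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(columnsOfFirstLine(new StringReader("a,b,c\n1,2,3\n"))); // 3
    }
}
```

Unlike the row count, this reads only one line, so its cost does not grow with the file size.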

To count the lines, see this answer: fooobar.com/questions/37181/...

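try (LineNumberReader lnr = new LineNumberReader(new FileReader(new File("File1")))) {
    lnr.skip(Long.MAX_VALUE); // reads through to the end of the file
    // getLineNumber() counts the line terminators read so far; the +1 counts a last
    // line that has no trailing newline (check this against your files and JDK version)
    System.out.println(lnr.getLineNumber() + 1);
}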
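As an alternative to LineNumberReader (a different technique, not the one from the linked answer), java.nio.file.Files.lines can count lines in a few statements. This is a sketch: the class name NioLineCount is hypothetical, it assumes UTF-8 input, and like every line-based count it treats a quoted cell that spans several lines as several lines.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class NioLineCount {

    // Files.lines splits on \n, \r or \r\n like BufferedReader.readLine,
    // so a last line without a trailing newline is still counted.
    // The input is decoded as UTF-8 here; pass another charset if needed.
    static long countLines(Path file) {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lines", ".csv");
        Files.write(tmp, "a,b\nc,d\ne,f".getBytes(StandardCharsets.UTF_8));
        System.out.println(countLines(tmp)); // 3
        Files.delete(tmp);
    }
}
```

Unlike the LineNumberReader version, no +1 adjustment is needed, because the stream yields one element per line whether or not the file ends with a newline.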

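My solution is simple and handles CSV files with quoted cells, including cells that span multiple lines or contain commas and escaped quotation marks. The column count assumes every row has the same number of columns.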

For example, given this CSV file:

1,"""2""","""111,222""","""234;222""","""""","1
2
3"
2,"""2""","""111,222""","""234;222""","""""","2
3"
3,"""5""","""1112""","""10;2""","""""","1
2"

And here is my solution:

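import java.io.*;

public class CsvDimension {

    // Counts records and cells in a CSV stream, ignoring commas and
    // newlines that appear inside double-quoted cells.
    public void parse(Reader reader) throws IOException {
        long cells = 0;
        long lines = 0;
        int c;
        int last = -1;          // last character read, to detect a missing final newline
        boolean quoted = false;
        while ((c = reader.read()) != -1) {
            if (c == '"') {
                // escaped quotes ("") toggle twice and cancel out
                quoted = !quoted;
            }
            if (!quoted) {
                if (c == '\n') {    // end of a record: also closes its last cell
                    lines++;
                    cells++;
                }
                if (c == ',') {     // separator outside quotes: closes a cell
                    cells++;
                }
            }
            last = c;
        }
        // The last record may not be followed by a newline
        if (last != -1 && last != '\n') {
            lines++;
            cells++;
        }
        // Assumes every row has the same number of columns
        long cols = lines == 0 ? 0 : cells / lines;
        System.out.printf("lines: %d%ncells: %d%ncols: %d%n", lines, cells, cols);
    }

    public static void main(String[] args) throws IOException {
        try (Reader reader = new BufferedReader(new FileReader(new File("test.csv")))) {
            new CsvDimension().parse(reader);
        }
    }
}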

Source: https://habr.com/ru/post/1629362/

