Julia: read a lot of files in the working directory

I just started studying Julia, and I want to read a lot of CSV files in my directory. How can I do this?

Below are the files in my directory; I want to read all files from trip_data_1 to trip_data_12.

trip_data_1.csv  Trip_data_10.csv  Trip_data_11.csv  Trip_data_12.csv
Trip_data_2.csv  Trip_data_3.csv   Trip_data_4.csv   Trip_data_5.csv
Trip_data_6.csv  Trip_data_7.csv   Trip_data_8.csv   Trip_data_9.csv
Trip_fare_1.csv  Trip_fare_10.csv  Trip_fare_11.csv  Trip_fare_12.csv
Trip_fare_2.csv  Trip_fare_3.csv   Trip_fare_4.csv   Trip_fare_5.csv
Trip_fare_6.csv  Trip_fare_7.csv   Trip_fare_8.csv   trip_fare_9.csv

Here is what I tried:

using DataFrames
df = readtable(filter!(r"^trip_data", readdir()))

But I get a `MethodError: no method matching readtable(::Array{String,1})`.

4 answers

You can do it as follows:

 reduce(vcat, map(readtable, filter(r"^trip_data", readdir()))) 

Here `map` applies `readtable` to each file name matched by `filter` (you don't need `filter!`), and `reduce` with `vcat` combines all the resulting data frames into one.

The same can be written using mapreduce :

 mapreduce(readtable, vcat, filter(r"^trip_data", readdir())) 
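Since `readtable` needs real files on disk, the map/reduce vs. `mapreduce` equivalence can be sketched with a stand-in parser (`parse_row` below is a hypothetical helper, not part of DataFrames):

```julia
# Stand-in for readtable: parse one comma-separated string into a vector.
parse_row(s) = [parse(Int, x) for x in split(s, ",")]

rows = ["1,2", "3,4", "5,6"]   # stand-ins for file contents

a = reduce(vcat, map(parse_row, rows))   # map, then fold with vcat
b = mapreduce(parse_row, vcat, rows)     # the same in one call
# both give [1, 2, 3, 4, 5, 6]
```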

I am a big fan of the `.` broadcasting syntax in this type of situation.

i.e. `df = readtable.(filter(r"^trip_data", readdir()))` will give you an array of data frames (@avysk is correct that you probably want `filter`, not `filter!`).

If you need a single data frame, then the `mapreduce` option is good.

Or you can do: `vcat(readtable.(filter(r"^trip_data", readdir()))...)`

NB: these are all general solutions to the problem "I have a function (method) that applies `f` to `x`, and now I want to apply it to many instances or an array of `x`s".

So, if you get an error saying that a function cannot be applied to an array or collection, but it works on a single element, then `map`, `broadcast`/`.`, and list comprehensions are your friends.
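A minimal sketch of that equivalence, using a toy function in place of `readtable`:

```julia
f(x) = x^2                     # any single-argument function
xs = [1, 2, 3]

# Three equivalent ways to apply f over the whole collection:
via_broadcast     = f.(xs)
via_map           = map(f, xs)
via_comprehension = [f(x) for x in xs]
# all three give [1, 4, 9]
```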


Another method (which moves the concatenation to the input `String` level instead of the `DataFrame` level), using the Iterators package:

using Iterators  # provides chain and drop

readtable(IOBuffer(join(chain([drop((l for l in readlines(fn)), i > 1 ? 1 : 0)
                               for (i, fn) in enumerate(filter(r"^trip_data", readdir()))]...))))

This can actually save time (in my toy example), but it depends on the parameters of the input files.
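A runnable sketch of the same idea on current Julia (1.x), where `Base.Iterators.flatten` replaces `chain` and `drop` lives in `Base.Iterators`; the header line of every file after the first is dropped before joining (the file names and contents below are made up for illustration):

```julia
using Base.Iterators: drop, flatten

dir = mktempdir()
write(joinpath(dir, "trip_data_1.csv"), "a,b\n1,2\n")
write(joinpath(dir, "trip_data_2.csv"), "a,b\n3,4\n")

files = sort(filter(f -> startswith(f, "trip_data"), readdir(dir)))

# Drop the header (one line) from every file except the first, then join
# everything into a single CSV-shaped string.
combined = join(flatten([drop(readlines(joinpath(dir, f)), i > 1 ? 1 : 0)
                         for (i, f) in enumerate(files)]), "\n")
# combined == "a,b\n1,2\n3,4"
```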


You can also simply do:

files = filter(r"\.csv$", readdir(path))
df = vcat([readtable(f) for f in files]...)
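Note that `vcat` on an array of results needs the splat (`...`); a sketch with plain vectors (in place of data frames) shows the difference:

```julia
parts = [[1, 2], [3, 4]]   # stand-ins for the data frames read from each file

vcat(parts...)             # splat: concatenates into [1, 2, 3, 4]
reduce(vcat, parts)        # equivalent, without splatting
# without the splat, vcat(parts) just returns the vector of vectors unchanged
```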

As a follow-up: I did the same with Julia's `CSV.read(file)`, and it is much slower. The bottleneck is actually not the reading part but the parsing part:

source = CSV.Source(file)
CSV.read(source)

Source: https://habr.com/ru/post/1264862/

