Julia: read a lot of files in the working directory

I just started studying Julia, and I want to read a lot of CSV files in my directory. How can I do this?

Below are the files in my directory; I want to read all files from trip_data_1 to trip_data_12.

trip_data_1.csv  Trip_data_10.csv  Trip_data_11.csv  Trip_data_12.csv
Trip_data_2.csv  Trip_data_3.csv   Trip_data_4.csv   Trip_data_5.csv
Trip_data_6.csv  Trip_data_7.csv   Trip_data_8.csv   Trip_data_9.csv
Trip_fare_1.csv  Trip_fare_10.csv  Trip_fare_11.csv  Trip_fare_12.csv
Trip_fare_2.csv  Trip_fare_3.csv   Trip_fare_4.csv   Trip_fare_5.csv
Trip_fare_6.csv  Trip_fare_7.csv   Trip_fare_8.csv   trip_fare_9.csv

Here is what I tried:

using DataFrames
df = readtable(filter!(r"^trip_data", readdir()))

But I get a `MethodError: no method matching readtable(::Array{String,1})`.

4 answers

You can do it as follows:

 reduce(vcat, map(readtable, filter(r"^trip_data", readdir()))) 

Here `map` applies `readtable` to each file name matched by `filter` (you don't need `filter!`), and `reduce` with `vcat` combines all the resulting data frames into one.

The same can be written using mapreduce :

 mapreduce(readtable, vcat, filter(r"^trip_data", readdir())) 
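Since `readtable` needs real files on disk, the map/reduce vs. `mapreduce` equivalence can be sketched with a stand-in parser (`parse_row` below is a hypothetical helper, not part of DataFrames):

```julia
# Stand-in for readtable: parse one comma-separated string into a vector.
parse_row(s) = [parse(Int, x) for x in split(s, ",")]

rows = ["1,2", "3,4", "5,6"]   # stand-ins for file contents

a = reduce(vcat, map(parse_row, rows))   # map, then fold with vcat
b = mapreduce(parse_row, vcat, rows)     # the same in one call
# both give [1, 2, 3, 4, 5, 6]
```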

I am a big fan of the `.` broadcasting syntax in this type of situation.

i.e. `df = readtable.(filter(r"^trip_data", readdir()))` will give you an array of data frames (@avysk is correct that you probably want `filter`, not `filter!`).

If you need a single data frame, then the `mapreduce` option is good.

Or you can do: `vcat(readtable.(filter(r"^trip_data", readdir()))...)`

NB: these are all general solutions to the problem "I have a function (method) that applies `f` to `x`, and now I want to apply it to many instances or an array of `x`s".

So, if you get an error saying that a function cannot be applied to an array or collection, but it works on a single element, then `map`, `broadcast`/`.`, and list comprehensions are your friends.
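A minimal sketch of that equivalence, using a toy function in place of `readtable`:

```julia
f(x) = x^2                     # any single-argument function
xs = [1, 2, 3]

# Three equivalent ways to apply f over the whole collection:
via_broadcast     = f.(xs)
via_map           = map(f, xs)
via_comprehension = [f(x) for x in xs]
# all three give [1, 4, 9]
```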


Another method (which moves the concatenation to the input `String` level instead of the `DataFrame` level), using the Iterators package:

using Iterators  # provides chain and drop

readtable(IOBuffer(join(chain([drop((l for l in readlines(fn)), i > 1 ? 1 : 0)
                               for (i, fn) in enumerate(filter(r"^trip_data", readdir()))]...))))

This can actually save time (in my toy example), but it depends on the parameters of the input files.
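A runnable sketch of the same idea on current Julia (1.x), where `Base.Iterators.flatten` replaces `chain` and `drop` lives in `Base.Iterators`; the header line of every file after the first is dropped before joining (the file names and contents below are made up for illustration):

```julia
using Base.Iterators: drop, flatten

dir = mktempdir()
write(joinpath(dir, "trip_data_1.csv"), "a,b\n1,2\n")
write(joinpath(dir, "trip_data_2.csv"), "a,b\n3,4\n")

files = sort(filter(f -> startswith(f, "trip_data"), readdir(dir)))

# Drop the header (one line) from every file except the first, then join
# everything into a single CSV-shaped string.
combined = join(flatten([drop(readlines(joinpath(dir, f)), i > 1 ? 1 : 0)
                         for (i, f) in enumerate(files)]), "\n")
# combined == "a,b\n1,2\n3,4"
```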


You can also simply do:

files = filter(r"\.csv$", readdir(path))
df = vcat([readtable(f) for f in files]...)
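Note that `vcat` on an array of results needs the splat (`...`); a sketch with plain vectors (in place of data frames) shows the difference:

```julia
parts = [[1, 2], [3, 4]]   # stand-ins for the data frames read from each file

vcat(parts...)             # splat: concatenates into [1, 2, 3, 4]
reduce(vcat, parts)        # equivalent, without splatting
# without the splat, vcat(parts) just returns the vector of vectors unchanged
```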

As a follow-up: I did the same with Julia's `CSV.read(file)`, and it is much slower. The bottleneck is actually not the reading part but the parsing part:

source = CSV.Source(file)
CSV.read(source)

Source: https://habr.com/ru/post/1264862/

