How can I write and read a DataFrame that contains a datetime column in Julia?

2nd UPDATE: Confirmed as a bug by @Matt B. See his answer below for more details.

UPDATE: @waTeim demonstrated that you can write and read a DataFrame containing a column of type `Date` (I have confirmed this in my setup). This matters because it shows that Julia can write and read at least some of the composite types stored in a data frame column. However, the `DateTime` type (which is different from `Date`) still causes an error, so for now the question remains unanswered.

In Julia, using the HDF5 and JLD packages, you can save DataFrames to, and load them from, a .jld file, for example:

```julia
# Preamble
using HDF5, JLD, DataFrames
FP = "/home/colin/Test.jld"

# Save the data frame
fid1 = jldopen(FP, "w")
write(fid1, "MyDataFrame", MyDataFrame)
close(fid1)

# Come back later and load the data frame
fid1 = jldopen(FP, "r")
X = read(fid1, "MyDataFrame")
close(fid1)
```

This works well if the columns of the data frame are all vectors of a basic Julia type such as `Float64` or `Int64`. In practice, however, we often want the first column of a data frame to be a datetime, which is not a base type (although it may become one in a future release). In this situation the read operation in the code above fails for me with a long error message (I can add it below if anyone asks in the comments). Following the documentation for the JLD package, I tried the following when saving:

```julia
# Save the data frame
fid1 = jldopen(FP, "w")
addrequire(fid1, "/home/colin/.julia/v0.2/DataFrames/src/dataframe.jl")
addrequire(fid1, "/home/colin/.julia/v0.2/Datetime/src/Datetime.jl")
write(fid1, "MyDataFrame", MyDataFrame)
close(fid1)
```

but it did not help.

Am I doing something stupid, or is this feature just not available?

Note: The HDF5 tag is included because the JLD package depends on it.

2 answers

As I noted in my comment above, this behavior is a bug that has now been fixed. Until version 0.2.26 is tagged, you can use `Pkg.checkout("HDF5")` to get the fix.


But to make this more of an answer, I'll explain the problem in a bit more detail and give a potential workaround. The `Date` and `DateTime` types are both bittypes with very similar definitions. Saving and loading bittypes in the HDF5.jl package is a relatively new feature; support for it was only added last month (tagged as versions 0.2.24 and 0.2.25).
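To make "bittype" concrete: such a value is nothing but a fixed number of raw bits, so the type name stored next to it is the only thing that tells a loader how to interpret those bits. The following is a minimal sketch in current Julia syntax (where `bitstype` is spelled `primitive type`); `Calendar` and `Stamp` are hypothetical names, not the actual Datetime package definitions, and only `ISOCalendar` echoes the `Date{ISOCalendar}` type shown later on this page.

```julia
# Hypothetical stand-ins, NOT the real Datetime package definitions.
abstract type Calendar end
abstract type ISOCalendar <: Calendar end

# A 64-bit primitive ("bits") type parameterized by a calendar, loosely in the
# spirit of `Date{ISOCalendar}`. Julia 0.2/0.3 wrote this as `bitstype 64 ...`.
primitive type Stamp{C<:Calendar} 64 end

# The value has no fields: it is just 64 raw bits reinterpreted as a Stamp.
s = reinterpret(Stamp{ISOCalendar}, Int64(735439))

println(isbitstype(Stamp{ISOCalendar}))  # true
println(sizeof(Stamp{ISOCalendar}))      # 8 (bytes)
# Reading the same bits back as an Int64 recovers the original integer.
println(reinterpret(Int64, s))           # 735439
```

On disk, only those 8 bytes plus the recorded type name survive, which is why a wrong or incomplete type name is enough to make loading fail.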

These versions have a bug in which the type names of bittypes are not saved together with their module name (as a fully qualified type name). You can see this very clearly in the difference between `import` and `using`:

```julia
julia> using HDF5, JLD # version 0.2.25

julia> import Datetime

julia> save("today.jld","t",Datetime.today()) # today() returns a `Datetime.Date`

julia> load("today.jld") # But it was saved as just a `Date`, not a `Datetime.Date`
                         # so HDF5 cannot find the definition
HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
  #000: H5Dio.c line 182 in H5Dread(): can't read data
… # backtrace truncated

julia> using Datetime # Bring `Date` into the `Main` namespace

julia> load("today.jld") # now it works!
Dict{Union(UTF8String,ASCIIString),Any} with 1 entry:
  "t" => 2014-07-25
```
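To see why the missing module prefix matters, here is a hypothetical sketch (not HDF5.jl's actual code) of a loader that only has a saved type-name string and has to look it up starting from the current module (`Main` at the REPL). `Cal`, `Day`, `resolve_type` and `try_resolve` are made-up names. The qualified name `"Cal.Day"` resolves after a plain `import`, while the bare `"Day"` does not; it would only become visible after `using .Cal`, which mirrors the `import Datetime` / `using Datetime` session above.

```julia
# Hypothetical module standing in for the Datetime package.
module Cal
export Day
struct Day
    n::Int
end
end

import .Cal  # like `import Datetime`: only the module name `Cal` is in scope

# Resolve a (possibly dotted) type name, starting from the module this sketch
# runs in, the way a loader that stored only a name string would have to.
function resolve_type(name::AbstractString, root::Module = @__MODULE__)
    obj = root
    for part in split(name, '.')
        obj = getfield(obj, Symbol(part))  # throws UndefVarError if not visible
    end
    return obj
end

# Return `nothing` instead of throwing when the name cannot be found.
function try_resolve(name)
    try
        return resolve_type(name)
    catch err
        err isa UndefVarError || rethrow()
        return nothing
    end
end

println(try_resolve("Cal.Day") === Cal.Day)  # qualified name: found
println(try_resolve("Day") === nothing)      # bare name: not visible after `import`
```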

So when you save a `DateTime` object, it is parameterized by both a `Calendar` and an `Offset` (time zone). But the `Offset` types are not exported from the Datetime package... there are so many of them! However, most DateTimes just use `Zone0`: UTC. Therefore, if you have DateTime data saved with HDF5.jl versions 0.2.24-25, you can restore it manually by "exporting" these types into the `Main` namespace yourself:

```julia
julia> save("now.jld","n",now())

julia> load("now.jld")
HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
  #000: H5Dio.c line 182 in H5Dread(): can't read data
… # truncated

julia> const Zone0 = Datetime.Zone0;

julia> load("now.jld")
Dict{Union(UTF8String,ASCIIString),Any} with 1 entry:
  "n" => 2014-07-25T13:45:45 UTC
```

This error is to be expected when there is no HDF5 support for a specific Julia data type. In this case the problem was not specifically DataFrames using Datetime, but the lack of support for the Datetime type itself. The error apparently appears whenever the library cannot load a type for some reason (see here and here for other examples). The exact cause and fix were different for each type, but the error message led to quick fixes (see below).

Error

```
HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
  #000: H5Dio.c line 182 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: H5Dio.c line 438 in H5D__read(): unable to set up type info
    major: Dataset
    minor: Unable to initialize object
  #002: H5Dio.c line 939 in H5D__typeinfo_init(): unable to convert between src and dest datatype
    major: Dataset
    minor: Feature is unsupported
  #003: H5T.c line 4525 in H5T_path_find(): no appropriate function for conversion path
    major: Datatype
    minor: Unable to initialize object
```

Historical

Version 0.2.25

I would advise upgrading to Julia 0.3, which now has release-candidate status, and updating the package repository. My setup is different; I use different versions of HDF5, JLD, DataFrames and Datetime. But, as mentioned, the two significant changes I made were to specify the module name instead of the file name when calling `addrequire`, and to use the `@read` and `@write` macros rather than the corresponding functions, since the latter seemed to produce errors.

```
Version 0.3.0-rc1+4263 (2014-07-19 02:59 UTC)

Pkg.status()
 - DataFrames    0.5.7
 - HDF5          0.2.25
 - Datetime      0.1.6
```

Create a data file

```julia
using HDF5, JLD, DataFrames, Datetime

testFile = jldopen("test.jld", "w")
addrequire(testFile, "DataFrames")  # module names, not file paths
addrequire(testFile, "Datetime")
df = DataFrame()
df[:column1] = today()
@write testFile df
close(testFile)
```

Restarting Julia and reading the file back:

```julia
julia> using HDF5,JLD,DataFrames,Datetime

julia> testFile = jldopen("test.jld","r")
Julia data file version 0.0.2: test.jld

julia> @read testFile df
1x1 DataFrame
|-------|------------|
| Row # | column1    |
| 1     | 2014-07-19 |

julia> df[:column1]
1-element DataArray{Date{ISOCalendar},1}:
 2014-07-19
```

Version 0.2.25+ (pre-release)

I can confirm that storing the DateTime did fail, and that using the latest version from the repository fixes the problem:

```
 - HDF5    0.2.25+    master
```

If the example above is changed only by replacing `today()` with `now()`:

```julia
df[:column1] = now()
```

then reading it back gives the following:

```julia
julia> using HDF5,JLD,DataFrames,Datetime

julia> testFile = jldopen("test.jld","r")
Julia data file version 0.0.2: test.jld

julia> @read testFile df
1x1 DataFrame
|-------|-------------------------|
| Row # | column1                 |
| 1     | 2014-07-26T03:38:45 UTC |
```

However, despite this fix, the same general error message that occurred for Datetime also appears for the complex type:

```julia
c = 1 + im
@write testFile c
```

Version 0.2.26

This version also supports complex numbers. At first it looked as though the problem was insufficient support for the `Complex` type in general, but it was actually a specific problem with initializing the complex value from `1 + im` instead of `1.0 + im`.

```julia
 - HDF5    0.2.26

julia> using HDF5, JLD

julia> testFile = jldopen("test.jld","r")
Julia data file version 0.0.2: test.jld

julia> @read testFile c
1 + 1im
```
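The two initializations differ in their element type, which can be checked in plain Julia without HDF5 involved. `im` is a `Complex{Bool}`, so `1 + im` promotes to `Complex{Int}` while `1.0 + im` promotes to `Complex{Float64}`. The values compare equal but are stored differently; that this difference is what separated the supported case from the unsupported one is an inference from the answer above, not something verified against HDF5.jl itself.

```julia
# `im` is the imaginary unit, of type Complex{Bool}.
z_int   = 1 + im    # Int + Complex{Bool} promotes to Complex{Int}
z_float = 1.0 + im  # Float64 + Complex{Bool} promotes to Complex{Float64}

println(typeof(im))
println(typeof(z_int))
println(typeof(z_float))
println(z_int == z_float)  # true: same value, different element type
```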

Source: https://habr.com/ru/post/972445/

